From alan.orth at gmail.com  Sat Jun  1 16:07:44 2019
From: alan.orth at gmail.com (Alan Orth)
Date: Sat, 1 Jun 2019 19:07:44 +0300
Subject: [Gluster-users] Does replace-brick migrate data?
In-Reply-To: 
References: <32e26faf-e5c0-b944-2a32-c9eae408b146@redhat.com>
 <0ab0c28a-48a1-92c0-a106-f4fa94cb620f@redhat.com>
 <39dcc6a5-1610-93e1-aaff-7fef9b6c1faa@redhat.com>
Message-ID: 

Dear Ravi,

The .glusterfs hardlinks/symlinks should be fine. I'm not sure how I could
verify them for six bricks and millions of files, though... :\

I had a small success in fixing some issues with duplicated files on the
FUSE mount point yesterday. I read quite a bit about the elastic hashing
algorithm that determines which files get placed on which bricks based on
the hash of their filename and the trusted.glusterfs.dht xattr on brick
directories (thanks to Joe Julian's blog post and Python script for
showing how it works¹). With that knowledge I looked closer at one of the
files that was appearing as duplicated on the FUSE mount and found that it
was also duplicated on more than `replica 2` bricks. For this particular
file I found two "real" files and several zero-size files with
trusted.glusterfs.dht.linkto xattrs. Neither of the "real" files was on
the correct brick as far as the DHT layout is concerned, so I copied one
of them to the correct brick, deleted the others and their hard links, and
did a `stat` on the file from the FUSE mount point and it fixed itself.
Yay!

Could this have been caused by a replace-brick that got interrupted and
didn't finish re-labeling the xattrs?

Should I be thinking of some heuristics to identify and fix these issues
(incorrect brick placement) with a script, or is this something a
fix-layout or repeated volume heals can fix? I've already completed a
whole heal on this particular volume this week and it did heal about
1,000,000 files (mostly data and metadata, but about 20,000 entry heals as
well).

Thanks for your support,

¹ https://joejulian.name/post/dht-misses-are-expensive/

On Fri, May 31, 2019 at 7:57 AM Ravishankar N wrote:

>
> On 31/05/19 3:20 AM, Alan Orth wrote:
>
> Dear Ravi,
>
> I spent a bit of time inspecting the xattrs on some files and directories
> on a few bricks for this volume and it looks a bit messy. Even if I could
> make sense of it for a few and potentially heal them manually, there are
> millions of files and directories in total so that's definitely not a
> scalable solution. After a few missteps with `replace-brick ... commit
> force` in the last week (one of which on a brick that was dead/offline) as
> well as some premature `remove-brick` commands, I'm unsure how to proceed
> and I'm getting demotivated. It's scary how quickly things get out of hand
> in distributed systems...
>
> Hi Alan,
> The one good thing about gluster is that the data is always available
> directly on the backend bricks even if your volume has inconsistencies at
> the gluster level. So theoretically, if your cluster is FUBAR, you could
> just create a new volume and copy all data onto it via its mount from the
> old volume's bricks.
>
>
> I had hoped that bringing the old brick back up would help, but by the
> time I added it again a few days had passed and all the brick-id's had
> changed due to the replace/remove brick commands, not to mention that the
> trusted.afr.$volume-client-xx values were now probably pointing to the
> wrong bricks (?).
> > Anyways, a few hours ago I started a full heal on the volume and I see > that there is a sustained 100MiB/sec of network traffic going from the old > brick's host to the new one. The completed heals reported in the logs look > promising too: > > Old brick host: > > # grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E > 'Completed (data|metadata|entry) selfheal' | sort | uniq -c > 281614 Completed data selfheal > 84 Completed entry selfheal > 299648 Completed metadata selfheal > > New brick host: > > # grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E > 'Completed (data|metadata|entry) selfheal' | sort | uniq -c > 198256 Completed data selfheal > 16829 Completed entry selfheal > 229664 Completed metadata selfheal > > So that's good I guess, though I have no idea how long it will take or if > it will fix the "missing files" issue on the FUSE mount. I've increased > cluster.shd-max-threads to 8 to hopefully speed up the heal process. > > The afr xattrs should not cause files to disappear from mount. If the > xattr names do not match what each AFR subvol expects (for eg. in a replica > 2 volume, trusted.afr.*-client-{0,1} for 1st subvol, client-{2,3} for 2nd > subvol and so on - ) for its children then it won't heal the data, that is > all. But in your case I see some inconsistencies like one brick having the > actual file (licenseserver.cfg) and the other having a linkto file (the > one with the dht.linkto xattr) *in the same replica pair*. > > > I'd be happy for any advice or pointers, > > Did you check if the .glusterfs hardlinks/symlinks exist and are in order > for all bricks? > > -Ravi > > > On Wed, May 29, 2019 at 5:20 PM Alan Orth wrote: > >> Dear Ravi, >> >> Thank you for the link to the blog post series?it is very informative and >> current! If I understand your blog post correctly then I think the answer >> to your previous question about pending AFRs is: no, there are no pending >> AFRs. I have identified one file that is a good test case to try to >> understand what happened after I issued the `gluster volume replace-brick >> ... commit force` a few days ago and then added the same original brick >> back to the volume later. This is the current state of the replica 2 >> distribute/replicate volume: >> >> [root at wingu0 ~]# gluster volume info apps >> >> Volume Name: apps >> Type: Distributed-Replicate >> Volume ID: f118d2da-79df-4ee1-919d-53884cd34eda >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 3 x 2 = 6 >> Transport-type: tcp >> Bricks: >> Brick1: wingu3:/mnt/gluster/apps >> Brick2: wingu4:/mnt/gluster/apps >> Brick3: wingu05:/data/glusterfs/sdb/apps >> Brick4: wingu06:/data/glusterfs/sdb/apps >> Brick5: wingu0:/mnt/gluster/apps >> Brick6: wingu05:/data/glusterfs/sdc/apps >> Options Reconfigured: >> diagnostics.client-log-level: DEBUG >> storage.health-check-interval: 10 >> nfs.disable: on >> >> I checked the xattrs of one file that is missing from the volume's FUSE >> mount (though I can read it if I access its full path explicitly), but is >> present in several of the volume's bricks (some with full size, others >> empty): >> >> [root at wingu0 ~]# getfattr -d -m. 
-e hex >> /mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg >> >> getfattr: Removing leading '/' from absolute path names >> # file: mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg >> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 >> trusted.afr.apps-client-3=0x000000000000000000000000 >> trusted.afr.apps-client-5=0x000000000000000000000000 >> trusted.afr.dirty=0x000000000000000000000000 >> trusted.bit-rot.version=0x0200000000000000585a396f00046e15 >> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd >> >> [root at wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg >> getfattr: Removing leading '/' from absolute path names >> # file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg >> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 >> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd >> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 >> trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200 >> >> [root at wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg >> getfattr: Removing leading '/' from absolute path names >> # file: data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg >> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 >> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd >> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 >> >> [root at wingu06 ~]# getfattr -d -m. -e hex /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg >> getfattr: Removing leading '/' from absolute path names >> # file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg >> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 >> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd >> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 >> trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200 >> >> According to the trusted.afr.apps-client-xx xattrs this particular file >> should be on bricks with id "apps-client-3" and "apps-client-5". It took me >> a few hours to realize that the brick-id values are recorded in the >> volume's volfiles in /var/lib/glusterd/vols/apps/bricks. After comparing >> those brick-id values with a volfile backup from before the replace-brick, >> I realized that the files are simply on the wrong brick now as far as >> Gluster is concerned. This particular file is now on the brick for >> "apps-client-4". As an experiment I copied this one file to the two >> bricks listed in the xattrs and I was then able to see the file from the >> FUSE mount (yay!). >> >> Other than replacing the brick, removing it, and then adding the old >> brick on the original server back, there has been no change in the data >> this entire time. Can I change the brick IDs in the volfiles so they >> reflect where the data actually is? Or perhaps script something to reset >> all the xattrs on the files/directories to point to the correct bricks? 
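
A first pass at such a script could simply inventory the suspect entries
before changing anything. The following is only a read-only sketch: the
brick path is a placeholder taken from the examples above, and it lists
zero-size files carrying a trusted.glusterfs.dht.linkto xattr, i.e. DHT
link-to entries whose data lives on another subvolume:

# Sketch: find DHT link-to entries on one brick (run on the brick host).
# BRICK is a placeholder; substitute each brick path in turn.
BRICK=/data/glusterfs/sdb/apps
find "$BRICK" -path "$BRICK/.glusterfs" -prune -o -type f -size 0 -print |
while read -r f; do
    linkto=$(getfattr --absolute-names --only-values \
             -n trusted.glusterfs.dht.linkto "$f" 2>/dev/null) || continue
    # The xattr value names the replica subvolume that holds the real data.
    printf '%s -> %s\n' "$f" "$linkto"
done
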
>> >> Thank you for any help or pointers, >> >> On Wed, May 29, 2019 at 7:24 AM Ravishankar N >> wrote: >> >>> >>> On 29/05/19 9:50 AM, Ravishankar N wrote: >>> >>> >>> On 29/05/19 3:59 AM, Alan Orth wrote: >>> >>> Dear Ravishankar, >>> >>> I'm not sure if Brick4 had pending AFRs because I don't know what that >>> means and it's been a few days so I am not sure I would be able to find >>> that information. >>> >>> When you find some time, have a look at a blog >>> series I wrote about AFR- I've tried to explain what one needs to know to >>> debug replication related issues in it. >>> >>> Made a typo error. The URL for the blog is https://wp.me/peiBB-6b >>> >>> -Ravi >>> >>> >>> Anyways, after wasting a few days rsyncing the old brick to a new host I >>> decided to just try to add the old brick back into the volume instead of >>> bringing it up on the new host. I created a new brick directory on the old >>> host, moved the old brick's contents into that new directory (minus the >>> .glusterfs directory), added the new brick to the volume, and then did >>> Vlad's find/stat trick? from the brick to the FUSE mount point. >>> >>> The interesting problem I have now is that some files don't appear in >>> the FUSE mount's directory listings, but I can actually list them directly >>> and even read them. What could cause that? >>> >>> Not sure, too many variables in the hacks that you did to take a guess. >>> You can check if the contents of the .glusterfs folder are in order on the >>> new brick (example hardlink for files and symlinks for directories are >>> present etc.) . >>> Regards, >>> Ravi >>> >>> >>> Thanks, >>> >>> ? >>> https://lists.gluster.org/pipermail/gluster-users/2018-February/033584.html >>> >>> On Fri, May 24, 2019 at 4:59 PM Ravishankar N >>> wrote: >>> >>>> >>>> On 23/05/19 2:40 AM, Alan Orth wrote: >>>> >>>> Dear list, >>>> >>>> I seem to have gotten into a tricky situation. Today I brought up a >>>> shiny new server with new disk arrays and attempted to replace one brick of >>>> a replica 2 distribute/replicate volume on an older server using the >>>> `replace-brick` command: >>>> >>>> # gluster volume replace-brick homes wingu0:/mnt/gluster/homes >>>> wingu06:/data/glusterfs/sdb/homes commit force >>>> >>>> The command was successful and I see the new brick in the output of >>>> `gluster volume info`. The problem is that Gluster doesn't seem to be >>>> migrating the data, >>>> >>>> `replace-brick` definitely must heal (not migrate) the data. In your >>>> case, data must have been healed from Brick-4 to the replaced Brick-3. Are >>>> there any errors in the self-heal daemon logs of Brick-4's node? Does >>>> Brick-4 have pending AFR xattrs blaming Brick-3? The doc is a bit out of >>>> date. replace-brick command internally does all the setfattr steps that are >>>> mentioned in the doc. >>>> >>>> -Ravi >>>> >>>> >>>> and now the original brick that I replaced is no longer part of the >>>> volume (and a few terabytes of data are just sitting on the old brick): >>>> >>>> # gluster volume info homes | grep -E "Brick[0-9]:" >>>> Brick1: wingu4:/mnt/gluster/homes >>>> Brick2: wingu3:/mnt/gluster/homes >>>> Brick3: wingu06:/data/glusterfs/sdb/homes >>>> Brick4: wingu05:/data/glusterfs/sdb/homes >>>> Brick5: wingu05:/data/glusterfs/sdc/homes >>>> Brick6: wingu06:/data/glusterfs/sdc/homes >>>> >>>> I see the Gluster docs have a more complicated procedure for replacing >>>> bricks that involves getfattr/setfattr?. How can I tell Gluster about the >>>> old brick? 
I see that I have a backup of the old volfile thanks to yum's >>>> rpmsave function if that helps. >>>> >>>> We are using Gluster 5.6 on CentOS 7. Thank you for any advice you can >>>> give. >>>> >>>> ? >>>> https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick >>>> >>>> -- >>>> Alan Orth >>>> alan.orth at gmail.com >>>> https://picturingjordan.com >>>> https://englishbulgaria.net >>>> https://mjanja.ch >>>> "In heaven all the interesting people are missing." ?Friedrich Nietzsche >>>> >>>> _______________________________________________ >>>> Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> >>> >>> -- >>> Alan Orth >>> alan.orth at gmail.com >>> https://picturingjordan.com >>> https://englishbulgaria.net >>> https://mjanja.ch >>> "In heaven all the interesting people are missing." ?Friedrich Nietzsche >>> >>> >>> _______________________________________________ >>> Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >> >> -- >> Alan Orth >> alan.orth at gmail.com >> https://picturingjordan.com >> https://englishbulgaria.net >> https://mjanja.ch >> "In heaven all the interesting people are missing." ?Friedrich Nietzsche >> > > > -- > Alan Orth > alan.orth at gmail.com > https://picturingjordan.com > https://englishbulgaria.net > https://mjanja.ch > "In heaven all the interesting people are missing." ?Friedrich Nietzsche > > -- Alan Orth alan.orth at gmail.com https://picturingjordan.com https://englishbulgaria.net https://mjanja.ch "In heaven all the interesting people are missing." ?Friedrich Nietzsche -------------- next part -------------- An HTML attachment was scrubbed... URL: From zgrep at 139.com Mon Jun 3 06:27:43 2019 From: zgrep at 139.com (=?utf-8?B?WGllIENoYW5nbG9uZw==?=) Date: 03 Jun 2019 14:27:43 +0800 Subject: [Gluster-users] write request hung in write-behind Message-ID: 2019060314274320643802@139.com> Hi all Test gluster 3.8.4-54.15 gnfs, i saw a write request hung in write-behind followed by 1545 FLUSH requests. I found a similar bugfix https://bugzilla.redhat.com/show_bug.cgi?id=1626787, but not sure if it's the right one. [xlator.performance.write-behind.wb_inode] path=/575/1e/5751e318f21f605f2aac241bf042e7a8.jpg inode=0x7f51775b71a0 window_conf=1073741824 window_current=293822 transit-size=293822 dontsync=0 [.WRITE] request-ptr=0x7f516eec2060 refcount=1 wound=yes generation-number=1 req->op_ret=293822 req->op_errno=0 sync-attempts=1 sync-in-progress=yes size=293822 offset=1048576 lied=-1 append=0 fulfilled=0 go=-1 [.FLUSH] request-ptr=0x7f517c2badf0 refcount=1 wound=no generation-number=2 req->op_ret=-1 req->op_errno=116 sync-attempts=0 [.FLUSH] request-ptr=0x7f5173e9f7b0 refcount=1 wound=no generation-number=2 req->op_ret=0 req->op_errno=0 sync-attempts=0 [.FLUSH] request-ptr=0x7f51640b8ca0 refcount=1 wound=no generation-number=2 req->op_ret=0 req->op_errno=0 sync-attempts=0 [.FLUSH] request-ptr=0x7f516f3979d0 refcount=1 wound=no generation-number=2 req->op_ret=0 req->op_errno=0 sync-attempts=0 [.FLUSH] request-ptr=0x7f516f6ac8d0 refcount=1 wound=no generation-number=2 req->op_ret=0 req->op_errno=0 sync-attempts=0 Any comments would be appreciated! Thanks -Xie -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rgowdapp at redhat.com Mon Jun 3 06:46:07 2019 From: rgowdapp at redhat.com (Raghavendra Gowdappa) Date: Mon, 3 Jun 2019 12:16:07 +0530 Subject: [Gluster-users] write request hung in write-behind In-Reply-To: <5cf4bde4.1c69fb81.42b08.1408SMTPIN_ADDED_BROKEN@mx.google.com> References: <5cf4bde4.1c69fb81.42b08.1408SMTPIN_ADDED_BROKEN@mx.google.com> Message-ID: On Mon, Jun 3, 2019 at 11:57 AM Xie Changlong wrote: > Hi all > > Test gluster 3.8.4-54.15 gnfs, i saw a write request hung in write-behind > followed by 1545 FLUSH requests. I found a similar > bugfix https://bugzilla.redhat.com/show_bug.cgi?id=1626787, but not sure > if it's the right one. > > [xlator.performance.write-behind.wb_inode] > path=/575/1e/5751e318f21f605f2aac241bf042e7a8.jpg > inode=0x7f51775b71a0 > window_conf=1073741824 > window_current=293822 > transit-size=293822 > dontsync=0 > > [.WRITE] > request-ptr=0x7f516eec2060 > refcount=1 > wound=yes > generation-number=1 > req->op_ret=293822 > req->op_errno=0 > sync-attempts=1 > sync-in-progress=yes > Note that the sync is still in progress. This means, write-behind has wound the write-request to its children and yet to receive the response (unless there is a bug in accounting of sync-in-progress). So, its likely that there are callstacks into children of write-behind, which are not complete yet. Are you sure the deepest hung call-stack is in write-behind? Can you check for frames with "complete=0"? size=293822 > offset=1048576 > lied=-1 > append=0 > fulfilled=0 > go=-1 > > [.FLUSH] > request-ptr=0x7f517c2badf0 > refcount=1 > wound=no > generation-number=2 > req->op_ret=-1 > req->op_errno=116 > sync-attempts=0 > > [.FLUSH] > request-ptr=0x7f5173e9f7b0 > refcount=1 > wound=no > generation-number=2 > req->op_ret=0 > req->op_errno=0 > sync-attempts=0 > > [.FLUSH] > request-ptr=0x7f51640b8ca0 > refcount=1 > wound=no > generation-number=2 > req->op_ret=0 > req->op_errno=0 > sync-attempts=0 > > [.FLUSH] > request-ptr=0x7f516f3979d0 > refcount=1 > wound=no > generation-number=2 > req->op_ret=0 > req->op_errno=0 > sync-attempts=0 > > [.FLUSH] > request-ptr=0x7f516f6ac8d0 > refcount=1 > wound=no > generation-number=2 > req->op_ret=0 > req->op_errno=0 > sync-attempts=0 > > > Any comments would be appreciated! > > Thanks > -Xie > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zgrep at 139.com Mon Jun 3 07:40:47 2019 From: zgrep at 139.com (=?utf-8?B?WGllIENoYW5nbG9uZw==?=) Date: 03 Jun 2019 15:40:47 +0800 Subject: [Gluster-users] write request hung in write-behind Message-ID: 201906031540473580561@139.com> Firstly i correct myself, write request followed by 771(not 1545) FLUSH requests. I've attach gnfs dump file, totally 774 pending call-stacks, 771 of them pending on write-behind and the deepest call-stack is afr. 
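
For reference, a dump like the excerpt below comes from gluster's
statedump facility; a rough sketch of capturing one for the gNFS process
(volume name as in this thread, output landing in the default statedump
directory) would be:

# Sketch: request a statedump of the gNFS server (not the bricks).
# Files appear in /var/run/gluster as glusterdump.<pid>.dump.<timestamp>.
gluster volume statedump cl35vol01 nfs
ls -t /var/run/gluster/glusterdump.*.dump.*
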
[global.callpool.stack.771] stack=0x7f517f557f60 uid=0 gid=0 pid=0 unique=0 lk-owner= op=stack type=0 cnt=3 [global.callpool.stack.771.frame.1] frame=0x7f517f655880 ref_count=0 translator=cl35vol01-replicate-7 complete=0 parent=cl35vol01-dht wind_from=dht_writev wind_to=subvol->fops->writev unwind_to=dht_writev_cbk [global.callpool.stack.771.frame.2] frame=0x7f518ed90340 ref_count=1 translator=cl35vol01-dht complete=0 parent=cl35vol01-write-behind wind_from=wb_fulfill_head wind_to=FIRST_CHILD (frame->this)->fops->writev unwind_to=wb_fulfill_cbk [global.callpool.stack.771.frame.3] frame=0x7f516d3baf10 ref_count=1 translator=cl35vol01-write-behind complete=0 [global.callpool.stack.772] stack=0x7f51607a5a20 uid=0 gid=0 pid=0 unique=0 lk-owner=a0715b77517f0000 op=stack type=0 cnt=1 [global.callpool.stack.772.frame.1] frame=0x7f516ca2d1b0 ref_count=0 translator=cl35vol01-replicate-7 complete=0 [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 |grep translator | wc -l 774 [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 |grep complete |wc -l 774 [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 |grep -E "complete=0" |wc -l 774 [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 |grep translator | grep write-behind |wc -l 771 [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 |grep translator | grep replicate-7 | wc -l 2 [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 |grep translator | grep glusterfs | wc -l 1 ???: Raghavendra Gowdappa ??: 2019/06/03(???)14:46 ???: Xie Changlong; ???: gluster-users; ??: Re: write request hung in write-behind On Mon, Jun 3, 2019 at 11:57 AM Xie Changlong wrote: Hi all Test gluster 3.8.4-54.15 gnfs, i saw a write request hung in write-behind followed by 1545 FLUSH requests. I found a similar bugfix https://bugzilla.redhat.com/show_bug.cgi?id=1626787, but not sure if it's the right one. [xlator.performance.write-behind.wb_inode] path=/575/1e/5751e318f21f605f2aac241bf042e7a8.jpg inode=0x7f51775b71a0 window_conf=1073741824 window_current=293822 transit-size=293822 dontsync=0 [.WRITE] request-ptr=0x7f516eec2060 refcount=1 wound=yes generation-number=1 req->op_ret=293822 req->op_errno=0 sync-attempts=1 sync-in-progress=yes Note that the sync is still in progress. This means, write-behind has wound the write-request to its children and yet to receive the response (unless there is a bug in accounting of sync-in-progress). So, its likely that there are callstacks into children of write-behind, which are not complete yet. Are you sure the deepest hung call-stack is in write-behind? Can you check for frames with "complete=0"? 
size=293822 offset=1048576 lied=-1 append=0 fulfilled=0 go=-1 [.FLUSH] request-ptr=0x7f517c2badf0 refcount=1 wound=no generation-number=2 req->op_ret=-1 req->op_errno=116 sync-attempts=0 [.FLUSH] request-ptr=0x7f5173e9f7b0 refcount=1 wound=no generation-number=2 req->op_ret=0 req->op_errno=0 sync-attempts=0 [.FLUSH] request-ptr=0x7f51640b8ca0 refcount=1 wound=no generation-number=2 req->op_ret=0 req->op_errno=0 sync-attempts=0 [.FLUSH] request-ptr=0x7f516f3979d0 refcount=1 wound=no generation-number=2 req->op_ret=0 req->op_errno=0 sync-attempts=0 [.FLUSH] request-ptr=0x7f516f6ac8d0 refcount=1 wound=no generation-number=2 req->op_ret=0 req->op_errno=0 sync-attempts=0 Any comments would be appreciated! Thanks -Xie -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: glusterdump.20106.dump.1559038081 Type: application/octet-stream Size: 678986 bytes Desc: not available URL: From ravishankar at redhat.com Mon Jun 3 16:40:00 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Mon, 3 Jun 2019 22:10:00 +0530 Subject: [Gluster-users] Does replace-brick migrate data? In-Reply-To: References: <32e26faf-e5c0-b944-2a32-c9eae408b146@redhat.com> <0ab0c28a-48a1-92c0-a106-f4fa94cb620f@redhat.com> <39dcc6a5-1610-93e1-aaff-7fef9b6c1faa@redhat.com> Message-ID: <0aa881db-a724-13be-ff63-6c346d7f55d8@redhat.com> On 01/06/19 9:37 PM, Alan Orth wrote: > Dear Ravi, > > The .glusterfs hardlinks/symlinks should be fine. I'm not sure how I > could verify them for six bricks and millions of files, though... :\ Hi Alan, The reason I asked this is because you had mentioned in one of your earlier emails that when you moved content from the old brick to the new one, you had skipped the .glusterfs directory. So I was assuming that when you added back this new brick to the cluster, it might have been missing the .glusterfs entries. If that is the cae, one way to verify could be to check using a script if all files on the brick have a link-count of at least 2 and all dirs have valid symlinks inside .glusterfs pointing to themselves. > > I had a small success in fixing some issues with duplicated files on > the FUSE mount point yesterday. I read quite a bit about the elastic > hashing algorithm that determines which files get placed on which > bricks based on the hash of their filename and the > trusted.glusterfs.dht xattr on brick directories (thanks to Joe > Julian's blog post and Python script for showing how it works?). With > that knowledge I looked closer at one of the files that was appearing > as duplicated on the FUSE mount and found that it was also duplicated > on more than `replica 2` bricks. For this particular file I found two > "real" files and several zero-size files with > trusted.glusterfs.dht.linkto xattrs. Neither of the "real" files were > on the correct brick as far as the DHT layout is concerned, so I > copied one of them to the correct brick, deleted the others and their > hard links, and did a `stat` on the file from the FUSE mount point and > it fixed itself. Yay! > > Could this have been caused by a replace-brick that got interrupted > and didn't finish re-labeling the xattrs? No, replace-brick only initiates AFR self-heal, which just copies the contents from the other brick(s) of the *same* replica pair into the replaced brick.? The link-to files are created by DHT when you rename a file from the client. If the new name hashes to a different? 
brick, DHT does not move the entire file there. It instead creates the link-to file (the one with the dht.linkto xattrs) on the hashed subvol. The value of this xattr points to the brick where the actual data is there (`getfattr -e text` to see it for yourself).? Perhaps you had attempted a rebalance or remove-brick earlier and interrupted that? > Should I be thinking of some heuristics to identify and fix these > issues with a script (incorrect brick placement), or is this something > a fix layout or repeated volume heals can fix? I've already completed > a whole heal on this particular volume this week and it did heal about > 1,000,000 files (mostly data and metadata, but about 20,000 entry > heals as well). > Maybe you should let the AFR self-heals complete first and then attempt a full rebalance to take care of the dht link-to files. But? if the files are in millions, it could take quite some time to complete. Regards, Ravi > Thanks for your support, > > ? https://joejulian.name/post/dht-misses-are-expensive/ > > On Fri, May 31, 2019 at 7:57 AM Ravishankar N > wrote: > > > On 31/05/19 3:20 AM, Alan Orth wrote: >> Dear Ravi, >> >> I spent a bit of time inspecting the xattrs on some files and >> directories on a few bricks for this volume and it looks a bit >> messy. Even if I could make sense of it for a few and potentially >> heal them manually, there are millions of files and directories >> in total so that's definitely not a scalable solution. After a >> few missteps with `replace-brick ... commit force` in the last >> week?one of which on a brick that was dead/offline?as well as >> some premature `remove-brick` commands, I'm unsure how how to >> proceed and I'm getting demotivated. It's scary how quickly >> things get out of hand in distributed systems... > Hi Alan, > The one good thing about gluster is it that the data is always > available directly on the backed bricks even if your volume has > inconsistencies at the gluster level. So theoretically, if your > cluster is FUBAR, you could just create a new volume and copy all > data onto it via its mount from the old volume's bricks. >> >> I had hoped that bringing the old brick back up would help, but >> by the time I added it again a few days had passed and all the >> brick-id's had changed due to the replace/remove brick commands, >> not to mention that the trusted.afr.$volume-client-xx values were >> now probably pointing to the wrong bricks (?). >> >> Anyways, a few hours ago I started a full heal on the volume and >> I see that there is a sustained 100MiB/sec of network traffic >> going from the old brick's host to the new one. The completed >> heals reported in the logs look promising too: >> >> Old brick host: >> >> # grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o >> -E 'Completed (data|metadata|entry) selfheal' | sort | uniq -c >> ?281614 Completed data selfheal >> ? ? ?84 Completed entry selfheal >> ?299648 Completed metadata selfheal >> >> New brick host: >> >> # grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o >> -E 'Completed (data|metadata|entry) selfheal' | sort | uniq -c >> ?198256 Completed data selfheal >> ? 16829 Completed entry selfheal >> ?229664 Completed metadata selfheal >> >> So that's good I guess, though I have no idea how long it will >> take or if it will fix the "missing files" issue on the FUSE >> mount. I've increased cluster.shd-max-threads to 8 to hopefully >> speed up the heal process. > The afr xattrs should not cause files to disappear from mount. 
If > the xattr names do not match what each AFR subvol expects (for eg. > in a replica 2 volume, trusted.afr.*-client-{0,1} for 1st subvol, > client-{2,3} for 2nd subvol and so on - ) for its children then it > won't heal the data, that is all. But in your case I see some > inconsistencies like one brick having the actual file > (licenseserver.cfg) and the other having a linkto file (the one > with thedht.linkto xattr) /in the same replica pair/. >> >> I'd be happy for any advice or pointers, > > Did you check if the .glusterfs hardlinks/symlinks exist and are > in order for all bricks? > > -Ravi > >> >> On Wed, May 29, 2019 at 5:20 PM Alan Orth > > wrote: >> >> Dear Ravi, >> >> Thank you for the link to the blog post series?it is very >> informative and current! If I understand your blog post >> correctly then I think the answer to your previous question >> about pending AFRs is: no, there are no pending AFRs. I have >> identified one file that is a good test case to try to >> understand what happened after I issued the `gluster volume >> replace-brick ... commit force` a few days ago and then added >> the same original brick back to the volume later. This is the >> current state of the replica 2 distribute/replicate volume: >> >> [root at wingu0 ~]# gluster volume info apps >> >> Volume Name: apps >> Type: Distributed-Replicate >> Volume ID: f118d2da-79df-4ee1-919d-53884cd34eda >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 3 x 2 = 6 >> Transport-type: tcp >> Bricks: >> Brick1: wingu3:/mnt/gluster/apps >> Brick2: wingu4:/mnt/gluster/apps >> Brick3: wingu05:/data/glusterfs/sdb/apps >> Brick4: wingu06:/data/glusterfs/sdb/apps >> Brick5: wingu0:/mnt/gluster/apps >> Brick6: wingu05:/data/glusterfs/sdc/apps >> Options Reconfigured: >> diagnostics.client-log-level: DEBUG >> storage.health-check-interval: 10 >> nfs.disable: on >> >> I checked the xattrs of one file that is missing from the >> volume's FUSE mount (though I can read it if I access its >> full path explicitly), but is present in several of the >> volume's bricks (some with full size, others empty): >> >> [root at wingu0 ~]# getfattr -d -m. -e hex >> /mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg >> >> getfattr: Removing leading '/' from absolute path names # >> file: >> mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg >> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 >> trusted.afr.apps-client-3=0x000000000000000000000000 >> trusted.afr.apps-client-5=0x000000000000000000000000 >> trusted.afr.dirty=0x000000000000000000000000 >> trusted.bit-rot.version=0x0200000000000000585a396f00046e15 >> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd [root at wingu05 >> ~]# getfattr -d -m. -e hex >> /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg >> getfattr: Removing leading '/' from absolute path names # >> file: >> data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg >> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 >> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd >> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 >> trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200 >> [root at wingu05 ~]# getfattr -d -m. 
-e hex >> /data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg >> getfattr: Removing leading '/' from absolute path names # >> file: >> data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg >> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 >> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd >> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 >> [root at wingu06 ~]# getfattr -d -m. -e hex >> /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg >> getfattr: Removing leading '/' from absolute path names # >> file: >> data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg >> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 >> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd >> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 >> trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200 >> >> According to the trusted.afr.apps-client-xxxattrs this >> particular file should be on bricks with id "apps-client-3" >> and "apps-client-5". It took me a few hours to realize that >> the brick-id values are recorded in the volume's volfiles in >> /var/lib/glusterd/vols/apps/bricks. After comparing those >> brick-id values with a volfile backup from before the >> replace-brick, I realized that the files are simply on the >> wrong brick now as far as Gluster is concerned. This >> particular file is now on the brick for "apps-client-4". As >> an experiment I copied this one file to the two bricks listed >> in the xattrs and I was then able to see the file from the >> FUSE mount (yay!). >> >> Other than replacing the brick, removing it, and then adding >> the old brick on the original server back, there has been no >> change in the data this entire time. Can I change the brick >> IDs in the volfiles so they reflect where the data actually >> is? Or perhaps script something to reset all the xattrs on >> the files/directories to point to the correct bricks? >> >> Thank you for any help or pointers, >> >> On Wed, May 29, 2019 at 7:24 AM Ravishankar N >> > wrote: >> >> >> On 29/05/19 9:50 AM, Ravishankar N wrote: >>> >>> >>> On 29/05/19 3:59 AM, Alan Orth wrote: >>>> Dear Ravishankar, >>>> >>>> I'm not sure if Brick4 had pending AFRs because I don't >>>> know what that means and it's been a few days so I am >>>> not sure I would be able to find that information. >>> When you find some time, have a look at a blog >>> series I wrote about AFR- I've >>> tried to explain what one needs to know to debug >>> replication related issues in it. >> >> Made a typo error. The URL for the blog is >> https://wp.me/peiBB-6b >> >> -Ravi >> >>>> >>>> Anyways, after wasting a few days rsyncing the old >>>> brick to a new host I decided to just try to add the >>>> old brick back into the volume instead of bringing it >>>> up on the new host. I created a new brick directory on >>>> the old host, moved the old brick's contents into that >>>> new directory (minus the .glusterfs directory), added >>>> the new brick to the volume, and then did Vlad's >>>> find/stat trick? from the brick to the FUSE mount point. >>>> >>>> The interesting problem I have now is that some files >>>> don't appear in the FUSE mount's directory listings, >>>> but I can actually list them directly and even read >>>> them. What could cause that? 
>>> Not sure, too many variables in the hacks that you did >>> to take a guess. You can check if the contents of the >>> .glusterfs folder are in order on the new brick (example >>> hardlink for files and symlinks for directories are >>> present etc.) . >>> Regards, >>> Ravi >>>> >>>> Thanks, >>>> >>>> ? >>>> https://lists.gluster.org/pipermail/gluster-users/2018-February/033584.html >>>> >>>> On Fri, May 24, 2019 at 4:59 PM Ravishankar N >>>> >>> > wrote: >>>> >>>> >>>> On 23/05/19 2:40 AM, Alan Orth wrote: >>>>> Dear list, >>>>> >>>>> I seem to have gotten into a tricky situation. >>>>> Today I brought up a shiny new server with new >>>>> disk arrays and attempted to replace one brick of >>>>> a replica 2 distribute/replicate volume on an >>>>> older server using the `replace-brick` command: >>>>> >>>>> # gluster volume replace-brick homes >>>>> wingu0:/mnt/gluster/homes >>>>> wingu06:/data/glusterfs/sdb/homes commit force >>>>> >>>>> The command was successful and I see the new brick >>>>> in the output of `gluster volume info`. The >>>>> problem is that Gluster doesn't seem to be >>>>> migrating the data, >>>> >>>> `replace-brick` definitely must heal (not migrate) >>>> the data. In your case, data must have been healed >>>> from Brick-4 to the replaced Brick-3. Are there any >>>> errors in the self-heal daemon logs of Brick-4's >>>> node? Does Brick-4 have pending AFR xattrs blaming >>>> Brick-3? The doc is a bit out of date. >>>> replace-brick command internally does all the >>>> setfattr steps that are mentioned in the doc. >>>> >>>> -Ravi >>>> >>>> >>>>> and now the original brick that I replaced is no >>>>> longer part of the volume (and a few terabytes of >>>>> data are just sitting on the old brick): >>>>> >>>>> # gluster volume info homes | grep -E "Brick[0-9]:" >>>>> Brick1: wingu4:/mnt/gluster/homes >>>>> Brick2: wingu3:/mnt/gluster/homes >>>>> Brick3: wingu06:/data/glusterfs/sdb/homes >>>>> Brick4: wingu05:/data/glusterfs/sdb/homes >>>>> Brick5: wingu05:/data/glusterfs/sdc/homes >>>>> Brick6: wingu06:/data/glusterfs/sdc/homes >>>>> >>>>> I see the Gluster docs have a more complicated >>>>> procedure for replacing bricks that involves >>>>> getfattr/setfattr?. How can I tell Gluster about >>>>> the old brick? I see that I have a backup of the >>>>> old volfile thanks to yum's rpmsave function if >>>>> that helps. >>>>> >>>>> We are using Gluster 5.6 on CentOS 7. Thank you >>>>> for any advice you can give. >>>>> >>>>> ? >>>>> https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick >>>>> >>>>> -- >>>>> Alan Orth >>>>> alan.orth at gmail.com >>>>> https://picturingjordan.com >>>>> https://englishbulgaria.net >>>>> https://mjanja.ch >>>>> "In heaven all the interesting people are >>>>> missing." ?Friedrich Nietzsche >>>>> >>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> Gluster-users at gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> >>>> >>>> -- >>>> Alan Orth >>>> alan.orth at gmail.com >>>> https://picturingjordan.com >>>> https://englishbulgaria.net >>>> https://mjanja.ch >>>> "In heaven all the interesting people are missing." 
>>>> -Friedrich Nietzsche
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>>
>>>
>>> --
>>> Alan Orth
>>> alan.orth at gmail.com
>>> https://picturingjordan.com
>>> https://englishbulgaria.net
>>> https://mjanja.ch
>>> "In heaven all the interesting people are missing." -Friedrich Nietzsche
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>> --
>> Alan Orth
>> alan.orth at gmail.com
>> https://picturingjordan.com
>> https://englishbulgaria.net
>> https://mjanja.ch
>> "In heaven all the interesting people are missing." -Friedrich Nietzsche
>
>
> --
> Alan Orth
> alan.orth at gmail.com
> https://picturingjordan.com
> https://englishbulgaria.net
> https://mjanja.ch
> "In heaven all the interesting people are missing." -Friedrich Nietzsche

--
Alan Orth
alan.orth at gmail.com
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
"In heaven all the interesting people are missing." -Friedrich Nietzsche
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From snowmailer at gmail.com  Mon Jun  3 16:58:01 2019
From: snowmailer at gmail.com (Martin)
Date: Mon, 3 Jun 2019 18:58:01 +0200
Subject: [Gluster-users] No healing on peer disconnect - is it correct?
Message-ID: <10D708D0-E523-46A0-91BF-FFC41886E316@gmail.com>

Hi all,

I need someone to explain if my gluster behaviour is correct. I am not sure if my gluster works as it should. I have a simple Replica 3 - Number of Bricks: 1 x 3 = 3.

When one of my hypervisors is disconnected as a peer, i.e. the gluster process is down but the bricks are running, the other two healthy nodes start signalling that they lost one peer. This is correct.
Next, I restart the gluster process on the node where it failed. I thought this should trigger healing of files on the failed node, but nothing is happening.

I run VM disks on this gluster volume. No healing is triggered after the gluster restart, the remaining two nodes get the peer back after the restart, and everything is running without downtime.
Even VMs that are running on the "failed" node where the gluster process was down (bricks were up) are running without downtime.

Is this behaviour correct?
I mean No healing is triggered after peer is reconnected back and VMs. > > Thanks for explanation. > > BR! > Martin > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From dcunningham at voisonics.com Mon Jun 3 22:15:21 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Tue, 4 Jun 2019 10:15:21 +1200 Subject: [Gluster-users] Transport endpoint is not connected In-Reply-To: <20r8rlguxb86gpnxjwe3wpqw.1559189511842@email.android.com> References: <20r8rlguxb86gpnxjwe3wpqw.1559189511842@email.android.com> Message-ID: Hello all, We confirmed that the network provider blocking port 49152 was the issue. Thanks for all the help. On Thu, 30 May 2019 at 16:11, Strahil wrote: > You can try to run a ncat from gfs3: > > ncat -z -v gfs1 49152 > ncat -z -v gfs2 49152 > > If ncat fails to connect -> it's definately a firewall. > > Best Regards, > Strahil Nikolov > On May 30, 2019 01:33, David Cunningham wrote: > > Hi Ravi, > > I think it probably is a firewall issue with the network provider. I was > hoping to see a specific connection failure message we could send to them, > but will take it up with them anyway. > > Thanks for your help. > > > On Wed, 29 May 2019 at 23:10, Ravishankar N > wrote: > > I don't see a "Connected to gvol0-client-1" in the log. Perhaps a > firewall issue like the last time? Even in the earlier add-brick log from > the other email thread, connection to the 2nd brick was not established. > > -Ravi > On 29/05/19 2:26 PM, David Cunningham wrote: > > Hi Ravi and Joe, > > The command "gluster volume status gvol0" shows all 3 nodes as being > online, even on gfs3 as below. I've attached the glfsheal-gvol0.log, in > which I can't see anything like a connection error. Would you have any > further suggestions? Thank you. > > [root at gfs3 glusterfs]# gluster volume status gvol0 > Status of volume: gvol0 > Gluster process TCP Port RDMA Port Online > Pid > > ------------------------------------------------------------------------------ > Brick gfs1:/nodirectwritedata/gluster/gvol0 49152 0 Y > 7706 > Brick gfs2:/nodirectwritedata/gluster/gvol0 49152 0 Y > 7625 > Brick gfs3:/nodirectwritedata/gluster/gvol0 49152 0 Y > 7307 > Self-heal Daemon on localhost N/A N/A Y > 7316 > Self-heal Daemon on gfs1 N/A N/A Y > 40591 > Self-heal Daemon on gfs2 N/A N/A Y > 7634 > > Task Status of Volume gvol0 > > ------------------------------------------------------------------------------ > There are no active volume tasks > > > On Wed, 29 May 2019 at 16:26, Ravishankar N > wrote: > > > On 29/05/19 6:21 AM, David Cunningham wrote: > > -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From rgowdapp at redhat.com Tue Jun 4 01:55:25 2019 From: rgowdapp at redhat.com (Raghavendra Gowdappa) Date: Tue, 4 Jun 2019 07:25:25 +0530 Subject: [Gluster-users] write request hung in write-behind In-Reply-To: <5cf4cf0d.1c69fb81.9003f.c502SMTPIN_ADDED_BROKEN@mx.google.com> References: <5cf4cf0d.1c69fb81.9003f.c502SMTPIN_ADDED_BROKEN@mx.google.com> Message-ID: On Mon, Jun 3, 2019 at 1:11 PM Xie Changlong wrote: > Firstly i correct myself, write request followed by 771(not 1545) FLUSH > requests. I've attach gnfs dump file, totally 774 pending call-stacks, > 771 of them pending on write-behind and the deepest call-stack is afr. 
> +Ravishankar Narayanankutty +Karampuri, Pranith Are you sure these were not call-stacks of in-progress ops? One way of confirming that would be to take statedumps periodically (say 3 min apart). Hung call stacks will be common to all the statedumps. > [global.callpool.stack.771] > stack=0x7f517f557f60 > uid=0 > gid=0 > pid=0 > unique=0 > lk-owner= > op=stack > type=0 > cnt=3 > > [global.callpool.stack.771.frame.1] > frame=0x7f517f655880 > ref_count=0 > translator=cl35vol01-replicate-7 > complete=0 > parent=cl35vol01-dht > wind_from=dht_writev > wind_to=subvol->fops->writev > unwind_to=dht_writev_cbk > > [global.callpool.stack.771.frame.2] > frame=0x7f518ed90340 > ref_count=1 > translator=cl35vol01-dht > complete=0 > parent=cl35vol01-write-behind > wind_from=wb_fulfill_head > wind_to=FIRST_CHILD (frame->this)->fops->writev > unwind_to=wb_fulfill_cbk > > [global.callpool.stack.771.frame.3] > frame=0x7f516d3baf10 > ref_count=1 > translator=cl35vol01-write-behind > complete=0 > > [global.callpool.stack.772] > stack=0x7f51607a5a20 > uid=0 > gid=0 > pid=0 > unique=0 > lk-owner=a0715b77517f0000 > op=stack > type=0 > cnt=1 > > [global.callpool.stack.772.frame.1] > frame=0x7f516ca2d1b0 > ref_count=0 > translator=cl35vol01-replicate-7 > complete=0 > > [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 > glusterdump.20106.dump.1559038081 |grep translator | wc -l > 774 > [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 > glusterdump.20106.dump.1559038081 |grep complete |wc -l > 774 > [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 > glusterdump.20106.dump.1559038081 |grep -E "complete=0" |wc -l > 774 > [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 > glusterdump.20106.dump.1559038081 |grep translator | grep write-behind > |wc -l > 771 > [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 > glusterdump.20106.dump.1559038081 |grep translator | grep replicate-7 | > wc -l > 2 > [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 > glusterdump.20106.dump.1559038081 |grep translator | grep glusterfs | wc > -l > 1 > > > > > ???: Raghavendra Gowdappa > ??: 2019/06/03(???)14:46 > ???: Xie Changlong ; > ???: gluster-users ; > ??: Re: write request hung in write-behind > > > > On Mon, Jun 3, 2019 at 11:57 AM Xie Changlong wrote: > >> Hi all >> >> Test gluster 3.8.4-54.15 gnfs, i saw a write request hung in write-behind >> followed by 1545 FLUSH requests. I found a similar >> bugfix https://bugzilla.redhat.com/show_bug.cgi?id=1626787, but not sure >> if it's the right one. >> >> [xlator.performance.write-behind.wb_inode] >> path=/575/1e/5751e318f21f605f2aac241bf042e7a8.jpg >> inode=0x7f51775b71a0 >> window_conf=1073741824 >> window_current=293822 >> transit-size=293822 >> dontsync=0 >> >> [.WRITE] >> request-ptr=0x7f516eec2060 >> refcount=1 >> wound=yes >> generation-number=1 >> req->op_ret=293822 >> req->op_errno=0 >> sync-attempts=1 >> sync-in-progress=yes >> > > Note that the sync is still in progress. This means, write-behind has > wound the write-request to its children and yet to receive the response > (unless there is a bug in accounting of sync-in-progress). So, its likely > that there are callstacks into children of write-behind, which are not > complete yet. Are you sure the deepest hung call-stack is in write-behind? > Can you check for frames with "complete=0"? 
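
One way to answer that against the attached dump file (a sketch, using the
dump name quoted earlier in this thread) is:

# Count the frames that never completed ...
grep -c 'complete=0' glusterdump.20106.dump.1559038081
# ... and see which translator each incomplete frame is parked in.
grep -E '^(translator|complete)=' glusterdump.20106.dump.1559038081 \
    | grep -B1 'complete=0' | grep '^translator=' | sort | uniq -c
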
> > size=293822 >> offset=1048576 >> lied=-1 >> append=0 >> fulfilled=0 >> go=-1 >> >> [.FLUSH] >> request-ptr=0x7f517c2badf0 >> refcount=1 >> wound=no >> generation-number=2 >> req->op_ret=-1 >> req->op_errno=116 >> sync-attempts=0 >> >> [.FLUSH] >> request-ptr=0x7f5173e9f7b0 >> refcount=1 >> wound=no >> generation-number=2 >> req->op_ret=0 >> req->op_errno=0 >> sync-attempts=0 >> >> [.FLUSH] >> request-ptr=0x7f51640b8ca0 >> refcount=1 >> wound=no >> generation-number=2 >> req->op_ret=0 >> req->op_errno=0 >> sync-attempts=0 >> >> [.FLUSH] >> request-ptr=0x7f516f3979d0 >> refcount=1 >> wound=no >> generation-number=2 >> req->op_ret=0 >> req->op_errno=0 >> sync-attempts=0 >> >> [.FLUSH] >> request-ptr=0x7f516f6ac8d0 >> refcount=1 >> wound=no >> generation-number=2 >> req->op_ret=0 >> req->op_errno=0 >> sync-attempts=0 >> >> >> Any comments would be appreciated! >> >> Thanks >> -Xie >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From zgrep at 139.com Tue Jun 4 02:06:24 2019 From: zgrep at 139.com (=?utf-8?B?WGllIENoYW5nbG9uZw==?=) Date: 04 Jun 2019 10:06:24 +0800 Subject: [Gluster-users] write request hung in write-behind Message-ID: 201906041006244014963@139.com> To me, all 'df' commands on specific(not all) nfs client hung forever. The temporary solution is disable performance.nfs.write-behind and cluster.eager-lock. I'll try to get more info back if encounter this problem again . ???: Raghavendra Gowdappa ??: 2019/06/04(???)09:55 ???: Xie Changlong;Ravishankar Narayanankutty;Karampuri, Pranith; ???: gluster-users; ??: Re: Re: write request hung in write-behind On Mon, Jun 3, 2019 at 1:11 PM Xie Changlong wrote: Firstly i correct myself, write request followed by 771(not 1545) FLUSH requests. I've attach gnfs dump file, totally 774 pending call-stacks, 771 of them pending on write-behind and the deepest call-stack is afr. +Ravishankar Narayanankutty +Karampuri, Pranith Are you sure these were not call-stacks of in-progress ops? One way of confirming that would be to take statedumps periodically (say 3 min apart). Hung call stacks will be common to all the statedumps. 
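
As a sketch, that check could be scripted like so (the volume name and the
gNFS target are assumptions based on this thread; statedumps land in
/var/run/gluster by default, and the request pointers give a rough key for
matching frames across dumps):

# Take two statedumps of the gNFS process a few minutes apart ...
gluster volume statedump cl35vol01 nfs
sleep 180
gluster volume statedump cl35vol01 nfs
# ... then keep only the request pointers present in both dumps:
cd /var/run/gluster
new=$(ls -t glusterdump.*.dump.* | sed -n 1p)
old=$(ls -t glusterdump.*.dump.* | sed -n 2p)
comm -12 <(grep 'request-ptr=' "$old" | sort) \
         <(grep 'request-ptr=' "$new" | sort)
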
[global.callpool.stack.771] stack=0x7f517f557f60 uid=0 gid=0 pid=0 unique=0 lk-owner= op=stack type=0 cnt=3 [global.callpool.stack.771.frame.1] frame=0x7f517f655880 ref_count=0 translator=cl35vol01-replicate-7 complete=0 parent=cl35vol01-dht wind_from=dht_writev wind_to=subvol->fops->writev unwind_to=dht_writev_cbk [global.callpool.stack.771.frame.2] frame=0x7f518ed90340 ref_count=1 translator=cl35vol01-dht complete=0 parent=cl35vol01-write-behind wind_from=wb_fulfill_head wind_to=FIRST_CHILD (frame->this)->fops->writev unwind_to=wb_fulfill_cbk [global.callpool.stack.771.frame.3] frame=0x7f516d3baf10 ref_count=1 translator=cl35vol01-write-behind complete=0 [global.callpool.stack.772] stack=0x7f51607a5a20 uid=0 gid=0 pid=0 unique=0 lk-owner=a0715b77517f0000 op=stack type=0 cnt=1 [global.callpool.stack.772.frame.1] frame=0x7f516ca2d1b0 ref_count=0 translator=cl35vol01-replicate-7 complete=0 [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 |grep translator | wc -l 774 [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 |grep complete |wc -l 774 [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 |grep -E "complete=0" |wc -l 774 [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 |grep translator | grep write-behind |wc -l 771 [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 |grep translator | grep replicate-7 | wc -l 2 [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 |grep translator | grep glusterfs | wc -l 1 ???: Raghavendra Gowdappa ??: 2019/06/03(???)14:46 ???: Xie Changlong; ???: gluster-users; ??: Re: write request hung in write-behind On Mon, Jun 3, 2019 at 11:57 AM Xie Changlong wrote: Hi all Test gluster 3.8.4-54.15 gnfs, i saw a write request hung in write-behind followed by 1545 FLUSH requests. I found a similar bugfix https://bugzilla.redhat.com/show_bug.cgi?id=1626787, but not sure if it's the right one. [xlator.performance.write-behind.wb_inode] path=/575/1e/5751e318f21f605f2aac241bf042e7a8.jpg inode=0x7f51775b71a0 window_conf=1073741824 window_current=293822 transit-size=293822 dontsync=0 [.WRITE] request-ptr=0x7f516eec2060 refcount=1 wound=yes generation-number=1 req->op_ret=293822 req->op_errno=0 sync-attempts=1 sync-in-progress=yes Note that the sync is still in progress. This means, write-behind has wound the write-request to its children and yet to receive the response (unless there is a bug in accounting of sync-in-progress). So, its likely that there are callstacks into children of write-behind, which are not complete yet. Are you sure the deepest hung call-stack is in write-behind? Can you check for frames with "complete=0"? 
size=293822 offset=1048576 lied=-1 append=0 fulfilled=0 go=-1 [.FLUSH] request-ptr=0x7f517c2badf0 refcount=1 wound=no generation-number=2 req->op_ret=-1 req->op_errno=116 sync-attempts=0 [.FLUSH] request-ptr=0x7f5173e9f7b0 refcount=1 wound=no generation-number=2 req->op_ret=0 req->op_errno=0 sync-attempts=0 [.FLUSH] request-ptr=0x7f51640b8ca0 refcount=1 wound=no generation-number=2 req->op_ret=0 req->op_errno=0 sync-attempts=0 [.FLUSH] request-ptr=0x7f516f3979d0 refcount=1 wound=no generation-number=2 req->op_ret=0 req->op_errno=0 sync-attempts=0 [.FLUSH] request-ptr=0x7f516f6ac8d0 refcount=1 wound=no generation-number=2 req->op_ret=0 req->op_errno=0 sync-attempts=0 Any comments would be appreciated! Thanks -Xie -------------- next part -------------- An HTML attachment was scrubbed... URL: From abhishpaliwal at gmail.com Tue Jun 4 10:09:59 2019 From: abhishpaliwal at gmail.com (ABHISHEK PALIWAL) Date: Tue, 4 Jun 2019 15:39:59 +0530 Subject: [Gluster-users] Memory leak in glusterfs In-Reply-To: References: Message-ID: Hi Team, Please respond on the issue which I raised. Regards, Abhishek On Fri, May 17, 2019 at 2:46 PM ABHISHEK PALIWAL wrote: > Anyone please reply.... > > On Thu, May 16, 2019, 10:49 ABHISHEK PALIWAL > wrote: > >> Hi Team, >> >> I upload some valgrind logs from my gluster 5.4 setup. This is writing to >> the volume every 15 minutes. I stopped glusterd and then copy away the >> logs. The test was running for some simulated days. They are zipped in >> valgrind-54.zip. >> >> Lots of info in valgrind-2730.log. Lots of possibly lost bytes in >> glusterfs and even some definitely lost bytes. >> >> ==2737== 1,572,880 bytes in 1 blocks are possibly lost in loss record 391 >> of 391 >> ==2737== at 0x4C29C25: calloc (in >> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==2737== by 0xA22485E: ??? (in >> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) >> ==2737== by 0xA217C94: ??? (in >> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) >> ==2737== by 0xA21D9F8: ??? (in >> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) >> ==2737== by 0xA21DED9: ??? (in >> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) >> ==2737== by 0xA21E685: ??? (in >> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) >> ==2737== by 0xA1B9D8C: init (in >> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) >> ==2737== by 0x4E511CE: xlator_init (in /usr/lib64/libglusterfs.so.0.0.1) >> ==2737== by 0x4E8A2B8: ??? (in /usr/lib64/libglusterfs.so.0.0.1) >> ==2737== by 0x4E8AAB3: glusterfs_graph_activate (in >> /usr/lib64/libglusterfs.so.0.0.1) >> ==2737== by 0x409C35: glusterfs_process_volfp (in /usr/sbin/glusterfsd) >> ==2737== by 0x409D99: glusterfs_volumes_init (in /usr/sbin/glusterfsd) >> ==2737== >> ==2737== LEAK SUMMARY: >> ==2737== definitely lost: 1,053 bytes in 10 blocks >> ==2737== indirectly lost: 317 bytes in 3 blocks >> ==2737== possibly lost: 2,374,971 bytes in 524 blocks >> ==2737== still reachable: 53,277 bytes in 201 blocks >> ==2737== suppressed: 0 bytes in 0 blocks >> >> -- >> >> >> >> >> Regards >> Abhishek Paliwal >> > -- Regards Abhishek Paliwal -------------- next part -------------- An HTML attachment was scrubbed... URL: From zgrep at 139.com Tue Jun 4 11:33:54 2019 From: zgrep at 139.com (=?utf-8?B?WGllIENoYW5nbG9uZw==?=) Date: 04 Jun 2019 19:33:54 +0800 Subject: [Gluster-users] GETXATTR op pending on index xlator for more than 10 hours Message-ID: 2019060419335438074695@139.com> Hi all, Today, i found gnfs GETXATTR bailing out on gluster release 3.12.0. 
I have a simple 4*2 Distributed-Replicate volume.

[2019-06-03 19:58:33.085880] E [rpc-clnt.c:185:Call_bail] 0-cl25vol01-client-4: bailing out frame type(GlusterFS 3.3) op(GETXATTR(18)) xid=0x21de4275 sent = 2019-06-03 19:28:30.552356. timeout = 1800 for 10.3.133.57:49153

xid = 0x21de4275 = 568214133

Then I dumped brick 10.3.133.57:49153 and found the GETXATTR op pending on the index xlator for more than 10 hours!

[root at node0001 gluster]# grep -rn 568214133 gluster-brick-1-cl25vol01.6078.dump.15596*
gluster-brick-1-cl25vol01.6078.dump.1559617125:5093:unique=568214133
gluster-brick-1-cl25vol01.6078.dump.1559618121:5230:unique=568214133
gluster-brick-1-cl25vol01.6078.dump.1559618912:5434:unique=568214133
gluster-brick-1-cl25vol01.6078.dump.1559628467:6921:unique=568214133
[root at node0001 gluster]# date -d @1559617125
Tue Jun 4 10:58:45 CST 2019
[root at node0001 gluster]# date -d @1559628467
Tue Jun 4 14:07:47 CST 2019

[global.callpool.stack.115]
stack=0x7f8b342623c0
uid=500
gid=500
pid=-6
unique=568214133
lk-owner=faffffff
op=stack
type=0
cnt=4

[global.callpool.stack.115.frame.1]
frame=0x7f8b1d6fb540
ref_count=0
translator=cl25vol01-index
complete=0
parent=cl25vol01-quota
wind_from=quota_getxattr
wind_to=(this->children->xlator)->fops->getxattr
unwind_to=default_getxattr_cbk

[global.callpool.stack.115.frame.2]
frame=0x7f8b30a14da0
ref_count=1
translator=cl25vol01-quota
complete=0
parent=cl25vol01-io-stats
wind_from=io_stats_getxattr
wind_to=(this->children->xlator)->fops->getxattr
unwind_to=io_stats_getxattr_cbk

[global.callpool.stack.115.frame.3]
frame=0x7f8b6debada0
ref_count=1
translator=cl25vol01-io-stats
complete=0
parent=cl25vol01-server
wind_from=server_getxattr_resume
wind_to=FIRST_CHILD(this)->fops->getxattr
unwind_to=server_getxattr_cbk

[global.callpool.stack.115.frame.4]
frame=0x7f8b21962a60
ref_count=1
translator=cl25vol01-server
complete=0

I've checked the code logic and got nothing; any advice? I still have the scene on my side, so we can dig more.

Thanks

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From hunter86_bg at yahoo.com Tue Jun 4 11:48:02 2019
From: hunter86_bg at yahoo.com (Strahil Nikolov)
Date: Tue, 4 Jun 2019 11:48:02 +0000 (UTC)
Subject: [Gluster-users] Transport endpoint is not connected
In-Reply-To: 
References: <20r8rlguxb86gpnxjwe3wpqw.1559189511842@email.android.com>
Message-ID: <863936144.3309002.1559648882741@mail.yahoo.com>

Hi David,

You can ensure that 49152-49160 are opened in advance... You never know when you will need to deploy another Gluster volume.

Best Regards,
Strahil Nikolov

On Monday, June 3, 2019, 18:16:00 GMT-4, David Cunningham wrote:

Hello all,

We confirmed that the network provider blocking port 49152 was the issue. Thanks for all the help.

On Thu, 30 May 2019 at 16:11, Strahil wrote:

You can try to run ncat from gfs3:

ncat -z -v gfs1 49152
ncat -z -v gfs2 49152

If ncat fails to connect -> it's definitely a firewall.

Best Regards,
Strahil Nikolov

On May 30, 2019 01:33, David Cunningham wrote:

Hi Ravi,

I think it probably is a firewall issue with the network provider. I was hoping to see a specific connection failure message we could send to them, but will take it up with them anyway. Thanks for your help.

On Wed, 29 May 2019 at 23:10, Ravishankar N wrote:

I don't see a "Connected to gvol0-client-1" in the log.
Perhaps a firewall issue like the last time? Even in the earlier add-brick log from the other email thread, the connection to the 2nd brick was not established.
-Ravi

On 29/05/19 2:26 PM, David Cunningham wrote:

Hi Ravi and Joe,

The command "gluster volume status gvol0" shows all 3 nodes as being online, even on gfs3 as below. I've attached the glfsheal-gvol0.log, in which I can't see anything like a connection error. Would you have any further suggestions? Thank you.

[root at gfs3 glusterfs]# gluster volume status gvol0
Status of volume: gvol0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gfs1:/nodirectwritedata/gluster/gvol0 49152     0          Y       7706
Brick gfs2:/nodirectwritedata/gluster/gvol0 49152     0          Y       7625
Brick gfs3:/nodirectwritedata/gluster/gvol0 49152     0          Y       7307
Self-heal Daemon on localhost               N/A       N/A        Y       7316
Self-heal Daemon on gfs1                    N/A       N/A        Y       40591
Self-heal Daemon on gfs2                    N/A       N/A        Y       7634

Task Status of Volume gvol0
------------------------------------------------------------------------------
There are no active volume tasks

On Wed, 29 May 2019 at 16:26, Ravishankar N wrote:

On 29/05/19 6:21 AM, David Cunningham wrote:

--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From khiremat at redhat.com Tue Jun 4 11:57:26 2019
From: khiremat at redhat.com (Kotresh Hiremath Ravishankar)
Date: Tue, 4 Jun 2019 17:27:26 +0530
Subject: [Gluster-users] Geo Replication stops replicating
In-Reply-To: 
References: 
Message-ID: 

Could you please try adding /usr/sbin to $PATH for user 'sas'? If it's bash, add 'export PATH=/usr/sbin:$PATH' in /home/sas/.bashrc

On Tue, Jun 4, 2019 at 5:24 PM deepu srinivasan wrote:

> Hi Kotresh
> Please find the logs for the above error.
> *Master log snippet*
>
>> [2019-06-04 11:52:09.254731] I [resource(worker /home/sas/gluster/data/code-misc):1379:connect_remote] SSH: Initializing SSH connection between master and slave...
>> [2019-06-04 11:52:09.308923] D [repce(worker /home/sas/gluster/data/code-misc):196:push] RepceClient: call 89724:139652759443264:1559649129.31 __repce_version__() ...
>> [2019-06-04 11:52:09.602792] E [syncdutils(worker >> /home/sas/gluster/data/code-misc):311:log_raise_exception] : >> connection to peer is broken >> [2019-06-04 11:52:09.603312] E [syncdutils(worker >> /home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error >> cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i >> /var/lib/ glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S >> /tmp/gsyncd-aux-ssh-4aL2tc/d893f66e0addc32f7d0080bb503f5185.sock >> sas at 192.168.185.107 /usr/libexec/glusterfs/gsyncd slave code-misc sas@ >> 192.168.185.107::code-misc --master-node 192.168.185.106 >> --master-node-id 851b64d0-d885-4ae9-9b38-ab5b15db0fec --master-brick >> /home/sas/gluster/data/code-misc --local-node 192.168.185.122 --local-node- >> id bcaa7af6-c3a1-4411-8e99-4ebecb32eb6a --slave-timeout 120 >> --slave-log-level DEBUG --slave-gluster-log-level INFO >> --slave-gluster-command-dir /usr/sbin error=1 >> [2019-06-04 11:52:09.614996] I [repce(agent >> /home/sas/gluster/data/code-misc):97:service_loop] RepceServer: terminating >> on reaching EOF. >> [2019-06-04 11:52:09.615545] D [monitor(monitor):271:monitor] Monitor: >> worker(/home/sas/gluster/data/code-misc) connected >> [2019-06-04 11:52:09.616528] I [monitor(monitor):278:monitor] Monitor: >> worker died in startup phase brick=/home/sas/gluster/data/code-misc >> [2019-06-04 11:52:09.619391] I >> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status >> Change status=Faulty >> > > *Slave log snippet* > >> [2019-06-04 11:50:09.782668] E [syncdutils(slave >> 192.168.185.106/home/sas/gluster/data/code-misc):809:logerr] Popen: >> /usr/sbin/gluster> 2 : failed with this errno (No such file or directory) >> [2019-06-04 11:50:11.188167] W [gsyncd(slave >> 192.168.185.125/home/sas/gluster/data/code-misc):305:main] : >> Session config file not exists, using the default config >> path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.107_code-misc/gsyncd.conf >> [2019-06-04 11:50:11.201070] I [resource(slave >> 192.168.185.125/home/sas/gluster/data/code-misc):1098:connect] GLUSTER: >> Mounting gluster volume locally... >> [2019-06-04 11:50:11.271231] E [resource(slave >> 192.168.185.125/home/sas/gluster/data/code-misc):1006:handle_mounter] >> MountbrokerMounter: glusterd answered mnt= >> [2019-06-04 11:50:11.271998] E [syncdutils(slave >> 192.168.185.125/home/sas/gluster/data/code-misc):805:errlog] Popen: >> command returned error cmd=/usr/sbin/gluster --remote-host=localhost >> system:: mount sas user-map-root=sas aux-gfid-mount acl log-level=INFO >> log-file=/var/log/glusterfs/geo-replication-slaves/code-misc_192.168.185.107_code-misc/mnt-192.168.185.125-home-sas-gluster-data-code-misc.log >> volfile-server=localhost volfile-id=code-misc client-pid=-1 error=1 >> [2019-06-04 11:50:11.272113] E [syncdutils(slave >> 192.168.185.125/home/sas/gluster/data/code-misc):809:logerr] Popen: >> /usr/sbin/gluster> 2 : failed with this errno (No such file or directory) > > > On Tue, Jun 4, 2019 at 5:10 PM deepu srinivasan > wrote: > >> Hi >> As discussed I have upgraded gluster from 4.1 to 6.2 version. But the Geo >> replication failed to start. >> Stays in faulty state >> >> On Fri, May 31, 2019, 5:32 PM deepu srinivasan >> wrote: >> >>> Checked the data. It remains in 2708. No progress. >>> >>> On Fri, May 31, 2019 at 4:36 PM Kotresh Hiremath Ravishankar < >>> khiremat at redhat.com> wrote: >>> >>>> That means it could be working and the defunct process might be some >>>> old zombie one. 
Could you check, that data progress ? >>>> >>>> On Fri, May 31, 2019 at 4:29 PM deepu srinivasan >>>> wrote: >>>> >>>>> Hi >>>>> When i change the rsync option the rsync process doesnt seem to start >>>>> . Only a defunt process is listed in ps aux. Only when i set rsync option >>>>> to " " and restart all the process the rsync process is listed in ps aux. >>>>> >>>>> >>>>> On Fri, May 31, 2019 at 4:23 PM Kotresh Hiremath Ravishankar < >>>>> khiremat at redhat.com> wrote: >>>>> >>>>>> Yes, rsync config option should have fixed this issue. >>>>>> >>>>>> Could you share the output of the following? >>>>>> >>>>>> 1. gluster volume geo-replication :: >>>>>> config rsync-options >>>>>> 2. ps -ef | grep rsync >>>>>> >>>>>> On Fri, May 31, 2019 at 4:11 PM deepu srinivasan >>>>>> wrote: >>>>>> >>>>>>> Done. >>>>>>> We got the following result . >>>>>>> >>>>>>>> 1559298781.338234 write(2, "rsync: link_stat >>>>>>>> \"/tmp/gsyncd-aux-mount-EEJ_sY/.gfid/3fa6aed8-802e-4efe-9903-8bc171176d88\" >>>>>>>> failed: No such file or directory (2)", 128 >>>>>>> >>>>>>> seems like a file is missing ? >>>>>>> >>>>>>> On Fri, May 31, 2019 at 3:25 PM Kotresh Hiremath Ravishankar < >>>>>>> khiremat at redhat.com> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> Could you take the strace with with more string size? The argument >>>>>>>> strings are truncated. >>>>>>>> >>>>>>>> strace -s 500 -ttt -T -p >>>>>>>> >>>>>>>> On Fri, May 31, 2019 at 3:17 PM deepu srinivasan < >>>>>>>> sdeepugd at gmail.com> wrote: >>>>>>>> >>>>>>>>> Hi Kotresh >>>>>>>>> The above-mentioned work around did not work properly. >>>>>>>>> >>>>>>>>> On Fri, May 31, 2019 at 3:16 PM deepu srinivasan < >>>>>>>>> sdeepugd at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Hi Kotresh >>>>>>>>>> We have tried the above-mentioned rsync option and we are >>>>>>>>>> planning to have the version upgrade to 6.0. >>>>>>>>>> >>>>>>>>>> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar < >>>>>>>>>> khiremat at redhat.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> This looks like the hang because stderr buffer filled up with >>>>>>>>>>> errors messages and no one reading it. >>>>>>>>>>> I think this issue is fixed in latest releases. As a workaround, >>>>>>>>>>> you can do following and check if it works. >>>>>>>>>>> >>>>>>>>>>> Prerequisite: >>>>>>>>>>> rsync version should be > 3.1.0 >>>>>>>>>>> >>>>>>>>>>> Workaround: >>>>>>>>>>> gluster volume geo-replication >>>>>>>>>>> :: config rsync-options "--ignore-missing- >>>>>>>>>>> args" >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Kotresh HR >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan < >>>>>>>>>>> sdeepugd at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi >>>>>>>>>>>> We were evaluating Gluster geo Replication between two DCs one >>>>>>>>>>>> is in US west and one is in US east. We took multiple trials for different >>>>>>>>>>>> file size. >>>>>>>>>>>> The Geo Replication tends to stop replicating but while >>>>>>>>>>>> checking the status it appears to be in Active state. But the slave volume >>>>>>>>>>>> did not increase in size. >>>>>>>>>>>> So we have restarted the geo-replication session and checked >>>>>>>>>>>> the status. The status was in an active state and it was in History Crawl >>>>>>>>>>>> for a long time. We have enabled the DEBUG mode in logging and checked for >>>>>>>>>>>> any error. >>>>>>>>>>>> There was around 2000 file appeared for syncing candidate. 
The >>>>>>>>>>>> Rsync process starts but the rsync did not happen in the slave volume. >>>>>>>>>>>> Every time the rsync process appears in the "ps auxxx" list but the >>>>>>>>>>>> replication did not happen in the slave end. What would be the cause of >>>>>>>>>>>> this problem? Is there anyway to debug it? >>>>>>>>>>>> >>>>>>>>>>>> We have also checked the strace of the rync program. >>>>>>>>>>>> it displays something like this >>>>>>>>>>>> >>>>>>>>>>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128" >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> We are using the below specs >>>>>>>>>>>> >>>>>>>>>>>> Gluster version - 4.1.7 >>>>>>>>>>>> Sync mode - rsync >>>>>>>>>>>> Volume - 1x3 in each end (master and slave) >>>>>>>>>>>> Intranet Bandwidth - 10 Gig >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Thanks and Regards, >>>>>>>>>>> Kotresh H R >>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Thanks and Regards, >>>>>>>> Kotresh H R >>>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> Thanks and Regards, >>>>>> Kotresh H R >>>>>> >>>>> >>>> >>>> -- >>>> Thanks and Regards, >>>> Kotresh H R >>>> >>> -- Thanks and Regards, Kotresh H R -------------- next part -------------- An HTML attachment was scrubbed... URL: From khiremat at redhat.com Tue Jun 4 17:49:55 2019 From: khiremat at redhat.com (Kotresh Hiremath Ravishankar) Date: Tue, 4 Jun 2019 23:19:55 +0530 Subject: [Gluster-users] Geo Replication stops replicating In-Reply-To: References: Message-ID: Ccing Sunny, who was investing similar issue. On Tue, Jun 4, 2019 at 5:46 PM deepu srinivasan wrote: > Have already added the path in bashrc . Still in faulty state > > On Tue, Jun 4, 2019, 5:27 PM Kotresh Hiremath Ravishankar < > khiremat at redhat.com> wrote: > >> could you please try adding /usr/sbin to $PATH for user 'sas'? If it's >> bash, add 'export PATH=/usr/sbin:$PATH' in >> /home/sas/.bashrc >> >> On Tue, Jun 4, 2019 at 5:24 PM deepu srinivasan >> wrote: >> >>> Hi Kortesh >>> Please find the logs of the above error >>> *Master log snippet* >>> >>>> [2019-06-04 11:52:09.254731] I [resource(worker >>>> /home/sas/gluster/data/code-misc):1379:connect_remote] SSH: Initializing >>>> SSH connection between master and slave... >>>> [2019-06-04 11:52:09.308923] D [repce(worker >>>> /home/sas/gluster/data/code-misc):196:push] RepceClient: call >>>> 89724:139652759443264:1559649129.31 __repce_version__() ... >>>> [2019-06-04 11:52:09.602792] E [syncdutils(worker >>>> /home/sas/gluster/data/code-misc):311:log_raise_exception] : >>>> connection to peer is broken >>>> [2019-06-04 11:52:09.603312] E [syncdutils(worker >>>> /home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error >>>> cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i >>>> /var/lib/ glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S >>>> /tmp/gsyncd-aux-ssh-4aL2tc/d893f66e0addc32f7d0080bb503f5185.sock >>>> sas at 192.168.185.107 /usr/libexec/glusterfs/gsyncd slave code-misc sas@ >>>> 192.168.185.107::code-misc --master-node 192.168.185.106 >>>> --master-node-id 851b64d0-d885-4ae9-9b38-ab5b15db0fec --master-brick >>>> /home/sas/gluster/data/code-misc --local-node 192.168.185.122 --local-node- >>>> id bcaa7af6-c3a1-4411-8e99-4ebecb32eb6a --slave-timeout 120 >>>> --slave-log-level DEBUG --slave-gluster-log-level INFO >>>> --slave-gluster-command-dir /usr/sbin error=1 >>>> [2019-06-04 11:52:09.614996] I [repce(agent >>>> /home/sas/gluster/data/code-misc):97:service_loop] RepceServer: terminating >>>> on reaching EOF. 
>>>> [2019-06-04 11:52:09.615545] D [monitor(monitor):271:monitor] Monitor: >>>> worker(/home/sas/gluster/data/code-misc) connected >>>> [2019-06-04 11:52:09.616528] I [monitor(monitor):278:monitor] Monitor: >>>> worker died in startup phase brick=/home/sas/gluster/data/code-misc >>>> [2019-06-04 11:52:09.619391] I >>>> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status >>>> Change status=Faulty >>>> >>> >>> *Slave log snippet* >>> >>>> [2019-06-04 11:50:09.782668] E [syncdutils(slave >>>> 192.168.185.106/home/sas/gluster/data/code-misc):809:logerr] Popen: >>>> /usr/sbin/gluster> 2 : failed with this errno (No such file or directory) >>>> [2019-06-04 11:50:11.188167] W [gsyncd(slave >>>> 192.168.185.125/home/sas/gluster/data/code-misc):305:main] : >>>> Session config file not exists, using the default config >>>> path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.107_code-misc/gsyncd.conf >>>> [2019-06-04 11:50:11.201070] I [resource(slave >>>> 192.168.185.125/home/sas/gluster/data/code-misc):1098:connect] >>>> GLUSTER: Mounting gluster volume locally... >>>> [2019-06-04 11:50:11.271231] E [resource(slave >>>> 192.168.185.125/home/sas/gluster/data/code-misc):1006:handle_mounter] >>>> MountbrokerMounter: glusterd answered mnt= >>>> [2019-06-04 11:50:11.271998] E [syncdutils(slave >>>> 192.168.185.125/home/sas/gluster/data/code-misc):805:errlog] Popen: >>>> command returned error cmd=/usr/sbin/gluster --remote-host=localhost >>>> system:: mount sas user-map-root=sas aux-gfid-mount acl log-level=INFO >>>> log-file=/var/log/glusterfs/geo-replication-slaves/code-misc_192.168.185.107_code-misc/mnt-192.168.185.125-home-sas-gluster-data-code-misc.log >>>> volfile-server=localhost volfile-id=code-misc client-pid=-1 error=1 >>>> [2019-06-04 11:50:11.272113] E [syncdutils(slave >>>> 192.168.185.125/home/sas/gluster/data/code-misc):809:logerr] Popen: >>>> /usr/sbin/gluster> 2 : failed with this errno (No such file or directory) >>> >>> >>> On Tue, Jun 4, 2019 at 5:10 PM deepu srinivasan >>> wrote: >>> >>>> Hi >>>> As discussed I have upgraded gluster from 4.1 to 6.2 version. But the >>>> Geo replication failed to start. >>>> Stays in faulty state >>>> >>>> On Fri, May 31, 2019, 5:32 PM deepu srinivasan >>>> wrote: >>>> >>>>> Checked the data. It remains in 2708. No progress. >>>>> >>>>> On Fri, May 31, 2019 at 4:36 PM Kotresh Hiremath Ravishankar < >>>>> khiremat at redhat.com> wrote: >>>>> >>>>>> That means it could be working and the defunct process might be some >>>>>> old zombie one. Could you check, that data progress ? >>>>>> >>>>>> On Fri, May 31, 2019 at 4:29 PM deepu srinivasan >>>>>> wrote: >>>>>> >>>>>>> Hi >>>>>>> When i change the rsync option the rsync process doesnt seem to >>>>>>> start . Only a defunt process is listed in ps aux. Only when i set rsync >>>>>>> option to " " and restart all the process the rsync process is listed in ps >>>>>>> aux. >>>>>>> >>>>>>> >>>>>>> On Fri, May 31, 2019 at 4:23 PM Kotresh Hiremath Ravishankar < >>>>>>> khiremat at redhat.com> wrote: >>>>>>> >>>>>>>> Yes, rsync config option should have fixed this issue. >>>>>>>> >>>>>>>> Could you share the output of the following? >>>>>>>> >>>>>>>> 1. gluster volume geo-replication >>>>>>>> :: config rsync-options >>>>>>>> 2. ps -ef | grep rsync >>>>>>>> >>>>>>>> On Fri, May 31, 2019 at 4:11 PM deepu srinivasan < >>>>>>>> sdeepugd at gmail.com> wrote: >>>>>>>> >>>>>>>>> Done. >>>>>>>>> We got the following result . 
>>>>>>>>> >>>>>>>>>> 1559298781.338234 write(2, "rsync: link_stat >>>>>>>>>> \"/tmp/gsyncd-aux-mount-EEJ_sY/.gfid/3fa6aed8-802e-4efe-9903-8bc171176d88\" >>>>>>>>>> failed: No such file or directory (2)", 128 >>>>>>>>> >>>>>>>>> seems like a file is missing ? >>>>>>>>> >>>>>>>>> On Fri, May 31, 2019 at 3:25 PM Kotresh Hiremath Ravishankar < >>>>>>>>> khiremat at redhat.com> wrote: >>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> Could you take the strace with with more string size? The >>>>>>>>>> argument strings are truncated. >>>>>>>>>> >>>>>>>>>> strace -s 500 -ttt -T -p >>>>>>>>>> >>>>>>>>>> On Fri, May 31, 2019 at 3:17 PM deepu srinivasan < >>>>>>>>>> sdeepugd at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Kotresh >>>>>>>>>>> The above-mentioned work around did not work properly. >>>>>>>>>>> >>>>>>>>>>> On Fri, May 31, 2019 at 3:16 PM deepu srinivasan < >>>>>>>>>>> sdeepugd at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Kotresh >>>>>>>>>>>> We have tried the above-mentioned rsync option and we are >>>>>>>>>>>> planning to have the version upgrade to 6.0. >>>>>>>>>>>> >>>>>>>>>>>> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar < >>>>>>>>>>>> khiremat at redhat.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> This looks like the hang because stderr buffer filled up with >>>>>>>>>>>>> errors messages and no one reading it. >>>>>>>>>>>>> I think this issue is fixed in latest releases. As a >>>>>>>>>>>>> workaround, you can do following and check if it works. >>>>>>>>>>>>> >>>>>>>>>>>>> Prerequisite: >>>>>>>>>>>>> rsync version should be > 3.1.0 >>>>>>>>>>>>> >>>>>>>>>>>>> Workaround: >>>>>>>>>>>>> gluster volume geo-replication >>>>>>>>>>>>> :: config rsync-options "--ignore-missing >>>>>>>>>>>>> -args" >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Kotresh HR >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan < >>>>>>>>>>>>> sdeepugd at gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi >>>>>>>>>>>>>> We were evaluating Gluster geo Replication between two DCs >>>>>>>>>>>>>> one is in US west and one is in US east. We took multiple trials for >>>>>>>>>>>>>> different file size. >>>>>>>>>>>>>> The Geo Replication tends to stop replicating but while >>>>>>>>>>>>>> checking the status it appears to be in Active state. But the slave volume >>>>>>>>>>>>>> did not increase in size. >>>>>>>>>>>>>> So we have restarted the geo-replication session and checked >>>>>>>>>>>>>> the status. The status was in an active state and it was in History Crawl >>>>>>>>>>>>>> for a long time. We have enabled the DEBUG mode in logging and checked for >>>>>>>>>>>>>> any error. >>>>>>>>>>>>>> There was around 2000 file appeared for syncing candidate. >>>>>>>>>>>>>> The Rsync process starts but the rsync did not happen in the slave volume. >>>>>>>>>>>>>> Every time the rsync process appears in the "ps auxxx" list but the >>>>>>>>>>>>>> replication did not happen in the slave end. What would be the cause of >>>>>>>>>>>>>> this problem? Is there anyway to debug it? >>>>>>>>>>>>>> >>>>>>>>>>>>>> We have also checked the strace of the rync program. 
>>>>>>>>>>>>>> it displays something like this >>>>>>>>>>>>>> >>>>>>>>>>>>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128" >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> We are using the below specs >>>>>>>>>>>>>> >>>>>>>>>>>>>> Gluster version - 4.1.7 >>>>>>>>>>>>>> Sync mode - rsync >>>>>>>>>>>>>> Volume - 1x3 in each end (master and slave) >>>>>>>>>>>>>> Intranet Bandwidth - 10 Gig >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Thanks and Regards, >>>>>>>>>>>>> Kotresh H R >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Thanks and Regards, >>>>>>>>>> Kotresh H R >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Thanks and Regards, >>>>>>>> Kotresh H R >>>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> Thanks and Regards, >>>>>> Kotresh H R >>>>>> >>>>> >> >> -- >> Thanks and Regards, >> Kotresh H R >> > -- Thanks and Regards, Kotresh H R -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.orth at gmail.com Tue Jun 4 22:08:34 2019 From: alan.orth at gmail.com (Alan Orth) Date: Wed, 5 Jun 2019 01:08:34 +0300 Subject: [Gluster-users] Does replace-brick migrate data? In-Reply-To: <0aa881db-a724-13be-ff63-6c346d7f55d8@redhat.com> References: <32e26faf-e5c0-b944-2a32-c9eae408b146@redhat.com> <0ab0c28a-48a1-92c0-a106-f4fa94cb620f@redhat.com> <39dcc6a5-1610-93e1-aaff-7fef9b6c1faa@redhat.com> <0aa881db-a724-13be-ff63-6c346d7f55d8@redhat.com> Message-ID: Hi Ravi, You're right that I had mentioned using rsync to copy the brick content to a new host, but in the end I actually decided not to bring it up on a new brick. Instead I added the original brick back into the volume. So the xattrs and symlinks to .glusterfs on the original brick are fine. I think the problem probably lies with a remove-brick that got interrupted. A few weeks ago during the maintenance I had tried to remove a brick and then after twenty minutes and no obvious progress I stopped it?after that the bricks were still part of the volume. In the last few days I have run a fix-layout that took 26 hours and finished successfully. Then I started a full index heal and it has healed about 3.3 million files in a few days and I see a clear increase of network traffic from old brick host to new brick host over that time. Once the full index heal completes I will try to do a rebalance. Thank you, On Mon, Jun 3, 2019 at 7:40 PM Ravishankar N wrote: > > On 01/06/19 9:37 PM, Alan Orth wrote: > > Dear Ravi, > > The .glusterfs hardlinks/symlinks should be fine. I'm not sure how I could > verify them for six bricks and millions of files, though... :\ > > Hi Alan, > > The reason I asked this is because you had mentioned in one of your > earlier emails that when you moved content from the old brick to the new > one, you had skipped the .glusterfs directory. So I was assuming that when > you added back this new brick to the cluster, it might have been missing > the .glusterfs entries. If that is the cae, one way to verify could be to > check using a script if all files on the brick have a link-count of at > least 2 and all dirs have valid symlinks inside .glusterfs pointing to > themselves. > > > I had a small success in fixing some issues with duplicated files on the > FUSE mount point yesterday. 
I read quite a bit about the elastic hashing > algorithm that determines which files get placed on which bricks based on > the hash of their filename and the trusted.glusterfs.dht xattr on brick > directories (thanks to Joe Julian's blog post and Python script for showing > how it works?). With that knowledge I looked closer at one of the files > that was appearing as duplicated on the FUSE mount and found that it was > also duplicated on more than `replica 2` bricks. For this particular file I > found two "real" files and several zero-size files with > trusted.glusterfs.dht.linkto xattrs. Neither of the "real" files were on > the correct brick as far as the DHT layout is concerned, so I copied one of > them to the correct brick, deleted the others and their hard links, and did > a `stat` on the file from the FUSE mount point and it fixed itself. Yay! > > Could this have been caused by a replace-brick that got interrupted and > didn't finish re-labeling the xattrs? > > No, replace-brick only initiates AFR self-heal, which just copies the > contents from the other brick(s) of the *same* replica pair into the > replaced brick. The link-to files are created by DHT when you rename a > file from the client. If the new name hashes to a different brick, DHT > does not move the entire file there. It instead creates the link-to file > (the one with the dht.linkto xattrs) on the hashed subvol. The value of > this xattr points to the brick where the actual data is there (`getfattr -e > text` to see it for yourself). Perhaps you had attempted a rebalance or > remove-brick earlier and interrupted that? > > Should I be thinking of some heuristics to identify and fix these issues > with a script (incorrect brick placement), or is this something a fix > layout or repeated volume heals can fix? I've already completed a whole > heal on this particular volume this week and it did heal about 1,000,000 > files (mostly data and metadata, but about 20,000 entry heals as well). > > Maybe you should let the AFR self-heals complete first and then attempt a > full rebalance to take care of the dht link-to files. But if the files are > in millions, it could take quite some time to complete. > Regards, > Ravi > > Thanks for your support, > > ? https://joejulian.name/post/dht-misses-are-expensive/ > > On Fri, May 31, 2019 at 7:57 AM Ravishankar N > wrote: > >> >> On 31/05/19 3:20 AM, Alan Orth wrote: >> >> Dear Ravi, >> >> I spent a bit of time inspecting the xattrs on some files and directories >> on a few bricks for this volume and it looks a bit messy. Even if I could >> make sense of it for a few and potentially heal them manually, there are >> millions of files and directories in total so that's definitely not a >> scalable solution. After a few missteps with `replace-brick ... commit >> force` in the last week?one of which on a brick that was dead/offline?as >> well as some premature `remove-brick` commands, I'm unsure how how to >> proceed and I'm getting demotivated. It's scary how quickly things get out >> of hand in distributed systems... >> >> Hi Alan, >> The one good thing about gluster is it that the data is always available >> directly on the backed bricks even if your volume has inconsistencies at >> the gluster level. So theoretically, if your cluster is FUBAR, you could >> just create a new volume and copy all data onto it via its mount from the >> old volume's bricks. 
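In rough strokes that recovery could look like the following sketch (the volume name, hosts and brick paths here are made up, and it is untested):

# create and start a fresh volume, then copy into it via its mount
gluster volume create newvol replica 3 host1:/bricks/newvol host2:/bricks/newvol host3:/bricks/newvol
gluster volume start newvol
mount -t glusterfs localhost:/newvol /mnt/newvol
# copy from ONE brick of each replica set so files are not duplicated,
# and skip the internal .glusterfs directory; gluster's own trusted.*
# xattrs must not be carried over to the new volume
rsync -aH --exclude=.glusterfs /mnt/gluster/apps/ /mnt/newvol/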
>> >> >> I had hoped that bringing the old brick back up would help, but by the >> time I added it again a few days had passed and all the brick-id's had >> changed due to the replace/remove brick commands, not to mention that the >> trusted.afr.$volume-client-xx values were now probably pointing to the >> wrong bricks (?). >> >> Anyways, a few hours ago I started a full heal on the volume and I see >> that there is a sustained 100MiB/sec of network traffic going from the old >> brick's host to the new one. The completed heals reported in the logs look >> promising too: >> >> Old brick host: >> >> # grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E >> 'Completed (data|metadata|entry) selfheal' | sort | uniq -c >> 281614 Completed data selfheal >> 84 Completed entry selfheal >> 299648 Completed metadata selfheal >> >> New brick host: >> >> # grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E >> 'Completed (data|metadata|entry) selfheal' | sort | uniq -c >> 198256 Completed data selfheal >> 16829 Completed entry selfheal >> 229664 Completed metadata selfheal >> >> So that's good I guess, though I have no idea how long it will take or if >> it will fix the "missing files" issue on the FUSE mount. I've increased >> cluster.shd-max-threads to 8 to hopefully speed up the heal process. >> >> The afr xattrs should not cause files to disappear from mount. If the >> xattr names do not match what each AFR subvol expects (for eg. in a replica >> 2 volume, trusted.afr.*-client-{0,1} for 1st subvol, client-{2,3} for 2nd >> subvol and so on - ) for its children then it won't heal the data, that is >> all. But in your case I see some inconsistencies like one brick having the >> actual file (licenseserver.cfg) and the other having a linkto file (the >> one with the dht.linkto xattr) *in the same replica pair*. >> >> >> I'd be happy for any advice or pointers, >> >> Did you check if the .glusterfs hardlinks/symlinks exist and are in order >> for all bricks? >> >> -Ravi >> >> >> On Wed, May 29, 2019 at 5:20 PM Alan Orth wrote: >> >>> Dear Ravi, >>> >>> Thank you for the link to the blog post series?it is very informative >>> and current! If I understand your blog post correctly then I think the >>> answer to your previous question about pending AFRs is: no, there are no >>> pending AFRs. I have identified one file that is a good test case to try to >>> understand what happened after I issued the `gluster volume replace-brick >>> ... commit force` a few days ago and then added the same original brick >>> back to the volume later. 
This is the current state of the replica 2 >>> distribute/replicate volume: >>> >>> [root at wingu0 ~]# gluster volume info apps >>> >>> Volume Name: apps >>> Type: Distributed-Replicate >>> Volume ID: f118d2da-79df-4ee1-919d-53884cd34eda >>> Status: Started >>> Snapshot Count: 0 >>> Number of Bricks: 3 x 2 = 6 >>> Transport-type: tcp >>> Bricks: >>> Brick1: wingu3:/mnt/gluster/apps >>> Brick2: wingu4:/mnt/gluster/apps >>> Brick3: wingu05:/data/glusterfs/sdb/apps >>> Brick4: wingu06:/data/glusterfs/sdb/apps >>> Brick5: wingu0:/mnt/gluster/apps >>> Brick6: wingu05:/data/glusterfs/sdc/apps >>> Options Reconfigured: >>> diagnostics.client-log-level: DEBUG >>> storage.health-check-interval: 10 >>> nfs.disable: on >>> >>> I checked the xattrs of one file that is missing from the volume's FUSE >>> mount (though I can read it if I access its full path explicitly), but is >>> present in several of the volume's bricks (some with full size, others >>> empty): >>> >>> [root at wingu0 ~]# getfattr -d -m. -e hex >>> /mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg >>> >>> getfattr: Removing leading '/' from absolute path names >>> # file: mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg >>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 >>> trusted.afr.apps-client-3=0x000000000000000000000000 >>> trusted.afr.apps-client-5=0x000000000000000000000000 >>> trusted.afr.dirty=0x000000000000000000000000 >>> trusted.bit-rot.version=0x0200000000000000585a396f00046e15 >>> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd >>> >>> [root at wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg >>> getfattr: Removing leading '/' from absolute path names >>> # file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg >>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 >>> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd >>> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 >>> trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200 >>> >>> [root at wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg >>> getfattr: Removing leading '/' from absolute path names >>> # file: data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg >>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 >>> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd >>> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 >>> >>> [root at wingu06 ~]# getfattr -d -m. -e hex /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg >>> getfattr: Removing leading '/' from absolute path names >>> # file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg >>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 >>> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd >>> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 >>> trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200 >>> >>> According to the trusted.afr.apps-client-xx xattrs this particular file >>> should be on bricks with id "apps-client-3" and "apps-client-5". 
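To compare all six copies side by side, a loop along these lines (an untested sketch; it assumes passwordless SSH between the nodes) gathers the same xattrs from every brick in one pass:

for brick in wingu3:/mnt/gluster/apps wingu4:/mnt/gluster/apps \
    wingu05:/data/glusterfs/sdb/apps wingu06:/data/glusterfs/sdb/apps \
    wingu0:/mnt/gluster/apps wingu05:/data/glusterfs/sdc/apps; do
  host=${brick%%:*} path=${brick#*:}
  echo "== $brick =="
  ssh "$host" getfattr -d -m. -e hex "$path/clcgenomics/clclicsrv/licenseserver.cfg"
done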
It took me >>> a few hours to realize that the brick-id values are recorded in the >>> volume's volfiles in /var/lib/glusterd/vols/apps/bricks. After comparing >>> those brick-id values with a volfile backup from before the replace-brick, >>> I realized that the files are simply on the wrong brick now as far as >>> Gluster is concerned. This particular file is now on the brick for >>> "apps-client-4". As an experiment I copied this one file to the two >>> bricks listed in the xattrs and I was then able to see the file from the >>> FUSE mount (yay!). >>> >>> Other than replacing the brick, removing it, and then adding the old >>> brick on the original server back, there has been no change in the data >>> this entire time. Can I change the brick IDs in the volfiles so they >>> reflect where the data actually is? Or perhaps script something to reset >>> all the xattrs on the files/directories to point to the correct bricks? >>> >>> Thank you for any help or pointers, >>> >>> On Wed, May 29, 2019 at 7:24 AM Ravishankar N >>> wrote: >>> >>>> >>>> On 29/05/19 9:50 AM, Ravishankar N wrote: >>>> >>>> >>>> On 29/05/19 3:59 AM, Alan Orth wrote: >>>> >>>> Dear Ravishankar, >>>> >>>> I'm not sure if Brick4 had pending AFRs because I don't know what that >>>> means and it's been a few days so I am not sure I would be able to find >>>> that information. >>>> >>>> When you find some time, have a look at a blog >>>> series I wrote about AFR- I've tried to explain what one needs to know to >>>> debug replication related issues in it. >>>> >>>> Made a typo error. The URL for the blog is https://wp.me/peiBB-6b >>>> >>>> -Ravi >>>> >>>> >>>> Anyways, after wasting a few days rsyncing the old brick to a new host >>>> I decided to just try to add the old brick back into the volume instead of >>>> bringing it up on the new host. I created a new brick directory on the old >>>> host, moved the old brick's contents into that new directory (minus the >>>> .glusterfs directory), added the new brick to the volume, and then did >>>> Vlad's find/stat trick? from the brick to the FUSE mount point. >>>> >>>> The interesting problem I have now is that some files don't appear in >>>> the FUSE mount's directory listings, but I can actually list them directly >>>> and even read them. What could cause that? >>>> >>>> Not sure, too many variables in the hacks that you did to take a guess. >>>> You can check if the contents of the .glusterfs folder are in order on the >>>> new brick (example hardlink for files and symlinks for directories are >>>> present etc.) . >>>> Regards, >>>> Ravi >>>> >>>> >>>> Thanks, >>>> >>>> ? >>>> https://lists.gluster.org/pipermail/gluster-users/2018-February/033584.html >>>> >>>> On Fri, May 24, 2019 at 4:59 PM Ravishankar N >>>> wrote: >>>> >>>>> >>>>> On 23/05/19 2:40 AM, Alan Orth wrote: >>>>> >>>>> Dear list, >>>>> >>>>> I seem to have gotten into a tricky situation. Today I brought up a >>>>> shiny new server with new disk arrays and attempted to replace one brick of >>>>> a replica 2 distribute/replicate volume on an older server using the >>>>> `replace-brick` command: >>>>> >>>>> # gluster volume replace-brick homes wingu0:/mnt/gluster/homes >>>>> wingu06:/data/glusterfs/sdb/homes commit force >>>>> >>>>> The command was successful and I see the new brick in the output of >>>>> `gluster volume info`. The problem is that Gluster doesn't seem to be >>>>> migrating the data, >>>>> >>>>> `replace-brick` definitely must heal (not migrate) the data. 
In your >>>>> case, data must have been healed from Brick-4 to the replaced Brick-3. Are >>>>> there any errors in the self-heal daemon logs of Brick-4's node? Does >>>>> Brick-4 have pending AFR xattrs blaming Brick-3? The doc is a bit out of >>>>> date. replace-brick command internally does all the setfattr steps that are >>>>> mentioned in the doc. >>>>> >>>>> -Ravi >>>>> >>>>> >>>>> and now the original brick that I replaced is no longer part of the >>>>> volume (and a few terabytes of data are just sitting on the old brick): >>>>> >>>>> # gluster volume info homes | grep -E "Brick[0-9]:" >>>>> Brick1: wingu4:/mnt/gluster/homes >>>>> Brick2: wingu3:/mnt/gluster/homes >>>>> Brick3: wingu06:/data/glusterfs/sdb/homes >>>>> Brick4: wingu05:/data/glusterfs/sdb/homes >>>>> Brick5: wingu05:/data/glusterfs/sdc/homes >>>>> Brick6: wingu06:/data/glusterfs/sdc/homes >>>>> >>>>> I see the Gluster docs have a more complicated procedure for replacing >>>>> bricks that involves getfattr/setfattr?. How can I tell Gluster about the >>>>> old brick? I see that I have a backup of the old volfile thanks to yum's >>>>> rpmsave function if that helps. >>>>> >>>>> We are using Gluster 5.6 on CentOS 7. Thank you for any advice you can >>>>> give. >>>>> >>>>> ? >>>>> https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick >>>>> >>>>> -- >>>>> Alan Orth >>>>> alan.orth at gmail.com >>>>> https://picturingjordan.com >>>>> https://englishbulgaria.net >>>>> https://mjanja.ch >>>>> "In heaven all the interesting people are missing." ?Friedrich >>>>> Nietzsche >>>>> >>>>> _______________________________________________ >>>>> Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users >>>>> >>>>> >>>> >>>> -- >>>> Alan Orth >>>> alan.orth at gmail.com >>>> https://picturingjordan.com >>>> https://englishbulgaria.net >>>> https://mjanja.ch >>>> "In heaven all the interesting people are missing." ?Friedrich Nietzsche >>>> >>>> >>>> _______________________________________________ >>>> Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> >>> >>> -- >>> Alan Orth >>> alan.orth at gmail.com >>> https://picturingjordan.com >>> https://englishbulgaria.net >>> https://mjanja.ch >>> "In heaven all the interesting people are missing." ?Friedrich Nietzsche >>> >> >> >> -- >> Alan Orth >> alan.orth at gmail.com >> https://picturingjordan.com >> https://englishbulgaria.net >> https://mjanja.ch >> "In heaven all the interesting people are missing." ?Friedrich Nietzsche >> >> > > -- > Alan Orth > alan.orth at gmail.com > https://picturingjordan.com > https://englishbulgaria.net > https://mjanja.ch > "In heaven all the interesting people are missing." ?Friedrich Nietzsche > > -- Alan Orth alan.orth at gmail.com https://picturingjordan.com https://englishbulgaria.net https://mjanja.ch "In heaven all the interesting people are missing." ?Friedrich Nietzsche -------------- next part -------------- An HTML attachment was scrubbed... URL: From atumball at redhat.com Wed Jun 5 07:00:16 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Wed, 5 Jun 2019 12:30:16 +0530 Subject: [Gluster-users] Update: GlusterFS code coverage Message-ID: All, I just wanted to update everyone about one of the initiatives we have undertaken, ie, increasing the overall code coverage of GlusterFS above 70%. 
You can have a look at the current code coverage here: https://build.gluster.org/job/line-coverage/lastCompletedBuild/Line_20Coverage_20Report/ (this always shows the latest run). The daily job and its details are captured @ https://build.gluster.org/job/line-coverage/

When we started focusing on code coverage 3 months back, our code coverage was around 60% overall. We set the ambitious goal of increasing the code coverage by 10% before the glusterfs-7.0 release, and I am happy to announce that we met this goal before the branching.

Before talking about the next goals, I want to thank and call out a few developers who made this happen.

* Xavier Hernandez - Made EC cross 90% from < 70%.
* Glusterd Team (Sanju, Rishub, Mohit, Atin) - Increased CLI/glusterd coverage.
* Geo-Rep Team (Kotresh, Sunny, Shwetha, Aravinda).
* Sheetal - Helped to increase the glfs-api test cases, which indirectly helped cover more code across the board.

Also note that some components, like AFR/replicate, were already at 80%+ before we started these efforts.

Now, our next goal is to make sure we have above 80% function coverage in all of the top-level components shown. Once that is done, we will focus on 75% code coverage across all components (i.e., no 'Red' in the top-level page).

While it was possible to meet our goal of increasing the overall code coverage from 60% to 70%, increasing it above 70% is not going to be easy, mainly because it involves adding more tests for negative test cases, and adding tests with different options (currently >300 of them across the codebase). We also need to look at the details from the code coverage tests, and reverse engineer how to write a test that hits a particular line in the code.

I personally invite everyone who is interested in contributing to the gluster project to get involved in this effort. Help us write test cases, and suggest how to improve them. Help by assigning interns to write them for us (if your team has some). This is a good way to understand the glusterfs code too. We are happy to organize sessions on how to walk through the code, etc., if required.

Happy to hear feedback and see more contribution in this area.

Regards,
Amar

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From emayoral at arsys.es Wed Jun 5 09:27:16 2019
From: emayoral at arsys.es (Eduardo Mayoral)
Date: Wed, 5 Jun 2019 11:27:16 +0200
Subject: [Gluster-users] Advice for setup: SW RAID 6 vs JBOD
Message-ID: 

Hi,

I am looking into a new gluster deployment to replace an ancient one.

For this deployment I will be using some repurposed servers I already have in stock. The disk specs are 12 * 3 TB SATA disks. No HW RAID controller. They also have some SSDs which would be nice to leverage as cache or similar to improve performance, since they are already there. Advice on how to leverage the SSDs would be greatly appreciated.

One of the design choices I have to make is using 3 nodes for a replica-3 with JBOD, or using 2 nodes with a replica-2 and using SW RAID 6 for the disks, maybe adding a 3rd node with a smaller amount of disk as a metadata node for the replica set. I would love to hear advice on the pros and cons of each setup from the gluster experts.

The data will be accessed from 4 to 6 systems with native gluster, not sure if that makes any difference.

The amount of data I have to store there is currently 20 TB, with moderate growth. iops are quite low, so high performance is not an issue. The data will fit in either of the two setups.

Thanks in advance for your advice!
--
Eduardo Mayoral Jimeno
Systems engineer, platform department. Arsys Internet.
emayoral at arsys.es - +34 941 620 105 - ext 2153

From hunter86_bg at yahoo.com Wed Jun 5 12:15:34 2019
From: hunter86_bg at yahoo.com (Strahil Nikolov)
Date: Wed, 5 Jun 2019 12:15:34 +0000 (UTC)
Subject: [Gluster-users] Advice for setup: SW RAID 6 vs JBOD
In-Reply-To: 
References: 
Message-ID: <1735787204.221988.1559736934501@mail.yahoo.com>

Hi Eduardo,

> I am looking into a new gluster deployment to replace an ancient one.
> For this deployment I will be using some repurposed servers I already
> have in stock. The disk specs are 12 * 3 TB SATA disks. No HW RAID
> controller. They also have some SSDs which would be nice to leverage
> as cache or similar to improve performance, since they are already there.
> Advice on how to leverage the SSDs would be greatly appreciated.

Gluster tiering was dropped in favour of LVM cache. Keep in mind that in RHEL/CentOS 7 you should be careful: the migration_threshold value is sometimes smaller than the chunk size. For details check: https://bugzilla.redhat.com/show_bug.cgi?id=1668163

> One of the design choices I have to make is using 3 nodes for a
> replica-3 with JBOD, or using 2 nodes with a replica-2 and using SW RAID
> 6 for the disks, maybe adding a 3rd node with a smaller amount of disk
> as a metadata node for the replica set. I would love to hear advice on the
> pros and cons of each setup from the gluster experts.

If you go with replica 3 - your reads will be from 3 servers - thus higher speeds.
If you choose replica 2 - you will eventually enter a split brain (not a good one).
If you choose replica 2 arbiter 1 (the old replica 3 arbiter 1) - you will read from only 2 servers, but save bandwidth.

Keep in mind that you need high-bandwidth NICs (as bonding/teaming balances based on MAC, IP and port, which in your case will all be the same). Another option is to use GlusterD2 with replica 2 and a remote arbiter (for example in the cloud or somewhere away). This setup does not require the arbiter to respond in a timely manner and it is used only if 1 data brick is down.

> The data will be accessed from 4 to 6 systems with native gluster,
> not sure if that makes any difference.
> The amount of data I have to store there is currently 20 TB, with
> moderate growth. iops are quite low, so high performance is not an issue.
> The data will fit in either of the two setups.

I would go with replica 3 if the NICs are 10 Gbit/s or bigger, and replica 2 arbiter 1 if the NICs are smaller. GlusterD2 is still new and might be too risky for production (Gluster devs can correct me here).

My current setup is Gluster v6.1 on oVirt in a replica 2 arbiter 1 with 6 x 1 Gbit/s NIC ports (consumer grade), and in order to overcome the load-balancing issue I'm using multiple thin LVs on top of a single NVMe - each LV is a gluster brick. Each gluster volume has a separate TCP port, and thus the teaming device load-balances traffic onto another NIC. This allows me to stripe my data at the VM level, but this setup is only OK for labs.

Best Regards,
Strahil Nikolov

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From nbalacha at redhat.com Wed Jun 5 13:52:56 2019
From: nbalacha at redhat.com (Nithya Balachandran)
Date: Wed, 5 Jun 2019 19:22:56 +0530
Subject: [Gluster-users] Memory leak in glusterfs
In-Reply-To: 
References: 
Message-ID: 

Hi,

Writing to a volume should not affect glusterd.
The stack you have shown in the valgrind log looks like the memory used to initialise the structures glusterd uses; it will be freed only when glusterd is stopped. Can you provide more details about what it is you are trying to test?

Regards,
Nithya

On Tue, 4 Jun 2019 at 15:41, ABHISHEK PALIWAL wrote:

> Hi Team,
>
> Please respond to the issue I raised.
>
> Regards,
> Abhishek
>
> On Fri, May 17, 2019 at 2:46 PM ABHISHEK PALIWAL wrote:
>
>> Anyone please reply....
>>
>> On Thu, May 16, 2019, 10:49 ABHISHEK PALIWAL wrote:
>>
>>> Hi Team,
>>>
>>> I uploaded some valgrind logs from my gluster 5.4 setup. This setup is
>>> writing to the volume every 15 minutes. I stopped glusterd and then copied
>>> away the logs. The test was running for some simulated days. They are
>>> zipped in valgrind-54.zip.
>>>
>>> Lots of info in valgrind-2730.log. Lots of possibly lost bytes in
>>> glusterfs and even some definitely lost bytes.
>>>
>>> ==2737== 1,572,880 bytes in 1 blocks are possibly lost in loss record 391 of 391
>>> ==2737== at 0x4C29C25: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>>> ==2737== by 0xA22485E: ??? (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so)
>>> ==2737== by 0xA217C94: ??? (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so)
>>> ==2737== by 0xA21D9F8: ??? (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so)
>>> ==2737== by 0xA21DED9: ??? (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so)
>>> ==2737== by 0xA21E685: ??? (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so)
>>> ==2737== by 0xA1B9D8C: init (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so)
>>> ==2737== by 0x4E511CE: xlator_init (in /usr/lib64/libglusterfs.so.0.0.1)
>>> ==2737== by 0x4E8A2B8: ??? (in /usr/lib64/libglusterfs.so.0.0.1)
>>> ==2737== by 0x4E8AAB3: glusterfs_graph_activate (in /usr/lib64/libglusterfs.so.0.0.1)
>>> ==2737== by 0x409C35: glusterfs_process_volfp (in /usr/sbin/glusterfsd)
>>> ==2737== by 0x409D99: glusterfs_volumes_init (in /usr/sbin/glusterfsd)
>>> ==2737==
>>> ==2737== LEAK SUMMARY:
>>> ==2737== definitely lost: 1,053 bytes in 10 blocks
>>> ==2737== indirectly lost: 317 bytes in 3 blocks
>>> ==2737== possibly lost: 2,374,971 bytes in 524 blocks
>>> ==2737== still reachable: 53,277 bytes in 201 blocks
>>> ==2737== suppressed: 0 bytes in 0 blocks
>>>
>>> --
>>> Regards
>>> Abhishek Paliwal

--
Regards
Abhishek Paliwal

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From khiremat at redhat.com Thu Jun 6 04:58:43 2019
From: khiremat at redhat.com (Kotresh Hiremath Ravishankar)
Date: Thu, 6 Jun 2019 10:28:43 +0530
Subject: [Gluster-users] Geo Replication stops replicating
In-Reply-To: 
References: 
Message-ID: 

Hi,

I think the steps to set up non-root geo-rep were not followed properly. The following entry, which is required, is missing in the glusterd vol file:

The message "E [MSGID: 106061] [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option mountbroker-root' missing in glusterd vol file" repeated 33 times between [2019-06-05 08:50:46.361384] and [2019-06-05 08:52:34.019757]

Could you please follow the steps from the link below?
https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html-single/administration_guide/index#Setting_Up_the_Environment_for_a_Secure_Geo-replication_Slave And let us know if you still face the issue. On Thu, Jun 6, 2019 at 10:24 AM deepu srinivasan wrote: > Hi Kotresh, Sunny > I Have mailed the logs I found in one of the slave machines. Is there > anything to do with permission? Please help. > > On Wed, Jun 5, 2019 at 2:28 PM deepu srinivasan > wrote: > >> Hi Kotresh, Sunny >> Found this log in the slave machine. >> >>> [2019-06-05 08:49:10.632583] I [MSGID: 106488] >>> [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: >>> Received get vol req >>> >>> The message "I [MSGID: 106488] >>> [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: >>> Received get vol req" repeated 2 times between [2019-06-05 08:49:10.632583] >>> and [2019-06-05 08:49:10.670863] >>> >>> The message "I [MSGID: 106496] >>> [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received >>> mount req" repeated 34 times between [2019-06-05 08:48:41.005398] and >>> [2019-06-05 08:50:37.254063] >>> >>> The message "E [MSGID: 106061] >>> [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option >>> mountbroker-root' missing in glusterd vol file" repeated 34 times between >>> [2019-06-05 08:48:41.005434] and [2019-06-05 08:50:37.254079] >>> >>> The message "W [MSGID: 106176] >>> [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful >>> mount request [No such file or directory]" repeated 34 times between >>> [2019-06-05 08:48:41.005444] and [2019-06-05 08:50:37.254080] >>> >>> [2019-06-05 08:50:46.361347] I [MSGID: 106496] >>> [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received >>> mount req >>> >>> [2019-06-05 08:50:46.361384] E [MSGID: 106061] >>> [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option >>> mountbroker-root' missing in glusterd vol file >>> >>> [2019-06-05 08:50:46.361419] W [MSGID: 106176] >>> [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful >>> mount request [No such file or directory] >>> >>> The message "I [MSGID: 106496] >>> [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received >>> mount req" repeated 33 times between [2019-06-05 08:50:46.361347] and >>> [2019-06-05 08:52:34.019741] >>> >>> The message "E [MSGID: 106061] >>> [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option >>> mountbroker-root' missing in glusterd vol file" repeated 33 times between >>> [2019-06-05 08:50:46.361384] and [2019-06-05 08:52:34.019757] >>> >>> The message "W [MSGID: 106176] >>> [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful >>> mount request [No such file or directory]" repeated 33 times between >>> [2019-06-05 08:50:46.361419] and [2019-06-05 08:52:34.019758] >>> >>> [2019-06-05 08:52:44.426839] I [MSGID: 106496] >>> [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received >>> mount req >>> >>> [2019-06-05 08:52:44.426886] E [MSGID: 106061] >>> [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option >>> mountbroker-root' missing in glusterd vol file >>> >>> [2019-06-05 08:52:44.426896] W [MSGID: 106176] >>> [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful >>> mount request [No such file or directory] >>> >> >> On Wed, Jun 5, 2019 at 1:06 AM deepu srinivasan >> wrote: >> >>> Thankyou Kotresh >>> >>> On Tue, Jun 4, 2019, 11:20 PM Kotresh Hiremath 
Ravishankar < >>> khiremat at redhat.com> wrote: >>> >>>> Ccing Sunny, who was investing similar issue. >>>> >>>> On Tue, Jun 4, 2019 at 5:46 PM deepu srinivasan >>>> wrote: >>>> >>>>> Have already added the path in bashrc . Still in faulty state >>>>> >>>>> On Tue, Jun 4, 2019, 5:27 PM Kotresh Hiremath Ravishankar < >>>>> khiremat at redhat.com> wrote: >>>>> >>>>>> could you please try adding /usr/sbin to $PATH for user 'sas'? If >>>>>> it's bash, add 'export PATH=/usr/sbin:$PATH' in >>>>>> /home/sas/.bashrc >>>>>> >>>>>> On Tue, Jun 4, 2019 at 5:24 PM deepu srinivasan >>>>>> wrote: >>>>>> >>>>>>> Hi Kortesh >>>>>>> Please find the logs of the above error >>>>>>> *Master log snippet* >>>>>>> >>>>>>>> [2019-06-04 11:52:09.254731] I [resource(worker >>>>>>>> /home/sas/gluster/data/code-misc):1379:connect_remote] SSH: Initializing >>>>>>>> SSH connection between master and slave... >>>>>>>> [2019-06-04 11:52:09.308923] D [repce(worker >>>>>>>> /home/sas/gluster/data/code-misc):196:push] RepceClient: call >>>>>>>> 89724:139652759443264:1559649129.31 __repce_version__() ... >>>>>>>> [2019-06-04 11:52:09.602792] E [syncdutils(worker >>>>>>>> /home/sas/gluster/data/code-misc):311:log_raise_exception] : >>>>>>>> connection to peer is broken >>>>>>>> [2019-06-04 11:52:09.603312] E [syncdutils(worker >>>>>>>> /home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error >>>>>>>> cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i >>>>>>>> /var/lib/ glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S >>>>>>>> /tmp/gsyncd-aux-ssh-4aL2tc/d893f66e0addc32f7d0080bb503f5185.sock >>>>>>>> sas at 192.168.185.107 /usr/libexec/glusterfs/gsyncd slave code-misc >>>>>>>> sas@ 192.168.185.107::code-misc --master-node 192.168.185.106 >>>>>>>> --master-node-id 851b64d0-d885-4ae9-9b38-ab5b15db0fec --master-brick >>>>>>>> /home/sas/gluster/data/code-misc --local-node 192.168.185.122 --local-node- >>>>>>>> id bcaa7af6-c3a1-4411-8e99-4ebecb32eb6a --slave-timeout 120 >>>>>>>> --slave-log-level DEBUG --slave-gluster-log-level INFO >>>>>>>> --slave-gluster-command-dir /usr/sbin error=1 >>>>>>>> [2019-06-04 11:52:09.614996] I [repce(agent >>>>>>>> /home/sas/gluster/data/code-misc):97:service_loop] RepceServer: terminating >>>>>>>> on reaching EOF. >>>>>>>> [2019-06-04 11:52:09.615545] D [monitor(monitor):271:monitor] >>>>>>>> Monitor: worker(/home/sas/gluster/data/code-misc) connected >>>>>>>> [2019-06-04 11:52:09.616528] I [monitor(monitor):278:monitor] >>>>>>>> Monitor: worker died in startup phase brick=/home/sas/gluster/data/code-misc >>>>>>>> [2019-06-04 11:52:09.619391] I >>>>>>>> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status >>>>>>>> Change status=Faulty >>>>>>>> >>>>>>> >>>>>>> *Slave log snippet* >>>>>>> >>>>>>>> [2019-06-04 11:50:09.782668] E [syncdutils(slave >>>>>>>> 192.168.185.106/home/sas/gluster/data/code-misc):809:logerr] >>>>>>>> Popen: /usr/sbin/gluster> 2 : failed with this errno (No such file or >>>>>>>> directory) >>>>>>>> [2019-06-04 11:50:11.188167] W [gsyncd(slave >>>>>>>> 192.168.185.125/home/sas/gluster/data/code-misc):305:main] : >>>>>>>> Session config file not exists, using the default config >>>>>>>> path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.107_code-misc/gsyncd.conf >>>>>>>> [2019-06-04 11:50:11.201070] I [resource(slave >>>>>>>> 192.168.185.125/home/sas/gluster/data/code-misc):1098:connect] >>>>>>>> GLUSTER: Mounting gluster volume locally... 
>>>>>>>> [2019-06-04 11:50:11.271231] E [resource(slave >>>>>>>> 192.168.185.125/home/sas/gluster/data/code-misc):1006:handle_mounter] >>>>>>>> MountbrokerMounter: glusterd answered mnt= >>>>>>>> [2019-06-04 11:50:11.271998] E [syncdutils(slave >>>>>>>> 192.168.185.125/home/sas/gluster/data/code-misc):805:errlog] >>>>>>>> Popen: command returned error cmd=/usr/sbin/gluster --remote-host=localhost >>>>>>>> system:: mount sas user-map-root=sas aux-gfid-mount acl log-level=INFO >>>>>>>> log-file=/var/log/glusterfs/geo-replication-slaves/code-misc_192.168.185.107_code-misc/mnt-192.168.185.125-home-sas-gluster-data-code-misc.log >>>>>>>> volfile-server=localhost volfile-id=code-misc client-pid=-1 error=1 >>>>>>>> [2019-06-04 11:50:11.272113] E [syncdutils(slave >>>>>>>> 192.168.185.125/home/sas/gluster/data/code-misc):809:logerr] >>>>>>>> Popen: /usr/sbin/gluster> 2 : failed with this errno (No such file or >>>>>>>> directory) >>>>>>> >>>>>>> >>>>>>> On Tue, Jun 4, 2019 at 5:10 PM deepu srinivasan >>>>>>> wrote: >>>>>>> >>>>>>>> Hi >>>>>>>> As discussed I have upgraded gluster from 4.1 to 6.2 version. But >>>>>>>> the Geo replication failed to start. >>>>>>>> Stays in faulty state >>>>>>>> >>>>>>>> On Fri, May 31, 2019, 5:32 PM deepu srinivasan >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Checked the data. It remains in 2708. No progress. >>>>>>>>> >>>>>>>>> On Fri, May 31, 2019 at 4:36 PM Kotresh Hiremath Ravishankar < >>>>>>>>> khiremat at redhat.com> wrote: >>>>>>>>> >>>>>>>>>> That means it could be working and the defunct process might be >>>>>>>>>> some old zombie one. Could you check, that data progress ? >>>>>>>>>> >>>>>>>>>> On Fri, May 31, 2019 at 4:29 PM deepu srinivasan < >>>>>>>>>> sdeepugd at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi >>>>>>>>>>> When i change the rsync option the rsync process doesnt seem to >>>>>>>>>>> start . Only a defunt process is listed in ps aux. Only when i set rsync >>>>>>>>>>> option to " " and restart all the process the rsync process is listed in ps >>>>>>>>>>> aux. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, May 31, 2019 at 4:23 PM Kotresh Hiremath Ravishankar < >>>>>>>>>>> khiremat at redhat.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Yes, rsync config option should have fixed this issue. >>>>>>>>>>>> >>>>>>>>>>>> Could you share the output of the following? >>>>>>>>>>>> >>>>>>>>>>>> 1. gluster volume geo-replication >>>>>>>>>>>> :: config rsync-options >>>>>>>>>>>> 2. ps -ef | grep rsync >>>>>>>>>>>> >>>>>>>>>>>> On Fri, May 31, 2019 at 4:11 PM deepu srinivasan < >>>>>>>>>>>> sdeepugd at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Done. >>>>>>>>>>>>> We got the following result . >>>>>>>>>>>>> >>>>>>>>>>>>>> 1559298781.338234 write(2, "rsync: link_stat >>>>>>>>>>>>>> \"/tmp/gsyncd-aux-mount-EEJ_sY/.gfid/3fa6aed8-802e-4efe-9903-8bc171176d88\" >>>>>>>>>>>>>> failed: No such file or directory (2)", 128 >>>>>>>>>>>>> >>>>>>>>>>>>> seems like a file is missing ? >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, May 31, 2019 at 3:25 PM Kotresh Hiremath Ravishankar < >>>>>>>>>>>>> khiremat at redhat.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Could you take the strace with with more string size? The >>>>>>>>>>>>>> argument strings are truncated. >>>>>>>>>>>>>> >>>>>>>>>>>>>> strace -s 500 -ttt -T -p >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, May 31, 2019 at 3:17 PM deepu srinivasan < >>>>>>>>>>>>>> sdeepugd at gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Kotresh >>>>>>>>>>>>>>> The above-mentioned work around did not work properly. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, May 31, 2019 at 3:16 PM deepu srinivasan < >>>>>>>>>>>>>>> sdeepugd at gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Kotresh >>>>>>>>>>>>>>>> We have tried the above-mentioned rsync option and we are >>>>>>>>>>>>>>>> planning to have the version upgrade to 6.0. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath >>>>>>>>>>>>>>>> Ravishankar wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> This looks like the hang because stderr buffer filled up >>>>>>>>>>>>>>>>> with errors messages and no one reading it. >>>>>>>>>>>>>>>>> I think this issue is fixed in latest releases. As a >>>>>>>>>>>>>>>>> workaround, you can do following and check if it works. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Prerequisite: >>>>>>>>>>>>>>>>> rsync version should be > 3.1.0 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Workaround: >>>>>>>>>>>>>>>>> gluster volume geo-replication >>>>>>>>>>>>>>>>> :: config rsync-options "--ignore- >>>>>>>>>>>>>>>>> missing-args" >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> Kotresh HR >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan < >>>>>>>>>>>>>>>>> sdeepugd at gmail.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi >>>>>>>>>>>>>>>>>> We were evaluating Gluster geo Replication between two >>>>>>>>>>>>>>>>>> DCs one is in US west and one is in US east. We took multiple trials for >>>>>>>>>>>>>>>>>> different file size. >>>>>>>>>>>>>>>>>> The Geo Replication tends to stop replicating but while >>>>>>>>>>>>>>>>>> checking the status it appears to be in Active state. But the slave volume >>>>>>>>>>>>>>>>>> did not increase in size. >>>>>>>>>>>>>>>>>> So we have restarted the geo-replication session and >>>>>>>>>>>>>>>>>> checked the status. The status was in an active state and it was in History >>>>>>>>>>>>>>>>>> Crawl for a long time. We have enabled the DEBUG mode in logging and >>>>>>>>>>>>>>>>>> checked for any error. >>>>>>>>>>>>>>>>>> There was around 2000 file appeared for syncing >>>>>>>>>>>>>>>>>> candidate. The Rsync process starts but the rsync did not happen in the >>>>>>>>>>>>>>>>>> slave volume. Every time the rsync process appears in the "ps auxxx" list >>>>>>>>>>>>>>>>>> but the replication did not happen in the slave end. What would be the >>>>>>>>>>>>>>>>>> cause of this problem? Is there anyway to debug it? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> We have also checked the strace of the rync program. 
>>>>>>>>>>>>>>>>>> it displays something like this >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128" >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> We are using the below specs >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Gluster version - 4.1.7 >>>>>>>>>>>>>>>>>> Sync mode - rsync >>>>>>>>>>>>>>>>>> Volume - 1x3 in each end (master and slave) >>>>>>>>>>>>>>>>>> Intranet Bandwidth - 10 Gig >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> Thanks and Regards, >>>>>>>>>>>>>>>>> Kotresh H R >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Thanks and Regards, >>>>>>>>>>>>>> Kotresh H R >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Thanks and Regards, >>>>>>>>>>>> Kotresh H R >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Thanks and Regards, >>>>>>>>>> Kotresh H R >>>>>>>>>> >>>>>>>>> >>>>>> >>>>>> -- >>>>>> Thanks and Regards, >>>>>> Kotresh H R >>>>>> >>>>> >>>> >>>> -- >>>> Thanks and Regards, >>>> Kotresh H R >>>> >>> -- Thanks and Regards, Kotresh H R -------------- next part -------------- An HTML attachment was scrubbed... URL: From sunkumar at redhat.com Thu Jun 6 05:04:46 2019 From: sunkumar at redhat.com (Sunny Kumar) Date: Thu, 6 Jun 2019 10:34:46 +0530 Subject: [Gluster-users] Geo Replication stops replicating In-Reply-To: References: Message-ID: Hi, Updated link for documentation : -- https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/ You can use this tool as well: http://aravindavk.in/blog/gluster-georep-tools/ -Sunny On Thu, Jun 6, 2019 at 10:29 AM Kotresh Hiremath Ravishankar wrote: > > Hi, > > I think the steps to setup non-root geo-rep is not followed properly. The following entry is missing in glusterd vol file which is required. > > The message "E [MSGID: 106061] [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option mountbroker-root' missing in glusterd vol file" repeated 33 times between [2019-06-05 08:50:46.361384] and [2019-06-05 08:52:34.019757] > > Could you please the steps from below? > > https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html-single/administration_guide/index#Setting_Up_the_Environment_for_a_Secure_Geo-replication_Slave > > And let us know if you still face the issue. > > > > > On Thu, Jun 6, 2019 at 10:24 AM deepu srinivasan wrote: >> >> Hi Kotresh, Sunny >> I Have mailed the logs I found in one of the slave machines. Is there anything to do with permission? Please help. >> >> On Wed, Jun 5, 2019 at 2:28 PM deepu srinivasan wrote: >>> >>> Hi Kotresh, Sunny >>> Found this log in the slave machine. 
From abhishpaliwal at gmail.com  Thu Jun  6 06:38:20 2019
From: abhishpaliwal at gmail.com (ABHISHEK PALIWAL)
Date: Thu, 6 Jun 2019 12:08:20 +0530
Subject: [Gluster-users] Memory leak in glusterfs
In-Reply-To:
References:
Message-ID:

Hi Nithya,

Here is the Setup details and test which we are doing as below:

One client, two gluster Server.
The client is writing and deleting one file each 15 minutes by script
test_v4.15.sh.
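A minimal sketch of what such a write/delete loop might look like (an
assumed reconstruction, not the attached test_v4.15.sh itself; the file
name and size below are placeholders):

#!/bin/bash
# Sketch: write one file to the mount, wait 15 minutes, remove it, repeat.
MOUNT=/gluster_mount            # the client-side mount point used below
while true; do
    dd if=/dev/urandom of="$MOUNT/testfile.bin" bs=1M count=1 2>/dev/null
    sleep 900                   # 15 minutes
    rm -f "$MOUNT/testfile.bin"
    sleep 900
done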
IP
Server side:
128.224.98.157   /gluster/gv0/
128.224.98.159   /gluster/gv0/

Client side:
128.224.98.160   /gluster_mount/

Server side:
gluster volume create gv0 replica 2 128.224.98.157:/gluster/gv0/
128.224.98.159:/gluster/gv0/ force
gluster volume start gv0

root at 128:/tmp/brick/gv0# gluster volume info

Volume Name: gv0
Type: Replicate
Volume ID: 7105a475-5929-4d60-ba23-be57445d97b5
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 128.224.98.157:/gluster/gv0
Brick2: 128.224.98.159:/gluster/gv0
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

exec script: ./ps_mem.py -p 605 -w 61 > log
root at 128:/# ./ps_mem.py -p 605
 Private  +   Shared  =  RAM used	Program
23668.0 KiB + 1188.0 KiB = 24856.0 KiB	glusterfsd
---------------------------------
                        24856.0 KiB
=================================


Client side:
mount -t glusterfs -o acl -o resolve-gids 128.224.98.157:gv0
/gluster_mount


We are using the below script to write and delete the file:

*test_v4.15.sh*

Also the below script to watch the memory increase while the above
script is running in the background:

*ps_mem.py*

I am attaching the script files as well as the result we got after
testing the scenario.

On Wed, Jun 5, 2019 at 7:23 PM Nithya Balachandran 
wrote:

> Hi,
>
> Writing to a volume should not affect glusterd. The stack you have shown
> in the valgrind looks like the memory used to initialise the structures
> glusterd uses and will free only when it is stopped.
>
> Can you provide more details to what it is you are trying to test?
>
> Regards,
> Nithya
>
>
> On Tue, 4 Jun 2019 at 15:41, ABHISHEK PALIWAL 
> wrote:
>
>> Hi Team,
>>
>> Please respond on the issue which I raised.
>>
>> Regards,
>> Abhishek
>>
>> On Fri, May 17, 2019 at 2:46 PM ABHISHEK PALIWAL 
>> wrote:
>>
>>> Anyone please reply....
>>>
>>> On Thu, May 16, 2019, 10:49 ABHISHEK PALIWAL 
>>> wrote:
>>>
>>>> Hi Team,
>>>>
>>>> I upload some valgrind logs from my gluster 5.4 setup. This is writing
>>>> to the volume every 15 minutes. I stopped glusterd and then copy away the
>>>> logs. The test was running for some simulated days. They are zipped in
>>>> valgrind-54.zip.
>>>>
>>>> Lots of info in valgrind-2730.log. Lots of possibly lost bytes in
>>>> glusterfs and even some definitely lost bytes.
>>>>
>>>> ==2737== 1,572,880 bytes in 1 blocks are possibly lost in loss record
>>>> 391 of 391
>>>> ==2737== at 0x4C29C25: calloc (in
>>>> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>>>> ==2737== by 0xA22485E: ??? (in
>>>> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so)
>>>> ==2737== by 0xA217C94: ??? (in
>>>> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so)
>>>> ==2737== by 0xA21D9F8: ??? (in
>>>> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so)
>>>> ==2737== by 0xA21DED9: ??? (in
>>>> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so)
>>>> ==2737== by 0xA21E685: ??? (in
>>>> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so)
>>>> ==2737== by 0xA1B9D8C: init (in
>>>> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so)
>>>> ==2737== by 0x4E511CE: xlator_init (in /usr/lib64/libglusterfs.so.0.0.1)
>>>> ==2737== by 0x4E8A2B8: ???
(in /usr/lib64/libglusterfs.so.0.0.1) >>>> ==2737== by 0x4E8AAB3: glusterfs_graph_activate (in >>>> /usr/lib64/libglusterfs.so.0.0.1) >>>> ==2737== by 0x409C35: glusterfs_process_volfp (in /usr/sbin/glusterfsd) >>>> ==2737== by 0x409D99: glusterfs_volumes_init (in /usr/sbin/glusterfsd) >>>> ==2737== >>>> ==2737== LEAK SUMMARY: >>>> ==2737== definitely lost: 1,053 bytes in 10 blocks >>>> ==2737== indirectly lost: 317 bytes in 3 blocks >>>> ==2737== possibly lost: 2,374,971 bytes in 524 blocks >>>> ==2737== still reachable: 53,277 bytes in 201 blocks >>>> ==2737== suppressed: 0 bytes in 0 blocks >>>> >>>> -- >>>> >>>> >>>> >>>> >>>> Regards >>>> Abhishek Paliwal >>>> >>> >> >> -- >> >> >> >> >> Regards >> Abhishek Paliwal >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > -- Regards Abhishek Paliwal -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ps_mem.py Type: text/x-python Size: 18465 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test_v4.15.sh Type: application/x-shellscript Size: 660 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ps_mem_server1.log Type: text/x-log Size: 135168 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ps_mem_server2.log Type: text/x-log Size: 135168 bytes Desc: not available URL: From revirii at googlemail.com Thu Jun 6 06:53:49 2019 From: revirii at googlemail.com (Hu Bert) Date: Thu, 6 Jun 2019 08:53:49 +0200 Subject: [Gluster-users] Advice for setup: SW RAID 6 vs JBOD In-Reply-To: References: Message-ID: Good morning, my comment won't help you directly, but i thought i'd send it anyway... Our first glusterfs setup had 3 servers withs 4 disks=bricks (10TB, JBOD) each. Was running fine in the beginning, but then 1 disk failed. The following heal took ~1 month, with a bad performance (quite high IO). Shortly after the heal hat finished another disk failed -> same problems again. Not funny. For our new system we decided to use 3 servers with 10 disks (10 TB) each, but now the 10 disks in a SW RAID 10 (well, we split the 10 disks into 2 SW RAID 10, each of them is a brick, we have 2 gluster volumes). A lot of disk space "wasted", with this type of SW RAID and a replicate 3 setup, but we wanted to avoid the "healing takes a long time with bad performance" problems. Now mdadm takes care of replicating data, glusterfs should always see "good" bricks. And the decision may depend on what kind of data you have. Many small files, like tens of millions? Or not that much, but bigger files? I once watched a video (i think it was this one: https://www.youtube.com/watch?v=61HDVwttNYI). Recommendation there: RAID 6 or 10 for small files, for big files... well, already 2 years "old" ;-) As i said, this won't help you directly. You have to identify what's most important for your scenario; as you said, high performance is not an issue - if this is true even when you have slight performance issues after a disk fail then ok. My experience so far: the bigger and slower the disks are and the more data you have -> healing will hurt -> try to avoid this. If the disks are small and fast (SSDs), healing will be faster -> JBOD is an option. 
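To make that concrete, the two-RAID-10-bricks-per-server layout looks
roughly like this (a sketch only: device names, paths and the volume
name are placeholders, not our exact commands):

# on each of the three servers: two 5-disk RAID 10 arrays, one brick each
mdadm --create /dev/md0 --level=10 --raid-devices=5 /dev/sd[b-f]
mdadm --create /dev/md1 --level=10 --raid-devices=5 /dev/sd[g-k]
mkfs.xfs -i size=512 /dev/md0
mkfs.xfs -i size=512 /dev/md1
mkdir -p /gluster/brick1 /gluster/brick2
mount /dev/md0 /gluster/brick1
mount /dev/md1 /gluster/brick2
# then one replica-3 volume per brick set:
gluster volume create vol1 replica 3 server1:/gluster/brick1/data \
  server2:/gluster/brick1/data server3:/gluster/brick1/data

This way a failed disk is rebuilt locally by mdadm from its mirror, and
gluster itself never has to heal a whole brick.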
hth, Hubert Am Mi., 5. Juni 2019 um 11:33 Uhr schrieb Eduardo Mayoral : > > Hi, > > I am looking into a new gluster deployment to replace an ancient one. > > For this deployment I will be using some repurposed servers I > already have in stock. The disk specs are 12 * 3 TB SATA disks. No HW > RAID controller. They also have some SSD which would be nice to leverage > as cache or similar to improve performance, since it is already there. > Advice on how to leverage the SSDs would be greatly appreciated. > > One of the design choices I have to make is using 3 nodes for a > replica-3 with JBOD, or using 2 nodes with a replica-2 and using SW RAID > 6 for the disks, maybe adding a 3rd node with a smaller amount of disk > as metadata node for the replica set. I would love to hear advice on the > pros and cons of each setup from the gluster experts. > > The data will be accessed from 4 to 6 systems with native gluster, > not sure if that makes any difference. > > The amount of data I have to store there is currently 20 TB, with > moderate growth. iops are quite low so high performance is not an issue. > The data will fit in any of the two setups. > > Thanks in advance for your advice! > > -- > Eduardo Mayoral Jimeno > Systems engineer, platform department. Arsys Internet. > emayoral at arsys.es - +34 941 620 105 - ext 2153 > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From pkarampu at redhat.com Thu Jun 6 07:02:46 2019 From: pkarampu at redhat.com (Pranith Kumar Karampuri) Date: Thu, 6 Jun 2019 12:32:46 +0530 Subject: [Gluster-users] write request hung in write-behind In-Reply-To: <5cf5d239.1c69fb81.50f5a.c9f5SMTPIN_ADDED_BROKEN@mx.google.com> References: <5cf5d239.1c69fb81.50f5a.c9f5SMTPIN_ADDED_BROKEN@mx.google.com> Message-ID: On Tue, Jun 4, 2019 at 7:36 AM Xie Changlong wrote: > To me, all 'df' commands on specific(not all) nfs client hung forever. > The temporary solution is disable performance.nfs.write-behind and > cluster.eager-lock. > > I'll try to get more info back if encounter this problem again . > If you observe this issue again, take successive (at least a minute apart) statedumps of the processes and run https://github.com/gluster/glusterfs/blob/master/extras/identify-hangs.sh on it which will give the information about the hangs. > > > > ???: Raghavendra Gowdappa > ??: 2019/06/04(???)09:55 > ???: Xie Changlong ;Ravishankar Narayanankutty > ;Karampuri, Pranith ; > ???: gluster-users ; > ??: Re: Re: write request hung in write-behind > > > > On Mon, Jun 3, 2019 at 1:11 PM Xie Changlong wrote: > >> Firstly i correct myself, write request followed by 771(not 1545) FLUSH >> requests. I've attach gnfs dump file, totally 774 pending call-stacks, >> 771 of them pending on write-behind and the deepest call-stack is afr. >> > > +Ravishankar Narayanankutty +Karampuri, Pranith > > > Are you sure these were not call-stacks of in-progress ops? One way of > confirming that would be to take statedumps periodically (say 3 min apart). > Hung call stacks will be common to all the statedumps. 
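(A sketch of that procedure; the PID lookup is only an example. Gluster
processes write a statedump on SIGUSR1, by default into /var/run/gluster/:)

PID=$(pgrep -f "glusterfs.*nfs")    # example: the gnfs process
for i in 1 2 3; do
  kill -USR1 "$PID"                 # one statedump per signal
  sleep 60                          # at least a minute apart
done
ls -l /var/run/gluster/*.dump.*
# then run extras/identify-hangs.sh (link above) on those dump files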
> > >> [global.callpool.stack.771] >> stack=0x7f517f557f60 >> uid=0 >> gid=0 >> pid=0 >> unique=0 >> lk-owner= >> op=stack >> type=0 >> cnt=3 >> >> [global.callpool.stack.771.frame.1] >> frame=0x7f517f655880 >> ref_count=0 >> translator=cl35vol01-replicate-7 >> complete=0 >> parent=cl35vol01-dht >> wind_from=dht_writev >> wind_to=subvol->fops->writev >> unwind_to=dht_writev_cbk >> >> [global.callpool.stack.771.frame.2] >> frame=0x7f518ed90340 >> ref_count=1 >> translator=cl35vol01-dht >> complete=0 >> parent=cl35vol01-write-behind >> wind_from=wb_fulfill_head >> wind_to=FIRST_CHILD (frame->this)->fops->writev >> unwind_to=wb_fulfill_cbk >> >> [global.callpool.stack.771.frame.3] >> frame=0x7f516d3baf10 >> ref_count=1 >> translator=cl35vol01-write-behind >> complete=0 >> >> [global.callpool.stack.772] >> stack=0x7f51607a5a20 >> uid=0 >> gid=0 >> pid=0 >> unique=0 >> lk-owner=a0715b77517f0000 >> op=stack >> type=0 >> cnt=1 >> >> [global.callpool.stack.772.frame.1] >> frame=0x7f516ca2d1b0 >> ref_count=0 >> translator=cl35vol01-replicate-7 >> complete=0 >> >> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 >> glusterdump.20106.dump.1559038081 |grep translator | wc -l >> 774 >> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 >> glusterdump.20106.dump.1559038081 |grep complete |wc -l >> 774 >> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 >> glusterdump.20106.dump.1559038081 |grep -E "complete=0" |wc -l >> 774 >> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 >> glusterdump.20106.dump.1559038081 |grep translator | grep write-behind >> |wc -l >> 771 >> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 >> glusterdump.20106.dump.1559038081 |grep translator | grep replicate-7 | >> wc -l >> 2 >> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 >> glusterdump.20106.dump.1559038081 |grep translator | grep glusterfs | wc >> -l >> 1 >> >> >> >> >> ???: Raghavendra Gowdappa >> ??: 2019/06/03(???)14:46 >> ???: Xie Changlong ; >> ???: gluster-users ; >> ??: Re: write request hung in write-behind >> >> >> >> On Mon, Jun 3, 2019 at 11:57 AM Xie Changlong wrote: >> >>> Hi all >>> >>> Test gluster 3.8.4-54.15 gnfs, i saw a write request hung in >>> write-behind followed by 1545 FLUSH requests. I found a similar >>> bugfix https://bugzilla.redhat.com/show_bug.cgi?id=1626787, but not >>> sure if it's the right one. >>> >>> [xlator.performance.write-behind.wb_inode] >>> path=/575/1e/5751e318f21f605f2aac241bf042e7a8.jpg >>> inode=0x7f51775b71a0 >>> window_conf=1073741824 >>> window_current=293822 >>> transit-size=293822 >>> dontsync=0 >>> >>> [.WRITE] >>> request-ptr=0x7f516eec2060 >>> refcount=1 >>> wound=yes >>> generation-number=1 >>> req->op_ret=293822 >>> req->op_errno=0 >>> sync-attempts=1 >>> sync-in-progress=yes >>> >> >> Note that the sync is still in progress. This means, write-behind has >> wound the write-request to its children and yet to receive the response >> (unless there is a bug in accounting of sync-in-progress). So, its likely >> that there are callstacks into children of write-behind, which are not >> complete yet. Are you sure the deepest hung call-stack is in write-behind? >> Can you check for frames with "complete=0"? 
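(Along the lines of the greps shown earlier in this thread, for example:)

# count frames that never completed, then see which translators they sit in
grep -rn "global.callpool.stack.*.frame" -A 5 glusterdump.20106.dump.1559038081 \
  | grep -E "complete=0" | wc -l
grep -rn "global.callpool.stack.*.frame" -A 5 glusterdump.20106.dump.1559038081 \
  | grep translator | sort | uniq -c | sort -rn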
>> >> size=293822 >>> offset=1048576 >>> lied=-1 >>> append=0 >>> fulfilled=0 >>> go=-1 >>> >>> [.FLUSH] >>> request-ptr=0x7f517c2badf0 >>> refcount=1 >>> wound=no >>> generation-number=2 >>> req->op_ret=-1 >>> req->op_errno=116 >>> sync-attempts=0 >>> >>> [.FLUSH] >>> request-ptr=0x7f5173e9f7b0 >>> refcount=1 >>> wound=no >>> generation-number=2 >>> req->op_ret=0 >>> req->op_errno=0 >>> sync-attempts=0 >>> >>> [.FLUSH] >>> request-ptr=0x7f51640b8ca0 >>> refcount=1 >>> wound=no >>> generation-number=2 >>> req->op_ret=0 >>> req->op_errno=0 >>> sync-attempts=0 >>> >>> [.FLUSH] >>> request-ptr=0x7f516f3979d0 >>> refcount=1 >>> wound=no >>> generation-number=2 >>> req->op_ret=0 >>> req->op_errno=0 >>> sync-attempts=0 >>> >>> [.FLUSH] >>> request-ptr=0x7f516f6ac8d0 >>> refcount=1 >>> wound=no >>> generation-number=2 >>> req->op_ret=0 >>> req->op_errno=0 >>> sync-attempts=0 >>> >>> >>> Any comments would be appreciated! >>> >>> Thanks >>> -Xie >>> >>> >>> -- Pranith -------------- next part -------------- An HTML attachment was scrubbed... URL: From nbalacha at redhat.com Thu Jun 6 10:38:17 2019 From: nbalacha at redhat.com (Nithya Balachandran) Date: Thu, 6 Jun 2019 16:08:17 +0530 Subject: [Gluster-users] Memory leak in glusterfs In-Reply-To: References: Message-ID: Hi Abhishek, I am still not clear as to the purpose of the tests. Can you clarify why you are using valgrind and why you think there is a memory leak? Regards, Nithya On Thu, 6 Jun 2019 at 12:09, ABHISHEK PALIWAL wrote: > Hi Nithya, > > Here is the Setup details and test which we are doing as below: > > > One client, two gluster Server. > The client is writing and deleting one file each 15 minutes by script > test_v4.15.sh. > > IP > Server side: > 128.224.98.157 /gluster/gv0/ > 128.224.98.159 /gluster/gv0/ > > Client side: > 128.224.98.160 /gluster_mount/ > > Server side: > gluster volume create gv0 replica 2 128.224.98.157:/gluster/gv0/ > 128.224.98.159:/gluster/gv0/ force > gluster volume start gv0 > > root at 128:/tmp/brick/gv0# gluster volume info > > Volume Name: gv0 > Type: Replicate > Volume ID: 7105a475-5929-4d60-ba23-be57445d97b5 > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x 2 = 2 > Transport-type: tcp > Bricks: > Brick1: 128.224.98.157:/gluster/gv0 > Brick2: 128.224.98.159:/gluster/gv0 > Options Reconfigured: > transport.address-family: inet > nfs.disable: on > performance.client-io-threads: off > > exec script: ./ps_mem.py -p 605 -w 61 > log > root at 128:/# ./ps_mem.py -p 605 > Private + Shared = RAM used Program > 23668.0 KiB + 1188.0 KiB = 24856.0 KiB glusterfsd > --------------------------------- > 24856.0 KiB > ================================= > > > Client side: > mount -t glusterfs -o acl -o resolve-gids 128.224.98.157:gv0 > /gluster_mount > > > We are using the below script write and delete the file. > > *test_v4.15.sh * > > Also the below script to see the memory increase whihle the script is > above script is running in background. > > *ps_mem.py* > > I am attaching the script files as well as the result got after testing > the scenario. > > On Wed, Jun 5, 2019 at 7:23 PM Nithya Balachandran > wrote: > >> Hi, >> >> Writing to a volume should not affect glusterd. The stack you have shown >> in the valgrind looks like the memory used to initialise the structures >> glusterd uses and will free only when it is stopped. >> >> Can you provide more details to what it is you are trying to test? 
>> >> Regards, >> Nithya >> >> >> On Tue, 4 Jun 2019 at 15:41, ABHISHEK PALIWAL >> wrote: >> >>> Hi Team, >>> >>> Please respond on the issue which I raised. >>> >>> Regards, >>> Abhishek >>> >>> On Fri, May 17, 2019 at 2:46 PM ABHISHEK PALIWAL < >>> abhishpaliwal at gmail.com> wrote: >>> >>>> Anyone please reply.... >>>> >>>> On Thu, May 16, 2019, 10:49 ABHISHEK PALIWAL >>>> wrote: >>>> >>>>> Hi Team, >>>>> >>>>> I upload some valgrind logs from my gluster 5.4 setup. This is writing >>>>> to the volume every 15 minutes. I stopped glusterd and then copy away the >>>>> logs. The test was running for some simulated days. They are zipped in >>>>> valgrind-54.zip. >>>>> >>>>> Lots of info in valgrind-2730.log. Lots of possibly lost bytes in >>>>> glusterfs and even some definitely lost bytes. >>>>> >>>>> ==2737== 1,572,880 bytes in 1 blocks are possibly lost in loss record >>>>> 391 of 391 >>>>> ==2737== at 0x4C29C25: calloc (in >>>>> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) >>>>> ==2737== by 0xA22485E: ??? (in >>>>> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) >>>>> ==2737== by 0xA217C94: ??? (in >>>>> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) >>>>> ==2737== by 0xA21D9F8: ??? (in >>>>> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) >>>>> ==2737== by 0xA21DED9: ??? (in >>>>> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) >>>>> ==2737== by 0xA21E685: ??? (in >>>>> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) >>>>> ==2737== by 0xA1B9D8C: init (in >>>>> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) >>>>> ==2737== by 0x4E511CE: xlator_init (in >>>>> /usr/lib64/libglusterfs.so.0.0.1) >>>>> ==2737== by 0x4E8A2B8: ??? (in /usr/lib64/libglusterfs.so.0.0.1) >>>>> ==2737== by 0x4E8AAB3: glusterfs_graph_activate (in >>>>> /usr/lib64/libglusterfs.so.0.0.1) >>>>> ==2737== by 0x409C35: glusterfs_process_volfp (in /usr/sbin/glusterfsd) >>>>> ==2737== by 0x409D99: glusterfs_volumes_init (in /usr/sbin/glusterfsd) >>>>> ==2737== >>>>> ==2737== LEAK SUMMARY: >>>>> ==2737== definitely lost: 1,053 bytes in 10 blocks >>>>> ==2737== indirectly lost: 317 bytes in 3 blocks >>>>> ==2737== possibly lost: 2,374,971 bytes in 524 blocks >>>>> ==2737== still reachable: 53,277 bytes in 201 blocks >>>>> ==2737== suppressed: 0 bytes in 0 blocks >>>>> >>>>> -- >>>>> >>>>> >>>>> >>>>> >>>>> Regards >>>>> Abhishek Paliwal >>>>> >>>> >>> >>> -- >>> >>> >>> >>> >>> Regards >>> Abhishek Paliwal >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> > > -- > > > > > Regards > Abhishek Paliwal > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sunkumar at redhat.com Thu Jun 6 10:40:16 2019 From: sunkumar at redhat.com (Sunny Kumar) Date: Thu, 6 Jun 2019 16:10:16 +0530 Subject: [Gluster-users] Geo Replication stops replicating In-Reply-To: References: Message-ID: Above error can be tracked here: https://bugzilla.redhat.com/show_bug.cgi?id=1709248 and patch link: https://review.gluster.org/#/c/glusterfs/+/22716/ You can apply patch and test it however its waiting on regression to pass and merge. -Sunny On Thu, Jun 6, 2019 at 4:00 PM deepu srinivasan wrote: > > Hi > I have followed the following steps to create the geo-replication but the status seems to be in a faulty state. > > Steps : > > Installed cluster version 5.6 in totally six nodes. 
>> >> glusterfs 5.6 >> >> Repository revision: git://git.gluster.org/glusterfs.git >> >> Copyright (c) 2006-2016 Red Hat, Inc. >> >> GlusterFS comes with ABSOLUTELY NO WARRANTY. >> >> It is licensed to you under your choice of the GNU Lesser >> >> General Public License, version 3 or any later version (LGPLv3 >> >> or later), or the GNU General Public License, version 2 (GPLv2), >> >> in all cases as published by the Free Software Foundation > > > peer_probed the first three nodes and second three nodes. > > > > Added new volume in both the clusters > > > > execute gluster-mountbroker commands and restarted glusterd. >> >> gluster-mountbroker setup /var/mountbroker-root sas >> >> gluster-mountbroker remove --volume code-misc --user sas > > > configured a passwordless sssh from master to slave >> >> ssh-keygen; ssh-copy-id sas at 192.168.185.107 > > created a common pem pub file >> >> gluster system:: execute gsec_create > > created geo-replication session. >> >> gluster volume geo-replication code-misc sas at 192.168.185.107::code-misc create push-pem > > executed the following command in slave >> >> /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh sas code-misc code-misc > > started the gluster geo-replication. >> >> gluster volume geo-replication code-misc sas at 192.168.185.107::code-misc start > > > Now the geo-replication works fine. > Tested with 2000 files All seems to sync finely. > > Now I updated all the node to version 6.2 by using rpms which were built by the source code in a docker container in my personal machine. > > >> gluster --version >> >> glusterfs 6.2 >> >> Repository revision: git://git.gluster.org/glusterfs.git >> >> Copyright (c) 2006-2016 Red Hat, Inc. >> >> GlusterFS comes with ABSOLUTELY NO WARRANTY. >> >> It is licensed to you under your choice of the GNU Lesser >> >> General Public License, version 3 or any later version (LGPLv3 >> >> or later), or the GNU General Public License, version 2 (GPLv2), >> >> in all cases as published by the Free Software Foundation. > > > I have stopped the glusterd daemons in all the node along with the volume and geo-replication. > Now I started the daemons, volume and geo-replication session the status seems to be faulty. > Also noted that the result of "gluster-mountbroker status" command always end in python exception like this >> >> Traceback (most recent call last): >> >> File "/usr/sbin/gluster-mountbroker", line 396, in >> >> runcli() >> >> File "/usr/lib/python2.7/site-packages/gluster/cliutils/cliutils.py", line 225, in runcli >> >> cls.run(args) >> >> File "/usr/sbin/gluster-mountbroker", line 275, in run >> >> out = execute_in_peers("node-status") >> >> File "/usr/lib/python2.7/site-packages/gluster/cliutils/cliutils.py", line 127, in execute_in_peers >> >> raise GlusterCmdException((rc, out, err, " ".join(cmd))) >> >> gluster.cliutils.cliutils.GlusterCmdException: (1, '', 'Unable to end. Error : Success\n', 'gluster system:: execute mountbroker.py node-status') > > > Is it I or everyone gets an error for gluster-mountbroker command for gluster version greater than 6.0?. Please help. 
> > Thank you > Deepak > > > On Thu, Jun 6, 2019 at 10:35 AM Sunny Kumar wrote: >> >> Hi, >> >> Updated link for documentation : >> >> -- https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/ >> >> You can use this tool as well: >> http://aravindavk.in/blog/gluster-georep-tools/ >> >> -Sunny >> >> On Thu, Jun 6, 2019 at 10:29 AM Kotresh Hiremath Ravishankar >> wrote: >> > >> > Hi, >> > >> > I think the steps to setup non-root geo-rep is not followed properly. The following entry is missing in glusterd vol file which is required. >> > >> > The message "E [MSGID: 106061] [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option mountbroker-root' missing in glusterd vol file" repeated 33 times between [2019-06-05 08:50:46.361384] and [2019-06-05 08:52:34.019757] >> > >> > Could you please the steps from below? >> > >> > https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html-single/administration_guide/index#Setting_Up_the_Environment_for_a_Secure_Geo-replication_Slave >> > >> > And let us know if you still face the issue. >> > >> > >> > >> > >> > On Thu, Jun 6, 2019 at 10:24 AM deepu srinivasan wrote: >> >> >> >> Hi Kotresh, Sunny >> >> I Have mailed the logs I found in one of the slave machines. Is there anything to do with permission? Please help. >> >> >> >> On Wed, Jun 5, 2019 at 2:28 PM deepu srinivasan wrote: >> >>> >> >>> Hi Kotresh, Sunny >> >>> Found this log in the slave machine. >> >>>> >> >>>> [2019-06-05 08:49:10.632583] I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req >> >>>> >> >>>> The message "I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req" repeated 2 times between [2019-06-05 08:49:10.632583] and [2019-06-05 08:49:10.670863] >> >>>> >> >>>> The message "I [MSGID: 106496] [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received mount req" repeated 34 times between [2019-06-05 08:48:41.005398] and [2019-06-05 08:50:37.254063] >> >>>> >> >>>> The message "E [MSGID: 106061] [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option mountbroker-root' missing in glusterd vol file" repeated 34 times between [2019-06-05 08:48:41.005434] and [2019-06-05 08:50:37.254079] >> >>>> >> >>>> The message "W [MSGID: 106176] [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful mount request [No such file or directory]" repeated 34 times between [2019-06-05 08:48:41.005444] and [2019-06-05 08:50:37.254080] >> >>>> >> >>>> [2019-06-05 08:50:46.361347] I [MSGID: 106496] [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received mount req >> >>>> >> >>>> [2019-06-05 08:50:46.361384] E [MSGID: 106061] [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option mountbroker-root' missing in glusterd vol file >> >>>> >> >>>> [2019-06-05 08:50:46.361419] W [MSGID: 106176] [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful mount request [No such file or directory] >> >>>> >> >>>> The message "I [MSGID: 106496] [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received mount req" repeated 33 times between [2019-06-05 08:50:46.361347] and [2019-06-05 08:52:34.019741] >> >>>> >> >>>> The message "E [MSGID: 106061] [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option mountbroker-root' missing in glusterd vol file" repeated 33 times between [2019-06-05 08:50:46.361384] and [2019-06-05 08:52:34.019757] >> >>>> 
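(For reference, a sketch of the entries that the mountbroker setup is
supposed to leave in /etc/glusterfs/glusterd.vol on the slave nodes;
the user and volume names are the ones from this thread, the log group
name is an assumption:)

volume management
    ...
    option mountbroker-root /var/mountbroker-root
    option mountbroker-geo-replication.sas code-misc
    option geo-replication-log-group geogroup
    option rpc-auth-allow-insecure on
end-volume

# normally written by gluster-mountbroker, not by hand,
# followed by a glusterd restart on the slave nodes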
>> >>>> The message "W [MSGID: 106176] [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful mount request [No such file or directory]" repeated 33 times between [2019-06-05 08:50:46.361419] and [2019-06-05 08:52:34.019758] >> >>>> >> >>>> [2019-06-05 08:52:44.426839] I [MSGID: 106496] [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received mount req >> >>>> >> >>>> [2019-06-05 08:52:44.426886] E [MSGID: 106061] [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option mountbroker-root' missing in glusterd vol file >> >>>> >> >>>> [2019-06-05 08:52:44.426896] W [MSGID: 106176] [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful mount request [No such file or directory] >> >>> >> >>> >> >>> On Wed, Jun 5, 2019 at 1:06 AM deepu srinivasan wrote: >> >>>> >> >>>> Thankyou Kotresh >> >>>> >> >>>> On Tue, Jun 4, 2019, 11:20 PM Kotresh Hiremath Ravishankar wrote: >> >>>>> >> >>>>> Ccing Sunny, who was investing similar issue. >> >>>>> >> >>>>> On Tue, Jun 4, 2019 at 5:46 PM deepu srinivasan wrote: >> >>>>>> >> >>>>>> Have already added the path in bashrc . Still in faulty state >> >>>>>> >> >>>>>> On Tue, Jun 4, 2019, 5:27 PM Kotresh Hiremath Ravishankar wrote: >> >>>>>>> >> >>>>>>> could you please try adding /usr/sbin to $PATH for user 'sas'? If it's bash, add 'export PATH=/usr/sbin:$PATH' in >> >>>>>>> /home/sas/.bashrc >> >>>>>>> >> >>>>>>> On Tue, Jun 4, 2019 at 5:24 PM deepu srinivasan wrote: >> >>>>>>>> >> >>>>>>>> Hi Kortesh >> >>>>>>>> Please find the logs of the above error >> >>>>>>>> Master log snippet >> >>>>>>>>> >> >>>>>>>>> [2019-06-04 11:52:09.254731] I [resource(worker /home/sas/gluster/data/code-misc):1379:connect_remote] SSH: Initializing SSH connection between master and slave... >> >>>>>>>>> [2019-06-04 11:52:09.308923] D [repce(worker /home/sas/gluster/data/code-misc):196:push] RepceClient: call 89724:139652759443264:1559649129.31 __repce_version__() ... >> >>>>>>>>> [2019-06-04 11:52:09.602792] E [syncdutils(worker /home/sas/gluster/data/code-misc):311:log_raise_exception] : connection to peer is broken >> >>>>>>>>> [2019-06-04 11:52:09.603312] E [syncdutils(worker /home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/ glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-4aL2tc/d893f66e0addc32f7d0080bb503f5185.sock sas at 192.168.185.107 /usr/libexec/glusterfs/gsyncd slave code-misc sas@ 192.168.185.107::code-misc --master-node 192.168.185.106 --master-node-id 851b64d0-d885-4ae9-9b38-ab5b15db0fec --master-brick /home/sas/gluster/data/code-misc --local-node 192.168.185.122 --local-node- id bcaa7af6-c3a1-4411-8e99-4ebecb32eb6a --slave-timeout 120 --slave-log-level DEBUG --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/sbin error=1 >> >>>>>>>>> [2019-06-04 11:52:09.614996] I [repce(agent /home/sas/gluster/data/code-misc):97:service_loop] RepceServer: terminating on reaching EOF. 
>> >>>>>>>>> [2019-06-04 11:52:09.615545] D [monitor(monitor):271:monitor] Monitor: worker(/home/sas/gluster/data/code-misc) connected >> >>>>>>>>> [2019-06-04 11:52:09.616528] I [monitor(monitor):278:monitor] Monitor: worker died in startup phase brick=/home/sas/gluster/data/code-misc >> >>>>>>>>> [2019-06-04 11:52:09.619391] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> Slave log snippet >> >>>>>>>>> >> >>>>>>>>> [2019-06-04 11:50:09.782668] E [syncdutils(slave 192.168.185.106/home/sas/gluster/data/code-misc):809:logerr] Popen: /usr/sbin/gluster> 2 : failed with this errno (No such file or directory) >> >>>>>>>>> [2019-06-04 11:50:11.188167] W [gsyncd(slave 192.168.185.125/home/sas/gluster/data/code-misc):305:main] : Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.107_code-misc/gsyncd.conf >> >>>>>>>>> [2019-06-04 11:50:11.201070] I [resource(slave 192.168.185.125/home/sas/gluster/data/code-misc):1098:connect] GLUSTER: Mounting gluster volume locally... >> >>>>>>>>> [2019-06-04 11:50:11.271231] E [resource(slave 192.168.185.125/home/sas/gluster/data/code-misc):1006:handle_mounter] MountbrokerMounter: glusterd answered mnt= >> >>>>>>>>> [2019-06-04 11:50:11.271998] E [syncdutils(slave 192.168.185.125/home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error cmd=/usr/sbin/gluster --remote-host=localhost system:: mount sas user-map-root=sas aux-gfid-mount acl log-level=INFO log-file=/var/log/glusterfs/geo-replication-slaves/code-misc_192.168.185.107_code-misc/mnt-192.168.185.125-home-sas-gluster-data-code-misc.log volfile-server=localhost volfile-id=code-misc client-pid=-1 error=1 >> >>>>>>>>> [2019-06-04 11:50:11.272113] E [syncdutils(slave 192.168.185.125/home/sas/gluster/data/code-misc):809:logerr] Popen: /usr/sbin/gluster> 2 : failed with this errno (No such file or directory) >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> On Tue, Jun 4, 2019 at 5:10 PM deepu srinivasan wrote: >> >>>>>>>>> >> >>>>>>>>> Hi >> >>>>>>>>> As discussed I have upgraded gluster from 4.1 to 6.2 version. But the Geo replication failed to start. >> >>>>>>>>> Stays in faulty state >> >>>>>>>>> >> >>>>>>>>> On Fri, May 31, 2019, 5:32 PM deepu srinivasan wrote: >> >>>>>>>>>> >> >>>>>>>>>> Checked the data. It remains in 2708. No progress. >> >>>>>>>>>> >> >>>>>>>>>> On Fri, May 31, 2019 at 4:36 PM Kotresh Hiremath Ravishankar wrote: >> >>>>>>>>>>> >> >>>>>>>>>>> That means it could be working and the defunct process might be some old zombie one. Could you check, that data progress ? >> >>>>>>>>>>> >> >>>>>>>>>>> On Fri, May 31, 2019 at 4:29 PM deepu srinivasan wrote: >> >>>>>>>>>>>> >> >>>>>>>>>>>> Hi >> >>>>>>>>>>>> When i change the rsync option the rsync process doesnt seem to start . Only a defunt process is listed in ps aux. Only when i set rsync option to " " and restart all the process the rsync process is listed in ps aux. >> >>>>>>>>>>>> >> >>>>>>>>>>>> >> >>>>>>>>>>>> On Fri, May 31, 2019 at 4:23 PM Kotresh Hiremath Ravishankar wrote: >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> Yes, rsync config option should have fixed this issue. >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> Could you share the output of the following? >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> 1. gluster volume geo-replication :: config rsync-options >> >>>>>>>>>>>>> 2. 
ps -ef | grep rsync >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> On Fri, May 31, 2019 at 4:11 PM deepu srinivasan wrote: >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> Done. >> >>>>>>>>>>>>>> We got the following result . >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> 1559298781.338234 write(2, "rsync: link_stat \"/tmp/gsyncd-aux-mount-EEJ_sY/.gfid/3fa6aed8-802e-4efe-9903-8bc171176d88\" failed: No such file or directory (2)", 128 >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> seems like a file is missing ? >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> On Fri, May 31, 2019 at 3:25 PM Kotresh Hiremath Ravishankar wrote: >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> Hi, >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> Could you take the strace with with more string size? The argument strings are truncated. >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> strace -s 500 -ttt -T -p >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> On Fri, May 31, 2019 at 3:17 PM deepu srinivasan wrote: >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> Hi Kotresh >> >>>>>>>>>>>>>>>> The above-mentioned work around did not work properly. >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> On Fri, May 31, 2019 at 3:16 PM deepu srinivasan wrote: >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> Hi Kotresh >> >>>>>>>>>>>>>>>>> We have tried the above-mentioned rsync option and we are planning to have the version upgrade to 6.0. >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar wrote: >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> Hi, >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> This looks like the hang because stderr buffer filled up with errors messages and no one reading it. >> >>>>>>>>>>>>>>>>>> I think this issue is fixed in latest releases. As a workaround, you can do following and check if it works. >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> Prerequisite: >> >>>>>>>>>>>>>>>>>> rsync version should be > 3.1.0 >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> Workaround: >> >>>>>>>>>>>>>>>>>> gluster volume geo-replication :: config rsync-options "--ignore-missing-args" >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> Thanks, >> >>>>>>>>>>>>>>>>>> Kotresh HR >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan wrote: >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> Hi >> >>>>>>>>>>>>>>>>>>> We were evaluating Gluster geo Replication between two DCs one is in US west and one is in US east. We took multiple trials for different file size. >> >>>>>>>>>>>>>>>>>>> The Geo Replication tends to stop replicating but while checking the status it appears to be in Active state. But the slave volume did not increase in size. >> >>>>>>>>>>>>>>>>>>> So we have restarted the geo-replication session and checked the status. The status was in an active state and it was in History Crawl for a long time. We have enabled the DEBUG mode in logging and checked for any error. >> >>>>>>>>>>>>>>>>>>> There was around 2000 file appeared for syncing candidate. The Rsync process starts but the rsync did not happen in the slave volume. Every time the rsync process appears in the "ps auxxx" list but the replication did not happen in the slave end. What would be the cause of this problem? Is there anyway to debug it? >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> We have also checked the strace of the rync program. 
>> >>>>>>>>>>>>>>>>>>> It displays something like this:
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128"
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> We are using the below specs:
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Gluster version - 4.1.7
>> >>>>>>>>>>>>>>>>>>> Sync mode - rsync
>> >>>>>>>>>>>>>>>>>>> Volume - 1x3 on each end (master and slave)
>> >>>>>>>>>>>>>>>>>>> Intranet bandwidth - 10 Gig
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> --
>> >>>>>>>>>>>>>>>>>> Thanks and Regards,
>> >>>>>>>>>>>>>>>>>> Kotresh H R
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> --
>> >>>>>>>>>>>>>>> Thanks and Regards,
>> >>>>>>>>>>>>>>> Kotresh H R
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> --
>> >>>>>>>>>>>>> Thanks and Regards,
>> >>>>>>>>>>>>> Kotresh H R
>> >>>>>>>>>>>
>> >>>>>>>>>>> --
>> >>>>>>>>>>> Thanks and Regards,
>> >>>>>>>>>>> Kotresh H R
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> Thanks and Regards,
>> >>>>>>> Kotresh H R
>> >>>>>
>> >>>>> --
>> >>>>> Thanks and Regards,
>> >>>>> Kotresh H R
>> >
>> > --
>> > Thanks and Regards,
>> > Kotresh H R

From sunkumar at redhat.com Thu Jun 6 11:38:31 2019
From: sunkumar at redhat.com (Sunny Kumar)
Date: Thu, 6 Jun 2019 17:08:31 +0530
Subject: [Gluster-users] Geo Replication stops replicating
In-Reply-To: References: Message-ID:

What's the current traceback? Please share.

-Sunny

On Thu, Jun 6, 2019 at 4:53 PM deepu srinivasan wrote:
>
> Hi Sunny
> I have changed the file in /usr/libexec/glusterfs/peer_mountbroker.py as mentioned in the patch.
> Now the "gluster-mountbroker status" command is working fine, but the geo-replication still seems to be in the faulty state.
>
> Thank you
> Deepak
>
> On Thu, Jun 6, 2019 at 4:10 PM Sunny Kumar wrote:
>>
>> The above error can be tracked here:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1709248
>>
>> and patch link:
>> https://review.gluster.org/#/c/glusterfs/+/22716/
>>
>> You can apply the patch and test it; however, it is waiting on regression to
>> pass and merge.
>>
>> -Sunny
>>
>> On Thu, Jun 6, 2019 at 4:00 PM deepu srinivasan wrote:
>> >
>> > Hi
>> > I followed the steps below to create the geo-replication, but the status seems to be faulty.
>> >
>> > Steps:
>> >
>> > Installed gluster version 5.6 on all six nodes.
>> >>
>> >> glusterfs 5.6
>> >>
>> >> Repository revision: git://git.gluster.org/glusterfs.git
>> >>
>> >> Copyright (c) 2006-2016 Red Hat, Inc.
>> >>
>> >> GlusterFS comes with ABSOLUTELY NO WARRANTY.
>> >>
>> >> It is licensed to you under your choice of the GNU Lesser
>> >>
>> >> General Public License, version 3 or any later version (LGPLv3
>> >>
>> >> or later), or the GNU General Public License, version 2 (GPLv2),
>> >>
>> >> in all cases as published by the Free Software Foundation.
>> >
>> > Peer-probed the first three nodes and the second three nodes.
>> >
>> > Added a new volume in both clusters.
>> >
>> > Executed the gluster-mountbroker commands and restarted glusterd.
>> >>
>> >> gluster-mountbroker setup /var/mountbroker-root sas
>> >>
>> >> gluster-mountbroker remove --volume code-misc --user sas
>> >
>> > Configured passwordless ssh from master to slave.
>> >>
>> >> ssh-keygen; ssh-copy-id sas at 192.168.185.107
>> >
>> > Created a common pem pub file.
>> >>
>> >> gluster system:: execute gsec_create
>> >
>> > Created the geo-replication session.
>> >>
>> >> gluster volume geo-replication code-misc sas at 192.168.185.107::code-misc create push-pem
>> >
>> > Executed the following command on the slave.
>> >>
>> >> /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh sas code-misc code-misc
>> >
>> > Started the gluster geo-replication.
>> >>
>> >> gluster volume geo-replication code-misc sas at 192.168.185.107::code-misc start
>> >
>> > Now the geo-replication works fine.
>> > Tested with 2000 files; all seem to sync fine.
>> >
>> > Then I updated all the nodes to version 6.2, using rpms built from the source code in a docker container on my personal machine.
>> >
>> >> gluster --version
>> >>
>> >> glusterfs 6.2
>> >>
>> >> Repository revision: git://git.gluster.org/glusterfs.git
>> >>
>> >> Copyright (c) 2006-2016 Red Hat, Inc.
>> >>
>> >> GlusterFS comes with ABSOLUTELY NO WARRANTY.
>> >>
>> >> It is licensed to you under your choice of the GNU Lesser
>> >>
>> >> General Public License, version 3 or any later version (LGPLv3
>> >>
>> >> or later), or the GNU General Public License, version 2 (GPLv2),
>> >>
>> >> in all cases as published by the Free Software Foundation.
>> >
>> > I stopped the glusterd daemons on all the nodes, along with the volume and geo-replication.
>> > Then I started the daemons, the volume and the geo-replication session; the status seems to be faulty.
>> > Also note that the "gluster-mountbroker status" command always ends in a Python exception like this:
>> >>
>> >> Traceback (most recent call last):
>> >>   File "/usr/sbin/gluster-mountbroker", line 396, in <module>
>> >>     runcli()
>> >>   File "/usr/lib/python2.7/site-packages/gluster/cliutils/cliutils.py", line 225, in runcli
>> >>     cls.run(args)
>> >>   File "/usr/sbin/gluster-mountbroker", line 275, in run
>> >>     out = execute_in_peers("node-status")
>> >>   File "/usr/lib/python2.7/site-packages/gluster/cliutils/cliutils.py", line 127, in execute_in_peers
>> >>     raise GlusterCmdException((rc, out, err, " ".join(cmd)))
>> >> gluster.cliutils.cliutils.GlusterCmdException: (1, '', 'Unable to end. Error : Success\n', 'gluster system:: execute mountbroker.py node-status')
>> >
>> > Is it just me, or does everyone get an error from the gluster-mountbroker command with gluster versions greater than 6.0? Please help.
>> >
>> > Thank you
>> > Deepak
>> >
>> > On Thu, Jun 6, 2019 at 10:35 AM Sunny Kumar wrote:
>> >>
>> >> Hi,
>> >>
>> >> Updated link for documentation:
>> >>
>> >> -- https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/
>> >>
>> >> You can use this tool as well:
>> >> http://aravindavk.in/blog/gluster-georep-tools/
>> >>
>> >> -Sunny
>> >>
>> >> On Thu, Jun 6, 2019 at 10:29 AM Kotresh Hiremath Ravishankar
>> >> wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> > I think the steps to set up non-root geo-rep were not followed properly. The following entry is missing in the glusterd vol file, and it is required.
>> >> >
>> >> > The message "E [MSGID: 106061] [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option mountbroker-root' missing in glusterd vol file" repeated 33 times between [2019-06-05 08:50:46.361384] and [2019-06-05 08:52:34.019757]
>> >> >
>> >> > Could you please follow the steps from the link below?
>> >> >
>> >> > https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html-single/administration_guide/index#Setting_Up_the_Environment_for_a_Secure_Geo-replication_Slave
>> >> >
>> >> > And let us know if you still face the issue.
>> >> >
>> >> > On Thu, Jun 6, 2019 at 10:24 AM deepu srinivasan wrote:
>> >> >>
>> >> >> Hi Kotresh, Sunny
>> >> >> I have mailed the logs I found on one of the slave machines. Does it have anything to do with permissions? Please help.
>> >> >>
>> >> >> On Wed, Jun 5, 2019 at 2:28 PM deepu srinivasan wrote:
>> >> >>>
>> >> >>> Hi Kotresh, Sunny
>> >> >>> Found this log on the slave machine.
>> >> >>>>
>> >> >>>> [2019-06-05 08:49:10.632583] I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
>> >> >>>>
>> >> >>>> The message "I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req" repeated 2 times between [2019-06-05 08:49:10.632583] and [2019-06-05 08:49:10.670863]
>> >> >>>>
>> >> >>>> The message "I [MSGID: 106496] [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received mount req" repeated 34 times between [2019-06-05 08:48:41.005398] and [2019-06-05 08:50:37.254063]
>> >> >>>>
>> >> >>>> The message "E [MSGID: 106061] [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option mountbroker-root' missing in glusterd vol file" repeated 34 times between [2019-06-05 08:48:41.005434] and [2019-06-05 08:50:37.254079]
>> >> >>>>
>> >> >>>> The message "W [MSGID: 106176] [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful mount request [No such file or directory]" repeated 34 times between [2019-06-05 08:48:41.005444] and [2019-06-05 08:50:37.254080]
>> >> >>>>
>> >> >>>> [2019-06-05 08:50:46.361347] I [MSGID: 106496] [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received mount req
>> >> >>>>
>> >> >>>> [2019-06-05 08:50:46.361384] E [MSGID: 106061] [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option mountbroker-root' missing in glusterd vol file
>> >> >>>>
>> >> >>>> [2019-06-05 08:50:46.361419] W [MSGID: 106176] [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful mount request [No such file or directory]
>> >> >>>>
>> >> >>>> The message "I [MSGID: 106496] [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received mount req" repeated 33 times between [2019-06-05 08:50:46.361347] and [2019-06-05 08:52:34.019741]
>> >> >>>>
>> >> >>>> The message "E [MSGID: 106061] [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option mountbroker-root' missing in glusterd vol file" repeated 33 times between [2019-06-05 08:50:46.361384] and [2019-06-05 08:52:34.019757]
>> >> >>>>
>> >> >>>> The message "W [MSGID: 106176] [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful mount request [No such file or directory]" repeated 33 times between [2019-06-05 08:50:46.361419] and [2019-06-05 08:52:34.019758]
>> >> >>>>
>> >> >>>> [2019-06-05 08:52:44.426839] I [MSGID: 106496] [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received mount req
>> >> >>>>
>> >> >>>> [2019-06-05 08:52:44.426886] E [MSGID: 106061] [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option mountbroker-root' missing in glusterd vol file
>> >> >>>>
>> >> >>>> [2019-06-05 08:52:44.426896] W [MSGID: 106176] [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful mount request [No such file or directory]
>> >> >>>
>> >> >>> On Wed, Jun 5, 2019 at 1:06 AM deepu srinivasan wrote:
>> >> >>>>
>> >> >>>> Thank you, Kotresh
>> >> >>>>
>> >> >>>> On Tue, Jun 4, 2019, 11:20 PM Kotresh Hiremath Ravishankar wrote:
>> >> >>>>>
>> >> >>>>> Ccing Sunny, who was investigating a similar issue.
>> >> >>>>>
>> >> >>>>> On Tue, Jun 4, 2019 at 5:46 PM deepu srinivasan wrote:
>> >> >>>>>>
>> >> >>>>>> I have already added the path in bashrc. Still in the faulty state.
>> >> >>>>>>
>> >> >>>>>> On Tue, Jun 4, 2019, 5:27 PM Kotresh Hiremath Ravishankar wrote:
>> >> >>>>>>>
>> >> >>>>>>> Could you please try adding /usr/sbin to $PATH for user 'sas'? If it's bash, add 'export PATH=/usr/sbin:$PATH' in
>> >> >>>>>>> /home/sas/.bashrc
>> >> >>>>>>>
>> >> >>>>>>> On Tue, Jun 4, 2019 at 5:24 PM deepu srinivasan wrote:
>> >> >>>>>>>>
>> >> >>>>>>>> Hi Kotresh
>> >> >>>>>>>> Please find the logs for the above error.
>> >> >>>>>>>> Master log snippet
>> >> >>>>>>>>>
>> >> >>>>>>>>> [2019-06-04 11:52:09.254731] I [resource(worker /home/sas/gluster/data/code-misc):1379:connect_remote] SSH: Initializing SSH connection between master and slave...
>> >> >>>>>>>>> [2019-06-04 11:52:09.308923] D [repce(worker /home/sas/gluster/data/code-misc):196:push] RepceClient: call 89724:139652759443264:1559649129.31 __repce_version__() ...
>> >> >>>>>>>>> [2019-06-04 11:52:09.602792] E [syncdutils(worker /home/sas/gluster/data/code-misc):311:log_raise_exception] : connection to peer is broken
>> >> >>>>>>>>> [2019-06-04 11:52:09.603312] E [syncdutils(worker /home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-4aL2tc/d893f66e0addc32f7d0080bb503f5185.sock sas at 192.168.185.107 /usr/libexec/glusterfs/gsyncd slave code-misc sas at 192.168.185.107::code-misc --master-node 192.168.185.106 --master-node-id 851b64d0-d885-4ae9-9b38-ab5b15db0fec --master-brick /home/sas/gluster/data/code-misc --local-node 192.168.185.122 --local-node-id bcaa7af6-c3a1-4411-8e99-4ebecb32eb6a --slave-timeout 120 --slave-log-level DEBUG --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/sbin error=1
>> >> >>>>>>>>> [2019-06-04 11:52:09.614996] I [repce(agent /home/sas/gluster/data/code-misc):97:service_loop] RepceServer: terminating on reaching EOF.
>> >> >>>>>>>>> [2019-06-04 11:52:09.615545] D [monitor(monitor):271:monitor] Monitor: worker(/home/sas/gluster/data/code-misc) connected
>> >> >>>>>>>>> [2019-06-04 11:52:09.616528] I [monitor(monitor):278:monitor] Monitor: worker died in startup phase brick=/home/sas/gluster/data/code-misc
>> >> >>>>>>>>> [2019-06-04 11:52:09.619391] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
>> >> >>>>>>>>
>> >> >>>>>>>> [...]

From sunkumar at redhat.com Thu Jun 6 12:52:47 2019
From: sunkumar at redhat.com (Sunny Kumar)
Date: Thu, 6 Jun 2019 18:22:47 +0530
Subject: [Gluster-users] Geo Replication stops replicating
In-Reply-To: References: Message-ID:

You should not have used this one:

> > gluster-mountbroker remove --volume code-misc --user sas

This one is to remove a volume/user from the mount broker.

Please try setting up the mount broker once again.
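A minimal sketch of what redoing that setup could look like on the slave cluster, reusing the user 'sas', volume 'code-misc' and mount root from the steps quoted earlier (treat this as an outline to check against the geo-replication docs, not an authoritative procedure; on this setup 'sas' is both the user and its group):

    gluster-mountbroker setup /var/mountbroker-root sas
    gluster-mountbroker add code-misc sas
    # restart glusterd on every slave node so the new options are loaded
    systemctl restart glusterd
    gluster-mountbroker status   # should now list the mount root and the sas/code-misc entry

If the setup took effect, 'option mountbroker-root /var/mountbroker-root' should appear in the slave nodes' glusterd vol file, which is exactly the option the error messages above report as missing.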
-Sunny On Thu, Jun 6, 2019 at 5:28 PM deepu srinivasan wrote: > > Hi Sunny > Please find the logs attached >> >> The message "E [MSGID: 106061] [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option mountbroker-root' missing in glusterd vol file" repeated 13 times between [2019-06-06 11:51:43.986788] and [2019-06-06 11:52:32.764546] >> >> The message "W [MSGID: 106176] [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful mount request [No such file or directory]" repeated 13 times between [2019-06-06 11:51:43.986798] and [2019-06-06 11:52:32.764548] >> >> The message "I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req" repeated 2 times between [2019-06-06 11:53:07.064332] and [2019-06-06 11:53:07.303978] >> >> [2019-06-06 11:55:35.624320] I [MSGID: 106495] [glusterd-handler.c:3137:__glusterd_handle_getwd] 0-glusterd: Received getwd req >> >> [2019-06-06 11:55:35.884345] I [MSGID: 106131] [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: quotad already stopped >> >> [2019-06-06 11:55:35.884373] I [MSGID: 106568] [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: quotad service is stopped >> >> [2019-06-06 11:55:35.884459] I [MSGID: 106131] [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: bitd already stopped >> >> [2019-06-06 11:55:35.884473] I [MSGID: 106568] [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: bitd service is stopped >> >> [2019-06-06 11:55:35.884554] I [MSGID: 106131] [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: scrub already stopped >> >> [2019-06-06 11:55:35.884567] I [MSGID: 106568] [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: scrub service is stopped >> >> [2019-06-06 11:55:35.893823] I [run.c:242:runner_log] (-->/usr/lib64/glusterfs/6.2/xlator/mgmt/glusterd.so(+0xe8e1a) [0x7f7380d60e1a] -->/usr/lib64/glusterfs/6.2/xlator/mgmt/glusterd.so(+0xe88e5) [0x7f7380d608e5] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f738cbc5df5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/set/post/S30samba-set.sh --volname=code-misc -o features.read-only=on --gd-workdir=/var/lib/glusterd >> >> [2019-06-06 11:55:35.900465] I [run.c:242:runner_log] (-->/usr/lib64/glusterfs/6.2/xlator/mgmt/glusterd.so(+0xe8e1a) [0x7f7380d60e1a] -->/usr/lib64/glusterfs/6.2/xlator/mgmt/glusterd.so(+0xe88e5) [0x7f7380d608e5] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f738cbc5df5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/set/post/S32gluster_enable_shared_storage.sh --volname=code-misc -o features.read-only=on --gd-workdir=/var/lib/glusterd >> >> [2019-06-06 11:55:43.485284] I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req >> >> The message "I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req" repeated 2 times between [2019-06-06 11:55:43.485284] and [2019-06-06 11:55:43.512321] >> >> [2019-06-06 11:55:44.055419] I [MSGID: 106496] [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received mount req >> >> [2019-06-06 11:55:44.055473] E [MSGID: 106061] [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option mountbroker-root' missing in glusterd vol file >> >> [2019-06-06 11:55:44.055483] W [MSGID: 106176] [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful mount request [No such file or directory] >> >> [2019-06-06 11:55:44.056695] I [MSGID: 106496] 
[glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received mount req
>>
>> [2019-06-06 11:55:44.056725] E [MSGID: 106061] [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option mountbroker-root' missing in glusterd vol file
>>
>> [2019-06-06 11:55:44.056734] W [MSGID: 106176] [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful mount request [No such file or directory]
>>
>> [2019-06-06 11:55:44.057522] I [MSGID: 106496] [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received mount req
>>
>> [2019-06-06 11:55:44.057552] E [MSGID: 106061] [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option mountbroker-root' missing in glusterd vol file
>>
>> [2019-06-06 11:55:44.057562] W [MSGID: 106176] [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful mount request [No such file or directory]
>>
>> [2019-06-06 11:55:54.655681] I [MSGID: 106496] [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received mount req
>>
>> [2019-06-06 11:55:54.655741] E [MSGID: 106061] [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option mountbroker-root' missing in glusterd vol file
>>
>> [2019-06-06 11:55:54.655752] W [MSGID: 106176] [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful mount request [No such file or directory]

> On Thu, Jun 6, 2019 at 5:09 PM Sunny Kumar wrote:
>>
>> What's the current traceback? Please share.
>>
>> -Sunny
>>
>> [...]

From emayoral at arsys.es Thu Jun 6 16:48:02 2019
From: emayoral at arsys.es (Eduardo Mayoral)
Date: Thu, 6 Jun 2019 18:48:02 +0200
Subject: [Gluster-users] Advice for setup: SW RAID 6
 vs JBOD
In-Reply-To: References: Message-ID:

Your comment actually helps me more than you think; one of the main
doubts I have is whether I go for JBOD with replica 3 or SW RAID 6 with
replica 2 + arbiter. Before reading your email I was leaning more
towards JBOD, as reconstruction of a moderately big RAID 6 with mdadm
can be painful too. Now I see a reconstruct is going to be painful
either way...

For the record, the workload I am going to migrate is currently
18,314,445 MB and 34,752,784 inodes (which is not exactly the same as
files, but let's use that for a rough estimate), for an average file
size of about 539 KB per file.

Thanks a lot for your time and insights!

On 6/6/19 8:53, Hu Bert wrote:
> Good morning,
>
> my comment won't help you directly, but i thought i'd send it anyway...
>
> Our first glusterfs setup had 3 servers with 4 disks=bricks (10TB,
> JBOD) each. Was running fine in the beginning, but then 1 disk failed.
> The following heal took ~1 month, with bad performance (quite high
> IO). Shortly after the heal had finished another disk failed -> same
> problems again. Not funny.
>
> For our new system we decided to use 3 servers with 10 disks (10 TB)
> each, but now the 10 disks in a SW RAID 10 (well, we split the 10
> disks into 2 SW RAID 10, each of them is a brick, we have 2 gluster
> volumes). A lot of disk space "wasted", with this type of SW RAID and
> a replicate 3 setup, but we wanted to avoid the "healing takes a long
> time with bad performance" problems. Now mdadm takes care of
> replicating data; glusterfs should always see "good" bricks.
>
> And the decision may depend on what kind of data you have. Many small
> files, like tens of millions? Or not that much, but bigger files? I
> once watched a video (i think it was this one:
> https://www.youtube.com/watch?v=61HDVwttNYI). Recommendation there:
> RAID 6 or 10 for small files, for big files... well, already 2 years
> "old" ;-)
>
> As i said, this won't help you directly. You have to identify what's
> most important for your scenario; as you said, high performance is not
> an issue - if this is true even when you have slight performance
> issues after a disk fail, then ok. My experience so far: the bigger and
> slower the disks are and the more data you have -> healing will hurt
> -> try to avoid this. If the disks are small and fast (SSDs), healing
> will be faster -> JBOD is an option.
>
> hth,
> Hubert
>
> On Wed, Jun 5, 2019 at 11:33 AM Eduardo Mayoral <emayoral at arsys.es> wrote:
>>
>> Hi,
>>
>> I am looking into a new gluster deployment to replace an ancient one.
>>
>> For this deployment I will be using some repurposed servers I
>> already have in stock. The disk specs are 12 * 3 TB SATA disks. No HW
>> RAID controller. They also have some SSD which would be nice to leverage
>> as cache or similar to improve performance, since it is already there.
>> Advice on how to leverage the SSDs would be greatly appreciated.
>>
>> One of the design choices I have to make is using 3 nodes for a
>> replica-3 with JBOD, or using 2 nodes with a replica-2 and using SW RAID
>> 6 for the disks, maybe adding a 3rd node with a smaller amount of disk
>> as metadata node for the replica set. I would love to hear advice on the
>> pros and cons of each setup from the gluster experts.
>>
>> The data will be accessed from 4 to 6 systems with native gluster,
>> not sure if that makes any difference.
>>
>> The amount of data I have to store there is currently 20 TB, with
>> moderate growth. iops are quite low so high performance is not an issue.
>> The data will fit in any of the two setups.
>>
>> Thanks in advance for your advice!
>>
>> --
>> Eduardo Mayoral Jimeno
>> Systems engineer, platform department. Arsys Internet.
>> emayoral at arsys.es - +34 941 620 105 - ext 2153

--
Eduardo Mayoral Jimeno
Systems engineer, platform department. Arsys Internet.
emayoral at arsys.es - +34 941 620 105 - ext 2153

From vincent at epicenergy.ca Thu Jun 6 17:07:19 2019
From: vincent at epicenergy.ca (Vincent Royer)
Date: Thu, 6 Jun 2019 10:07:19 -0700
Subject: [Gluster-users] Advice for setup: SW RAID 6 vs JBOD
In-Reply-To: References: Message-ID:

What if you have two fast 2TB SSDs per server in hardware RAID 1, 3 hosts
in replica 3. Dual 10gb enterprise nics. This would end up being a single
2TB volume, correct? Seems like that would offer great speed and have
pretty decent survivability.

On Wed, Jun 5, 2019 at 11:54 PM Hu Bert wrote:
> [...]
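For reference, yes: a replica 3 volume with one brick per host presents the capacity of a single brick, so the arithmetic above holds. A minimal sketch of such a create, with made-up host and brick names rather than anything from this thread:

    gluster volume create gv_ssd replica 3 \
        host1:/bricks/ssd/brick host2:/bricks/ssd/brick host3:/bricks/ssd/brick
    gluster volume start gv_ssd
    # usable capacity = one brick (~2 TB here); every write lands on all three hosts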
From emayoral at arsys.es Thu Jun 6 17:20:51 2019
From: emayoral at arsys.es (Eduardo Mayoral)
Date: Thu, 6 Jun 2019 19:20:51 +0200
Subject: [Gluster-users] Advice for setup: SW RAID 6 vs JBOD
In-Reply-To: References: Message-ID: <0115a686-2cf5-96d4-7a53-e9725f49da49@arsys.es>

Yes to the 10 Gb NICs (they are already on the servers).

Nice idea with the SSDs, but I do not have a HW RAID card on these servers
or the possibility to get / install one. What I do have is an extra SSD
disk per server which I plan to use as LVM cache for the bricks (maybe
just 1 disk, maybe 2 with SW RAID 1). I still need to test how LVM /
gluster are going to handle the failure of the cache disk.

Thanks!

On 6/6/19 19:07, Vincent Royer wrote:
> What if you have two fast 2TB SSDs per server in hardware RAID 1, 3
> hosts in replica 3. Dual 10gb enterprise nics. This would end up
> being a single 2TB volume, correct? Seems like that would offer great
> speed and have pretty decent survivability.
>
> On Wed, Jun 5, 2019 at 11:54 PM Hu Bert wrote:
> > [...]
well, already 2 years > "old" ;-) > > As i said, this won't help you directly. You have to identify what's > most important for your scenario; as you said, high performance is not > an issue - if this is true even when you have slight performance > issues after a disk fail then ok. My experience so far: the bigger and > slower the disks are and the more data you have -> healing will hurt > -> try to avoid this. If the disks are small and fast (SSDs), healing > will be faster -> JBOD is an option. > > > hth, > Hubert > > Am Mi., 5. Juni 2019 um 11:33 Uhr schrieb Eduardo Mayoral > >: > > > > Hi, > > > >? ? ?I am looking into a new gluster deployment to replace an > ancient one. > > > >? ? ?For this deployment I will be using some repurposed servers I > > already have in stock. The disk specs are 12 * 3 TB SATA disks. > No HW > > RAID controller. They also have some SSD which would be nice to > leverage > > as cache or similar to improve performance, since it is already > there. > > Advice on how to leverage the SSDs would be greatly appreciated. > > > >? ? ?One of the design choices I have to make is using 3 nodes for a > > replica-3 with JBOD, or using 2 nodes with a replica-2 and using > SW RAID > > 6 for the disks, maybe adding a 3rd node with a smaller amount > of disk > > as metadata node for the replica set. I would love to hear > advice on the > > pros and cons of each setup from the gluster experts. > > > >? ? ?The data will be accessed from 4 to 6 systems with native > gluster, > > not sure if that makes any difference. > > > >? ? ?The amount of data I have to store there is currently 20 TB, > with > > moderate growth. iops are quite low so high performance is not > an issue. > > The data will fit in any of the two setups. > > > >? ? ?Thanks in advance for your advice! > > > > -- > > Eduardo Mayoral Jimeno > > Systems engineer, platform department. Arsys Internet. > > emayoral at arsys.es - +34 941 620 105 - > ext 2153 > > > > > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -- Eduardo Mayoral Jimeno Systems engineer, platform department. Arsys Internet. emayoral at arsys.es - +34 941 620 105 - ext 2153 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mail at michael-metz.de Thu Jun 6 18:46:11 2019 From: mail at michael-metz.de (Michael Metz-Martini) Date: Thu, 6 Jun 2019 20:46:11 +0200 Subject: [Gluster-users] Advice for setup: SW RAID 6 vs JBOD In-Reply-To: References: Message-ID: Hi Am 06.06.19 um 18:48 schrieb Eduardo Mayoral: > Your comment actually helps me more than you think, one of the main > doubts I have is whether I go for JOBD with replica 3 or SW RAID 6 with > replica2 + arbitrer. Before reading your email I was leaning more > towards JOBD, as reconstruction of a moderately big RAID 6 with mdadm > can be painful too. Now I see a reconstruct is going to be painful > either way... > > For the record, the workload I am going to migrate is currently > 18,314,445 MB and 34,752,784 inodes (which is not exactly the same as > files, but let's use that for a rough estimate), for an average file > size of about 539 KB per file. > > Thanks a lot for your time and insights! 
Currently we're hosting ~200 TB split into about 3.500.000.000 files on a Distributed-Replicate-2 gluster volume with each brick running on a hw-raid6 of 8 x 8 TB disks. As we never had a failed drive 'till now I can't tell you anything about recovery times, but rebalance is damn slow with such a high number of small files (and so, presumably, is recovery on JBOD bricks). I think raid-recovery from local disks will be much faster.

As our files are nearly 100% readonly and split-brain issues could be resolved more or less "easily", we decided against replica 3 in favor of hardware raid6 redundancy.

--
Kind regards
Michael Metz-Martini

From Jim.Shelton at ibm.com  Thu Jun  6 19:17:03 2019
From: Jim.Shelton at ibm.com (Jim Shelton)
Date: Thu, 6 Jun 2019 14:17:03 -0500
Subject: [Gluster-users] geo-replication session information
Message-ID:

I need help cleaning up a faulty geo-replication session. I tried deleting all related directories and files, but I am currently in a state such that when I try to recreate the session via

gluster> volume geo-replication icp_kube-system_nfs-pvc_69753a58-819f-11e9-b3a0-005056b694b5 root at rmtwrk1::icp_kube-system_nfs-pvc_6ef8d56c-70f6-11e9-b497-005056b667db create ssh-port 2222 push-pem

I get

Session between icp_kube-system_nfs-pvc_69753a58-819f-11e9-b3a0-005056b694b5 and rmtwrk1:icp_kube-system_nfs-pvc_6ef8d56c-70f6-11e9-b497-005056b667db is already created! Cannot create with new slave:rmtwrk1 again!
geo-replication command failed

but if I try to delete it via

gluster> volume geo-replication icp_kube-system_nfs-pvc_69753a58-819f-11e9-b3a0-005056b694b5 root at rmtwrk1::icp_kube-system_nfs-pvc_6ef8d56c-70f6-11e9-b497-005056b667db delete

I get

Geo-replication session between icp_kube-system_nfs-pvc_69753a58-819f-11e9-b3a0-005056b694b5 and rmtwrk1::icp_kube-system_nfs-pvc_6ef8d56c-70f6-11e9-b497-005056b667db does not exist.
geo-replication command failed

Is there any way to clean this up?

Jim Shelton
IT Architect
IBM
Jim.Shelton at ibm.com
281 910 7914

From hunter86_bg at yahoo.com  Thu Jun  6 19:55:50 2019
From: hunter86_bg at yahoo.com (Strahil Nikolov)
Date: Thu, 6 Jun 2019 19:55:50 +0000 (UTC)
Subject: [Gluster-users] Advice for setup: SW RAID 6 vs JBOD
In-Reply-To: <0115a686-2cf5-96d4-7a53-e9725f49da49@arsys.es>
References: <0115a686-2cf5-96d4-7a53-e9725f49da49@arsys.es>
Message-ID: <1185322881.1006889.1559850950810@mail.yahoo.com>

>What I do have is an extra SSD disk per server which I plan to use as LVM cache for the bricks (Maybe just 1 disk, maybe 2 >with SW RAID 1). I still need to test how LVM / gluster are going to handle the failure of the cache disk.

Are you planning to use LVM cache for reads only (writethrough) or for both reads + writes (writeback)?

If you pick writeback - which means that you first write to the SSDs and only then push the data to the slow HDDs - then you need a raid1 for the SSD-based LVM cache, or a pure replica3, because if you lose 1 SSD the whole brick is down and you need to sync from scratch - and with the size of data you have, this will not be so nice.
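If it helps, a minimal sketch of attaching such a cache with lvmcache - the VG/LV names and sizes are examples only, adapt them to your layout:

    # the SSD (or the md raid1 built from the 2 SSDs) joins the brick's VG
    vgextend vg_bricks /dev/md_ssd
    lvcreate -n brick1_cache -L 400G vg_bricks /dev/md_ssd
    lvcreate -n brick1_cmeta -L 1G vg_bricks /dev/md_ssd
    lvconvert --type cache-pool --poolmetadata vg_bricks/brick1_cmeta vg_bricks/brick1_cache
    # writethrough only caches reads; writeback is where the raid1
    # underneath the cache becomes mandatory
    lvconvert --type cache --cachepool vg_bricks/brick1_cache --cachemode writethrough vg_bricks/brick1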
Note: Don't forget that thin LVM is the only way that gluster can use snapshots.

Best Regards,
Strahil Nikolov

From jim.kinney at gmail.com  Thu Jun  6 20:06:00 2019
From: jim.kinney at gmail.com (Jim Kinney)
Date: Thu, 06 Jun 2019 16:06:00 -0400
Subject: [Gluster-users] Advice for setup: SW RAID 6 vs JBOD
In-Reply-To: References: Message-ID: <1e41d8a788686fc73fcce38195d0baca36b34537.camel@gmail.com>

I have about 200TB in a gluster replicate-only 3-node setup. We stopped using hardware RAID6 after the third drive failed on one array at the same time we replaced the other two and before recovery could complete. 200TB is a mess to resync.

So now each hard drive is a single entity. We add 1 drive to each node as its own PV in gluster (with LUKS encryption). Each brick is mounted into the final tree on the client end. This way our recovery is usually just a single drive to sync.
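(Per drive that is roughly the following - the device names, key handling and sizing below are simplified examples, not our exact tooling:

    cryptsetup luksFormat /dev/sdX
    cryptsetup luksOpen /dev/sdX brick_sdX
    pvcreate /dev/mapper/brick_sdX
    vgcreate vg_sdX /dev/mapper/brick_sdX
    lvcreate -n brick -l 100%FREE vg_sdX
    mkfs.xfs -i size=512 /dev/vg_sdX/brick

and the new brick then joins the volume with a gluster volume add-brick ... replica 3 across all three nodes.)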
Do I miss to read some important piece of documentation? Please point me to some reference. Here's some command detail: #gluster volume info elastic-volume Volume Name: elastic-volume Type: Disperse Volume ID: 96773fef-c443-465b-a518-6630bcf83397 Status: Started Snapshot Count: 0 Number of Bricks: 1 x (4 + 2) = 6 Transport-type: tcp Bricks: Brick1: dev-netflow01.fineco.it:/data/gfs/lv_elastic/brick1/brick Brick2: dev-netflow02.fineco.it:/data/gfs/lv_elastic/brick1/brick Brick3: dev-netflow03.fineco.it:/data/gfs/lv_elastic/brick1/brick Brick4: dev-netflow04.fineco.it:/data/gfs/lv_elastic/brick1/brick Brick5: dev-netflow05.fineco.it:/data/gfs/lv_elastic/brick1/brick Brick6: dev-netflow06.fineco.it:/data/gfs/lv_elastic/brick1/brick Options Reconfigured: performance.io-cache: off performance.io-thread-count: 64 performance.write-behind-window-size: 100MB performance.cache-size: 1GB nfs.disable: on transport.address-family: inet # gluster volume heal elastic-volume info Brick dev01:/data/gfs/lv_elastic/brick1/brick /data/logs/20190606/ns-coreiol-iol-app-listini.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-fns.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-news.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-trkd.2019060615.log /data/logs/20190606/ns-coreiol-iol-lib-managers.2019060615.log /data/logs/20190606/ns-coreiol-iol-lib-httpwrapper.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-chart.2019060615.log Status: Connected Number of entries: 12 Brick dev02:/data/gfs/lv_elastic/brick1/brick /data/logs/20190606/ns-coreiol-iol-lib-managers.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-news.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-chart.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-listini.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-fns.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-trkd.2019060615.log /data/logs/20190606/ns-coreiol-iol-lib-httpwrapper.2019060615.log Status: Connected Number of entries: 12 Brick dev03:/data/gfs/lv_elastic/brick1/brick /data/logs/20190606/ns-coreiol-iol-app-listini.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-fns.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-news.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-trkd.2019060615.log /data/logs/20190606/ns-coreiol-iol-lib-managers.2019060615.log /data/logs/20190606/ns-coreiol-iol-lib-httpwrapper.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-chart.2019060615.log Status: Connected Number of entries: 12 Brick dev04:/data/gfs/lv_elastic/brick1/brick /data/logs/20190606/ns-coreiol-iol-app-news.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-trkd.2019060615.log /data/logs/20190606/ns-coreiol-iol-lib-managers.2019060615.log /data/logs/20190606/ns-coreiol-iol-lib-httpwrapper.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-listini.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-fns.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-chart.2019060615.log Status: Connected Number of entries: 12 Brick dev05:/data/gfs/lv_elastic/brick1/brick /data/logs/20190606/ns-coreiol-iol-app-news.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-trkd.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-listini.2019060615.log /data/logs/20190606/ns-coreiol-iol-lib-managers.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-fns.2019060615.log /data/logs/20190606/ns-coreiol-iol-lib-httpwrapper.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-chart.2019060615.log Status: Connected Number of entries: 12 Brick 
dev06:/data/gfs/lv_elastic/brick1/brick /data/logs/20190606/ns-coreiol-iol-app-chart.2019060615.log /data/logs/20190606/ns-coreiol-iol-lib-managers.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-news.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-trkd.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-listini.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-fns.2019060615.log /data/logs/20190606/ns-coreiol-iol-lib-httpwrapper.2019060615.log Status: Connected Number of entries: 12 # gluster volume heal elastic-volume info split-brain Volume elastic-volume is not of type replicate Any advice? Best regards Luca From abhishpaliwal at gmail.com Fri Jun 7 02:43:03 2019 From: abhishpaliwal at gmail.com (ABHISHEK PALIWAL) Date: Fri, 7 Jun 2019 08:13:03 +0530 Subject: [Gluster-users] Memory leak in glusterfs In-Reply-To: References: Message-ID: Hi Nithya, We are having the setup where copying the file to and deleting it from gluster mount point to update the latest file. We noticed due to this having some memory increase in glusterfsd process. To find the memory leak we are using valgrind but didn't get any help. That's why contacted to glusterfs community. Regards, Abhishek On Thu, Jun 6, 2019, 16:08 Nithya Balachandran wrote: > Hi Abhishek, > > I am still not clear as to the purpose of the tests. Can you clarify why > you are using valgrind and why you think there is a memory leak? > > Regards, > Nithya > > On Thu, 6 Jun 2019 at 12:09, ABHISHEK PALIWAL > wrote: > >> Hi Nithya, >> >> Here is the Setup details and test which we are doing as below: >> >> >> One client, two gluster Server. >> The client is writing and deleting one file each 15 minutes by script >> test_v4.15.sh. >> >> IP >> Server side: >> 128.224.98.157 /gluster/gv0/ >> 128.224.98.159 /gluster/gv0/ >> >> Client side: >> 128.224.98.160 /gluster_mount/ >> >> Server side: >> gluster volume create gv0 replica 2 128.224.98.157:/gluster/gv0/ >> 128.224.98.159:/gluster/gv0/ force >> gluster volume start gv0 >> >> root at 128:/tmp/brick/gv0# gluster volume info >> >> Volume Name: gv0 >> Type: Replicate >> Volume ID: 7105a475-5929-4d60-ba23-be57445d97b5 >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 1 x 2 = 2 >> Transport-type: tcp >> Bricks: >> Brick1: 128.224.98.157:/gluster/gv0 >> Brick2: 128.224.98.159:/gluster/gv0 >> Options Reconfigured: >> transport.address-family: inet >> nfs.disable: on >> performance.client-io-threads: off >> >> exec script: ./ps_mem.py -p 605 -w 61 > log >> root at 128:/# ./ps_mem.py -p 605 >> Private + Shared = RAM used Program >> 23668.0 KiB + 1188.0 KiB = 24856.0 KiB glusterfsd >> --------------------------------- >> 24856.0 KiB >> ================================= >> >> >> Client side: >> mount -t glusterfs -o acl -o resolve-gids 128.224.98.157:gv0 >> /gluster_mount >> >> >> We are using the below script write and delete the file. >> >> *test_v4.15.sh * >> >> Also the below script to see the memory increase whihle the script is >> above script is running in background. >> >> *ps_mem.py* >> >> I am attaching the script files as well as the result got after testing >> the scenario. >> >> On Wed, Jun 5, 2019 at 7:23 PM Nithya Balachandran >> wrote: >> >>> Hi, >>> >>> Writing to a volume should not affect glusterd. The stack you have shown >>> in the valgrind looks like the memory used to initialise the structures >>> glusterd uses and will free only when it is stopped. >>> >>> Can you provide more details to what it is you are trying to test? 
>>> >>> Regards, >>> Nithya >>> >>> >>> On Tue, 4 Jun 2019 at 15:41, ABHISHEK PALIWAL >>> wrote: >>> >>>> Hi Team, >>>> >>>> Please respond on the issue which I raised. >>>> >>>> Regards, >>>> Abhishek >>>> >>>> On Fri, May 17, 2019 at 2:46 PM ABHISHEK PALIWAL < >>>> abhishpaliwal at gmail.com> wrote: >>>> >>>>> Anyone please reply.... >>>>> >>>>> On Thu, May 16, 2019, 10:49 ABHISHEK PALIWAL >>>>> wrote: >>>>> >>>>>> Hi Team, >>>>>> >>>>>> I upload some valgrind logs from my gluster 5.4 setup. This is >>>>>> writing to the volume every 15 minutes. I stopped glusterd and then copy >>>>>> away the logs. The test was running for some simulated days. They are >>>>>> zipped in valgrind-54.zip. >>>>>> >>>>>> Lots of info in valgrind-2730.log. Lots of possibly lost bytes in >>>>>> glusterfs and even some definitely lost bytes. >>>>>> >>>>>> ==2737== 1,572,880 bytes in 1 blocks are possibly lost in loss record >>>>>> 391 of 391 >>>>>> ==2737== at 0x4C29C25: calloc (in >>>>>> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) >>>>>> ==2737== by 0xA22485E: ??? (in >>>>>> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) >>>>>> ==2737== by 0xA217C94: ??? (in >>>>>> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) >>>>>> ==2737== by 0xA21D9F8: ??? (in >>>>>> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) >>>>>> ==2737== by 0xA21DED9: ??? (in >>>>>> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) >>>>>> ==2737== by 0xA21E685: ??? (in >>>>>> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) >>>>>> ==2737== by 0xA1B9D8C: init (in >>>>>> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) >>>>>> ==2737== by 0x4E511CE: xlator_init (in >>>>>> /usr/lib64/libglusterfs.so.0.0.1) >>>>>> ==2737== by 0x4E8A2B8: ??? (in /usr/lib64/libglusterfs.so.0.0.1) >>>>>> ==2737== by 0x4E8AAB3: glusterfs_graph_activate (in >>>>>> /usr/lib64/libglusterfs.so.0.0.1) >>>>>> ==2737== by 0x409C35: glusterfs_process_volfp (in >>>>>> /usr/sbin/glusterfsd) >>>>>> ==2737== by 0x409D99: glusterfs_volumes_init (in /usr/sbin/glusterfsd) >>>>>> ==2737== >>>>>> ==2737== LEAK SUMMARY: >>>>>> ==2737== definitely lost: 1,053 bytes in 10 blocks >>>>>> ==2737== indirectly lost: 317 bytes in 3 blocks >>>>>> ==2737== possibly lost: 2,374,971 bytes in 524 blocks >>>>>> ==2737== still reachable: 53,277 bytes in 201 blocks >>>>>> ==2737== suppressed: 0 bytes in 0 blocks >>>>>> >>>>>> -- >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Regards >>>>>> Abhishek Paliwal >>>>>> >>>>> >>>> >>>> -- >>>> >>>> >>>> >>>> >>>> Regards >>>> Abhishek Paliwal >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >> >> -- >> >> >> >> >> Regards >> Abhishek Paliwal >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nbalacha at redhat.com Fri Jun 7 03:09:03 2019 From: nbalacha at redhat.com (Nithya Balachandran) Date: Fri, 7 Jun 2019 08:39:03 +0530 Subject: [Gluster-users] Memory leak in glusterfs In-Reply-To: References: Message-ID: Hi Abhishek, Please use statedumps taken at intervals to determine where the memory is increasing. See [1] for details. Regards, Nithya [1] https://docs.gluster.org/en/latest/Troubleshooting/statedump/ On Fri, 7 Jun 2019 at 08:13, ABHISHEK PALIWAL wrote: > Hi Nithya, > > We are having the setup where copying the file to and deleting it from > gluster mount point to update the latest file. We noticed due to this > having some memory increase in glusterfsd process. 
> > To find the memory leak we are using valgrind but didn't get any help. > > That's why contacted to glusterfs community. > > Regards, > Abhishek > > On Thu, Jun 6, 2019, 16:08 Nithya Balachandran > wrote: > >> Hi Abhishek, >> >> I am still not clear as to the purpose of the tests. Can you clarify why >> you are using valgrind and why you think there is a memory leak? >> >> Regards, >> Nithya >> >> On Thu, 6 Jun 2019 at 12:09, ABHISHEK PALIWAL >> wrote: >> >>> Hi Nithya, >>> >>> Here is the Setup details and test which we are doing as below: >>> >>> >>> One client, two gluster Server. >>> The client is writing and deleting one file each 15 minutes by script >>> test_v4.15.sh. >>> >>> IP >>> Server side: >>> 128.224.98.157 /gluster/gv0/ >>> 128.224.98.159 /gluster/gv0/ >>> >>> Client side: >>> 128.224.98.160 /gluster_mount/ >>> >>> Server side: >>> gluster volume create gv0 replica 2 128.224.98.157:/gluster/gv0/ >>> 128.224.98.159:/gluster/gv0/ force >>> gluster volume start gv0 >>> >>> root at 128:/tmp/brick/gv0# gluster volume info >>> >>> Volume Name: gv0 >>> Type: Replicate >>> Volume ID: 7105a475-5929-4d60-ba23-be57445d97b5 >>> Status: Started >>> Snapshot Count: 0 >>> Number of Bricks: 1 x 2 = 2 >>> Transport-type: tcp >>> Bricks: >>> Brick1: 128.224.98.157:/gluster/gv0 >>> Brick2: 128.224.98.159:/gluster/gv0 >>> Options Reconfigured: >>> transport.address-family: inet >>> nfs.disable: on >>> performance.client-io-threads: off >>> >>> exec script: ./ps_mem.py -p 605 -w 61 > log >>> root at 128:/# ./ps_mem.py -p 605 >>> Private + Shared = RAM used Program >>> 23668.0 KiB + 1188.0 KiB = 24856.0 KiB glusterfsd >>> --------------------------------- >>> 24856.0 KiB >>> ================================= >>> >>> >>> Client side: >>> mount -t glusterfs -o acl -o resolve-gids 128.224.98.157:gv0 >>> /gluster_mount >>> >>> >>> We are using the below script write and delete the file. >>> >>> *test_v4.15.sh * >>> >>> Also the below script to see the memory increase whihle the script is >>> above script is running in background. >>> >>> *ps_mem.py* >>> >>> I am attaching the script files as well as the result got after testing >>> the scenario. >>> >>> On Wed, Jun 5, 2019 at 7:23 PM Nithya Balachandran >>> wrote: >>> >>>> Hi, >>>> >>>> Writing to a volume should not affect glusterd. The stack you have >>>> shown in the valgrind looks like the memory used to initialise the >>>> structures glusterd uses and will free only when it is stopped. >>>> >>>> Can you provide more details to what it is you are trying to test? >>>> >>>> Regards, >>>> Nithya >>>> >>>> >>>> On Tue, 4 Jun 2019 at 15:41, ABHISHEK PALIWAL >>>> wrote: >>>> >>>>> Hi Team, >>>>> >>>>> Please respond on the issue which I raised. >>>>> >>>>> Regards, >>>>> Abhishek >>>>> >>>>> On Fri, May 17, 2019 at 2:46 PM ABHISHEK PALIWAL < >>>>> abhishpaliwal at gmail.com> wrote: >>>>> >>>>>> Anyone please reply.... >>>>>> >>>>>> On Thu, May 16, 2019, 10:49 ABHISHEK PALIWAL >>>>>> wrote: >>>>>> >>>>>>> Hi Team, >>>>>>> >>>>>>> I upload some valgrind logs from my gluster 5.4 setup. This is >>>>>>> writing to the volume every 15 minutes. I stopped glusterd and then copy >>>>>>> away the logs. The test was running for some simulated days. They are >>>>>>> zipped in valgrind-54.zip. >>>>>>> >>>>>>> Lots of info in valgrind-2730.log. Lots of possibly lost bytes in >>>>>>> glusterfs and even some definitely lost bytes. 
>>>>>>> >>>>>>> ==2737== 1,572,880 bytes in 1 blocks are possibly lost in loss >>>>>>> record 391 of 391 >>>>>>> ==2737== at 0x4C29C25: calloc (in >>>>>>> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) >>>>>>> ==2737== by 0xA22485E: ??? (in >>>>>>> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) >>>>>>> ==2737== by 0xA217C94: ??? (in >>>>>>> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) >>>>>>> ==2737== by 0xA21D9F8: ??? (in >>>>>>> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) >>>>>>> ==2737== by 0xA21DED9: ??? (in >>>>>>> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) >>>>>>> ==2737== by 0xA21E685: ??? (in >>>>>>> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) >>>>>>> ==2737== by 0xA1B9D8C: init (in >>>>>>> /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) >>>>>>> ==2737== by 0x4E511CE: xlator_init (in >>>>>>> /usr/lib64/libglusterfs.so.0.0.1) >>>>>>> ==2737== by 0x4E8A2B8: ??? (in /usr/lib64/libglusterfs.so.0.0.1) >>>>>>> ==2737== by 0x4E8AAB3: glusterfs_graph_activate (in >>>>>>> /usr/lib64/libglusterfs.so.0.0.1) >>>>>>> ==2737== by 0x409C35: glusterfs_process_volfp (in >>>>>>> /usr/sbin/glusterfsd) >>>>>>> ==2737== by 0x409D99: glusterfs_volumes_init (in >>>>>>> /usr/sbin/glusterfsd) >>>>>>> ==2737== >>>>>>> ==2737== LEAK SUMMARY: >>>>>>> ==2737== definitely lost: 1,053 bytes in 10 blocks >>>>>>> ==2737== indirectly lost: 317 bytes in 3 blocks >>>>>>> ==2737== possibly lost: 2,374,971 bytes in 524 blocks >>>>>>> ==2737== still reachable: 53,277 bytes in 201 blocks >>>>>>> ==2737== suppressed: 0 bytes in 0 blocks >>>>>>> >>>>>>> -- >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Regards >>>>>>> Abhishek Paliwal >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> >>>>> >>>>> >>>>> >>>>> Regards >>>>> Abhishek Paliwal >>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> Gluster-users at gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> >>> >>> -- >>> >>> >>> >>> >>> Regards >>> Abhishek Paliwal >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From revirii at googlemail.com Fri Jun 7 05:38:04 2019 From: revirii at googlemail.com (Hu Bert) Date: Fri, 7 Jun 2019 07:38:04 +0200 Subject: [Gluster-users] Advice for setup: SW RAID 6 vs JBOD In-Reply-To: References: Message-ID: If i remember correctly: in the video they suggested not to make a RAID 10 too big (i.e. too many (big) disks), because the RAID resync then could take a long time. They didn't mention a limit; on my 3 servers with 2 RAID 10 (1x4 disks, 1x6 disks), no disk failed so far, but there were automatic periodic redundancy checks (mdadm checkarray) which ran for a couple of days, increasing load on the servers and responsiveness of glusterfs on the clients. Almost no one even noticed that mdadm checks were running :-) But if i compare it with our old JBOD setup: after the disk change the heal took about a month, resulting in really poor performance on the client side. As we didn't want to experience that period again -> throw hardware at the problem. Maybe a different setup (10 disks -> 5 RAID 1, building a distribute replicate) would've been even better, but so far we're happy with the current setup. Am Do., 6. Juni 2019 um 18:48 Uhr schrieb Eduardo Mayoral : > > Your comment actually helps me more than you think, one of the main > doubts I have is whether I go for JOBD with replica 3 or SW RAID 6 with > replica2 + arbitrer. 
Before reading your email I was leaning more > towards JOBD, as reconstruction of a moderately big RAID 6 with mdadm > can be painful too. Now I see a reconstruct is going to be painful > either way... > > For the record, the workload I am going to migrate is currently > 18,314,445 MB and 34,752,784 inodes (which is not exactly the same as > files, but let's use that for a rough estimate), for an average file > size of about 539 KB per file. > > Thanks a lot for your time and insights! > > On 6/6/19 8:53, Hu Bert wrote: > > Good morning, > > > > my comment won't help you directly, but i thought i'd send it anyway... > > > > Our first glusterfs setup had 3 servers withs 4 disks=bricks (10TB, > > JBOD) each. Was running fine in the beginning, but then 1 disk failed. > > The following heal took ~1 month, with a bad performance (quite high > > IO). Shortly after the heal hat finished another disk failed -> same > > problems again. Not funny. > > > > For our new system we decided to use 3 servers with 10 disks (10 TB) > > each, but now the 10 disks in a SW RAID 10 (well, we split the 10 > > disks into 2 SW RAID 10, each of them is a brick, we have 2 gluster > > volumes). A lot of disk space "wasted", with this type of SW RAID and > > a replicate 3 setup, but we wanted to avoid the "healing takes a long > > time with bad performance" problems. Now mdadm takes care of > > replicating data, glusterfs should always see "good" bricks. > > > > And the decision may depend on what kind of data you have. Many small > > files, like tens of millions? Or not that much, but bigger files? I > > once watched a video (i think it was this one: > > https://www.youtube.com/watch?v=61HDVwttNYI). Recommendation there: > > RAID 6 or 10 for small files, for big files... well, already 2 years > > "old" ;-) > > > > As i said, this won't help you directly. You have to identify what's > > most important for your scenario; as you said, high performance is not > > an issue - if this is true even when you have slight performance > > issues after a disk fail then ok. My experience so far: the bigger and > > slower the disks are and the more data you have -> healing will hurt > > -> try to avoid this. If the disks are small and fast (SSDs), healing > > will be faster -> JBOD is an option. > > > > > > hth, > > Hubert > > > > Am Mi., 5. Juni 2019 um 11:33 Uhr schrieb Eduardo Mayoral : > >> Hi, > >> > >> I am looking into a new gluster deployment to replace an ancient one. > >> > >> For this deployment I will be using some repurposed servers I > >> already have in stock. The disk specs are 12 * 3 TB SATA disks. No HW > >> RAID controller. They also have some SSD which would be nice to leverage > >> as cache or similar to improve performance, since it is already there. > >> Advice on how to leverage the SSDs would be greatly appreciated. > >> > >> One of the design choices I have to make is using 3 nodes for a > >> replica-3 with JBOD, or using 2 nodes with a replica-2 and using SW RAID > >> 6 for the disks, maybe adding a 3rd node with a smaller amount of disk > >> as metadata node for the replica set. I would love to hear advice on the > >> pros and cons of each setup from the gluster experts. > >> > >> The data will be accessed from 4 to 6 systems with native gluster, > >> not sure if that makes any difference. > >> > >> The amount of data I have to store there is currently 20 TB, with > >> moderate growth. iops are quite low so high performance is not an issue. > >> The data will fit in any of the two setups. 
> >> > >> Thanks in advance for your advice! > >> > >> -- > >> Eduardo Mayoral Jimeno > >> Systems engineer, platform department. Arsys Internet. > >> emayoral at arsys.es - +34 941 620 105 - ext 2153 > >> > >> > >> _______________________________________________ > >> Gluster-users mailing list > >> Gluster-users at gluster.org > >> https://lists.gluster.org/mailman/listinfo/gluster-users > > -- > Eduardo Mayoral Jimeno > Systems engineer, platform department. Arsys Internet. > emayoral at arsys.es - +34 941 620 105 - ext 2153 > From spisla80 at gmail.com Fri Jun 7 06:24:12 2019 From: spisla80 at gmail.com (David Spisla) Date: Fri, 7 Jun 2019 08:24:12 +0200 Subject: [Gluster-users] [Gluster-devel] Improve stability between SMB/CTDB and Gluster (together with Samba Core Developer) In-Reply-To: References: Message-ID: Hello Guenther, thank you for the update. On next monday there will be a public holiday in germany. So tuesday to friday would be fine . The time suggestions, as mentioned in the last mails, should be suitable. Regards David Am Do., 6. Juni 2019 um 16:58 Uhr schrieb G?nther Deschner < gdeschne at redhat.com>: > Hello, > > just a quick heads-up, during this week pretty much all Samba engineers > are busy attending the SambaXP conference in Germany, in addition there > was a public holiday in India also this week. Not sure about the general > availability tomorrow, I would propose to look for a date maybe next week. > > Thanks, > Guenther > > On 31/05/2019 14:32, David Spisla wrote: > > Hello together, > > > > inorder not to lose the focus for the topic, I make new date suggestions > > for next week > > > > June 03th ? 07th at 12:30 - 14:30 IST or (9:00 - 11:00 CEST) > > > > June 03th ? 06th at 16:30 - 18:30 IST or (13:00 - 15:00 CEST) > > > > > > Regards > > > > David Spisla > > > > > > Am Di., 21. Mai 2019 um 11:24 Uhr schrieb David Spisla > > >: > > > > Hello together, > > > > we are still seeking a day and time to talk about interesting Samba > > / Glusterfs issues. Here is a new list of possible dates and time. > > > > May 22th ? 24th at 12:30 - 14:30 IST or (9:00 - 11:00 CEST) > > > > May 27th ? 29th and 31th at 12:30 - 14:30 IST (9:00 - 11:00 CEST) > > > > > > On May 30th there is a holiday here in germany. > > > > @Poornima Gurusiddaiah If there is any > > problem finding a date please contanct me. I will look for > alternatives > > > > > > Regards > > > > David Spisla > > > > > > > > Am Do., 16. Mai 2019 um 12:42 Uhr schrieb David Spisla > > >: > > > > Hello Amar, > > > > thank you for the information. Of course, we should wait for > > Poornima because of her knowledge. > > > > Regards > > David Spisla > > > > Am Do., 16. Mai 2019 um 12:23 Uhr schrieb Amar Tumballi > > Suryanarayan >: > > > > David, Poornima is on leave from today till 21st May. So > > having it after she comes back is better. She has more > > experience in SMB integration than many of us. > > > > -Amar > > > > On Thu, May 16, 2019 at 1:09 PM David Spisla > > > wrote: > > > > Hello everyone, > > > > if there is any problem in finding a date and time, > > please contact me. It would be fine to have a meeting > soon. > > > > Regards > > David Spisla > > > > Am Mo., 13. Mai 2019 um 12:38 Uhr schrieb David Spisla > > > >: > > > > Hi Poornima,____ > > > > __ __ > > > > thats fine. I would suggest this dates and times:____ > > > > __ __ > > > > May 15th ? 17th at 12:30, 13:30, 14:30 IST (9:00, > > 10:00, 11:00 CEST) ____ > > > > May 20th ? 
24th at 12:30, 13:30, 14:30 IST (9:00, > > 10:00, 11:00 CEST)____ > > > > __ __ > > > > I add Volker Lendecke from Sernet to the mail. He is > > the Samba Expert.____ > > > > Can someone of you provide a host via bluejeans.com > > ? If not, I will try it with > > GoToMeeting (https://www.gotomeeting.com).____ > > > > __ __ > > > > @all Please write your prefered dates and times. For > > me, all oft the above dates and times are fine____ > > > > __ __ > > > > Regards____ > > > > David____ > > > > __ __ > > > > > > > > > > *Von:* Poornima Gurusiddaiah > > > > *Gesendet:* Montag, 13. Mai 2019 07:22 > > *An:* David Spisla > >; Anoop C S > > >; > > Gunther Deschner > > > > *Cc:* Gluster Devel > >; > > gluster-users at gluster.org > > List > > > > > > *Betreff:* Re: [Gluster-devel] Improve stability > > between SMB/CTDB and Gluster (together with Samba > > Core Developer)____ > > > > __ __ > > > > Hi,____ > > > > __ __ > > > > We would be definitely interested in this. Thank you > > for contacting us. For the starter we can have an > > online conference. Please suggest few possible date > > and times for the week(preferably between IST 7.00AM > > - 9.PM )?____ > > > > Adding Anoop and Gunther who are also the main > > contributors to the Gluster-Samba integration.____ > > > > __ __ > > > > Thanks,____ > > > > Poornima____ > > > > __ __ > > > > __ __ > > > > __ __ > > > > On Thu, May 9, 2019 at 7:43 PM David Spisla > > > > > wrote:____ > > > > Dear Gluster Community,____ > > > > at the moment we are improving the stability of > > SMB/CTDB and Gluster. For this purpose we are > > working together with an advanced SAMBA Core > > Developer. He did some debugging but needs more > > information about Gluster Core Behaviour. ____ > > > > __ __ > > > > *Would any of the Gluster Developer wants to > > have a online conference with him and me?* ____ > > > > __ __ > > > > I would organize everything. In my opinion this > > is a good chance to improve stability of > > Glusterfs and this is at the moment one of the > > major issues in the Community.____ > > > > __ __ > > > > Regards ____ > > > > David Spisla____ > > > > _______________________________________________ > > > > Community Meeting Calendar: > > > > APAC Schedule - > > Every 2nd and 4th Tuesday at 11:30 AM IST > > Bridge: https://bluejeans.com/836554017 > > > > NA/EMEA Schedule - > > Every 1st and 3rd Tuesday at 01:00 PM EDT > > Bridge: https://bluejeans.com/486278655 > > > > Gluster-devel mailing list > > Gluster-devel at gluster.org > > > > > https://lists.gluster.org/mailman/listinfo/gluster-devel____ > > > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org Gluster-users at gluster.org> > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > -- > > Amar Tumballi (amarts) > > > > > -- > G?nther Deschner GPG-ID: 8EE11688 > Red Hat gdeschner at redhat.com > Samba Team gd at samba.org > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From aspandey at redhat.com  Fri Jun  7 14:18:50 2019
From: aspandey at redhat.com (Ashish Pandey)
Date: Fri, 7 Jun 2019 10:18:50 -0400 (EDT)
Subject: [Gluster-users] healing of disperse volume
In-Reply-To: <297722f5-9257-98c3-b4c0-3caad0cff5e1@gmail.com>
References: <297722f5-9257-98c3-b4c0-3caad0cff5e1@gmail.com>
Message-ID: <1883917003.21586447.1559917130491.JavaMail.zimbra@redhat.com>

Hi,

First of all, the following command is not for disperse volumes -

gluster volume heal elastic-volume info split-brain

It is applicable to replicate volumes only.

Could you please let us know what exactly you want to test? If you want to test a disperse volume against failure of bricks or servers, you can kill some of the brick processes - at most as many brick processes as the redundancy count. In 4+2, 2 is the redundancy count. After killing two brick processes with the kill command, you can write some data on the volume and then do a force start of the volume:

gluster volume start elastic-volume force

This will also start the killed brick processes. At the end you should see that the heal is done by the self-heal daemon and the volume becomes healthy again.

---
Ashish

----- Original Message -----
From: "fusillator"
To: gluster-users at gluster.org
Sent: Friday, June 7, 2019 2:09:01 AM
Subject: [Gluster-users] healing of disperse volume

Hi all, I'm pretty new to glusterfs. I managed to set up a dispersed volume (4+2) using release 6.1 from the CentOS repository. Is it a stable release?

Then I forced the volume to stop while the applications were writing on the mount point, deliberately getting an inconsistent state. I'm wondering what the best practices are to solve these kinds of situations... I just found a detailed explanation about how to solve the split-brain state of replicated volumes at
https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/
but it seems not to be applicable to the disperse volume type. Did I miss some important piece of documentation? Please point me to some reference.
Here's some command detail: #gluster volume info elastic-volume Volume Name: elastic-volume Type: Disperse Volume ID: 96773fef-c443-465b-a518-6630bcf83397 Status: Started Snapshot Count: 0 Number of Bricks: 1 x (4 + 2) = 6 Transport-type: tcp Bricks: Brick1: dev-netflow01.fineco.it:/data/gfs/lv_elastic/brick1/brick Brick2: dev-netflow02.fineco.it:/data/gfs/lv_elastic/brick1/brick Brick3: dev-netflow03.fineco.it:/data/gfs/lv_elastic/brick1/brick Brick4: dev-netflow04.fineco.it:/data/gfs/lv_elastic/brick1/brick Brick5: dev-netflow05.fineco.it:/data/gfs/lv_elastic/brick1/brick Brick6: dev-netflow06.fineco.it:/data/gfs/lv_elastic/brick1/brick Options Reconfigured: performance.io-cache: off performance.io-thread-count: 64 performance.write-behind-window-size: 100MB performance.cache-size: 1GB nfs.disable: on transport.address-family: inet # gluster volume heal elastic-volume info Brick dev01:/data/gfs/lv_elastic/brick1/brick /data/logs/20190606/ns-coreiol-iol-app-listini.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-fns.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-news.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-trkd.2019060615.log /data/logs/20190606/ns-coreiol-iol-lib-managers.2019060615.log /data/logs/20190606/ns-coreiol-iol-lib-httpwrapper.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-chart.2019060615.log Status: Connected Number of entries: 12 Brick dev02:/data/gfs/lv_elastic/brick1/brick /data/logs/20190606/ns-coreiol-iol-lib-managers.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-news.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-chart.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-listini.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-fns.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-trkd.2019060615.log /data/logs/20190606/ns-coreiol-iol-lib-httpwrapper.2019060615.log Status: Connected Number of entries: 12 Brick dev03:/data/gfs/lv_elastic/brick1/brick /data/logs/20190606/ns-coreiol-iol-app-listini.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-fns.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-news.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-trkd.2019060615.log /data/logs/20190606/ns-coreiol-iol-lib-managers.2019060615.log /data/logs/20190606/ns-coreiol-iol-lib-httpwrapper.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-chart.2019060615.log Status: Connected Number of entries: 12 Brick dev04:/data/gfs/lv_elastic/brick1/brick /data/logs/20190606/ns-coreiol-iol-app-news.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-trkd.2019060615.log /data/logs/20190606/ns-coreiol-iol-lib-managers.2019060615.log /data/logs/20190606/ns-coreiol-iol-lib-httpwrapper.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-listini.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-fns.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-chart.2019060615.log Status: Connected Number of entries: 12 Brick dev05:/data/gfs/lv_elastic/brick1/brick /data/logs/20190606/ns-coreiol-iol-app-news.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-trkd.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-listini.2019060615.log /data/logs/20190606/ns-coreiol-iol-lib-managers.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-fns.2019060615.log /data/logs/20190606/ns-coreiol-iol-lib-httpwrapper.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-chart.2019060615.log Status: Connected Number of entries: 12 Brick dev06:/data/gfs/lv_elastic/brick1/brick 
/data/logs/20190606/ns-coreiol-iol-app-chart.2019060615.log /data/logs/20190606/ns-coreiol-iol-lib-managers.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-news.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-trkd.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-listini.2019060615.log /data/logs/20190606/ns-coreiol-iol-app-fns.2019060615.log /data/logs/20190606/ns-coreiol-iol-lib-httpwrapper.2019060615.log Status: Connected Number of entries: 12 # gluster volume heal elastic-volume info split-brain Volume elastic-volume is not of type replicate Any advice? Best regards Luca _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From edward.clay at uk2group.com Fri Jun 7 18:50:03 2019 From: edward.clay at uk2group.com (Edward Clay) Date: Fri, 7 Jun 2019 18:50:03 +0000 Subject: [Gluster-users] Gluster quorum lost Message-ID: Hello, I have a replica 3 volume that has lost quorum twice this week causing us much pain. What seems to happen is one of the sans thinks one of the other two peers has disconnected. Then a few seconds later another disconnects causing quorum to be lost. This causes us pain since we have 7 ovirt host that are connected to this gluster volume and they never seem to reattach. I was able to unmount the brick manually on the ovirt host and then run the commands to mount them again and that seemed to get things working again. We have 3 sans running glusterfs 3.12.14-1 and nothing else. # gluster volume info gv1 Volume Name: gv1 Type: Replicate Volume ID: ea12f72d-a228-43ba-a360-4477cada292a Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: 10.4.16.19:/glusterfs/data1/gv1 Brick2: 10.4.16.11:/glusterfs/data1/gv1 Brick3: 10.4.16.12:/glusterfs/data1/gv1 Options Reconfigured: nfs.register-with-portmap: on diagnostics.count-fop-hits: on diagnostics.latency-measurement: on cluster.self-heal-daemon: enable cluster.server-quorum-type: server cluster.quorum-type: auto network.remote-dio: enable cluster.eager-lock: enable performance.stat-prefetch: off performance.io-cache: off performance.read-ahead: off performance.quick-read: off auth.allow: 10.4.16.* nfs.rpc-auth-allow: 10.4.16.* nfs.disable: off server.allow-insecure: on storage.owner-gid: 36 storage.owner-uid: 36 nfs.addr-namelookup: off nfs.export-volumes: on network.ping-timeout: 50 cluster.server-quorum-ratio: 51% They produced the following logs this morning. and the first entry is the first entry for 2019-06-07. san3 seems to have an issue first: [2019-06-07 14:23:20.670561] I [MSGID: 106004] [glusterd-handler.c:6317:__glusterd_peer_rpc_notify] 0-management: Peer <10.4.16.12> (), in state , has disconnected from glusterd. [2019-06-07 14:23:20.774127] I [MSGID: 106004] [glusterd-handler.c:6317:__glusterd_peer_rpc_notify] 0-management: Peer <10.4.16.11> (<0f3090ee-080b-4a6b-9964-0ca86d801469>), in state , has disconnected from glusterd. [2019-06-07 14:23:20.774413] C [MSGID: 106002] [glusterd-server-quorum.c:360:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume gv1. Stopping local bricks. san1 follows: [2019-06-07 14:23:22.137405] I [MSGID: 106004] [glusterd-handler.c:6317:__glusterd_peer_rpc_notify] 0-management: Peer <10.4.16.12> (), in state , has disconnected from glusterd. 
[2019-06-07 14:23:22.229343] I [MSGID: 106004] [glusterd-handler.c:6317:__glusterd_peer_rpc_notify] 0-management: Peer <10.4.16.19> (<238af98a-d2f1-491d-a1f1-64ace4eb6d3d>), in state , has disconnected from glusterd.
[2019-06-07 14:23:22.229618] C [MSGID: 106002] [glusterd-server-quorum.c:360:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume gv1. Stopping local bricks.

san2 seems to be the last one standing but quorum gets lost:

[2019-06-07 14:23:26.611435] I [MSGID: 106004] [glusterd-handler.c:6317:__glusterd_peer_rpc_notify] 0-management: Peer <10.4.16.11> (<0f3090ee-080b-4a6b-9964-0ca86d801469>), in state , has disconnected from glusterd.
[2019-06-07 14:23:26.714137] I [MSGID: 106004] [glusterd-handler.c:6317:__glusterd_peer_rpc_notify] 0-management: Peer <10.4.16.19> (<238af98a-d2f1-491d-a1f1-64ace4eb6d3d>), in state , has disconnected from glusterd.
[2019-06-07 14:23:26.714405] C [MSGID: 106002] [glusterd-server-quorum.c:360:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume gv1. Stopping local bricks.

On the ovirt hosts I see the following types of entries for the mounted gluster brick in /var/log/glusterfs/rhev-data-center-mnt-glusterSD-10.4.16.11:gv1.log. They are all pretty much the same entries on all 7 hosts. hv6 seems to be the first host to complain:

[2019-06-07 14:23:22.190493] I [glusterfsd-mgmt.c:2424:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: 10.4.16.11
[2019-06-07 14:23:22.190540] I [glusterfsd-mgmt.c:2464:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server 10.4.16.19
[2019-06-07 14:23:32.618071] I [glusterfsd-mgmt.c:2005:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing
[2019-06-07 14:23:33.651755] W [socket.c:719:__socket_rwv] 0-gv1-client-4: readv on 10.4.16.12:49152 failed (No data available)
[2019-06-07 14:23:33.651806] I [MSGID: 114018] [client.c:2288:client_rpc_notify] 0-gv1-client-4: disconnected from gv1-client-4. Client process will keep trying to connect to glusterd until brick's port is available

One thing I should point out here that is probably important. We are running glusterfs 3.12.14-1 on the sans but the ovirt hosts have been upgraded to 5.6-1. We stopped updating the sans' gluster version after a previous version had a memory leak causing the sans to go down randomly. Version 3.12.14-1 seems to have stopped this from happening. What I haven't been able to find out is whether there is an incompatibility between these versions that could cause this. Are there any other steps I can take or logs I can collect to better identify what's causing this to happen?
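In case it helps, here is a minimal sketch of what I plan to capture on each san the next time quorum drops (the odir/file values are just examples, and the op-version queries assume a CLI recent enough to support them):

    # op-version view of the mixed 3.12 / 5.x cluster
    gluster volume get all cluster.op-version
    gluster volume get all cluster.max-op-version
    # peer and brick view from each san at the time of the disconnects
    gluster peer status
    gluster volume status gv1
    # dump glusterd state for the record
    gluster get-state glusterd odir /var/tmp file gluster-state-san1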
Edward Clay
Systems Administrator
The Hut Group
Email: edward.clay at uk2group.com

From alan.orth at gmail.com  Fri Jun  7 19:58:17 2019
From: alan.orth at gmail.com (Alan Orth)
Date: Fri, 7 Jun 2019 22:58:17 +0300
Subject: [Gluster-users] Does replace-brick migrate data?
In-Reply-To: References: <32e26faf-e5c0-b944-2a32-c9eae408b146@redhat.com> <0ab0c28a-48a1-92c0-a106-f4fa94cb620f@redhat.com> <39dcc6a5-1610-93e1-aaff-7fef9b6c1faa@redhat.com> <0aa881db-a724-13be-ff63-6c346d7f55d8@redhat.com>
Message-ID:

Dear Ravi,

In the last week I have completed a fix-layout and a full INDEX heal on this volume. Now I've started a rebalance and I see a few terabytes of data going around on different bricks since yesterday, which I'm sure is good.

While I wait for the rebalance to finish, I'm wondering if you know what would cause directories to be missing from the FUSE mount point? If I list the directories explicitly I can see their contents, but they do not appear in their parent directories' listing. In the case of duplicated files it is always because the files are not on the correct bricks (according to the Dynamo/Elastic Hash algorithm), and I can fix it by copying the file to the correct brick(s) and removing it from the others (along with their .glusterfs hard links). So what could cause directories to be missing?

Thank you,

On Wed, Jun 5, 2019 at 1:08 AM Alan Orth wrote:

> Hi Ravi,
>
> You're right that I had mentioned using rsync to copy the brick content to
> a new host, but in the end I actually decided not to bring it up on a new
> brick. Instead I added the original brick back into the volume. So the
> xattrs and symlinks to .glusterfs on the original brick are fine. I think
> the problem probably lies with a remove-brick that got interrupted. A few
> weeks ago during the maintenance I had tried to remove a brick and then
> after twenty minutes and no obvious progress I stopped it; after that the
> bricks were still part of the volume.
>
> In the last few days I have run a fix-layout that took 26 hours and
> finished successfully. Then I started a full index heal and it has healed
> about 3.3 million files in a few days and I see a clear increase of network
> traffic from old brick host to new brick host over that time. Once the full
> index heal completes I will try to do a rebalance.
>
> Thank you,
>
> On Mon, Jun 3, 2019 at 7:40 PM Ravishankar N
> wrote:
>
>>
>> On 01/06/19 9:37 PM, Alan Orth wrote:
>>
>> Dear Ravi,
>>
>> The .glusterfs hardlinks/symlinks should be fine. I'm not sure how I
>> could verify them for six bricks and millions of files, though...
:\ >> >> Hi Alan, >> >> The reason I asked this is because you had mentioned in one of your >> earlier emails that when you moved content from the old brick to the new >> one, you had skipped the .glusterfs directory. So I was assuming that when >> you added back this new brick to the cluster, it might have been missing >> the .glusterfs entries. If that is the cae, one way to verify could be to >> check using a script if all files on the brick have a link-count of at >> least 2 and all dirs have valid symlinks inside .glusterfs pointing to >> themselves. >> >> >> I had a small success in fixing some issues with duplicated files on the >> FUSE mount point yesterday. I read quite a bit about the elastic hashing >> algorithm that determines which files get placed on which bricks based on >> the hash of their filename and the trusted.glusterfs.dht xattr on brick >> directories (thanks to Joe Julian's blog post and Python script for showing >> how it works?). With that knowledge I looked closer at one of the files >> that was appearing as duplicated on the FUSE mount and found that it was >> also duplicated on more than `replica 2` bricks. For this particular file I >> found two "real" files and several zero-size files with >> trusted.glusterfs.dht.linkto xattrs. Neither of the "real" files were on >> the correct brick as far as the DHT layout is concerned, so I copied one of >> them to the correct brick, deleted the others and their hard links, and did >> a `stat` on the file from the FUSE mount point and it fixed itself. Yay! >> >> Could this have been caused by a replace-brick that got interrupted and >> didn't finish re-labeling the xattrs? >> >> No, replace-brick only initiates AFR self-heal, which just copies the >> contents from the other brick(s) of the *same* replica pair into the >> replaced brick. The link-to files are created by DHT when you rename a >> file from the client. If the new name hashes to a different brick, DHT >> does not move the entire file there. It instead creates the link-to file >> (the one with the dht.linkto xattrs) on the hashed subvol. The value of >> this xattr points to the brick where the actual data is there (`getfattr -e >> text` to see it for yourself). Perhaps you had attempted a rebalance or >> remove-brick earlier and interrupted that? >> >> Should I be thinking of some heuristics to identify and fix these issues >> with a script (incorrect brick placement), or is this something a fix >> layout or repeated volume heals can fix? I've already completed a whole >> heal on this particular volume this week and it did heal about 1,000,000 >> files (mostly data and metadata, but about 20,000 entry heals as well). >> >> Maybe you should let the AFR self-heals complete first and then attempt a >> full rebalance to take care of the dht link-to files. But if the files are >> in millions, it could take quite some time to complete. >> Regards, >> Ravi >> >> Thanks for your support, >> >> ? https://joejulian.name/post/dht-misses-are-expensive/ >> >> On Fri, May 31, 2019 at 7:57 AM Ravishankar N >> wrote: >> >>> >>> On 31/05/19 3:20 AM, Alan Orth wrote: >>> >>> Dear Ravi, >>> >>> I spent a bit of time inspecting the xattrs on some files and >>> directories on a few bricks for this volume and it looks a bit messy. Even >>> if I could make sense of it for a few and potentially heal them manually, >>> there are millions of files and directories in total so that's definitely >>> not a scalable solution. After a few missteps with `replace-brick ... 
>>> commit force` in the last week (one of which on a brick that was
>>> dead/offline) as well as some premature `remove-brick` commands, I'm unsure
>>> how to proceed and I'm getting demotivated. It's scary how quickly
>>> things get out of hand in distributed systems...
>>>
>>> Hi Alan,
>>> The one good thing about gluster is that the data is always available
>>> directly on the backend bricks even if your volume has inconsistencies at
>>> the gluster level. So theoretically, if your cluster is FUBAR, you could
>>> just create a new volume and copy all data onto it via its mount from the
>>> old volume's bricks.
>>>
>>> I had hoped that bringing the old brick back up would help, but by the
>>> time I added it again a few days had passed and all the brick-ids had
>>> changed due to the replace/remove brick commands, not to mention that the
>>> trusted.afr.$volume-client-xx values were now probably pointing to the
>>> wrong bricks (?).
>>>
>>> Anyways, a few hours ago I started a full heal on the volume and I see
>>> that there is a sustained 100MiB/sec of network traffic going from the old
>>> brick's host to the new one. The completed heals reported in the logs look
>>> promising too:
>>>
>>> Old brick host:
>>>
>>> # grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E
>>> 'Completed (data|metadata|entry) selfheal' | sort | uniq -c
>>> 281614 Completed data selfheal
>>> 84 Completed entry selfheal
>>> 299648 Completed metadata selfheal
>>>
>>> New brick host:
>>>
>>> # grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E
>>> 'Completed (data|metadata|entry) selfheal' | sort | uniq -c
>>> 198256 Completed data selfheal
>>> 16829 Completed entry selfheal
>>> 229664 Completed metadata selfheal
>>>
>>> So that's good I guess, though I have no idea how long it will take or if
>>> it will fix the "missing files" issue on the FUSE mount. I've increased
>>> cluster.shd-max-threads to 8 to hopefully speed up the heal process.
>>>
>>> The afr xattrs should not cause files to disappear from mount. If the
>>> xattr names do not match what each AFR subvol expects (for eg. in a replica
>>> 2 volume, trusted.afr.*-client-{0,1} for 1st subvol, client-{2,3} for 2nd
>>> subvol and so on - ) for its children then it won't heal the data, that is
>>> all. But in your case I see some inconsistencies like one brick having the
>>> actual file (licenseserver.cfg) and the other having a linkto file (the
>>> one with the dht.linkto xattr) *in the same replica pair*.
>>>
>>> I'd be happy for any advice or pointers,
>>>
>>> Did you check if the .glusterfs hardlinks/symlinks exist and are in order
>>> for all bricks?
>>>
>>> -Ravi
>>>
>>> On Wed, May 29, 2019 at 5:20 PM Alan Orth wrote:
>>>
>>>> Dear Ravi,
>>>>
>>>> Thank you for the link to the blog post series; it is very informative
>>>> and current! If I understand your blog post correctly then I think the
>>>> answer to your previous question about pending AFRs is: no, there are no
>>>> pending AFRs. I have identified one file that is a good test case to try to
>>>> understand what happened after I issued the `gluster volume replace-brick
>>>> ... commit force` a few days ago and then added the same original brick
>>>> back to the volume later.
This is the current state of the replica 2 >>>> distribute/replicate volume: >>>> >>>> [root at wingu0 ~]# gluster volume info apps >>>> >>>> Volume Name: apps >>>> Type: Distributed-Replicate >>>> Volume ID: f118d2da-79df-4ee1-919d-53884cd34eda >>>> Status: Started >>>> Snapshot Count: 0 >>>> Number of Bricks: 3 x 2 = 6 >>>> Transport-type: tcp >>>> Bricks: >>>> Brick1: wingu3:/mnt/gluster/apps >>>> Brick2: wingu4:/mnt/gluster/apps >>>> Brick3: wingu05:/data/glusterfs/sdb/apps >>>> Brick4: wingu06:/data/glusterfs/sdb/apps >>>> Brick5: wingu0:/mnt/gluster/apps >>>> Brick6: wingu05:/data/glusterfs/sdc/apps >>>> Options Reconfigured: >>>> diagnostics.client-log-level: DEBUG >>>> storage.health-check-interval: 10 >>>> nfs.disable: on >>>> >>>> I checked the xattrs of one file that is missing from the volume's FUSE >>>> mount (though I can read it if I access its full path explicitly), but is >>>> present in several of the volume's bricks (some with full size, others >>>> empty): >>>> >>>> [root at wingu0 ~]# getfattr -d -m. -e hex >>>> /mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>> >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 >>>> trusted.afr.apps-client-3=0x000000000000000000000000 >>>> trusted.afr.apps-client-5=0x000000000000000000000000 >>>> trusted.afr.dirty=0x000000000000000000000000 >>>> trusted.bit-rot.version=0x0200000000000000585a396f00046e15 >>>> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd >>>> >>>> [root at wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 >>>> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd >>>> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 >>>> trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200 >>>> >>>> [root at wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 >>>> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd >>>> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 >>>> >>>> [root at wingu06 ~]# getfattr -d -m. 
-e hex /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 >>>> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd >>>> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 >>>> trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200 >>>> >>>> According to the trusted.afr.apps-client-xx xattrs this particular >>>> file should be on bricks with id "apps-client-3" and "apps-client-5". It >>>> took me a few hours to realize that the brick-id values are recorded in the >>>> volume's volfiles in /var/lib/glusterd/vols/apps/bricks. After comparing >>>> those brick-id values with a volfile backup from before the replace-brick, >>>> I realized that the files are simply on the wrong brick now as far as >>>> Gluster is concerned. This particular file is now on the brick for >>>> "apps-client-4". As an experiment I copied this one file to the two >>>> bricks listed in the xattrs and I was then able to see the file from the >>>> FUSE mount (yay!). >>>> >>>> Other than replacing the brick, removing it, and then adding the old >>>> brick on the original server back, there has been no change in the data >>>> this entire time. Can I change the brick IDs in the volfiles so they >>>> reflect where the data actually is? Or perhaps script something to reset >>>> all the xattrs on the files/directories to point to the correct bricks? >>>> >>>> Thank you for any help or pointers, >>>> >>>> On Wed, May 29, 2019 at 7:24 AM Ravishankar N >>>> wrote: >>>> >>>>> >>>>> On 29/05/19 9:50 AM, Ravishankar N wrote: >>>>> >>>>> >>>>> On 29/05/19 3:59 AM, Alan Orth wrote: >>>>> >>>>> Dear Ravishankar, >>>>> >>>>> I'm not sure if Brick4 had pending AFRs because I don't know what that >>>>> means and it's been a few days so I am not sure I would be able to find >>>>> that information. >>>>> >>>>> When you find some time, have a look at a blog >>>>> series I wrote about AFR- I've tried to explain what one needs to know to >>>>> debug replication related issues in it. >>>>> >>>>> Made a typo error. The URL for the blog is https://wp.me/peiBB-6b >>>>> >>>>> -Ravi >>>>> >>>>> >>>>> Anyways, after wasting a few days rsyncing the old brick to a new host >>>>> I decided to just try to add the old brick back into the volume instead of >>>>> bringing it up on the new host. I created a new brick directory on the old >>>>> host, moved the old brick's contents into that new directory (minus the >>>>> .glusterfs directory), added the new brick to the volume, and then did >>>>> Vlad's find/stat trick? from the brick to the FUSE mount point. >>>>> >>>>> The interesting problem I have now is that some files don't appear in >>>>> the FUSE mount's directory listings, but I can actually list them directly >>>>> and even read them. What could cause that? >>>>> >>>>> Not sure, too many variables in the hacks that you did to take a >>>>> guess. You can check if the contents of the .glusterfs folder are in order >>>>> on the new brick (example hardlink for files and symlinks for directories >>>>> are present etc.) . >>>>> Regards, >>>>> Ravi >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> ? 
>>>>> https://lists.gluster.org/pipermail/gluster-users/2018-February/033584.html >>>>> >>>>> On Fri, May 24, 2019 at 4:59 PM Ravishankar N >>>>> wrote: >>>>> >>>>>> >>>>>> On 23/05/19 2:40 AM, Alan Orth wrote: >>>>>> >>>>>> Dear list, >>>>>> >>>>>> I seem to have gotten into a tricky situation. Today I brought up a >>>>>> shiny new server with new disk arrays and attempted to replace one brick of >>>>>> a replica 2 distribute/replicate volume on an older server using the >>>>>> `replace-brick` command: >>>>>> >>>>>> # gluster volume replace-brick homes wingu0:/mnt/gluster/homes >>>>>> wingu06:/data/glusterfs/sdb/homes commit force >>>>>> >>>>>> The command was successful and I see the new brick in the output of >>>>>> `gluster volume info`. The problem is that Gluster doesn't seem to be >>>>>> migrating the data, >>>>>> >>>>>> `replace-brick` definitely must heal (not migrate) the data. In your >>>>>> case, data must have been healed from Brick-4 to the replaced Brick-3. Are >>>>>> there any errors in the self-heal daemon logs of Brick-4's node? Does >>>>>> Brick-4 have pending AFR xattrs blaming Brick-3? The doc is a bit out of >>>>>> date. replace-brick command internally does all the setfattr steps that are >>>>>> mentioned in the doc. >>>>>> >>>>>> -Ravi >>>>>> >>>>>> >>>>>> and now the original brick that I replaced is no longer part of the >>>>>> volume (and a few terabytes of data are just sitting on the old brick): >>>>>> >>>>>> # gluster volume info homes | grep -E "Brick[0-9]:" >>>>>> Brick1: wingu4:/mnt/gluster/homes >>>>>> Brick2: wingu3:/mnt/gluster/homes >>>>>> Brick3: wingu06:/data/glusterfs/sdb/homes >>>>>> Brick4: wingu05:/data/glusterfs/sdb/homes >>>>>> Brick5: wingu05:/data/glusterfs/sdc/homes >>>>>> Brick6: wingu06:/data/glusterfs/sdc/homes >>>>>> >>>>>> I see the Gluster docs have a more complicated procedure for >>>>>> replacing bricks that involves getfattr/setfattr?. How can I tell Gluster >>>>>> about the old brick? I see that I have a backup of the old volfile thanks >>>>>> to yum's rpmsave function if that helps. >>>>>> >>>>>> We are using Gluster 5.6 on CentOS 7. Thank you for any advice you >>>>>> can give. >>>>>> >>>>>> ? >>>>>> https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick >>>>>> >>>>>> -- >>>>>> Alan Orth >>>>>> alan.orth at gmail.com >>>>>> https://picturingjordan.com >>>>>> https://englishbulgaria.net >>>>>> https://mjanja.ch >>>>>> "In heaven all the interesting people are missing." ?Friedrich >>>>>> Nietzsche >>>>>> >>>>>> _______________________________________________ >>>>>> Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users >>>>>> >>>>>> >>>>> >>>>> -- >>>>> Alan Orth >>>>> alan.orth at gmail.com >>>>> https://picturingjordan.com >>>>> https://englishbulgaria.net >>>>> https://mjanja.ch >>>>> "In heaven all the interesting people are missing." ?Friedrich >>>>> Nietzsche >>>>> >>>>> >>>>> _______________________________________________ >>>>> Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users >>>>> >>>>> >>>> >>>> -- >>>> Alan Orth >>>> alan.orth at gmail.com >>>> https://picturingjordan.com >>>> https://englishbulgaria.net >>>> https://mjanja.ch >>>> "In heaven all the interesting people are missing." 
--Friedrich Nietzsche
>>>>
>>> --
>>> Alan Orth
>>> alan.orth at gmail.com
>>> https://picturingjordan.com
>>> https://englishbulgaria.net
>>> https://mjanja.ch
>>> "In heaven all the interesting people are missing." --Friedrich Nietzsche
>>>
>> --
>> Alan Orth
>> alan.orth at gmail.com
>> https://picturingjordan.com
>> https://englishbulgaria.net
>> https://mjanja.ch
>> "In heaven all the interesting people are missing." --Friedrich Nietzsche
>>
> --
> Alan Orth
> alan.orth at gmail.com
> https://picturingjordan.com
> https://englishbulgaria.net
> https://mjanja.ch
> "In heaven all the interesting people are missing." --Friedrich Nietzsche

--
Alan Orth
alan.orth at gmail.com
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
"In heaven all the interesting people are missing." --Friedrich Nietzsche
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From nbalacha at redhat.com  Sat Jun  8 02:27:51 2019
From: nbalacha at redhat.com (Nithya Balachandran)
Date: Sat, 8 Jun 2019 07:57:51 +0530
Subject: [Gluster-users] Does replace-brick migrate data?
In-Reply-To:
References: <32e26faf-e5c0-b944-2a32-c9eae408b146@redhat.com> <0ab0c28a-48a1-92c0-a106-f4fa94cb620f@redhat.com> <39dcc6a5-1610-93e1-aaff-7fef9b6c1faa@redhat.com> <0aa881db-a724-13be-ff63-6c346d7f55d8@redhat.com>
Message-ID:

On Sat, 8 Jun 2019 at 01:29, Alan Orth wrote:

> Dear Ravi,
>
> In the last week I have completed a fix-layout and a full INDEX heal on
> this volume. Now I've started a rebalance and I see a few terabytes of data
> going around on different bricks since yesterday, which I'm sure is good.
>
> While I wait for the rebalance to finish, I'm wondering if you know what
> would cause directories to be missing from the FUSE mount point? If I list
> the directories explicitly I can see their contents, but they do not appear
> in their parent directories' listing. In the case of duplicated files it is
> always because the files are not on the correct bricks (according to the
> Dynamo/Elastic Hash algorithm), and I can fix it by copying the file to the
> correct brick(s) and removing it from the others (along with their
> .glusterfs hard links). So what could cause directories to be missing?
>
Hi Alan,

The directories that don't show up in the parent directory listing probably
do not exist on the hashed subvol. Please check the backend bricks to see
if they are missing on any of them.
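For example, something like this quick check run from any one node (a rough
sketch only; it assumes passwordless ssh between the nodes, takes the brick
roots from `gluster volume info homes`, and MISSING_DIR is a placeholder for
the directory's path relative to the volume root):

    for host in wingu0 wingu3 wingu4 wingu05 wingu06; do
        echo "### $host"
        # not every host has every brick path; errors are silenced
        ssh "$host" 'ls -ld /mnt/gluster/homes/MISSING_DIR \
            /data/glusterfs/sdb/homes/MISSING_DIR \
            /data/glusterfs/sdc/homes/MISSING_DIR 2>/dev/null'
    done

A brick on which nothing is printed is one where the directory does not
exist.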
Regards,
Nithya

> Thank you,
>
> On Wed, Jun 5, 2019 at 1:08 AM Alan Orth wrote:
>
>> Hi Ravi,
>>
>> You're right that I had mentioned using rsync to copy the brick content to
>> a new host, but in the end I actually decided not to bring it up on a new
>> brick. Instead I added the original brick back into the volume. So the
>> xattrs and symlinks to .glusterfs on the original brick are fine. I think
>> the problem probably lies with a remove-brick that got interrupted. A few
>> weeks ago during the maintenance I had tried to remove a brick and then
>> after twenty minutes and no obvious progress I stopped it; after that the
>> bricks were still part of the volume.
>>
>> In the last few days I have run a fix-layout that took 26 hours and
>> finished successfully. Then I started a full index heal and it has healed
>> about 3.3 million files in a few days and I see a clear increase of network
>> traffic from old brick host to new brick host over that time. Once the full
>> index heal completes I will try to do a rebalance.
>>
>> Thank you,
>>
>> On Mon, Jun 3, 2019 at 7:40 PM Ravishankar N wrote:
>>
>>> On 01/06/19 9:37 PM, Alan Orth wrote:
>>>
>>> Dear Ravi,
>>>
>>> The .glusterfs hardlinks/symlinks should be fine. I'm not sure how I
>>> could verify them for six bricks and millions of files, though... :\
>>>
>>> Hi Alan,
>>>
>>> The reason I asked this is because you had mentioned in one of your
>>> earlier emails that when you moved content from the old brick to the new
>>> one, you had skipped the .glusterfs directory. So I was assuming that when
>>> you added back this new brick to the cluster, it might have been missing
>>> the .glusterfs entries. If that is the case, one way to verify could be to
>>> check using a script if all files on the brick have a link-count of at
>>> least 2 and all dirs have valid symlinks inside .glusterfs pointing to
>>> themselves.
>>>
>>> I had a small success in fixing some issues with duplicated files on the
>>> FUSE mount point yesterday. I read quite a bit about the elastic hashing
>>> algorithm that determines which files get placed on which bricks based on
>>> the hash of their filename and the trusted.glusterfs.dht xattr on brick
>>> directories (thanks to Joe Julian's blog post and Python script for showing
>>> how it works¹). With that knowledge I looked closer at one of the files
>>> that was appearing as duplicated on the FUSE mount and found that it was
>>> also duplicated on more than `replica 2` bricks. For this particular file I
>>> found two "real" files and several zero-size files with
>>> trusted.glusterfs.dht.linkto xattrs. Neither of the "real" files were on
>>> the correct brick as far as the DHT layout is concerned, so I copied one of
>>> them to the correct brick, deleted the others and their hard links, and did
>>> a `stat` on the file from the FUSE mount point and it fixed itself. Yay!
>>>
>>> Could this have been caused by a replace-brick that got interrupted and
>>> didn't finish re-labeling the xattrs?
>>>
>>> No, replace-brick only initiates AFR self-heal, which just copies the
>>> contents from the other brick(s) of the *same* replica pair into the
>>> replaced brick. The link-to files are created by DHT when you rename a
>>> file from the client. If the new name hashes to a different brick, DHT
>>> does not move the entire file there. It instead creates the link-to file
>>> (the one with the dht.linkto xattrs) on the hashed subvol. The value of
>>> this xattr points to the brick where the actual data is (`getfattr -e
>>> text` to see it for yourself). Perhaps you had attempted a rebalance or
>>> remove-brick earlier and interrupted that?
>>>
>>> Should I be thinking of some heuristics to identify and fix these issues
>>> with a script (incorrect brick placement), or is this something a fix
>>> layout or repeated volume heals can fix? I've already completed a whole
>>> heal on this particular volume this week and it did heal about 1,000,000
>>> files (mostly data and metadata, but about 20,000 entry heals as well).
>>>
>>> Maybe you should let the AFR self-heals complete first and then attempt
>>> a full rebalance to take care of the dht link-to files. But if the files
>>> are in millions, it could take quite some time to complete.
>>> Regards,
>>> Ravi
>>>
>>> Thanks for your support,
>>>
>>> ¹
https://joejulian.name/post/dht-misses-are-expensive/
>>>
>>> On Fri, May 31, 2019 at 7:57 AM Ravishankar N wrote:
>>>
>>>> On 31/05/19 3:20 AM, Alan Orth wrote:
>>>>
>>>> Dear Ravi,
>>>>
>>>> I spent a bit of time inspecting the xattrs on some files and
>>>> directories on a few bricks for this volume and it looks a bit messy. Even
>>>> if I could make sense of it for a few and potentially heal them manually,
>>>> there are millions of files and directories in total so that's definitely
>>>> not a scalable solution. After a few missteps with `replace-brick ...
>>>> commit force` in the last week (one of which on a brick that was
>>>> dead/offline) as well as some premature `remove-brick` commands, I'm unsure
>>>> how to proceed and I'm getting demotivated. It's scary how quickly
>>>> things get out of hand in distributed systems...
>>>>
>>>> Hi Alan,
>>>> The one good thing about gluster is that the data is always
>>>> available directly on the backend bricks even if your volume has
>>>> inconsistencies at the gluster level. So theoretically, if your cluster is
>>>> FUBAR, you could just create a new volume and copy all data onto it via its
>>>> mount from the old volume's bricks.
>>>>
>>>> I had hoped that bringing the old brick back up would help, but by the
>>>> time I added it again a few days had passed and all the brick-ids had
>>>> changed due to the replace/remove brick commands, not to mention that the
>>>> trusted.afr.$volume-client-xx values were now probably pointing to the
>>>> wrong bricks (?).
>>>>
>>>> Anyways, a few hours ago I started a full heal on the volume and I see
>>>> that there is a sustained 100MiB/sec of network traffic going from the old
>>>> brick's host to the new one. The completed heals reported in the logs look
>>>> promising too:
>>>>
>>>> Old brick host:
>>>>
>>>> # grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E
>>>> 'Completed (data|metadata|entry) selfheal' | sort | uniq -c
>>>> 281614 Completed data selfheal
>>>> 84 Completed entry selfheal
>>>> 299648 Completed metadata selfheal
>>>>
>>>> New brick host:
>>>>
>>>> # grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E
>>>> 'Completed (data|metadata|entry) selfheal' | sort | uniq -c
>>>> 198256 Completed data selfheal
>>>> 16829 Completed entry selfheal
>>>> 229664 Completed metadata selfheal
>>>>
>>>> So that's good I guess, though I have no idea how long it will take or
>>>> if it will fix the "missing files" issue on the FUSE mount. I've increased
>>>> cluster.shd-max-threads to 8 to hopefully speed up the heal process.
>>>>
>>>> The afr xattrs should not cause files to disappear from mount. If the
>>>> xattr names do not match what each AFR subvol expects (for eg. in a replica
>>>> 2 volume, trusted.afr.*-client-{0,1} for 1st subvol, client-{2,3} for 2nd
>>>> subvol and so on - ) for its children then it won't heal the data, that is
>>>> all. But in your case I see some inconsistencies like one brick having the
>>>> actual file (licenseserver.cfg) and the other having a linkto file
>>>> (the one with the dht.linkto xattr) *in the same replica pair*.
>>>>
>>>> I'd be happy for any advice or pointers,
>>>>
>>>> Did you check if the .glusterfs hardlinks/symlinks exist and are in
>>>> order for all bricks?
>>>>
>>>> -Ravi
>>>>
>>>> On Wed, May 29, 2019 at 5:20 PM Alan Orth wrote:
>>>>
>>>>> Dear Ravi,
>>>>>
>>>>> Thank you for the link to the blog post series; it is very informative
>>>>> and current!
If I understand your blog post correctly then I think the >>>>> answer to your previous question about pending AFRs is: no, there are no >>>>> pending AFRs. I have identified one file that is a good test case to try to >>>>> understand what happened after I issued the `gluster volume replace-brick >>>>> ... commit force` a few days ago and then added the same original brick >>>>> back to the volume later. This is the current state of the replica 2 >>>>> distribute/replicate volume: >>>>> >>>>> [root at wingu0 ~]# gluster volume info apps >>>>> >>>>> Volume Name: apps >>>>> Type: Distributed-Replicate >>>>> Volume ID: f118d2da-79df-4ee1-919d-53884cd34eda >>>>> Status: Started >>>>> Snapshot Count: 0 >>>>> Number of Bricks: 3 x 2 = 6 >>>>> Transport-type: tcp >>>>> Bricks: >>>>> Brick1: wingu3:/mnt/gluster/apps >>>>> Brick2: wingu4:/mnt/gluster/apps >>>>> Brick3: wingu05:/data/glusterfs/sdb/apps >>>>> Brick4: wingu06:/data/glusterfs/sdb/apps >>>>> Brick5: wingu0:/mnt/gluster/apps >>>>> Brick6: wingu05:/data/glusterfs/sdc/apps >>>>> Options Reconfigured: >>>>> diagnostics.client-log-level: DEBUG >>>>> storage.health-check-interval: 10 >>>>> nfs.disable: on >>>>> >>>>> I checked the xattrs of one file that is missing from the volume's >>>>> FUSE mount (though I can read it if I access its full path explicitly), but >>>>> is present in several of the volume's bricks (some with full size, others >>>>> empty): >>>>> >>>>> [root at wingu0 ~]# getfattr -d -m. -e hex >>>>> /mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>>> >>>>> getfattr: Removing leading '/' from absolute path names >>>>> # file: mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 >>>>> trusted.afr.apps-client-3=0x000000000000000000000000 >>>>> trusted.afr.apps-client-5=0x000000000000000000000000 >>>>> trusted.afr.dirty=0x000000000000000000000000 >>>>> trusted.bit-rot.version=0x0200000000000000585a396f00046e15 >>>>> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd >>>>> >>>>> [root at wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>>> getfattr: Removing leading '/' from absolute path names >>>>> # file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 >>>>> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd >>>>> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 >>>>> trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200 >>>>> >>>>> [root at wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>>> getfattr: Removing leading '/' from absolute path names >>>>> # file: data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 >>>>> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd >>>>> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 >>>>> >>>>> [root at wingu06 ~]# getfattr -d -m. 
-e hex /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>>> getfattr: Removing leading '/' from absolute path names >>>>> # file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 >>>>> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd >>>>> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 >>>>> trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200 >>>>> >>>>> According to the trusted.afr.apps-client-xx xattrs this particular >>>>> file should be on bricks with id "apps-client-3" and "apps-client-5". It >>>>> took me a few hours to realize that the brick-id values are recorded in the >>>>> volume's volfiles in /var/lib/glusterd/vols/apps/bricks. After comparing >>>>> those brick-id values with a volfile backup from before the replace-brick, >>>>> I realized that the files are simply on the wrong brick now as far as >>>>> Gluster is concerned. This particular file is now on the brick for >>>>> "apps-client-4". As an experiment I copied this one file to the two >>>>> bricks listed in the xattrs and I was then able to see the file from the >>>>> FUSE mount (yay!). >>>>> >>>>> Other than replacing the brick, removing it, and then adding the old >>>>> brick on the original server back, there has been no change in the data >>>>> this entire time. Can I change the brick IDs in the volfiles so they >>>>> reflect where the data actually is? Or perhaps script something to reset >>>>> all the xattrs on the files/directories to point to the correct bricks? >>>>> >>>>> Thank you for any help or pointers, >>>>> >>>>> On Wed, May 29, 2019 at 7:24 AM Ravishankar N >>>>> wrote: >>>>> >>>>>> >>>>>> On 29/05/19 9:50 AM, Ravishankar N wrote: >>>>>> >>>>>> >>>>>> On 29/05/19 3:59 AM, Alan Orth wrote: >>>>>> >>>>>> Dear Ravishankar, >>>>>> >>>>>> I'm not sure if Brick4 had pending AFRs because I don't know what >>>>>> that means and it's been a few days so I am not sure I would be able to >>>>>> find that information. >>>>>> >>>>>> When you find some time, have a look at a blog >>>>>> series I wrote about AFR- I've tried to >>>>>> explain what one needs to know to debug replication related issues in it. >>>>>> >>>>>> Made a typo error. The URL for the blog is https://wp.me/peiBB-6b >>>>>> >>>>>> -Ravi >>>>>> >>>>>> >>>>>> Anyways, after wasting a few days rsyncing the old brick to a new >>>>>> host I decided to just try to add the old brick back into the volume >>>>>> instead of bringing it up on the new host. I created a new brick directory >>>>>> on the old host, moved the old brick's contents into that new directory >>>>>> (minus the .glusterfs directory), added the new brick to the volume, and >>>>>> then did Vlad's find/stat trick? from the brick to the FUSE mount point. >>>>>> >>>>>> The interesting problem I have now is that some files don't appear in >>>>>> the FUSE mount's directory listings, but I can actually list them directly >>>>>> and even read them. What could cause that? >>>>>> >>>>>> Not sure, too many variables in the hacks that you did to take a >>>>>> guess. You can check if the contents of the .glusterfs folder are in order >>>>>> on the new brick (example hardlink for files and symlinks for directories >>>>>> are present etc.) . >>>>>> Regards, >>>>>> Ravi >>>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>> ? 
>>>>>> https://lists.gluster.org/pipermail/gluster-users/2018-February/033584.html >>>>>> >>>>>> On Fri, May 24, 2019 at 4:59 PM Ravishankar N >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> On 23/05/19 2:40 AM, Alan Orth wrote: >>>>>>> >>>>>>> Dear list, >>>>>>> >>>>>>> I seem to have gotten into a tricky situation. Today I brought up a >>>>>>> shiny new server with new disk arrays and attempted to replace one brick of >>>>>>> a replica 2 distribute/replicate volume on an older server using the >>>>>>> `replace-brick` command: >>>>>>> >>>>>>> # gluster volume replace-brick homes wingu0:/mnt/gluster/homes >>>>>>> wingu06:/data/glusterfs/sdb/homes commit force >>>>>>> >>>>>>> The command was successful and I see the new brick in the output of >>>>>>> `gluster volume info`. The problem is that Gluster doesn't seem to be >>>>>>> migrating the data, >>>>>>> >>>>>>> `replace-brick` definitely must heal (not migrate) the data. In your >>>>>>> case, data must have been healed from Brick-4 to the replaced Brick-3. Are >>>>>>> there any errors in the self-heal daemon logs of Brick-4's node? Does >>>>>>> Brick-4 have pending AFR xattrs blaming Brick-3? The doc is a bit out of >>>>>>> date. replace-brick command internally does all the setfattr steps that are >>>>>>> mentioned in the doc. >>>>>>> >>>>>>> -Ravi >>>>>>> >>>>>>> >>>>>>> and now the original brick that I replaced is no longer part of the >>>>>>> volume (and a few terabytes of data are just sitting on the old brick): >>>>>>> >>>>>>> # gluster volume info homes | grep -E "Brick[0-9]:" >>>>>>> Brick1: wingu4:/mnt/gluster/homes >>>>>>> Brick2: wingu3:/mnt/gluster/homes >>>>>>> Brick3: wingu06:/data/glusterfs/sdb/homes >>>>>>> Brick4: wingu05:/data/glusterfs/sdb/homes >>>>>>> Brick5: wingu05:/data/glusterfs/sdc/homes >>>>>>> Brick6: wingu06:/data/glusterfs/sdc/homes >>>>>>> >>>>>>> I see the Gluster docs have a more complicated procedure for >>>>>>> replacing bricks that involves getfattr/setfattr?. How can I tell Gluster >>>>>>> about the old brick? I see that I have a backup of the old volfile thanks >>>>>>> to yum's rpmsave function if that helps. >>>>>>> >>>>>>> We are using Gluster 5.6 on CentOS 7. Thank you for any advice you >>>>>>> can give. >>>>>>> >>>>>>> ? >>>>>>> https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick >>>>>>> >>>>>>> -- >>>>>>> Alan Orth >>>>>>> alan.orth at gmail.com >>>>>>> https://picturingjordan.com >>>>>>> https://englishbulgaria.net >>>>>>> https://mjanja.ch >>>>>>> "In heaven all the interesting people are missing." ?Friedrich >>>>>>> Nietzsche >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> Alan Orth >>>>>> alan.orth at gmail.com >>>>>> https://picturingjordan.com >>>>>> https://englishbulgaria.net >>>>>> https://mjanja.ch >>>>>> "In heaven all the interesting people are missing." ?Friedrich >>>>>> Nietzsche >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users >>>>>> >>>>>> >>>>> >>>>> -- >>>>> Alan Orth >>>>> alan.orth at gmail.com >>>>> https://picturingjordan.com >>>>> https://englishbulgaria.net >>>>> https://mjanja.ch >>>>> "In heaven all the interesting people are missing." 
--Friedrich Nietzsche
>>>>>
>>>> --
>>>> Alan Orth
>>>> alan.orth at gmail.com
>>>> https://picturingjordan.com
>>>> https://englishbulgaria.net
>>>> https://mjanja.ch
>>>> "In heaven all the interesting people are missing." --Friedrich Nietzsche
>>>>
>>> --
>>> Alan Orth
>>> alan.orth at gmail.com
>>> https://picturingjordan.com
>>> https://englishbulgaria.net
>>> https://mjanja.ch
>>> "In heaven all the interesting people are missing." --Friedrich Nietzsche
>>>
>> --
>> Alan Orth
>> alan.orth at gmail.com
>> https://picturingjordan.com
>> https://englishbulgaria.net
>> https://mjanja.ch
>> "In heaven all the interesting people are missing." --Friedrich Nietzsche
>>
> --
> Alan Orth
> alan.orth at gmail.com
> https://picturingjordan.com
> https://englishbulgaria.net
> https://mjanja.ch
> "In heaven all the interesting people are missing." --Friedrich Nietzsche
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

--
Alan Orth
alan.orth at gmail.com
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
"In heaven all the interesting people are missing." --Friedrich Nietzsche
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From alan.orth at gmail.com  Sat Jun  8 08:25:12 2019
From: alan.orth at gmail.com (Alan Orth)
Date: Sat, 8 Jun 2019 11:25:12 +0300
Subject: [Gluster-users] Does replace-brick migrate data?
In-Reply-To:
References: <32e26faf-e5c0-b944-2a32-c9eae408b146@redhat.com> <0ab0c28a-48a1-92c0-a106-f4fa94cb620f@redhat.com> <39dcc6a5-1610-93e1-aaff-7fef9b6c1faa@redhat.com> <0aa881db-a724-13be-ff63-6c346d7f55d8@redhat.com>
Message-ID:

Thank you, Nithya.

The "missing" directory is indeed present on all bricks. I enabled
client-log-level DEBUG on the volume and then noticed the following in the
FUSE mount log when doing a `stat` on the "missing" directory on the FUSE
mount:

[2019-06-08 08:03:30.240738] D [MSGID: 0] [dht-common.c:3454:dht_do_fresh_lookup] 0-homes-dht: Calling fresh lookup for /aorth/data on homes-replicate-2
[2019-06-08 08:03:30.241138] D [MSGID: 0] [dht-common.c:3013:dht_lookup_cbk] 0-homes-dht: fresh_lookup returned for /aorth/data with op_ret 0
[2019-06-08 08:03:30.241610] D [MSGID: 0] [dht-common.c:1354:dht_lookup_dir_cbk] 0-homes-dht: Internal xattr trusted.glusterfs.dht.mds is not present on path /aorth/data gfid is fb87699f-ebf3-4098-977d-85c3a70b849c
[2019-06-08 08:06:18.880961] D [MSGID: 0] [dht-common.c:1559:dht_revalidate_cbk] 0-homes-dht: revalidate lookup of /aorth/data returned with op_ret 0
[2019-06-08 08:06:18.880963] D [MSGID: 0] [dht-common.c:1651:dht_revalidate_cbk] 0-homes-dht: internal xattr trusted.glusterfs.dht.mds is not present on path /aorth/data gfid is fb87699f-ebf3-4098-977d-85c3a70b849c
[2019-06-08 08:06:18.880996] D [MSGID: 0] [dht-common.c:914:dht_common_mark_mdsxattr] 0-homes-dht: internal xattr trusted.glusterfs.dht.mds is present on subvol on path /aorth/data gfid is fb87699f-ebf3-4098-977d-85c3a70b849c

One message says the trusted.glusterfs.dht.mds xattr is not present, then
the next says it is present. Is that relevant? I looked at the xattrs of
that directory on all the bricks and it does seem to be inconsistent (also
the modification times on the directory are different):

[root at wingu0 ~]# getfattr -d -m.
-e hex /mnt/gluster/homes/aorth/data getfattr: Removing leading '/' from absolute path names # file: mnt/gluster/homes/aorth/data security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 trusted.afr.dirty=0x000000000000000000000000 trusted.afr.homes-client-3=0x000000000000000200000002 trusted.afr.homes-client-5=0x000000000000000000000000 trusted.gfid=0xfb87699febf34098977d85c3a70b849c trusted.glusterfs.dht=0xe7c11ff200000000b6dd59efffffffff [root at wingu3 ~]# getfattr -d -m. -e hex /mnt/gluster/homes/aorth/data getfattr: Removing leading '/' from absolute path names # file: mnt/gluster/homes/aorth/data security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 trusted.afr.homes-client-0=0x000000000000000000000000 trusted.afr.homes-client-1=0x000000000000000000000000 trusted.gfid=0xfb87699febf34098977d85c3a70b849c trusted.glusterfs.dht=0xe7c11ff2000000000000000049251e2d trusted.glusterfs.dht.mds=0x00000000 [root at wingu4 ~]# getfattr -d -m. -e hex /mnt/gluster/homes/aorth/data getfattr: Removing leading '/' from absolute path names # file: mnt/gluster/homes/aorth/data security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 trusted.afr.homes-client-0=0x000000000000000000000000 trusted.afr.homes-client-1=0x000000000000000000000000 trusted.gfid=0xfb87699febf34098977d85c3a70b849c trusted.glusterfs.dht=0xe7c11ff2000000000000000049251e2d trusted.glusterfs.dht.mds=0x00000000 [root at wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdb/homes/aorth/data getfattr: Removing leading '/' from absolute path names # file: data/glusterfs/sdb/homes/aorth/data security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 trusted.afr.homes-client-2=0x000000000000000000000000 trusted.gfid=0xfb87699febf34098977d85c3a70b849c trusted.glusterfs.dht=0xe7c11ff20000000049251e2eb6dd59ee [root at wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdc/homes/aorth/data getfattr: Removing leading '/' from absolute path names # file: data/glusterfs/sdc/homes/aorth/data security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 trusted.gfid=0xfb87699febf34098977d85c3a70b849c trusted.glusterfs.dht=0xe7c11ff200000000b6dd59efffffffff [root at wingu06 ~]# getfattr -d -m. -e hex /data/glusterfs/sdb/homes/aorth/data getfattr: Removing leading '/' from absolute path names # file: data/glusterfs/sdb/homes/aorth/data security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 trusted.gfid=0xfb87699febf34098977d85c3a70b849c trusted.glusterfs.dht=0xe7c11ff20000000049251e2eb6dd59ee This is a replica 2 volume on Gluster 5.6. Thank you, On Sat, Jun 8, 2019 at 5:28 AM Nithya Balachandran wrote: > > > On Sat, 8 Jun 2019 at 01:29, Alan Orth wrote: > >> Dear Ravi, >> >> In the last week I have completed a fix-layout and a full INDEX heal on >> this volume. Now I've started a rebalance and I see a few terabytes of data >> going around on different bricks since yesterday, which I'm sure is good. >> >> While I wait for the rebalance to finish, I'm wondering if you know what >> would cause directories to be missing from the FUSE mount point? If I list >> the directories explicitly I can see their contents, but they do not appear >> in their parent directories' listing. 
In the case of duplicated files it is
>> always because the files are not on the correct bricks (according to the
>> Dynamo/Elastic Hash algorithm), and I can fix it by copying the file to the
>> correct brick(s) and removing it from the others (along with their
>> .glusterfs hard links). So what could cause directories to be missing?
>>
> Hi Alan,
>
> The directories that don't show up in the parent directory listing probably
> do not exist on the hashed subvol. Please check the backend bricks to see
> if they are missing on any of them.
>
> Regards,
> Nithya
>
>> Thank you,
>>
>> On Wed, Jun 5, 2019 at 1:08 AM Alan Orth wrote:
>>
>>> Hi Ravi,
>>>
>>> You're right that I had mentioned using rsync to copy the brick content
>>> to a new host, but in the end I actually decided not to bring it up on a
>>> new brick. Instead I added the original brick back into the volume. So the
>>> xattrs and symlinks to .glusterfs on the original brick are fine. I think
>>> the problem probably lies with a remove-brick that got interrupted. A few
>>> weeks ago during the maintenance I had tried to remove a brick and then
>>> after twenty minutes and no obvious progress I stopped it; after that the
>>> bricks were still part of the volume.
>>>
>>> In the last few days I have run a fix-layout that took 26 hours and
>>> finished successfully. Then I started a full index heal and it has healed
>>> about 3.3 million files in a few days and I see a clear increase of network
>>> traffic from old brick host to new brick host over that time. Once the full
>>> index heal completes I will try to do a rebalance.
>>>
>>> Thank you,
>>>
>>> On Mon, Jun 3, 2019 at 7:40 PM Ravishankar N wrote:
>>>
>>>> On 01/06/19 9:37 PM, Alan Orth wrote:
>>>>
>>>> Dear Ravi,
>>>>
>>>> The .glusterfs hardlinks/symlinks should be fine. I'm not sure how I
>>>> could verify them for six bricks and millions of files, though... :\
>>>>
>>>> Hi Alan,
>>>>
>>>> The reason I asked this is because you had mentioned in one of your
>>>> earlier emails that when you moved content from the old brick to the new
>>>> one, you had skipped the .glusterfs directory. So I was assuming that when
>>>> you added back this new brick to the cluster, it might have been missing
>>>> the .glusterfs entries. If that is the case, one way to verify could be to
>>>> check using a script if all files on the brick have a link-count of at
>>>> least 2 and all dirs have valid symlinks inside .glusterfs pointing to
>>>> themselves.
>>>>
>>>> I had a small success in fixing some issues with duplicated files on
>>>> the FUSE mount point yesterday. I read quite a bit about the elastic
>>>> hashing algorithm that determines which files get placed on which bricks
>>>> based on the hash of their filename and the trusted.glusterfs.dht xattr on
>>>> brick directories (thanks to Joe Julian's blog post and Python script for
>>>> showing how it works¹). With that knowledge I looked closer at one of the
>>>> files that was appearing as duplicated on the FUSE mount and found that it
>>>> was also duplicated on more than `replica 2` bricks. For this particular
>>>> file I found two "real" files and several zero-size files with
>>>> trusted.glusterfs.dht.linkto xattrs. Neither of the "real" files were on
>>>> the correct brick as far as the DHT layout is concerned, so I copied one of
>>>> them to the correct brick, deleted the others and their hard links, and did
>>>> a `stat` on the file from the FUSE mount point and it fixed itself. Yay!
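(Incidentally, this is roughly how I've been reading the trusted.glusterfs.dht
layout xattrs while checking these; if I understand Joe's post correctly, the
last two 32-bit words of the value are the start and end of that brick's hash
range. A rough, untested sketch:)

    # print one directory's DHT hash range on the local brick
    v=$(getfattr -n trusted.glusterfs.dht -e hex /mnt/gluster/homes/aorth/data \
        | sed -n 's/^trusted.glusterfs.dht=0x//p')
    printf 'start=0x%s end=0x%s\n' "${v:16:8}" "${v:24:8}"

(The wingu0 value earlier in this mail decodes to start=0xb6dd59ef
end=0xffffffff; across the three subvolumes the ranges should tile
0x00000000-0xffffffff with no gaps or overlaps.)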
>>>> Could this have been caused by a replace-brick that got interrupted and
>>>> didn't finish re-labeling the xattrs?
>>>>
>>>> No, replace-brick only initiates AFR self-heal, which just copies the
>>>> contents from the other brick(s) of the *same* replica pair into the
>>>> replaced brick. The link-to files are created by DHT when you rename a
>>>> file from the client. If the new name hashes to a different brick, DHT
>>>> does not move the entire file there. It instead creates the link-to file
>>>> (the one with the dht.linkto xattrs) on the hashed subvol. The value of
>>>> this xattr points to the brick where the actual data is (`getfattr -e
>>>> text` to see it for yourself). Perhaps you had attempted a rebalance or
>>>> remove-brick earlier and interrupted that?
>>>>
>>>> Should I be thinking of some heuristics to identify and fix these
>>>> issues with a script (incorrect brick placement), or is this something a
>>>> fix layout or repeated volume heals can fix? I've already completed a whole
>>>> heal on this particular volume this week and it did heal about 1,000,000
>>>> files (mostly data and metadata, but about 20,000 entry heals as well).
>>>>
>>>> Maybe you should let the AFR self-heals complete first and then attempt
>>>> a full rebalance to take care of the dht link-to files. But if the files
>>>> are in millions, it could take quite some time to complete.
>>>> Regards,
>>>> Ravi
>>>>
>>>> Thanks for your support,
>>>>
>>>> ¹ https://joejulian.name/post/dht-misses-are-expensive/
>>>>
>>>> On Fri, May 31, 2019 at 7:57 AM Ravishankar N wrote:
>>>>
>>>>> On 31/05/19 3:20 AM, Alan Orth wrote:
>>>>>
>>>>> Dear Ravi,
>>>>>
>>>>> I spent a bit of time inspecting the xattrs on some files and
>>>>> directories on a few bricks for this volume and it looks a bit messy. Even
>>>>> if I could make sense of it for a few and potentially heal them manually,
>>>>> there are millions of files and directories in total so that's definitely
>>>>> not a scalable solution. After a few missteps with `replace-brick ...
>>>>> commit force` in the last week (one of which on a brick that was
>>>>> dead/offline) as well as some premature `remove-brick` commands, I'm unsure
>>>>> how to proceed and I'm getting demotivated. It's scary how quickly
>>>>> things get out of hand in distributed systems...
>>>>>
>>>>> Hi Alan,
>>>>> The one good thing about gluster is that the data is always
>>>>> available directly on the backend bricks even if your volume has
>>>>> inconsistencies at the gluster level. So theoretically, if your cluster is
>>>>> FUBAR, you could just create a new volume and copy all data onto it via its
>>>>> mount from the old volume's bricks.
>>>>>
>>>>> I had hoped that bringing the old brick back up would help, but by the
>>>>> time I added it again a few days had passed and all the brick-ids had
>>>>> changed due to the replace/remove brick commands, not to mention that the
>>>>> trusted.afr.$volume-client-xx values were now probably pointing to the
>>>>> wrong bricks (?).
>>>>>
>>>>> Anyways, a few hours ago I started a full heal on the volume and I see
>>>>> that there is a sustained 100MiB/sec of network traffic going from the old
>>>>> brick's host to the new one.
The completed heals reported in the logs look >>>>> promising too: >>>>> >>>>> Old brick host: >>>>> >>>>> # grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E >>>>> 'Completed (data|metadata|entry) selfheal' | sort | uniq -c >>>>> 281614 Completed data selfheal >>>>> 84 Completed entry selfheal >>>>> 299648 Completed metadata selfheal >>>>> >>>>> New brick host: >>>>> >>>>> # grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E >>>>> 'Completed (data|metadata|entry) selfheal' | sort | uniq -c >>>>> 198256 Completed data selfheal >>>>> 16829 Completed entry selfheal >>>>> 229664 Completed metadata selfheal >>>>> >>>>> So that's good I guess, though I have no idea how long it will take or >>>>> if it will fix the "missing files" issue on the FUSE mount. I've increased >>>>> cluster.shd-max-threads to 8 to hopefully speed up the heal process. >>>>> >>>>> The afr xattrs should not cause files to disappear from mount. If the >>>>> xattr names do not match what each AFR subvol expects (for eg. in a replica >>>>> 2 volume, trusted.afr.*-client-{0,1} for 1st subvol, client-{2,3} for 2nd >>>>> subvol and so on - ) for its children then it won't heal the data, that is >>>>> all. But in your case I see some inconsistencies like one brick having the >>>>> actual file (licenseserver.cfg) and the other having a linkto file >>>>> (the one with the dht.linkto xattr) *in the same replica pair*. >>>>> >>>>> >>>>> I'd be happy for any advice or pointers, >>>>> >>>>> Did you check if the .glusterfs hardlinks/symlinks exist and are in >>>>> order for all bricks? >>>>> >>>>> -Ravi >>>>> >>>>> >>>>> On Wed, May 29, 2019 at 5:20 PM Alan Orth wrote: >>>>> >>>>>> Dear Ravi, >>>>>> >>>>>> Thank you for the link to the blog post series?it is very informative >>>>>> and current! If I understand your blog post correctly then I think the >>>>>> answer to your previous question about pending AFRs is: no, there are no >>>>>> pending AFRs. I have identified one file that is a good test case to try to >>>>>> understand what happened after I issued the `gluster volume replace-brick >>>>>> ... commit force` a few days ago and then added the same original brick >>>>>> back to the volume later. This is the current state of the replica 2 >>>>>> distribute/replicate volume: >>>>>> >>>>>> [root at wingu0 ~]# gluster volume info apps >>>>>> >>>>>> Volume Name: apps >>>>>> Type: Distributed-Replicate >>>>>> Volume ID: f118d2da-79df-4ee1-919d-53884cd34eda >>>>>> Status: Started >>>>>> Snapshot Count: 0 >>>>>> Number of Bricks: 3 x 2 = 6 >>>>>> Transport-type: tcp >>>>>> Bricks: >>>>>> Brick1: wingu3:/mnt/gluster/apps >>>>>> Brick2: wingu4:/mnt/gluster/apps >>>>>> Brick3: wingu05:/data/glusterfs/sdb/apps >>>>>> Brick4: wingu06:/data/glusterfs/sdb/apps >>>>>> Brick5: wingu0:/mnt/gluster/apps >>>>>> Brick6: wingu05:/data/glusterfs/sdc/apps >>>>>> Options Reconfigured: >>>>>> diagnostics.client-log-level: DEBUG >>>>>> storage.health-check-interval: 10 >>>>>> nfs.disable: on >>>>>> >>>>>> I checked the xattrs of one file that is missing from the volume's >>>>>> FUSE mount (though I can read it if I access its full path explicitly), but >>>>>> is present in several of the volume's bricks (some with full size, others >>>>>> empty): >>>>>> >>>>>> [root at wingu0 ~]# getfattr -d -m. 
-e hex >>>>>> /mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>>>> >>>>>> getfattr: Removing leading '/' from absolute path names >>>>>> # file: mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 >>>>>> trusted.afr.apps-client-3=0x000000000000000000000000 >>>>>> trusted.afr.apps-client-5=0x000000000000000000000000 >>>>>> trusted.afr.dirty=0x000000000000000000000000 >>>>>> trusted.bit-rot.version=0x0200000000000000585a396f00046e15 >>>>>> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd >>>>>> >>>>>> [root at wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>>>> getfattr: Removing leading '/' from absolute path names >>>>>> # file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 >>>>>> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd >>>>>> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 >>>>>> trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200 >>>>>> >>>>>> [root at wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>>>> getfattr: Removing leading '/' from absolute path names >>>>>> # file: data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 >>>>>> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd >>>>>> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 >>>>>> >>>>>> [root at wingu06 ~]# getfattr -d -m. -e hex /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>>>> getfattr: Removing leading '/' from absolute path names >>>>>> # file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 >>>>>> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd >>>>>> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 >>>>>> trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200 >>>>>> >>>>>> According to the trusted.afr.apps-client-xx xattrs this particular >>>>>> file should be on bricks with id "apps-client-3" and "apps-client-5". It >>>>>> took me a few hours to realize that the brick-id values are recorded in the >>>>>> volume's volfiles in /var/lib/glusterd/vols/apps/bricks. After comparing >>>>>> those brick-id values with a volfile backup from before the replace-brick, >>>>>> I realized that the files are simply on the wrong brick now as far as >>>>>> Gluster is concerned. This particular file is now on the brick for >>>>>> "apps-client-4". As an experiment I copied this one file to the two >>>>>> bricks listed in the xattrs and I was then able to see the file from the >>>>>> FUSE mount (yay!). >>>>>> >>>>>> Other than replacing the brick, removing it, and then adding the old >>>>>> brick on the original server back, there has been no change in the data >>>>>> this entire time. Can I change the brick IDs in the volfiles so they >>>>>> reflect where the data actually is? 
Or perhaps script something to reset >>>>>> all the xattrs on the files/directories to point to the correct bricks? >>>>>> >>>>>> Thank you for any help or pointers, >>>>>> >>>>>> On Wed, May 29, 2019 at 7:24 AM Ravishankar N >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> On 29/05/19 9:50 AM, Ravishankar N wrote: >>>>>>> >>>>>>> >>>>>>> On 29/05/19 3:59 AM, Alan Orth wrote: >>>>>>> >>>>>>> Dear Ravishankar, >>>>>>> >>>>>>> I'm not sure if Brick4 had pending AFRs because I don't know what >>>>>>> that means and it's been a few days so I am not sure I would be able to >>>>>>> find that information. >>>>>>> >>>>>>> When you find some time, have a look at a blog >>>>>>> series I wrote about AFR- I've tried to >>>>>>> explain what one needs to know to debug replication related issues in it. >>>>>>> >>>>>>> Made a typo error. The URL for the blog is https://wp.me/peiBB-6b >>>>>>> >>>>>>> -Ravi >>>>>>> >>>>>>> >>>>>>> Anyways, after wasting a few days rsyncing the old brick to a new >>>>>>> host I decided to just try to add the old brick back into the volume >>>>>>> instead of bringing it up on the new host. I created a new brick directory >>>>>>> on the old host, moved the old brick's contents into that new directory >>>>>>> (minus the .glusterfs directory), added the new brick to the volume, and >>>>>>> then did Vlad's find/stat trick? from the brick to the FUSE mount point. >>>>>>> >>>>>>> The interesting problem I have now is that some files don't appear >>>>>>> in the FUSE mount's directory listings, but I can actually list them >>>>>>> directly and even read them. What could cause that? >>>>>>> >>>>>>> Not sure, too many variables in the hacks that you did to take a >>>>>>> guess. You can check if the contents of the .glusterfs folder are in order >>>>>>> on the new brick (example hardlink for files and symlinks for directories >>>>>>> are present etc.) . >>>>>>> Regards, >>>>>>> Ravi >>>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> ? >>>>>>> https://lists.gluster.org/pipermail/gluster-users/2018-February/033584.html >>>>>>> >>>>>>> On Fri, May 24, 2019 at 4:59 PM Ravishankar N < >>>>>>> ravishankar at redhat.com> wrote: >>>>>>> >>>>>>>> >>>>>>>> On 23/05/19 2:40 AM, Alan Orth wrote: >>>>>>>> >>>>>>>> Dear list, >>>>>>>> >>>>>>>> I seem to have gotten into a tricky situation. Today I brought up a >>>>>>>> shiny new server with new disk arrays and attempted to replace one brick of >>>>>>>> a replica 2 distribute/replicate volume on an older server using the >>>>>>>> `replace-brick` command: >>>>>>>> >>>>>>>> # gluster volume replace-brick homes wingu0:/mnt/gluster/homes >>>>>>>> wingu06:/data/glusterfs/sdb/homes commit force >>>>>>>> >>>>>>>> The command was successful and I see the new brick in the output of >>>>>>>> `gluster volume info`. The problem is that Gluster doesn't seem to be >>>>>>>> migrating the data, >>>>>>>> >>>>>>>> `replace-brick` definitely must heal (not migrate) the data. In >>>>>>>> your case, data must have been healed from Brick-4 to the replaced Brick-3. >>>>>>>> Are there any errors in the self-heal daemon logs of Brick-4's node? Does >>>>>>>> Brick-4 have pending AFR xattrs blaming Brick-3? The doc is a bit out of >>>>>>>> date. replace-brick command internally does all the setfattr steps that are >>>>>>>> mentioned in the doc. 
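(As an aside: if I understand Ravi correctly, the pending AFR xattrs he
mentions can be checked directly on a brick with getfattr; the path below is
illustrative. Nonzero counters in a trusted.afr.* value mean heals are
pending that blame the brick named in the xattr:)

    # dump only the AFR changelog xattrs of one file on a brick; each value
    # packs three 32-bit counters (data, metadata, entry); all zeros means
    # nothing is pending
    getfattr -d -m trusted.afr -e hex /data/glusterfs/sdb/homes/aorth/file.dat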
>>>>>>>> >>>>>>>> -Ravi >>>>>>>> >>>>>>>> >>>>>>>> and now the original brick that I replaced is no longer part of the >>>>>>>> volume (and a few terabytes of data are just sitting on the old brick): >>>>>>>> >>>>>>>> # gluster volume info homes | grep -E "Brick[0-9]:" >>>>>>>> Brick1: wingu4:/mnt/gluster/homes >>>>>>>> Brick2: wingu3:/mnt/gluster/homes >>>>>>>> Brick3: wingu06:/data/glusterfs/sdb/homes >>>>>>>> Brick4: wingu05:/data/glusterfs/sdb/homes >>>>>>>> Brick5: wingu05:/data/glusterfs/sdc/homes >>>>>>>> Brick6: wingu06:/data/glusterfs/sdc/homes >>>>>>>> >>>>>>>> I see the Gluster docs have a more complicated procedure for >>>>>>>> replacing bricks that involves getfattr/setfattr?. How can I tell Gluster >>>>>>>> about the old brick? I see that I have a backup of the old volfile thanks >>>>>>>> to yum's rpmsave function if that helps. >>>>>>>> >>>>>>>> We are using Gluster 5.6 on CentOS 7. Thank you for any advice you >>>>>>>> can give. >>>>>>>> >>>>>>>> ? >>>>>>>> https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick >>>>>>>> >>>>>>>> -- >>>>>>>> Alan Orth >>>>>>>> alan.orth at gmail.com >>>>>>>> https://picturingjordan.com >>>>>>>> https://englishbulgaria.net >>>>>>>> https://mjanja.ch >>>>>>>> "In heaven all the interesting people are missing." ?Friedrich >>>>>>>> Nietzsche >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Alan Orth >>>>>>> alan.orth at gmail.com >>>>>>> https://picturingjordan.com >>>>>>> https://englishbulgaria.net >>>>>>> https://mjanja.ch >>>>>>> "In heaven all the interesting people are missing." ?Friedrich >>>>>>> Nietzsche >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> Alan Orth >>>>>> alan.orth at gmail.com >>>>>> https://picturingjordan.com >>>>>> https://englishbulgaria.net >>>>>> https://mjanja.ch >>>>>> "In heaven all the interesting people are missing." ?Friedrich >>>>>> Nietzsche >>>>>> >>>>> >>>>> >>>>> -- >>>>> Alan Orth >>>>> alan.orth at gmail.com >>>>> https://picturingjordan.com >>>>> https://englishbulgaria.net >>>>> https://mjanja.ch >>>>> "In heaven all the interesting people are missing." ?Friedrich >>>>> Nietzsche >>>>> >>>>> >>>> >>>> -- >>>> Alan Orth >>>> alan.orth at gmail.com >>>> https://picturingjordan.com >>>> https://englishbulgaria.net >>>> https://mjanja.ch >>>> "In heaven all the interesting people are missing." ?Friedrich Nietzsche >>>> >>>> >>> >>> -- >>> Alan Orth >>> alan.orth at gmail.com >>> https://picturingjordan.com >>> https://englishbulgaria.net >>> https://mjanja.ch >>> "In heaven all the interesting people are missing." ?Friedrich Nietzsche >>> >> >> >> -- >> Alan Orth >> alan.orth at gmail.com >> https://picturingjordan.com >> https://englishbulgaria.net >> https://mjanja.ch >> "In heaven all the interesting people are missing." ?Friedrich Nietzsche >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > -- Alan Orth alan.orth at gmail.com https://picturingjordan.com https://englishbulgaria.net https://mjanja.ch "In heaven all the interesting people are missing." 
?Friedrich Nietzsche -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.orth at gmail.com Sun Jun 9 19:46:51 2019 From: alan.orth at gmail.com (Alan Orth) Date: Sun, 9 Jun 2019 22:46:51 +0300 Subject: [Gluster-users] Does replace-brick migrate data? In-Reply-To: References: <32e26faf-e5c0-b944-2a32-c9eae408b146@redhat.com> <0ab0c28a-48a1-92c0-a106-f4fa94cb620f@redhat.com> <39dcc6a5-1610-93e1-aaff-7fef9b6c1faa@redhat.com> <0aa881db-a724-13be-ff63-6c346d7f55d8@redhat.com> Message-ID: Dear Nithya, A small update: shortly after I sent the message above with xattrs of one "missing" directory, the directory and several others magically appeared on the FUSE mount point (woohoo!). This was several days into the rebalance progress (and about 9 million files scanned!). Now I'm hopeful that Gluster has done the right thing and fixed some more of these issues. I'll wait until the rebalance is done and then assess its work. I will let you know if I have any more questions. Regards, On Sat, Jun 8, 2019 at 11:25 AM Alan Orth wrote: > Thank you, Nithya. > > The "missing" directory is indeed present on all bricks. I enabled > client-log-level DEBUG on the volume and then noticed the following in the > FUSE mount log when doing a `stat` on the "missing" directory on the FUSE > mount: > > [2019-06-08 08:03:30.240738] D [MSGID: 0] > [dht-common.c:3454:dht_do_fresh_lookup] 0-homes-dht: Calling fresh lookup > for /aorth/data on homes-replicate-2 > [2019-06-08 08:03:30.241138] D [MSGID: 0] > [dht-common.c:3013:dht_lookup_cbk] 0-homes-dht: fresh_lookup returned for > /aorth/data with op_ret 0 > [2019-06-08 08:03:30.241610] D [MSGID: 0] > [dht-common.c:1354:dht_lookup_dir_cbk] 0-homes-dht: Internal xattr > trusted.glusterfs.dht.mds is not present on path /aorth/data gfid is > fb87699f-ebf3-4098-977d-85c3a70b849c > [2019-06-08 08:06:18.880961] D [MSGID: 0] > [dht-common.c:1559:dht_revalidate_cbk] 0-homes-dht: revalidate lookup of > /aorth/data returned with op_ret 0 > [2019-06-08 08:06:18.880963] D [MSGID: 0] > [dht-common.c:1651:dht_revalidate_cbk] 0-homes-dht: internal xattr > trusted.glusterfs.dht.mds is not present on path /aorth/data gfid is > fb87699f-ebf3-4098-977d-85c3a70b849c > [2019-06-08 08:06:18.880996] D [MSGID: 0] > [dht-common.c:914:dht_common_mark_mdsxattr] 0-homes-dht: internal xattr > trusted.glusterfs.dht.mds is present on subvolon path /aorth/data gfid is > fb87699f-ebf3-4098-977d-85c3a70b849c > > One message says the trusted.glusterfs.dht.mds xattr is not present, then > the next says it is present. Is that relevant? I looked at the xattrs of > that directory on all the bricks and it does seem to be inconsistent (also > the modification times on the directory are different): > > [root at wingu0 ~]# getfattr -d -m. -e hex /mnt/gluster/homes/aorth/data > getfattr: Removing leading '/' from absolute path names > # file: mnt/gluster/homes/aorth/data > > security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 > trusted.afr.dirty=0x000000000000000000000000 > trusted.afr.homes-client-3=0x000000000000000200000002 > trusted.afr.homes-client-5=0x000000000000000000000000 > trusted.gfid=0xfb87699febf34098977d85c3a70b849c > trusted.glusterfs.dht=0xe7c11ff200000000b6dd59efffffffff > > [root at wingu3 ~]# getfattr -d -m. 
-e hex /mnt/gluster/homes/aorth/data > getfattr: Removing leading '/' from absolute path names > # file: mnt/gluster/homes/aorth/data > > security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 > trusted.afr.homes-client-0=0x000000000000000000000000 > trusted.afr.homes-client-1=0x000000000000000000000000 > trusted.gfid=0xfb87699febf34098977d85c3a70b849c > trusted.glusterfs.dht=0xe7c11ff2000000000000000049251e2d > trusted.glusterfs.dht.mds=0x00000000 > > [root at wingu4 ~]# getfattr -d -m. -e hex /mnt/gluster/homes/aorth/data > getfattr: Removing leading '/' from absolute path names > # file: mnt/gluster/homes/aorth/data > > security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 > trusted.afr.homes-client-0=0x000000000000000000000000 > trusted.afr.homes-client-1=0x000000000000000000000000 > trusted.gfid=0xfb87699febf34098977d85c3a70b849c > trusted.glusterfs.dht=0xe7c11ff2000000000000000049251e2d > trusted.glusterfs.dht.mds=0x00000000 > > [root at wingu05 ~]# getfattr -d -m. -e hex > /data/glusterfs/sdb/homes/aorth/data > getfattr: Removing leading '/' from absolute path names > # file: data/glusterfs/sdb/homes/aorth/data > > security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 > trusted.afr.homes-client-2=0x000000000000000000000000 > trusted.gfid=0xfb87699febf34098977d85c3a70b849c > trusted.glusterfs.dht=0xe7c11ff20000000049251e2eb6dd59ee > > [root at wingu05 ~]# getfattr -d -m. -e hex > /data/glusterfs/sdc/homes/aorth/data > getfattr: Removing leading '/' from absolute path names > # file: data/glusterfs/sdc/homes/aorth/data > > security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 > trusted.gfid=0xfb87699febf34098977d85c3a70b849c > trusted.glusterfs.dht=0xe7c11ff200000000b6dd59efffffffff > > [root at wingu06 ~]# getfattr -d -m. -e hex > /data/glusterfs/sdb/homes/aorth/data > getfattr: Removing leading '/' from absolute path names > # file: data/glusterfs/sdb/homes/aorth/data > > security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 > trusted.gfid=0xfb87699febf34098977d85c3a70b849c > trusted.glusterfs.dht=0xe7c11ff20000000049251e2eb6dd59ee > > This is a replica 2 volume on Gluster 5.6. > > Thank you, > > On Sat, Jun 8, 2019 at 5:28 AM Nithya Balachandran > wrote: > >> >> >> On Sat, 8 Jun 2019 at 01:29, Alan Orth wrote: >> >>> Dear Ravi, >>> >>> In the last week I have completed a fix-layout and a full INDEX heal on >>> this volume. Now I've started a rebalance and I see a few terabytes of data >>> going around on different bricks since yesterday, which I'm sure is good. >>> >>> While I wait for the rebalance to finish, I'm wondering if you know what >>> would cause directories to be missing from the FUSE mount point? If I list >>> the directories explicitly I can see their contents, but they do not appear >>> in their parent directories' listing. In the case of duplicated files it is >>> always because the files are not on the correct bricks (according to the >>> Dynamo/Elastic Hash algorithm), and I can fix it by copying the file to the >>> correct brick(s) and removing it from the others (along with their >>> .glusterfs hard links). So what could cause directories to be missing? >>> >>> Hi Alan, >> >> The directories that don't show up in the parent directory listing is >> probably because they do not exist on the hashed subvol. Please check the >> backend bricks to see if they are missing on any of them. 
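A loop like the following is one way to run the check Nithya suggests across all bricks at once. This is a sketch only: the hostnames and brick paths are taken from the xattr listings above, and passwordless ssh between the nodes is assumed.

for b in wingu0:/mnt/gluster/homes wingu3:/mnt/gluster/homes \
         wingu4:/mnt/gluster/homes wingu05:/data/glusterfs/sdb/homes \
         wingu05:/data/glusterfs/sdc/homes wingu06:/data/glusterfs/sdb/homes; do
    echo "== $b =="
    # stat fails if the directory is absent on that brick
    ssh "${b%%:*}" "stat -c '%n %F %y' ${b#*:}/aorth/data" || echo "MISSING on $b"
done

A directory that is absent from its hashed subvolume is exactly the case that would make it disappear from the parent listing on the mount.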
>> >> Regards, >> Nithya >> >> Thank you, >>> >>> Thank you, >>> >>> On Wed, Jun 5, 2019 at 1:08 AM Alan Orth wrote: >>> >>>> Hi Ravi, >>>> >>>> You're right that I had mentioned using rsync to copy the brick content >>>> to a new host, but in the end I actually decided not to bring it up on a >>>> new brick. Instead I added the original brick back into the volume. So the >>>> xattrs and symlinks to .glusterfs on the original brick are fine. I think >>>> the problem probably lies with a remove-brick that got interrupted. A few >>>> weeks ago during the maintenance I had tried to remove a brick and then >>>> after twenty minutes and no obvious progress I stopped it?after that the >>>> bricks were still part of the volume. >>>> >>>> In the last few days I have run a fix-layout that took 26 hours and >>>> finished successfully. Then I started a full index heal and it has healed >>>> about 3.3 million files in a few days and I see a clear increase of network >>>> traffic from old brick host to new brick host over that time. Once the full >>>> index heal completes I will try to do a rebalance. >>>> >>>> Thank you, >>>> >>>> >>>> On Mon, Jun 3, 2019 at 7:40 PM Ravishankar N >>>> wrote: >>>> >>>>> >>>>> On 01/06/19 9:37 PM, Alan Orth wrote: >>>>> >>>>> Dear Ravi, >>>>> >>>>> The .glusterfs hardlinks/symlinks should be fine. I'm not sure how I >>>>> could verify them for six bricks and millions of files, though... :\ >>>>> >>>>> Hi Alan, >>>>> >>>>> The reason I asked this is because you had mentioned in one of your >>>>> earlier emails that when you moved content from the old brick to the new >>>>> one, you had skipped the .glusterfs directory. So I was assuming that when >>>>> you added back this new brick to the cluster, it might have been missing >>>>> the .glusterfs entries. If that is the cae, one way to verify could be to >>>>> check using a script if all files on the brick have a link-count of at >>>>> least 2 and all dirs have valid symlinks inside .glusterfs pointing to >>>>> themselves. >>>>> >>>>> >>>>> I had a small success in fixing some issues with duplicated files on >>>>> the FUSE mount point yesterday. I read quite a bit about the elastic >>>>> hashing algorithm that determines which files get placed on which bricks >>>>> based on the hash of their filename and the trusted.glusterfs.dht xattr on >>>>> brick directories (thanks to Joe Julian's blog post and Python script for >>>>> showing how it works?). With that knowledge I looked closer at one of the >>>>> files that was appearing as duplicated on the FUSE mount and found that it >>>>> was also duplicated on more than `replica 2` bricks. For this particular >>>>> file I found two "real" files and several zero-size files with >>>>> trusted.glusterfs.dht.linkto xattrs. Neither of the "real" files were on >>>>> the correct brick as far as the DHT layout is concerned, so I copied one of >>>>> them to the correct brick, deleted the others and their hard links, and did >>>>> a `stat` on the file from the FUSE mount point and it fixed itself. Yay! >>>>> >>>>> Could this have been caused by a replace-brick that got interrupted >>>>> and didn't finish re-labeling the xattrs? >>>>> >>>>> No, replace-brick only initiates AFR self-heal, which just copies the >>>>> contents from the other brick(s) of the *same* replica pair into the >>>>> replaced brick. The link-to files are created by DHT when you rename a >>>>> file from the client. If the new name hashes to a different brick, DHT >>>>> does not move the entire file there. 
It instead creates the link-to file >>>>> (the one with the dht.linkto xattrs) on the hashed subvol. The value of >>>>> this xattr points to the brick where the actual data is there (`getfattr -e >>>>> text` to see it for yourself). Perhaps you had attempted a rebalance or >>>>> remove-brick earlier and interrupted that? >>>>> >>>>> Should I be thinking of some heuristics to identify and fix these >>>>> issues with a script (incorrect brick placement), or is this something a >>>>> fix layout or repeated volume heals can fix? I've already completed a whole >>>>> heal on this particular volume this week and it did heal about 1,000,000 >>>>> files (mostly data and metadata, but about 20,000 entry heals as well). >>>>> >>>>> Maybe you should let the AFR self-heals complete first and then >>>>> attempt a full rebalance to take care of the dht link-to files. But if the >>>>> files are in millions, it could take quite some time to complete. >>>>> Regards, >>>>> Ravi >>>>> >>>>> Thanks for your support, >>>>> >>>>> ? https://joejulian.name/post/dht-misses-are-expensive/ >>>>> >>>>> On Fri, May 31, 2019 at 7:57 AM Ravishankar N >>>>> wrote: >>>>> >>>>>> >>>>>> On 31/05/19 3:20 AM, Alan Orth wrote: >>>>>> >>>>>> Dear Ravi, >>>>>> >>>>>> I spent a bit of time inspecting the xattrs on some files and >>>>>> directories on a few bricks for this volume and it looks a bit messy. Even >>>>>> if I could make sense of it for a few and potentially heal them manually, >>>>>> there are millions of files and directories in total so that's definitely >>>>>> not a scalable solution. After a few missteps with `replace-brick ... >>>>>> commit force` in the last week?one of which on a brick that was >>>>>> dead/offline?as well as some premature `remove-brick` commands, I'm unsure >>>>>> how how to proceed and I'm getting demotivated. It's scary how quickly >>>>>> things get out of hand in distributed systems... >>>>>> >>>>>> Hi Alan, >>>>>> The one good thing about gluster is it that the data is always >>>>>> available directly on the backed bricks even if your volume has >>>>>> inconsistencies at the gluster level. So theoretically, if your cluster is >>>>>> FUBAR, you could just create a new volume and copy all data onto it via its >>>>>> mount from the old volume's bricks. >>>>>> >>>>>> >>>>>> I had hoped that bringing the old brick back up would help, but by >>>>>> the time I added it again a few days had passed and all the brick-id's had >>>>>> changed due to the replace/remove brick commands, not to mention that the >>>>>> trusted.afr.$volume-client-xx values were now probably pointing to the >>>>>> wrong bricks (?). >>>>>> >>>>>> Anyways, a few hours ago I started a full heal on the volume and I >>>>>> see that there is a sustained 100MiB/sec of network traffic going from the >>>>>> old brick's host to the new one. 
The completed heals reported in the logs >>>>>> look promising too: >>>>>> >>>>>> Old brick host: >>>>>> >>>>>> # grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E >>>>>> 'Completed (data|metadata|entry) selfheal' | sort | uniq -c >>>>>> 281614 Completed data selfheal >>>>>> 84 Completed entry selfheal >>>>>> 299648 Completed metadata selfheal >>>>>> >>>>>> New brick host: >>>>>> >>>>>> # grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E >>>>>> 'Completed (data|metadata|entry) selfheal' | sort | uniq -c >>>>>> 198256 Completed data selfheal >>>>>> 16829 Completed entry selfheal >>>>>> 229664 Completed metadata selfheal >>>>>> >>>>>> So that's good I guess, though I have no idea how long it will take >>>>>> or if it will fix the "missing files" issue on the FUSE mount. I've >>>>>> increased cluster.shd-max-threads to 8 to hopefully speed up the heal >>>>>> process. >>>>>> >>>>>> The afr xattrs should not cause files to disappear from mount. If the >>>>>> xattr names do not match what each AFR subvol expects (for eg. in a replica >>>>>> 2 volume, trusted.afr.*-client-{0,1} for 1st subvol, client-{2,3} for 2nd >>>>>> subvol and so on - ) for its children then it won't heal the data, that is >>>>>> all. But in your case I see some inconsistencies like one brick having the >>>>>> actual file (licenseserver.cfg) and the other having a linkto file >>>>>> (the one with the dht.linkto xattr) *in the same replica pair*. >>>>>> >>>>>> >>>>>> I'd be happy for any advice or pointers, >>>>>> >>>>>> Did you check if the .glusterfs hardlinks/symlinks exist and are in >>>>>> order for all bricks? >>>>>> >>>>>> -Ravi >>>>>> >>>>>> >>>>>> On Wed, May 29, 2019 at 5:20 PM Alan Orth >>>>>> wrote: >>>>>> >>>>>>> Dear Ravi, >>>>>>> >>>>>>> Thank you for the link to the blog post series?it is very >>>>>>> informative and current! If I understand your blog post correctly then I >>>>>>> think the answer to your previous question about pending AFRs is: no, there >>>>>>> are no pending AFRs. I have identified one file that is a good test case to >>>>>>> try to understand what happened after I issued the `gluster volume >>>>>>> replace-brick ... commit force` a few days ago and then added the same >>>>>>> original brick back to the volume later. This is the current state of the >>>>>>> replica 2 distribute/replicate volume: >>>>>>> >>>>>>> [root at wingu0 ~]# gluster volume info apps >>>>>>> >>>>>>> Volume Name: apps >>>>>>> Type: Distributed-Replicate >>>>>>> Volume ID: f118d2da-79df-4ee1-919d-53884cd34eda >>>>>>> Status: Started >>>>>>> Snapshot Count: 0 >>>>>>> Number of Bricks: 3 x 2 = 6 >>>>>>> Transport-type: tcp >>>>>>> Bricks: >>>>>>> Brick1: wingu3:/mnt/gluster/apps >>>>>>> Brick2: wingu4:/mnt/gluster/apps >>>>>>> Brick3: wingu05:/data/glusterfs/sdb/apps >>>>>>> Brick4: wingu06:/data/glusterfs/sdb/apps >>>>>>> Brick5: wingu0:/mnt/gluster/apps >>>>>>> Brick6: wingu05:/data/glusterfs/sdc/apps >>>>>>> Options Reconfigured: >>>>>>> diagnostics.client-log-level: DEBUG >>>>>>> storage.health-check-interval: 10 >>>>>>> nfs.disable: on >>>>>>> >>>>>>> I checked the xattrs of one file that is missing from the volume's >>>>>>> FUSE mount (though I can read it if I access its full path explicitly), but >>>>>>> is present in several of the volume's bricks (some with full size, others >>>>>>> empty): >>>>>>> >>>>>>> [root at wingu0 ~]# getfattr -d -m. 
-e hex >>>>>>> /mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>>>>> >>>>>>> getfattr: Removing leading '/' from absolute path names >>>>>>> # file: mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 >>>>>>> trusted.afr.apps-client-3=0x000000000000000000000000 >>>>>>> trusted.afr.apps-client-5=0x000000000000000000000000 >>>>>>> trusted.afr.dirty=0x000000000000000000000000 >>>>>>> trusted.bit-rot.version=0x0200000000000000585a396f00046e15 >>>>>>> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd >>>>>>> >>>>>>> [root at wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>>>>> getfattr: Removing leading '/' from absolute path names >>>>>>> # file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 >>>>>>> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd >>>>>>> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 >>>>>>> trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200 >>>>>>> >>>>>>> [root at wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>>>>> getfattr: Removing leading '/' from absolute path names >>>>>>> # file: data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 >>>>>>> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd >>>>>>> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 >>>>>>> >>>>>>> [root at wingu06 ~]# getfattr -d -m. -e hex /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>>>>> getfattr: Removing leading '/' from absolute path names >>>>>>> # file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg >>>>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 >>>>>>> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd >>>>>>> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 >>>>>>> trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200 >>>>>>> >>>>>>> According to the trusted.afr.apps-client-xx xattrs this particular >>>>>>> file should be on bricks with id "apps-client-3" and "apps-client-5". It >>>>>>> took me a few hours to realize that the brick-id values are recorded in the >>>>>>> volume's volfiles in /var/lib/glusterd/vols/apps/bricks. After comparing >>>>>>> those brick-id values with a volfile backup from before the replace-brick, >>>>>>> I realized that the files are simply on the wrong brick now as far as >>>>>>> Gluster is concerned. This particular file is now on the brick for >>>>>>> "apps-client-4". As an experiment I copied this one file to the two >>>>>>> bricks listed in the xattrs and I was then able to see the file from the >>>>>>> FUSE mount (yay!). >>>>>>> >>>>>>> Other than replacing the brick, removing it, and then adding the old >>>>>>> brick on the original server back, there has been no change in the data >>>>>>> this entire time. Can I change the brick IDs in the volfiles so they >>>>>>> reflect where the data actually is? 
Or perhaps script something to reset >>>>>>> all the xattrs on the files/directories to point to the correct bricks? >>>>>>> >>>>>>> Thank you for any help or pointers, >>>>>>> >>>>>>> On Wed, May 29, 2019 at 7:24 AM Ravishankar N < >>>>>>> ravishankar at redhat.com> wrote: >>>>>>> >>>>>>>> >>>>>>>> On 29/05/19 9:50 AM, Ravishankar N wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 29/05/19 3:59 AM, Alan Orth wrote: >>>>>>>> >>>>>>>> Dear Ravishankar, >>>>>>>> >>>>>>>> I'm not sure if Brick4 had pending AFRs because I don't know what >>>>>>>> that means and it's been a few days so I am not sure I would be able to >>>>>>>> find that information. >>>>>>>> >>>>>>>> When you find some time, have a look at a blog >>>>>>>> series I wrote about AFR- I've tried to >>>>>>>> explain what one needs to know to debug replication related issues in it. >>>>>>>> >>>>>>>> Made a typo error. The URL for the blog is https://wp.me/peiBB-6b >>>>>>>> >>>>>>>> -Ravi >>>>>>>> >>>>>>>> >>>>>>>> Anyways, after wasting a few days rsyncing the old brick to a new >>>>>>>> host I decided to just try to add the old brick back into the volume >>>>>>>> instead of bringing it up on the new host. I created a new brick directory >>>>>>>> on the old host, moved the old brick's contents into that new directory >>>>>>>> (minus the .glusterfs directory), added the new brick to the volume, and >>>>>>>> then did Vlad's find/stat trick? from the brick to the FUSE mount point. >>>>>>>> >>>>>>>> The interesting problem I have now is that some files don't appear >>>>>>>> in the FUSE mount's directory listings, but I can actually list them >>>>>>>> directly and even read them. What could cause that? >>>>>>>> >>>>>>>> Not sure, too many variables in the hacks that you did to take a >>>>>>>> guess. You can check if the contents of the .glusterfs folder are in order >>>>>>>> on the new brick (example hardlink for files and symlinks for directories >>>>>>>> are present etc.) . >>>>>>>> Regards, >>>>>>>> Ravi >>>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> ? >>>>>>>> https://lists.gluster.org/pipermail/gluster-users/2018-February/033584.html >>>>>>>> >>>>>>>> On Fri, May 24, 2019 at 4:59 PM Ravishankar N < >>>>>>>> ravishankar at redhat.com> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> On 23/05/19 2:40 AM, Alan Orth wrote: >>>>>>>>> >>>>>>>>> Dear list, >>>>>>>>> >>>>>>>>> I seem to have gotten into a tricky situation. Today I brought up >>>>>>>>> a shiny new server with new disk arrays and attempted to replace one brick >>>>>>>>> of a replica 2 distribute/replicate volume on an older server using the >>>>>>>>> `replace-brick` command: >>>>>>>>> >>>>>>>>> # gluster volume replace-brick homes wingu0:/mnt/gluster/homes >>>>>>>>> wingu06:/data/glusterfs/sdb/homes commit force >>>>>>>>> >>>>>>>>> The command was successful and I see the new brick in the output >>>>>>>>> of `gluster volume info`. The problem is that Gluster doesn't seem to be >>>>>>>>> migrating the data, >>>>>>>>> >>>>>>>>> `replace-brick` definitely must heal (not migrate) the data. In >>>>>>>>> your case, data must have been healed from Brick-4 to the replaced Brick-3. >>>>>>>>> Are there any errors in the self-heal daemon logs of Brick-4's node? Does >>>>>>>>> Brick-4 have pending AFR xattrs blaming Brick-3? The doc is a bit out of >>>>>>>>> date. replace-brick command internally does all the setfattr steps that are >>>>>>>>> mentioned in the doc. 
>>>>>>>>> >>>>>>>>> -Ravi >>>>>>>>> >>>>>>>>> >>>>>>>>> and now the original brick that I replaced is no longer part of >>>>>>>>> the volume (and a few terabytes of data are just sitting on the old brick): >>>>>>>>> >>>>>>>>> # gluster volume info homes | grep -E "Brick[0-9]:" >>>>>>>>> Brick1: wingu4:/mnt/gluster/homes >>>>>>>>> Brick2: wingu3:/mnt/gluster/homes >>>>>>>>> Brick3: wingu06:/data/glusterfs/sdb/homes >>>>>>>>> Brick4: wingu05:/data/glusterfs/sdb/homes >>>>>>>>> Brick5: wingu05:/data/glusterfs/sdc/homes >>>>>>>>> Brick6: wingu06:/data/glusterfs/sdc/homes >>>>>>>>> >>>>>>>>> I see the Gluster docs have a more complicated procedure for >>>>>>>>> replacing bricks that involves getfattr/setfattr?. How can I tell Gluster >>>>>>>>> about the old brick? I see that I have a backup of the old volfile thanks >>>>>>>>> to yum's rpmsave function if that helps. >>>>>>>>> >>>>>>>>> We are using Gluster 5.6 on CentOS 7. Thank you for any advice you >>>>>>>>> can give. >>>>>>>>> >>>>>>>>> ? >>>>>>>>> https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Alan Orth >>>>>>>>> alan.orth at gmail.com >>>>>>>>> https://picturingjordan.com >>>>>>>>> https://englishbulgaria.net >>>>>>>>> https://mjanja.ch >>>>>>>>> "In heaven all the interesting people are missing." ?Friedrich >>>>>>>>> Nietzsche >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Alan Orth >>>>>>>> alan.orth at gmail.com >>>>>>>> https://picturingjordan.com >>>>>>>> https://englishbulgaria.net >>>>>>>> https://mjanja.ch >>>>>>>> "In heaven all the interesting people are missing." ?Friedrich >>>>>>>> Nietzsche >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Alan Orth >>>>>>> alan.orth at gmail.com >>>>>>> https://picturingjordan.com >>>>>>> https://englishbulgaria.net >>>>>>> https://mjanja.ch >>>>>>> "In heaven all the interesting people are missing." ?Friedrich >>>>>>> Nietzsche >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Alan Orth >>>>>> alan.orth at gmail.com >>>>>> https://picturingjordan.com >>>>>> https://englishbulgaria.net >>>>>> https://mjanja.ch >>>>>> "In heaven all the interesting people are missing." ?Friedrich >>>>>> Nietzsche >>>>>> >>>>>> >>>>> >>>>> -- >>>>> Alan Orth >>>>> alan.orth at gmail.com >>>>> https://picturingjordan.com >>>>> https://englishbulgaria.net >>>>> https://mjanja.ch >>>>> "In heaven all the interesting people are missing." ?Friedrich >>>>> Nietzsche >>>>> >>>>> >>>> >>>> -- >>>> Alan Orth >>>> alan.orth at gmail.com >>>> https://picturingjordan.com >>>> https://englishbulgaria.net >>>> https://mjanja.ch >>>> "In heaven all the interesting people are missing." ?Friedrich Nietzsche >>>> >>> >>> >>> -- >>> Alan Orth >>> alan.orth at gmail.com >>> https://picturingjordan.com >>> https://englishbulgaria.net >>> https://mjanja.ch >>> "In heaven all the interesting people are missing." 
--Friedrich Nietzsche
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
> --
> Alan Orth
> alan.orth at gmail.com
> https://picturingjordan.com
> https://englishbulgaria.net
> https://mjanja.ch
> "In heaven all the interesting people are missing." --Friedrich Nietzsche

--
Alan Orth
alan.orth at gmail.com
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
"In heaven all the interesting people are missing." --Friedrich Nietzsche

From dcunningham at voisonics.com  Mon Jun 10 03:50:30 2019
From: dcunningham at voisonics.com (David Cunningham)
Date: Mon, 10 Jun 2019 15:50:30 +1200
Subject: [Gluster-users] Transport endpoint is not connected
In-Reply-To: <863936144.3309002.1559648882741@mail.yahoo.com>
References: <20r8rlguxb86gpnxjwe3wpqw.1559189511842@email.android.com>
 <863936144.3309002.1559648882741@mail.yahoo.com>
Message-ID: 

Thank you Strahil.

On Tue, 4 Jun 2019 at 23:48, Strahil Nikolov wrote:

> Hi David,
>
> You can ensure that 49152-49160 are opened in advance...
> You never know when you will need to deploy another Gluster Volume.
>
> Best Regards,
> Strahil Nikolov
>
> On Monday, 3 June 2019 at 18:16:00 GMT-4, David Cunningham <
> dcunningham at voisonics.com> wrote:
>
>
> Hello all,
>
> We confirmed that the network provider blocking port 49152 was the issue.
> Thanks for all the help.
>
>
> On Thu, 30 May 2019 at 16:11, Strahil wrote:
>
> You can try to run a ncat from gfs3:
>
> ncat -z -v gfs1 49152
> ncat -z -v gfs2 49152
>
> If ncat fails to connect -> it's definitely a firewall.
>
> Best Regards,
> Strahil Nikolov
> On May 30, 2019 01:33, David Cunningham wrote:
>
> Hi Ravi,
>
> I think it probably is a firewall issue with the network provider. I was
> hoping to see a specific connection failure message we could send to them,
> but will take it up with them anyway.
>
> Thanks for your help.
>
>
> On Wed, 29 May 2019 at 23:10, Ravishankar N wrote:
>
> I don't see a "Connected to gvol0-client-1" in the log. Perhaps a
> firewall issue like the last time? Even in the earlier add-brick log from
> the other email thread, connection to the 2nd brick was not established.
>
> -Ravi
> On 29/05/19 2:26 PM, David Cunningham wrote:
>
> Hi Ravi and Joe,
>
> The command "gluster volume status gvol0" shows all 3 nodes as being
> online, even on gfs3 as below. I've attached the glfsheal-gvol0.log, in
> which I can't see anything like a connection error. Would you have any
> further suggestions? Thank you.
> > [root at gfs3 glusterfs]# gluster volume status gvol0
> > Status of volume: gvol0
> > Gluster process                             TCP Port  RDMA Port  Online  Pid
> > ------------------------------------------------------------------------------
> > Brick gfs1:/nodirectwritedata/gluster/gvol0 49152     0          Y       7706
> > Brick gfs2:/nodirectwritedata/gluster/gvol0 49152     0          Y       7625
> > Brick gfs3:/nodirectwritedata/gluster/gvol0 49152     0          Y       7307
> > Self-heal Daemon on localhost               N/A       N/A        Y       7316
> > Self-heal Daemon on gfs1                    N/A       N/A        Y       40591
> > Self-heal Daemon on gfs2                    N/A       N/A        Y       7634
> >
> > Task Status of Volume gvol0
> > ------------------------------------------------------------------------------
> > There are no active volume tasks
> >
> >
> > On Wed, 29 May 2019 at 16:26, Ravishankar N wrote:
> >
> >
> > On 29/05/19 6:21 AM, David Cunningham wrote:
> >
> >
>
> --
> David Cunningham, Voisonics Limited
> http://voisonics.com/
> USA: +1 213 221 1092
> New Zealand: +64 (0)28 2558 3782

--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782

From hgowtham at redhat.com  Mon Jun 10 13:28:10 2019
From: hgowtham at redhat.com (hgowtham at redhat.com)
Date: Mon, 10 Jun 2019 13:28:10 +0000
Subject: [Gluster-users] Invitation: Gluster Community Meeting (APAC
 friendly hours) @ Tue Jun 11, 2019 11:30am - 12:30pm (IST)
 (gluster-users@gluster.org)
Message-ID: <000000000000c30d9c058af8268f@google.com>

You have been invited to the following event.

Title: Gluster Community Meeting (APAC friendly hours)
Bridge: https://bluejeans.com/836554017
Meeting minutes: https://hackmd.io/A07qMrezSOyeUUGxPhBHqQ?both
Previous Meeting notes: http://github.com/gluster/community
When: Tue Jun 11, 2019 11:30am - 12:30pm India Standard Time - Kolkata
Where: https://bluejeans.com/836554017
Calendar: gluster-users at gluster.org
Who:
    * hgowtham at redhat.com - organizer
    * gluster-users at gluster.org
    * gluster-devel at gluster.org
From snowmailer at gmail.com  Mon Jun 10 13:50:08 2019
From: snowmailer at gmail.com (snowmailer)
Date: Mon, 10 Jun 2019 15:50:08 +0200
Subject: [Gluster-users] No healing on peer disconnect - is it correct?
In-Reply-To: <10D708D0-E523-46A0-91BF-FFC41886E316@gmail.com>
References: <10D708D0-E523-46A0-91BF-FFC41886E316@gmail.com>
Message-ID: <3B1EE351-5F82-4D05-947A-4960BBAC885A@gmail.com>

Can someone advise on this, please?

BR!

On 3 Jun 2019 at 18:58, Martin wrote:

> Hi all,
>
> I need someone to explain if my gluster behaviour is correct. I am not sure if my gluster works as it should. I have a simple Replica 3 - Number of Bricks: 1 x 3 = 3.
>
> When one of my hypervisors is disconnected as a peer, i.e. the gluster process is down but the bricks are running, the other two healthy nodes start signalling that they lost one peer. This is correct.
> Next, I restart the gluster process on the node where it failed, and I thought it should trigger healing of files on the failed node, but nothing is happening.
>
> I run VM disks on this gluster volume. No healing is triggered after the gluster restart, the remaining two nodes get the peer back after the restart, and everything is running without downtime.
> Even VMs that are running on the 'failed' node where the gluster process was down (bricks were up) are running without downtime.
>
> Is this behaviour correct? I mean that no healing is triggered after the peer is reconnected back and the VMs keep running.
>
> Thanks for the explanation.
>
> BR!
> Martin

From hgowtham at redhat.com  Mon Jun 10 14:07:24 2019
From: hgowtham at redhat.com (Hari Gowtham)
Date: Mon, 10 Jun 2019 19:37:24 +0530
Subject: [Gluster-users] No healing on peer disconnect - is it correct?
In-Reply-To: <3B1EE351-5F82-4D05-947A-4960BBAC885A@gmail.com>
References: <10D708D0-E523-46A0-91BF-FFC41886E316@gmail.com>
 <3B1EE351-5F82-4D05-947A-4960BBAC885A@gmail.com>
Message-ID: 

On Mon, Jun 10, 2019 at 7:21 PM snowmailer wrote:
>
> Can someone advise on this, please?
>
> BR!
>
> On 3 Jun 2019 at 18:58, Martin wrote:
>
> > Hi all,
> >
> > I need someone to explain if my gluster behaviour is correct. I am not sure if my gluster works as it should. I have a simple Replica 3 - Number of Bricks: 1 x 3 = 3.
> >
> > When one of my hypervisors is disconnected as a peer, i.e. the gluster process is down but the bricks are running, the other two healthy nodes start signalling that they lost one peer. This is correct.
> > Next, I restart the gluster process on the node where it failed, and I thought it should trigger healing of files on the failed node, but nothing is happening.
> >
> > I run VM disks on this gluster volume. No healing is triggered after the gluster restart, the remaining two nodes get the peer back after the restart, and everything is running without downtime.
> > Even VMs that are running on the 'failed' node where the gluster process was down (bricks were up) are running without downtime.

I assume your VMs use gluster as the storage. In that case, the
gluster volume might be mounted on all the hypervisors.
The mount/client is smart enough to give the correct data from the
other two machines which were always up.
This is the reason things are working fine.

Gluster should heal the brick.
Adding people who can help you better with the heal part.
@Karthik Subrahmanya @Ravishankar N do take a look and answer this part.

> > Is this behaviour correct? I mean that no healing is triggered after the peer is reconnected back and the VMs keep running.
> >
> > Thanks for the explanation.
> >
> > BR!
> > Martin
> >
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

--
Regards,
Hari Gowtham.

From snowmailer at gmail.com  Mon Jun 10 14:23:46 2019
From: snowmailer at gmail.com (Martin)
Date: Mon, 10 Jun 2019 16:23:46 +0200
Subject: [Gluster-users] No healing on peer disconnect - is it correct?
In-Reply-To: 
References: <10D708D0-E523-46A0-91BF-FFC41886E316@gmail.com>
 <3B1EE351-5F82-4D05-947A-4960BBAC885A@gmail.com>
Message-ID: 

My VMs use Gluster as storage through libgfapi support in Qemu, but I don't see any healing of the reconnected brick.

Thanks Karthik / Ravishankar in advance!

> On 10 Jun 2019, at 16:07, Hari Gowtham wrote:
>
> On Mon, Jun 10, 2019 at 7:21 PM snowmailer wrote:
>>
>> Can someone advise on this, please?
>>
>> BR!
>>
>> On 3 Jun 2019 at 18:58, Martin wrote:
>>
>>> Hi all,
>>>
>>> I need someone to explain if my gluster behaviour is correct. I am not sure if my gluster works as it should. I have a simple Replica 3 - Number of Bricks: 1 x 3 = 3.
>>>
>>> When one of my hypervisors is disconnected as a peer, i.e. the gluster process is down but the bricks are running, the other two healthy nodes start signalling that they lost one peer. This is correct.
>>> Next, I restart the gluster process on the node where it failed, and I thought it should trigger healing of files on the failed node, but nothing is happening.
>>>
>>> I run VM disks on this gluster volume. No healing is triggered after the gluster restart, the remaining two nodes get the peer back after the restart, and everything is running without downtime.
>>> Even VMs that are running on the 'failed' node where the gluster process was down (bricks were up) are running without downtime.
>
> I assume your VMs use gluster as the storage. In that case, the
> gluster volume might be mounted on all the hypervisors.
> The mount/client is smart enough to give the correct data from the
> other two machines which were always up.
> This is the reason things are working fine.
>
> Gluster should heal the brick.
> Adding people who can help you better with the heal part.
> @Karthik Subrahmanya @Ravishankar N do take a look and answer this part.
>
>>> Is this behaviour correct? I mean that no healing is triggered after the peer is reconnected back and the VMs keep running.
>>>
>>> Thanks for the explanation.
>>>
>>> BR!
>>> Martin
>>>
>>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
> --
> Regards,
> Hari Gowtham.

From dcunningham at voisonics.com  Tue Jun 11 04:25:40 2019
From: dcunningham at voisonics.com (David Cunningham)
Date: Tue, 11 Jun 2019 16:25:40 +1200
Subject: [Gluster-users] Thin-arbiter questions
In-Reply-To: <645227359.16980056.1557131647054.JavaMail.zimbra@redhat.com>
References: <757816852.16925254.1557123682731.JavaMail.zimbra@redhat.com>
 <645227359.16980056.1557131647054.JavaMail.zimbra@redhat.com>
Message-ID: 

Hi Ashish and Amar,

Is there any news on when thin-arbiter might be in the regular GlusterFS,
and the CentOS packages please? Thanks for your help.
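For context, the thin-arbiter documentation that this thread refers to sketches a create syntax along the following lines. The hostnames are placeholders, and as discussed below the option is not recognized by the 5.x CLI, so this only works where thin-arbiter support has actually landed:

gluster volume create tavol replica 2 thin-arbiter 1 \
    host1:/bricks/brick1 host2:/bricks/brick1 tahost:/bricks/ta

The third path stores only a small replica-id file per volume rather than a full copy of the data, which is what distinguishes a thin arbiter from a regular arbiter brick.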
On Mon, 6 May 2019 at 20:34, Ashish Pandey wrote:

>
> ------------------------------
> *From: *"David Cunningham"
> *To: *"Ashish Pandey"
> *Cc: *"gluster-users"
> *Sent: *Monday, May 6, 2019 1:40:30 PM
> *Subject: *Re: [Gluster-users] Thin-arbiter questions
>
> Hi Ashish,
>
> Thank you for the update. Does that mean they're now in the regular
> Glusterfs? Any idea how long it typically takes the Ubuntu and CentOS
> packages to be updated with the latest code?
>
> No, for regular glusterd, work is still in progress. It will be done soon.
> I don't have an answer for the next question. Maybe Amar has information
> regarding this. Adding him in CC.
>
>
> On Mon, 6 May 2019 at 18:21, Ashish Pandey wrote:
>
>> Hi,
>>
>> I can see that Amar has already committed the changes and those are
>> visible on
>> https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/
>>
>> ---
>> Ashish
>>
>> ------------------------------
>> *From: *"Strahil"
>> *To: *"Ashish" , "David"
>> *Cc: *"gluster-users"
>> *Sent: *Saturday, May 4, 2019 12:10:01 AM
>> *Subject: *Re: [Gluster-users] Thin-arbiter questions
>>
>> Hi Ashish,
>>
>> Can someone commit the doc change I have already proposed?
>> At least, the doc will clarify that fact.
>>
>> Best Regards,
>> Strahil Nikolov
>> On May 3, 2019 05:30, Ashish Pandey wrote:
>>
>> Hi David,
>>
>> Creation of thin-arbiter volume is currently supported by GD2 only. The
>> command "glustercli" is available when glusterd2 is running.
>> We are also working on providing thin-arbiter support on glusterd;
>> however, it is not available right now.
>> https://review.gluster.org/#/c/glusterfs/+/22612/
>>
>> ---
>> Ashish
>>
>> ------------------------------
>> *From: *"David Cunningham"
>> *To: *gluster-users at gluster.org
>> *Sent: *Friday, May 3, 2019 7:40:03 AM
>> *Subject: *[Gluster-users] Thin-arbiter questions
>>
>> Hello,
>>
>> We are setting up a thin-arbiter and hope someone can help with some
>> questions. We've been following the documentation from
>> https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/
>>
>> 1. What release of 5.x supports thin-arbiter? We tried a "gluster volume
>> create" with the --thin-arbiter option on 5.5 and got an "unrecognized
>> option --thin-arbiter" error.
>>
>> 2. The instruction to create a new volume with a thin-arbiter is clear.
>> How do you add a thin-arbiter to an already existing volume though?
>>
>> 3. The documentation suggests running glusterfsd manually to start the
>> thin-arbiter. Is there a service that can do this instead? I found a
>> mention of one in https://bugzilla.redhat.com/show_bug.cgi?id=1579786
>> but it's not really documented.
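Until a packaged service exists, one way to address question 3 is to wrap the glusterfsd invocation from the thin-arbiter guide in a small systemd unit. This is only a sketch: the glusterfsd flags follow the guide linked above, while the unit name and volfile path are placeholders rather than anything shipped by the packages.

cat > /etc/systemd/system/gluster-ta.service <<'EOF'
[Unit]
Description=GlusterFS thin-arbiter process
After=network.target

[Service]
# -N keeps glusterfsd in the foreground so systemd can supervise it
ExecStart=/usr/sbin/glusterfsd -N --volfile-id ta-vol \
  -f /var/lib/glusterd/vols/thin-arbiter.vol --brick-port 24007 \
  --xlator-option ta-vol-server.transport.socket.listen-port=24007
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload && systemctl enable --now gluster-ta.service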
>> Thanks in advance for your help,
>>
>> --
>> David Cunningham, Voisonics Limited
>> http://voisonics.com/
>> USA: +1 213 221 1092
>> New Zealand: +64 (0)28 2558 3782
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
> --
> David Cunningham, Voisonics Limited
> http://voisonics.com/
> USA: +1 213 221 1092
> New Zealand: +64 (0)28 2558 3782
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782

From ravishankar at redhat.com  Tue Jun 11 04:50:10 2019
From: ravishankar at redhat.com (Ravishankar N)
Date: Tue, 11 Jun 2019 10:20:10 +0530
Subject: [Gluster-users] No healing on peer disconnect - is it correct?
In-Reply-To: 
References: <10D708D0-E523-46A0-91BF-FFC41886E316@gmail.com>
 <3B1EE351-5F82-4D05-947A-4960BBAC885A@gmail.com>
Message-ID: <28417fb7-5081-cc8e-7ffc-625f9905f9c2@redhat.com>

There will be pending heals only when the brick process goes down or
there is a disconnect between the client and that brick. When you say
"gluster process is down but bricks running", I'm guessing you killed
only glusterd and not the glusterfsd brick process. That won't cause any
pending heals. If there is something to be healed, `gluster volume heal
$volname info` will display the list of files.

Hope that helps,
Ravi

On 10/06/19 7:53 PM, Martin wrote:
> My VMs use Gluster as storage through libgfapi support in Qemu, but I
> don't see any healing of the reconnected brick.
>
> Thanks Karthik / Ravishankar in advance!
>
>> On 10 Jun 2019, at 16:07, Hari Gowtham wrote:
>>
>> On Mon, Jun 10, 2019 at 7:21 PM snowmailer wrote:
>>>
>>> Can someone advise on this, please?
>>>
>>> BR!
>>>
>>> On 3 Jun 2019 at 18:58, Martin wrote:
>>>
>>>> Hi all,
>>>>
>>>> I need someone to explain if my gluster behaviour is correct. I am not sure if my gluster works as it should. I have a simple Replica 3 - Number of Bricks: 1 x 3 = 3.
>>>>
>>>> When one of my hypervisors is disconnected as a peer, i.e. the gluster process is down but the bricks are running, the other two healthy nodes start signalling that they lost one peer. This is correct.
>>>> Next, I restart the gluster process on the node where it failed, and I thought it should trigger healing of files on the failed node, but nothing is happening.
>>>>
>>>> I run VM disks on this gluster volume. No healing is triggered after the gluster restart, the remaining two nodes get the peer back after the restart, and everything is running without downtime.
>>>> Even VMs that are running on the 'failed' node where the gluster process was down (bricks were up) are running without downtime.
>>
>> I assume your VMs use gluster as the storage. In that case, the
>> gluster volume might be mounted on all the hypervisors.
>> The mount/client is smart enough to give the correct data from the
>> other two machines which were always up.
>> This is the reason things are working fine.
>>
>> Gluster should heal the brick.
>> Adding people who can help you better with the heal part.
>> @Karthik Subrahmanya @Ravishankar N do take a look and answer this part.
>>
>>>> Is this behaviour correct? I mean that no healing is triggered after the peer is reconnected back and the VMs keep running.
>>>>
>>>> Thanks for the explanation.
>>>>
>>>> BR!
>>>> Martin
>>>>
>>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>> --
>> Regards,
>> Hari Gowtham.

From aspandey at redhat.com  Tue Jun 11 05:15:22 2019
From: aspandey at redhat.com (Ashish Pandey)
Date: Tue, 11 Jun 2019 01:15:22 -0400 (EDT)
Subject: [Gluster-users] Thin-arbiter questions
In-Reply-To: 
References: <757816852.16925254.1557123682731.JavaMail.zimbra@redhat.com>
 <645227359.16980056.1557131647054.JavaMail.zimbra@redhat.com>
Message-ID: <2006209001.22227657.1560230122663.JavaMail.zimbra@redhat.com>

Hi David,

It should be any time soon as we are in the last phase of patch reviews.
You can follow this patch - https://review.gluster.org/#/c/glusterfs/+/22612/

---
Ashish

----- Original Message -----

From: "David Cunningham"
To: "Ashish Pandey"
Cc: "gluster-users"
Sent: Tuesday, June 11, 2019 9:55:40 AM
Subject: Re: [Gluster-users] Thin-arbiter questions

Hi Ashish and Amar,

Is there any news on when thin-arbiter might be in the regular GlusterFS,
and the CentOS packages please? Thanks for your help.

--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782
From hgowtham at redhat.com  Tue Jun 11 11:56:51 2019
From: hgowtham at redhat.com (Hari Gowtham)
Date: Tue, 11 Jun 2019 17:26:51 +0530
Subject: [Gluster-users] Announcing Gluster release 4.1.9
Message-ID: 

The Gluster community is pleased to announce the release of Gluster
4.1.9 (packages available at [1]).

Release notes for the release can be found at [2].

Major changes, features and limitations addressed in this release:
None

Thanks,
Gluster community

[1] Packages for 4.1.9:
https://download.gluster.org/pub/gluster/glusterfs/4/4.1.9/

[2] Release notes for 4.1.9:
https://docs.gluster.org/en/latest/release-notes/4.1.9/

--
Regards,
Hari Gowtham.

From alan.orth at gmail.com  Tue Jun 11 15:41:43 2019
From: alan.orth at gmail.com (Alan Orth)
Date: Tue, 11 Jun 2019 18:41:43 +0300
Subject: [Gluster-users] Proper command for replace-brick on
 distribute-replicate?
Message-ID: 

Dear list,

In a recent discussion on this list Ravi suggested that the documentation
for replace-brick[1] was out of date. For a distribute-replicate volume the
documentation currently says that we need to kill the old brick's PID,
create a temporary empty directory on the FUSE mount, check the xattrs,
replace-brick with commit force.

Is all this still necessary? I'm running Gluster 5.6 on CentOS 7 with a
distribute-replicate volume.

Thank you,

[1] https://docs.gluster.org/en/latest/Administrator Guide/Managing Volumes/

--
Alan Orth
alan.orth at gmail.com
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
"In heaven all the interesting people are missing." --Friedrich Nietzsche

From ravishankar at redhat.com  Tue Jun 11 16:32:23 2019
From: ravishankar at redhat.com (Ravishankar N)
Date: Tue, 11 Jun 2019 22:02:23 +0530
Subject: [Gluster-users] Proper command for replace-brick on
 distribute-replicate?
In-Reply-To: 
References: 
Message-ID: <15442218-f033-002a-72a5-ee21bbde00d7@redhat.com>

On 11/06/19 9:11 PM, Alan Orth wrote:
> Dear list,
>
> In a recent discussion on this list Ravi suggested that the
> documentation for replace-brick[1] was out of date. For a
> distribute-replicate volume the documentation currently says that we
> need to kill the old brick's PID, create a temporary empty directory
> on the FUSE mount, check the xattrs, replace-brick with commit force.
>
> Is all this still necessary? I'm running Gluster 5.6 on CentOS 7 with
> a distribute-replicate volume.

No, all these very steps are 'codified' into the `replace-brick commit
force` command via https://review.gluster.org/#/c/glusterfs/+/10076/ and
https://review.gluster.org/#/c/glusterfs/+/10448/. You can see the
commit messages of these 2 patches for more details.

You can play around with most of these commands in a 1 node setup if you
want to convince yourself that they work. There is no need to form a
cluster.
[root at tuxpad glusterfs]# gluster v create testvol replica 3 127.0.0.2:/home/ravi/bricks/brick{1..3} force [root at tuxpad glusterfs]# gluster v start testvol [root at tuxpad glusterfs]# mount -t glusterfs 127.0.0.2:testvol /mnt/fuse_mnt/ [root at tuxpad glusterfs]# touch /mnt/fuse_mnt/FILE [root at tuxpad glusterfs]# ll /home/ravi/bricks/brick*/FILE -rw-r--r--. 2 root root 0 Jun 11 21:55 /home/ravi/bricks/brick1/FILE -rw-r--r--. 2 root root 0 Jun 11 21:55 /home/ravi/bricks/brick2/FILE -rw-r--r--. 2 root root 0 Jun 11 21:55 /home/ravi/bricks/brick3/FILE [root at tuxpad glusterfs]# gluster v replace-brick testvol 127.0.0.2:/home/ravi/bricks/brick3 127.0.0.2:/home/ravi/bricks/brick3_new commit force volume replace-brick: success: replace-brick commit force operation successful [root at tuxpad glusterfs]# ll /home/ravi/bricks/brick3_new/FILE -rw-r--r--. 2 root root 0 Jun 11 21:55 /home/ravi/bricks/brick3_new/FILE Why don't you send a patch to update the doc for replace-brick? I'd be happy to review it. ;-) HTH, Ravi > > Thank you, > > ? https://docs.gluster.org/en/latest/Administrator Guide/Managing Volumes/ > -- > Alan Orth > alan.orth at gmail.com > https://picturingjordan.com > https://englishbulgaria.net > https://mjanja.ch > "In heaven all the interesting people are missing." ?Friedrich Nietzsche > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.orth at gmail.com Wed Jun 12 08:08:53 2019 From: alan.orth at gmail.com (Alan Orth) Date: Wed, 12 Jun 2019 11:08:53 +0300 Subject: [Gluster-users] =?utf-8?q?Proper_command_for_replace-brick_on_dis?= =?utf-8?q?tribute=E2=80=93replicate=3F?= In-Reply-To: <15442218-f033-002a-72a5-ee21bbde00d7@redhat.com> References: <15442218-f033-002a-72a5-ee21bbde00d7@redhat.com> Message-ID: Dear Ravi, Thanks for the confirmation?I replaced a brick in a volume last night and by the morning I see that Gluster has replicated data there, though I don't have any indication of its progress. The `gluster v heal volume info` and `gluster v heal volume info split-brain` are all looking good so I guess that's enough of an indication. One question, though. Immediately after I replaced the brick I checked `gluster v status volume` and I saw the following: Task Status of Volume volume ------------------------------------------------------------------------------ Task : Rebalance ID : a890e99c-5715-4bc1-80ee-c28490612135 Status : not started I did not initiate a rebalance, so it's strange to see it there. Is Gluster hinting that I should start a rebalance? If a rebalance is "not started" shouldn't Gluster just not show it at all? Regarding the patch to the documentation: absolutely! Let me just get my Gluster back in order after my confusing upgrade last month. :P Thanks, On Tue, Jun 11, 2019 at 7:32 PM Ravishankar N wrote: > > On 11/06/19 9:11 PM, Alan Orth wrote: > > Dear list, > > In a recent discussion on this list Ravi suggested that the documentation > for replace-brick? was out of date. For a distribute?replicate volume the > documentation currently says that we need to kill the old brick's PID, > create a temporary empty directory on the FUSE mount, check the xattrs, > replace-brick with commit force. > > Is all this still necessary? I'm running Gluster 5.6 on CentOS 7 with a > distribute?replicate volume. 
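For reference: after a brick has been replaced this way, the self-heal daemon still has to copy the data onto the new brick. A minimal way to watch that happen - a sketch assuming the same testvol volume from the transcript above (any volume name works); both subcommands are standard gluster CLI:

# gluster volume heal testvol info                     # entries still pending heal, per brick
# gluster volume heal testvol statistics heal-count    # just the per-brick counts
# watch -n 60 'gluster volume heal testvol statistics heal-count'

Heal is done when the counts drop to zero on all bricks.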
From alan.orth at gmail.com Wed Jun 12 08:08:53 2019
From: alan.orth at gmail.com (Alan Orth)
Date: Wed, 12 Jun 2019 11:08:53 +0300
Subject: [Gluster-users] Proper command for replace-brick on distribute–replicate?
Message-ID:

Dear Ravi,

Thanks for the confirmation. I replaced a brick in a volume last night and by the morning I see that Gluster has replicated data there, though I don't have any indication of its progress. `gluster v heal volume info` and `gluster v heal volume info split-brain` are both looking good, so I guess that's enough of an indication.

One question, though. Immediately after I replaced the brick I checked `gluster v status volume` and I saw the following:

Task Status of Volume volume
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : a890e99c-5715-4bc1-80ee-c28490612135
Status               : not started

I did not initiate a rebalance, so it's strange to see it there. Is Gluster hinting that I should start a rebalance? If a rebalance is "not started" shouldn't Gluster just not show it at all?

Regarding the patch to the documentation: absolutely! Let me just get my Gluster back in order after my confusing upgrade last month. :P

Thanks,

--
Alan Orth
alan.orth at gmail.com
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
"In heaven all the interesting people are missing." –Friedrich Nietzsche

From ravishankar at redhat.com Wed Jun 12 11:00:14 2019
From: ravishankar at redhat.com (Ravishankar N)
Date: Wed, 12 Jun 2019 16:30:14 +0530
Subject: [Gluster-users] Proper command for replace-brick on distribute–replicate?
Message-ID:

On 12/06/19 1:38 PM, Alan Orth wrote:
> The `gluster v heal volume info` and `gluster v heal volume info
> split-brain` are all looking good so I guess that's enough of an
> indication.

Yes, right now, heal info showing no files is the indication. A new command for pending heal time estimation is something that is being worked upon. See https://github.com/gluster/glusterfs/issues/643

> I did not initiate a rebalance, so it's strange to see it there. If a
> rebalance is "not started" shouldn't Gluster just not show it at all?

`replace-brick` should not show rebalance status. Not sure why you're seeing it. Adding Nithya for help.

> Regarding the patch to the documentation: absolutely!

Great. Please send the PR for the https://github.com/gluster/glusterdocs/ project. I think docs/Administrator Guide/Managing Volumes.md is the file that needs to be updated.

-Ravi
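For a stray task like the one above, the rebalance state can also be cross-checked directly; a sketch using Alan's volume name (volume):

# gluster volume rebalance volume status

If no rebalance was ever run, this should fail with a "rebalance not started" type of error rather than show a stale task, which would support the point that the task entry in `gluster v status` is the anomaly.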
From sandeepkumar at nus.edu.sg Thu Jun 13 16:15:20 2019
From: sandeepkumar at nus.edu.sg (Kumar Sandeep)
Date: Thu, 13 Jun 2019 16:15:20 +0000
Subject: [Gluster-users] Gluster performance on NVMe SSDs
Message-ID:

Hi,

I have set up oVirt hyperconverged with 3x DL325, with 3 NVMe SSDs on each server for the Gluster volumes: engine, vmstore and data, in replica 3. I am getting almost a 4x performance difference when I benchmark the gluster disk with dd.

Below output is from one of the gluster hosts:

time dd if=/dev/zero of=/gluster/vmstore/testfile bs=1M count=1000 oflag=direct
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 0.546909 s, 1.9 GB/s

real 0m0.552s
user 0m0.001s
sys 0m0.165s

Below output is from a VM:

# time dd if=/dev/zero of=/root/testfile bs=1M count=1000 oflag=direct
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 1.89253 s, 554 MB/s

real 0m1.896s
user 0m0.002s
sys 0m0.247s

Earlier I was getting around ~270MB/s in that VM; it improved after setting engine-config -s LibgfApiSupported=true --cver=4.3

I am using the default tuned profile (virtual-host). The backend network is 10GbE, tested with iPerf, and it gives above 9.40 Gbits/sec bandwidth.

# gluster v info vmstore

Volume Name: vmstore
Type: Replicate
Volume ID: b1f0c1fb-8b8b-479a-9117-61d45692047a
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: daal-gluster.mbi.nus.edu.sg:/gluster/vmstore/brick1
Brick2: naan-gluster.mbi.nus.edu.sg:/gluster/vmstore/brick1
Brick3: dosa-gluster.mbi.nus.edu.sg:/gluster/vmstore/brick1
Options Reconfigured:
server.event-threads: 4
client.event-threads: 4
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: on
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
storage.owner-uid: 36
storage.owner-gid: 36
features.shard-block-size: 512MB
performance.cache-size: 1GB
performance.io-thread-count: 32
performance.write-behind-window-size: 4MB

Is there any way I can improve my gluster disk's performance?

Thanks for helping out.

Regards,
Sandeep

________________________________
Important: This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately; you should not copy or use it for any purpose, nor disclose its contents to any other person. Thank you.
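To see where the gap between brick-local dd and in-VM dd is spent, gluster's built-in profiler is a reasonable first step; a sketch against the vmstore volume from the mail above (standard gluster CLI commands):

# gluster volume profile vmstore start
# (run the dd test inside the VM)
# gluster volume profile vmstore info
# gluster volume profile vmstore stop

The per-FOP latency table in the info output shows whether time is going into the WRITE FOPs themselves or into the lookups and locking around them.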
It seems to be that after a restart glusterd couldn't found the pid files for the freshly created brick processes and create new brick processes. One can see in the brick logs that for all the volumes that two brick processes were created just one after another. Result: Two brick processes for each of the volumes volume1, volume2 and test. "gluster vo status" shows that the pid number was mapped to the wrong port number for hydmedia and impax But beside of that the volume was working correctly. I resolve that issue with a workaround. Kill all brick processes and restart glusterd. After that everything is fine. Is this a bug in glusterd? You can find all relevant informations attached below Regards David Spisla -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: duplicated-bricks.tar.gz Type: application/x-gzip Size: 8796 bytes Desc: not available URL: From khiremat at redhat.com Fri Jun 14 08:43:53 2019 From: khiremat at redhat.com (Kotresh Hiremath Ravishankar) Date: Fri, 14 Jun 2019 14:13:53 +0530 Subject: [Gluster-users] Geo Replication Stop even after migratingto 5.6 In-Reply-To: References: Message-ID: It's about complete re-sync. The idea is to set the stime xattr which marks the sync time to 0 on all the bricks. If lot of the data is not synced to slave, this is not very useful. You can as well delete the geo-rep session with 'reset-sync-time' option and re-setup. I prefer the second way. Thanks, Kotresh HR On Fri, Jun 14, 2019 at 12:48 PM deepu srinivasan wrote: > Hi Guys > Yes, I will try the root geo-rep setup and update you back. > Meanwhile is there any procedure for the below-quoted info in the docs? > >> Synchronization is not complete >> >> *Description*: GlusterFS geo-replication did not synchronize the data >> completely but the geo-replication status displayed is OK. >> >> *Solution*: You can enforce a full sync of the data by erasing the index >> and restarting GlusterFS geo-replication. After restarting, GlusterFS >> geo-replication begins synchronizing all the data. All files are compared >> using checksum, which can be a lengthy and high resource utilization >> operation on large data sets. >> >> > On Fri, Jun 14, 2019 at 12:30 PM Kotresh Hiremath Ravishankar < > khiremat at redhat.com> wrote: > >> Could you please try root geo-rep setup and update back? >> >> On Fri, Jun 14, 2019 at 12:28 PM deepu srinivasan >> wrote: >> >>> Hi Any updates on this >>> >>> >>> On Thu, Jun 13, 2019 at 5:43 PM deepu srinivasan >>> wrote: >>> >>>> Hi Guys >>>> Hope you remember the issue I reported for geo replication hang status >>>> on History Crawl. >>>> So you advised me to update the gluster version. previously I was using >>>> 4.1 now I upgraded to 5.6/Still after deleting the previous geo-rep session >>>> and creating a new one the geo-rep session hangs. Is there any other way >>>> that I could solve the issue. >>>> I heard that I could redo the whole geo-replication again. How could I >>>> do that? >>>> Please help. >>>> >>> >> >> -- >> Thanks and Regards, >> Kotresh H R >> > -- Thanks and Regards, Kotresh H R -------------- next part -------------- An HTML attachment was scrubbed... 
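The "second way" Kotresh prefers looks roughly like this; a sketch with placeholder names (mastervol, slavehost and slavevol are hypothetical), following the documented geo-replication CLI:

# gluster volume geo-replication mastervol slavehost::slavevol stop
# gluster volume geo-replication mastervol slavehost::slavevol delete reset-sync-time
# gluster volume geo-replication mastervol slavehost::slavevol create push-pem force
# gluster volume geo-replication mastervol slavehost::slavevol start

The reset-sync-time option clears the stored sync-time marker, so the new session syncs from the beginning rather than from where the old one claimed to be.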
From amukherj at redhat.com Fri Jun 14 17:03:05 2019
From: amukherj at redhat.com (Atin Mukherjee)
Date: Fri, 14 Jun 2019 22:33:05 +0530
Subject: [Gluster-users] Duplicated brick processes after restart of glusterd
Message-ID:

Please see https://bugzilla.redhat.com/show_bug.cgi?id=1696147 which is fixed in 5.6. It's a race, and I believe you're hitting it. The title of the bug makes it look like an shd + brick multiplexing combination, but it's applicable to plain bricks too.

On Fri, Jun 14, 2019 at 2:07 PM David Spisla wrote:
> It seems that after a restart glusterd couldn't find the pid files for the
> freshly created brick processes and created new brick processes.
> Is this a bug in glusterd?

From kahlil.talledo at webqem.com Mon Jun 17 01:39:37 2019
From: kahlil.talledo at webqem.com (Kahlil Talledo)
Date: Mon, 17 Jun 2019 01:39:37 +0000
Subject: [Gluster-users] Bricks has different disk usage
Message-ID: <1560735577387.74924@webqem.com>

Hello,

I currently have a gluster (glusterfs 3.7.15) volume (distributed-replicated) with 4 bricks (2 x 2 = 4), and the bricks themselves seem to have different usage. I had the impression that they should all be equal?

All bricks have a 250GB disk. One set of bricks (2 bricks) is at 91% usage, while the other set (2 bricks) is at 16% usage.

Originally, the volume (250GB total) started out with just one set of 2 bricks. Then 2 more bricks were added to expand the volume to 500GB. A rebalance was done after adding the new bricks, as well as a fix-layout.

Really not sure what to make of this. Details below:

Volume Name: glustervol0
Type: Distributed-Replicate
Volume ID: 243e0652-5b95-4f63-bcf6-f7c60a75ff83
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: dc1-x-smb-clust-01-1:/vol1/brick1
Brick2: dc1-x-smb-clust-01-2:/vol1/brick1
Brick3: dc1-x-smb-clust-01-1:/vol2/brick2
Brick4: dc1-x-smb-clust-01-2:/vol2/brick2

/dev/xvdb1  250G  226G   25G  91%  /vol1
/dev/xvdc1  250G   40G  211G  16%  /vol2

Cheers,

--------------------------------------------------
Kahlil Talledo
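Uneven usage across DHT subvolumes is often just a handful of large files hashing to the same subvolume, since DHT places whole files rather than striping them. A sketch of how to check, using the paths from the mail above (somedir is a placeholder; reading trusted xattrs requires root on the brick host):

# du -sh /vol1/brick1/* | sort -h                                 # find what is skewing the full subvolume
# gluster volume rebalance glustervol0 status                     # confirm the rebalance actually completed
# getfattr -n trusted.glusterfs.dht -e hex /vol1/brick1/somedir   # hash range assigned to this brick's copy of the dir

If a few very large files dominate one replica pair, that is expected DHT behaviour rather than a broken layout.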
From spisla80 at gmail.com Mon Jun 17 06:50:27 2019
From: spisla80 at gmail.com (David Spisla)
Date: Mon, 17 Jun 2019 08:50:27 +0200
Subject: [Gluster-users] Duplicated brick processes after restart of glusterd
Message-ID:

Hello Atin,

thank you for the clarification.

On Fri, 14 Jun 2019 at 19:03, Atin Mukherjee wrote:
> Please see https://bugzilla.redhat.com/show_bug.cgi?id=1696147 which is
> fixed in 5.6. It's a race, and I believe you're hitting it.

From spisla80 at gmail.com Mon Jun 17 10:15:15 2019
From: spisla80 at gmail.com (David Spisla)
Date: Mon, 17 Jun 2019 12:15:15 +0200
Subject: [Gluster-users] Pending heal status when deleting files which are marked as to be healed
Message-ID:

Hello Gluster Community,

my newest observation concerns the self-heal daemon.

Scenario: 2-node Gluster v5.5 cluster with a Replica 2 volume, just one brick per node, access via SMB client from a Win10 machine.

How to reproduce: I created a small folder with a lot of small files and copied that folder recursively into itself a few times. Additionally I copied three big folders with a lot of content into the root of the volume. Note: no node was down, nor any brick, so the whole volume was accessible. Because of the recursive copy action, all of the copied files were listed as to be healed (via gluster heal info). Now I set some of the affected files read-only (they get WORMed because worm-file-level is enabled). After this I tried to delete the parent folder of those files.

Expected: all files should be healed.
Actually: the files which are read-only are not healed. heal info permanently shows that these files have to be healed. The glustershd log throws errors, and the brick log (with level DEBUG) permanently throws a lot of messages which I don't understand. See the attached file, which contains all information (also heal info and volume info) besides the logs.

Maybe some of you know what's going on there? Since we can reproduce this scenario, we can give more debug information if needed.

Regards
David Spisla
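When entries are stuck in heal info like this, the AFR changelog xattrs on the bricks usually say why; a sketch, with the volume name, brick path and file path as placeholders:

# gluster volume heal volume1 info
# getfattr -d -m trusted.afr -e hex /gluster/brick/path/to/stuck/file

Non-zero trusted.afr.*-client-* values show which copy is marked dirty. If shd cannot modify the file to heal it (for example, if a WORM translator rejects the write), those values would never get cleared - a hypothesis consistent with the symptom above, not a confirmed diagnosis.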
From dcunningham at voisonics.com Mon Jun 17 22:34:31 2019
From: dcunningham at voisonics.com (David Cunningham)
Date: Tue, 18 Jun 2019 10:34:31 +1200
Subject: [Gluster-users] Thin-arbiter questions
In-Reply-To: <2006209001.22227657.1560230122663.JavaMail.zimbra@redhat.com>
Message-ID:

Hi Ashish,

Thanks for that. I guess it's not your responsibility, but do you know how long it typically takes for new versions to reach the CentOS package system after being released?

On Tue, 11 Jun 2019 at 17:15, Ashish Pandey wrote:
> It should be any time soon as we are in the last phase of patch reviews.
> You can follow this patch - https://review.gluster.org/#/c/glusterfs/+/22612/

--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782

From hgowtham at redhat.com Tue Jun 18 05:24:19 2019
From: hgowtham at redhat.com (Hari Gowtham)
Date: Tue, 18 Jun 2019 10:54:19 +0530
Subject: [Gluster-users] Thin-arbiter questions
Message-ID:

Hi David,

Once a feature is added to the master branch, we have to backport it to the release 5, 6 and other such active branches. These release branches are tagged every month around the 10th. If a feature has been backported to the particular release branch before tagging, it will be part of that tag, and the tag is what the packages are created from. This is the procedure for CentOS, Fedora and Debian.

Regards,
Hari.

On Tue, 18 Jun, 2019, 4:06 AM David Cunningham, wrote:
> Do you know how long it typically takes for new versions to reach the
> CentOS package system after being released?
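In plain git terms, the backport step Hari describes looks roughly like this (illustrative only; actual gluster backports go through review on review.gluster.org rather than a direct push, and the branch and sha below are placeholders):

$ git checkout release-6
$ git cherry-pick -x abc1234    # -x records the original master commit id in the message
$ # post the cherry-picked change for review on the release-6 branch

Only a change merged on the release branch before the monthly tagging makes it into that month's packages.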
From dcunningham at voisonics.com Tue Jun 18 22:53:47 2019
From: dcunningham at voisonics.com (David Cunningham)
Date: Wed, 19 Jun 2019 10:53:47 +1200
Subject: [Gluster-users] Thin-arbiter questions
Message-ID:

Hi Hari,

Thanks for that information. So if I understand correctly, if thin-arbiter is committed to the master branch by the 10th of July, then it should be in the CentOS package fairly soon afterwards?

I have a customer asking when we can use it, hence the questions. Thank you.

On Tue, 18 Jun 2019 at 17:24, Hari Gowtham wrote:
> Once a feature is added to the master branch, we have to backport it to
> the release 5, 6 and other such active branches. And these release
> branches will be tagged every month around the 10th.

--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782
From wkmail at bneit.com Wed Jun 19 00:25:38 2019
From: wkmail at bneit.com (wkmail)
Date: Tue, 18 Jun 2019 17:25:38 -0700
Subject: [Gluster-users] Thin arbiter daemon on non-thin setup?
Message-ID:

On a brand new Ubuntu 18 / Gluster 6.2 replicate 3 arbiter 1 (normal arbiter) setup.

glusterfs-server/bionic,now 6.2-ubuntu1~bionic1 amd64 [installed]
  clustered file-system (server package)

Systemd is degraded and I see this in the systemctl listing:

gluster-ta-volume.service    loaded failed failed    GlusterFS, Thin-arbiter process to maintain quorum for replica volume

systemctl status shows this:

gluster-ta-volume.service - GlusterFS, Thin-arbiter process to maintain quorum for replica volume
   Loaded: loaded (/lib/systemd/system/gluster-ta-volume.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Sun 2019-06-16 12:36:15 PDT; 2 days ago
  Process: 13020 ExecStart=/usr/sbin/glusterfsd -N --volfile-id ta-vol -f /var/lib/glusterd/thin-arbiter/thin-arbiter.vol --brick-port 24007 --xlator-option ta-vol-server.transport.socket.listen-port=24007 (code=exited, status=255)
 Main PID: 13020 (code=exited, status=255)

Jun 16 12:36:15 onetest3.pixelgate.net systemd[1]: gluster-ta-volume.service: Service hold-off time over, scheduling restart.
Jun 16 12:36:15 onetest3.pixelgate.net systemd[1]: gluster-ta-volume.service: Scheduled restart job, restart counter is at 5.
Jun 16 12:36:15 onetest3.pixelgate.net systemd[1]: Stopped GlusterFS, Thin-arbiter process to maintain quorum for replica volume.
Jun 16 12:36:15 onetest3.pixelgate.net systemd[1]: gluster-ta-volume.service: Start request repeated too quickly.
Jun 16 12:36:15 onetest3.pixelgate.net systemd[1]: gluster-ta-volume.service: Failed with result 'exit-code'.
Jun 16 12:36:15 onetest3.pixelgate.net systemd[1]: Failed to start GlusterFS, Thin-arbiter process to maintain quorum for replica volume.

Since I am not using Thin Arbiter, I am a little confused. The Gluster setup itself seems fine and seems to work normally.

root at onetest2:/var/log/libvirt/qemu# gluster peer status
Number of Peers: 2

Hostname: onetest1.gluster
Uuid: 79dc67df-c606-42f8-bbee-f7e73c730eb8
State: Peer in Cluster (Connected)

Hostname: onetest3.gluster
Uuid: d4e3330b-eaac-4a54-ad2e-a0da1114ec09
State: Peer in Cluster (Connected)

root at onetest2:/var/log/libvirt/qemu# gluster volume info

Volume Name: gv0
Type: Replicate
Volume ID: 1a80b833-0850-4ddb-83fa-f36da2b7a8fc
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: onetest2.gluster:/GLUSTER/gv0
Brick2: onetest3.gluster:/GLUSTER/gv0
Brick3: onetest1.gluster:/GLUSTER/gv0 (arbiter)

Thoughts? Can I just disable or remove that service?

Sincerely,
W Kern

From aspandey at redhat.com Wed Jun 19 02:44:10 2019
From: aspandey at redhat.com (Ashish Pandey)
Date: Tue, 18 Jun 2019 22:44:10 -0400 (EDT)
Subject: [Gluster-users] Thin arbiter daemon on non-thin setup?
Message-ID: <2014326970.23604883.1560912250773.JavaMail.zimbra@redhat.com>

Hi,

Yes, you can stop/disable gluster-ta-volume.service using the systemctl command. I will also check and see why it is even trying to load thin-arbiter for a non thin-arbiter volume, but for now you can just disable it.

---
Ashish

----- Original Message -----
From: "wkmail"
Sent: Wednesday, June 19, 2019 5:55:38 AM
Subject: [Gluster-users] Thin arbiter daemon on non-thin setup?

Since I am not using Thin Arbiter, I am a little confused. The Gluster setup itself seems fine and seems to work normally. Thoughts? Can I just disable or remove that service?
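Spelled out, the disable-plus-cleanup Ashish describes, including clearing the "degraded" state systemd reports (standard systemd commands):

# journalctl -u gluster-ta-volume.service -b    # optional: see why glusterfsd exited with status 255
# systemctl stop gluster-ta-volume.service
# systemctl disable gluster-ta-volume.service
# systemctl reset-failed gluster-ta-volume.service

reset-failed is what removes the unit from the failed list, so `systemctl status` stops showing the system as degraded.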
>>>> You can follow this patch - >>>> https://review.gluster.org/#/c/glusterfs/+/22612/ >>>> >>>> --- >>>> Ashish >>>> >>>> ------------------------------ >>>> *From: *"David Cunningham" >>>> *To: *"Ashish Pandey" >>>> *Cc: *"gluster-users" >>>> *Sent: *Tuesday, June 11, 2019 9:55:40 AM >>>> *Subject: *Re: [Gluster-users] Thin-arbiter questions >>>> >>>> Hi Ashish and Amar, >>>> >>>> Is there any news on when thin-arbiter might be in the regular >>>> GlusterFS, and the CentOS packages please? >>>> >>>> Thanks for your help. >>>> >>>> >>>> On Mon, 6 May 2019 at 20:34, Ashish Pandey wrote: >>>> >>>>> >>>>> >>>>> ------------------------------ >>>>> *From: *"David Cunningham" >>>>> *To: *"Ashish Pandey" >>>>> *Cc: *"gluster-users" >>>>> *Sent: *Monday, May 6, 2019 1:40:30 PM >>>>> *Subject: *Re: [Gluster-users] Thin-arbiter questions >>>>> >>>>> Hi Ashish, >>>>> >>>>> Thank you for the update. Does that mean they're now in the regular >>>>> Glusterfs? Any idea how long it typically takes the Ubuntu and CentOS >>>>> packages to be updated with the latest code? >>>>> >>>>> No, for regular glusterd, work is still in progress. It will be done >>>>> soon. >>>>> I don't have answer for the next question. May be Amar have >>>>> information regarding this. Adding him in CC. >>>>> >>>>> >>>>> On Mon, 6 May 2019 at 18:21, Ashish Pandey >>>>> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> I can see that Amar has already committed the changes and those are >>>>>> visible on >>>>>> https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/ >>>>>> >>>>>> --- >>>>>> Ashish >>>>>> >>>>>> >>>>>> >>>>>> ------------------------------ >>>>>> *From: *"Strahil" >>>>>> *To: *"Ashish" , "David" < >>>>>> dcunningham at voisonics.com> >>>>>> *Cc: *"gluster-users" >>>>>> *Sent: *Saturday, May 4, 2019 12:10:01 AM >>>>>> *Subject: *Re: [Gluster-users] Thin-arbiter questions >>>>>> >>>>>> Hi Ashish, >>>>>> >>>>>> Can someone commit the doc change I have already proposed ? >>>>>> At least, the doc will clarify that fact . >>>>>> >>>>>> Best Regards, >>>>>> Strahil Nikolov >>>>>> On May 3, 2019 05:30, Ashish Pandey wrote: >>>>>> >>>>>> Hi David, >>>>>> >>>>>> Creation of thin-arbiter volume is currently supported by GD2 only. >>>>>> The command "glustercli" is available when glusterd2 is running. >>>>>> We are also working on providing thin-arbiter support on glusted >>>>>> however, it is not available right now. >>>>>> https://review.gluster.org/#/c/glusterfs/+/22612/ >>>>>> >>>>>> --- >>>>>> Ashish >>>>>> >>>>>> ------------------------------ >>>>>> *From: *"David Cunningham" >>>>>> *To: *gluster-users at gluster.org >>>>>> *Sent: *Friday, May 3, 2019 7:40:03 AM >>>>>> *Subject: *[Gluster-users] Thin-arbiter questions >>>>>> >>>>>> Hello, >>>>>> >>>>>> We are setting up a thin-arbiter and hope someone can help with some >>>>>> questions. We've been following the documentation from >>>>>> https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/ >>>>>> . >>>>>> >>>>>> 1. What release of 5.x supports thin-arbiter? We tried a "gluster >>>>>> volume create" with the --thin-arbiter option on 5.5 and got an >>>>>> "unrecognized option --thin-arbiter" error. >>>>>> >>>>>> 2. The instruction to create a new volume with a thin-arbiter is >>>>>> clear. How do you add a thin-arbiter to an already existing volume though? >>>>>> >>>>>> 3. The documentation suggests running glusterfsd manually to start >>>>>> the thin-arbiter. Is there a service that can do this instead? 
I found a >>>>>> mention of one in https://bugzilla.redhat.com/show_bug.cgi?id=1579786 >>>>>> but it's not really documented. >>>>>> >>>>>> Thanks in advance for your help, >>>>>> >>>>>> -- >>>>>> David Cunningham, Voisonics Limited >>>>>> http://voisonics.com/ >>>>>> USA: +1 213 221 1092 >>>>>> New Zealand: +64 (0)28 2558 3782 >>>>>> >>>>>> _______________________________________________ >>>>>> Gluster-users mailing list >>>>>> Gluster-users at gluster.org >>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Gluster-users mailing list >>>>>> Gluster-users at gluster.org >>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>> >>>>>> >>>>> >>>>> -- >>>>> David Cunningham, Voisonics Limited >>>>> http://voisonics.com/ >>>>> USA: +1 213 221 1092 >>>>> New Zealand: +64 (0)28 2558 3782 >>>>> >>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> Gluster-users at gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>> >>>>> >>>> >>>> -- >>>> David Cunningham, Voisonics Limited >>>> http://voisonics.com/ >>>> USA: +1 213 221 1092 >>>> New Zealand: +64 (0)28 2558 3782 >>>> >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> >>> >>> -- >>> David Cunningham, Voisonics Limited >>> http://voisonics.com/ >>> USA: +1 213 221 1092 >>> New Zealand: +64 (0)28 2558 3782 >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.buitelaar at gmail.com Wed Jun 19 12:21:29 2019 From: olaf.buitelaar at gmail.com (Olaf Buitelaar) Date: Wed, 19 Jun 2019 14:21:29 +0200 Subject: [Gluster-users] glusterd crashes on Assertion failed: rsp.op == txn_op_info.op Message-ID: Dear All, Has anybody seen this error on gluster 5.6; [glusterd-rpc-ops.c:1388:__glusterd_commit_op_cbk] (-->/lib64/libgfrpc.so.0(+0xec60) [0x7fbfb7801c60] -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x79b7a) [0x7fbfac50db7a] -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x77393) [0x7fbfac50b393] ) 0-: Assertion failed: rsp.op == txn_op_info.op checking the code; https://github.com/gluster/glusterfs/blob/6fd8281ac9af58609979f660ece58c2ed1100e72/xlators/mgmt/glusterd/src/glusterd-rpc-ops.c#L1388 doesn't seem to reveal much on what could causing this. It's the second time this occurs. Attached the full stack. Thanks Olaf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- [2019-06-19 07:25:03.289265] E [glusterd-rpc-ops.c:1388:__glusterd_commit_op_cbk] (-->/lib64/libgfrpc.so.0(+0xec60) [0x7fbfb7801c60] -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x79b7a) [0x7fbfac50db7a] -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x77393) [0x7fbfac50b393] ) 0-: Assertion failed: rsp.op == txn_op_info.op [2019-06-19 07:25:03.289410] E [glusterd-rpc-ops.c:1388:__glusterd_commit_op_cbk] (-->/lib64/libgfrpc.so.0(+0xec60) [0x7fbfb7801c60] -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x79b7a) [0x7fbfac50db7a] -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x77393) [0x7fbfac50b393] ) 0-: Assertion failed: rsp.op == txn_op_info.op [2019-06-19 07:25:03.290103] E [glusterd-rpc-ops.c:1388:__glusterd_commit_op_cbk] (-->/lib64/libgfrpc.so.0(+0xec60) [0x7fbfb7801c60] -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x79b7a) [0x7fbfac50db7a] -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x77393) [0x7fbfac50b393] ) 0-: Assertion failed: rsp.op == txn_op_info.op [2019-06-19 07:25:03.291407] E [glusterd-rpc-ops.c:1388:__glusterd_commit_op_cbk] (-->/lib64/libgfrpc.so.0(+0xec60) [0x7fbfb7801c60] -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x79b7a) [0x7fbfac50db7a] -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x77393) [0x7fbfac50b393] ) 0-: Assertion failed: rsp.op == txn_op_info.op [2019-06-19 07:25:03.299574] E [glusterd-rpc-ops.c:1388:__glusterd_commit_op_cbk] (-->/lib64/libgfrpc.so.0(+0xec60) [0x7fbfb7801c60] -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x79b7a) [0x7fbfac50db7a] -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x77393) [0x7fbfac50b393] ) 0-: Assertion failed: rsp.op == txn_op_info.op pending frames: frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : 
type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) 
op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) patchset: git://git.gluster.org/glusterfs.git signal received: 11 time of crash: 2019-06-19 07:25:03 configuration details: argp 1 backtrace 1 dlfcn 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 5.6 /lib64/libglusterfs.so.0(+0x26610)[0x7fbfb7a35610] /lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7fbfb7a3fbc4] /lib64/libc.so.6(+0x36280)[0x7fbfb6098280] /lib64/libglusterfs.so.0(+0x197a8)[0x7fbfb7a287a8] /lib64/libglusterfs.so.0(+0x1ce18)[0x7fbfb7a2be18] /lib64/libglusterfs.so.0(dict_get_strn+0x67)[0x7fbfb7a2d887] /usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x63ef8)[0x7fbfac4f7ef8] /usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x775dc)[0x7fbfac50b5dc] /usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x79b7a)[0x7fbfac50db7a] /lib64/libgfrpc.so.0(+0xec60)[0x7fbfb7801c60] /lib64/libgfrpc.so.0(+0xf033)[0x7fbfb7802033] /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fbfb77fdf23] /usr/lib64/glusterfs/5.6/rpc-transport/socket.so(+0xa026)[0x7fbfa94e4026] /lib64/libglusterfs.so.0(+0x8aa79)[0x7fbfb7a99a79] /lib64/libpthread.so.0(+0x7dd5)[0x7fbfb6898dd5] /lib64/libc.so.6(clone+0x6d)[0x7fbfb615fead] --------- From amukherj at redhat.com Wed Jun 19 15:57:14 2019 From: amukherj at redhat.com (Atin Mukherjee) Date: Wed, 19 Jun 2019 21:27:14 +0530 Subject: [Gluster-users] glusterd crashes on Assertion failed: rsp.op == txn_op_info.op In-Reply-To: References: Message-ID: Please see - https://bugzilla.redhat.com/show_bug.cgi?id=1655827 On Wed, Jun 19, 2019 at 5:52 PM Olaf Buitelaar wrote: > Dear All, > > Has anybody seen this error on gluster 5.6; > [glusterd-rpc-ops.c:1388:__glusterd_commit_op_cbk] > (-->/lib64/libgfrpc.so.0(+0xec60) [0x7fbfb7801c60] > 
-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x79b7a) > [0x7fbfac50db7a] > -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x77393) > [0x7fbfac50b393] ) 0-: Assertion failed: rsp.op == txn_op_info.op > > checking the code; > https://github.com/gluster/glusterfs/blob/6fd8281ac9af58609979f660ece58c2ed1100e72/xlators/mgmt/glusterd/src/glusterd-rpc-ops.c#L1388 > > doesn't seem to reveal much on what could be causing this. > > It's the second time this occurs. > > Attached the full stack. > > Thanks Olaf > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From ravishankar at redhat.com Wed Jun 19 16:06:10 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Wed, 19 Jun 2019 21:36:10 +0530 Subject: [Gluster-users] Pending heal status when deleting files which are marked as to be healed In-Reply-To: References: Message-ID: <165ac2cb-4e12-81bf-bb47-ef800adf6652@redhat.com> On 17/06/19 3:45 PM, David Spisla wrote: > Hello Gluster Community, > > my newest observation concerns the self heal daemon: > Scenario: 2 Node Gluster v5.5 Cluster with Replica 2 Volume. Just one > brick per node. Access via SMB Client from a Win10 machine > > How to reproduce: > I have created a small folder with a lot of small files and I copied > that folder recursively into itself for a few times. Additionally I > copied three big folders with a lot of content into the root of the > volume. > Note: There was no node down or something else like brick down, etc.. > So the whole volume was accessible. > > Because of the recursive copy action all these copied files were > listed as to be healed (via gluster heal info). This is odd. How did you conclude that writing to the volume (i.e. recursive copy) was the reason for the files to be needing heal? Did you check if there were any gluster messages about disconnects in the smb client logs? > Now I set some of the affected files ReadOnly (they get WORMed because > worm-file-level is enabled). After this I tried to delete the parent > folder of those files. > > Expected: All files should be healed > Actually: All files which are Read-Only are not healed. heal info > shows permanently that these files have to be healed. Does disabling read-only let the files be healed? > > glustershd log throws errors and the brick log (with level DEBUG) > permanently throws a lot of messages which I don't understand. See the > attached file which contains all information, also heal info and > volume info, beside the logs > > Maybe some of you know what's going on there? Since we can reproduce > this scenario, we can give more debug information if needed. Is it possible to script the list of steps to reproduce this issue? Regards, Ravi > > Regards > David Spisla > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.buitelaar at gmail.com Wed Jun 19 17:00:25 2019 From: olaf.buitelaar at gmail.com (Olaf Buitelaar) Date: Wed, 19 Jun 2019 19:00:25 +0200 Subject: [Gluster-users] glusterd crashes on Assertion failed: rsp.op == txn_op_info.op In-Reply-To: References: Message-ID: Hi Atin, Thank you for pointing out this bug report, however no rebalancing task was running during this event.
So maybe something else is causing this? According to the report this should be fixed in gluster 6; unfortunately oVirt doesn't seem to officially support that version, so I'm stuck on the 5 branch for now. Any chance this will be backported? Thanks Olaf Op wo 19 jun. 2019 om 17:57 schreef Atin Mukherjee : > Please see - https://bugzilla.redhat.com/show_bug.cgi?id=1655827 > > > > On Wed, Jun 19, 2019 at 5:52 PM Olaf Buitelaar > wrote: > >> Dear All, >> >> Has anybody seen this error on gluster 5.6; >> [glusterd-rpc-ops.c:1388:__glusterd_commit_op_cbk] >> (-->/lib64/libgfrpc.so.0(+0xec60) [0x7fbfb7801c60] >> -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x79b7a) >> [0x7fbfac50db7a] >> -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x77393) >> [0x7fbfac50b393] ) 0-: Assertion failed: rsp.op == txn_op_info.op >> >> checking the code; >> https://github.com/gluster/glusterfs/blob/6fd8281ac9af58609979f660ece58c2ed1100e72/xlators/mgmt/glusterd/src/glusterd-rpc-ops.c#L1388 >> >> doesn't seem to reveal much on what could be causing this. >> >> It's the second time this occurs. >> >> Attached the full stack. >> >> Thanks Olaf >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From srakonde at redhat.com Thu Jun 20 06:34:51 2019 From: srakonde at redhat.com (Sanju Rakonde) Date: Thu, 20 Jun 2019 12:04:51 +0530 Subject: [Gluster-users] glusterd crashes on Assertion failed: rsp.op == txn_op_info.op In-Reply-To: References: Message-ID: Olaf, Can you please paste the complete backtrace from the core file, so that we can analyse what is wrong here. On Wed, Jun 19, 2019 at 10:31 PM Olaf Buitelaar wrote: > Hi Atin, > > Thank you for pointing out this bug report, however no rebalancing task > was running during this event. So maybe something else is causing this? > According to the report this should be fixed in gluster 6; unfortunately > oVirt doesn't seem to officially support that version, so I'm stuck on the 5 > branch for now. > Any chance this will be backported? > > Thanks Olaf > > > Op wo 19 jun. 2019 om 17:57 schreef Atin Mukherjee : > >> Please see - https://bugzilla.redhat.com/show_bug.cgi?id=1655827 >> >> >> >> On Wed, Jun 19, 2019 at 5:52 PM Olaf Buitelaar >> wrote: >> >>> Dear All, >>> >>> Has anybody seen this error on gluster 5.6; >>> [glusterd-rpc-ops.c:1388:__glusterd_commit_op_cbk] >>> (-->/lib64/libgfrpc.so.0(+0xec60) [0x7fbfb7801c60] >>> -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x79b7a) >>> [0x7fbfac50db7a] >>> -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x77393) >>> [0x7fbfac50b393] ) 0-: Assertion failed: rsp.op == txn_op_info.op >>> >>> checking the code; >>> https://github.com/gluster/glusterfs/blob/6fd8281ac9af58609979f660ece58c2ed1100e72/xlators/mgmt/glusterd/src/glusterd-rpc-ops.c#L1388 >>> >>> doesn't seem to reveal much on what could be causing this. >>> >>> It's the second time this occurs. >>> >>> Attached the full stack.
>>> Thanks Olaf >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Thanks, Sanju -------------- next part -------------- An HTML attachment was scrubbed... URL: From cristian.delcarlo at targetsolutions.it Thu Jun 20 10:12:34 2019 From: cristian.delcarlo at targetsolutions.it (Cristian Del Carlo) Date: Thu, 20 Jun 2019 12:12:34 +0200 Subject: [Gluster-users] General questions Message-ID: Hi, I'm testing glusterfs before using it in production; it should be used to store VMs for nodes with libvirtd. In production I will have 4 nodes connected with a dedicated 20gbit/s network. Which version should I use in production on CentOS 7.x? Should I use Gluster version 6? To make the volume available to libvirtd, is the best method to use FUSE? I see that striped is deprecated. Is it reasonable to use the volume with 3 replicas on 4 nodes and sharding enabled? Is there an advantage to using a sharded volume in this context? I think it could positively impact read performance or rebalancing. Is that true? In the vm configuration I use the virtio disk. How is it better to set the disk cache to get the best performance: none, default or writeback? Thanks in advance for your patience and answers. Thanks, *Cristian Del Carlo* -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Thu Jun 20 10:26:18 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Thu, 20 Jun 2019 13:26:18 +0300 Subject: [Gluster-users] General questions Message-ID: Hi, Are you planning to use oVirt or plain KVM or openstack? I would recommend you to use gluster v6.1 as it is the latest stable version and will have longer support than the older versions. Fuse vs libgfapi - use the latter as it has better performance and less overhead on the host. oVirt supports both libgfapi and fuse. Also, use replica 3 because you will have better read performance compared to replica 2 arbiter 1. Sharding is a tradeoff between CPU (when there is no sharding, gluster shd must calculate the offset of the VM disk) and bandwidth (the whole shard is replicated even if only 512 bytes need to be synced). If you will do live migration - you do not want to cache, in order to avoid corruption. Thus oVirt is using direct I/O. Still, you can check the gluster settings mentioned in Red Hat documentation for Virt/openStack. Best Regards, Strahil Nikolov On Jun 20, 2019 13:12, Cristian Del Carlo wrote: > > Hi, > > I'm testing glusterfs before using it in production; it should be used to store VMs for nodes with libvirtd. > > In production I will have 4 nodes connected with a dedicated 20gbit/s network. > > Which version should I use in production on CentOS 7.x? Should I use Gluster version 6? > > To make the volume available to libvirtd, is the best method to use FUSE? > > I see that striped is deprecated. Is it reasonable to use the volume with 3 replicas on 4 nodes and sharding enabled? > Is there an advantage to using a sharded volume in this context? I think it could positively impact read performance or rebalancing. Is that true? > > In the vm configuration I use the virtio disk. How is it better to set the disk cache to get the best performance: none, default or writeback? > > Thanks in advance for your patience and answers.
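To follow up Strahil's pointer at the Red Hat Virt/openStack settings: glusterfs ships a canned option group for VM image stores that applies those recommendations in one step. A sketch; "vmstore" is a placeholder volume name, and the group file is worth reviewing before applying it:

# list what the packaged profile would set
cat /var/lib/glusterd/groups/virt

# apply the whole option group to the volume
gluster volume set vmstore group virt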
> > Thanks, > > > Cristian Del Carlo -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.buitelaar at gmail.com Thu Jun 20 12:52:11 2019 From: olaf.buitelaar at gmail.com (Olaf Buitelaar) Date: Thu, 20 Jun 2019 14:52:11 +0200 Subject: [Gluster-users] glusterd crashes on Assertion failed: rsp.op == txn_op_info.op In-Reply-To: References: Message-ID: Hi Sanju, you can download the coredump here; http://edgecastcdn.net/0004FA/files/core_dump.zip (around 20MB) Thanks Olaf Op do 20 jun. 2019 om 08:35 schreef Sanju Rakonde : > Olaf, > > Can you please paste the complete backtrace from the core file, so that we can > analyse what is wrong here. > > On Wed, Jun 19, 2019 at 10:31 PM Olaf Buitelaar > wrote: > >> Hi Atin, >> >> Thank you for pointing out this bug report, however no rebalancing task >> was running during this event. So maybe something else is causing this? >> According to the report this should be fixed in gluster 6; unfortunately oVirt >> doesn't seem to officially support that version, so I'm stuck on the 5 >> branch for now. >> Any chance this will be backported? >> >> Thanks Olaf >> >> >> Op wo 19 jun. 2019 om 17:57 schreef Atin Mukherjee : >> >>> Please see - https://bugzilla.redhat.com/show_bug.cgi?id=1655827 >>> >>> >>> >>> On Wed, Jun 19, 2019 at 5:52 PM Olaf Buitelaar >>> wrote: >>> >>>> Dear All, >>>> >>>> Has anybody seen this error on gluster 5.6; >>>> [glusterd-rpc-ops.c:1388:__glusterd_commit_op_cbk] >>>> (-->/lib64/libgfrpc.so.0(+0xec60) [0x7fbfb7801c60] >>>> -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x79b7a) >>>> [0x7fbfac50db7a] >>>> -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x77393) >>>> [0x7fbfac50b393] ) 0-: Assertion failed: rsp.op == txn_op_info.op >>>> >>>> checking the code; >>>> https://github.com/gluster/glusterfs/blob/6fd8281ac9af58609979f660ece58c2ed1100e72/xlators/mgmt/glusterd/src/glusterd-rpc-ops.c#L1388 >>>> >>>> doesn't seem to reveal much on what could be causing this. >>>> >>>> It's the second time this occurs. >>>> >>>> Attached the full stack. >>>> >>>> Thanks Olaf >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Thanks, > Sanju > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.buitelaar at gmail.com Thu Jun 20 14:00:07 2019 From: olaf.buitelaar at gmail.com (Olaf Buitelaar) Date: Thu, 20 Jun 2019 16:00:07 +0200 Subject: [Gluster-users] glusterd crashes on Assertion failed: rsp.op == txn_op_info.op In-Reply-To: References: Message-ID: Hi Sanju, going through the stacks I noticed that this function was in between: glusterd_volume_rebalance_use_rsp_dict So it might after all have something to do with the rebalancing logic. I've checked cmd_history.log, and exactly at the time of the crash this command was executed: [2019-06-19 07:25:03.108360] : volume rebalance ovirt-data status : SUCCESS preceded by a couple of other rebalance status checks. The complete batch of the 2 minutes before all reported success. These commands are executed by oVirt about every 2 minutes, to poll for the status of gluster.
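As a sketch of the check Olaf describes, the command history glusterd keeps can be grepped around the crash timestamp (the log path is the glusterd default; adjust if your installation logs elsewhere):

# every management operation glusterd executes is recorded with a timestamp
grep 'volume rebalance' /var/log/glusterfs/cmd_history.log | tail -n 20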
I'm sure no actual rebalancing tasks were running; I also checked the last one, which ran at 2019-06-08 21:13:02 and completed successfully. Hopefully this is additional useful info. Thanks Olaf Op do 20 jun. 2019 om 14:52 schreef Olaf Buitelaar : > Hi Sanju, > > you can download the coredump here; > http://edgecastcdn.net/0004FA/files/core_dump.zip (around 20MB) > > Thanks Olaf > > Op do 20 jun. 2019 om 08:35 schreef Sanju Rakonde : > >> Olaf, >> >> Can you please paste the complete backtrace from the core file, so that we >> can analyse what is wrong here. >> >> On Wed, Jun 19, 2019 at 10:31 PM Olaf Buitelaar >> wrote: >> >>> Hi Atin, >>> >>> Thank you for pointing out this bug report, however no rebalancing task >>> was running during this event. So maybe something else is causing this? >>> According to the report this should be fixed in gluster 6; unfortunately >>> oVirt doesn't seem to officially support that version, so I'm stuck on the >>> 5 branch for now. >>> Any chance this will be backported? >>> >>> Thanks Olaf >>> >>> >>> Op wo 19 jun. 2019 om 17:57 schreef Atin Mukherjee >>> >: >>> >>>> Please see - https://bugzilla.redhat.com/show_bug.cgi?id=1655827 >>>> >>>> >>>> >>>> On Wed, Jun 19, 2019 at 5:52 PM Olaf Buitelaar < >>>> olaf.buitelaar at gmail.com> wrote: >>>> >>>>> Dear All, >>>>> >>>>> Has anybody seen this error on gluster 5.6; >>>>> [glusterd-rpc-ops.c:1388:__glusterd_commit_op_cbk] >>>>> (-->/lib64/libgfrpc.so.0(+0xec60) [0x7fbfb7801c60] >>>>> -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x79b7a) >>>>> [0x7fbfac50db7a] >>>>> -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x77393) >>>>> [0x7fbfac50b393] ) 0-: Assertion failed: rsp.op == txn_op_info.op >>>>> >>>>> checking the code; >>>>> https://github.com/gluster/glusterfs/blob/6fd8281ac9af58609979f660ece58c2ed1100e72/xlators/mgmt/glusterd/src/glusterd-rpc-ops.c#L1388 >>>>> >>>>> doesn't seem to reveal much on what could be causing this. >>>>> >>>>> It's the second time this occurs. >>>>> >>>>> Attached the full stack. >>>>> >>>>> Thanks Olaf >>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> Gluster-users at gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> >> -- >> Thanks, >> Sanju >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cristian.delcarlo at targetsolutions.it Thu Jun 20 15:31:48 2019 From: cristian.delcarlo at targetsolutions.it (Cristian Del Carlo) Date: Thu, 20 Jun 2019 17:31:48 +0200 Subject: [Gluster-users] General questions In-Reply-To: References: Message-ID: Hi, thanks for your help. I am planning to use libvirtd with plain KVM. OK, I will use libgfapi. I'm confused about the use of sharding: is it useful in this configuration? Doesn't sharding help limit the bandwidth in the event of a rebalancing? So in the VM settings I need to use directsync to avoid corruption. Thanks again, Il giorno gio 20 giu 2019 alle ore 12:25 Strahil ha scritto: > Hi, > > Are you planning to use oVirt or plain KVM or openstack? > > I would recommend you to use gluster v6.1 as it is the latest stable > version and will have longer support than the older versions. > > Fuse vs libgfapi - use the latter as it has better performance and less > overhead on the host. oVirt supports both libgfapi and fuse.
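To make the libgfapi point concrete for plain KVM: QEMU can talk to the volume directly over gluster:// URIs instead of going through a FUSE mount, assuming QEMU is built with gluster support. A sketch with placeholder host and volume names; cache=none keeps the I/O direct, matching the no-cache advice quoted below:

# create the guest image directly on the volume via libgfapi
qemu-img create -f qcow2 gluster://server1/vmstore/vm1.qcow2 50G

# boot a guest from it with caching disabled
qemu-system-x86_64 -m 2048 -drive file=gluster://server1/vmstore/vm1.qcow2,format=qcow2,if=virtio,cache=none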
> > Also, use replica 3 because you will have better read performance compared > to replica 2 arbiter 1. > > Sharding is a tradeoff between CPU (when there is no sharding , gluster > shd must calculate the offset of the VM disk) and bandwidth (whole shard > is being replicated despite even 512 need to be synced). > > If you will do live migration - you do not want to cache in order to > avoid corruption. > Thus oVirt is using direct I/O. > Still, you can check the gluster settings mentioned in Red Hat > documentation for Virt/openStack . > > Best Regards, > Strahil Nikolov > On Jun 20, 2019 13:12, Cristian Del Carlo < > cristian.delcarlo at targetsolutions.it> wrote: > > Hi, > > I'm testing glusterfs before using it in production, it should be used to > store vm for nodes with libvirtd. > > In production I will have 4 nodes connected with a dedicated 20gbit/s > network. > > Which version to use in production on a centos 7.x? Should I use Gluster > version 6? > > To make the volume available to libvirtd the best method is to use FUSE? > > I see that stripped is deprecated. Is it reasonable to use the volume with > 3 replicas on 4 nodes and sharding enabled? > Is there convenience to use sharding volume in this context? I think could > positive inpact in read performance or rebalance. Is it true? > > In the vm configuration I use the virtio disk. How is it better to set the > disk cache to get the best performances none, default or writeback? > > Thanks in advance for your patience and answers. > > Thanks, > > > *Cristian Del Carlo* > > -- *Cristian Del Carlo* *Target Solutions s.r.l.* *T* +39 0583 1905621 *F* +39 0583 1905675 *@* cristian.delcarlo at targetsolutions.it http://www.targetsolutions.it P.IVA e C.Fiscale: 01815270465 Reg. Imp. di Lucca Capitale Sociale: ?11.000,00 iv - REA n? 173227 Il testo e gli eventuali documenti trasmessi contengono informazioni riservate al destinatario indicato. La seguente e-mail e' confidenziale e la sua riservatezza e' tutelata legalmente dal Decreto Legislativo 196 del 30/06/2003 (Codice di tutela della privacy). La lettura, copia o altro uso non autorizzato o qualsiasi altra azione derivante dalla conoscenza di queste informazioni sono rigorosamente vietate. Qualora abbiate ricevuto questo documento per errore siete cortesemente pregati di darne immediata comunicazione al mittente e di provvedere, immediatamente, alla sua distruzione. -------------- next part -------------- An HTML attachment was scrubbed... URL: From nico at furyweb.fr Thu Jun 20 17:01:40 2019 From: nico at furyweb.fr (nico at furyweb.fr) Date: Thu, 20 Jun 2019 19:01:40 +0200 (CEST) Subject: [Gluster-users] What is TCP/4007 for ? Message-ID: <850101223.444.1561050100860.JavaMail.zimbra@furyweb.fr> I have several Gluster clients behind firewalls and opened 24007:24008,49152:49241 to gluster servers. I recently found that TCP/4007 is blocked from clients to servers but found this port reference nowhere on google search. On server side there's no process listening on TCP/4007 so I would like to disable it on client side, maybe a volume setting should be initialized. Gluster servers are 5.1 Volumes are replica 3 with arbiter Clients are 3.12.15 (Wheezy) or 4.1.5 (Jessie/Stretch) I thank anyone able to give me some information. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hunter86_bg at yahoo.com Thu Jun 20 20:13:37 2019 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Thu, 20 Jun 2019 20:13:37 +0000 (UTC) Subject: [Gluster-users] General questions In-Reply-To: References: Message-ID: <1827082585.3381556.1561061617490@mail.yahoo.com> Sharding is complex. It helps to heal faster - as only the shards that got changed will be replicated - but imagine a 1GB shard that got only 512k updated: in such a case you will copy the whole shard to the other replicas. RHV & oVirt use a default shard size of 4M, which is the exact size of the default PE in LVM. On the other side, it speeds things up, as gluster can balance the shards properly on the replicas and thus you can evenly distribute the load on the cluster. It is not a coincidence that RHV and oVirt use sharding by default. Just a warning. NEVER, EVER, DISABLE SHARDING!!! ONCE ENABLED - STAYS ENABLED! Don't ask how I learnt that :) Best Regards, Strahil Nikolov On Thursday, 20 June 2019 at 18:32:00 GMT+3, Cristian Del Carlo wrote: Hi, thanks for your help. I am planning to use libvirtd with plain KVM. OK, I will use libgfapi. I'm confused about the use of sharding: is it useful in this configuration? Doesn't sharding help limit the bandwidth in the event of a rebalancing? So in the VM settings I need to use directsync to avoid corruption. Thanks again, Il giorno gio 20 giu 2019 alle ore 12:25 Strahil ha scritto: Hi, Are you planning to use oVirt or plain KVM or openstack? I would recommend you to use gluster v6.1 as it is the latest stable version and will have longer support than the older versions. Fuse vs libgfapi - use the latter as it has better performance and less overhead on the host. oVirt supports both libgfapi and fuse. Also, use replica 3 because you will have better read performance compared to replica 2 arbiter 1. Sharding is a tradeoff between CPU (when there is no sharding, gluster shd must calculate the offset of the VM disk) and bandwidth (the whole shard is replicated even if only 512 bytes need to be synced). If you will do live migration - you do not want to cache, in order to avoid corruption. Thus oVirt is using direct I/O. Still, you can check the gluster settings mentioned in Red Hat documentation for Virt/openStack. Best Regards, Strahil Nikolov On Jun 20, 2019 13:12, Cristian Del Carlo wrote: Hi, I'm testing glusterfs before using it in production; it should be used to store VMs for nodes with libvirtd. In production I will have 4 nodes connected with a dedicated 20gbit/s network. Which version should I use in production on CentOS 7.x? Should I use Gluster version 6? To make the volume available to libvirtd, is the best method to use FUSE? I see that striped is deprecated. Is it reasonable to use the volume with 3 replicas on 4 nodes and sharding enabled? Is there an advantage to using a sharded volume in this context? I think it could positively impact read performance or rebalancing. Is that true? In the vm configuration I use the virtio disk. How is it better to set the disk cache to get the best performance: none, default or writeback? Thanks in advance for your patience and answers. Thanks, Cristian Del Carlo -- Cristian Del Carlo Target Solutions s.r.l. T +39 0583 1905621 F +39 0583 1905675 @ cristian.delcarlo at targetsolutions.it http://www.targetsolutions.it P.IVA e C.Fiscale: 01815270465 Reg. Imp. di Lucca Capitale Sociale: €11.000,00 iv - REA n° 173227 Il testo e gli eventuali documenti trasmessi contengono informazioni riservate al destinatario indicato.
La seguente e-mail e' confidenziale e la sua riservatezza e' tutelata legalmente dal Decreto Legislativo 196 del 30/06/2003 (Codice di tutela della privacy). La lettura, copia o altro uso non autorizzato o qualsiasi altra azione derivante dalla conoscenza di queste informazioni sono rigorosamente vietate. Qualora abbiate ricevuto questo documento per errore siete cortesemente pregati di darne immediata comunicazione al mittente e di provvedere, immediatamente, alla sua distruzione. -------------- next part -------------- An HTML attachment was scrubbed... URL: From srakonde at redhat.com Fri Jun 21 05:54:39 2019 From: srakonde at redhat.com (Sanju Rakonde) Date: Fri, 21 Jun 2019 11:24:39 +0530 Subject: [Gluster-users] glusterd crashes on Assertion failed: rsp.op == txn_op_info.op In-Reply-To: References: Message-ID: Thanks for the update Olaf. You are hitting the bug, which is mentioned by Atin in this mail thread. I'm not sure whether we can backport the fix to release-5 branch. I will update you regarding this in early next week. On Thu, Jun 20, 2019 at 7:30 PM Olaf Buitelaar wrote: > Hi Sanju, > > going through the stacks i noticed that this function was in > between; glusterd_volume_rebalance_use_rsp_dict > So it might after all have todo something with the rebalancing logic. > I've checked the cmd_history.log and exactly on the time of crash time > command was executed; > [2019-06-19 07:25:03.108360] : volume rebalance ovirt-data status : > SUCCESS preceding a couple of other status checks of rebalancing. The > complete batch of 2 mins before, all reported success. > These commands are executed by ovirt about every 2 minutes, to pull for > the status of gluster. > I'm sure no actual rebalancing tasks were running, also checked the last > time that was @2019-06-08 21:13:02 and was completed successfully > Hopefully this is additional useful info. > > Thanks Olaf > > Op do 20 jun. 2019 om 14:52 schreef Olaf Buitelaar < > olaf.buitelaar at gmail.com>: > >> Hi Sanju, >> >> you can download the coredump here; >> http://edgecastcdn.net/0004FA/files/core_dump.zip (around 20MB) >> >> Thanks Olaf >> >> Op do 20 jun. 2019 om 08:35 schreef Sanju Rakonde : >> >>> Olaf, >>> >>> Can you please paste complete backtrace from the core file, so that we >>> can analyse what is wrong here. >>> >>> On Wed, Jun 19, 2019 at 10:31 PM Olaf Buitelaar < >>> olaf.buitelaar at gmail.com> wrote: >>> >>>> Hi Atin, >>>> >>>> Thank you for pointing out this bug report, however no rebalancing task >>>> was running during this event. So maybe something else is causing this? >>>> According the report this should be fixed in gluster 6, unfortunate >>>> ovirt doesn't seem to officially support that version, so i'm stuck on the >>>> 5 branch for now. >>>> Any chance this will be back ported? >>>> >>>> Thanks Olaf >>>> >>>> >>>> Op wo 19 jun. 
2019 om 17:57 schreef Atin Mukherjee >>>> >: >>>>> Please see - https://bugzilla.redhat.com/show_bug.cgi?id=1655827 >>>>> >>>>> >>>>> >>>>> On Wed, Jun 19, 2019 at 5:52 PM Olaf Buitelaar < >>>>> olaf.buitelaar at gmail.com> wrote: >>>>> >>>>>> Dear All, >>>>>> >>>>>> Has anybody seen this error on gluster 5.6; >>>>>> [glusterd-rpc-ops.c:1388:__glusterd_commit_op_cbk] >>>>>> (-->/lib64/libgfrpc.so.0(+0xec60) [0x7fbfb7801c60] >>>>>> -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x79b7a) >>>>>> [0x7fbfac50db7a] >>>>>> -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x77393) >>>>>> [0x7fbfac50b393] ) 0-: Assertion failed: rsp.op == txn_op_info.op >>>>>> >>>>>> checking the code; >>>>>> https://github.com/gluster/glusterfs/blob/6fd8281ac9af58609979f660ece58c2ed1100e72/xlators/mgmt/glusterd/src/glusterd-rpc-ops.c#L1388 >>>>>> >>>>>> doesn't seem to reveal much on what could be causing this. >>>>>> >>>>>> It's the second time this occurs. >>>>>> >>>>>> Attached the full stack. >>>>>> >>>>>> Thanks Olaf >>>>>> _______________________________________________ >>>>>> Gluster-users mailing list >>>>>> Gluster-users at gluster.org >>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>> >>>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >>> >>> -- >>> Thanks, >>> Sanju >>> >> -- Thanks, Sanju -------------- next part -------------- An HTML attachment was scrubbed... URL: From cristian.delcarlo at targetsolutions.it Fri Jun 21 07:12:26 2019 From: cristian.delcarlo at targetsolutions.it (Cristian Del Carlo) Date: Fri, 21 Jun 2019 09:12:26 +0200 Subject: [Gluster-users] General questions In-Reply-To: <1827082585.3381556.1561061617490@mail.yahoo.com> References: <1827082585.3381556.1561061617490@mail.yahoo.com> Message-ID: Thanks Strahil, in this link https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html/administration_guide/sect-creating_replicated_volumes i see: *Sharding has one supported use case: in the context of providing Red Hat Gluster Storage as a storage domain for Red Hat Enterprise Virtualization, to provide storage for live virtual machine images. Note that sharding is also a requirement for this use case, as it provides significant performance improvements over previous implementations. * The default settings in GlusterFS 6.1 appear to be: features.shard-block-size 64MB features.shard-lru-limit 16384 features.shard-deletion-rate 100 Bricks in my case are over an xfs filesystem. I'll try different block sizes, but if I understand correctly, small block sizes are preferable to big block sizes, and if I have doubts I will put 4M. Many thanks for the warning, message received! :-) Best Regards, Cristian Il giorno gio 20 giu 2019 alle ore 22:13 Strahil Nikolov < hunter86_bg at yahoo.com> ha scritto: > Sharding is complex. It helps to heal faster - as only the shards that got > changed will be replicated, but imagine a 1GB shard that got only 512k > updated - in such case you will copy the whole shard to the other replicas. > RHV & oVirt use a default shard size of 4M which is the exact size of the > default PE in LVM. > > On the other side, it speeds stuff as gluster can balance the shards > properly on the replicas and thus you can evenly distribute the load on the > cluster. > It is not a coincidence that RHV and oVirt use sharding by default. > > Just a warning. > NEVER, EVER, DISABLE SHARDING!!!
ONCE ENABLED - STAYS ENABLED! > Don't ask how I learnt that :) > > Best Regards, > Strahil Nikolov > > > > On Thursday, 20 June 2019 at 18:32:00 GMT+3, Cristian Del Carlo < > cristian.delcarlo at targetsolutions.it> wrote: > > > Hi, > > thanks for your help. > > I am planning to use libvirtd with plain KVM. > > OK, I will use libgfapi. > > I'm confused about the use of sharding: is it useful in this configuration? > Doesn't sharding help limit the bandwidth in the event of a rebalancing? > > So in the VM settings I need to use directsync to avoid corruption. > > Thanks again, > > Il giorno gio 20 giu 2019 alle ore 12:25 Strahil > ha scritto: > > Hi, > > Are you planning to use oVirt or plain KVM or openstack? > > I would recommend you to use gluster v6.1 as it is the latest stable > version and will have longer support than the older versions. > > Fuse vs libgfapi - use the latter as it has better performance and less > overhead on the host. oVirt supports both libgfapi and fuse. > > Also, use replica 3 because you will have better read performance compared > to replica 2 arbiter 1. > > Sharding is a tradeoff between CPU (when there is no sharding, gluster > shd must calculate the offset of the VM disk) and bandwidth (the whole shard > is replicated even if only 512 bytes need to be synced). > > If you will do live migration - you do not want to cache, in order to > avoid corruption. > Thus oVirt is using direct I/O. > Still, you can check the gluster settings mentioned in Red Hat > documentation for Virt/openStack. > > Best Regards, > Strahil Nikolov > On Jun 20, 2019 13:12, Cristian Del Carlo < > cristian.delcarlo at targetsolutions.it> wrote: > > Hi, > > I'm testing glusterfs before using it in production; it should be used to > store VMs for nodes with libvirtd. > > In production I will have 4 nodes connected with a dedicated 20gbit/s > network. > > Which version should I use in production on CentOS 7.x? Should I use Gluster > version 6? > > To make the volume available to libvirtd, is the best method to use FUSE? > > I see that striped is deprecated. Is it reasonable to use the volume with > 3 replicas on 4 nodes and sharding enabled? > Is there an advantage to using a sharded volume in this context? I think it > could positively impact read performance or rebalancing. Is that true? > > In the vm configuration I use the virtio disk. How is it better to set the > disk cache to get the best performance: none, default or writeback? > > Thanks in advance for your patience and answers. > > Thanks, > > > *Cristian Del Carlo* > > > > -- > > > *Cristian Del Carlo* > > *Target Solutions s.r.l.* > > *T* +39 0583 1905621 > *F* +39 0583 1905675 > *@* cristian.delcarlo at targetsolutions.it > > http://www.targetsolutions.it > P.IVA e C.Fiscale: 01815270465 Reg. Imp. di Lucca > Capitale Sociale: €11.000,00 iv - REA n° 173227 > > Il testo e gli eventuali documenti trasmessi contengono informazioni > riservate al destinatario indicato. La seguente e-mail e' confidenziale e > la sua riservatezza e' tutelata legalmente dal Decreto Legislativo 196 del > 30/06/2003 (Codice di tutela della privacy). La lettura, copia o altro uso > non autorizzato o qualsiasi altra azione derivante dalla conoscenza di > queste informazioni sono rigorosamente vietate. Qualora abbiate ricevuto > questo documento per errore siete cortesemente pregati di darne immediata > comunicazione al mittente e di provvedere, immediatamente, alla sua > distruzione. > -------------- next part -------------- An HTML attachment was scrubbed...
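A quick way to confirm the defaults Cristian quotes on a live system; the volume name below is a placeholder:

# show the sharding-related options ('volume get' lists defaults too)
gluster volume get vmstore all | grep -i shard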
> -------------- next part -------------- An HTML attachment was scrubbed... URL: From kdhananj at redhat.com Fri Jun 21 07:40:30 2019 From: kdhananj at redhat.com (Krutika Dhananjay) Date: Fri, 21 Jun 2019 13:10:30 +0530 Subject: [Gluster-users] General questions In-Reply-To: References: <1827082585.3381556.1561061617490@mail.yahoo.com> Message-ID: Adding (back) gluster-users. -Krutika On Fri, Jun 21, 2019 at 1:09 PM Krutika Dhananjay wrote: > > > On Fri, Jun 21, 2019 at 12:43 PM Cristian Del Carlo < > cristian.delcarlo at targetsolutions.it> wrote: > >> Thanks Strahil, >> >> in this link >> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html/administration_guide/sect-creating_replicated_volumes >> i see: >> >> >> *Sharding has one supported use case: in the context of providing Red Hat >> Gluster Storage as a storage domain for Red Hat Enterprise Virtualization, >> to provide storage for live virtual machine images. Note that sharding is >> also a requirement for this use case, as it provides significant >> performance improvements over previous implementations. * >> >> The deafult setting in GusterFS 6.1 appears to be: >> >> features.shard-block-size 64MB >> >> features.shard-lru-limit 16384 >> >> features.shard-deletion-rate 100 >> > > That's right. Based on the tests we'd conducted internally, we'd found > 64MB to be a good number both in terms of self-heal and IO performance. 4MB > is a little on the lower side in that sense. The benefits of some features > like eager-locking are lost if the shard size is too small. You can perhaps > run some tests with 64MB shard-block-size to begin with, and tune it if it > doesn't fit your needs. > > -Krutika > > >> Bricks in my case are over an xfs filesystem. I'll try different >> block-size but if I understand correctly, small block sizes are preferable >> to big block sizes and If i have doubt I will put 4M. >> >> Very thanks for the warning, message received! :-) >> >> Best Regards, >> >> Cristian >> >> >> Il giorno gio 20 giu 2019 alle ore 22:13 Strahil Nikolov < >> hunter86_bg at yahoo.com> ha scritto: >> >>> Sharding is complex. It helps to heal faster -as only the shards that >>> got changed will be replicated, but imagine a 1GB shard that got only 512k >>> updated - in such case you will copy the whole shard to the other replicas. >>> RHV & oVirt use a default shard size of 4M which is the exact size of >>> the default PE in LVM. >>> >>> On the other side, it speeds stuff as gluster can balance the shards >>> properly on the replicas and thus you can evenly distribute the load on the >>> cluster. >>> It is not a coincidence that RHV and oVirt use sharding by default. >>> >>> Just a warning. >>> NEVER, EVER, DISABLE SHARDING!!! ONCE ENABLED - STAYS ENABLED! >>> Don't ask how I learnGrazie dell'avviso, messaggio ricevuto!t that :) >>> >>> Best Regards, >>> Strahil Nikolov >>> >>> >>> >>> ? ?????????, 20 ??? 2019 ?., 18:32:00 ?. ???????+3, Cristian Del Carlo < >>> cristian.delcarlo at targetsolutions.it> ??????: >>> >>> >>> Hi, >>> >>> thanks for your help. >>> >>> I am planing to use libvirtd with plain KVM. >>> >>> Ok i will use libgfapi. >>> >>> I'm confused about the use of sharding is it useful in this >>> configuration? Doesn't sharding help limit the bandwidth in the event of a >>> rebalancing? >>> >>> In the vm setting so i need to use directsync to avoid corruption. 
>>> >>> Thanks again, >>> >>> Il giorno gio 20 giu 2019 alle ore 12:25 Strahil >>> ha scritto: >>> >>> Hi, >>> >>> Are you planing to use oVirt or plain KVM or openstack? >>> >>> I would recommend you to use gluster v6.1 as it is the latest stable >>> version and will have longer support than the older versions. >>> >>> Fuse vs libgfapi - use the latter as it has better performance and less >>> overhead on the host.oVirt does supports both libgfapi and fuse. >>> >>> Also, use replica 3 because you will have better read performance >>> compared to replica 2 arbiter 1. >>> >>> Sharding is a tradeoff between CPU (when there is no sharding , gluster >>> shd must calculate the offset of the VM disk) and bandwidth (whole shard >>> is being replicated despite even 512 need to be synced). >>> >>> If you will do live migration - you do not want to cache in order to >>> avoid corruption. >>> Thus oVirt is using direct I/O. >>> Still, you can check the gluster settings mentioned in Red Hat >>> documentation for Virt/openStack . >>> >>> Best Regards, >>> Strahil Nikolov >>> On Jun 20, 2019 13:12, Cristian Del Carlo < >>> cristian.delcarlo at targetsolutions.it> wrote: >>> >>> Hi, >>> >>> I'm testing glusterfs before using it in production, it should be used >>> to store vm for nodes with libvirtd. >>> >>> In production I will have 4 nodes connected with a dedicated 20gbit/s >>> network. >>> >>> Which version to use in production on a centos 7.x? Should I use Gluster >>> version 6? >>> >>> To make the volume available to libvirtd the best method is to use FUSE? >>> >>> I see that stripped is deprecated. Is it reasonable to use the volume >>> with 3 replicas on 4 nodes and sharding enabled? >>> Is there convenience to use sharding volume in this context? I think >>> could positive inpact in read performance or rebalance. Is it true? >>> >>> In the vm configuration I use the virtio disk. How is it better to set >>> the disk cache to get the best performances none, default or writeback? >>> >>> Thanks in advance for your patience and answers. >>> >>> Thanks, >>> >>> >>> *Cristian Del Carlo* >>> >>> >>> >>> -- >>> >>> >>> *Cristian Del Carlo* >>> >>> *Target Solutions s.r.l.* >>> >>> *T* +39 0583 1905621 >>> *F* +39 0583 1905675 >>> *@* cristian.delcarlo at targetsolutions.it >>> >>> http://www.targetsolutions.it >>> P.IVA e C.Fiscale: 01815270465 Reg. Imp. di Lucca >>> Capitale Sociale: ?11.000,00 iv - REA n? 173227 >>> >>> Il testo e gli eventuali documenti trasmessi contengono informazioni >>> riservate al destinatario indicato. La seguente e-mail e' confidenziale e >>> la sua riservatezza e' tutelata legalmente dal Decreto Legislativo 196 del >>> 30/06/2003 (Codice di tutela della privacy). La lettura, copia o altro uso >>> non autorizzato o qualsiasi altra azione derivante dalla conoscenza di >>> queste informazioni sono rigorosamente vietate. Qualora abbiate ricevuto >>> questo documento per errore siete cortesemente pregati di darne immediata >>> comunicazione al mittente e di provvedere, immediatamente, alla sua >>> distruzione. >>> >> >> >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > -------------- next part -------------- An HTML attachment was scrubbed... 
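Turning Krutika's recommendation into commands: enabling sharding on a fresh VM-store volume and keeping the 64MB default block size. The volume name is a placeholder, and per Strahil's warning above this is a one-way switch, so set it before any VM images are written:

# enable sharding; do this before the first VM image lands on the volume
gluster volume set vmstore features.shard on

# 64MB is already the default; setting it explicitly just documents the choice
gluster volume set vmstore features.shard-block-size 64MB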
URL: From nico at furyweb.fr Fri Jun 21 07:48:47 2019 From: nico at furyweb.fr (nico at furyweb.fr) Date: Fri, 21 Jun 2019 09:48:47 +0200 (CEST) Subject: [Gluster-users] Parallel process hang on gluster volume Message-ID: <1119765369.1998.1561103327489.JavaMail.zimbra@furyweb.fr> I encounterd an issue on production servers using GlusterFS servers 5.1 and clients 4.1.5 when several process write at the same time on a gluster volume. With more than 48 process writes on the volume at the same time, they are blocked in D state (uninterruptible sleep), I guess some volume settings have to be tuned but can't figure out which. The client is using op-version 40100 on this volume Below are volume info, volume settings and ps output on blocked processes. root at glusterVM1:~# gluster volume info logsscripts Volume Name: logsscripts Type: Replicate Volume ID: cb49af70-d197-43c1-852d-0bcf8dc9f6fa Status: Started Snapshot Count: 0 Number of Bricks: 1 x (2 + 1) = 3 Transport-type: tcp Bricks: Brick1: glusterVM1:/bricks/logsscripts/brick1/data Brick2: glusterVM2:/bricks/logsscripts/brick1/data Brick3: glusterVM3:/bricks/logsscripts/brick1/data (arbiter) Options Reconfigured: server.tcp-user-timeout: 42 cluster.data-self-heal-algorithm: full features.trash: off diagnostics.client-log-level: ERROR ssl.cipher-list: HIGH:!SSLv2 server.ssl: on client.ssl: on transport.address-family: inet nfs.disable: on performance.client-io-threads: off root at glusterVM1:~# gluster volume get logsscripts all Option Value ------ ----- cluster.lookup-unhashed on cluster.lookup-optimize on cluster.min-free-disk 10% cluster.min-free-inodes 5% cluster.rebalance-stats off cluster.subvols-per-directory (null) cluster.readdir-optimize off cluster.rsync-hash-regex (null) cluster.extra-hash-regex (null) cluster.dht-xattr-name trusted.glusterfs.dht cluster.randomize-hash-range-by-gfid off cluster.rebal-throttle normal cluster.lock-migration off cluster.force-migration off cluster.local-volume-name (null) cluster.weighted-rebalance on cluster.switch-pattern (null) cluster.entry-change-log on cluster.read-subvolume (null) cluster.read-subvolume-index -1 cluster.read-hash-mode 1 cluster.background-self-heal-count 8 cluster.metadata-self-heal on cluster.data-self-heal on cluster.entry-self-heal on cluster.self-heal-daemon on cluster.heal-timeout 600 cluster.self-heal-window-size 1 cluster.data-change-log on cluster.metadata-change-log on cluster.data-self-heal-algorithm full cluster.eager-lock on disperse.eager-lock on disperse.other-eager-lock on disperse.eager-lock-timeout 1 disperse.other-eager-lock-timeout 1 cluster.quorum-type auto cluster.quorum-count (null) cluster.choose-local true cluster.self-heal-readdir-size 1KB cluster.post-op-delay-secs 1 cluster.ensure-durability on cluster.consistent-metadata no cluster.heal-wait-queue-length 128 cluster.favorite-child-policy none cluster.full-lock yes cluster.stripe-block-size 128KB cluster.stripe-coalesce true diagnostics.latency-measurement off diagnostics.dump-fd-stats off diagnostics.count-fop-hits off diagnostics.brick-log-level INFO diagnostics.client-log-level ERROR diagnostics.brick-sys-log-level CRITICAL diagnostics.client-sys-log-level CRITICAL diagnostics.brick-logger (null) diagnostics.client-logger (null) diagnostics.brick-log-format (null) diagnostics.client-log-format (null) diagnostics.brick-log-buf-size 5 diagnostics.client-log-buf-size 5 diagnostics.brick-log-flush-timeout 120 diagnostics.client-log-flush-timeout 120 diagnostics.stats-dump-interval 0 
diagnostics.fop-sample-interval 0 diagnostics.stats-dump-format json diagnostics.fop-sample-buf-size 65535 diagnostics.stats-dnscache-ttl-sec 86400 performance.cache-max-file-size 0 performance.cache-min-file-size 0 performance.cache-refresh-timeout 1 performance.cache-priority performance.cache-size 32MB performance.io-thread-count 16 performance.high-prio-threads 16 performance.normal-prio-threads 16 performance.low-prio-threads 16 performance.least-prio-threads 1 performance.enable-least-priority on performance.iot-watchdog-secs (null) performance.iot-cleanup-disconnected-reqsoff performance.iot-pass-through false performance.io-cache-pass-through false performance.cache-size 128MB performance.qr-cache-timeout 1 performance.cache-invalidation false performance.ctime-invalidation false performance.flush-behind on performance.nfs.flush-behind on performance.write-behind-window-size 1MB performance.resync-failed-syncs-after-fsyncoff performance.nfs.write-behind-window-size1MB performance.strict-o-direct off performance.nfs.strict-o-direct off performance.strict-write-ordering off performance.nfs.strict-write-ordering off performance.write-behind-trickling-writeson performance.aggregate-size 128KB performance.nfs.write-behind-trickling-writeson performance.lazy-open yes performance.read-after-open yes performance.open-behind-pass-through false performance.read-ahead-page-count 4 performance.read-ahead-pass-through false performance.readdir-ahead-pass-through false performance.md-cache-pass-through false performance.md-cache-timeout 1 performance.cache-swift-metadata true performance.cache-samba-metadata false performance.cache-capability-xattrs true performance.cache-ima-xattrs true performance.md-cache-statfs off performance.xattr-cache-list performance.nl-cache-pass-through false features.encryption off encryption.master-key (null) encryption.data-key-size 256 encryption.block-size 4096 network.frame-timeout 1800 network.ping-timeout 42 network.tcp-window-size (null) client.ssl on network.remote-dio disable client.event-threads 2 client.tcp-user-timeout 0 client.keepalive-time 20 client.keepalive-interval 2 client.keepalive-count 9 network.tcp-window-size (null) network.inode-lru-limit 16384 auth.allow * auth.reject (null) transport.keepalive 1 server.allow-insecure on server.root-squash off server.anonuid 65534 server.anongid 65534 server.statedump-path /var/run/gluster server.outstanding-rpc-limit 64 server.ssl on auth.ssl-allow * server.manage-gids off server.dynamic-auth on client.send-gids on server.gid-timeout 300 server.own-thread (null) server.event-threads 1 server.tcp-user-timeout 42 server.keepalive-time 20 server.keepalive-interval 2 server.keepalive-count 9 transport.listen-backlog 1024 ssl.own-cert (null) ssl.private-key (null) ssl.ca-list (null) ssl.crl-path (null) ssl.certificate-depth (null) ssl.cipher-list HIGH:!SSLv2 ssl.dh-param (null) ssl.ec-curve (null) transport.address-family inet performance.write-behind on performance.read-ahead on performance.readdir-ahead on performance.io-cache on performance.quick-read on performance.open-behind on performance.nl-cache off performance.stat-prefetch on performance.client-io-threads off performance.nfs.write-behind on performance.nfs.read-ahead off performance.nfs.io-cache off performance.nfs.quick-read off performance.nfs.stat-prefetch off performance.nfs.io-threads off performance.force-readdirp true performance.cache-invalidation false features.uss off features.snapshot-directory .snaps features.show-snapshot-directory off 
features.tag-namespaces off network.compression off network.compression.window-size -15 network.compression.mem-level 8 network.compression.min-size 0 network.compression.compression-level -1 network.compression.debug false features.default-soft-limit 80% features.soft-timeout 60 features.hard-timeout 5 features.alert-time 86400 features.quota-deem-statfs off geo-replication.indexing off geo-replication.indexing off geo-replication.ignore-pid-check off geo-replication.ignore-pid-check off features.quota off features.inode-quota off features.bitrot disable debug.trace off debug.log-history no debug.log-file no debug.exclude-ops (null) debug.include-ops (null) debug.error-gen off debug.error-failure (null) debug.error-number (null) debug.random-failure off debug.error-fops (null) nfs.enable-ino32 no nfs.mem-factor 15 nfs.export-dirs on nfs.export-volumes on nfs.addr-namelookup off nfs.dynamic-volumes off nfs.register-with-portmap on nfs.outstanding-rpc-limit 16 nfs.port 2049 nfs.rpc-auth-unix on nfs.rpc-auth-null on nfs.rpc-auth-allow all nfs.rpc-auth-reject none nfs.ports-insecure off nfs.trusted-sync off nfs.trusted-write off nfs.volume-access read-write nfs.export-dir nfs.disable on nfs.nlm on nfs.acl on nfs.mount-udp off nfs.mount-rmtab /var/lib/glusterd/nfs/rmtab nfs.rpc-statd /sbin/rpc.statd nfs.server-aux-gids off nfs.drc off nfs.drc-size 0x20000 nfs.read-size (1 * 1048576ULL) nfs.write-size (1 * 1048576ULL) nfs.readdir-size (1 * 1048576ULL) nfs.rdirplus on nfs.event-threads 1 nfs.exports-auth-enable (null) nfs.auth-refresh-interval-sec (null) nfs.auth-cache-ttl-sec (null) features.read-only off features.worm off features.worm-file-level off features.worm-files-deletable on features.default-retention-period 120 features.retention-mode relax features.auto-commit-period 180 storage.linux-aio off storage.batch-fsync-mode reverse-fsync storage.batch-fsync-delay-usec 0 storage.owner-uid -1 storage.owner-gid -1 storage.node-uuid-pathinfo off storage.health-check-interval 30 storage.build-pgfid off storage.gfid2path on storage.gfid2path-separator : storage.reserve 1 storage.health-check-timeout 10 storage.fips-mode-rchecksum off storage.force-create-mode 0000 storage.force-directory-mode 0000 storage.create-mask 0777 storage.create-directory-mask 0777 storage.max-hardlinks 100 storage.ctime off storage.bd-aio off config.gfproxyd off cluster.server-quorum-type off cluster.server-quorum-ratio 0 changelog.changelog off changelog.changelog-dir {{ brick.path }}/.glusterfs/changelogs changelog.encoding ascii changelog.rollover-time 15 changelog.fsync-interval 5 changelog.changelog-barrier-timeout 120 changelog.capture-del-path off features.barrier disable features.barrier-timeout 120 features.trash off features.trash-dir .trashcan features.trash-eliminate-path (null) features.trash-max-filesize 5MB features.trash-internal-op off cluster.enable-shared-storage disable cluster.write-freq-threshold 0 cluster.read-freq-threshold 0 cluster.tier-pause off cluster.tier-promote-frequency 120 cluster.tier-demote-frequency 3600 cluster.watermark-hi 90 cluster.watermark-low 75 cluster.tier-mode cache cluster.tier-max-promote-file-size 0 cluster.tier-max-mb 4000 cluster.tier-max-files 10000 cluster.tier-query-limit 100 cluster.tier-compact on cluster.tier-hot-compact-frequency 604800 cluster.tier-cold-compact-frequency 604800 features.ctr-enabled off features.record-counters off features.ctr-record-metadata-heat off features.ctr_link_consistency off features.ctr_lookupheal_link_timeout 300 
features.ctr_lookupheal_inode_timeout 300 features.ctr-sql-db-cachesize 12500 features.ctr-sql-db-wal-autocheckpoint 25000 features.selinux on locks.trace off locks.mandatory-locking off cluster.disperse-self-heal-daemon enable cluster.quorum-reads no client.bind-insecure (null) features.timeout 45 features.failover-hosts (null) features.shard off features.shard-block-size 64MB features.shard-lru-limit 16384 features.shard-deletion-rate 100 features.scrub-throttle lazy features.scrub-freq biweekly features.scrub false features.expiry-time 120 features.cache-invalidation off features.cache-invalidation-timeout 60 features.leases off features.lease-lock-recall-timeout 60 disperse.background-heals 8 disperse.heal-wait-qlength 128 cluster.heal-timeout 600 dht.force-readdirp on disperse.read-policy gfid-hash cluster.shd-max-threads 1 cluster.shd-wait-qlength 1024 cluster.locking-scheme full cluster.granular-entry-heal no features.locks-revocation-secs 0 features.locks-revocation-clear-all false features.locks-revocation-max-blocked 0 features.locks-monkey-unlocking false features.locks-notify-contention no features.locks-notify-contention-delay 5 disperse.shd-max-threads 1 disperse.shd-wait-qlength 1024 disperse.cpu-extensions auto disperse.self-heal-window-size 1 cluster.use-compound-fops off performance.parallel-readdir off performance.rda-request-size 131072 performance.rda-low-wmark 4096 performance.rda-high-wmark 128KB performance.rda-cache-limit 10MB performance.nl-cache-positive-entry false performance.nl-cache-limit 10MB performance.nl-cache-timeout 60 cluster.brick-multiplex off cluster.max-bricks-per-process 0 disperse.optimistic-change-log on disperse.stripe-cache 4 cluster.halo-enabled False cluster.halo-shd-max-latency 99999 cluster.halo-nfsd-max-latency 5 cluster.halo-max-latency 5 cluster.halo-max-replicas 99999 cluster.halo-min-replicas 2 cluster.daemon-log-level INFO debug.delay-gen off delay-gen.delay-percentage 10% delay-gen.delay-duration 100000 delay-gen.enable disperse.parallel-writes on features.sdfs on features.cloudsync off features.utime off ctime.noatime on traitVM2:~# ps fax | grep '[_]save' 7305 ? D 2:57 [remote_save] 7801 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7802 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7803 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7804 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7805 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7806 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7807 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7808 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7809 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7810 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7811 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7812 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7813 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7814 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7815 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7816 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7817 ? 
D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7818 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7819 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7820 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7821 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7822 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7823 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7824 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7825 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7826 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7827 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7828 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7829 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7830 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7831 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7832 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7833 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7834 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7835 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7836 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7837 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7838 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7839 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7840 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7841 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7842 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7843 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7844 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7845 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7846 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7847 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7848 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7849 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7850 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7851 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7852 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7853 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7854 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7855 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7856 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7857 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7858 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7859 ? 
D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7860 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7861 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7862 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7863 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7864 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7865 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7866 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7867 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7868 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7869 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7870 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7871 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7872 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7873 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7874 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7875 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7876 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7877 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7878 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7879 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7880 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7881 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7882 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7883 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7884 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 7885 ? D 0:00 \_ /usr/bin/perl /home/prod/current/app/scripts/remote_save.pl 30000 From laurentfdumont at gmail.com Sat Jun 22 23:16:43 2019 From: laurentfdumont at gmail.com (Laurent Dumont) Date: Sat, 22 Jun 2019 19:16:43 -0400 Subject: [Gluster-users] Gluster CLI - No output/no info displayed. Message-ID: Hi, I am facing a strange issue with the Gluster CLI. No matter what command is used, the CLI doesn't output anything. It's a gluster with a single node. The volumes themselves are working without any issues. coldadmin at gluster01:~$ sudo dpkg -l | grep -i gluster > ii glusterfs-client 6.3-1 amd64 > clustered file-system (client package) > ii glusterfs-common 6.3-1 amd64 > GlusterFS common libraries and translator modules > ii glusterfs-server 6.0-1 amd64 > clustered file-system (server package) > coldadmin at gluster01:~$ sudo gluster --version > glusterfs 6.0 > Repository revision: git://git.gluster.org/glusterfs.git > Copyright (c) 2006-2016 Red Hat, Inc. > GlusterFS comes with ABSOLUTELY NO WARRANTY. > It is licensed to you under your choice of the GNU Lesser > General Public License, version 3 or any later version (LGPLv3 > or later), or the GNU General Public License, version 2 (GPLv2), > in all cases as published by the Free Software Foundation. 
coldadmin at gluster01:~$ sudo gluster volume info all > coldadmin at gluster01:~$ The cli.log is filled with : root at gluster01:/var/log/glusterfs# tail cli.log > [2019-06-22 23:10:31.397945] I [cli.c:834:main] 0-cli: Started running > gluster with version 6.0 > [2019-06-22 23:10:31.398325] E [mem-pool.c:868:mem_get] > (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x18705) [0x7fc937069705] > -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x1863c) [0x7fc93706963c] > -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(mem_get+0x98) > [0x7fc9370a12f8] ) 0-mem-pool: invalid argument [Invalid argument] > [2019-06-22 23:13:20.895606] I [cli.c:834:main] 0-cli: Started running > gluster with version 6.0 > [2019-06-22 23:13:20.895887] E [mem-pool.c:868:mem_get] > (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x18705) [0x7f65fc2e0705] > -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x1863c) [0x7f65fc2e063c] > -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(mem_get+0x98) > [0x7f65fc3182f8] ) 0-mem-pool: invalid argument [Invalid argument] > [2019-06-22 23:13:22.603313] I [cli.c:834:main] 0-cli: Started running > gluster with version 6.0 > [2019-06-22 23:13:22.603581] E [mem-pool.c:868:mem_get] > (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x18705) [0x7fe7cd672705] > -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x1863c) [0x7fe7cd67263c] > -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(mem_get+0x98) > [0x7fe7cd6aa2f8] ) 0-mem-pool: invalid argument [Invalid argument] > [2019-06-22 23:13:47.945239] I [cli.c:834:main] 0-cli: Started running > gluster with version 6.0 > [2019-06-22 23:13:47.945481] E [mem-pool.c:868:mem_get] > (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x18705) [0x7fd31bb2c705] > -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x1863c) [0x7fd31bb2c63c] > -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(mem_get+0x98) > [0x7fd31bb642f8] ) 0-mem-pool: invalid argument [Invalid argument] > [2019-06-22 23:14:09.015151] I [cli.c:834:main] 0-cli: Started running > gluster with version 6.0 > [2019-06-22 23:14:09.015461] E [mem-pool.c:868:mem_get] > (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x18705) [0x7f471963b705] > -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x1863c) [0x7f471963b63c] > -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(mem_get+0x98) > [0x7f47196732f8] ) 0-mem-pool: invalid argument [Invalid argument] The strange part is that cmd_history.log is not logging my commands anymore :( root at gluster01:/var/log/glusterfs# tail cmd_history.log > [2019-05-19 21:45:27.814905] : volume start media : FAILED : Volume media > already started > [2019-05-19 21:45:32.630507] : volume status all : SUCCESS > [2019-05-19 21:45:32.632032] : volume status all : SUCCESS > [2019-05-19 21:45:32.644639] : volume status all : SUCCESS > [2019-05-19 21:46:21.691147] : volume status all : SUCCESS > [2019-05-19 21:46:21.692664] : volume status all : SUCCESS > [2019-05-19 21:46:21.706471] : volume status all : SUCCESS > [2019-05-19 21:48:15.418905] : volume status all : SUCCESS > [2019-05-19 21:48:15.420487] : volume status all : SUCCESS > [2019-05-19 21:48:15.422784] : volume status all : SUCCESS I have this old bug from 2015, but the issue seems to be purely cosmetic. https://bugzilla.redhat.com/show_bug.cgi?id=1243753 -------------- next part -------------- An HTML attachment was scrubbed... 
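One detail worth noting in the dpkg output above: glusterfs-server is at
6.0-1 while glusterfs-client and glusterfs-common are at 6.3-1. Whether or
not that mismatch is the cause here, a quick sanity check for it - a rough
sketch assuming a Debian-style system with dpkg-query available - is:

    # Count the distinct versions across installed gluster packages;
    # anything greater than 1 means the packages are out of sync.
    dpkg-query -W -f='${Version}\n' 'glusterfs*' | sort -u | wc -l

If it prints more than 1, aligning all gluster packages to the same version
is a sensible first step before digging further into the CLI logs.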
URL: 

From mabi at protonmail.ch  Sun Jun 23 15:04:23 2019
From: mabi at protonmail.ch (mabi)
Date: Sun, 23 Jun 2019 15:04:23 +0000
Subject: [Gluster-users] GlusterFS 4.1.9 Debian stretch packages missing
Message-ID: 

Hello,

I would like to upgrade my GlusterFS 4.1.8 cluster to 4.1.9 on my Debian
stretch nodes. Unfortunately the packages are missing, as you can see here:

https://download.gluster.org/pub/gluster/glusterfs/4.1/4.1.9/Debian/stretch/amd64/apt/

As far as I know GlusterFS 4.1 is not yet EOL, so I don't understand why
the packages are missing... Maybe an error? Could someone please check?

Thank you very much in advance.

Best,
M.

From hgowtham at redhat.com  Mon Jun 24 09:34:28 2019
From: hgowtham at redhat.com (hgowtham at redhat.com)
Date: Mon, 24 Jun 2019 09:34:28 +0000
Subject: [Gluster-users] Invitation: Gluster Community Meeting (APAC friendly hours) @ Tue Jun 25, 2019 11:30am - 12:30pm (IST) (gluster-users@gluster.org)
Message-ID: <000000000000c555a7058c0e84de@google.com>

You have been invited to the following event.

Title: Gluster Community Meeting (APAC friendly hours)

Hi all, This is the biweekly Gluster community meeting that is hosted to
collaborate and make the community better. Please do join the discussion.

Bridge: https://bluejeans.com/836554017
Minutes meeting: https://hackmd.io/PEnYhQziQsyBwhMksbRWUw
Previous Meeting notes: https://github.com/gluster/community

Regards, Hari.

When: Tue Jun 25, 2019 11:30am - 12:30pm India Standard Time - Kolkata
Calendar: gluster-users at gluster.org
Who:
* hgowtham at redhat.com - organizer
* gluster-users
* gluster-devel

Event details: https://www.google.com/calendar/event?action=VIEW&eid=N3IzZ3FtanYyaHIwNWhqaDhuaW5nN3ZuMHEgZ2x1c3Rlci11c2Vyc0BnbHVzdGVyLm9yZw&tok=MTkjaGdvd3RoYW1AcmVkaGF0LmNvbTQ5MDdlZTI5YmEyYTFhYjE0N2ExZDgxODZiZDMwOTAyMjcyODRiMTc&ctz=Asia%2FKolkata&hl=en&es=0

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/calendar
Size: 1842 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: invite.ics
Type: application/ics
Size: 1880 bytes
Desc: not available
URL: 

From spisla80 at gmail.com  Mon Jun 24 12:25:54 2019
From: spisla80 at gmail.com (David Spisla)
Date: Mon, 24 Jun 2019 14:25:54 +0200
Subject: [Gluster-users] Fwd: Pending heal status when deleting files which are marked as to be healed
In-Reply-To: 
References: <165ac2cb-4e12-81bf-bb47-ef800adf6652@redhat.com>
Message-ID: 

---------- Forwarded message ---------
From: David Spisla
Date: Fri, Jun 21, 2019 at 10:02
Subject: Re: [Gluster-users] Pending heal status when deleting files which are marked as to be healed
To: Ravishankar N

Hello Ravi,
On Wed, Jun 19, 2019 at 18:06, Ravishankar N <ravishankar at redhat.com> wrote:

>
> On 17/06/19 3:45 PM, David Spisla wrote:
>
> Hello Gluster Community,
>
> my latest observation concerns the self heal daemon:
> Scenario: 2 Node Gluster v5.5 Cluster with Replica 2 Volume. Just one
> brick per node. Access via SMB Client from a Win10 machine
>
> How to reproduce:
> I have created a small folder with a lot of small files and I copied that
> folder recursively into itself a few times. Additionally I copied three
> big folders with a lot of content into the root of the volume.
> Note: There was no node down or anything else like a brick down, etc., so
> the whole volume was accessible.
>
> Because of the recursive copy action all these copied files were listed
> as to be healed (via gluster heal info).
>
> This is odd. How did you conclude that writing to the volume (i.e.
> recursive copy) was the reason for the files needing heal? Did you
> check if there were any gluster messages about disconnects in the smb
> client logs?
>
There was no disconnection, I am sure. But overall I am not really sure
what the cause of this problem is.

> Now I set some of the affected files ReadOnly (they get WORMed because
> worm-file-level is enabled). After this I tried to delete the parent folder
> of those files.
>
> Expected: All files should be healed
> Actually: All files which are Read-Only are not healed. heal info
> permanently shows that these files have to be healed.
>
> Does disabling read-only let the files be healed?
>
I have to try this.

> glustershd log throws errors and brick log (with level DEBUG) permanently
> throws a lot of messages which I don't understand. See the attached file
> which contains all the information, also heal info and volume info, besides
> the logs
>
> Maybe some of you know what's going on there? Since we can reproduce this
> scenario, we can give more debug information if needed.
>
> Is it possible to script the list of steps to reproduce this issue?
>
I will do that and post it here. Although I will collect more data when it
happens

Regards
David

> Regards,
>
> Ravi
>
> Regards
> David Spisla
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From spisla80 at gmail.com  Mon Jun 24 13:33:45 2019
From: spisla80 at gmail.com (David Spisla)
Date: Mon, 24 Jun 2019 15:33:45 +0200
Subject: [Gluster-users] Pending heal status when deleting files which are marked as to be healed
In-Reply-To: 
References: <165ac2cb-4e12-81bf-bb47-ef800adf6652@redhat.com>
Message-ID: 

Hello Ravi and Gluster Community,

On Mon, Jun 24, 2019 at 14:25, David Spisla wrote:

>
> ---------- Forwarded message ---------
> From: David Spisla
> Date: Fri, Jun 21, 2019 at 10:02
> Subject: Re: [Gluster-users] Pending heal status when deleting files
> which are marked as to be healed
> To: Ravishankar N
>
> Hello Ravi,
>
> On Wed, Jun 19, 2019 at 18:06, Ravishankar N <ravishankar at redhat.com>
> wrote:
>
>>
>> On 17/06/19 3:45 PM, David Spisla wrote:
>>
>> Hello Gluster Community,
>>
>> my latest observation concerns the self heal daemon:
>> Scenario: 2 Node Gluster v5.5 Cluster with Replica 2 Volume. Just one
>> brick per node.
Access via SMB Client from a Win10 machine
>>
>> How to reproduce:
>> I have created a small folder with a lot of small files and I copied that
>> folder recursively into itself a few times. Additionally I copied three
>> big folders with a lot of content into the root of the volume.
>> Note: There was no node down or anything else like a brick down, etc., so
>> the whole volume was accessible.
>>
>> Because of the recursive copy action all these copied files were listed
>> as to be healed (via gluster heal info).
>>
>> This is odd. How did you conclude that writing to the volume (i.e.
>> recursive copy) was the reason for the files needing heal? Did you
>> check if there were any gluster messages about disconnects in the smb
>> client logs?
>>
> There was no disconnection, I am sure. But overall I am not really sure
> what the cause of this problem is.
>
I reproduced it. Now I don't think that recursive copy is the reason. I
copied several small files into the volume (capacity 1GB) until it was full
(see steps to reproduce below). I didn't set RO on the files. There was
never a disconnection.

>> Now I set some of the affected files ReadOnly (they get WORMed because
>> worm-file-level is enabled). After this I tried to delete the parent folder
>> of those files.
>>
>> Expected: All files should be healed
>> Actually: All files which are Read-Only are not healed. heal info
>> permanently shows that these files have to be healed.
>>
>> Does disabling read-only let the files be healed?
>>
> I have to try this.
>
I tried it out and it had no effect.

>> glustershd log throws errors and brick log (with level DEBUG) permanently
>> throws a lot of messages which I don't understand. See the attached file
>> which contains all the information, also heal info and volume info, besides
>> the logs
>>
>> Maybe some of you know what's going on there? Since we can reproduce this
>> scenario, we can give more debug information if needed.
>>
>> Is it possible to script the list of steps to reproduce this issue?
>>
> I will do that and post it here. Although I will collect more data when
> it happens
>
Steps to reproduce:

1. Copy several small files into a volume (here: 1GB capacity)
2. Copy until the volume is nearly full (70-80% or more)
3. Now self-heal is listing files to be healed
4. Move or delete all of these files, or just a part.
5. The files won't be healed and stay in the heal info list.

In my case I copied until the volume was 100% full (storage.reserve was
1%). I deleted some of the files to get to a level of 98%. I waited for a
while but nothing happened. After this I stopped and started the volume.
Files are now healed.
Attached is the glustershd.log where you can see that performing
entry.self-heal (2019-06-24 10:04:02.007328) could not be finished for
pgfid:7e4fa649-434a-4bb7-a1c2-258818d76076 until the volume was stopped and
started again. After starting again, entry.self-heal could be finished for
that pgfid (at 2019-06-24 12:38:38.689632).
The pgfid refers to the files which were listed to be healed:

fs-davids-c2-n1:~ # gluster vo heal archive1 info
Brick fs-davids-c2-n1:/gluster/brick1/glusterbrick
/archive1/data/fff/gg - Kopie.txt
/archive1/data/fff
/archive1/data/fff/gg - Kopie - Kopie.txt
/archive1/data/fff/gg - Kopie - Kopie (2).txt
Status: Connected
Number of entries: 4

All of these files have the same pgfid:

fs-davids-c2-n1:~ # getfattr -e hex -d -m "" '/gluster/brick1/glusterbrick/archive1/data/fff/'* | grep trusted.pgfid
getfattr: Removing leading '/' from absolute path names
trusted.pgfid.7e4fa649-434a-4bb7-a1c2-258818d76076=0x00000001
trusted.pgfid.7e4fa649-434a-4bb7-a1c2-258818d76076=0x00000001
trusted.pgfid.7e4fa649-434a-4bb7-a1c2-258818d76076=0x00000001

Summary: The pending heal problem seems to occur if a volume is nearly full
or completely full.

Regards
David Spisla

> Regards
> David
>
>> Regards,
>>
>> Ravi
>>
>> Regards
>> David Spisla
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: glustershd.log
Type: application/octet-stream
Size: 51236 bytes
Desc: not available
URL: 

From spisla80 at gmail.com  Mon Jun 24 13:45:29 2019
From: spisla80 at gmail.com (David Spisla)
Date: Mon, 24 Jun 2019 15:45:29 +0200
Subject: [Gluster-users] Pending heal status when deleting files which are marked as to be healed
In-Reply-To: 
References: <165ac2cb-4e12-81bf-bb47-ef800adf6652@redhat.com>
Message-ID: 

Additional information: after the volume was 100% full, I deleted some of
the files, but not the files which are listed in heal info. When it was at
98%, I deleted the folder which was marked as to be healed:
/archive1/data/fff. After stopping and starting the volume, the files in
/archive1/data/fff were still there.

Regards
David Spisla

On Mon, Jun 24, 2019 at 15:33, David Spisla wrote:

> Hello Ravi and Gluster Community,
>
> On Mon, Jun 24, 2019 at 14:25, David Spisla <
> spisla80 at gmail.com> wrote:
>
>>
>> ---------- Forwarded message ---------
>> From: David Spisla
>> Date: Fri, Jun 21, 2019 at 10:02
>> Subject: Re: [Gluster-users] Pending heal status when deleting files
>> which are marked as to be healed
>> To: Ravishankar N
>>
>> Hello Ravi,
>>
>> On Wed, Jun 19, 2019 at 18:06, Ravishankar N <
>> ravishankar at redhat.com> wrote:
>>
>>>
>>> On 17/06/19 3:45 PM, David Spisla wrote:
>>>
>>> Hello Gluster Community,
>>>
>>> my latest observation concerns the self heal daemon:
>>> Scenario: 2 Node Gluster v5.5 Cluster with Replica 2 Volume. Just one
>>> brick per node. Access via SMB Client from a Win10 machine
>>>
>>> How to reproduce:
>>> I have created a small folder with a lot of small files and I copied
>>> that folder recursively into itself a few times. Additionally I copied
>>> three big folders with a lot of content into the root of the volume.
>>> Note: There was no node down or anything else like a brick down, etc., so
>>> the whole volume was accessible.
>>>
>>> Because of the recursive copy action all these copied files were listed
>>> as to be healed (via gluster heal info).
>>>
>>> This is odd. How did you conclude that writing to the volume (i.e.
>>> recursive copy) was the reason for the files needing heal?
Did you
>>> check if there were any gluster messages about disconnects in the smb
>>> client logs?
>>>
>> There was no disconnection, I am sure. But overall I am not really sure
>> what the cause of this problem is.
>>
> I reproduced it. Now I don't think that recursive copy is the reason. I
> copied several small files into the volume (capacity 1GB) until it was full
> (see steps to reproduce below). I didn't set RO on the files. There was
> never a disconnection.
>
>>> Now I set some of the affected files ReadOnly (they get WORMed because
>>> worm-file-level is enabled). After this I tried to delete the parent folder
>>> of those files.
>>>
>>> Expected: All files should be healed
>>> Actually: All files which are Read-Only are not healed. heal info
>>> permanently shows that these files have to be healed.
>>>
>>> Does disabling read-only let the files be healed?
>>>
>> I have to try this.
>>
> I tried it out and it had no effect.
>
>>> glustershd log throws errors and brick log (with level DEBUG) permanently
>>> throws a lot of messages which I don't understand. See the attached file
>>> which contains all the information, also heal info and volume info, besides
>>> the logs
>>>
>>> Maybe some of you know what's going on there? Since we can reproduce this
>>> scenario, we can give more debug information if needed.
>>>
>>> Is it possible to script the list of steps to reproduce this issue?
>>>
>> I will do that and post it here. Although I will collect more data when
>> it happens
>>
> Steps to reproduce:
>
> 1. Copy several small files into a volume (here: 1GB capacity)
> 2. Copy until the volume is nearly full (70-80% or more)
> 3. Now self-heal is listing files to be healed
> 4. Move or delete all of these files, or just a part.
> 5. The files won't be healed and stay in the heal info list.
>
> In my case I copied until the volume was 100% full (storage.reserve was
> 1%). I deleted some of the files to get to a level of 98%. I waited for a
> while but nothing happened. After this I stopped and started the volume.
> Files are now healed.
> Attached is the glustershd.log where you can see that performing
> entry.self-heal (2019-06-24 10:04:02.007328) could not be finished for
> pgfid:7e4fa649-434a-4bb7-a1c2-258818d76076 until the volume was stopped and
> started again. After starting again, entry.self-heal could be finished for
> that pgfid (at 2019-06-24 12:38:38.689632). The pgfid refers to the files
> which were listed to be healed:
>
> fs-davids-c2-n1:~ # gluster vo heal archive1 info
> Brick fs-davids-c2-n1:/gluster/brick1/glusterbrick
> /archive1/data/fff/gg - Kopie.txt
> /archive1/data/fff
> /archive1/data/fff/gg - Kopie - Kopie.txt
> /archive1/data/fff/gg - Kopie - Kopie (2).txt
> Status: Connected
> Number of entries: 4
>
> All of these files have the same pgfid:
>
> fs-davids-c2-n1:~ # getfattr -e hex -d -m ""
> '/gluster/brick1/glusterbrick/archive1/data/fff/'* | grep trusted.pgfid
> getfattr: Removing leading '/' from absolute path names
> trusted.pgfid.7e4fa649-434a-4bb7-a1c2-258818d76076=0x00000001
> trusted.pgfid.7e4fa649-434a-4bb7-a1c2-258818d76076=0x00000001
> trusted.pgfid.7e4fa649-434a-4bb7-a1c2-258818d76076=0x00000001
>
> Summary: The pending heal problem seems to occur if a volume is nearly
> full or completely full.
>
> Regards
> David Spisla
>
>> Regards
>> David
>>
>>> Regards,
>>>
>>> Ravi
>>>
>>> Regards
>>> David Spisla
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From dave at sherohman.org  Tue Jun 25 09:13:30 2019
From: dave at sherohman.org (Dave Sherohman)
Date: Tue, 25 Jun 2019 04:13:30 -0500
Subject: [Gluster-users] Removing subvolume from dist/rep volume
Message-ID: <20190625091330.GU19805@sherohman.org>

I have a 9-brick, replica 2+A cluster and plan to (permanently) remove
one of the three subvolumes. I think I've worked out how to do it, but
want to verify first that I've got it right, since downtime or data loss
would be Bad Things.

The current configuration has six data bricks across six hosts (B
through G), and all three arbiter bricks on the same host (A), such as
one might create with

# gluster volume create myvol replica 3 arbiter 1 B:/data C:/data A:/arb1 D:/data E:/data A:/arb2 F:/data G:/data A:/arb3

My objective is to remove nodes B and C entirely.

First up is to pull their bricks from the volume:

# gluster volume remove-brick myvol B:/data C:/data A:/arb1 start
(wait for data to be migrated)
# gluster volume remove-brick myvol B:/data C:/data A:/arb1 commit

And then remove the nodes with:

# gluster peer detach B
# gluster peer detach C

Is this correct, or did I forget any steps and/or mangle the syntax on
any commands?

Also, for the remove-brick command, is there any way to throttle the
amount of bandwidth which will be used for the data migration?
Unfortunately, I was not able to provision a dedicated VLAN for the
gluster servers to communicate among themselves, so I don't want it
hogging all available capacity if that can be avoided.

If it makes a difference, my gluster version is 3.12.15-1, running on
Debian and installed from the debs at

deb https://download.gluster.org/pub/gluster/glusterfs/3.12/LATEST/Debian/9/amd64/apt stretch main

-- 
Dave Sherohman

From filonov at hkl.hms.harvard.edu  Wed Jun 26 12:41:36 2019
From: filonov at hkl.hms.harvard.edu (Dmitry Filonov)
Date: Wed, 26 Jun 2019 08:41:36 -0400
Subject: [Gluster-users] snapshots questions
Message-ID: 

Hi,
I am really new to Gluster and have a couple of questions that I hope will
be really easy to answer. I just couldn't find anything on that myself.

I did set up a replica 3 gluster over 3 nodes with a 2TB SSD in each node.
To have snapshot functionality I have created a thin pool the size of the
VG (1.82TB) and then a 1.75TB thin LV inside it on each of the bricks.
It worked just fine until I scheduled creating hourly and daily snapshots
on that gluster volume. In less than 2 days my thin volume got full and
crashed. It didn't refuse to create new snapshots; it just died, as LVM
couldn't perform any operations there anymore.
So my first question is how to prevent this from happening. I could create
a smaller thin LV, but I still have no control over how much space I would
need for snapshots. I was hoping to see some warnings and errors while
creating snapshots, not a failed LVM/Gluster.

The second question is related but not that important. Is there a way to
schedule snapshot removal in cron? gluster snapshot delete requires
interactive confirmation and I don't see any flag to auto-confirm snapshot
removal.
Thank you,

Fil

-- 
Dmitry Filonov
Linux Administrator
SBGrid Core | Harvard Medical School
250 Longwood Ave, SGM-114
Boston, MA 02115

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From dm at belkam.com  Wed Jun 26 13:12:14 2019
From: dm at belkam.com (Dmitry Melekhov)
Date: Wed, 26 Jun 2019 17:12:14 +0400
Subject: [Gluster-users] snapshots questions
In-Reply-To: 
References: 
Message-ID: <9e6ca6a8-39c9-ee8b-4777-77ed7500c8e8@belkam.com>

26.06.2019 16:41, Dmitry Filonov wrote:
> Hi,
> I am really new to Gluster and have a couple of questions that I hope
> will be really easy to answer. I just couldn't find anything on that myself.
>
> I did set up a replica 3 gluster over 3 nodes with a 2TB SSD in each node.
> To have snapshot functionality I have created a thin pool the size of
> the VG (1.82TB) and then a 1.75TB thin LV inside it on each of the bricks.
> It worked just fine until I scheduled creating hourly and daily
> snapshots on that gluster volume. In less than 2 days my thin volume
> got full and crashed.
> It didn't refuse to create new snapshots; it just died, as LVM couldn't
> perform any operations there anymore.
> So my first question is how to prevent this from happening. I could
> create a smaller thin LV, but I still have no control over how much space
> I would need for snapshots. I was hoping to see some warnings and errors
> while creating snapshots, not a failed LVM/Gluster.
>
> The second question is related but not that important. Is there a way
> to schedule snapshot removal in cron? gluster snapshot delete requires
> interactive confirmation and I don't see any flag to auto-confirm
> snapshot removal.
>

--mode=script

> Thank you,
>
> Fil
>
> --
> Dmitry Filonov
> Linux Administrator
> SBGrid Core | Harvard Medical School
> 250 Longwood Ave, SGM-114
> Boston, MA 02115
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From vpapnoi at redhat.com  Wed Jun 26 13:35:25 2019
From: vpapnoi at redhat.com (Vinayak Papnoi)
Date: Wed, 26 Jun 2019 19:05:25 +0530
Subject: [Gluster-users] snapshots questions
In-Reply-To: 
References: 
Message-ID: 

Comments inline.

Regards,
Vinayak Papnoi
Associate Quality Engineer
Red Hat
vpapnoi at redhat.com    M: 91-9702904495    IM: vpapnoi

On Wed, Jun 26, 2019 at 6:12 PM Dmitry Filonov wrote:

> Hi,
> I am really new to Gluster and have a couple of questions that I hope
> will be really easy to answer. I just couldn't find anything on that myself.
>
> I did set up a replica 3 gluster over 3 nodes with a 2TB SSD in each node.
> To have snapshot functionality I have created a thin pool the size of the
> VG (1.82TB) and then a 1.75TB thin LV inside it on each of the bricks.
> It worked just fine until I scheduled creating hourly and daily snapshots
> on that gluster volume. In less than 2 days my thin volume got full and
> crashed.
> It didn't refuse to create new snapshots; it just died, as LVM couldn't
> perform any operations there anymore.
> So my first question is how to prevent this from happening. I could create
> a smaller thin LV, but I still have no control over how much space I would
> need for snapshots. I was hoping to see some warnings and errors while
> creating snapshots, not a failed LVM/Gluster.
>

Newly created snapshots will occupy some data and metadata space in LVM.
So the more snapshots you have, the more space will be utilized.
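A rough way to keep an eye on that space - assuming a hypothetical volume
group vg_gluster with a thin pool named thinpool - is to watch the thin
pool's data and metadata usage with lvs, e.g. from cron:

    # Data% and Meta% show how full the thin pool is; snapshots and the
    # origin LV share this space.
    lvs -o lv_name,lv_size,data_percent,metadata_percent vg_gluster

    # Warn when the pool crosses 80% - a minimal guard, not a full monitor.
    used=$(lvs --noheadings -o data_percent vg_gluster/thinpool | tr -d ' ' | cut -d. -f1)
    [ "$used" -ge 80 ] && echo "thin pool ${used}% full on $(hostname)"

The threshold and the notification mechanism are placeholders; the point is
that the thin pool, not Gluster, is where the space for snapshots actually
runs out.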
> The second question is related but not that important. Is there a way to
> schedule snapshot removal in cron? gluster snapshot delete requires
> interactive confirmation and I don't see any flag to auto-confirm snapshot
> removal.
>

As of now, no, there isn't any option to schedule removal of a snapshot.
As for auto-confirmation of any command in gluster, '--mode=script' at the
end of the command should work.
However, there is a snapshot config option "*auto-delete*" which, when
enabled, will *delete* the oldest snapshot *after crossing the
snap-max-soft-limit* (which is a set percentage of the snap-max-hard-limit).

# gluster snapshot config
Snapshot System Configuration:
snap-max-hard-limit : 256
snap-max-soft-limit : 90%
auto-delete : enable
activate-on-create : disable

Snapshot Volume Configuration:

Volume :
snap-max-hard-limit : 256
Effective snap-max-hard-limit : 256
Effective snap-max-soft-limit : 230 (90%)

Usage: snapshot config [volname] ([snap-max-hard-limit ] [snap-max-soft-limit ]) | ([auto-delete ])| ([activate-on-create ])

> Thank you,
>
> Fil
>
> --
> Dmitry Filonov
> Linux Administrator
> SBGrid Core | Harvard Medical School
> 250 Longwood Ave, SGM-114
> Boston, MA 02115
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From filonov at hkl.hms.harvard.edu  Wed Jun 26 13:48:52 2019
From: filonov at hkl.hms.harvard.edu (Dmitry Filonov)
Date: Wed, 26 Jun 2019 09:48:52 -0400
Subject: [Gluster-users] snapshots questions
In-Reply-To: 
References: 
Message-ID: 

Newly created snapshots will occupy some data and metadata space in LVM.
So the more snapshots you have, the more space will be utilized.

Yes, I understand that. The question is how to get snapshots to error out
instead of breaking LVM completely. Is there any way to tell snapshots to
fail if, after creating that snapshot, the pool won't have a certain amount
of free space?

> As of now, no, there isn't any option to schedule removal of a
> snapshot. As for auto-confirmation of any command in gluster,
> '--mode=script' at the end of the command should work.
> However, there is a snapshot config option "*auto-delete*" which, when
> enabled, will *delete* the oldest snapshot *after crossing the
> snap-max-soft-limit* (which is a set percentage of the
> snap-max-hard-limit).
>

Auto-delete is not suited for proper snapshot handling (i.e. keeping a
certain number of hourly/daily/monthly snapshots) but --mode=script would
do.

Thank you!

Fil

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From srakonde at redhat.com  Thu Jun 27 04:13:52 2019
From: srakonde at redhat.com (Sanju Rakonde)
Date: Thu, 27 Jun 2019 09:43:52 +0530
Subject: [Gluster-users] Gluster CLI - No output/no info displayed.
In-Reply-To: 
References: 
Message-ID: 

On Sun, Jun 23, 2019 at 4:54 AM Laurent Dumont wrote:

> Hi,
>
> I am facing a strange issue with the Gluster CLI. No matter what command
> is used, the CLI doesn't output anything. It's a gluster with a single
> node. The volumes themselves are working without any issues.
> > coldadmin at gluster01:~$ sudo dpkg -l | grep -i gluster >> ii glusterfs-client 6.3-1 amd64 >> clustered file-system (client package) >> ii glusterfs-common 6.3-1 amd64 >> GlusterFS common libraries and translator modules >> ii glusterfs-server 6.0-1 amd64 >> clustered file-system (server package) > > Looks like, you've missed installing glusterfs-cli package. This is why you are unable to execute any commands using cli. Thanks, Sanju > >> coldadmin at gluster01:~$ sudo gluster --version >> glusterfs 6.0 >> Repository revision: git://git.gluster.org/glusterfs.git >> Copyright (c) 2006-2016 Red Hat, Inc. >> GlusterFS comes with ABSOLUTELY NO WARRANTY. >> It is licensed to you under your choice of the GNU Lesser >> General Public License, version 3 or any later version (LGPLv3 >> or later), or the GNU General Public License, version 2 (GPLv2), >> in all cases as published by the Free Software Foundation. > > > coldadmin at gluster01:~$ sudo gluster volume info all >> coldadmin at gluster01:~$ > > > The cli.log is filled with : > > root at gluster01:/var/log/glusterfs# tail cli.log >> [2019-06-22 23:10:31.397945] I [cli.c:834:main] 0-cli: Started running >> gluster with version 6.0 >> [2019-06-22 23:10:31.398325] E [mem-pool.c:868:mem_get] >> (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x18705) [0x7fc937069705] >> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x1863c) [0x7fc93706963c] >> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(mem_get+0x98) >> [0x7fc9370a12f8] ) 0-mem-pool: invalid argument [Invalid argument] >> [2019-06-22 23:13:20.895606] I [cli.c:834:main] 0-cli: Started running >> gluster with version 6.0 >> [2019-06-22 23:13:20.895887] E [mem-pool.c:868:mem_get] >> (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x18705) [0x7f65fc2e0705] >> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x1863c) [0x7f65fc2e063c] >> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(mem_get+0x98) >> [0x7f65fc3182f8] ) 0-mem-pool: invalid argument [Invalid argument] >> [2019-06-22 23:13:22.603313] I [cli.c:834:main] 0-cli: Started running >> gluster with version 6.0 >> [2019-06-22 23:13:22.603581] E [mem-pool.c:868:mem_get] >> (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x18705) [0x7fe7cd672705] >> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x1863c) [0x7fe7cd67263c] >> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(mem_get+0x98) >> [0x7fe7cd6aa2f8] ) 0-mem-pool: invalid argument [Invalid argument] >> [2019-06-22 23:13:47.945239] I [cli.c:834:main] 0-cli: Started running >> gluster with version 6.0 >> [2019-06-22 23:13:47.945481] E [mem-pool.c:868:mem_get] >> (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x18705) [0x7fd31bb2c705] >> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x1863c) [0x7fd31bb2c63c] >> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(mem_get+0x98) >> [0x7fd31bb642f8] ) 0-mem-pool: invalid argument [Invalid argument] >> [2019-06-22 23:14:09.015151] I [cli.c:834:main] 0-cli: Started running >> gluster with version 6.0 >> [2019-06-22 23:14:09.015461] E [mem-pool.c:868:mem_get] >> (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x18705) [0x7f471963b705] >> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x1863c) [0x7f471963b63c] >> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(mem_get+0x98) >> [0x7f47196732f8] ) 0-mem-pool: invalid argument [Invalid argument] > > > The strange part is that cmd_history.log is not logging my commands > anymore :( > > root at gluster01:/var/log/glusterfs# tail cmd_history.log >> [2019-05-19 21:45:27.814905] : volume start media : 
FAILED : Volume
>> media already started
>> [2019-05-19 21:45:32.630507] : volume status all : SUCCESS
>> [2019-05-19 21:45:32.632032] : volume status all : SUCCESS
>> [2019-05-19 21:45:32.644639] : volume status all : SUCCESS
>> [2019-05-19 21:46:21.691147] : volume status all : SUCCESS
>> [2019-05-19 21:46:21.692664] : volume status all : SUCCESS
>> [2019-05-19 21:46:21.706471] : volume status all : SUCCESS
>> [2019-05-19 21:48:15.418905] : volume status all : SUCCESS
>> [2019-05-19 21:48:15.420487] : volume status all : SUCCESS
>> [2019-05-19 21:48:15.422784] : volume status all : SUCCESS
>
> I have this old bug from 2015, but the issue seems to be purely cosmetic.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1243753
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

-- 
Thanks,
Sanju

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From hunter86_bg at yahoo.com  Thu Jun 27 05:53:22 2019
From: hunter86_bg at yahoo.com (Strahil)
Date: Thu, 27 Jun 2019 08:53:22 +0300
Subject: [Gluster-users] snapshots questions
Message-ID: 

If it expects a single word like 'y' or 'yes', then you can try:
echo 'yes' | gluster snapshot delete $(/my/script/to/find/oldest/snapshot)

Of course, you should put some logic in order to find the oldest snapshot,
but that won't be hard, as date & time of creation should be in the name.

About the situation with the LVM, it is expected that the user takes care
of that, as thin LVs can be overcommitted.

For example my arbiter has a 20GB thin LV pool and I have 4 20GB LVs inside
that pool.
As long as I don't exhaust the pool's storage - I'm fine.

You shouldn't expect that LVM will play the monitoring role here - either
put some kind of monitoring in place, or create your own solution to
monitor that fact.

Best Regards,
Strahil Nikolov

On Jun 26, 2019 15:41, Dmitry Filonov wrote:
>
> Hi,
> I am really new to Gluster and have a couple of questions that I hope
> will be really easy to answer. I just couldn't find anything on that myself.
>
> I did set up a replica 3 gluster over 3 nodes with a 2TB SSD in each node.
> To have snapshot functionality I have created a thin pool the size of the
> VG (1.82TB) and then a 1.75TB thin LV inside it on each of the bricks.
> It worked just fine until I scheduled creating hourly and daily snapshots
> on that gluster volume. In less than 2 days my thin volume got full and
> crashed.
> It didn't refuse to create new snapshots; it just died, as LVM couldn't
> perform any operations there anymore.
> So my first question is how to prevent this from happening. I could create
> a smaller thin LV, but I still have no control over how much space I would
> need for snapshots. I was hoping to see some warnings and errors while
> creating snapshots, not a failed LVM/Gluster.
>
> The second question is related but not that important. Is there a way to
> schedule snapshot removal in cron? gluster snapshot delete requires
> interactive confirmation and I don't see any flag to auto-confirm snapshot
> removal.
>
> Thank you,
>
> Fil
>
> --
> Dmitry Filonov
> Linux Administrator
> SBGrid Core | Harvard Medical School
> 250 Longwood Ave, SGM-114
> Boston, MA 02115

-------------- next part --------------
An HTML attachment was scrubbed...
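As a sketch of the rotation logic Strahil describes - with VOL and KEEP as
placeholders, and assuming the snapshot names embed their creation time so
that a plain sort orders them oldest-first - a cron job could look like:

    #!/bin/sh
    # Keep the newest KEEP snapshots of VOL, delete the rest.
    VOL=myvol
    KEEP=24
    gluster snapshot list "$VOL" | sort | head -n -"$KEEP" |
    while read -r snap; do
        # --mode=script suppresses the interactive y/n confirmation
        gluster snapshot delete "$snap" --mode=script
    done

head -n -"$KEEP" (print all but the last KEEP lines) is a GNU coreutils
feature, so this particular form is Linux-specific.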
URL: From nbalacha at redhat.com Thu Jun 27 06:47:10 2019 From: nbalacha at redhat.com (Nithya Balachandran) Date: Thu, 27 Jun 2019 12:17:10 +0530 Subject: [Gluster-users] Removing subvolume from dist/rep volume In-Reply-To: <20190625091330.GU19805@sherohman.org> References: <20190625091330.GU19805@sherohman.org> Message-ID: Hi, On Tue, 25 Jun 2019 at 15:26, Dave Sherohman wrote: > I have a 9-brick, replica 2+A cluster and plan to (permanently) remove > one of the three subvolumes. I think I've worked out how to do it, but > want to verify first that I've got it right, since downtime or data loss > would be Bad Things. > > The current configuration has six data bricks across six hosts (B > through G), and all three arbiter bricks on the same host (A), such as > one might create with > > # gluster volume create myvol replica 3 arbiter 1 B:/data C:/data A:/arb1 > D:/data E:/data A:/arb2 F:/data G:/data A:/arb3 > > > My objective is to remove nodes B and C entirely. > > First up is to pull their bricks from the volume: > > # gluster volume remove-brick myvol B:/data C:/data A:/arb1 start > (wait for data to be migrated) > # gluster volume remove-brick myvol B:/data C:/data A:/arb1 commit > > There are some edge cases that may prevent a file from being migrated during a remove-brick. Please do the following after this: 1. Check the remove-brick status for any failures. If there are any, check the rebalance log file for errors. 2. Even if there are no failures, check the removed bricks to see if any files have not been migrated. If there are any, please check that they are valid files on the brick and copy them to the volume from the brick to the mount point. The rest of the steps look good. Regards, Nithya > And then remove the nodes with: > > # gluster peer detach B > # gluster peer detach C > > > Is this correct, or did I forget any steps and/or mangle the syntax on > any commands? > > Also, for the remove-brick command, is there any way to throttle the > amount of bandwidth which will be used for the data migration? > Unfortunately, I was not able to provision a dedicated VLAN for the > gluster servers to communicate among themselves, so I don't want it > hogging all available capacity if that can be avoided. > > > If it makes a difference, my gluster version is 3.12.15-1, running on > Debian and installed from the debs at > > deb > https://download.gluster.org/pub/gluster/glusterfs/3.12/LATEST/Debian/9/amd64/apt > stretch main > > -- > Dave Sherohman > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nbalacha at redhat.com Thu Jun 27 06:49:18 2019 From: nbalacha at redhat.com (Nithya Balachandran) Date: Thu, 27 Jun 2019 12:19:18 +0530 Subject: [Gluster-users] Removing subvolume from dist/rep volume In-Reply-To: References: <20190625091330.GU19805@sherohman.org> Message-ID: On Thu, 27 Jun 2019 at 12:17, Nithya Balachandran wrote: > Hi, > > > On Tue, 25 Jun 2019 at 15:26, Dave Sherohman wrote: > >> I have a 9-brick, replica 2+A cluster and plan to (permanently) remove >> one of the three subvolumes. I think I've worked out how to do it, but >> want to verify first that I've got it right, since downtime or data loss >> would be Bad Things. 
>> The current configuration has six data bricks across six hosts (B
>> through G), and all three arbiter bricks on the same host (A), such as
>> one might create with
>>
>> # gluster volume create myvol replica 3 arbiter 1 B:/data C:/data A:/arb1
>> D:/data E:/data A:/arb2 F:/data G:/data A:/arb3
>>
>> My objective is to remove nodes B and C entirely.
>>
>> First up is to pull their bricks from the volume:
>>
>> # gluster volume remove-brick myvol B:/data C:/data A:/arb1 start
>> (wait for data to be migrated)
>> # gluster volume remove-brick myvol B:/data C:/data A:/arb1 commit
>>
> There are some edge cases that may prevent a file from being migrated
> during a remove-brick. Please do the following after this:
>
> 1. Check the remove-brick status for any failures. If there are any,
> check the rebalance log file for errors.
> 2. Even if there are no failures, check the removed bricks to see if
> any files have not been migrated. If there are any, please check that they
> are valid files on the brick and that they match on both bricks (files are
> not in split-brain), and copy them from the brick to the volume via the
> mount point.
>
You can run the following at the root of the brick to find any files that
have not been migrated:

find . -not \( -path ./.glusterfs -prune \) -type f -not -perm 01000

> The rest of the steps look good.
>
> Regards,
> Nithya
>
>> And then remove the nodes with:
>>
>> # gluster peer detach B
>> # gluster peer detach C
>>
>> Is this correct, or did I forget any steps and/or mangle the syntax on
>> any commands?
>>
>> Also, for the remove-brick command, is there any way to throttle the
>> amount of bandwidth which will be used for the data migration?
>> Unfortunately, I was not able to provision a dedicated VLAN for the
>> gluster servers to communicate among themselves, so I don't want it
>> hogging all available capacity if that can be avoided.
>>
>> If it makes a difference, my gluster version is 3.12.15-1, running on
>> Debian and installed from the debs at
>>
>> deb https://download.gluster.org/pub/gluster/glusterfs/3.12/LATEST/Debian/9/amd64/apt stretch main
>>
>> --
>> Dave Sherohman
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From hunter86_bg at yahoo.com  Thu Jun 27 07:19:39 2019
From: hunter86_bg at yahoo.com (Strahil)
Date: Thu, 27 Jun 2019 10:19:39 +0300
Subject: [Gluster-users] Regular gluster meetings
Message-ID: 

Hello All,

Sadly I got an invite to the Asian session of the gluster meetings.
As it was too early for me (travelling to work) - can we move it a little
bit later?
Do we have an EU/US session, or are there not enough people for such a
session?

Best Regards,
Strahil Nikolov

-------------- next part --------------
An HTML attachment was scrubbed...
You can find out the details here - https://github.com/gluster/community --- Ashish ----- Original Message ----- From: "Strahil" To: "gluster-users" Sent: Thursday, June 27, 2019 12:49:39 PM Subject: [Gluster-users] Regular gluster meetings Hello All, Sadly I got an invite to the Asian session of the gluster meetings. As it was too early for me (travelling to work) - can we move it a little bit later. Do we have EU/US session, or there are not enough people for such session ? Best Regards, Strahil Nikolov _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From mailinglists at lucassen.org Thu Jun 27 08:33:46 2019 From: mailinglists at lucassen.org (richard lucassen) Date: Thu, 27 Jun 2019 10:33:46 +0200 Subject: [Gluster-users] very poor performance on Debian Buster Message-ID: <20190627103346.c70621217f5f3e7cce4ddb3f@lucassen.org> I run glusterfs server on a sys-V version of Debian Gluster. The machine is an 8-core/256GB/SSD server and I want to copy 400GB to a mounted gluster device. The copy now runs for more than 3 days and it has only copied 243GB. The network activity is around 4 to 8 Mbit. Is this a known issue of version 5.5-3? I did not touch the defaults. R. -- richard lucassen http://contact.xaq.nl/ From hunter86_bg at yahoo.com Thu Jun 27 09:52:52 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Thu, 27 Jun 2019 12:52:52 +0300 Subject: [Gluster-users] very poor performance on Debian Buster Message-ID: Hi Richard, Let's try to get some info. 1. gluster volume info will provide valuable information 2. Based on previous (step 1) info, you can check the bricks via 'findmnt /path/to/brick/mountpoint' which actually is a mount point for your storage and will hold 1 or more bricks. Ex: /gluster_bricks/volume1/volume1 is my brick and my LV is mounted on /gluster_bricks/volume1 Findmnt will show filesystem, mount point and mount options. 3. Next get some info with iostat to show what is going on the brick (I guess its close to idle, otherwise you won't be here) 4. Check network usage. I prefer iftop, but you can use other stuff 5. Previous steps can show if a brick (or multiple bricks in distributed cluster) is actually a bottleneck of your performance 6. You can get a gluster profile for analysis. It consists of starting, get info after some time and stop For details check: https://docs.gluster.org/en/latest/Administrator%20Guide/Monitoring%20Workload/ 7. What kind of workload are you uploading ? Is it miliions of small files, the depth of the directories (dirA/dirB/dirC/dirN ... etc) 8. What is the tuned profile on the gluster nodes ? Use 'tuned-adm active'. 9. What kind of connection do you use - FUSE, libgfapi, built-in nfs/cifs , nfs ganesha ? Best Regards, Strahil NikolovOn Jun 27, 2019 11:33, richard lucassen wrote: > > I run glusterfs server on a sys-V version of Debian Gluster. The > machine is an 8-core/256GB/SSD server and I want to copy 400GB to a > mounted gluster device. The copy now runs for more than 3 days and it > has only copied 243GB. The network activity is around 4 to 8 Mbit. > > Is this a known issue of version 5.5-3? I did not touch the defaults. > > R. 
> > -- > richard lucassen > http://contact.xaq.nl/ > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From filonov at hkl.hms.harvard.edu Thu Jun 27 10:02:32 2019 From: filonov at hkl.hms.harvard.edu (Dmitry Filonov) Date: Thu, 27 Jun 2019 06:02:32 -0400 Subject: [Gluster-users] snapshots questions In-Reply-To: References: Message-ID: Thank you, Strahil - I was pointed to --mode=script option that works perfectly for me. As for snapshots - am spoiled with ZFS that has much better reporting and tools to work with snapshots. Will do some internal monitoring and checks around snapshots. Was hoping am just missing something. Thanks, Fil On Thu, Jun 27, 2019, 1:53 AM Strahil wrote: > If it expects a single word like 'y' or 'yes' , then you can try: > echo 'yes' | gluster snapshot delete $(/my/script/to/find/oldest/snapshot) > > Of course, you should put some logic in order to find the oldest snapshot, > bit that won't be hard as date & time of creation should be in the name. > > About the situation with the LVM, it is expected that the user takes care > of that, as thin LVs can be overcommitted. > > For example my arbiter has 20 GB thin LV pool and I have 4 20GB LVs inside > that pool. > As long as I don't exhaust the pool's storage - I'm fine. > > You shouldb't expect that LVM will play the monitoring role here - either > put some kind of monitoring, or create your own solution to monitor that > fact. > > Best Regards, > Strahil Nikolov > On Jun 26, 2019 15:41, Dmitry Filonov wrote: > > Hi, > am really new to gluster and have couple question that I hope will be > really easy to answer. Just couldn't find anything on that myself. > > I did set up replica 3 gluster over 3 nodes with 2TB SSD in each node. > To have snapshot functionality I have created thin pool of the size of VG > (1.82TB) and then 1.75TB thin LVM inside on each of the bricks. > It worked just fine until I scheduled creating hourly and daily snapshots > on that gluster volume. In less than 2 days my thin volume got full and > crashed. > Not refused creating new snapshots, but just died as LVM couldn't perform > any operations there anymore. > So my first question is how to prevent this from happening. I could create > smaller thin LVM, but I still have no control how much space I would need > for snapshots. I was hoping to see some warnings and errors while creeating > snapshots, but not failed LVM/Gluster. > > The second question is related but not that important. Is there a way to > schedule snapshot removal in cron? gluster snapshot delete requires > interactive confirmation and I don't see any flag to auto-confirm snapshot > removal. > > Thank you, > > Fil > > -- > Dmitry Filonov > Linux Administrator > SBGrid Core | Harvard Medical School > 250 Longwood Ave, SGM-114 > Boston, MA 02115 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Thu Jun 27 10:06:48 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Thu, 27 Jun 2019 13:06:48 +0300 Subject: [Gluster-users] very poor performance on Debian Buster Message-ID: <2h1nisnfpybdfnbr5sti7d8n.1561630008584@email.android.com> I forgot to ask what kind of storage do you have on your gluster machines. Is it rotational SATA, SAS or NVMe ? Best Regards, Strahil NikolovOn Jun 27, 2019 11:33, richard lucassen wrote: > > I run glusterfs server on a sys-V version of Debian Gluster. 
The > machine is an 8-core/256GB/SSD server and I want to copy 400GB to a > mounted gluster device. The copy now runs for more than 3 days and it > has only copied 243GB. The network activity is around 4 to 8 Mbit. > > Is this a known issue of version 5.5-3? I did not touch the defaults. > > R. > > -- > richard lucassen > http://contact.xaq.nl/ > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From hunter86_bg at yahoo.com Thu Jun 27 10:11:36 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Thu, 27 Jun 2019 13:11:36 +0300 Subject: [Gluster-users] snapshots questions Message-ID: Don't invest too much time. With Stratis, I expect better reporting/warning to be available. Yet, that's only an expectation. Best Regards, Strahil NikolovOn Jun 27, 2019 13:02, Dmitry Filonov wrote: > > Thank you, Strahil - > ?I was pointed to --mode=script option that works perfectly for me. > > As for snapshots - am spoiled with ZFS that has much better reporting and tools to work with snapshots. Will do some internal monitoring and checks around snapshots. Was hoping am just missing something. > > Thanks, > > Fil > > > On Thu, Jun 27, 2019, 1:53 AM Strahil wrote: >> >> If it expects a single word like 'y' or 'yes' , then you can try: >> echo 'yes' | gluster snapshot delete? $(/my/script/to/find/oldest/snapshot) >> >> Of course, you should put some logic in order to find the oldest snapshot, bit that won't be hard as date & time of creation should be in the name. >> >> About the situation with the LVM, it is expected that the user takes care of that, as thin LVs can be overcommitted. >> >> For example my arbiter has 20 GB thin LV pool and I have 4 20GB LVs inside that pool. >> As long as I don't exhaust the pool's storage - I'm fine. >> >> You shouldb't expect that LVM will play the monitoring role here - either put some kind of monitoring, or create your own solution to monitor that fact. >> >> Best Regards, >> Strahil Nikolov >> >> On Jun 26, 2019 15:41, Dmitry Filonov wrote: >>> >>> Hi, >>> ?am really new to gluster and have couple question that I hope will be really easy to answer. Just couldn't find anything on that myself. >>> >>> I did set up replica 3 gluster over 3 nodes with 2TB SSD in each node. >>> To have snapshot functionality I have created thin pool of the size of VG (1.82TB) and then 1.75TB thin LVM inside on each of the bricks. >>> It worked just fine until I scheduled creating hourly and daily snapshots on that gluster volume. In less than 2 days my thin volume got full and crashed. >>> Not refused creating new snapshots, but just died as LVM couldn't perform any operations there anymore. >>> So my first question is how to prevent this from happening. I could create smaller thin LVM, but I still have no control how much space I would need for snapshots. I was hoping to see some warnings and errors while creeating snapshots, but not failed LVM/Gluster. >>> >>> The second question is related but not that important. Is there a way to schedule snapshot removal in cron? gluster snapshot delete requires interactive confirmation and I don't see any flag to auto-confirm snapshot removal. >>> >>> Thank you, >>> >>> Fil >>> >>> -- >>> Dmitry Filonov >>> Linux Administrator >>> SBGrid Core |?Harvard Medical School >>> 250 Longwood Ave, SGM-114 >>> Boston, MA 02115 -------------- next part -------------- An HTML attachment was scrubbed... 
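For anyone putting this cleanup in cron, the two pieces above (a delete-the-oldest policy plus non-interactive mode) combine into a short script. A minimal sketch, assuming GNU coreutils and snapshot names that sort chronologically (e.g. an embedded timestamp); the volume name and retention count are placeholders:

#!/bin/bash
# Trim a volume's snapshots down to the newest $KEEP.
VOL=myvol
KEEP=24

# "head -n -N" (GNU) drops the last N lines, leaving only the oldest extras
gluster snapshot list "$VOL" | sort | head -n -"$KEEP" |
while read -r SNAP; do
    # --mode=script answers the interactive y/n confirmation automatically
    gluster --mode=script snapshot delete "$SNAP"
done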
URL: From laurentfdumont at gmail.com Thu Jun 27 13:48:59 2019 From: laurentfdumont at gmail.com (Laurent Dumont) Date: Thu, 27 Jun 2019 09:48:59 -0400 Subject: [Gluster-users] Gluster CLI - No output/no info displayed. In-Reply-To: References: Message-ID: There doesn't seem to be a glusterfs-cli package for Debian. I don't remember installing one. To rule-out anything weird, I've upgraded from 6.0.1 to 6.3.1 from the official packages. I can now use the CLI. I feel like it might have been a mismatch between 6.0.1 (glusterfs-server) and 6.3.1 for the common libraries. coldadmin at gluster01:~$ sudo dpkg -l | grep -i gluster ii glusterfs-client 6.3-1 amd64 clustered file-system (client package) ii glusterfs-common 6.3-1 amd64 GlusterFS common libraries and translator modules ii glusterfs-server 6.3-1 amd64 clustered file-system (server package) coldadmin at gluster01:~$ sudo gluster volume list media proxmox-vol1 On Thu, Jun 27, 2019 at 12:14 AM Sanju Rakonde wrote: > > > On Sun, Jun 23, 2019 at 4:54 AM Laurent Dumont > wrote: > >> Hi, >> >> I am facing a strange issue with the Gluster CLI. No matter what command >> is used, the CLI doesn't output anything. It's a gluster with a single >> node. The volumes themselves are working without any issues. >> >> coldadmin at gluster01:~$ sudo dpkg -l | grep -i gluster >>> ii glusterfs-client 6.3-1 amd64 >>> clustered file-system (client package) >>> ii glusterfs-common 6.3-1 amd64 >>> GlusterFS common libraries and translator modules >>> ii glusterfs-server 6.0-1 amd64 >>> clustered file-system (server package) >> >> > Looks like, you've missed installing glusterfs-cli package. This is why > you are unable to execute any commands using cli. > > Thanks, > Sanju > >> >>> coldadmin at gluster01:~$ sudo gluster --version >>> glusterfs 6.0 >>> Repository revision: git://git.gluster.org/glusterfs.git >>> Copyright (c) 2006-2016 Red Hat, Inc. >>> GlusterFS comes with ABSOLUTELY NO WARRANTY. >>> It is licensed to you under your choice of the GNU Lesser >>> General Public License, version 3 or any later version (LGPLv3 >>> or later), or the GNU General Public License, version 2 (GPLv2), >>> in all cases as published by the Free Software Foundation. 
>> >> >> coldadmin at gluster01:~$ sudo gluster volume info all >>> coldadmin at gluster01:~$ >> >> >> The cli.log is filled with : >> >> root at gluster01:/var/log/glusterfs# tail cli.log >>> [2019-06-22 23:10:31.397945] I [cli.c:834:main] 0-cli: Started running >>> gluster with version 6.0 >>> [2019-06-22 23:10:31.398325] E [mem-pool.c:868:mem_get] >>> (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x18705) [0x7fc937069705] >>> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x1863c) [0x7fc93706963c] >>> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(mem_get+0x98) >>> [0x7fc9370a12f8] ) 0-mem-pool: invalid argument [Invalid argument] >>> [2019-06-22 23:13:20.895606] I [cli.c:834:main] 0-cli: Started running >>> gluster with version 6.0 >>> [2019-06-22 23:13:20.895887] E [mem-pool.c:868:mem_get] >>> (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x18705) [0x7f65fc2e0705] >>> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x1863c) [0x7f65fc2e063c] >>> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(mem_get+0x98) >>> [0x7f65fc3182f8] ) 0-mem-pool: invalid argument [Invalid argument] >>> [2019-06-22 23:13:22.603313] I [cli.c:834:main] 0-cli: Started running >>> gluster with version 6.0 >>> [2019-06-22 23:13:22.603581] E [mem-pool.c:868:mem_get] >>> (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x18705) [0x7fe7cd672705] >>> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x1863c) [0x7fe7cd67263c] >>> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(mem_get+0x98) >>> [0x7fe7cd6aa2f8] ) 0-mem-pool: invalid argument [Invalid argument] >>> [2019-06-22 23:13:47.945239] I [cli.c:834:main] 0-cli: Started running >>> gluster with version 6.0 >>> [2019-06-22 23:13:47.945481] E [mem-pool.c:868:mem_get] >>> (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x18705) [0x7fd31bb2c705] >>> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x1863c) [0x7fd31bb2c63c] >>> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(mem_get+0x98) >>> [0x7fd31bb642f8] ) 0-mem-pool: invalid argument [Invalid argument] >>> [2019-06-22 23:14:09.015151] I [cli.c:834:main] 0-cli: Started running >>> gluster with version 6.0 >>> [2019-06-22 23:14:09.015461] E [mem-pool.c:868:mem_get] >>> (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x18705) [0x7f471963b705] >>> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x1863c) [0x7f471963b63c] >>> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(mem_get+0x98) >>> [0x7f47196732f8] ) 0-mem-pool: invalid argument [Invalid argument] >> >> >> The strange part is that cmd_history.log is not logging my commands >> anymore :( >> >> root at gluster01:/var/log/glusterfs# tail cmd_history.log >>> [2019-05-19 21:45:27.814905] : volume start media : FAILED : Volume >>> media already started >>> [2019-05-19 21:45:32.630507] : volume status all : SUCCESS >>> [2019-05-19 21:45:32.632032] : volume status all : SUCCESS >>> [2019-05-19 21:45:32.644639] : volume status all : SUCCESS >>> [2019-05-19 21:46:21.691147] : volume status all : SUCCESS >>> [2019-05-19 21:46:21.692664] : volume status all : SUCCESS >>> [2019-05-19 21:46:21.706471] : volume status all : SUCCESS >>> [2019-05-19 21:48:15.418905] : volume status all : SUCCESS >>> [2019-05-19 21:48:15.420487] : volume status all : SUCCESS >>> [2019-05-19 21:48:15.422784] : volume status all : SUCCESS >> >> >> I have this old bug from 2015, but the issue seems to be purely cosmetic. 
>> >> https://bugzilla.redhat.com/show_bug.cgi?id=1243753 >> >> >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Thanks, > Sanju > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mailinglists at lucassen.org Thu Jun 27 20:57:40 2019 From: mailinglists at lucassen.org (richard lucassen) Date: Thu, 27 Jun 2019 22:57:40 +0200 Subject: [Gluster-users] very poor performance on Debian Buster In-Reply-To: <2h1nisnfpybdfnbr5sti7d8n.1561630008584@email.android.com> References: <2h1nisnfpybdfnbr5sti7d8n.1561630008584@email.android.com> Message-ID: <20190627225740.b74f7768a595ba861568c065@lucassen.org> On Thu, 27 Jun 2019 13:06:48 +0300 Strahil wrote: > I forgot to ask what kind of storage do you have on your gluster > machines. Is it rotational SATA, SAS or NVMe ? I'm a bit busy now, I'll have a look at all your suggestions when I have some time. Hope I'll have some time during the weekend. BTW, it is all hardware raid1 with 3.4GB SSD's. The server is a beast, a Dell R630 R. -- richard lucassen http://contact.xaq.nl/ From dave at sherohman.org Fri Jun 28 09:03:40 2019 From: dave at sherohman.org (Dave Sherohman) Date: Fri, 28 Jun 2019 04:03:40 -0500 Subject: [Gluster-users] Removing subvolume from dist/rep volume In-Reply-To: References: <20190625091330.GU19805@sherohman.org> Message-ID: <20190628090339.GV19805@sherohman.org> On Thu, Jun 27, 2019 at 12:17:10PM +0530, Nithya Balachandran wrote: > On Tue, 25 Jun 2019 at 15:26, Dave Sherohman wrote: > > My objective is to remove nodes B and C entirely. > > > > First up is to pull their bricks from the volume: > > > > # gluster volume remove-brick myvol B:/data C:/data A:/arb1 start > > (wait for data to be migrated) > > # gluster volume remove-brick myvol B:/data C:/data A:/arb1 commit > > > > > There are some edge cases that may prevent a file from being migrated > during a remove-brick. Please do the following after this: > > 1. Check the remove-brick status for any failures. If there are any, > check the rebalance log file for errors. > 2. Even if there are no failures, check the removed bricks to see if any > files have not been migrated. If there are any, please check that they are > valid files on the brick and copy them to the volume from the brick to the > mount point. > > The rest of the steps look good. Apparently, they weren't quite right. I tried it and it just gives me the usage notes in return. Transcript of the commands and output is below. Any insight on how I got the syntax wrong? 
--- cut here --- root at merlin:/# gluster volume status Status of volume: palantir Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick saruman:/var/local/brick0/data 49153 0 Y 17995 Brick gandalf:/var/local/brick0/data 49153 0 Y 9415 Brick merlin:/var/local/arbiter1/data 49170 0 Y 35034 Brick azathoth:/var/local/brick0/data 49153 0 Y 25312 Brick yog-sothoth:/var/local/brick0/data 49152 0 Y 10671 Brick merlin:/var/local/arbiter2/data 49171 0 Y 35043 Brick cthulhu:/var/local/brick0/data 49153 0 Y 21925 Brick mordiggian:/var/local/brick0/data 49152 0 Y 12368 Brick merlin:/var/local/arbiter3/data 49172 0 Y 35050 Self-heal Daemon on localhost N/A N/A Y 1209 Self-heal Daemon on saruman.lub.lu.se N/A N/A Y 23253 Self-heal Daemon on gandalf.lub.lu.se N/A N/A Y 9542 Self-heal Daemon on mordiggian.lub.lu.se N/A N/A Y 11016 Self-heal Daemon on yog-sothoth.lub.lu.se N/A N/A Y 8126 Self-heal Daemon on cthulhu.lub.lu.se N/A N/A Y 30998 Self-heal Daemon on azathoth.lub.lu.se N/A N/A Y 34399 Task Status of Volume palantir ------------------------------------------------------------------------------ Task : Rebalance ID : e58bc091-5809-4364-af83-2b89bc5c7106 Status : completed root at merlin:/# gluster volume remove-brick palantir saruman:/var/local/brick0/data gandalf:/var/local/brick0/data merlin:/var/local/arbiter1/data Usage: volume remove-brick [replica ] ... root at merlin:/# gluster volume remove-brick palantir replica 3 arbiter 1 saruman:/var/local/brick0/data gandalf:/var/local/brick0/data merlin:/var/local/arbiter1/data Usage: volume remove-brick [replica ] ... root at merlin:/# gluster volume remove-brick palantir replica 3 saruman:/var/local/brick0/data gandalf:/var/local/brick0/data merlin:/var/local/arbiter1/data Usage: volume remove-brick [replica ] ... --- cut here --- -- Dave Sherohman From nbalacha at redhat.com Fri Jun 28 09:56:00 2019 From: nbalacha at redhat.com (Nithya Balachandran) Date: Fri, 28 Jun 2019 15:26:00 +0530 Subject: [Gluster-users] Removing subvolume from dist/rep volume In-Reply-To: <20190628090339.GV19805@sherohman.org> References: <20190625091330.GU19805@sherohman.org> <20190628090339.GV19805@sherohman.org> Message-ID: On Fri, 28 Jun 2019 at 14:34, Dave Sherohman wrote: > On Thu, Jun 27, 2019 at 12:17:10PM +0530, Nithya Balachandran wrote: > > On Tue, 25 Jun 2019 at 15:26, Dave Sherohman wrote: > > > My objective is to remove nodes B and C entirely. > > > > > > First up is to pull their bricks from the volume: > > > > > > # gluster volume remove-brick myvol B:/data C:/data A:/arb1 start > > > (wait for data to be migrated) > > > # gluster volume remove-brick myvol B:/data C:/data A:/arb1 commit > > > > > > > > There are some edge cases that may prevent a file from being migrated > > during a remove-brick. Please do the following after this: > > > > 1. Check the remove-brick status for any failures. If there are any, > > check the rebalance log file for errors. > > 2. Even if there are no failures, check the removed bricks to see if > any > > files have not been migrated. If there are any, please check that > they are > > valid files on the brick and copy them to the volume from the brick > to the > > mount point. > > > > The rest of the steps look good. > > Apparently, they weren't quite right. I tried it and it just gives me > the usage notes in return. Transcript of the commands and output is below. > > Any insight on how I got the syntax wrong? 
> > --- cut here --- > root at merlin:/# gluster volume status > Status of volume: palantir > Gluster process TCP Port RDMA Port Online > Pid > > ------------------------------------------------------------------------------ > Brick saruman:/var/local/brick0/data 49153 0 Y > 17995 > Brick gandalf:/var/local/brick0/data 49153 0 Y > 9415 > Brick merlin:/var/local/arbiter1/data 49170 0 Y > 35034 > Brick azathoth:/var/local/brick0/data 49153 0 Y > 25312 > Brick yog-sothoth:/var/local/brick0/data 49152 0 Y > 10671 > Brick merlin:/var/local/arbiter2/data 49171 0 Y > 35043 > Brick cthulhu:/var/local/brick0/data 49153 0 Y > 21925 > Brick mordiggian:/var/local/brick0/data 49152 0 Y > 12368 > Brick merlin:/var/local/arbiter3/data 49172 0 Y > 35050 > Self-heal Daemon on localhost N/A N/A Y > 1209 > Self-heal Daemon on saruman.lub.lu.se N/A N/A Y > 23253 > Self-heal Daemon on gandalf.lub.lu.se N/A N/A Y > 9542 > Self-heal Daemon on mordiggian.lub.lu.se N/A N/A Y > 11016 > Self-heal Daemon on yog-sothoth.lub.lu.se N/A N/A Y > 8126 > Self-heal Daemon on cthulhu.lub.lu.se N/A N/A Y > 30998 > Self-heal Daemon on azathoth.lub.lu.se N/A N/A Y > 34399 > > Task Status of Volume palantir > > ------------------------------------------------------------------------------ > Task : Rebalance > ID : e58bc091-5809-4364-af83-2b89bc5c7106 > Status : completed > > root at merlin:/# gluster volume remove-brick palantir > saruman:/var/local/brick0/data gandalf:/var/local/brick0/data > merlin:/var/local/arbiter1/data > > You had it right in the first email. gluster volume remove-brick palantir replica 3 arbiter 1 saruman:/var/local/brick0/data gandalf:/var/local/brick0/data merlin:/var/local/arbiter1/data *start* Usage: > volume remove-brick [replica ] ... > > > root at merlin:/# gluster volume remove-brick palantir replica 3 arbiter 1 > saruman:/var/local/brick0/data gandalf:/var/local/brick0/data > merlin:/var/local/arbiter1/data > > Usage: > volume remove-brick [replica ] ... > > > root at merlin:/# gluster volume remove-brick palantir replica 3 > saruman:/var/local/brick0/data gandalf:/var/local/brick0/data > merlin:/var/local/arbiter1/data > > Usage: > volume remove-brick [replica ] ... > > --- cut here --- > > -- > Dave Sherohman > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave at sherohman.org Fri Jun 28 11:06:14 2019 From: dave at sherohman.org (Dave Sherohman) Date: Fri, 28 Jun 2019 06:06:14 -0500 Subject: [Gluster-users] Removing subvolume from dist/rep volume In-Reply-To: <20190628090339.GV19805@sherohman.org> References: <20190625091330.GU19805@sherohman.org> <20190628090339.GV19805@sherohman.org> Message-ID: <20190628110614.GW19805@sherohman.org> OK, I'm just careless. Forgot to include "start" after the list of bricks... On Fri, Jun 28, 2019 at 04:03:40AM -0500, Dave Sherohman wrote: > On Thu, Jun 27, 2019 at 12:17:10PM +0530, Nithya Balachandran wrote: > > On Tue, 25 Jun 2019 at 15:26, Dave Sherohman wrote: > > > My objective is to remove nodes B and C entirely. 
> > > > > > First up is to pull their bricks from the volume: > > > > > > # gluster volume remove-brick myvol B:/data C:/data A:/arb1 start > > > (wait for data to be migrated) > > > # gluster volume remove-brick myvol B:/data C:/data A:/arb1 commit > > > > > > > > There are some edge cases that may prevent a file from being migrated > > during a remove-brick. Please do the following after this: > > > > 1. Check the remove-brick status for any failures. If there are any, > > check the rebalance log file for errors. > > 2. Even if there are no failures, check the removed bricks to see if any > > files have not been migrated. If there are any, please check that they are > > valid files on the brick and copy them to the volume from the brick to the > > mount point. > > > > The rest of the steps look good. > > Apparently, they weren't quite right. I tried it and it just gives me > the usage notes in return. Transcript of the commands and output is below. > > Any insight on how I got the syntax wrong? > > --- cut here --- > root at merlin:/# gluster volume status > Status of volume: palantir > Gluster process TCP Port RDMA Port Online Pid > ------------------------------------------------------------------------------ > Brick saruman:/var/local/brick0/data 49153 0 Y 17995 > Brick gandalf:/var/local/brick0/data 49153 0 Y 9415 > Brick merlin:/var/local/arbiter1/data 49170 0 Y 35034 > Brick azathoth:/var/local/brick0/data 49153 0 Y 25312 > Brick yog-sothoth:/var/local/brick0/data 49152 0 Y 10671 > Brick merlin:/var/local/arbiter2/data 49171 0 Y 35043 > Brick cthulhu:/var/local/brick0/data 49153 0 Y 21925 > Brick mordiggian:/var/local/brick0/data 49152 0 Y 12368 > Brick merlin:/var/local/arbiter3/data 49172 0 Y 35050 > Self-heal Daemon on localhost N/A N/A Y 1209 > Self-heal Daemon on saruman.lub.lu.se N/A N/A Y 23253 > Self-heal Daemon on gandalf.lub.lu.se N/A N/A Y 9542 > Self-heal Daemon on mordiggian.lub.lu.se N/A N/A Y 11016 > Self-heal Daemon on yog-sothoth.lub.lu.se N/A N/A Y 8126 > Self-heal Daemon on cthulhu.lub.lu.se N/A N/A Y 30998 > Self-heal Daemon on azathoth.lub.lu.se N/A N/A Y 34399 > > Task Status of Volume palantir > ------------------------------------------------------------------------------ > Task : Rebalance > ID : e58bc091-5809-4364-af83-2b89bc5c7106 > Status : completed > > root at merlin:/# gluster volume remove-brick palantir saruman:/var/local/brick0/data gandalf:/var/local/brick0/data merlin:/var/local/arbiter1/data > > Usage: > volume remove-brick [replica ] ... > > root at merlin:/# gluster volume remove-brick palantir replica 3 arbiter 1 saruman:/var/local/brick0/data gandalf:/var/local/brick0/data merlin:/var/local/arbiter1/data > > Usage: > volume remove-brick [replica ] ... > > root at merlin:/# gluster volume remove-brick palantir replica 3 saruman:/var/local/brick0/data gandalf:/var/local/brick0/data merlin:/var/local/arbiter1/data > > Usage: > volume remove-brick [replica ] ... 
> --- cut here ---
>
> --
> Dave Sherohman
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

--
Dave Sherohman

From hunter86_bg at yahoo.com  Fri Jun 28 14:17:07 2019
From: hunter86_bg at yahoo.com (Strahil Nikolov)
Date: Fri, 28 Jun 2019 14:17:07 +0000 (UTC)
Subject: [Gluster-users] Removing subvolume from dist/rep volume
In-Reply-To: <20190628090339.GV19805@sherohman.org>
References: <20190625091330.GU19805@sherohman.org>
 <20190628090339.GV19805@sherohman.org>
Message-ID: <1252218199.161596.1561731427601@mail.yahoo.com>

I think it should be like:

gluster volume remove-brick myvol A:/data B:/data C:/data start
gluster volume remove-brick myvol A:/data B:/data C:/data commit (force)

Best Regards,
Strahil Nikolov

On Friday, June 28, 2019, 5:03:48 AM GMT-4, Dave Sherohman wrote:

On Thu, Jun 27, 2019 at 12:17:10PM +0530, Nithya Balachandran wrote:
> On Tue, 25 Jun 2019 at 15:26, Dave Sherohman wrote:
> > My objective is to remove nodes B and C entirely.
> >
> > First up is to pull their bricks from the volume:
> >
> > # gluster volume remove-brick myvol B:/data C:/data A:/arb1 start
> > (wait for data to be migrated)
> > # gluster volume remove-brick myvol B:/data C:/data A:/arb1 commit
> >
>
> There are some edge cases that may prevent a file from being migrated
> during a remove-brick. Please do the following after this:
>
>    1. Check the remove-brick status for any failures. If there are any,
>    check the rebalance log file for errors.
>    2. Even if there are no failures, check the removed bricks to see if any
>    files have not been migrated. If there are any, please check that they are
>    valid files on the brick and copy them to the volume from the brick to the
>    mount point.
>
> The rest of the steps look good.

Apparently, they weren't quite right. I tried it and it just gives me
the usage notes in return. Transcript of the commands and output is below.

Any insight on how I got the syntax wrong?

--- cut here ---
root at merlin:/# gluster volume status
Status of volume: palantir
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick saruman:/var/local/brick0/data        49153     0          Y       17995
Brick gandalf:/var/local/brick0/data        49153     0          Y       9415
Brick merlin:/var/local/arbiter1/data       49170     0          Y       35034
Brick azathoth:/var/local/brick0/data       49153     0          Y       25312
Brick yog-sothoth:/var/local/brick0/data    49152     0          Y       10671
Brick merlin:/var/local/arbiter2/data       49171     0          Y       35043
Brick cthulhu:/var/local/brick0/data        49153     0          Y       21925
Brick mordiggian:/var/local/brick0/data     49152     0          Y       12368
Brick merlin:/var/local/arbiter3/data       49172     0          Y       35050
Self-heal Daemon on localhost               N/A       N/A        Y       1209
Self-heal Daemon on saruman.lub.lu.se       N/A       N/A        Y       23253
Self-heal Daemon on gandalf.lub.lu.se       N/A       N/A        Y       9542
Self-heal Daemon on mordiggian.lub.lu.se    N/A       N/A        Y       11016
Self-heal Daemon on yog-sothoth.lub.lu.se   N/A       N/A        Y       8126
Self-heal Daemon on cthulhu.lub.lu.se       N/A       N/A        Y       30998
Self-heal Daemon on azathoth.lub.lu.se      N/A       N/A        Y       34399

Task Status of Volume palantir
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : e58bc091-5809-4364-af83-2b89bc5c7106
Status               : completed

root at merlin:/# gluster volume remove-brick palantir saruman:/var/local/brick0/data gandalf:/var/local/brick0/data merlin:/var/local/arbiter1/data

Usage:
volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ... <start|stop|status|commit|force>

root at merlin:/# gluster volume remove-brick palantir replica 3 arbiter 1 saruman:/var/local/brick0/data gandalf:/var/local/brick0/data merlin:/var/local/arbiter1/data

Usage:
volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ... <start|stop|status|commit|force>

root at merlin:/# gluster volume remove-brick palantir replica 3 saruman:/var/local/brick0/data gandalf:/var/local/brick0/data merlin:/var/local/arbiter1/data

Usage:
volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ... <start|stop|status|commit|force>

--- cut here ---

--
Dave Sherohman
_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From dave at sherohman.org  Fri Jun 28 14:24:54 2019
From: dave at sherohman.org (Dave Sherohman)
Date: Fri, 28 Jun 2019 09:24:54 -0500
Subject: [Gluster-users] Removing subvolume from dist/rep volume
In-Reply-To: 
References: <20190625091330.GU19805@sherohman.org>
Message-ID: <20190628142454.GX19805@sherohman.org>

On Thu, Jun 27, 2019 at 12:17:10PM +0530, Nithya Balachandran wrote:
> There are some edge cases that may prevent a file from being migrated
> during a remove-brick. Please do the following after this:
>
>    1. Check the remove-brick status for any failures. If there are any,
>    check the rebalance log file for errors.
>    2. Even if there are no failures, check the removed bricks to see if any
>    files have not been migrated. If there are any, please check that they are
>    valid files on the brick and copy them to the volume from the brick to the
>    mount point.

Well, looks like I hit one of those edge cases. Probably because of some
issues around a reboot last September which left a handful of files in a
state where self-heal identified them as needing to be healed, but
incapable of actually healing them. (Check the list archives for
"Kicking a stuck heal", posted on Sept 4, if you want more details.)

So I'm getting 9 failures on the arbiter (merlin), 8 on one data brick
(gandalf), and 3 on the other (saruman). Looking in
/var/log/gluster/palantir-rebalance.log, I see those numbers of

migrate file failed: /.shard/291e9749-2d1b-47af-ad53-3a09ad4e64c6.229:
failed to lock file on palantir-replicate-1 [Stale file handle]

errors. Also, merlin has four errors, and gandalf has one, of the form:

Gfid mismatch detected for /0f500288-ff62-4f0b-9574-53f510b4159f.2898>,
9f00c0fe-58c3-457e-a2e6-f6a006d1cfc6 on palantir-client-7 and
08bb7cdc-172b-4c21-916a-2a244c095a3e on palantir-client-1.

There are no gfid mismatches recorded on saruman. All of the gfid
mismatches are for and (on saruman) appear to correspond to 0-byte files
(e.g., .shard/0f500288-ff62-4f0b-9574-53f510b4159f.2898, in the case of
the gfid mismatch quoted above).

For both types of errors, all affected files are in .shard/ and have
UUID-style names, so I have no idea which actual files they belong to.
File sizes are generally either 0 bytes or 4M (exactly), although one of
them has a size slightly larger than 3M.
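On the question of which actual files the shards belong to: a shard's name is the GFID of its base file plus a block index, and for regular files .glusterfs/<aa>/<bb>/<gfid> on a brick is a hardlink to that base file. A minimal sketch, assuming GNU find and the brick paths used above (worth running on each data brick, since the base file may live on a different subvolume than its shards):

# Hypothetical example: map shards named 0f500288-...-53f510b4159f.NNN
# back to their base file via the shared inode of the .glusterfs hardlink
BRICK=/var/local/brick0/data
GFID=0f500288-ff62-4f0b-9574-53f510b4159f

find "$BRICK" -path "$BRICK/.glusterfs" -prune -o \
    -samefile "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID" -print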
So I'm assuming they're chunks of larger files (which would be almost all the files on the volume - it's primarily holding disk image files for kvm servers). Web searches generally seem to consider gfid mismatches to be a form of split-brain, but `gluster volume heal palantir info split-brain` shows "Number of entries in split-brain: 0" for all bricks, including those bricks which are reporting gfid mismatches. Given all that, how do I proceed with cleaning up the stale handle issues? I would guess that this will involve somehow converting the shard filename to a "real" filename, then shutting down the corresponding VM and maybe doing some additional cleanup. And then there's the gfid mismatches. Since they're for 0-byte files, is it safe to just ignore them on the assumption that they only hold metadata? Or do I need to do some kind of split-brain resolution on them (even though gluster says no files are in split-brain)? Finally, a listing of /var/local/brick0/data/.shard on saruman, in case any of the information it contains (like file sizes/permissions) might provide clues to resolving the errors: --- cut here --- root at saruman:/var/local/brick0/data/.shard# ls -l total 63996 -rw-rw---- 2 root libvirt-qemu 0 Sep 17 2018 0f500288-ff62-4f0b-9574-53f510b4159f.2864 -rw-rw---- 2 root libvirt-qemu 0 Sep 17 2018 0f500288-ff62-4f0b-9574-53f510b4159f.2868 -rw-rw---- 2 root libvirt-qemu 0 Sep 17 2018 0f500288-ff62-4f0b-9574-53f510b4159f.2879 -rw-rw---- 2 root libvirt-qemu 0 Sep 17 2018 0f500288-ff62-4f0b-9574-53f510b4159f.2898 -rw------- 2 root libvirt-qemu 4194304 May 17 14:42 291e9749-2d1b-47af-ad53-3a09ad4e64c6.229 -rw------- 2 root libvirt-qemu 4194304 Jun 24 09:10 291e9749-2d1b-47af-ad53-3a09ad4e64c6.925 -rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 26 12:54 2df12cb0-6cf4-44ae-8b0a-4a554791187e.266 -rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 26 16:30 2df12cb0-6cf4-44ae-8b0a-4a554791187e.820 -rw-r--r-- 2 root libvirt-qemu 4194304 Jun 17 20:22 323186b1-6296-4cbe-8275-b940cc9d65cf.27466 -rw-r--r-- 2 root libvirt-qemu 4194304 Jun 27 05:01 323186b1-6296-4cbe-8275-b940cc9d65cf.32575 -rw-r--r-- 2 root libvirt-qemu 3145728 Jun 11 13:23 323186b1-6296-4cbe-8275-b940cc9d65cf.3448 ---------T 2 root libvirt-qemu 0 Jun 28 14:26 4cd094f4-0344-4660-98b0-83249d5bd659.22998 -rw------- 2 root libvirt-qemu 4194304 Mar 13 2018 6cdd2e5c-f49e-492b-8039-239e71577836.1302 ---------T 2 root libvirt-qemu 0 Jun 28 13:22 7530a2d1-d6ec-4a04-95a2-da1f337ac1ad.47131 ---------T 2 root libvirt-qemu 0 Jun 28 13:22 7530a2d1-d6ec-4a04-95a2-da1f337ac1ad.52615 -rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 27 08:56 8fefae99-ed2a-4a8f-ab87-aa94c6bb6e68.100 -rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 27 11:29 8fefae99-ed2a-4a8f-ab87-aa94c6bb6e68.106 -rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 28 02:35 8fefae99-ed2a-4a8f-ab87-aa94c6bb6e68.137 -rw-rw-r-- 2 root libvirt-qemu 4194304 Nov 4 2018 9544617c-901c-4613-a94b-ccfad4e38af1.165 -rw-rw-r-- 2 root libvirt-qemu 4194304 Nov 4 2018 9544617c-901c-4613-a94b-ccfad4e38af1.168 -rw-rw-r-- 2 root libvirt-qemu 4194304 Nov 5 2018 9544617c-901c-4613-a94b-ccfad4e38af1.193 -rw-rw-r-- 2 root libvirt-qemu 4194304 Nov 6 2018 9544617c-901c-4613-a94b-ccfad4e38af1.3800 ---------T 2 root libvirt-qemu 0 Jun 28 15:02 b48a5934-5e5b-4918-8193-6ff36f685f70.46559 -rw-rw---- 2 root libvirt-qemu 0 Oct 12 2018 c5bde2f2-3361-4d1a-9c88-28751ef74ce6.3568 -rw-r--r-- 2 root libvirt-qemu 4194304 Apr 13 2018 c953c676-152d-4826-80ff-bd307fa7f6e5.10724 -rw-r--r-- 2 root libvirt-qemu 4194304 Apr 11 2018 
c953c676-152d-4826-80ff-bd307fa7f6e5.3101
--- cut here ---

--
Dave Sherohman

From lists at localguru.de  Fri Jun 28 14:43:36 2019
From: lists at localguru.de (Marcus Schopen)
Date: Fri, 28 Jun 2019 16:43:36 +0200
Subject: [Gluster-users] gluster and qcow2 images
Message-ID: <8a1718af6a0c270c55b34955a380ef58962a6ae9.camel@localguru.de>

Hi,

does anyone have experience with gluster in KVM environments? I would
like to hold qcow2 images of a KVM host with a second KVM host in sync.
Unfortunately, shared storage is not available to me, only the two KVM
hosts. In principle, it would be sufficient for me - in case of a
failure of the first KVM host - to start the guests on the second host
by hand without restoring the images from the nightly backup first. The
question is, is glusterfs a sensible solution here or should one better
use other approaches e.g. DRBD. I have read contradictory statements
about this, many advise against using gluster for qcow2 images, some
report no problems at all.

Cheers
Marcus

From hunter86_bg at yahoo.com  Fri Jun 28 15:06:48 2019
From: hunter86_bg at yahoo.com (Strahil Nikolov)
Date: Fri, 28 Jun 2019 15:06:48 +0000 (UTC)
Subject: [Gluster-users] gluster and qcow2 images
In-Reply-To: <8a1718af6a0c270c55b34955a380ef58962a6ae9.camel@localguru.de>
References: <8a1718af6a0c270c55b34955a380ef58962a6ae9.camel@localguru.de>
Message-ID: <336988738.187029.1561734408544@mail.yahoo.com>

Hi Marcus,

this is one of the popular approaches in oVirt. Currently I'm running
oVirt (KVM with management layer) over a shared storage - gluster v6.3.
It's doable and if properly configured - you can rely on it without any
issues.

Have you considered oVirt as a solution instead of pure KVM?

Best Regards,
Strahil Nikolov

On Friday, June 28, 2019, 10:54:49 AM GMT-4, Marcus Schopen wrote:

Hi,

does anyone have experience with gluster in KVM environments? I would
like to hold qcow2 images of a KVM host with a second KVM host in sync.
Unfortunately, shared storage is not available to me, only the two KVM
hosts. In principle, it would be sufficient for me - in case of a
failure of the first KVM host - to start the guests on the second host
by hand without restoring the images from the nightly backup first. The
question is, is glusterfs a sensible solution here or should one better
use other approaches e.g. DRBD. I have read contradictory statements
about this, many advise against using gluster for qcow2 images, some
report no problems at all.

Cheers
Marcus

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From matthewb at uvic.ca  Fri Jun 28 18:19:15 2019
From: matthewb at uvic.ca (Matthew Benstead)
Date: Fri, 28 Jun 2019 11:19:15 -0700
Subject: [Gluster-users] Geo-Replication Changelog Error - is a directory
Message-ID: <6ffbd4a4-0755-4a4b-9150-8726007bf382@uvic.ca>

Hello,

I'm having some issues with successfully establishing a geo-replication
session between a 7-server distribute cluster as the primary volume, and
a 2-server distribute cluster as the secondary volume. Both are running
the same version of gluster on CentOS 7: glusterfs-5.3-2.el7.x86_64.

I was able to set up the replication keys, user, groups, etc and
establish the session, but it goes faulty quickly after starting.
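When a session cycles into Faulty like this, the per-session status and log are the usual places to start. A sketch, assuming the default log layout (the session directory name follows the <master>_<slavehost>_<slavevol> pattern, so the exact path may differ):

gluster volume geo-replication storage 10.0.231.81::pcic-backup status detail
tail -f /var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.log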
The error from the gsyncd.log is: Changelog register failed error=[Errno 21] Is a directory We made an attempt about 2 years ago to configure geo-replication but abandoned it, now with a new cluster I wanted to get it setup, but it looks like changelogs have been accumulating since then: [root at gluster07 .glusterfs]# ls -lh changelogs > /var/tmp/changelogs.txt [root at gluster07 ~]# head /var/tmp/changelogs.txt total 11G -rw-r--r--. 1 root root 130 Jun 27 13:48 CHANGELOG -rw-r--r--. 1 root root 2.6K Jun 19 2017 CHANGELOG.1497891971 -rw-r--r--. 1 root root 470 Jun 19 2017 CHANGELOG.1497892055 -rw-r--r--. 1 root root 186 Jun 19 2017 CHANGELOG.1497892195 -rw-r--r--. 1 root root 458 Jun 19 2017 CHANGELOG.1497892308 -rw-r--r--. 1 root root 188 Jun 19 2017 CHANGELOG.1497892491 -rw-r--r--. 1 root root 862 Jun 19 2017 CHANGELOG.1497892828 -rw-r--r--. 1 root root 11K Jun 19 2017 CHANGELOG.1497892927 -rw-r--r--. 1 root root 4.4K Jun 19 2017 CHANGELOG.1497892941 [root at gluster07 ~]# tail /var/tmp/changelogs.txt -rw-r--r--. 1 root root 130 Jun 27 13:47 CHANGELOG.1561668463 -rw-r--r--. 1 root root 130 Jun 27 13:47 CHANGELOG.1561668477 -rw-r--r--. 1 root root 130 Jun 27 13:48 CHANGELOG.1561668491 -rw-r--r--. 1 root root 130 Jun 27 13:48 CHANGELOG.1561668506 -rw-r--r--. 1 root root 130 Jun 27 13:48 CHANGELOG.1561668521 -rw-r--r--. 1 root root 130 Jun 27 13:48 CHANGELOG.1561668536 -rw-r--r--. 1 root root 130 Jun 27 13:49 CHANGELOG.1561668550 -rw-r--r--. 1 root root 130 Jun 27 13:49 CHANGELOG.1561668565 drw-------. 2 root root 10 Jun 19 2017 csnap drw-------. 2 root root 37 Jun 19 2017 htime Could this be related? When deleting the replication session I made sure to try the 'delete reset-sync-time' option, but it failed with: gsyncd failed to delete session info for storage and 10.0.231.81::pcic-backup peers geo-replication command failed Here is the volume info: [root at gluster07 ~]# gluster volume info storage Volume Name: storage Type: Distribute Volume ID: 6f95525a-94d7-4174-bac4-e1a18fe010a2 Status: Started Snapshot Count: 0 Number of Bricks: 7 Transport-type: tcp Bricks: Brick1: 10.0.231.50:/mnt/raid6-storage/storage Brick2: 10.0.231.51:/mnt/raid6-storage/storage Brick3: 10.0.231.52:/mnt/raid6-storage/storage Brick4: 10.0.231.53:/mnt/raid6-storage/storage Brick5: 10.0.231.54:/mnt/raid6-storage/storage Brick6: 10.0.231.55:/mnt/raid6-storage/storage Brick7: 10.0.231.56:/mnt/raid6-storage/storage Options Reconfigured: features.quota-deem-statfs: on features.read-only: off features.inode-quota: on features.quota: on performance.readdir-ahead: on nfs.disable: on geo-replication.indexing: on geo-replication.ignore-pid-check: on changelog.changelog: on transport.address-family: inet Any ideas? Thanks, -Matthew From rightkicktech at gmail.com Sat Jun 29 04:05:24 2019 From: rightkicktech at gmail.com (Alex K) Date: Sat, 29 Jun 2019 07:05:24 +0300 Subject: [Gluster-users] gluster and qcow2 images In-Reply-To: <8a1718af6a0c270c55b34955a380ef58962a6ae9.camel@localguru.de> References: <8a1718af6a0c270c55b34955a380ef58962a6ae9.camel@localguru.de> Message-ID: Hi On Fri, Jun 28, 2019, 17:54 Marcus Schopen wrote: > Hi, > > does anyone have experience with gluster in KVM environments? I would > like to hold qcow2 images of a KVM host with a second KVM host in sync. > Unfortunately, shared storage is not available to me, only the > two KVM hosts. 
In principle, it would be sufficient for me - in case of > a failure of the first KVM host - to start the guests on the second > host by hand without restoring the images from the nightly backup > first. The question is, is glusterfs a sensible solution here or > should one better use other approaches e.g. DRBD. I have read > contradictory statements about this, many advise against using gluster > for qcow2 images, some report no problems at all. > Redhat uses gluster in its RHEV solution. Ovirt is the open source one. Thus gluster can be used with good results. You will need a 10G network for the gluster storage for higher performance and enable sharding on the shared volume. Two node setups are prone to split brain issues which may cause headaches. I am running such setups for years and encountered few splits which i was able to recover from. You need some fencing solution inplace to minimize such issues. I would expect higher performance from DRBD, though I am not aware of any GUI solution that simplifies its management. > > Cheers > Marcus > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at localguru.de Sat Jun 29 11:26:29 2019 From: lists at localguru.de (Marcus Schopen) Date: Sat, 29 Jun 2019 13:26:29 +0200 Subject: [Gluster-users] keep qcow2 images in sync Message-ID: <9ba4a4704029292f6f0725886ea09823343f2d13.camel@localguru.de> Hi, i am looking for a solution to keep qcow2 images of two KVM hosts sync. (about 20 guests, images size max 100 GB). Unfortunately, shared storage is not available to me. My first look was on Gluster, but after what I've read so far, there might be performance issues here. I don't need real HA. If the first host fails, it would be sufficient for me to manually start the KVM guests on the second host. Ultimately, I just want to avoid having to restore the images from the nightly backup and have a maximum "data loss" of 24 hours. Is drdb a alternative here? And what should I pay particular attention to when setting up, split brain etc.? Cheers Marcus From guy.boisvert at ingtegration.com Sat Jun 29 15:17:01 2019 From: guy.boisvert at ingtegration.com (Guy Boisvert) Date: Sat, 29 Jun 2019 11:17:01 -0400 Subject: [Gluster-users] keep qcow2 images in sync In-Reply-To: <9ba4a4704029292f6f0725886ea09823343f2d13.camel@localguru.de> References: <9ba4a4704029292f6f0725886ea09823343f2d13.camel@localguru.de> Message-ID: <2aaced84-4464-8338-d80f-710023868e5f@ingtegration.com> On 2019-06-29 7:26 a.m., Marcus Schopen wrote: > Hi, > > i am looking for a solution to keep qcow2 images of two KVM hosts sync. > (about 20 guests, images size max 100 GB). Unfortunately, shared > storage is not available to me. My first look was on Gluster, but after > what I've read so far, there might be performance issues here. I don't > need real HA. If the first host fails, it would be sufficient for me to > manually start the KVM guests on the second host. Ultimately, I just > want to avoid having to restore the images from the nightly backup and > have a maximum "data loss" of 24 hours. Is drdb a alternative here? And > what should I pay particular attention to when setting up, split brain > etc.? > > Cheers > Marcus Hi Marcus, ??? I use GlusterFS for VM image storage.? Most of them are RAW files and QCOW2 is fine too.? There are things to watch for performance.? 
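Sharding (as Alex notes) plus a few direct-I/O-friendly options are the usual starting point for VM-image volumes. A minimal sketch against an assumed volume name of vmstore; these are common starting values, not measured tuning:

gluster volume set vmstore features.shard on
gluster volume set vmstore features.shard-block-size 64MB
gluster volume set vmstore network.remote-dio enable
gluster volume set vmstore performance.read-ahead off

# Where a "virt" group file ships under /var/lib/glusterd/groups, a
# similar VM-oriented option set can be applied in one step:
gluster volume set vmstore group virt

Note that sharding only affects files created after it is turned on, so it is best set before any images are copied in.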
In a setup we have for a client, the Windows File server (8TB + 5TB) is now screaming compared to the old setup: 120 MBps!? And the overtaxed mail server (CentOS 7 Linux VM running CommuniGate) on the same cluster is doing fine too (low IOwait considering the 1TB+ Public Folders, 140 Outlook users and a total of 2.5 TB of emails). ??? This setup is configured in three way replication.? The 3 servers are Supermicro using LSI RAID 6 arrays.? GlusterFS bricks are ontop thin provisioned LVM logical volumes.? Watch out for lvm / filesystem optimal alignment VS your physical RAID arrays. ??? Just keep in mind that with 2 servers, you are exposed to split brain issues.? The other way of doing it in the above setup would be 2+1 replication with an arbiter: It would saves space and still enforce quorum.? But many things i read are pointing to 3 way replication for simplicity and stability. Here are the infos about this Gluster cluster: 3 x Supermicro 3U servers with each: 1 x Xeon E5-2609 CPU 32 Gig RAM Integrated LSI SAS 2208 32 TB Array consisting of 8 x HGST NAS 7200 RPMs drives in RAID 6 2 x Integrated Intel X540-AT2 10 Gbps ethernet We are considering of adding SSD caching, still looking for the best way of doing it... (LVM Cache? / Gluster Tiering / etc) There are 2 KVM Servers running VMs with dual E5-2630v2 CPU, 96 Gigs RAM and dual 10 Gbps ether The next part will be to add OVirt. Be sure to simulate what you want to achieve before going in prod.? Test disconnection of a node, etc.? I like Gluster but there are situations that you must know what you're doing.? I paid a premium for the Red Hat Gluster Video self training and it is not a good course IMHO (more of an intro...), i didn't learn what was of interest to me (highly technical) and i taught i could ask questions: They direct you to tech support and you must have a contract.... They seem to want to direct you to their "design" team. I didn't find any "best practices", not real good guidelines and explanations... A real important thing to consider: The network part.? Look for bonding + dual attach switch and for ECM Routing + OSPF on the servers / network.? I'm still searching for the best network setup to get performance and full redundancy. Hope this helps. Guy -- Guy Boisvert, ing. IngTegration inc. http://www.ingtegration.com https://www.linkedin.com/pub/guy-boisvert/7/48/899/fr AVIS DE CONFIDENTIALITE : ce message peut contenir des renseignements confidentiels appartenant exclusivement a IngTegration Inc. ou a ses filiales. Si vous n'etes pas le destinataire indique ou prevu dans ce message (ou responsable de livrer ce message a la personne indiquee ou prevue) ou si vous pensez que ce message vous a ete adresse par erreur, vous ne pouvez pas utiliser ou reproduire ce message, ni le livrer a quelqu'un d'autre. Dans ce cas, vous devez le detruire et vous etes prie d'avertir l'expediteur en repondant au courriel. CONFIDENTIALITY NOTICE : Proprietary/Confidential Information belonging to IngTegration Inc. and its affiliates may be contained in this message. If you are not a recipient indicated or intended in this message (or responsible for delivery of this message to such person), or you think for any reason that this message may have been addressed to you in error, you may not use or copy or deliver this message to anyone else. In such case, you should destroy this message and are asked to notify the sender by reply email. 
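As a concrete form of the 2+1 arbiter layout Guy mentions: the arbiter brick stores only file names and metadata, so it stays small while still providing quorum. A minimal sketch with placeholder host names and brick paths:

gluster volume create vmstore replica 3 arbiter 1 \
    server1:/bricks/vmstore/brick \
    server2:/bricks/vmstore/brick \
    arbiter1:/bricks/arb/brick
gluster volume start vmstore

Placing the arbiter on a third host avoids the two-node split-brain exposure described earlier in the thread.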
From mailinglists at lucassen.org Sun Jun 30 11:59:51 2019 From: mailinglists at lucassen.org (richard lucassen) Date: Sun, 30 Jun 2019 13:59:51 +0200 Subject: [Gluster-users] very poor performance on Debian Buster In-Reply-To: <20190627225740.b74f7768a595ba861568c065@lucassen.org> References: <2h1nisnfpybdfnbr5sti7d8n.1561630008584@email.android.com> <20190627225740.b74f7768a595ba861568c065@lucassen.org> Message-ID: <20190630135951.66b39be769d0e4a2849bec1a@lucassen.org> On Thu, 27 Jun 2019 22:57:40 +0200 richard lucassen wrote: > On Thu, 27 Jun 2019 13:06:48 +0300 > Strahil wrote: > > > I forgot to ask what kind of storage do you have on your gluster > > machines. Is it rotational SATA, SAS or NVMe ? > > I'm a bit busy now, I'll have a look at all your suggestions when I > have some time. Hope I'll have some time during the weekend. > > BTW, it is all hardware raid1 with 3.4GB SSD's. The server is a > beast, a Dell R630 I'll come back to the issue, I'm too busy at the moment. R. -- richard lucassen http://contact.xaq.nl/ From sdeepugd at gmail.com Mon Jun 3 21:51:48 2019 From: sdeepugd at gmail.com (deepu srinivasan) Date: Mon, 03 Jun 2019 21:51:48 -0000 Subject: [Gluster-users] Glusterd2 for production Message-ID: Hi Users Is it safe to use glusterd2 for production? -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdeepugd at gmail.com Tue Jun 4 11:40:28 2019 From: sdeepugd at gmail.com (deepu srinivasan) Date: Tue, 04 Jun 2019 11:40:28 -0000 Subject: [Gluster-users] Geo Replication stops replicating In-Reply-To: References: Message-ID: Hi As discussed I have upgraded gluster from 4.1 to 6.2 version. But the Geo replication failed to start. Stays in faulty state On Fri, May 31, 2019, 5:32 PM deepu srinivasan wrote: > Checked the data. It remains in 2708. No progress. > > On Fri, May 31, 2019 at 4:36 PM Kotresh Hiremath Ravishankar < > khiremat at redhat.com> wrote: > >> That means it could be working and the defunct process might be some old >> zombie one. Could you check, that data progress ? >> >> On Fri, May 31, 2019 at 4:29 PM deepu srinivasan >> wrote: >> >>> Hi >>> When i change the rsync option the rsync process doesnt seem to start . >>> Only a defunt process is listed in ps aux. Only when i set rsync option to >>> " " and restart all the process the rsync process is listed in ps aux. >>> >>> >>> On Fri, May 31, 2019 at 4:23 PM Kotresh Hiremath Ravishankar < >>> khiremat at redhat.com> wrote: >>> >>>> Yes, rsync config option should have fixed this issue. >>>> >>>> Could you share the output of the following? >>>> >>>> 1. gluster volume geo-replication :: >>>> config rsync-options >>>> 2. ps -ef | grep rsync >>>> >>>> On Fri, May 31, 2019 at 4:11 PM deepu srinivasan >>>> wrote: >>>> >>>>> Done. >>>>> We got the following result . >>>>> >>>>>> 1559298781.338234 write(2, "rsync: link_stat >>>>>> \"/tmp/gsyncd-aux-mount-EEJ_sY/.gfid/3fa6aed8-802e-4efe-9903-8bc171176d88\" >>>>>> failed: No such file or directory (2)", 128 >>>>> >>>>> seems like a file is missing ? >>>>> >>>>> On Fri, May 31, 2019 at 3:25 PM Kotresh Hiremath Ravishankar < >>>>> khiremat at redhat.com> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> Could you take the strace with with more string size? The argument >>>>>> strings are truncated. >>>>>> >>>>>> strace -s 500 -ttt -T -p >>>>>> >>>>>> On Fri, May 31, 2019 at 3:17 PM deepu srinivasan >>>>>> wrote: >>>>>> >>>>>>> Hi Kotresh >>>>>>> The above-mentioned work around did not work properly. 
>>>>>>> >>>>>>> On Fri, May 31, 2019 at 3:16 PM deepu srinivasan >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Kotresh >>>>>>>> We have tried the above-mentioned rsync option and we are planning >>>>>>>> to have the version upgrade to 6.0. >>>>>>>> >>>>>>>> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar < >>>>>>>> khiremat at redhat.com> wrote: >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> This looks like the hang because stderr buffer filled up with >>>>>>>>> errors messages and no one reading it. >>>>>>>>> I think this issue is fixed in latest releases. As a workaround, >>>>>>>>> you can do following and check if it works. >>>>>>>>> >>>>>>>>> Prerequisite: >>>>>>>>> rsync version should be > 3.1.0 >>>>>>>>> >>>>>>>>> Workaround: >>>>>>>>> gluster volume geo-replication :: >>>>>>>>> config rsync-options "--ignore-missing-args" >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Kotresh HR >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan < >>>>>>>>> sdeepugd at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Hi >>>>>>>>>> We were evaluating Gluster geo Replication between two DCs one is >>>>>>>>>> in US west and one is in US east. We took multiple trials for different >>>>>>>>>> file size. >>>>>>>>>> The Geo Replication tends to stop replicating but while checking >>>>>>>>>> the status it appears to be in Active state. But the slave volume did not >>>>>>>>>> increase in size. >>>>>>>>>> So we have restarted the geo-replication session and checked the >>>>>>>>>> status. The status was in an active state and it was in History Crawl for a >>>>>>>>>> long time. We have enabled the DEBUG mode in logging and checked for any >>>>>>>>>> error. >>>>>>>>>> There was around 2000 file appeared for syncing candidate. The >>>>>>>>>> Rsync process starts but the rsync did not happen in the slave volume. >>>>>>>>>> Every time the rsync process appears in the "ps auxxx" list but the >>>>>>>>>> replication did not happen in the slave end. What would be the cause of >>>>>>>>>> this problem? Is there anyway to debug it? >>>>>>>>>> >>>>>>>>>> We have also checked the strace of the rync program. >>>>>>>>>> it displays something like this >>>>>>>>>> >>>>>>>>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128" >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> We are using the below specs >>>>>>>>>> >>>>>>>>>> Gluster version - 4.1.7 >>>>>>>>>> Sync mode - rsync >>>>>>>>>> Volume - 1x3 in each end (master and slave) >>>>>>>>>> Intranet Bandwidth - 10 Gig >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Thanks and Regards, >>>>>>>>> Kotresh H R >>>>>>>>> >>>>>>>> >>>>>> >>>>>> -- >>>>>> Thanks and Regards, >>>>>> Kotresh H R >>>>>> >>>>> >>>> >>>> -- >>>> Thanks and Regards, >>>> Kotresh H R >>>> >>> >> >> -- >> Thanks and Regards, >> Kotresh H R >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdeepugd at gmail.com Tue Jun 4 11:54:03 2019 From: sdeepugd at gmail.com (deepu srinivasan) Date: Tue, 04 Jun 2019 11:54:03 -0000 Subject: [Gluster-users] Geo Replication stops replicating In-Reply-To: References: Message-ID: Hi Kortesh Please find the logs of the above error *Master log snippet* > [2019-06-04 11:52:09.254731] I [resource(worker > /home/sas/gluster/data/code-misc):1379:connect_remote] SSH: Initializing > SSH connection between master and slave... > [2019-06-04 11:52:09.308923] D [repce(worker > /home/sas/gluster/data/code-misc):196:push] RepceClient: call > 89724:139652759443264:1559649129.31 __repce_version__() ... 
From sdeepugd at gmail.com  Tue Jun 4 11:54:03 2019
From: sdeepugd at gmail.com (deepu srinivasan)
Date: Tue, 04 Jun 2019 11:54:03 -0000
Subject: [Gluster-users] Geo Replication stops replicating
In-Reply-To: 
References: 
Message-ID: 

Hi Kotresh,
Please find the logs for the above error.

*Master log snippet*

> [2019-06-04 11:52:09.254731] I [resource(worker
> /home/sas/gluster/data/code-misc):1379:connect_remote] SSH: Initializing
> SSH connection between master and slave...
> [2019-06-04 11:52:09.308923] D [repce(worker
> /home/sas/gluster/data/code-misc):196:push] RepceClient: call
> 89724:139652759443264:1559649129.31 __repce_version__() ...
> [2019-06-04 11:52:09.602792] E [syncdutils(worker
> /home/sas/gluster/data/code-misc):311:log_raise_exception] : connection
> to peer is broken
> [2019-06-04 11:52:09.603312] E [syncdutils(worker
> /home/sas/gluster/data/code-misc):805:errlog] Popen: command returned
> error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
> /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto
> -S /tmp/gsyncd-aux-ssh-4aL2tc/d893f66e0addc32f7d0080bb503f5185.sock
> sas at 192.168.185.107 /usr/libexec/glusterfs/gsyncd slave code-misc
> sas at 192.168.185.107::code-misc --master-node 192.168.185.106
> --master-node-id 851b64d0-d885-4ae9-9b38-ab5b15db0fec --master-brick
> /home/sas/gluster/data/code-misc --local-node 192.168.185.122
> --local-node-id bcaa7af6-c3a1-4411-8e99-4ebecb32eb6a --slave-timeout 120
> --slave-log-level DEBUG --slave-gluster-log-level INFO
> --slave-gluster-command-dir /usr/sbin error=1
> [2019-06-04 11:52:09.614996] I [repce(agent
> /home/sas/gluster/data/code-misc):97:service_loop] RepceServer:
> terminating on reaching EOF.
> [2019-06-04 11:52:09.615545] D [monitor(monitor):271:monitor] Monitor:
> worker(/home/sas/gluster/data/code-misc) connected
> [2019-06-04 11:52:09.616528] I [monitor(monitor):278:monitor] Monitor:
> worker died in startup phase brick=/home/sas/gluster/data/code-misc
> [2019-06-04 11:52:09.619391] I
> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker
> Status Change status=Faulty

*Slave log snippet*

> [2019-06-04 11:50:09.782668] E [syncdutils(slave
> 192.168.185.106/home/sas/gluster/data/code-misc):809:logerr] Popen:
> /usr/sbin/gluster> 2 : failed with this errno (No such file or directory)
> [2019-06-04 11:50:11.188167] W [gsyncd(slave
> 192.168.185.125/home/sas/gluster/data/code-misc):305:main] : Session
> config file not exists, using the default config
> path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.107_code-misc/gsyncd.conf
> [2019-06-04 11:50:11.201070] I [resource(slave
> 192.168.185.125/home/sas/gluster/data/code-misc):1098:connect] GLUSTER:
> Mounting gluster volume locally...
> [2019-06-04 11:50:11.271231] E [resource(slave
> 192.168.185.125/home/sas/gluster/data/code-misc):1006:handle_mounter]
> MountbrokerMounter: glusterd answered mnt=
> [2019-06-04 11:50:11.271998] E [syncdutils(slave
> 192.168.185.125/home/sas/gluster/data/code-misc):805:errlog] Popen:
> command returned error cmd=/usr/sbin/gluster --remote-host=localhost
> system:: mount sas user-map-root=sas aux-gfid-mount acl log-level=INFO
> log-file=/var/log/glusterfs/geo-replication-slaves/code-misc_192.168.185.107_code-misc/mnt-192.168.185.125-home-sas-gluster-data-code-misc.log
> volfile-server=localhost volfile-id=code-misc client-pid=-1 error=1
> [2019-06-04 11:50:11.272113] E [syncdutils(slave
> 192.168.185.125/home/sas/gluster/data/code-misc):809:logerr] Popen:
> /usr/sbin/gluster> 2 : failed with this errno (No such file or directory)

On Tue, Jun 4, 2019 at 5:10 PM deepu srinivasan wrote:

> Hi,
> As discussed, I have upgraded Gluster from version 4.1 to 6.2, but
> geo-replication fails to start and stays in the Faulty state.
> [...]
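The slave log above fails to run /usr/sbin/gluster, while the worker is
started with --slave-gluster-command-dir /usr/sbin. A quick sanity check,
sketched under the assumption that the session options match the logged
invocation (volume and host names from this thread):

    # on the slave node: confirm the gluster CLI really lives in /usr/sbin
    which gluster
    ls -l /usr/sbin/gluster

    # on the master: inspect the command dir the session will use
    gluster volume geo-replication code-misc sas@192.168.185.107::code-misc \
        config slave-gluster-command-dir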
From sdeepugd at gmail.com  Tue Jun 4 12:16:20 2019
From: sdeepugd at gmail.com (deepu srinivasan)
Date: Tue, 04 Jun 2019 12:16:20 -0000
Subject: [Gluster-users] Geo Replication stops replicating
In-Reply-To: 
References: 
Message-ID: 

I have already added the path in .bashrc. Still in the Faulty state.

On Tue, Jun 4, 2019, 5:27 PM Kotresh Hiremath Ravishankar <
khiremat at redhat.com> wrote:

> Could you please try adding /usr/sbin to $PATH for user 'sas'? If it's
> bash, add 'export PATH=/usr/sbin:$PATH' in /home/sas/.bashrc
> [...]
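One possible reason the .bashrc change shows no effect: on many
distributions ~/.bashrc begins with a guard that returns early for
non-interactive shells, and the geo-replication worker reaches the slave
over non-interactive SSH, so an export placed below the guard is never
read. A quick check from the master (user and host as used in this
thread):

    # PATH as seen by a non-interactive session on the slave
    ssh sas@192.168.185.107 'echo $PATH; which gluster'

If gluster is not found there, move the export above the guard or into a
file the non-interactive shell does read.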
From sdeepugd at gmail.com  Tue Jun 4 19:36:14 2019
From: sdeepugd at gmail.com (deepu srinivasan)
Date: Tue, 04 Jun 2019 19:36:14 -0000
Subject: [Gluster-users] Geo Replication stops replicating
In-Reply-To: 
References: 
Message-ID: 

Thank you, Kotresh.

On Tue, Jun 4, 2019, 11:20 PM Kotresh Hiremath Ravishankar <
khiremat at redhat.com> wrote:

> Ccing Sunny, who was investigating a similar issue.
> [...]
From sdeepugd at gmail.com  Wed Jun 5 08:58:28 2019
From: sdeepugd at gmail.com (deepu srinivasan)
Date: Wed, 05 Jun 2019 08:58:28 -0000
Subject: [Gluster-users] Geo Replication stops replicating
In-Reply-To: 
References: 
Message-ID: 

Hi Kotresh, Sunny,
Found this log on the slave machine:

> [2019-06-05 08:49:10.632583] I [MSGID: 106488]
> [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management:
> Received get vol req
> The message "I [MSGID: 106488]
> [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management:
> Received get vol req" repeated 2 times between [2019-06-05
> 08:49:10.632583] and [2019-06-05 08:49:10.670863]
> The message "I [MSGID: 106496]
> [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received
> mount req" repeated 34 times between [2019-06-05 08:48:41.005398] and
> [2019-06-05 08:50:37.254063]
> The message "E [MSGID: 106061]
> [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option
> mountbroker-root' missing in glusterd vol file" repeated 34 times
> between [2019-06-05 08:48:41.005434] and [2019-06-05 08:50:37.254079]
> The message "W [MSGID: 106176]
> [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management:
> unsuccessful mount request [No such file or directory]" repeated 34
> times between [2019-06-05 08:48:41.005444] and [2019-06-05
> 08:50:37.254080]
> [2019-06-05 08:50:46.361347] I [MSGID: 106496]
> [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received
> mount req
> [2019-06-05 08:50:46.361384] E [MSGID: 106061]
> [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option
> mountbroker-root' missing in glusterd vol file
> [2019-06-05 08:50:46.361419] W [MSGID: 106176]
> [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management:
> unsuccessful mount request [No such file or directory]
> The message "I [MSGID: 106496]
> [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received
> mount req" repeated 33 times between [2019-06-05 08:50:46.361347] and
> [2019-06-05 08:52:34.019741]
> The message "E [MSGID: 106061]
> [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option
> mountbroker-root' missing in glusterd vol file" repeated 33 times
> between [2019-06-05 08:50:46.361384] and [2019-06-05 08:52:34.019757]
> The message "W [MSGID: 106176]
> [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management:
> unsuccessful mount request [No such file or directory]" repeated 33
> times between [2019-06-05 08:50:46.361419] and [2019-06-05
> 08:52:34.019758]
> [2019-06-05 08:52:44.426839] I [MSGID: 106496]
> [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received
> mount req
> [2019-06-05 08:52:44.426886] E [MSGID: 106061]
> [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option
> mountbroker-root' missing in glusterd vol file
> [2019-06-05 08:52:44.426896] W [MSGID: 106176]
> [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management:
> unsuccessful mount request [No such file or directory]

On Wed, Jun 5, 2019 at 1:06 AM deepu srinivasan wrote:

> Thank you, Kotresh.
> [...]
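The repeated "'option mountbroker-root' missing in glusterd vol file"
message means glusterd on the slave never picked up the mountbroker
configuration. After a successful gluster-mountbroker setup,
/etc/glusterfs/glusterd.vol on each slave node should carry entries
roughly like the sketch below; user sas and volume code-misc are from
this thread, while the group name and the exact option set are
assumptions based on the geo-replication documentation, so check the
guides linked later in the thread:

    volume management
        type mgmt/glusterd
        option working-directory /var/lib/glusterd
        option mountbroker-root /var/mountbroker-root
        option mountbroker-geo-replication.sas code-misc
        option geo-replication-log-group geogroup
        option rpc-auth-allow-insecure on
    end-volume

glusterd must be restarted on the slave nodes after these options are in
place.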
From sdeepugd at gmail.com  Thu Jun 6 04:54:18 2019
From: sdeepugd at gmail.com (deepu srinivasan)
Date: Thu, 06 Jun 2019 04:54:18 -0000
Subject: [Gluster-users] Geo Replication stops replicating
In-Reply-To: 
References: 
Message-ID: 

Hi Kotresh, Sunny,
I have mailed the logs I found on one of the slave machines. Is there
anything to do with permissions? Please help.

On Wed, Jun 5, 2019 at 2:28 PM deepu srinivasan wrote:

> Hi Kotresh, Sunny,
> Found this log on the slave machine:
> [...]
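On the permission question above: the geo-replication guides expect the
mountbroker root directory to exist on every slave node, owned by root
and not writable by group or others (0711 is the mode the documentation
uses). A hedged check, with the path taken from the setup command quoted
later in this thread:

    # on each slave node
    stat -c '%U %a %n' /var/mountbroker-root   # expect: root 711 /var/mountbroker-root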
From sdeepugd at gmail.com  Thu Jun  6 10:30:09 2019
From: sdeepugd at gmail.com (deepu srinivasan)
Date: Thu, 06 Jun 2019 10:30:09 -0000
Subject: [Gluster-users] Geo Replication stops replicating
In-Reply-To:
References:
Message-ID:

Hi
I have followed the steps below to create the geo-replication, but the status seems to be in a Faulty state.

Steps:

- Installed gluster version 5.6 on six nodes in total.

>     glusterfs 5.6
>     Repository revision: git://git.gluster.org/glusterfs.git
>     Copyright (c) 2006-2016 Red Hat, Inc.
>     GlusterFS comes with ABSOLUTELY NO WARRANTY.
>     It is licensed to you under your choice of the GNU Lesser
>     General Public License, version 3 or any later version (LGPLv3
>     or later), or the GNU General Public License, version 2 (GPLv2),
>     in all cases as published by the Free Software Foundation

- peer-probed the first three nodes and the second three nodes.

[image: Screen Shot 2019-06-06 at 12.21.41 PM.png] [image: Screen Shot 2019-06-06 at 12.21.19 PM.png]

- Added a new volume in both clusters.

[image: Screen Shot 2019-06-06 at 12.24.29 PM.png] [image: Screen Shot 2019-06-06 at 12.24.18 PM.png]

- Executed the gluster-mountbroker commands and restarted glusterd.

>     gluster-mountbroker setup /var/mountbroker-root sas
>     gluster-mountbroker remove --volume code-misc --user sas

- Configured passwordless ssh from master to slave.

>     ssh-keygen; ssh-copy-id sas at 192.168.185.107

- Created a common pem pub file.

>     gluster system:: execute gsec_create

- Created the geo-replication session.

>     gluster volume geo-replication code-misc sas at 192.168.185.107::code-misc create push-pem

- Executed the following command on the slave.

>     /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh sas code-misc code-misc

- Started the geo-replication.

>     gluster volume geo-replication code-misc sas at 192.168.185.107::code-misc start

- At this point the geo-replication worked fine. Tested with 2000 files; all of them synced.
- Then I updated all the nodes to version 6.2, using rpms built from the source code in a docker container on my personal machine.

>     gluster --version
>     glusterfs 6.2
>     Repository revision: git://git.gluster.org/glusterfs.git
>     Copyright (c) 2006-2016 Red Hat, Inc.
>     GlusterFS comes with ABSOLUTELY NO WARRANTY.
>     It is licensed to you under your choice of the GNU Lesser
>     General Public License, version 3 or any later version (LGPLv3
>     or later), or the GNU General Public License, version 2 (GPLv2),
>     in all cases as published by the Free Software Foundation.

- I stopped the glusterd daemons on all the nodes, along with the volume and the geo-replication.
- When I started the daemons, the volume and the geo-replication session again, the status appears Faulty.
- Also note that the "gluster-mountbroker status" command now always ends in a Python exception like this:

>     Traceback (most recent call last):
>       File "/usr/sbin/gluster-mountbroker", line 396, in <module>
>         runcli()
>       File "/usr/lib/python2.7/site-packages/gluster/cliutils/cliutils.py", line 225, in runcli
>         cls.run(args)
>       File "/usr/sbin/gluster-mountbroker", line 275, in run
>         out = execute_in_peers("node-status")
>       File "/usr/lib/python2.7/site-packages/gluster/cliutils/cliutils.py", line 127, in execute_in_peers
>         raise GlusterCmdException((rc, out, err, " ".join(cmd)))
>     gluster.cliutils.cliutils.GlusterCmdException: (1, '', 'Unable to end. Error : Success\n', 'gluster system:: execute mountbroker.py node-status')

Is it just me, or does everyone get an error from the gluster-mountbroker command on gluster versions greater than 6.0? Please help.

Thank you
Deepak

On Thu, Jun 6, 2019 at 10:35 AM Sunny Kumar wrote:

> Hi,
>
> Updated link for documentation:
> https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/
>
> You can use this tool as well:
> http://aravindavk.in/blog/gluster-georep-tools/
>
> -Sunny

On Thu, Jun 6, 2019 at 10:29 AM Kotresh Hiremath Ravishankar wrote:

> Hi,
>
> I think the steps to set up non-root geo-rep were not followed properly. The following entry is missing in the glusterd vol file, and it is required:
>
>     The message "E [MSGID: 106061] [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option mountbroker-root' missing in glusterd vol file" repeated 33 times between [2019-06-05 08:50:46.361384] and [2019-06-05 08:52:34.019757]
>
> Could you please follow the steps from the link below?
>
> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html-single/administration_guide/index#Setting_Up_the_Environment_for_a_Secure_Geo-replication_Slave
>
> And let us know if you still face the issue.

On Thu, Jun 6, 2019 at 10:24 AM deepu srinivasan <sdeepugd at gmail.com> wrote:

> Hi Kotresh, Sunny
> I have mailed the logs I found in one of the slave machines. Is there anything to do with permissions? Please help.

On Wed, Jun 5, 2019 at 2:28 PM deepu srinivasan <sdeepugd at gmail.com> wrote:

> Hi Kotresh, Sunny
> Found this log in the slave machine.
>     [2019-06-05 08:49:10.632583] I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
>     The message "I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req" repeated 2 times between [2019-06-05 08:49:10.632583] and [2019-06-05 08:49:10.670863]
>     The message "I [MSGID: 106496] [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received mount req" repeated 34 times between [2019-06-05 08:48:41.005398] and [2019-06-05 08:50:37.254063]
>     The message "E [MSGID: 106061] [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option mountbroker-root' missing in glusterd vol file" repeated 34 times between [2019-06-05 08:48:41.005434] and [2019-06-05 08:50:37.254079]
>     The message "W [MSGID: 106176] [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful mount request [No such file or directory]" repeated 34 times between [2019-06-05 08:48:41.005444] and [2019-06-05 08:50:37.254080]
>     [2019-06-05 08:50:46.361347] I [MSGID: 106496] [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received mount req
>     [2019-06-05 08:50:46.361384] E [MSGID: 106061] [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option mountbroker-root' missing in glusterd vol file
>     [2019-06-05 08:50:46.361419] W [MSGID: 106176] [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful mount request [No such file or directory]
>     The message "I [MSGID: 106496] [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received mount req" repeated 33 times between [2019-06-05 08:50:46.361347] and [2019-06-05 08:52:34.019741]
>     The message "E [MSGID: 106061] [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option mountbroker-root' missing in glusterd vol file" repeated 33 times between [2019-06-05 08:50:46.361384] and [2019-06-05 08:52:34.019757]
>     The message "W [MSGID: 106176] [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful mount request [No such file or directory]" repeated 33 times between [2019-06-05 08:50:46.361419] and [2019-06-05 08:52:34.019758]
>     [2019-06-05 08:52:44.426839] I [MSGID: 106496] [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received mount req
>     [2019-06-05 08:52:44.426886] E [MSGID: 106061] [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option mountbroker-root' missing in glusterd vol file
>     [2019-06-05 08:52:44.426896] W [MSGID: 106176] [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful mount request [No such file or directory]

On Wed, Jun 5, 2019 at 1:06 AM deepu srinivasan <sdeepugd at gmail.com> wrote:

> Thank you Kotresh

On Tue, Jun 4, 2019, 11:20 PM Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:

> Ccing Sunny, who was investigating a similar issue.

On Tue, Jun 4, 2019 at 5:46 PM deepu srinivasan <sdeepugd at gmail.com> wrote:

> Have already added the path in bashrc. Still in Faulty state.

On Tue, Jun 4, 2019, 5:27 PM Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:

> Could you please try adding /usr/sbin to $PATH for user 'sas'? If it's bash, add 'export PATH=/usr/sbin:$PATH' in /home/sas/.bashrc

On Tue, Jun 4, 2019 at 5:24 PM deepu srinivasan <sdeepugd at gmail.com> wrote:

> Hi Kotresh
> Please find the logs of the above error.
>
> Master log snippet:
>
>     [2019-06-04 11:52:09.254731] I [resource(worker /home/sas/gluster/data/code-misc):1379:connect_remote] SSH: Initializing SSH connection between master and slave...
>     [2019-06-04 11:52:09.308923] D [repce(worker /home/sas/gluster/data/code-misc):196:push] RepceClient: call 89724:139652759443264:1559649129.31 __repce_version__() ...
>     [2019-06-04 11:52:09.602792] E [syncdutils(worker /home/sas/gluster/data/code-misc):311:log_raise_exception] : connection to peer is broken
>     [2019-06-04 11:52:09.603312] E [syncdutils(worker /home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-4aL2tc/d893f66e0addc32f7d0080bb503f5185.sock sas at 192.168.185.107 /usr/libexec/glusterfs/gsyncd slave code-misc sas at 192.168.185.107::code-misc --master-node 192.168.185.106 --master-node-id 851b64d0-d885-4ae9-9b38-ab5b15db0fec --master-brick /home/sas/gluster/data/code-misc --local-node 192.168.185.122 --local-node-id bcaa7af6-c3a1-4411-8e99-4ebecb32eb6a --slave-timeout 120 --slave-log-level DEBUG --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/sbin error=1
>     [2019-06-04 11:52:09.614996] I [repce(agent /home/sas/gluster/data/code-misc):97:service_loop] RepceServer: terminating on reaching EOF.
>     [2019-06-04 11:52:09.615545] D [monitor(monitor):271:monitor] Monitor: worker(/home/sas/gluster/data/code-misc) connected
>     [2019-06-04 11:52:09.616528] I [monitor(monitor):278:monitor] Monitor: worker died in startup phase brick=/home/sas/gluster/data/code-misc
>     [2019-06-04 11:52:09.619391] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
>
> Slave log snippet:
>
>     [2019-06-04 11:50:09.782668] E [syncdutils(slave 192.168.185.106/home/sas/gluster/data/code-misc):809:logerr] Popen: /usr/sbin/gluster> 2 : failed with this errno (No such file or directory)
>     [2019-06-04 11:50:11.188167] W [gsyncd(slave 192.168.185.125/home/sas/gluster/data/code-misc):305:main] : Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.107_code-misc/gsyncd.conf
>     [2019-06-04 11:50:11.201070] I [resource(slave 192.168.185.125/home/sas/gluster/data/code-misc):1098:connect] GLUSTER: Mounting gluster volume locally...
>     [2019-06-04 11:50:11.271231] E [resource(slave 192.168.185.125/home/sas/gluster/data/code-misc):1006:handle_mounter] MountbrokerMounter: glusterd answered mnt=
>     [2019-06-04 11:50:11.271998] E [syncdutils(slave 192.168.185.125/home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error cmd=/usr/sbin/gluster --remote-host=localhost system:: mount sas user-map-root=sas aux-gfid-mount acl log-level=INFO log-file=/var/log/glusterfs/geo-replication-slaves/code-misc_192.168.185.107_code-misc/mnt-192.168.185.125-home-sas-gluster-data-code-misc.log volfile-server=localhost volfile-id=code-misc client-pid=-1 error=1
>     [2019-06-04 11:50:11.272113] E [syncdutils(slave 192.168.185.125/home/sas/gluster/data/code-misc):809:logerr] Popen: /usr/sbin/gluster> 2 : failed with this errno (No such file or directory)

On Tue, Jun 4, 2019 at 5:10 PM deepu srinivasan <sdeepugd at gmail.com> wrote:

> Hi
> As discussed, I have upgraded gluster from 4.1 to 6.2. But the geo-replication fails to start and stays in a Faulty state.

On Fri, May 31, 2019, 5:32 PM deepu srinivasan <sdeepugd at gmail.com> wrote:

> Checked the data. It remains at 2708. No progress.

On Fri, May 31, 2019 at 4:36 PM Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:

> That means it could be working, and the defunct process might be some old zombie one. Could you check whether the data progresses?

On Fri, May 31, 2019 at 4:29 PM deepu srinivasan <sdeepugd at gmail.com> wrote:

> Hi
> When I change the rsync option, the rsync process doesn't seem to start. Only a defunct process is listed in ps aux. Only when I set the rsync option to " " and restart all the processes is the rsync process listed in ps aux.

On Fri, May 31, 2019 at 4:23 PM Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:

> Yes, the rsync config option should have fixed this issue.
>
> Could you share the output of the following?
>
> 1. gluster volume geo-replication <mastervol> <slavehost>::<slavevol> config rsync-options
> 2. ps -ef | grep rsync
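A sketch of those two checks with the names used elsewhere in this thread filled in (code-misc, user sas and slave 192.168.185.107 are this thread's values; substitute your own):

    # 1. confirm the workaround option is actually set on the session;
    #    expected output once set: --ignore-missing-args
    gluster volume geo-replication code-misc sas@192.168.185.107::code-misc config rsync-options

    # 2. confirm an rsync worker is alive rather than defunct;
    #    the [r] keeps grep itself out of the listing, and a zombie
    #    worker shows up with "<defunct>" in its command column
    ps -ef | grep '[r]sync'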
From sdeepugd at gmail.com  Thu Jun  6 11:23:04 2019
From: sdeepugd at gmail.com (deepu srinivasan)
Date: Thu, 06 Jun 2019 11:23:04 -0000
Subject: [Gluster-users] Geo Replication stops replicating
In-Reply-To:
References:
Message-ID:

Hi Sunny
I have changed the file /usr/libexec/glusterfs/peer_mountbroker.py as mentioned in the patch. Now the "gluster-mountbroker status" command is working fine, but the geo-replication still seems to be in the Faulty state.

[image: Screen Shot 2019-06-06 at 4.50.30 PM.png] [image: Screen Shot 2019-06-06 at 4.51.55 PM.png]

Thank you
Deepak

On Thu, Jun 6, 2019 at 4:10 PM Sunny Kumar wrote:

> The above error can be tracked here:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1709248
>
> and patch link:
>
> https://review.gluster.org/#/c/glusterfs/+/22716/
>
> You can apply the patch and test it; however, it is waiting on regression to pass and merge.
>
> -Sunny
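When the wrapper dies with that GlusterCmdException, the traceback itself names the command it runs on each peer, so a rough way to see the underlying failure (both commands below are taken from the traceback and this thread; run them on each node) is:

    # the command gluster-mountbroker wraps; running it directly usually
    # shows the real per-node error that the wrapper swallows
    gluster system:: execute mountbroker.py node-status

    # then re-check the tool itself once the 22716 fix is applied
    gluster-mountbroker status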
From sdeepugd at gmail.com  Thu Jun  6 11:57:52 2019
From: sdeepugd at gmail.com (deepu srinivasan)
Date: Thu, 06 Jun 2019 11:57:52 -0000
Subject: [Gluster-users] Geo Replication stops replicating
In-Reply-To:
References:
Message-ID:

Hi Sunny
Please find the logs attached.

>     The message "E [MSGID: 106061] [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option mountbroker-root' missing in glusterd vol file" repeated 13 times between [2019-06-06 11:51:43.986788] and [2019-06-06 11:52:32.764546]
>     The message "W [MSGID: 106176] [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful mount request [No such file or directory]" repeated 13 times between [2019-06-06 11:51:43.986798] and [2019-06-06 11:52:32.764548]
>     The message "I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req" repeated 2 times between [2019-06-06 11:53:07.064332] and [2019-06-06 11:53:07.303978]
>     [2019-06-06 11:55:35.624320] I [MSGID: 106495] [glusterd-handler.c:3137:__glusterd_handle_getwd] 0-glusterd: Received getwd req
>     [2019-06-06 11:55:35.884345] I [MSGID: 106131] [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: quotad already stopped
>     [2019-06-06 11:55:35.884373] I [MSGID: 106568] [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: quotad service is stopped
>     [2019-06-06 11:55:35.884459] I [MSGID: 106131] [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: bitd already stopped
>     [2019-06-06 11:55:35.884473] I [MSGID: 106568] [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: bitd service is stopped
>     [2019-06-06 11:55:35.884554] I [MSGID: 106131] [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: scrub already stopped
>     [2019-06-06 11:55:35.884567] I [MSGID: 106568] [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: scrub service is stopped
>     [2019-06-06 11:55:35.893823] I [run.c:242:runner_log] (-->/usr/lib64/glusterfs/6.2/xlator/mgmt/glusterd.so(+0xe8e1a) [0x7f7380d60e1a] -->/usr/lib64/glusterfs/6.2/xlator/mgmt/glusterd.so(+0xe88e5) [0x7f7380d608e5] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f738cbc5df5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/set/post/S30samba-set.sh --volname=code-misc -o features.read-only=on --gd-workdir=/var/lib/glusterd
>     [2019-06-06 11:55:35.900465] I [run.c:242:runner_log] (-->/usr/lib64/glusterfs/6.2/xlator/mgmt/glusterd.so(+0xe8e1a) [0x7f7380d60e1a] -->/usr/lib64/glusterfs/6.2/xlator/mgmt/glusterd.so(+0xe88e5) [0x7f7380d608e5] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f738cbc5df5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/set/post/S32gluster_enable_shared_storage.sh --volname=code-misc -o features.read-only=on --gd-workdir=/var/lib/glusterd
>     [2019-06-06 11:55:43.485284] I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
>     The message "I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req" repeated 2 times between [2019-06-06 11:55:43.485284] and [2019-06-06 11:55:43.512321]
>     [2019-06-06 11:55:44.055419] I [MSGID: 106496] [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received mount req
>     [2019-06-06 11:55:44.055473] E [MSGID: 106061] [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option mountbroker-root' missing in glusterd vol file
>     [2019-06-06 11:55:44.055483] W [MSGID: 106176] [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful mount request [No such file or directory]
>     [2019-06-06 11:55:44.056695] I [MSGID: 106496] [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received mount req
>     [2019-06-06 11:55:44.056725] E [MSGID: 106061] [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option mountbroker-root' missing in glusterd vol file
>     [2019-06-06 11:55:44.056734] W [MSGID: 106176] [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful mount request [No such file or directory]
>     [2019-06-06 11:55:44.057522] I [MSGID: 106496] [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received mount req
>     [2019-06-06 11:55:44.057552] E [MSGID: 106061] [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option mountbroker-root' missing in glusterd vol file
>     [2019-06-06 11:55:44.057562] W [MSGID: 106176] [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful mount request [No such file or directory]
>     [2019-06-06 11:55:54.655681] I [MSGID: 106496] [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received mount req
>     [2019-06-06 11:55:54.655741] E [MSGID: 106061] [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option mountbroker-root' missing in glusterd vol file
>     [2019-06-06 11:55:54.655752] W [MSGID: 106176] [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful mount request [No such file or directory]

On Thu, Jun 6, 2019 at 5:09 PM Sunny Kumar wrote:

> What's the current traceback? Please share.
>
> -Sunny
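Since these glusterd logs keep repeating 'option mountbroker-root' missing, a sketch of what to verify on each slave node. The option names and values below are what the gluster-mountbroker setup quoted earlier in this thread is expected to write for user sas and volume code-misc; treat them as assumptions and compare against your own glusterd.vol:

    # check whether the mountbroker options ever landed in glusterd.vol
    grep -E 'mountbroker|geo-replication-log-group|rpc-auth-allow-insecure' /etc/glusterfs/glusterd.vol

    # roughly what should be present after a non-root setup:
    #   option mountbroker-root /var/mountbroker-root
    #   option mountbroker-geo-replication.sas code-misc
    #   option geo-replication-log-group sas
    #   option rpc-auth-allow-insecure on

    # if the entries are missing, redo the setup and restart glusterd
    gluster-mountbroker setup /var/mountbroker-root sas
    gluster-mountbroker add code-misc sas
    systemctl restart glusterd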
URL: From sdeepugd at gmail.com Thu Jun 6 13:37:15 2019 From: sdeepugd at gmail.com (deepu srinivasan) Date: Thu, 06 Jun 2019 13:37:15 -0000 Subject: [Gluster-users] Geo Replication stops replicating In-Reply-To: References: Message-ID: Hi Sunny Sorry, that was a typo. I used the following command. > gluster-mountbroker add code-misc sas > On Thu, Jun 6, 2019 at 6:23 PM Sunny Kumar wrote: > You should not have used this one: > > > > gluster-mountbroker remove --volume code-misc --user sas > > -- This one is to remove volume/user from mount broker. > > Please try setting up mount broker once again. > > -Sunny > > On Thu, Jun 6, 2019 at 5:28 PM deepu srinivasan > wrote: > > > > Hi Sunny > > Please find the logs attached > >> > >> The message "E [MSGID: 106061] > [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option > mountbroker-root' missing in glusterd vol file" repeated 13 times between > [2019-06-06 11:51:43.986788] and [2019-06-06 11:52:32.764546] > >> > >> The message "W [MSGID: 106176] > [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful > mount request [No such file or directory]" repeated 13 times between > [2019-06-06 11:51:43.986798] and [2019-06-06 11:52:32.764548] > >> > >> The message "I [MSGID: 106488] > [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: > Received get vol req" repeated 2 times between [2019-06-06 11:53:07.064332] > and [2019-06-06 11:53:07.303978] > >> > >> [2019-06-06 11:55:35.624320] I [MSGID: 106495] > [glusterd-handler.c:3137:__glusterd_handle_getwd] 0-glusterd: Received > getwd req > >> > >> [2019-06-06 11:55:35.884345] I [MSGID: 106131] > [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: quotad already > stopped > >> > >> [2019-06-06 11:55:35.884373] I [MSGID: 106568] > [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: quotad service is > stopped > >> > >> [2019-06-06 11:55:35.884459] I [MSGID: 106131] > [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: bitd already > stopped > >> > >> [2019-06-06 11:55:35.884473] I [MSGID: 106568] > [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: bitd service is > stopped > >> > >> [2019-06-06 11:55:35.884554] I [MSGID: 106131] > [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: scrub already > stopped > >> > >> [2019-06-06 11:55:35.884567] I [MSGID: 106568] > [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: scrub service is > stopped > >> > >> [2019-06-06 11:55:35.893823] I [run.c:242:runner_log] > (-->/usr/lib64/glusterfs/6.2/xlator/mgmt/glusterd.so(+0xe8e1a) > [0x7f7380d60e1a] > -->/usr/lib64/glusterfs/6.2/xlator/mgmt/glusterd.so(+0xe88e5) > [0x7f7380d608e5] -->/lib64/libglusterfs.so.0(runner_log+0x115) > [0x7f738cbc5df5] ) 0-management: Ran script: > /var/lib/glusterd/hooks/1/set/post/S30samba-set.sh --volname=code-misc -o > features.read-only=on --gd-workdir=/var/lib/glusterd > >> > >> [2019-06-06 11:55:35.900465] I [run.c:242:runner_log] > (-->/usr/lib64/glusterfs/6.2/xlator/mgmt/glusterd.so(+0xe8e1a) > [0x7f7380d60e1a] > -->/usr/lib64/glusterfs/6.2/xlator/mgmt/glusterd.so(+0xe88e5) > [0x7f7380d608e5] -->/lib64/libglusterfs.so.0(runner_log+0x115) > [0x7f738cbc5df5] ) 0-management: Ran script: > /var/lib/glusterd/hooks/1/set/post/S32gluster_enable_shared_storage.sh > --volname=code-misc -o features.read-only=on --gd-workdir=/var/lib/glusterd > >> > >> [2019-06-06 11:55:43.485284] I [MSGID: 106488] > [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: > Received get vol req > >> > 
>> The message "I [MSGID: 106488] > [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: > Received get vol req" repeated 2 times between [2019-06-06 11:55:43.485284] > and [2019-06-06 11:55:43.512321] > >> > >> [2019-06-06 11:55:44.055419] I [MSGID: 106496] > [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received > mount req > >> > >> [2019-06-06 11:55:44.055473] E [MSGID: 106061] > [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option > mountbroker-root' missing in glusterd vol file > >> > >> [2019-06-06 11:55:44.055483] W [MSGID: 106176] > [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful > mount request [No such file or directory] > >> > >> [2019-06-06 11:55:44.056695] I [MSGID: 106496] > [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received > mount req > >> > >> [2019-06-06 11:55:44.056725] E [MSGID: 106061] > [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option > mountbroker-root' missing in glusterd vol file > >> > >> [2019-06-06 11:55:44.056734] W [MSGID: 106176] > [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful > mount request [No such file or directory] > >> > >> [2019-06-06 11:55:44.057522] I [MSGID: 106496] > [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received > mount req > >> > >> [2019-06-06 11:55:44.057552] E [MSGID: 106061] > [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option > mountbroker-root' missing in glusterd vol file > >> > >> [2019-06-06 11:55:44.057562] W [MSGID: 106176] > [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful > mount request [No such file or directory] > >> > >> [2019-06-06 11:55:54.655681] I [MSGID: 106496] > [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received > mount req > >> > >> [2019-06-06 11:55:54.655741] E [MSGID: 106061] > [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option > mountbroker-root' missing in glusterd vol file > >> > >> [2019-06-06 11:55:54.655752] W [MSGID: 106176] > [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful > mount request [No such file or directory] > > > > > > On Thu, Jun 6, 2019 at 5:09 PM Sunny Kumar wrote: > >> > >> Whats current trackback please share. > >> > >> -Sunny > >> > >> > >> On Thu, Jun 6, 2019 at 4:53 PM deepu srinivasan > wrote: > >> > > >> > Hi Sunny > >> > I have changed the file in /usr/libexec/glusterfs/peer_mountbroker.py > as mentioned in the patch. > >> > Now the "gluster-mountbroker status" command is working fine. But the > geo-replication seems to be in the faulty state still. > >> > > >> > > >> > Thankyou > >> > Deepak > >> > > >> > On Thu, Jun 6, 2019 at 4:10 PM Sunny Kumar > wrote: > >> >> > >> >> Above error can be tracked here: > >> >> > >> >> https://bugzilla.redhat.com/show_bug.cgi?id=1709248 > >> >> > >> >> and patch link: > >> >> https://review.gluster.org/#/c/glusterfs/+/22716/ > >> >> > >> >> You can apply patch and test it however its waiting on regression to > >> >> pass and merge. > >> >> > >> >> -Sunny > >> >> > >> >> > >> >> On Thu, Jun 6, 2019 at 4:00 PM deepu srinivasan > wrote: > >> >> > > >> >> > Hi > >> >> > I have followed the following steps to create the geo-replication > but the status seems to be in a faulty state. > >> >> > > >> >> > Steps : > >> >> > > >> >> > Installed cluster version 5.6 in totally six nodes. 
> >> >> >>
> >> >> >> glusterfs 5.6
> >> >> >> Repository revision: git://git.gluster.org/glusterfs.git
> >> >> >> Copyright (c) 2006-2016 Red Hat, Inc.
> >> >> >> GlusterFS comes with ABSOLUTELY NO WARRANTY.
> >> >> >> It is licensed to you under your choice of the GNU Lesser
> >> >> >> General Public License, version 3 or any later version (LGPLv3
> >> >> >> or later), or the GNU General Public License, version 2 (GPLv2),
> >> >> >> in all cases as published by the Free Software Foundation
> >> >> >
> >> >> > peer_probed the first three nodes and the second three nodes.
> >> >> >
> >> >> > Added a new volume in both clusters.
> >> >> >
> >> >> > Executed the gluster-mountbroker commands and restarted glusterd.
> >> >> >>
> >> >> >> gluster-mountbroker setup /var/mountbroker-root sas
> >> >> >> gluster-mountbroker remove --volume code-misc --user sas
> >> >> >
> >> >> > Configured passwordless ssh from master to slave:
> >> >> >>
> >> >> >> ssh-keygen; ssh-copy-id sas at 192.168.185.107
> >> >> >
> >> >> > Created a common pem pub file:
> >> >> >>
> >> >> >> gluster system:: execute gsec_create
> >> >> >
> >> >> > Created the geo-replication session:
> >> >> >>
> >> >> >> gluster volume geo-replication code-misc sas at 192.168.185.107::code-misc create push-pem
> >> >> >
> >> >> > Executed the following command on the slave:
> >> >> >>
> >> >> >> /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh sas code-misc code-misc
> >> >> >
> >> >> > Started the gluster geo-replication:
> >> >> >>
> >> >> >> gluster volume geo-replication code-misc sas at 192.168.185.107::code-misc start
> >> >> >
> >> >> > At that point the geo-replication worked fine.
> >> >> > Tested with 2000 files; all seemed to sync fine.
> >> >> >
> >> >> > Then I updated all the nodes to version 6.2, using rpms built from the source code in a docker container on my personal machine.
> >> >> >
> >> >> >> gluster --version
> >> >> >>
> >> >> >> glusterfs 6.2
> >> >> >> Repository revision: git://git.gluster.org/glusterfs.git
> >> >> >> Copyright (c) 2006-2016 Red Hat, Inc.
> >> >> >> GlusterFS comes with ABSOLUTELY NO WARRANTY.
> >> >> >> It is licensed to you under your choice of the GNU Lesser
> >> >> >> General Public License, version 3 or any later version (LGPLv3
> >> >> >> or later), or the GNU General Public License, version 2 (GPLv2),
> >> >> >> in all cases as published by the Free Software Foundation.
> >> >> >
> >> >> > I stopped the glusterd daemons on all the nodes, along with the volume and the geo-replication session.
> >> >> > Then I started the daemons, the volume and the geo-replication session again; the status seems to be faulty.
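For reference: on the slave nodes, a successful mountbroker setup records its settings in the glusterd volfile, which is exactly what the "'option mountbroker-root' missing in glusterd vol file" errors in the logs are complaining about. A minimal sketch of what /etc/glusterfs/glusterd.vol should contain after the gluster-mountbroker setup/add commands above (assuming the 'sas' user and group and the 'code-misc' volume from this thread; other default options omitted), with glusterd restarted on every slave node afterwards:

volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    # entries written by gluster-mountbroker setup/add (values here
    # assume the sas user/group and code-misc volume of this thread):
    option mountbroker-root /var/mountbroker-root
    option mountbroker-geo-replication.sas code-misc
    option geo-replication-log-group sas
    option rpc-auth-allow-insecure on
end-volume

If these options are absent after running gluster-mountbroker, the slave-side mount requests from gsyncd will keep failing as shown in the log snippets in this thread.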
> >> >> > Also noted that the result of the "gluster-mountbroker status" command
> always ends in a Python exception like this:
> >> >> >>
> >> >> >> Traceback (most recent call last):
> >> >> >>   File "/usr/sbin/gluster-mountbroker", line 396, in
> >> >> >>     runcli()
> >> >> >>   File "/usr/lib/python2.7/site-packages/gluster/cliutils/cliutils.py", line 225, in runcli
> >> >> >>     cls.run(args)
> >> >> >>   File "/usr/sbin/gluster-mountbroker", line 275, in run
> >> >> >>     out = execute_in_peers("node-status")
> >> >> >>   File "/usr/lib/python2.7/site-packages/gluster/cliutils/cliutils.py", line 127, in execute_in_peers
> >> >> >>     raise GlusterCmdException((rc, out, err, " ".join(cmd)))
> >> >> >> gluster.cliutils.cliutils.GlusterCmdException: (1, '', 'Unable to end. Error : Success\n', 'gluster system:: execute mountbroker.py node-status')
> >> >> >
> >> >> > Is it just me, or does everyone get an error from the gluster-mountbroker command for gluster versions greater than 6.0? Please help.
> >> >> >
> >> >> > Thank you
> >> >> > Deepak
> >> >> >
> >> >> > On Thu, Jun 6, 2019 at 10:35 AM Sunny Kumar > wrote:
> >> >> >>
> >> >> >> Hi,
> >> >> >>
> >> >> >> Updated link for documentation:
> >> >> >>
> >> >> >> -- https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/
> >> >> >>
> >> >> >> You can use this tool as well:
> >> >> >> http://aravindavk.in/blog/gluster-georep-tools/
> >> >> >>
> >> >> >> -Sunny
> >> >> >>
> >> >> >> On Thu, Jun 6, 2019 at 10:29 AM Kotresh Hiremath Ravishankar
> >> >> >> wrote:
> >> >> >> >
> >> >> >> > Hi,
> >> >> >> >
> >> >> >> > I think the steps to set up non-root geo-rep were not followed properly. The following entry, which is required, is missing in the glusterd vol file.
> >> >> >> >
> >> >> >> > The message "E [MSGID: 106061] [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option mountbroker-root' missing in glusterd vol file" repeated 33 times between [2019-06-05 08:50:46.361384] and [2019-06-05 08:52:34.019757]
> >> >> >> >
> >> >> >> > Could you please follow the steps from the link below?
> >> >> >> >
> >> >> >> > https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html-single/administration_guide/index#Setting_Up_the_Environment_for_a_Secure_Geo-replication_Slave
> >> >> >> >
> >> >> >> > And let us know if you still face the issue.
> >> >> >> >
> >> >> >> > On Thu, Jun 6, 2019 at 10:24 AM deepu srinivasan < sdeepugd at gmail.com> wrote:
> >> >> >> >>
> >> >> >> >> Hi Kotresh, Sunny
> >> >> >> >> I have mailed the logs I found in one of the slave machines. Could this have anything to do with permissions? Please help.
> >> >> >> >>
> >> >> >> >> On Wed, Jun 5, 2019 at 2:28 PM deepu srinivasan < sdeepugd at gmail.com> wrote:
> >> >> >> >>>
> >> >> >> >>> Hi Kotresh, Sunny
> >> >> >> >>> Found this log in the slave machine.
> >> >> >> >>>> > >> >> >> >>>> [2019-06-05 08:49:10.632583] I [MSGID: 106488] > [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: > Received get vol req > >> >> >> >>>> > >> >> >> >>>> The message "I [MSGID: 106488] > [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: > Received get vol req" repeated 2 times between [2019-06-05 08:49:10.632583] > and [2019-06-05 08:49:10.670863] > >> >> >> >>>> > >> >> >> >>>> The message "I [MSGID: 106496] > [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received > mount req" repeated 34 times between [2019-06-05 08:48:41.005398] and > [2019-06-05 08:50:37.254063] > >> >> >> >>>> > >> >> >> >>>> The message "E [MSGID: 106061] > [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option > mountbroker-root' missing in glusterd vol file" repeated 34 times between > [2019-06-05 08:48:41.005434] and [2019-06-05 08:50:37.254079] > >> >> >> >>>> > >> >> >> >>>> The message "W [MSGID: 106176] > [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful > mount request [No such file or directory]" repeated 34 times between > [2019-06-05 08:48:41.005444] and [2019-06-05 08:50:37.254080] > >> >> >> >>>> > >> >> >> >>>> [2019-06-05 08:50:46.361347] I [MSGID: 106496] > [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received > mount req > >> >> >> >>>> > >> >> >> >>>> [2019-06-05 08:50:46.361384] E [MSGID: 106061] > [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option > mountbroker-root' missing in glusterd vol file > >> >> >> >>>> > >> >> >> >>>> [2019-06-05 08:50:46.361419] W [MSGID: 106176] > [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful > mount request [No such file or directory] > >> >> >> >>>> > >> >> >> >>>> The message "I [MSGID: 106496] > [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received > mount req" repeated 33 times between [2019-06-05 08:50:46.361347] and > [2019-06-05 08:52:34.019741] > >> >> >> >>>> > >> >> >> >>>> The message "E [MSGID: 106061] > [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option > mountbroker-root' missing in glusterd vol file" repeated 33 times between > [2019-06-05 08:50:46.361384] and [2019-06-05 08:52:34.019757] > >> >> >> >>>> > >> >> >> >>>> The message "W [MSGID: 106176] > [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful > mount request [No such file or directory]" repeated 33 times between > [2019-06-05 08:50:46.361419] and [2019-06-05 08:52:34.019758] > >> >> >> >>>> > >> >> >> >>>> [2019-06-05 08:52:44.426839] I [MSGID: 106496] > [glusterd-handler.c:3187:__glusterd_handle_mount] 0-glusterd: Received > mount req > >> >> >> >>>> > >> >> >> >>>> [2019-06-05 08:52:44.426886] E [MSGID: 106061] > [glusterd-mountbroker.c:555:glusterd_do_mount] 0-management: 'option > mountbroker-root' missing in glusterd vol file > >> >> >> >>>> > >> >> >> >>>> [2019-06-05 08:52:44.426896] W [MSGID: 106176] > [glusterd-mountbroker.c:719:glusterd_do_mount] 0-management: unsuccessful > mount request [No such file or directory] > >> >> >> >>> > >> >> >> >>> > >> >> >> >>> On Wed, Jun 5, 2019 at 1:06 AM deepu srinivasan < > sdeepugd at gmail.com> wrote: > >> >> >> >>>> > >> >> >> >>>> Thankyou Kotresh > >> >> >> >>>> > >> >> >> >>>> On Tue, Jun 4, 2019, 11:20 PM Kotresh Hiremath Ravishankar < > khiremat at redhat.com> wrote: > >> >> >> >>>>> > >> >> >> >>>>> Ccing Sunny, who was investing similar issue. 
> >> >> >> >>>>> > >> >> >> >>>>> On Tue, Jun 4, 2019 at 5:46 PM deepu srinivasan < > sdeepugd at gmail.com> wrote: > >> >> >> >>>>>> > >> >> >> >>>>>> Have already added the path in bashrc . Still in faulty > state > >> >> >> >>>>>> > >> >> >> >>>>>> On Tue, Jun 4, 2019, 5:27 PM Kotresh Hiremath Ravishankar < > khiremat at redhat.com> wrote: > >> >> >> >>>>>>> > >> >> >> >>>>>>> could you please try adding /usr/sbin to $PATH for user > 'sas'? If it's bash, add 'export PATH=/usr/sbin:$PATH' in > >> >> >> >>>>>>> /home/sas/.bashrc > >> >> >> >>>>>>> > >> >> >> >>>>>>> On Tue, Jun 4, 2019 at 5:24 PM deepu srinivasan < > sdeepugd at gmail.com> wrote: > >> >> >> >>>>>>>> > >> >> >> >>>>>>>> Hi Kortesh > >> >> >> >>>>>>>> Please find the logs of the above error > >> >> >> >>>>>>>> Master log snippet > >> >> >> >>>>>>>>> > >> >> >> >>>>>>>>> [2019-06-04 11:52:09.254731] I [resource(worker > /home/sas/gluster/data/code-misc):1379:connect_remote] SSH: Initializing > SSH connection between master and slave... > >> >> >> >>>>>>>>> [2019-06-04 11:52:09.308923] D [repce(worker > /home/sas/gluster/data/code-misc):196:push] RepceClient: call > 89724:139652759443264:1559649129.31 __repce_version__() ... > >> >> >> >>>>>>>>> [2019-06-04 11:52:09.602792] E [syncdutils(worker > /home/sas/gluster/data/code-misc):311:log_raise_exception] : > connection to peer is broken > >> >> >> >>>>>>>>> [2019-06-04 11:52:09.603312] E [syncdutils(worker > /home/sas/gluster/data/code-misc):805:errlog] Popen: command returned > error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i > /var/lib/ glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S > /tmp/gsyncd-aux-ssh-4aL2tc/d893f66e0addc32f7d0080bb503f5185.sock > sas at 192.168.185.107 /usr/libexec/glusterfs/gsyncd slave code-misc sas@ > 192.168.185.107::code-misc --master-node 192.168.185.106 > --master-node-id 851b64d0-d885-4ae9-9b38-ab5b15db0fec --master-brick > /home/sas/gluster/data/code-misc --local-node 192.168.185.122 > --local-node- id bcaa7af6-c3a1-4411-8e99-4ebecb32eb6a --slave-timeout 120 > --slave-log-level DEBUG --slave-gluster-log-level INFO > --slave-gluster-command-dir /usr/sbin error=1 > >> >> >> >>>>>>>>> [2019-06-04 11:52:09.614996] I [repce(agent > /home/sas/gluster/data/code-misc):97:service_loop] RepceServer: terminating > on reaching EOF. 
> >> >> >> >>>>>>>>> [2019-06-04 11:52:09.615545] D > [monitor(monitor):271:monitor] Monitor: > worker(/home/sas/gluster/data/code-misc) connected > >> >> >> >>>>>>>>> [2019-06-04 11:52:09.616528] I > [monitor(monitor):278:monitor] Monitor: worker died in startup phase > brick=/home/sas/gluster/data/code-misc > >> >> >> >>>>>>>>> [2019-06-04 11:52:09.619391] I > [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status > Change status=Faulty > >> >> >> >>>>>>>> > >> >> >> >>>>>>>> > >> >> >> >>>>>>>> Slave log snippet > >> >> >> >>>>>>>>> > >> >> >> >>>>>>>>> [2019-06-04 11:50:09.782668] E [syncdutils(slave > 192.168.185.106/home/sas/gluster/data/code-misc):809:logerr] Popen: > /usr/sbin/gluster> 2 : failed with this errno (No such file or directory) > >> >> >> >>>>>>>>> [2019-06-04 11:50:11.188167] W [gsyncd(slave > 192.168.185.125/home/sas/gluster/data/code-misc):305:main] : Session > config file not exists, using the default config > path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.107_code-misc/gsyncd.conf > >> >> >> >>>>>>>>> [2019-06-04 11:50:11.201070] I [resource(slave > 192.168.185.125/home/sas/gluster/data/code-misc):1098:connect] GLUSTER: > Mounting gluster volume locally... > >> >> >> >>>>>>>>> [2019-06-04 11:50:11.271231] E [resource(slave > 192.168.185.125/home/sas/gluster/data/code-misc):1006:handle_mounter] > MountbrokerMounter: glusterd answered mnt= > >> >> >> >>>>>>>>> [2019-06-04 11:50:11.271998] E [syncdutils(slave > 192.168.185.125/home/sas/gluster/data/code-misc):805:errlog] Popen: > command returned error cmd=/usr/sbin/gluster --remote-host=localhost > system:: mount sas user-map-root=sas aux-gfid-mount acl log-level=INFO > log-file=/var/log/glusterfs/geo-replication-slaves/code-misc_192.168.185.107_code-misc/mnt-192.168.185.125-home-sas-gluster-data-code-misc.log > volfile-server=localhost volfile-id=code-misc client-pid=-1 error=1 > >> >> >> >>>>>>>>> [2019-06-04 11:50:11.272113] E [syncdutils(slave > 192.168.185.125/home/sas/gluster/data/code-misc):809:logerr] Popen: > /usr/sbin/gluster> 2 : failed with this errno (No such file or directory) > >> >> >> >>>>>>>> > >> >> >> >>>>>>>> > >> >> >> >>>>>>>> On Tue, Jun 4, 2019 at 5:10 PM deepu srinivasan < > sdeepugd at gmail.com> wrote: > >> >> >> >>>>>>>>> > >> >> >> >>>>>>>>> Hi > >> >> >> >>>>>>>>> As discussed I have upgraded gluster from 4.1 to 6.2 > version. But the Geo replication failed to start. > >> >> >> >>>>>>>>> Stays in faulty state > >> >> >> >>>>>>>>> > >> >> >> >>>>>>>>> On Fri, May 31, 2019, 5:32 PM deepu srinivasan < > sdeepugd at gmail.com> wrote: > >> >> >> >>>>>>>>>> > >> >> >> >>>>>>>>>> Checked the data. It remains in 2708. No progress. > >> >> >> >>>>>>>>>> > >> >> >> >>>>>>>>>> On Fri, May 31, 2019 at 4:36 PM Kotresh Hiremath > Ravishankar wrote: > >> >> >> >>>>>>>>>>> > >> >> >> >>>>>>>>>>> That means it could be working and the defunct > process might be some old zombie one. Could you check, that data progress ? > >> >> >> >>>>>>>>>>> > >> >> >> >>>>>>>>>>> On Fri, May 31, 2019 at 4:29 PM deepu srinivasan < > sdeepugd at gmail.com> wrote: > >> >> >> >>>>>>>>>>>> > >> >> >> >>>>>>>>>>>> Hi > >> >> >> >>>>>>>>>>>> When i change the rsync option the rsync process > doesnt seem to start . Only a defunt process is listed in ps aux. Only when > i set rsync option to " " and restart all the process the rsync process is > listed in ps aux. 
> >> >> >> >>>>>>>>>>>> > >> >> >> >>>>>>>>>>>> > >> >> >> >>>>>>>>>>>> On Fri, May 31, 2019 at 4:23 PM Kotresh Hiremath > Ravishankar wrote: > >> >> >> >>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>> Yes, rsync config option should have fixed this > issue. > >> >> >> >>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>> Could you share the output of the following? > >> >> >> >>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>> 1. gluster volume geo-replication > :: config rsync-options > >> >> >> >>>>>>>>>>>>> 2. ps -ef | grep rsync > >> >> >> >>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>> On Fri, May 31, 2019 at 4:11 PM deepu srinivasan < > sdeepugd at gmail.com> wrote: > >> >> >> >>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>> Done. > >> >> >> >>>>>>>>>>>>>> We got the following result . > >> >> >> >>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>> 1559298781.338234 write(2, "rsync: link_stat > \"/tmp/gsyncd-aux-mount-EEJ_sY/.gfid/3fa6aed8-802e-4efe-9903-8bc171176d88\" > failed: No such file or directory (2)", 128 > >> >> >> >>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>> seems like a file is missing ? > >> >> >> >>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>> On Fri, May 31, 2019 at 3:25 PM Kotresh Hiremath > Ravishankar wrote: > >> >> >> >>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>> Hi, > >> >> >> >>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>> Could you take the strace with with more string > size? The argument strings are truncated. > >> >> >> >>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>> strace -s 500 -ttt -T -p > >> >> >> >>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>> On Fri, May 31, 2019 at 3:17 PM deepu srinivasan < > sdeepugd at gmail.com> wrote: > >> >> >> >>>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>>> Hi Kotresh > >> >> >> >>>>>>>>>>>>>>>> The above-mentioned work around did not work > properly. > >> >> >> >>>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>>> On Fri, May 31, 2019 at 3:16 PM deepu srinivasan > wrote: > >> >> >> >>>>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>>>> Hi Kotresh > >> >> >> >>>>>>>>>>>>>>>>> We have tried the above-mentioned rsync option > and we are planning to have the version upgrade to 6.0. > >> >> >> >>>>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>>>> On Fri, May 31, 2019 at 11:04 AM Kotresh > Hiremath Ravishankar wrote: > >> >> >> >>>>>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>>>>> Hi, > >> >> >> >>>>>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>>>>> This looks like the hang because stderr buffer > filled up with errors messages and no one reading it. > >> >> >> >>>>>>>>>>>>>>>>>> I think this issue is fixed in latest > releases. As a workaround, you can do following and check if it works. > >> >> >> >>>>>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>>>>> Prerequisite: > >> >> >> >>>>>>>>>>>>>>>>>> rsync version should be > 3.1.0 > >> >> >> >>>>>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>>>>> Workaround: > >> >> >> >>>>>>>>>>>>>>>>>> gluster volume geo-replication > :: config rsync-options "--ignore-missing-args" > >> >> >> >>>>>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>>>>> Thanks, > >> >> >> >>>>>>>>>>>>>>>>>> Kotresh HR > >> >> >> >>>>>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>>>>> On Thu, May 30, 2019 at 5:39 PM deepu > srinivasan wrote: > >> >> >> >>>>>>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>>>>>> Hi > >> >> >> >>>>>>>>>>>>>>>>>>> We were evaluating Gluster geo Replication > between two DCs one is in US west and one is in US east. We took multiple > trials for different file size. 
> >> >> >> >>>>>>>>>>>>>>>>>>> The Geo Replication tends to stop replicating > but while checking the status it appears to be in Active state. But the > slave volume did not increase in size. > >> >> >> >>>>>>>>>>>>>>>>>>> So we have restarted the geo-replication > session and checked the status. The status was in an active state and it > was in History Crawl for a long time. We have enabled the DEBUG mode in > logging and checked for any error. > >> >> >> >>>>>>>>>>>>>>>>>>> There was around 2000 file appeared for > syncing candidate. The Rsync process starts but the rsync did not happen in > the slave volume. Every time the rsync process appears in the "ps auxxx" > list but the replication did not happen in the slave end. What would be the > cause of this problem? Is there anyway to debug it? > >> >> >> >>>>>>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>>>>>> We have also checked the strace of the rync > program. > >> >> >> >>>>>>>>>>>>>>>>>>> it displays something like this > >> >> >> >>>>>>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>>>>>> "write(2, "rsync: link_stat > \"/tmp/gsyncd-au"..., 128" > >> >> >> >>>>>>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>>>>>> We are using the below specs > >> >> >> >>>>>>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>>>>>> Gluster version - 4.1.7 > >> >> >> >>>>>>>>>>>>>>>>>>> Sync mode - rsync > >> >> >> >>>>>>>>>>>>>>>>>>> Volume - 1x3 in each end (master and slave) > >> >> >> >>>>>>>>>>>>>>>>>>> Intranet Bandwidth - 10 Gig > >> >> >> >>>>>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>>>>> -- > >> >> >> >>>>>>>>>>>>>>>>>> Thanks and Regards, > >> >> >> >>>>>>>>>>>>>>>>>> Kotresh H R > >> >> >> >>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>>>> -- > >> >> >> >>>>>>>>>>>>>>> Thanks and Regards, > >> >> >> >>>>>>>>>>>>>>> Kotresh H R > >> >> >> >>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>> > >> >> >> >>>>>>>>>>>>> -- > >> >> >> >>>>>>>>>>>>> Thanks and Regards, > >> >> >> >>>>>>>>>>>>> Kotresh H R > >> >> >> >>>>>>>>>>> > >> >> >> >>>>>>>>>>> > >> >> >> >>>>>>>>>>> > >> >> >> >>>>>>>>>>> -- > >> >> >> >>>>>>>>>>> Thanks and Regards, > >> >> >> >>>>>>>>>>> Kotresh H R > >> >> >> >>>>>>> > >> >> >> >>>>>>> > >> >> >> >>>>>>> > >> >> >> >>>>>>> -- > >> >> >> >>>>>>> Thanks and Regards, > >> >> >> >>>>>>> Kotresh H R > >> >> >> >>>>> > >> >> >> >>>>> > >> >> >> >>>>> > >> >> >> >>>>> -- > >> >> >> >>>>> Thanks and Regards, > >> >> >> >>>>> Kotresh H R > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > -- > >> >> >> > Thanks and Regards, > >> >> >> > Kotresh H R > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gdeschne at redhat.com Thu Jun 6 14:58:43 2019 From: gdeschne at redhat.com (=?UTF-8?Q?G=c3=bcnther_Deschner?=) Date: Thu, 06 Jun 2019 14:58:43 -0000 Subject: [Gluster-users] [Gluster-devel] Improve stability between SMB/CTDB and Gluster (together with Samba Core Developer) In-Reply-To: References: Message-ID: Hello, just a quick heads-up, during this week pretty much all Samba engineers are busy attending the SambaXP conference in Germany, in addition there was a public holiday in India also this week. Not sure about the general availability tomorrow, I would propose to look for a date maybe next week. Thanks, Guenther On 31/05/2019 14:32, David Spisla wrote: > Hello together, > > inorder not to lose the focus for the topic, I make new date suggestions > for next week > > June 03th ? 
07th at 12:30 - 14:30 IST (9:00 - 11:00 CEST), or
June 3rd - 6th at 16:30 - 18:30 IST (13:00 - 15:00 CEST)

Regards
David Spisla

On Tue, May 21, 2019 at 11:24 AM David Spisla wrote:

> Hello all,
>
> we are still seeking a day and time to talk about interesting Samba /
> Glusterfs issues. Here is a new list of possible dates and times.
>
> May 22nd - 24th at 12:30 - 14:30 IST (9:00 - 11:00 CEST)
>
> May 27th - 29th and 31st at 12:30 - 14:30 IST (9:00 - 11:00 CEST)
>
> On May 30th there is a holiday here in Germany.
>
> @Poornima Gurusiddaiah If there is any problem finding a date, please
> contact me. I will look for alternatives.
>
> Regards
> David Spisla
>
> On Thu, May 16, 2019 at 12:42 PM David Spisla wrote:
>
> Hello Amar,
>
> thank you for the information. Of course, we should wait for Poornima
> because of her knowledge.
>
> Regards
> David Spisla
>
> On Thu, May 16, 2019 at 12:23 PM Amar Tumballi Suryanarayan wrote:
>
> David, Poornima is on leave from today till 21st May, so having it after
> she comes back is better. She has more experience in SMB integration
> than many of us.
>
> -Amar
>
> On Thu, May 16, 2019 at 1:09 PM David Spisla wrote:
>
> Hello everyone,
>
> if there is any problem in finding a date and time, please contact me.
> It would be fine to have a meeting soon.
>
> Regards
> David Spisla
>
> On Mon, May 13, 2019 at 12:38 PM David Spisla wrote:
>
> Hi Poornima,
>
> that's fine. I would suggest these dates and times:
>
> May 15th - 17th at 12:30, 13:30, 14:30 IST (9:00, 10:00, 11:00 CEST)
> May 20th - 24th at 12:30, 13:30, 14:30 IST (9:00, 10:00, 11:00 CEST)
>
> I have added Volker Lendecke from Sernet to the mail. He is the Samba
> expert. Can one of you provide a host via bluejeans.com? If not, I will
> try it with GoToMeeting (https://www.gotomeeting.com).
>
> @all Please write your preferred dates and times. For me, all of the
> above dates and times are fine.
>
> Regards
> David
>
> *From:* Poornima Gurusiddaiah
> *Sent:* Monday, May 13, 2019 07:22
> *To:* David Spisla; Anoop C S; Gunther Deschner
> *Cc:* Gluster Devel; gluster-users at gluster.org List
> *Subject:* Re: [Gluster-devel] Improve stability between SMB/CTDB and
> Gluster (together with Samba Core Developer)
>
> Hi,
>
> We would definitely be interested in this. Thank you for contacting us.
> To start, we can have an online conference. Please suggest a few
> possible dates and times for the week (preferably between 7:00 AM and
> 9:00 PM IST).
>
> Adding Anoop and Gunther, who are also the main contributors to the
> Gluster-Samba integration.
>
> Thanks,
> Poornima
>
> On Thu, May 9, 2019 at 7:43 PM David Spisla wrote:
>
> Dear Gluster Community,
>
> at the moment we are improving the stability of SMB/CTDB and Gluster.
> For this purpose we are working together with an advanced Samba core
> developer. He did some debugging but needs more information about
> Gluster core behaviour.
>
> *Would any of the Gluster developers want to have an online conference
> with him and me?*
>
> I would organize everything.
In my opinion this is a good chance to improve the stability of Glusterfs,
and this is at the moment one of the major issues in the community.

Regards
David Spisla

_______________________________________________

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/836554017

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/486278655

Gluster-devel mailing list
Gluster-devel at gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

--
Amar Tumballi (amarts)

--
Günther Deschner                    GPG-ID: 8EE11688
Red Hat                         gdeschner at redhat.com
Samba Team                              gd at samba.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 195 bytes
Desc: OpenPGP digital signature
URL: 

From fusillator at gmail.com  Thu Jun  6 15:19:56 2019
From: fusillator at gmail.com (Luca Cazzaniga)
Date: Thu, 06 Jun 2019 15:19:56 -0000
Subject: [Gluster-users] healing of a volume of type disperse
Message-ID: 

Hi all, I'm pretty new to glusterfs. I managed to set up a dispersed
volume (4+2) following the manual, using release 6.1 from the CentOS
repository. Is it a stable release? Then I forced the volume to stop while
the application was writing on the mount point, deliberately producing an
inconsistent (pending-heal) state, and I'm wondering what the best
practices are for resolving this kind of situation. I found a detailed
explanation of how to resolve split-brain states of replicated volumes at
https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/
but it does not seem to be applicable to the disperse type. Am I missing
some important piece of documentation? Please point me to some reference.
Here's some command detail:

# gluster volume info elastic-volume

Volume Name: elastic-volume
Type: Disperse
Volume ID: 96773fef-c443-465b-a518-6630bcf83397
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: dev-netflow01.fineco.it:/data/gfs/lv_elastic/brick1/brick
Brick2: dev-netflow02.fineco.it:/data/gfs/lv_elastic/brick1/brick
Brick3: dev-netflow03.fineco.it:/data/gfs/lv_elastic/brick1/brick
Brick4: dev-netflow04.fineco.it:/data/gfs/lv_elastic/brick1/brick
Brick5: dev-netflow05.fineco.it:/data/gfs/lv_elastic/brick1/brick
Brick6: dev-netflow06.fineco.it:/data/gfs/lv_elastic/brick1/brick
Options Reconfigured:
performance.io-cache: off
performance.io-thread-count: 64
performance.write-behind-window-size: 100MB
performance.cache-size: 1GB
nfs.disable: on
transport.address-family: inet

# gluster volume heal elastic-volume info

Brick dev01:/data/gfs/lv_elastic/brick1/brick
/data/logs/20190606/ns-coreiol-iol-app-listini.2019060615.log
/data/logs/20190606/ns-coreiol-iol-app-fns.2019060615.log
/data/logs/20190606/ns-coreiol-iol-app-news.2019060615.log
/data/logs/20190606/ns-coreiol-iol-app-trkd.2019060615.log
/data/logs/20190606/ns-coreiol-iol-lib-managers.2019060615.log
/data/logs/20190606/ns-coreiol-iol-lib-httpwrapper.2019060615.log
/data/logs/20190606/ns-coreiol-iol-app-chart.2019060615.log
Status: Connected
Number of entries: 12

Brick dev02:/data/gfs/lv_elastic/brick1/brick
/data/logs/20190606/ns-coreiol-iol-lib-managers.2019060615.log
/data/logs/20190606/ns-coreiol-iol-app-news.2019060615.log
/data/logs/20190606/ns-coreiol-iol-app-chart.2019060615.log
/data/logs/20190606/ns-coreiol-iol-app-listini.2019060615.log
/data/logs/20190606/ns-coreiol-iol-app-fns.2019060615.log
/data/logs/20190606/ns-coreiol-iol-app-trkd.2019060615.log
/data/logs/20190606/ns-coreiol-iol-lib-httpwrapper.2019060615.log
Status: Connected
Number of entries: 12

Brick dev03:/data/gfs/lv_elastic/brick1/brick
/data/logs/20190606/ns-coreiol-iol-app-listini.2019060615.log
/data/logs/20190606/ns-coreiol-iol-app-fns.2019060615.log
/data/logs/20190606/ns-coreiol-iol-app-news.2019060615.log
/data/logs/20190606/ns-coreiol-iol-app-trkd.2019060615.log
/data/logs/20190606/ns-coreiol-iol-lib-managers.2019060615.log
/data/logs/20190606/ns-coreiol-iol-lib-httpwrapper.2019060615.log
/data/logs/20190606/ns-coreiol-iol-app-chart.2019060615.log
Status: Connected
Number of entries: 12

Brick dev04:/data/gfs/lv_elastic/brick1/brick
/data/logs/20190606/ns-coreiol-iol-app-news.2019060615.log
/data/logs/20190606/ns-coreiol-iol-app-trkd.2019060615.log
/data/logs/20190606/ns-coreiol-iol-lib-managers.2019060615.log
/data/logs/20190606/ns-coreiol-iol-lib-httpwrapper.2019060615.log
/data/logs/20190606/ns-coreiol-iol-app-listini.2019060615.log
/data/logs/20190606/ns-coreiol-iol-app-fns.2019060615.log
/data/logs/20190606/ns-coreiol-iol-app-chart.2019060615.log
Status: Connected
Number of entries: 12

Brick dev05:/data/gfs/lv_elastic/brick1/brick
/data/logs/20190606/ns-coreiol-iol-app-news.2019060615.log
/data/logs/20190606/ns-coreiol-iol-app-trkd.2019060615.log
/data/logs/20190606/ns-coreiol-iol-app-listini.2019060615.log
/data/logs/20190606/ns-coreiol-iol-lib-managers.2019060615.log
/data/logs/20190606/ns-coreiol-iol-app-fns.2019060615.log
/data/logs/20190606/ns-coreiol-iol-lib-httpwrapper.2019060615.log
/data/logs/20190606/ns-coreiol-iol-app-chart.2019060615.log
Status: Connected
Number of entries: 12

Brick dev06:/data/gfs/lv_elastic/brick1/brick
/data/logs/20190606/ns-coreiol-iol-app-chart.2019060615.log
/data/logs/20190606/ns-coreiol-iol-lib-managers.2019060615.log
/data/logs/20190606/ns-coreiol-iol-app-news.2019060615.log
/data/logs/20190606/ns-coreiol-iol-app-trkd.2019060615.log
/data/logs/20190606/ns-coreiol-iol-app-listini.2019060615.log
/data/logs/20190606/ns-coreiol-iol-app-fns.2019060615.log
/data/logs/20190606/ns-coreiol-iol-lib-httpwrapper.2019060615.log
Status: Connected
Number of entries: 12

# gluster volume heal elastic-volume info split-brain
Volume elastic-volume is not of type replicate

Any advice?

Best regards

Luca

From sudsingh at cs.stonybrook.edu  Fri Jun  7 04:29:51 2019
From: sudsingh at cs.stonybrook.edu (Sudheer Singh)
Date: Fri, 07 Jun 2019 04:29:51 -0000
Subject: [Gluster-users] Fuse vs NFS
Message-ID: 

Hi, I was doing performance testing and found the FUSE mount to be much
slower than the NFS mount. I am curious to know what the community
recommends: mounting volumes via FUSE or NFS?

--
Thanks,
Sudheer
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From peljasz at yahoo.co.uk  Mon Jun 10 11:13:44 2019
From: peljasz at yahoo.co.uk (lejeczek)
Date: Mon, 10 Jun 2019 11:13:44 -0000
Subject: [Gluster-users] add interface(s) for gluster to listen to - how?
Message-ID: <0389a241-aae4-cdee-8097-3feb9054b7ec@yahoo.co.uk>

hi guys,

is it possible to add an interface, either globally or per volume, that
gluster would be available through? And if yes, then how?

many thanks, L.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pEpkey.asc
Type: application/pgp-keys
Size: 1757 bytes
Desc: not available
URL: 

From sdeepugd at gmail.com  Thu Jun 13 13:29:02 2019
From: sdeepugd at gmail.com (deepu srinivasan)
Date: Thu, 13 Jun 2019 13:29:02 -0000
Subject: [Gluster-users] How to resync completely?
Message-ID: 

Hi Guys
I found the quoted info below in the docs, but is there any procedure for
how to do this? The doc does not seem to convey it.

> Synchronization is not complete
>
> *Description*: GlusterFS geo-replication did not synchronize the data
> completely but the geo-replication status displayed is OK.
>
> *Solution*: You can enforce a full sync of the data by erasing the index
> and restarting GlusterFS geo-replication. After restarting, GlusterFS
> geo-replication begins synchronizing all the data. All files are compared
> using checksum, which can be a lengthy and high resource utilization
> operation on large data sets.
>
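For reference, a minimal sketch of how that "erase the index and restart"
step is usually driven from the CLI, assuming a release that supports the
documented reset-sync-time option of the delete command (the volume, user
and host names below are placeholders, not values from this thread):

# gluster volume geo-replication <MASTERVOL> <SLAVEUSER>@<SLAVEHOST>::<SLAVEVOL> stop
# gluster volume geo-replication <MASTERVOL> <SLAVEUSER>@<SLAVEHOST>::<SLAVEVOL> delete reset-sync-time
# gluster volume geo-replication <MASTERVOL> <SLAVEUSER>@<SLAVEHOST>::<SLAVEVOL> create push-pem force
# gluster volume geo-replication <MASTERVOL> <SLAVEUSER>@<SLAVEHOST>::<SLAVEVOL> start

Deleting the session with reset-sync-time clears the stored per-brick sync
times (stime), so the recreated session cannot resume from the changelog
and instead starts over with a full crawl, comparing files by checksum as
described above; expect this first sync to be slow and resource-hungry on
large data sets.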
URL: From sdeepugd at gmail.com Fri Jun 14 06:57:32 2019 From: sdeepugd at gmail.com (deepu srinivasan) Date: Fri, 14 Jun 2019 06:57:32 -0000 Subject: [Gluster-users] How to resync completely? In-Reply-To: References: Message-ID: Any Updates on this ? On Thu, Jun 13, 2019 at 6:58 PM deepu srinivasan wrote: > Hi Guys > Found the quotes info in the docs. But Is there any procedure as to how to > do this? the doc seems to not convey it. > >> Synchronization is not complete >> >> *Description*: GlusterFS geo-replication did not synchronize the data >> completely but the geo-replication status displayed is OK. >> >> *Solution*: You can enforce a full sync of the data by erasing the index >> and restarting GlusterFS geo-replication. After restarting, GlusterFS >> geo-replication begins synchronizing all the data. All files are compared >> using checksum, which can be a lengthy and high resource utilization >> operation on large data sets. >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdeepugd at gmail.com Fri Jun 14 06:57:55 2019 From: sdeepugd at gmail.com (deepu srinivasan) Date: Fri, 14 Jun 2019 06:57:55 -0000 Subject: [Gluster-users] Geo Replication Stop even after migratingto 5.6 In-Reply-To: References: Message-ID: Hi Any updates on this On Thu, Jun 13, 2019 at 6:59 PM deepu srinivasan wrote: > > > ---------- Forwarded message --------- > From: deepu srinivasan > Date: Thu, Jun 13, 2019 at 5:43 PM > Subject: Geo Replication Stop even after migratingto 5.6 > To: , Kotresh Hiremath Ravishankar < > khiremat at redhat.com>, > > > Hi Guys > Hope you remember the issue I reported for geo replication hang status on > History Crawl. > So you advised me to update the gluster version. previously I was using > 4.1 now I upgraded to 5.6/Still after deleting the previous geo-rep session > and creating a new one the geo-rep session hangs. Is there any other way > that I could solve the issue. > I heard that I could redo the whole geo-replication again. How could I do > that? > Please help. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdeepugd at gmail.com Fri Jun 14 07:18:36 2019 From: sdeepugd at gmail.com (deepu srinivasan) Date: Fri, 14 Jun 2019 07:18:36 -0000 Subject: [Gluster-users] Geo Replication Stop even after migratingto 5.6 In-Reply-To: References: Message-ID: Hi Guys Yes, I will try the root geo-rep setup and update you back. Meanwhile is there any procedure for the below-quoted info in the docs? > Synchronization is not complete > > *Description*: GlusterFS geo-replication did not synchronize the data > completely but the geo-replication status displayed is OK. > > *Solution*: You can enforce a full sync of the data by erasing the index > and restarting GlusterFS geo-replication. After restarting, GlusterFS > geo-replication begins synchronizing all the data. All files are compared > using checksum, which can be a lengthy and high resource utilization > operation on large data sets. > > On Fri, Jun 14, 2019 at 12:30 PM Kotresh Hiremath Ravishankar < khiremat at redhat.com> wrote: > Could you please try root geo-rep setup and update back? > > On Fri, Jun 14, 2019 at 12:28 PM deepu srinivasan > wrote: > >> Hi Any updates on this >> >> >> On Thu, Jun 13, 2019 at 5:43 PM deepu srinivasan >> wrote: >> >>> Hi Guys >>> Hope you remember the issue I reported for geo replication hang status >>> on History Crawl. >>> So you advised me to update the gluster version. 
previously I was using >>> 4.1 now I upgraded to 5.6/Still after deleting the previous geo-rep session >>> and creating a new one the geo-rep session hangs. Is there any other way >>> that I could solve the issue. >>> I heard that I could redo the whole geo-replication again. How could I >>> do that? >>> Please help. >>> >> > > -- > Thanks and Regards, > Kotresh H R > -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.spisla at iternity.com Mon Jun 17 16:56:59 2019 From: david.spisla at iternity.com (David Spisla) Date: Mon, 17 Jun 2019 16:56:59 -0000 Subject: [Gluster-users] Duplicated brick processes after restart of glusterd In-Reply-To: References: , Message-ID: Thank you for the clarification Regards David Spisla Outlook f?r Android herunterladen ________________________________ David Spisla Software Engineer david.spisla at iternity.com +49 761 59034852 iTernity GmbH Heinrich-von-Stephan-Str. 21 79100 Freiburg Deutschland Website Newsletter Support Portal iTernity GmbH. Gesch?ftsf?hrer: Ralf Steinemann. ?Eingetragen beim Amtsgericht Freiburg: HRB-Nr. 701332. ?USt.Id DE242664311. [v01.023] From: Atin Mukherjee Sent: Friday, June 14, 2019 7:03:05 PM To: David Spisla Cc: gluster-users at gluster.org List Subject: Re: [Gluster-users] Duplicated brick processes after restart of glusterd Please see https://bugzilla.redhat.com/show_bug.cgi?id=1696147 which is fixed in 5.6 . Although a race, I believe you're hitting this. Although the title of the bug reflects it to be shd + brick multiplexing combo, but it's applicable for bricks too. On Fri, Jun 14, 2019 at 2:07 PM David Spisla > wrote: Dear Gluster Community, this morning I had an interesting observation. On my 2 Node Gluster v5.5 System with 3 Replica1 volumes (volume1, volume2, test) I had duplicated brick processes (See output of ps aux in attached file duplicate_bricks.txt) for each of the volumes. Additionally there is a fs-ss volume which I use instead of gluster_shared_storage but this volume was not effected. After doing some research I found a hint in glusterd.log . It seems to be that after a restart glusterd couldn't found the pid files for the freshly created brick processes and create new brick processes. One can see in the brick logs that for all the volumes that two brick processes were created just one after another. Result: Two brick processes for each of the volumes volume1, volume2 and test. "gluster vo status" shows that the pid number was mapped to the wrong port number for hydmedia and impax But beside of that the volume was working correctly. I resolve that issue with a workaround. Kill all brick processes and restart glusterd. After that everything is fine. Is this a bug in glusterd? You can find all relevant informations attached below Regards David Spisla _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image639775.png Type: image/png Size: 382 bytes Desc: image639775.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image717143.png Type: image/png Size: 412 bytes Desc: image717143.png URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image965886.png Type: image/png Size: 6545 bytes Desc: image965886.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image986774.png Type: image/png Size: 8191 bytes Desc: image986774.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image804683.png Type: image/png Size: 522 bytes Desc: image804683.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image955587.png Type: image/png Size: 591 bytes Desc: image955587.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image082390.png Type: image/png Size: 775 bytes Desc: image082390.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image294497.png Type: image/png Size: 508 bytes Desc: image294497.png URL: From jaco at uls.co.za Tue Jun 18 13:51:09 2019 From: jaco at uls.co.za (Jaco Kroon) Date: Tue, 18 Jun 2019 13:51:09 -0000 Subject: [Gluster-users] lingering xattrop (heal) files Message-ID: <331cff19-5cb8-4efc-fdc1-4e7630a5b26a@uls.co.za> Hi All, We're using "gluster volume heal ${volname} statistics heal-count" to monitor our systems w.r.t. healing not happening. The reason we're using statistics heal-count and not info is because it's extremely fast in comparison with info. After upgrading to glusterfs 6.1 (from 4.1) we noticed that in many cases heal-count would report >0 values, and then upon running info, this just goes away. Upon closer investigation I've noticed that indices/xattrop there are a few gfid linked files which correlates with the counts given by heal-count, for example: # gluster volume heal mail statistics heal-count Gathering count of entries to be healed on volume mail has been successful Brick host_a:/mnt/gluster/mail Number of entries: 0 Brick host_b:/mnt/gluster/mail Number of entries: 3 And then: host_b /mnt/gluster/mail/.glusterfs/indices/xattrop # for i in [a-f0-9]*; do if stat ../../${i:0:2}/${i:2:2}/${i} &>/dev/null; then echo $i exists; else echo $i does not; fi ; done 12427a88-4a42-4cc1-bbd3-13e4cb8d7e6a does not 1a1e0425-acdb-4ed1-9c62-bb866f34b0c7 does not ed2cefe8-3854-49e5-9433-7198f53ffec5 does not Which to me is indicative that upon file removal these xattrop files are left behind. I'm not sure if this is by design, or a bug, or more likely due to a misunderstanding of how these actually function. Since gluster volume heal ... info can potentially take a long time under the kind of conditions that we're mindful of we'd prefer to use heal-count so that we can receive our alerts in a more timely manner. Kind Regards, Jaco From matthewb at uvic.ca Wed Jun 19 17:20:59 2019 From: matthewb at uvic.ca (Matthew Benstead) Date: Wed, 19 Jun 2019 17:20:59 -0000 Subject: [Gluster-users] GeoReplication Error - Changelog register failed error - Is a directory Message-ID: Hello - I am having a problem with geo-replication on gluster that I hope someone can help me with. I have a 7-server distribute cluster as the primary volume, and a 2 server distribute cluster as the secondary volume. Both are running the same version of gluster on CentOS 7: glusterfs-5.3-2.el7.x86_64 I was able to setup the replication keys, user, groups, etc and establish the session, but it goes faulty quickly after initializing. 
From matthewb at uvic.ca  Wed Jun 19 17:20:59 2019
From: matthewb at uvic.ca (Matthew Benstead)
Date: Wed, 19 Jun 2019 17:20:59 -0000
Subject: [Gluster-users] GeoReplication Error - Changelog register failed error - Is a directory
Message-ID:

Hello -

I am having a problem with geo-replication on gluster that I hope someone
can help me with. I have a 7-server distribute cluster as the primary
volume, and a 2-server distribute cluster as the secondary volume. Both
are running the same version of gluster on CentOS 7:
glusterfs-5.3-2.el7.x86_64

I was able to set up the replication keys, user, groups, etc. and
establish the session, but it goes faulty quickly after initializing.

I ran into the missing libgfchangelog.so error and fixed it with a symlink:

[root at pcic-backup01 ~]# ln -s /usr/lib64/libgfchangelog.so.0 /usr/lib64/libgfchangelog.so
[root at pcic-backup01 ~]# ls -lh /usr/lib64/libgfchangelog.so*
lrwxrwxrwx. 1 root root  30 May 16 13:16 /usr/lib64/libgfchangelog.so -> /usr/lib64/libgfchangelog.so.0
lrwxrwxrwx. 1 root root  23 May 16 08:58 /usr/lib64/libgfchangelog.so.0 -> libgfchangelog.so.0.0.1
-rwxr-xr-x. 1 root root 62K Feb 25 04:02 /usr/lib64/libgfchangelog.so.0.0.1

But right now, when trying to start replication it goes faulty:

[root at gluster01 ~]# gluster volume geo-replication storage geoaccount at 10.0.231.81::pcic-backup start
Starting geo-replication session between storage & geoaccount at 10.0.231.81::pcic-backup has been successful

[root at gluster01 ~]# gluster volume geo-replication status

MASTER NODE    MASTER VOL    MASTER BRICK                  SLAVE USER    SLAVE                                        SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
10.0.231.50    storage       /mnt/raid6-storage/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
10.0.231.54    storage       /mnt/raid6-storage/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
10.0.231.56    storage       /mnt/raid6-storage/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
10.0.231.55    storage       /mnt/raid6-storage/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
10.0.231.53    storage       /mnt/raid6-storage/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
10.0.231.51    storage       /mnt/raid6-storage/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
10.0.231.52    storage       /mnt/raid6-storage/storage    geoaccount    ssh://geoaccount at 10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A

[root at gluster01 ~]# gluster volume geo-replication storage geoaccount at 10.0.231.81::pcic-backup stop
Stopping geo-replication session between storage & geoaccount at 10.0.231.81::pcic-backup has been successful

And the log file /var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.log contains the error:

GLUSTER: Changelog register failed error=[Errno 21] Is a directory

[root at gluster01 ~]# cat /var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.log
[2019-05-23 17:07:23.500781] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf
[2019-05-23 17:07:23.629298] I [gsyncd(status):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf
[2019-05-23 17:07:31.354005] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf
[2019-05-23 17:07:31.483582] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf
[2019-05-23 17:07:31.863888] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf
[2019-05-23 17:07:31.994895] I [gsyncd(monitor):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf
[2019-05-23 17:07:33.133888] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Initializing...
[2019-05-23 17:07:33.134301] I [monitor(monitor):157:monitor] Monitor: starting gsyncd worker brick=/mnt/raid6-storage/storage slave_node=10.0.231.81
[2019-05-23 17:07:33.214462] I [gsyncd(agent /mnt/raid6-storage/storage):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf
[2019-05-23 17:07:33.216737] I [changelogagent(agent /mnt/raid6-storage/storage):72:__init__] ChangelogAgent: Agent listining...
[2019-05-23 17:07:33.228072] I [gsyncd(worker /mnt/raid6-storage/storage):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf
[2019-05-23 17:07:33.247236] I [resource(worker /mnt/raid6-storage/storage):1366:connect_remote] SSH: Initializing SSH connection between master and slave...
[2019-05-23 17:07:34.948796] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf
[2019-05-23 17:07:35.73339] I [gsyncd(status):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf
[2019-05-23 17:07:35.232405] I [resource(worker /mnt/raid6-storage/storage):1413:connect_remote] SSH: SSH connection between master and slave established. duration=1.9849
[2019-05-23 17:07:35.232748] I [resource(worker /mnt/raid6-storage/storage):1085:connect] GLUSTER: Mounting gluster volume locally...
[2019-05-23 17:07:36.359250] I [resource(worker /mnt/raid6-storage/storage):1108:connect] GLUSTER: Mounted gluster volume duration=1.1262
[2019-05-23 17:07:36.359639] I [subcmds(worker /mnt/raid6-storage/storage):80:subcmd_worker] : Worker spawn successful. Acknowledging back to monitor
[2019-05-23 17:07:36.380975] E [repce(agent /mnt/raid6-storage/storage):122:worker] : call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 40, in register
    return Changes.cl_register(cl_brick, cl_dir, cl_log, cl_level, retries)
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 45, in cl_register
    cls.raise_changelog_err()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 29, in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 21] Is a directory
[2019-05-23 17:07:36.382556] E [repce(worker /mnt/raid6-storage/storage):214:__call__] RepceClient: call failed call=27412:140659114579776:1558631256.38 method=register error=ChangelogException
[2019-05-23 17:07:36.382833] E [resource(worker /mnt/raid6-storage/storage):1266:service_loop] GLUSTER: Changelog register failed error=[Errno 21] Is a directory
[2019-05-23 17:07:36.404313] I [repce(agent /mnt/raid6-storage/storage):97:service_loop] RepceServer: terminating on reaching EOF.
[2019-05-23 17:07:37.361396] I [monitor(monitor):278:monitor] Monitor: worker died in startup phase brick=/mnt/raid6-storage/storage
[2019-05-23 17:07:37.370690] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2019-05-23 17:07:41.526408] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf
[2019-05-23 17:07:41.643923] I [gsyncd(status):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf
[2019-05-23 17:07:45.722193] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf
[2019-05-23 17:07:45.817210] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf
[2019-05-23 17:07:46.188499] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf
[2019-05-23 17:07:46.258817] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf
[2019-05-23 17:07:47.350276] I [gsyncd(monitor-status):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf
[2019-05-23 17:07:47.364751] I [subcmds(monitor-status):29:subcmd_monitor_status] : Monitor Status Change status=Stopped

I'm not really sure where to go from here...

[root at gluster01 ~]# gluster volume geo-replication storage geoaccount at 10.0.231.81::pcic-backup config | grep -i changelog
change_detector:changelog
changelog_archive_format:%Y%m
changelog_batch_size:727040
changelog_log_file:/var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/changes-${local_id}.log
changelog_log_level:INFO

Thanks,
-Matthew
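The failing register call can be reproduced in isolation through the same
module the traceback goes through. A rough sketch, using only names taken
from the traceback above; the scratch directory, log file, log level and
retry count are made-up placeholders, not a supported debugging interface:

# Run on a master node carrying the brick from the status output.
python - <<'EOF'
import sys
# syncdaemon path as shown in the traceback above.
sys.path.insert(0, "/usr/libexec/glusterfs/python/syncdaemon")
from libgfchangelog import Changes

brick = "/mnt/raid6-storage/storage"  # brick from the status output
scratch = "/tmp/changelog-scratch"    # hypothetical scratch directory
log = "/tmp/changelog-repro.log"      # hypothetical log file

# Argument order mirrors cl_register(cl_brick, cl_dir, cl_log, cl_level,
# retries) from the traceback; the numeric log level 7 and 5 retries are
# assumptions. If the underlying problem persists, this should raise
# ChangelogException: [Errno 21] Is a directory.
Changes.cl_register(brick, scratch, log, 7, 5)
print("register succeeded")
EOF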
From sdeepugd at gmail.com  Mon Jun 24 07:06:12 2019
From: sdeepugd at gmail.com (deepu srinivasan)
Date: Mon, 24 Jun 2019 07:06:12 -0000
Subject: [Gluster-users] Any other method for full resync
Message-ID:

Hi Guys

We deleted the geo-replication session with "reset-sync-time" and ran the
sync completely from the beginning, and the sync was successful (hope you
remember that we got stuck in the geo-replication session's history crawl
and did not get past it). It took a long time to sync the data. Is there
any other way to completely sync from the beginning that takes much less
time than this?
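For reference, the delete/recreate cycle being described looks roughly
like this, with placeholder names (this is the slow full-resync path the
question hopes to avoid):

gluster volume geo-replication <master-vol> geoaccount@<slave-host>::<slave-vol> stop
# "delete reset-sync-time" discards the synced-time markers, so the next
# start crawls and re-syncs everything from the beginning:
gluster volume geo-replication <master-vol> geoaccount@<slave-host>::<slave-vol> delete reset-sync-time
gluster volume geo-replication <master-vol> geoaccount@<slave-host>::<slave-vol> create push-pem
gluster volume geo-replication <master-vol> geoaccount@<slave-host>::<slave-vol> start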
From mailinglists at lucassen.org  Tue Jun 25 09:17:35 2019
From: mailinglists at lucassen.org (richard lucassen)
Date: Tue, 25 Jun 2019 09:17:35 -0000
Subject: [Gluster-users] very poor performance on Debian Buster
Message-ID: <20190625110732.7c6decfd73a00577a21040e5@lucassen.org>

I run a glusterfs server on a sys-V version of Debian Buster. The machine
is an 8-core/256GB/SSD server and I want to copy 400GB to a mounted
gluster device.

The copy now runs for more than a day and it has only copied 79GB. The
network activity is around 4 to 8 Mbit/s.

Is this a known issue of version 5.5-3? I did not touch the defaults.

R.

--
richard lucassen
http://contact.xaq.nl/
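One quick way to tell raw streaming throughput apart from per-file
overhead on such a mount (paths are placeholders; many small files are
typically where FUSE round-trips dominate, and would explain a copy
crawling at a few Mbit/s):

# Large sequential write to the gluster mount:
dd if=/dev/zero of=/mnt/gluster/ddtest bs=1M count=1024 conv=fsync

# Many small files, for comparison:
time cp -r /usr/share/doc /mnt/gluster/doc-test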