From spisla80 at gmail.com Fri Mar 1 07:48:42 2019
From: spisla80 at gmail.com (David Spisla)
Date: Fri, 1 Mar 2019 08:48:42 +0100
Subject: [Gluster-users] Bitrot: Time of signing depending on the file size???
Message-ID:

Hello folks,

I made some observations concerning the bitrot daemon. It seems that the
time the bitrot signer takes to sign a file depends on the file size. I
copied files of different sizes into a volume and was wondering why the
files do not get their signature at the same time (I kept the expiry time
at its default of 120). Here are some examples:

300 KB file ~2-3 m
70 MB file ~ 40 m
115 MB file ~ 1,5 h
800 MB file ~ 4,5 h

What is the expected behaviour here?
Why does it take so long to sign an 800 MB file?
What about 500 GB or 1 TB?
Is there a way to speed up the signing process?

My aim is to understand this observation.

Regards
David Spisla
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From amudhan83 at gmail.com Fri Mar 1 07:59:25 2019
From: amudhan83 at gmail.com (Amudhan P)
Date: Fri, 1 Mar 2019 13:29:25 +0530
Subject: [Gluster-users] Bitrot: Time of signing depending on the file size???
In-Reply-To: References: Message-ID:

Hi David,

I have also tested the bitrot signature process; by default it reads files
at < 250 KB/s.

regards
Amudhan P

On Fri, Mar 1, 2019 at 1:19 PM David Spisla wrote:

> Hello folks,
>
> I made some observations concerning the bitrot daemon. It seems that the
> time the bitrot signer takes to sign a file depends on the file size. I
> copied files of different sizes into a volume and was wondering why the
> files do not get their signature at the same time (I kept the expiry time
> at its default of 120). Here are some examples:
>
> 300 KB file ~2-3 m
> 70 MB file ~ 40 m
> 115 MB file ~ 1,5 h
> 800 MB file ~ 4,5 h
>
> What is the expected behaviour here?
> Why does it take so long to sign an 800 MB file?
> What about 500 GB or 1 TB?
> Is there a way to speed up the signing process?
> > My ambition is to understand this observation > > Regards > David Spisla > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From srakonde at redhat.com Fri Mar 1 10:55:50 2019 From: srakonde at redhat.com (Sanju Rakonde) Date: Fri, 1 Mar 2019 16:25:50 +0530 Subject: [Gluster-users] Fwd: Added bricks with wrong name and now need to remove them without destroying volume. In-Reply-To: References: <45e6627e633e5acb1fb96fe6a457df1827187679.camel@gmail.com> <0d79899c7a34939e64bff8e61ec29f2ac553f50b.camel@gmail.com> Message-ID: On Thu, Feb 28, 2019 at 9:48 PM Poornima Gurusiddaiah wrote: > > > On Thu, Feb 28, 2019, 8:44 PM Tami Greene wrote: > >> I'm missing some information about how the cluster volume creates the >> metadata allowing it to see and find the data on the bricks. I've been >> told not to write anything to the bricks directly as the glusterfs cannot >> create the metadata and therefore the data doesn't exist in the cluster >> world. >> >> So, if I destroy the current gluster volume, leaving the data on the >> hardware RAID volume, correct the names of the new empty bricks, recreate >> the cluster volume, import bricks, how does the metadata get created so the >> new cluster volume can find and access the data? It seems like I would be >> laying the glusterfs on top on hardware and "hiding" the data. >> > > I couldn't get all the details why it went wrong, but you can delete a > Gluster volume and recreate it with the same bricks and the data should be > accessible again AFAIK. Preferably create with the same volume name. Do not > alter any data on the bricks, also make sure there is no valid data on the > 4 bricks that were wrongly added by checking in the backend. > > +Atin, Sanju > > @Atin, Sanju, This should work right? > Yes, this should work. 
When you delete a volume, your data is left untouched in the underlying
filesystem, and you can always re-create a volume with the same bricks, but
you need to use the force option while creating it. The bricks carry
extended attributes (written by glusterfs when the bricks were first used to
create a volume) which say that the brick is already part of a volume; force
tells glusterd to accept such bricks anyway. By using force you can create
the volume and use your data.
>
> Regards,
> Poornima
>
>>
>> On Wed, Feb 27, 2019 at 5:08 PM Jim Kinney wrote:
>>
>>> It sounds like new bricks were added and they mounted over the top of
>>> existing bricks.
>>>
>>> gluster volume status detail
>>>
>>> This will give the data you need to find where the real files are. You
>>> can look in those to see the data should be intact.
>>>
>>> Stopping the gluster volume is a good first step. Then as a safeguard
>>> you can unmount the filesystem that holds the data you want. Now remove the
>>> gluster volume(s) that are the problem - all if needed. Remount the real
>>> filesystem(s). Create new gluster volumes with correct names.
>>>
>>> On Wed, 2019-02-27 at 16:56 -0500, Tami Greene wrote:
>>>
>>> That makes sense. The system is made of four data arrays with hardware
>>> RAID 6 and then the distributed volume on top. I honestly don't know how
>>> that works, but the previous administrator said we had redundancy. I'm
>>> hoping there is a way to bypass the safeguard of migrating data when
>>> removing a brick from the volume, which in my beginner's mind would be a
>>> straightforward way of remedying the problem. Hopefully once the empty
>>> bricks are removed, the "missing" data will be visible again in the volume.
>>>
>>> On Wed, Feb 27, 2019 at 3:59 PM Jim Kinney wrote:
>>>
>>> Keep in mind that gluster is a metadata process. It doesn't really touch
>>> the actual volume files. The exception is the .glusterfs and .trashcan
>>> folders in the very top directory of the gluster volume.
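The recreate-with-force procedure described above can be sketched as a command sequence. This is only an illustration — the volume name, brick paths, and brick ordering are placeholders, not this cluster's actual layout:

```shell
# Stop and delete the broken volume definition; data on the bricks stays in place.
gluster volume stop vol.name
gluster volume delete vol.name

# Each previously used brick carries a volume-id marker in its extended
# attributes; this marker is why a plain re-create is refused:
getfattr -n trusted.glusterfs.volume-id -e hex /bricks/data1/vol.name

# Re-create with the bricks in their ORIGINAL order, appending 'force' so
# glusterd accepts bricks that already carry a volume-id:
gluster volume create vol.name server1:/bricks/data1/vol.name \
    server2:/bricks/data1/vol.name ... force
gluster volume start vol.name
```

For the narrower goal of dropping only wrongly added, still-empty bricks without a data migration, remove-brick also accepts force (again a sketch with placeholder paths): gluster volume remove-brick vol.name newserver:/bricks/dataX/vol.name force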
>>>
>>> When you create a gluster volume from a brick, it doesn't format the
>>> filesystem. It uses what's already there.
>>>
>>> So if you remove a volume and all its bricks, you've not deleted data.
>>>
>>> That said, if you are using anything but replicated bricks, which is
>>> what I use exclusively for my needs, then reassembling them into a new
>>> volume with the correct name might be tricky. By listing the bricks in the
>>> exact same order as they were listed when creating the wrongly named
>>> volume, the correctly named volume should use the same method to put
>>> data on the drives as previously and not scramble anything.
>>>
>>> On Wed, 2019-02-27 at 14:24 -0500, Tami Greene wrote:
>>>
>>> I sent this and realized I hadn't registered. My apologies for the
>>> duplication.
>>>
>>> Subject: Added bricks with wrong name and now need to remove them
>>> without destroying volume.
>>> To:
>>>
>>> Yes, I broke it. Now I need help fixing it.
>>>
>>> I have an existing Gluster volume, spread over 16 bricks and 4 servers;
>>> 1.5P of space with 49% currently used. Added an additional 4 bricks and a
>>> server as we expect a large influx of data in the next 4 to 6 months. The
>>> system had been established by my predecessor, who is no longer here.
>>>
>>> This was my first solo addition of bricks to gluster.
>>>
>>> Everything went smoothly until "gluster volume add-brick Volume
>>> newserver:/bricks/dataX/vol.name".
>>> (I don't have the exact response as I worked on this for
>>> almost 5 hours last night.) Unable to add-brick as "it is already mounted",
>>> or something to that effect.
>>> Double-checked my instructions and the names of the bricks.
>>> Everything seemed correct. Tried to add again, adding "force." Again,
>>> "unable to add-brick".
>>> Because of the keyword (in my mind) "mounted" in the
>>> error, I checked /etc/fstab, where the name of the mount point is simply
>>> /bricks/dataX.
>>>
>>> This convention was the same across all servers, so I thought I had
>>> discovered an error in my notes and changed the name to
>>> newserver:/bricks/dataX.
>>> Still had to use force, but the bricks were added.
>>> Restarted the gluster volume vol.name. No errors.
>>> Rebooted; but /vol.name did not mount on reboot as /etc/fstab
>>> instructs. So I attempted to mount manually and discovered I had a big mess
>>> on my hands.
>>> "Transport endpoint not connected", in
>>> addition to other messages.
>>> Discovered an issue between certificates and the
>>> auth.ssl-allow list because of the hostname of the new server. I made the
>>> correction and /vol.name mounted.
>>> However, df -h indicated the 4 new bricks were not being
>>> seen, as 400T were missing from what should have been available.
>>>
>>> Thankfully, I could add something to vol.name on one machine and see it
>>> on another machine, and I wrongly assumed the volume was operational, even
>>> if the new bricks were not recognized.
>>> So I tried to correct the main issue by running
>>> "gluster volume remove-brick vol.name newserver:/bricks/dataX/".
>>> I received the prompt that data will be migrated before the brick is
>>> removed, continue? (or something to that effect) and started the process,
>>> thinking this won't take long because there is no data.
>>> After 10 minutes and no apparent progress on the
>>> process, I did panic, thinking worst-case scenario: it is writing zeros
>>> over my data.
>>> Executed the stop command and there was still no
>>> progress, and I assume it was due to no data on the brick to be removed,
>>> causing the program to hang.
>>> Found the process ID and killed it.
>>>
>>> This morning, while all clients and servers can access /vol.name, not
>>> all of the data is present. I can find it under the cluster, but users
>>> cannot reach it.
I am, again, assuming it is because of the 4 bricks that
>>> have been added, but aren't really a part of the volume because of their
>>> incorrect name.
>>>
>>> So, how do I proceed from here?
>>>
>>> 1. Remove the 4 empty bricks from the volume without damaging data.
>>>
>>> 2. Correctly clear any metadata about these 4 bricks ONLY so they may be
>>> added correctly.
>>>
>>> If this doesn't restore the volume to full functionality, I'll write
>>> another post if I cannot find an answer in the notes or online.
>>>
>>> Tami--
>>>
>>> _______________________________________________
>>>
>>> Gluster-users mailing list
>>>
>>> Gluster-users at gluster.org
>>>
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>> --
>>>
>>> James P. Kinney III
>>>
>>> Every time you stop a school, you will have to build a jail. What you
>>> gain at one end you lose at the other. It's like feeding a dog on his
>>> own tail. It won't fatten the dog.
>>> - Speech 11/23/1900 Mark Twain
>>>
>>> http://heretothereideas.blogspot.com/
>>
>> --
>> Tami
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users

-- Thanks, Sanju
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From spisla80 at gmail.com Fri Mar 1 12:19:29 2019
From: spisla80 at gmail.com (David Spisla)
Date: Fri, 1 Mar 2019 13:19:29 +0100
Subject: [Gluster-users] Bitrot: Time of signing depending on the file size???
In-Reply-To: References: Message-ID:

Hello Amudhan,

What exactly does "it takes < 250 KB/s" mean?
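For scale, the signing delays quoted at the start of this thread can be turned into an implied signer throughput. A rough check (reading the garbled 115 MB entry as ~1,5 h):

```shell
# Effective bitrot-signer throughput implied by the observed timings
awk 'BEGIN {
  split("300 71680 117760 819200", kb);   # file sizes in KB
  split("150 2400 5400 16200", sec);      # observed signing delays in seconds
  for (i = 1; i <= 4; i++)
    printf "%6d KB in %5d s -> %5.1f KB/s\n", kb[i], sec[i], kb[i] / sec[i];
}'
```

Every implied rate is far below even 250 KB/s, and the rate is not constant across sizes either, which fits the suspicion later in the thread that the 120 s timer only starts once the last fd on the file is closed.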
I came across this discussion between you and Kotresh:
https://lists.gluster.org/pipermail/gluster-users/2016-September/028354.html
Kotresh mentioned there that the problem is that for some files fds are
still open in the brick process list. The bitrot signer can only sign a
file once the fd is closed. And according to my observations, it seems that
the bigger a file is, the longer its fd stays open. I could verify this
with a 500 MiB file and some smaller files. After a specific time only the
fd for the 500 MiB file was still open and the file still had no signature;
for the smaller files there were no fds and they already had a signature.

Does anybody know the reason for this? To me it looks like a bug.

Regards
David

On Fri, 1 Mar 2019 at 08:58, Amudhan P wrote:

> Hi David,
>
> I have also tested the bitrot signature process; by default it reads
> files at < 250 KB/s.
>
> regards
> Amudhan P
>
> On Fri, Mar 1, 2019 at 1:19 PM David Spisla wrote:
>
>> Hello folks,
>>
>> I made some observations concerning the bitrot daemon. It seems that the
>> time the bitrot signer takes to sign a file depends on the file size. I
>> copied files of different sizes into a volume and was wondering why the
>> files do not get their signature at the same time (I kept the expiry time
>> at its default of 120). Here are some examples:
>>
>> 300 KB file ~2-3 m
>> 70 MB file ~ 40 m
>> 115 MB file ~ 1,5 h
>> 800 MB file ~ 4,5 h
>>
>> What is the expected behaviour here?
>> Why does it take so long to sign an 800 MB file?
>> What about 500 GB or 1 TB?
>> Is there a way to speed up the signing process?
>>
>> My aim is to understand this observation.
>>
>> Regards
>> David Spisla
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From amudhan83 at gmail.com Fri Mar 1 13:57:49 2019
From: amudhan83 at gmail.com (Amudhan P)
Date: Fri, 1 Mar 2019 19:27:49 +0530
Subject: [Gluster-users] Bitrot: Time of signing depending on the file size???
In-Reply-To: References: Message-ID:

Hi David,

Once a file write completes (fd closed), the bitrot process waits for
120 seconds, and if no fd is opened for the file in that window it triggers
the signer process.

Considering the signer process start and end times, the file read speed
was < 250 KB/s. To increase the bitrot signer read speed you would need to
modify the bitrot source file.

regards
Amudhan

On Fri, Mar 1, 2019 at 5:49 PM David Spisla wrote:

> Hello Amudhan,
>
> What exactly does "it takes < 250 KB/s" mean?
> I came across this discussion between you and Kotresh:
> https://lists.gluster.org/pipermail/gluster-users/2016-September/028354.html
> Kotresh mentioned there that the problem is that for some files fds are
> still open in the brick process list. The bitrot signer can only sign a
> file once the fd is closed. And according to my observations, it seems
> that the bigger a file is, the longer its fd stays open. I could verify
> this with a 500 MiB file and some smaller files. After a specific time
> only the fd for the 500 MiB file was still open and the file still had no
> signature; for the smaller files there were no fds and they already had a
> signature.
>
> Does anybody know the reason for this? To me it looks like a bug.
>
> Regards
> David
>
> On Fri, 1 Mar 2019 at 08:58, Amudhan P wrote:
>
>> Hi David,
>>
>> I have also tested the bitrot signature process; by default it reads
>> files at < 250 KB/s.
>>
>> regards
>> Amudhan P
>>
>> On Fri, Mar 1, 2019 at 1:19 PM David Spisla wrote:
>>
>>> Hello folks,
>>>
>>> I made some observations concerning the bitrot daemon. It seems that
>>> the time the bitrot signer takes to sign a file depends on the file size.
I copied
>>> files of different sizes into a volume and was wondering why the
>>> files do not get their signature at the same time (I kept the expiry
>>> time at its default of 120). Here are some examples:
>>>
>>> 300 KB file ~2-3 m
>>> 70 MB file ~ 40 m
>>> 115 MB file ~ 1,5 h
>>> 800 MB file ~ 4,5 h
>>>
>>> What is the expected behaviour here?
>>> Why does it take so long to sign an 800 MB file?
>>> What about 500 GB or 1 TB?
>>> Is there a way to speed up the signing process?
>>>
>>> My aim is to understand this observation.
>>>
>>> Regards
>>> David Spisla
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From spisla80 at gmail.com Fri Mar 1 14:00:33 2019
From: spisla80 at gmail.com (David Spisla)
Date: Fri, 1 Mar 2019 15:00:33 +0100
Subject: [Gluster-users] Bitrot: Time of signing depending on the file size???
In-Reply-To: References: Message-ID:

Hello Amudhan,

On Fri, 1 Mar 2019 at 14:56, Amudhan P wrote:

> Hi David,
>
> Once a file write completes (fd closed), the bitrot process waits for
> 120 seconds, and if no fd is opened for the file in that window it
> triggers the signer process.

Yes, I already know this. But there is still the problem that an fd is not
closed after open() for bigger files. See the link to the discussion.

> Considering the signer process start and end times, the file read speed
> was < 250 KB/s.

How can I measure this?

> To increase the bitrot signer read speed you would need to modify the
> bitrot source file.

How can I do that?

Regards
David

> regards
> Amudhan
>
> On Fri, Mar 1, 2019 at 5:49 PM David Spisla wrote:
>
>> Hello Amudhan,
>>
>> What exactly does "it takes < 250 KB/s" mean?
>> I came across this discussion between you and Kotresh:
>> https://lists.gluster.org/pipermail/gluster-users/2016-September/028354.html
>> Kotresh mentioned there that the problem is that for some files fds are
>> still open in the brick process list. The bitrot signer can only sign a
>> file once the fd is closed. And according to my observations, it seems
>> that the bigger a file is, the longer its fd stays open. I could verify
>> this with a 500 MiB file and some smaller files. After a specific time
>> only the fd for the 500 MiB file was still open and the file still had no
>> signature; for the smaller files there were no fds and they already had a
>> signature.
>>
>> Does anybody know the reason for this? To me it looks like a bug.
>>
>> Regards
>> David
>>
>> On Fri, 1 Mar 2019 at 08:58, Amudhan P wrote:
>>
>>> Hi David,
>>>
>>> I have also tested the bitrot signature process; by default it reads
>>> files at < 250 KB/s.
>>>
>>> regards
>>> Amudhan P
>>>
>>> On Fri, Mar 1, 2019 at 1:19 PM David Spisla wrote:
>>>
>>>> Hello folks,
>>>>
>>>> I made some observations concerning the bitrot daemon. It seems that
>>>> the time the bitrot signer takes to sign a file depends on the file
>>>> size. I copied files of different sizes into a volume and was wondering
>>>> why the files do not get their signature at the same time (I kept the
>>>> expiry time at its default of 120). Here are some examples:
>>>>
>>>> 300 KB file ~2-3 m
>>>> 70 MB file ~ 40 m
>>>> 115 MB file ~ 1,5 h
>>>> 800 MB file ~ 4,5 h
>>>>
>>>> What is the expected behaviour here?
>>>> Why does it take so long to sign an 800 MB file?
>>>> What about 500 GB or 1 TB?
>>>> Is there a way to speed up the signing process?
>>>>
>>>> My aim is to understand this observation.
>>>>
>>>> Regards
>>>> David Spisla
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> https://lists.gluster.org/mailman/listinfo/gluster-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From khiremat at redhat.com Fri Mar 1 17:29:01 2019
From: khiremat at redhat.com (Kotresh Hiremath Ravishankar)
Date: Fri, 1 Mar 2019 22:59:01 +0530
Subject: [Gluster-users] [Gluster-devel] Bitrot: Time of signing depending on the file size???
In-Reply-To: References: Message-ID:

Interesting observation! But as discussed in the thread, the bitrot signing
process depends on a 2 min timeout (by default) after the last fd closes.
It doesn't have any correlation with the size of the file.
Did you happen to verify that the fd was still open for large files for
some reason?

On Fri, Mar 1, 2019 at 1:19 PM David Spisla wrote:

> Hello folks,
>
> I made some observations concerning the bitrot daemon. It seems that the
> time the bitrot signer takes to sign a file depends on the file size. I
> copied files of different sizes into a volume and was wondering why the
> files do not get their signature at the same time (I kept the expiry time
> at its default of 120). Here are some examples:
>
> 300 KB file ~2-3 m
> 70 MB file ~ 40 m
> 115 MB file ~ 1,5 h
> 800 MB file ~ 4,5 h
>
> What is the expected behaviour here?
> Why does it take so long to sign an 800 MB file?
> What about 500 GB or 1 TB?
> Is there a way to speed up the signing process?
>
> My aim is to understand this observation.
>
> Regards
> David Spisla
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel

--
Thanks and Regards,
Kotresh H R
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From dijuremo at gmail.com Fri Mar 1 22:09:14 2019 From: dijuremo at gmail.com (Diego Remolina) Date: Fri, 1 Mar 2019 17:09:14 -0500 Subject: [Gluster-users] Gluster eating up a lot of ram Message-ID: I am using glusterfs with two servers as a file server sharing files via samba and ctdb. I cannot use samba vfs gluster plugin, due to bug in current Centos version of samba. So I am mounting via fuse and exporting the volume to samba from the mount point. Upon initial boot, the server where samba is exporting files climbs up to ~10GB RAM within a couple hours of use. From then on, it is a constant slow memory increase. In the past with gluster 3.8.x we had to reboot the servers at around 30 days . With gluster 4.1.6 we are getting up to 48 days, but RAM use is at 48GB out of 64GB. Is this normal? The particular versions are below, [root at ysmha01 home]# uptime 16:59:39 up 48 days, 9:56, 1 user, load average: 3.75, 3.17, 3.00 [root at ysmha01 home]# rpm -qa | grep gluster centos-release-gluster41-1.0-3.el7.centos.noarch glusterfs-server-4.1.6-1.el7.x86_64 glusterfs-api-4.1.6-1.el7.x86_64 centos-release-gluster-legacy-4.0-2.el7.centos.noarch glusterfs-4.1.6-1.el7.x86_64 glusterfs-client-xlators-4.1.6-1.el7.x86_64 libvirt-daemon-driver-storage-gluster-3.9.0-14.el7_5.8.x86_64 glusterfs-fuse-4.1.6-1.el7.x86_64 glusterfs-libs-4.1.6-1.el7.x86_64 glusterfs-rdma-4.1.6-1.el7.x86_64 glusterfs-cli-4.1.6-1.el7.x86_64 samba-vfs-glusterfs-4.8.3-4.el7.x86_64 [root at ysmha01 home]# rpm -qa | grep samba samba-common-tools-4.8.3-4.el7.x86_64 samba-client-libs-4.8.3-4.el7.x86_64 samba-libs-4.8.3-4.el7.x86_64 samba-4.8.3-4.el7.x86_64 samba-common-libs-4.8.3-4.el7.x86_64 samba-common-4.8.3-4.el7.noarch samba-vfs-glusterfs-4.8.3-4.el7.x86_64 [root at ysmha01 home]# cat /etc/redhat-release CentOS Linux release 7.6.1810 (Core) RAM view using top Tasks: 398 total, 1 running, 397 sleeping, 0 stopped, 0 zombie %Cpu(s): 7.0 us, 9.3 sy, 1.7 ni, 71.6 id, 9.7 wa, 0.0 hi, 0.8 si, 0.0 st KiB 
Mem : 65772000 total, 1851344 free, 60487404 used, 3433252 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 3134316 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9953 root 20 0 3727912 946496 3196 S 150.2 1.4 38626:27 glusterfsd
9634 root 20 0 48.1g 47.2g 3184 S 96.3 75.3 29513:55 glusterfs
14485 root 20 0 3404140 63780 2052 S 80.7 0.1 1590:13 glusterfs

[root at ysmha01 ~]# gluster v status export
Status of volume: export
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.0.1.7:/bricks/hdds/brick 49157 0 Y 13986
Brick 10.0.1.6:/bricks/hdds/brick 49153 0 Y 9953
Self-heal Daemon on localhost N/A N/A Y 14485
Self-heal Daemon on 10.0.1.7 N/A N/A Y 21934
Self-heal Daemon on 10.0.1.5 N/A N/A Y 4598

Task Status of Volume export
------------------------------------------------------------------------------
There are no active volume tasks

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From pgurusid at redhat.com Sat Mar 2 04:07:33 2019
From: pgurusid at redhat.com (Poornima Gurusiddaiah)
Date: Sat, 2 Mar 2019 09:37:33 +0530
Subject: [Gluster-users] Gluster eating up a lot of ram
In-Reply-To: References: Message-ID:

This high memory consumption is not normal. It looks like a memory leak.
Is it possible to try it on a test setup with gluster-6rc? What kind of
workload goes into the fuse mount? Large files or small files? We need the
following information to debug further:
- Gluster volume info output
- Statedump of the Gluster fuse mount process consuming 44G of RAM.

Regards,
Poornima

On Sat, Mar 2, 2019, 3:40 AM Diego Remolina wrote:

> I am using glusterfs with two servers as a file server sharing files via
> samba and ctdb. I cannot use the samba vfs gluster plugin, due to a bug
> in the current CentOS version of samba.
So I am mounting via fuse and exporting > the volume to samba from the mount point. > > Upon initial boot, the server where samba is exporting files climbs up to > ~10GB RAM within a couple hours of use. From then on, it is a constant slow > memory increase. In the past with gluster 3.8.x we had to reboot the > servers at around 30 days . With gluster 4.1.6 we are getting up to 48 > days, but RAM use is at 48GB out of 64GB. Is this normal? > > The particular versions are below, > > [root at ysmha01 home]# uptime > 16:59:39 up 48 days, 9:56, 1 user, load average: 3.75, 3.17, 3.00 > [root at ysmha01 home]# rpm -qa | grep gluster > centos-release-gluster41-1.0-3.el7.centos.noarch > glusterfs-server-4.1.6-1.el7.x86_64 > glusterfs-api-4.1.6-1.el7.x86_64 > centos-release-gluster-legacy-4.0-2.el7.centos.noarch > glusterfs-4.1.6-1.el7.x86_64 > glusterfs-client-xlators-4.1.6-1.el7.x86_64 > libvirt-daemon-driver-storage-gluster-3.9.0-14.el7_5.8.x86_64 > glusterfs-fuse-4.1.6-1.el7.x86_64 > glusterfs-libs-4.1.6-1.el7.x86_64 > glusterfs-rdma-4.1.6-1.el7.x86_64 > glusterfs-cli-4.1.6-1.el7.x86_64 > samba-vfs-glusterfs-4.8.3-4.el7.x86_64 > [root at ysmha01 home]# rpm -qa | grep samba > samba-common-tools-4.8.3-4.el7.x86_64 > samba-client-libs-4.8.3-4.el7.x86_64 > samba-libs-4.8.3-4.el7.x86_64 > samba-4.8.3-4.el7.x86_64 > samba-common-libs-4.8.3-4.el7.x86_64 > samba-common-4.8.3-4.el7.noarch > samba-vfs-glusterfs-4.8.3-4.el7.x86_64 > [root at ysmha01 home]# cat /etc/redhat-release > CentOS Linux release 7.6.1810 (Core) > > RAM view using top > Tasks: 398 total, 1 running, 397 sleeping, 0 stopped, 0 zombie > %Cpu(s): 7.0 us, 9.3 sy, 1.7 ni, 71.6 id, 9.7 wa, 0.0 hi, 0.8 si, > 0.0 st > KiB Mem : 65772000 total, 1851344 free, 60487404 used, 3433252 buff/cache > KiB Swap: 0 total, 0 free, 0 used. 
3134316 avail Mem
>>
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>> 9953 root 20 0 3727912 946496 3196 S 150.2 1.4 38626:27 glusterfsd
>> 9634 root 20 0 48.1g 47.2g 3184 S 96.3 75.3 29513:55 glusterfs
>> 14485 root 20 0 3404140 63780 2052 S 80.7 0.1 1590:13 glusterfs
>>
>> [root at ysmha01 ~]# gluster v status export
>> Status of volume: export
>> Gluster process TCP Port RDMA Port Online Pid
>> ------------------------------------------------------------------------------
>> Brick 10.0.1.7:/bricks/hdds/brick 49157 0 Y 13986
>> Brick 10.0.1.6:/bricks/hdds/brick 49153 0 Y 9953
>> Self-heal Daemon on localhost N/A N/A Y 14485
>> Self-heal Daemon on 10.0.1.7 N/A N/A Y 21934
>> Self-heal Daemon on 10.0.1.5 N/A N/A Y 4598
>>
>> Task Status of Volume export
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dijuremo at gmail.com Sun Mar 3 18:53:34 2019
From: dijuremo at gmail.com (Diego Remolina)
Date: Sun, 3 Mar 2019 13:53:34 -0500
Subject: [Gluster-users] Gluster eating up a lot of ram
In-Reply-To: References: Message-ID:

Hi,

I will not be able to test gluster-6rc because this is a production
environment and it takes several days for memory to grow a lot.

The Samba server hosts all types of files, small and large, from small
roaming-profile files to bigger files like the Adobe suite and Autodesk
Revit (file sizes in the hundreds of megabytes).

As I stated before, this same issue was present back with 3.8.x, which I
was running before.
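(For scale, the figures reported in this thread amount to a fairly steady growth rate. A rough estimate, assuming ~10 GB of RSS shortly after boot and ~48 GB at day 48:)

```shell
# Approximate steady-state growth of the fuse client's resident memory
awk 'BEGIN {
  per_day = (48 - 10) / 48.0;                       # GB per day
  printf "%.2f GB/day (~%.0f MB/hour)\n", per_day, per_day * 1024 / 24;
}'
```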
The information you requested:

[root at ysmha02 ~]# gluster v info export

Volume Name: export
Type: Replicate
Volume ID: b4353b3f-6ef6-4813-819a-8e85e5a95cff
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.0.1.7:/bricks/hdds/brick
Brick2: 10.0.1.6:/bricks/hdds/brick
Options Reconfigured:
performance.stat-prefetch: on
performance.cache-min-file-size: 0
network.inode-lru-limit: 65536
performance.cache-invalidation: on
features.cache-invalidation: on
performance.md-cache-timeout: 600
features.cache-invalidation-timeout: 600
performance.cache-samba-metadata: on
transport.address-family: inet
server.allow-insecure: on
performance.cache-size: 10GB
cluster.server-quorum-type: server
nfs.disable: on
performance.io-thread-count: 64
performance.io-cache: on
cluster.lookup-optimize: on
cluster.readdir-optimize: on
server.event-threads: 5
client.event-threads: 5
performance.cache-max-file-size: 256MB
diagnostics.client-log-level: INFO
diagnostics.brick-log-level: INFO
cluster.server-quorum-ratio: 51%

On Fri, Mar 1, 2019 at 11:07 PM Poornima Gurusiddaiah wrote:

> This high memory consumption is not normal. Looks like it's a memory leak.
> Is it possible to try it on test setup with gluster-6rc? What is the kind
> of workload that goes into fuse mount? Large files or small files? We need
> the following information to debug further:
> - Gluster volume info output
> - Statedump of the Gluster fuse mount process consuming 44G ram.
>
> Regards,
> Poornima
>
> On Sat, Mar 2, 2019, 3:40 AM Diego Remolina wrote:
>
>> I am using glusterfs with two servers as a file server sharing files via
>> samba and ctdb. I cannot use samba vfs gluster plugin, due to bug in
>> current Centos version of samba. So I am mounting via fuse and exporting
>> the volume to samba from the mount point.
>> >> Upon initial boot, the server where samba is exporting files climbs up to >> ~10GB RAM within a couple hours of use. From then on, it is a constant slow >> memory increase. In the past with gluster 3.8.x we had to reboot the >> servers at around 30 days . With gluster 4.1.6 we are getting up to 48 >> days, but RAM use is at 48GB out of 64GB. Is this normal? >> >> The particular versions are below, >> >> [root at ysmha01 home]# uptime >> 16:59:39 up 48 days, 9:56, 1 user, load average: 3.75, 3.17, 3.00 >> [root at ysmha01 home]# rpm -qa | grep gluster >> centos-release-gluster41-1.0-3.el7.centos.noarch >> glusterfs-server-4.1.6-1.el7.x86_64 >> glusterfs-api-4.1.6-1.el7.x86_64 >> centos-release-gluster-legacy-4.0-2.el7.centos.noarch >> glusterfs-4.1.6-1.el7.x86_64 >> glusterfs-client-xlators-4.1.6-1.el7.x86_64 >> libvirt-daemon-driver-storage-gluster-3.9.0-14.el7_5.8.x86_64 >> glusterfs-fuse-4.1.6-1.el7.x86_64 >> glusterfs-libs-4.1.6-1.el7.x86_64 >> glusterfs-rdma-4.1.6-1.el7.x86_64 >> glusterfs-cli-4.1.6-1.el7.x86_64 >> samba-vfs-glusterfs-4.8.3-4.el7.x86_64 >> [root at ysmha01 home]# rpm -qa | grep samba >> samba-common-tools-4.8.3-4.el7.x86_64 >> samba-client-libs-4.8.3-4.el7.x86_64 >> samba-libs-4.8.3-4.el7.x86_64 >> samba-4.8.3-4.el7.x86_64 >> samba-common-libs-4.8.3-4.el7.x86_64 >> samba-common-4.8.3-4.el7.noarch >> samba-vfs-glusterfs-4.8.3-4.el7.x86_64 >> [root at ysmha01 home]# cat /etc/redhat-release >> CentOS Linux release 7.6.1810 (Core) >> >> RAM view using top >> Tasks: 398 total, 1 running, 397 sleeping, 0 stopped, 0 zombie >> %Cpu(s): 7.0 us, 9.3 sy, 1.7 ni, 71.6 id, 9.7 wa, 0.0 hi, 0.8 si, >> 0.0 st >> KiB Mem : 65772000 total, 1851344 free, 60487404 used, 3433252 >> buff/cache >> KiB Swap: 0 total, 0 free, 0 used. 
3134316 avail Mem >> >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ >> COMMAND >> 9953 root 20 0 3727912 946496 3196 S 150.2 1.4 38626:27 >> glusterfsd >> 9634 root 20 0 48.1g 47.2g 3184 S 96.3 75.3 29513:55 >> glusterfs >> 14485 root 20 0 3404140 63780 2052 S 80.7 0.1 1590:13 >> glusterfs >> >> [root at ysmha01 ~]# gluster v status export >> Status of volume: export >> Gluster process TCP Port RDMA Port Online >> Pid >> >> ------------------------------------------------------------------------------ >> Brick 10.0.1.7:/bricks/hdds/brick 49157 0 Y >> 13986 >> Brick 10.0.1.6:/bricks/hdds/brick 49153 0 Y >> 9953 >> Self-heal Daemon on localhost N/A N/A Y >> 14485 >> Self-heal Daemon on 10.0.1.7 N/A N/A Y >> 21934 >> Self-heal Daemon on 10.0.1.5 N/A N/A Y >> 4598 >> >> Task Status of Volume export >> >> ------------------------------------------------------------------------------ >> There are no active volume tasks >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From spisla80 at gmail.com Mon Mar 4 09:43:43 2019 From: spisla80 at gmail.com (David Spisla) Date: Mon, 4 Mar 2019 10:43:43 +0100 Subject: [Gluster-users] [Gluster-devel] Bitrot: Time of signing depending on the file size??? In-Reply-To: References: Message-ID: Hello Kotresh, Yes, the fd was still open for larger files. I could verify this with a 500MiB file and some smaller files. After a specific time only the fd for the 500MiB file was still open and the file still had no signature; for the smaller files there were no fds and they already had a signature. I don't know the reason for this. Maybe the client still keeps the fd open? 
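For anyone who wants to reproduce this check: one way to see which process still holds an fd on a file is to scan /proc. The sketch below uses a temporary file as a stand-in for the real brick file; on a real brick you would substitute the file's brick path and expect the brick process PID to keep showing up while the signature is pending.

```shell
# Open a file and keep the fd, then list every PID that holds it open.
f=$(mktemp)
exec 3<"$f"   # simulate a client keeping the fd open
pids=$(find /proc/[0-9]*/fd -lname "$f" 2>/dev/null | cut -d/ -f3 | sort -u)
echo "held open by PID(s): $pids"
exec 3<&-     # last close; the (default 120s) signing timer starts after this
rm -f "$f"
```

If a PID keeps appearing long after the copy has finished, that would explain the delayed signature.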
I opened a bug for this: https://bugzilla.redhat.com/show_bug.cgi?id=1685023 Regards David Am Fr., 1. M?rz 2019 um 18:29 Uhr schrieb Kotresh Hiremath Ravishankar < khiremat at redhat.com>: > Interesting observation! But as discussed in the thread bitrot signing > processes depends 2 min timeout (by default) after last fd closes. It > doesn't have any co-relation with the size of the file. > Did you happen to verify that the fd was still open for large files for > some reason? > > > > On Fri, Mar 1, 2019 at 1:19 PM David Spisla wrote: > >> Hello folks, >> >> I did some observations concerning the bitrot daemon. It seems to be that >> the bitrot signer is signing files depending on file size. I copied files >> with different sizes into a volume and I was wonderung because the files >> get their signature not the same time (I keep the expiry time default with >> 120). Here are some examples: >> >> 300 KB file ~2-3 m >> 70 MB file ~ 40 m >> 115 MB file ~ 1 Sh >> 800 MB file ~ 4,5 h >> >> What is the expected behaviour here? >> Why does it take so long to sign a 800MB file? >> What about 500GB or 1TB? >> Is there a way to speed up the sign process? >> >> My ambition is to understand this observation >> >> Regards >> David Spisla >> _______________________________________________ >> Gluster-devel mailing list >> Gluster-devel at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-devel > > > > -- > Thanks and Regards, > Kotresh H R > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgurusid at redhat.com Mon Mar 4 10:07:13 2019 From: pgurusid at redhat.com (Poornima Gurusiddaiah) Date: Mon, 4 Mar 2019 15:37:13 +0530 Subject: [Gluster-users] Gluster eating up a lot of ram In-Reply-To: References: Message-ID: Could you also provide the statedump of the gluster process consuming 44G ram [1]. 
Please make sure the statedump is taken when the memory consumption is very high, like 10s of GBs, otherwise we may not be able to identify the issue. Also i see that the cache size is 10G is that something you arrived at, after doing some tests? Its relatively higher than normal. [1] https://docs.gluster.org/en/v3/Troubleshooting/statedump/#generate-a-statedump On Mon, Mar 4, 2019 at 12:23 AM Diego Remolina wrote: > Hi, > > I will not be able to test gluster-6rc because this is a production > environment and it takes several days for memory to grow a lot. > > The Samba server is hosting all types of files, small and large from small > roaming profile type files to bigger files like adobe suite, autodesk Revit > (file sizes in the hundreds of megabytes). > > As I stated before, this same issue was present back with 3.8.x which I > was running before. > > The information you requested: > > [root at ysmha02 ~]# gluster v info export > > Volume Name: export > Type: Replicate > Volume ID: b4353b3f-6ef6-4813-819a-8e85e5a95cff > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x 2 = 2 > Transport-type: tcp > Bricks: > Brick1: 10.0.1.7:/bricks/hdds/brick > Brick2: 10.0.1.6:/bricks/hdds/brick > Options Reconfigured: > performance.stat-prefetch: on > performance.cache-min-file-size: 0 > network.inode-lru-limit: 65536 > performance.cache-invalidation: on > features.cache-invalidation: on > performance.md-cache-timeout: 600 > features.cache-invalidation-timeout: 600 > performance.cache-samba-metadata: on > transport.address-family: inet > server.allow-insecure: on > performance.cache-size: 10GB > cluster.server-quorum-type: server > nfs.disable: on > performance.io-thread-count: 64 > performance.io-cache: on > cluster.lookup-optimize: on > cluster.readdir-optimize: on > server.event-threads: 5 > client.event-threads: 5 > performance.cache-max-file-size: 256MB > diagnostics.client-log-level: INFO > diagnostics.brick-log-level: INFO > cluster.server-quorum-ratio: 51% > 
> > > > > > > Virus-free. > www.avast.com > > <#m_-4429654867678350131_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> > > On Fri, Mar 1, 2019 at 11:07 PM Poornima Gurusiddaiah > wrote: > >> This high memory consumption is not normal. Looks like it's a memory >> leak. Is it possible to try it on test setup with gluster-6rc? What is the >> kind of workload that goes into fuse mount? Large files or small files? We >> need the following information to debug further: >> - Gluster volume info output >> - Statedump of the Gluster fuse mount process consuming 44G ram. >> >> Regards, >> Poornima >> >> >> On Sat, Mar 2, 2019, 3:40 AM Diego Remolina wrote: >> >>> I am using glusterfs with two servers as a file server sharing files via >>> samba and ctdb. I cannot use samba vfs gluster plugin, due to bug in >>> current Centos version of samba. So I am mounting via fuse and exporting >>> the volume to samba from the mount point. >>> >>> Upon initial boot, the server where samba is exporting files climbs up >>> to ~10GB RAM within a couple hours of use. From then on, it is a constant >>> slow memory increase. In the past with gluster 3.8.x we had to reboot the >>> servers at around 30 days . With gluster 4.1.6 we are getting up to 48 >>> days, but RAM use is at 48GB out of 64GB. Is this normal? 
>>> >>> The particular versions are below, >>> >>> [root at ysmha01 home]# uptime >>> 16:59:39 up 48 days, 9:56, 1 user, load average: 3.75, 3.17, 3.00 >>> [root at ysmha01 home]# rpm -qa | grep gluster >>> centos-release-gluster41-1.0-3.el7.centos.noarch >>> glusterfs-server-4.1.6-1.el7.x86_64 >>> glusterfs-api-4.1.6-1.el7.x86_64 >>> centos-release-gluster-legacy-4.0-2.el7.centos.noarch >>> glusterfs-4.1.6-1.el7.x86_64 >>> glusterfs-client-xlators-4.1.6-1.el7.x86_64 >>> libvirt-daemon-driver-storage-gluster-3.9.0-14.el7_5.8.x86_64 >>> glusterfs-fuse-4.1.6-1.el7.x86_64 >>> glusterfs-libs-4.1.6-1.el7.x86_64 >>> glusterfs-rdma-4.1.6-1.el7.x86_64 >>> glusterfs-cli-4.1.6-1.el7.x86_64 >>> samba-vfs-glusterfs-4.8.3-4.el7.x86_64 >>> [root at ysmha01 home]# rpm -qa | grep samba >>> samba-common-tools-4.8.3-4.el7.x86_64 >>> samba-client-libs-4.8.3-4.el7.x86_64 >>> samba-libs-4.8.3-4.el7.x86_64 >>> samba-4.8.3-4.el7.x86_64 >>> samba-common-libs-4.8.3-4.el7.x86_64 >>> samba-common-4.8.3-4.el7.noarch >>> samba-vfs-glusterfs-4.8.3-4.el7.x86_64 >>> [root at ysmha01 home]# cat /etc/redhat-release >>> CentOS Linux release 7.6.1810 (Core) >>> >>> RAM view using top >>> Tasks: 398 total, 1 running, 397 sleeping, 0 stopped, 0 zombie >>> %Cpu(s): 7.0 us, 9.3 sy, 1.7 ni, 71.6 id, 9.7 wa, 0.0 hi, 0.8 si, >>> 0.0 st >>> KiB Mem : 65772000 total, 1851344 free, 60487404 used, 3433252 >>> buff/cache >>> KiB Swap: 0 total, 0 free, 0 used. 
3134316 avail >>> Mem >>> >>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ >>> COMMAND >>> 9953 root 20 0 3727912 946496 3196 S 150.2 1.4 38626:27 >>> glusterfsd >>> 9634 root 20 0 48.1g 47.2g 3184 S 96.3 75.3 29513:55 >>> glusterfs >>> 14485 root 20 0 3404140 63780 2052 S 80.7 0.1 1590:13 >>> glusterfs >>> >>> [root at ysmha01 ~]# gluster v status export >>> Status of volume: export >>> Gluster process TCP Port RDMA Port Online >>> Pid >>> >>> ------------------------------------------------------------------------------ >>> Brick 10.0.1.7:/bricks/hdds/brick 49157 0 Y >>> 13986 >>> Brick 10.0.1.6:/bricks/hdds/brick 49153 0 Y >>> 9953 >>> Self-heal Daemon on localhost N/A N/A Y >>> 14485 >>> Self-heal Daemon on 10.0.1.7 N/A N/A Y >>> 21934 >>> Self-heal Daemon on 10.0.1.5 N/A N/A Y >>> 4598 >>> >>> Task Status of Volume export >>> >>> ------------------------------------------------------------------------------ >>> There are no active volume tasks >>> >>> >>> >>> >>> >>> Virus-free. >>> www.avast.com >>> >>> <#m_-4429654867678350131_m_1092070095161815064_m_5816452762692804512_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From revirii at googlemail.com Mon Mar 4 10:09:36 2019 From: revirii at googlemail.com (Hu Bert) Date: Mon, 4 Mar 2019 11:09:36 +0100 Subject: [Gluster-users] Lots of connections on clients - appropriate values for various thread parameters Message-ID: Good morning, we use gluster v5.3 (replicate with 3 servers, 2 volumes, raid10 as brick) with at the moment 10 clients; 3 of them do heavy I/O operations (apache tomcats, read+write of (small) images). 
These 3 clients have a quite high I/O wait (stats from yesterday) as can be seen here: client: https://abload.de/img/client1-cpu-dayulkza.png server: https://abload.de/img/server1-cpu-dayayjdq.png The iowait in the graphics differs a lot. I checked netstat for the different clients; the other clients have 8 open connections: https://pastebin.com/bSN5fXwc 4 for each server and each volume. The 3 clients with the heavy I/O have (at the moment) according to netstat 170, 139 and 153 connections. An example for one client can be found here: https://pastebin.com/2zfWXASZ gluster volume info: https://pastebin.com/13LXPhmd gluster volume status: https://pastebin.com/cYFnWjUJ I was just wondering if the iowait is caused by the clients and their workflow: requesting a lot of files (up to hundreds per second), opening a lot of connections, and the servers aren't able to answer properly. Maybe something can be tuned here? Especially the server|client.event-threads (both set to 4) and performance.(high|normal|low|least)-prio-threads (all at default value 16) and performance.io-thread-count (32) options, maybe these aren't properly configured for up to 170 client connections. Both servers and clients have a Xeon CPU (6 cores, 12 threads), a 10 GBit connection and 128G (servers) respectively 256G (clients) RAM. Enough power :-) Thx for reading && best regards, Hubert From rgowdapp at redhat.com Mon Mar 4 10:31:04 2019 From: rgowdapp at redhat.com (Raghavendra Gowdappa) Date: Mon, 4 Mar 2019 16:01:04 +0530 Subject: [Gluster-users] Lots of connections on clients - appropriate values for various thread parameters In-Reply-To: References: Message-ID: What is the per-thread CPU usage like on these clients? With highly concurrent workloads we've seen the single thread that reads requests from /dev/fuse (the fuse reader thread) become a bottleneck. Would like to know what the CPU usage of this thread looks like (you can use top -H). 
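In case it helps to capture that non-interactively: per-thread CPU time can also be read straight from /proc — the same numbers top -H shows as TIME+. This is only a sketch; the PID below is a stand-in for the actual glusterfs fuse mount process.

```shell
# Sum utime+stime (clock ticks) per thread of a process, busiest first.
pid=$$   # stand-in; on a client use the PID of the glusterfs fuse mount
out=$(for t in /proc/"$pid"/task/*/stat; do
        # stat fields: 1=tid, 2=comm, 14=utime, 15=stime
        # (assumes the thread name contains no spaces)
        awk '{print $14 + $15, "ticks  tid", $1, $2}' "$t"
      done | sort -rn)
echo "$out" | head -5
```

On a gluster client the fuse reader thread shows up with a name like glfs_fuseproc; if its tick count grows much faster than the others under load, that thread is the bottleneck.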
On Mon, Mar 4, 2019 at 3:39 PM Hu Bert wrote: > Good morning, > > we use gluster v5.3 (replicate with 3 servers, 2 volumes, raid10 as > brick) with at the moment 10 clients; 3 of them do heavy I/O > operations (apache tomcats, read+write of (small) images). These 3 > clients have a quite high I/O wait (stats from yesterday) as can be > seen here: > > client: https://abload.de/img/client1-cpu-dayulkza.png > server: https://abload.de/img/server1-cpu-dayayjdq.png > > The iowait in the graphics differ a lot. I checked netstat for the > different clients; the other clients have 8 open connections: > https://pastebin.com/bSN5fXwc > > 4 for each server and each volume. The 3 clients with the heavy I/O > have (at the moment) according to netstat 170, 139 and 153 > connections. An example for one client can be found here: > https://pastebin.com/2zfWXASZ > > gluster volume info: https://pastebin.com/13LXPhmd > gluster volume status: https://pastebin.com/cYFnWjUJ > > I just was wondering if the iowait is based on the clients and their > workflow: requesting a lot of files (up to hundreds per second), > opening a lot of connections and the servers aren't able to answer > properly. Maybe something can be tuned here? > > Especially the server|client.event-threads (both set to 4) and > performance.(high|normal|low|least)-prio-threads (all at default value > 16) and performance.io-thread-count (32) options, maybe these aren't > properly configured for up to 170 client connections. > > Both servers and clients have a Xeon CPU (6 cores, 12 threads), a 10 > GBit connection and 128G (servers) respectively 256G (clients) RAM. > Enough power :-) > > > Thx for reading && best regards, > > Hubert > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From revirii at googlemail.com Mon Mar 4 10:56:35 2019 From: revirii at googlemail.com (Hu Bert) Date: Mon, 4 Mar 2019 11:56:35 +0100 Subject: [Gluster-users] Lots of connections on clients - appropriate values for various thread parameters In-Reply-To: References: Message-ID: Hi Raghavendra, at the moment iowait and cpu consumption is quite low, the main problems appear during the weekend (high traffic, especially on sunday), so either we have to wait until next sunday or use a time machine ;-) I made a screenshot of top (https://abload.de/img/top-hvvjt2.jpg) and a text output (https://pastebin.com/TkTWnqxt), maybe that helps. Seems like processes like glfs_fuseproc (>204h) and glfs_epoll (64h for each process) consume a lot of CPU (uptime 24 days). Is that already helpful? Hubert Am Mo., 4. M?rz 2019 um 11:31 Uhr schrieb Raghavendra Gowdappa : > > what is the per thread CPU usage like on these clients? With highly concurrent workloads we've seen single thread that reads requests from /dev/fuse (fuse reader thread) becoming bottleneck. Would like to know what is the cpu usage of this thread looks like (you can use top -H). > > On Mon, Mar 4, 2019 at 3:39 PM Hu Bert wrote: >> >> Good morning, >> >> we use gluster v5.3 (replicate with 3 servers, 2 volumes, raid10 as >> brick) with at the moment 10 clients; 3 of them do heavy I/O >> operations (apache tomcats, read+write of (small) images). These 3 >> clients have a quite high I/O wait (stats from yesterday) as can be >> seen here: >> >> client: https://abload.de/img/client1-cpu-dayulkza.png >> server: https://abload.de/img/server1-cpu-dayayjdq.png >> >> The iowait in the graphics differ a lot. I checked netstat for the >> different clients; the other clients have 8 open connections: >> https://pastebin.com/bSN5fXwc >> >> 4 for each server and each volume. The 3 clients with the heavy I/O >> have (at the moment) according to netstat 170, 139 and 153 >> connections. 
An example for one client can be found here: >> https://pastebin.com/2zfWXASZ >> >> gluster volume info: https://pastebin.com/13LXPhmd >> gluster volume status: https://pastebin.com/cYFnWjUJ >> >> I just was wondering if the iowait is based on the clients and their >> workflow: requesting a lot of files (up to hundreds per second), >> opening a lot of connections and the servers aren't able to answer >> properly. Maybe something can be tuned here? >> >> Especially the server|client.event-threads (both set to 4) and >> performance.(high|normal|low|least)-prio-threads (all at default value >> 16) and performance.io-thread-count (32) options, maybe these aren't >> properly configured for up to 170 client connections. >> >> Both servers and clients have a Xeon CPU (6 cores, 12 threads), a 10 >> GBit connection and 128G (servers) respectively 256G (clients) RAM. >> Enough power :-) >> >> >> Thx for reading && best regards, >> >> Hubert >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users From alberto at bengoa.com.br Mon Mar 4 13:57:11 2019 From: alberto at bengoa.com.br (Alberto Bengoa) Date: Mon, 4 Mar 2019 13:57:11 +0000 Subject: [Gluster-users] Lots of connections on clients - appropriate values for various thread parameters In-Reply-To: References: Message-ID: Hello Hubert, On Mon, 4 Mar 2019 at 10:56, Hu Bert wrote: > Hi Raghavendra, > > at the moment iowait and cpu consumption is quite low, the main > problems appear during the weekend (high traffic, especially on > sunday), so either we have to wait until next sunday or use a time > machine ;-) > > Check whether your high IO wait is related to high network traffic. We had to leave the 5.3 version due to this issue[1]: [1] - https://bugzilla.redhat.com/show_bug.cgi?id=1673058 Cheers, Alberto -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rgowdapp at redhat.com Mon Mar 4 14:17:25 2019 From: rgowdapp at redhat.com (Raghavendra Gowdappa) Date: Mon, 4 Mar 2019 19:47:25 +0530 Subject: [Gluster-users] Lots of connections on clients - appropriate values for various thread parameters In-Reply-To: References: Message-ID: On Mon, Mar 4, 2019 at 4:26 PM Hu Bert wrote: > Hi Raghavendra, > > at the moment iowait and cpu consumption is quite low, the main > problems appear during the weekend (high traffic, especially on > sunday), so either we have to wait until next sunday or use a time > machine ;-) > > I made a screenshot of top (https://abload.de/img/top-hvvjt2.jpg) and > a text output (https://pastebin.com/TkTWnqxt), maybe that helps. Seems > like processes like glfs_fuseproc (>204h) and glfs_epoll (64h for each > process) consume a lot of CPU (uptime 24 days). Is that already > helpful? > Not much. The TIME field just says the amount of time the thread has been executing. Since its a long standing mount, we can expect such large values. But, the value itself doesn't indicate whether the thread itself was overloaded at any (some) interval(s). Can you please collect output of following command and send back the collected data? # top -bHd 3 > top.output > > Hubert > > Am Mo., 4. M?rz 2019 um 11:31 Uhr schrieb Raghavendra Gowdappa > : > > > > what is the per thread CPU usage like on these clients? With highly > concurrent workloads we've seen single thread that reads requests from > /dev/fuse (fuse reader thread) becoming bottleneck. Would like to know what > is the cpu usage of this thread looks like (you can use top -H). > > > > On Mon, Mar 4, 2019 at 3:39 PM Hu Bert wrote: > >> > >> Good morning, > >> > >> we use gluster v5.3 (replicate with 3 servers, 2 volumes, raid10 as > >> brick) with at the moment 10 clients; 3 of them do heavy I/O > >> operations (apache tomcats, read+write of (small) images). 
These 3 > >> clients have a quite high I/O wait (stats from yesterday) as can be > >> seen here: > >> > >> client: https://abload.de/img/client1-cpu-dayulkza.png > >> server: https://abload.de/img/server1-cpu-dayayjdq.png > >> > >> The iowait in the graphics differ a lot. I checked netstat for the > >> different clients; the other clients have 8 open connections: > >> https://pastebin.com/bSN5fXwc > >> > >> 4 for each server and each volume. The 3 clients with the heavy I/O > >> have (at the moment) according to netstat 170, 139 and 153 > >> connections. An example for one client can be found here: > >> https://pastebin.com/2zfWXASZ > >> > >> gluster volume info: https://pastebin.com/13LXPhmd > >> gluster volume status: https://pastebin.com/cYFnWjUJ > >> > >> I just was wondering if the iowait is based on the clients and their > >> workflow: requesting a lot of files (up to hundreds per second), > >> opening a lot of connections and the servers aren't able to answer > >> properly. Maybe something can be tuned here? > >> > >> Especially the server|client.event-threads (both set to 4) and > >> performance.(high|normal|low|least)-prio-threads (all at default value > >> 16) and performance.io-thread-count (32) options, maybe these aren't > >> properly configured for up to 170 client connections. > >> > >> Both servers and clients have a Xeon CPU (6 cores, 12 threads), a 10 > >> GBit connection and 128G (servers) respectively 256G (clients) RAM. > >> Enough power :-) > >> > >> > >> Thx for reading && best regards, > >> > >> Hubert > >> _______________________________________________ > >> Gluster-users mailing list > >> Gluster-users at gluster.org > >> https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From revirii at googlemail.com Mon Mar 4 14:21:13 2019 From: revirii at googlemail.com (Hu Bert) Date: Mon, 4 Mar 2019 15:21:13 +0100 Subject: [Gluster-users] Lots of connections on clients - appropriate values for various thread parameters In-Reply-To: References: Message-ID: Hi Alberto, wow, good hint! We switched from old servers with version 4.1.6 to new servers (fresh install) with version 5.3 on february 5th. I saw that there was more network traffic on server side, but didn't watch it on client side - the traffic went up significantly on both sides, from about 20-40 MBit/s up to 200 MBit/s, on server side from about 20-40 MBit/s up to 500 MBit/s. Here's a screenshot of the munin graphs: network traffic on high iowait client: https://abload.de/img/client-eth1-traffic76j4i.jpg network traffic on old servers: https://abload.de/img/oldservers-eth1nejzt.jpg network traffic on new servers: https://abload.de/img/newservers-eth17ojkf.jpg Don't know if that's related to our iowait problem, maybe only a correlation. But we see the same high network traffic with v5.3. Thx, Hubert Am Mo., 4. M?rz 2019 um 14:57 Uhr schrieb Alberto Bengoa : > > Hello Hubert, > > On Mon, 4 Mar 2019 at 10:56, Hu Bert wrote: >> >> Hi Raghavendra, >> >> at the moment iowait and cpu consumption is quite low, the main >> problems appear during the weekend (high traffic, especially on >> sunday), so either we have to wait until next sunday or use a time >> machine ;-) >> > > Check if your high IO Wait is not related to high network traffic. 
We had to left 5.3 version due this issue[1]: > > [1] - https://bugzilla.redhat.com/show_bug.cgi?id=1673058 > > Cheers, > Alberto From rgowdapp at redhat.com Mon Mar 4 14:21:43 2019 From: rgowdapp at redhat.com (Raghavendra Gowdappa) Date: Mon, 4 Mar 2019 19:51:43 +0530 Subject: [Gluster-users] Lots of connections on clients - appropriate values for various thread parameters In-Reply-To: References: Message-ID: On Mon, Mar 4, 2019 at 7:47 PM Raghavendra Gowdappa wrote: > > > On Mon, Mar 4, 2019 at 4:26 PM Hu Bert wrote: > >> Hi Raghavendra, >> >> at the moment iowait and cpu consumption is quite low, the main >> problems appear during the weekend (high traffic, especially on >> sunday), so either we have to wait until next sunday or use a time >> machine ;-) >> >> I made a screenshot of top (https://abload.de/img/top-hvvjt2.jpg) and >> a text output (https://pastebin.com/TkTWnqxt), maybe that helps. Seems >> like processes like glfs_fuseproc (>204h) and glfs_epoll (64h for each >> process) consume a lot of CPU (uptime 24 days). Is that already >> helpful? >> > > Not much. The TIME field just says the amount of time the thread has been > executing. Since its a long standing mount, we can expect such large > values. But, the value itself doesn't indicate whether the thread itself > was overloaded at any (some) interval(s). > > Can you please collect output of following command and send back the > collected data? > > # top -bHd 3 > top.output > Please collect this on problematic mounts and bricks. > >> >> Hubert >> >> Am Mo., 4. M?rz 2019 um 11:31 Uhr schrieb Raghavendra Gowdappa >> : >> > >> > what is the per thread CPU usage like on these clients? With highly >> concurrent workloads we've seen single thread that reads requests from >> /dev/fuse (fuse reader thread) becoming bottleneck. Would like to know what >> is the cpu usage of this thread looks like (you can use top -H). 
>> > >> > On Mon, Mar 4, 2019 at 3:39 PM Hu Bert wrote: >> >> >> >> Good morning, >> >> >> >> we use gluster v5.3 (replicate with 3 servers, 2 volumes, raid10 as >> >> brick) with at the moment 10 clients; 3 of them do heavy I/O >> >> operations (apache tomcats, read+write of (small) images). These 3 >> >> clients have a quite high I/O wait (stats from yesterday) as can be >> >> seen here: >> >> >> >> client: https://abload.de/img/client1-cpu-dayulkza.png >> >> server: https://abload.de/img/server1-cpu-dayayjdq.png >> >> >> >> The iowait in the graphics differ a lot. I checked netstat for the >> >> different clients; the other clients have 8 open connections: >> >> https://pastebin.com/bSN5fXwc >> >> >> >> 4 for each server and each volume. The 3 clients with the heavy I/O >> >> have (at the moment) according to netstat 170, 139 and 153 >> >> connections. An example for one client can be found here: >> >> https://pastebin.com/2zfWXASZ >> >> >> >> gluster volume info: https://pastebin.com/13LXPhmd >> >> gluster volume status: https://pastebin.com/cYFnWjUJ >> >> >> >> I just was wondering if the iowait is based on the clients and their >> >> workflow: requesting a lot of files (up to hundreds per second), >> >> opening a lot of connections and the servers aren't able to answer >> >> properly. Maybe something can be tuned here? >> >> >> >> Especially the server|client.event-threads (both set to 4) and >> >> performance.(high|normal|low|least)-prio-threads (all at default value >> >> 16) and performance.io-thread-count (32) options, maybe these aren't >> >> properly configured for up to 170 client connections. >> >> >> >> Both servers and clients have a Xeon CPU (6 cores, 12 threads), a 10 >> >> GBit connection and 128G (servers) respectively 256G (clients) RAM. 
>> >> Enough power :-) >> >> >> >> Thx for reading && best regards, >> >> Hubert >> >> _______________________________________________ >> >> Gluster-users mailing list >> >> Gluster-users at gluster.org >> >> https://lists.gluster.org/mailman/listinfo/gluster-users >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From spisla80 at gmail.com Mon Mar 4 14:44:12 2019 From: spisla80 at gmail.com (David Spisla) Date: Mon, 4 Mar 2019 15:44:12 +0100 Subject: [Gluster-users] Questions and notes to "Simplify recovery steps of corrupted files" Message-ID: Hello Gluster Community, I have questions and notes concerning the steps mentioned in https://github.com/gluster/glusterfs/issues/491 " *2. Delete the corrupted files* ": In my experience there are two GFID files if a copy gets corrupted. Example:
*$ find /gluster/brick1/glusterbrick/.glusterfs -name fc36e347-53c7-4a0a-8150-c070143d3b34*
*/gluster/brick1/glusterbrick/.glusterfs/quarantine/fc36e347-53c7-4a0a-8150-c070143d3b34*
*/gluster/brick1/glusterbrick/.glusterfs/fc/36/fc36e347-53c7-4a0a-8150-c070143d3b34*
Both GFID files have to be deleted. If a copy is NOT corrupted, there seems to be no GFID file in *.glusterfs/quarantine*. Even if one executes scrub ondemand, the file is not there. The file in *.glusterfs/quarantine* appears once one executes "scrub status". " *3. Restore the file* ": One can alternatively trigger self heal manually with *gluster volume heal VOLNAME*, but in my experience this is not working. One has to trigger a full heal: *gluster volume heal VOLNAME* *full* Imagine one wants to restore a copy with a manual self heal. Is it necessary to set some volume options (stat-prefetch, dht.force-readdirp and performance.force-readdirp disabled) and to mount via FUSE with some special parameters to heal the file? In my experience I only do a full heal after deleting the bad copy and the GFID files. This seems to be working. Or is it dangerous? 
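To put the steps above in command form — the brick path, GFID and volume name are only the ones from my example and are placeholders to adjust; the destructive commands are left as comments on purpose:

```shell
# Derive both on-brick locations of a corrupted file's GFID entries.
BRICK=/gluster/brick1/glusterbrick                # placeholder brick path
GFID=fc36e347-53c7-4a0a-8150-c070143d3b34        # placeholder GFID
p1=$(printf %s "$GFID" | cut -c1-2)              # first two hex chars  -> fc
p2=$(printf %s "$GFID" | cut -c3-4)              # second two hex chars -> 36
quarantine="$BRICK/.glusterfs/quarantine/$GFID"  # scrubber's bad-file marker
hardlink="$BRICK/.glusterfs/$p1/$p2/$GFID"       # GFID hard link of the file
echo "$quarantine"
echo "$hardlink"
# Then, on the bad brick only:
#   rm -f "$quarantine" "$hardlink"    # delete both GFID entries (and the bad copy)
#   gluster volume heal VOLNAME full   # restore from the good replica
```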
Regards David Spisla -------------- next part -------------- An HTML attachment was scrubbed... URL: From atumball at redhat.com Mon Mar 4 15:01:38 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Mon, 4 Mar 2019 20:31:38 +0530 Subject: [Gluster-users] GlusterFS - 6.0RC - Test days (27th, 28th Feb) In-Reply-To: References: Message-ID: Thanks to those who participated. Update at present: We found 3 blocker bugs in upgrade scenarios, and hence have marked release as pending upon them. We will keep these lists updated about progress. -Amar On Mon, Feb 25, 2019 at 11:41 PM Amar Tumballi Suryanarayan < atumball at redhat.com> wrote: > Hi all, > > We are calling out our users, and developers to contribute in validating > ?glusterfs-6.0rc? build in their usecase. Specially for the cases of > upgrade, stability, and performance. > > Some of the key highlights of the release are listed in release-notes > draft > . > Please note that there are some of the features which are being dropped out > of this release, and hence making sure your setup is not going to have an > issue is critical. Also the default lru-limit option in fuse mount for > Inodes should help to control the memory usage of client processes. All the > good reason to give it a shot in your test setup. > > If you are developer using gfapi interface to integrate with other > projects, you also have some signature changes, so please make sure your > project would work with latest release. Or even if you are using a project > which depends on gfapi, report the error with new RPMs (if any). We will > help fix it. > > As part of test days, we want to focus on testing the latest upcoming > release i.e. GlusterFS-6, and one or the other gluster volunteers would be > there on #gluster channel on freenode to assist the people. Some of the key > things we are looking as bug reports are: > > - > > See if upgrade from your current version to 6.0rc is smooth, and works > as documented. 
> - Report bugs in process, or in documentation if you find mismatch. > - > > Functionality is all as expected for your usecase. > - No issues with actual application you would run on production etc. > - > > Performance has not degraded in your usecase. > - While we have added some performance options to the code, not all of > them are turned on, as they have to be done based on usecases. > - Make sure the default setup is at least same as your current > version > - Try out few options mentioned in release notes (especially, > --auto-invalidation=no) and see if it helps performance. > - > > While doing all the above, check below: > - see if the log files are making sense, and not flooding with some > ?for developer only? type of messages. > - get ?profile info? output from old and now, and see if there is > anything which is out of normal expectation. Check with us on the numbers. > - get a ?statedump? when there are some issues. Try to make sense > of it, and raise a bug if you don?t understand it completely. > > > Process > expected on test days. > > - > > We have a tracker bug > [0] > - We will attach all the ?blocker? bugs to this bug. > - > > Use this link to report bugs, so that we have more metadata around > given bugzilla. > - Click Here > > [1] > - > > The test cases which are to be tested are listed here in this sheet > [2], > please add, update, and keep it up-to-date to reduce duplicate efforts. > > Lets together make this release a success. > > Also check if we covered some of the open issues from Weekly untriaged > bugs > > [3] > > For details on build and RPMs check this email > > [4] > > Finally, the dates :-) > > - Wednesday - Feb 27th, and > - Thursday - Feb 28th > > Note that our goal is to identify as many issues as possible in upgrade > and stability scenarios, and if any blockers are found, want to make sure > we release with the fix for same. So each of you, Gluster users, feel > comfortable to upgrade to 6.0 version. 
> > Regards, > Gluster Ants. > > -- > Amar Tumballi (amarts) > -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From spisla80 at gmail.com Mon Mar 4 15:03:18 2019 From: spisla80 at gmail.com (David Spisla) Date: Mon, 4 Mar 2019 16:03:18 +0100 Subject: [Gluster-users] SLES15 packages for v5.4 Message-ID: Hello folks, can someone please provide packages for Gluster v5.4 for SLES15? For CentOS and Ubuntu there are already packages. Regards David Spisla -------------- next part -------------- An HTML attachment was scrubbed... URL: From amukherj at redhat.com Mon Mar 4 15:08:04 2019 From: amukherj at redhat.com (Atin Mukherjee) Date: Mon, 4 Mar 2019 20:38:04 +0530 Subject: [Gluster-users] [Gluster-Maintainers] GlusterFS - 6.0RC - Test days (27th, 28th Feb) In-Reply-To: References: Message-ID: On Mon, 4 Mar 2019 at 20:33, Amar Tumballi Suryanarayan wrote: > Thanks to those who participated. > > Update at present: > > We found 3 blocker bugs in upgrade scenarios, and hence have marked release > as pending upon them. We will keep these lists updated about progress. I?d like to clarify that upgrade testing is blocked. So just fixing these test blocker(s) isn?t enough to call release-6 green. We need to continue and finish the rest of the upgrade tests once the respective bugs are fixed. > > -Amar > > On Mon, Feb 25, 2019 at 11:41 PM Amar Tumballi Suryanarayan < > atumball at redhat.com> wrote: > > > Hi all, > > > > We are calling out our users, and developers to contribute in validating > > ?glusterfs-6.0rc? build in their usecase. Specially for the cases of > > upgrade, stability, and performance. > > > > Some of the key highlights of the release are listed in release-notes > > draft > > < > https://github.com/gluster/glusterfs/blob/release-6/doc/release-notes/6.0.md > >. 
> > Please note that there are some of the features which are being dropped > out > > of this release, and hence making sure your setup is not going to have an > > issue is critical. Also the default lru-limit option in fuse mount for > > Inodes should help to control the memory usage of client processes. All > the > > good reason to give it a shot in your test setup. > > > > If you are developer using gfapi interface to integrate with other > > projects, you also have some signature changes, so please make sure your > > project would work with latest release. Or even if you are using a > project > > which depends on gfapi, report the error with new RPMs (if any). We will > > help fix it. > > > > As part of test days, we want to focus on testing the latest upcoming > > release i.e. GlusterFS-6, and one or the other gluster volunteers would > be > > there on #gluster channel on freenode to assist the people. Some of the > key > > things we are looking as bug reports are: > > > > - > > > > See if upgrade from your current version to 6.0rc is smooth, and works > > as documented. > > - Report bugs in process, or in documentation if you find mismatch. > > - > > > > Functionality is all as expected for your usecase. > > - No issues with actual application you would run on production etc. > > - > > > > Performance has not degraded in your usecase. > > - While we have added some performance options to the code, not all of > > them are turned on, as they have to be done based on usecases. > > - Make sure the default setup is at least same as your current > > version > > - Try out few options mentioned in release notes (especially, > > --auto-invalidation=no) and see if it helps performance. > > - > > > > While doing all the above, check below: > > - see if the log files are making sense, and not flooding with some > > ?for developer only? type of messages. > > - get ?profile info? output from old and now, and see if there is > > anything which is out of normal expectation. 
Check with us on the > numbers. > > - get a ?statedump? when there are some issues. Try to make sense > > of it, and raise a bug if you don?t understand it completely. > > > > > > < > https://hackmd.io/YB60uRCMQRC90xhNt4r6gA?both#Process-expected-on-test-days > >Process > > expected on test days. > > > > - > > > > We have a tracker bug > > [0] > > - We will attach all the ?blocker? bugs to this bug. > > - > > > > Use this link to report bugs, so that we have more metadata around > > given bugzilla. > > - Click Here > > < > https://bugzilla.redhat.com/enter_bug.cgi?blocked=1672818&bug_severity=high&component=core&priority=high&product=GlusterFS&status_whiteboard=gluster-test-day&version=6 > > > > [1] > > - > > > > The test cases which are to be tested are listed here in this sheet > > < > https://docs.google.com/spreadsheets/d/1AS-tDiJmAr9skK535MbLJGe_RfqDQ3j1abX1wtjwpL4/edit?usp=sharing > >[2], > > please add, update, and keep it up-to-date to reduce duplicate efforts -- - Atin (atinm) -------------- next part -------------- An HTML attachment was scrubbed... URL: From atumball at redhat.com Mon Mar 4 15:07:10 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Mon, 4 Mar 2019 20:37:10 +0530 Subject: [Gluster-users] Lots of connections on clients - appropriate values for various thread parameters In-Reply-To: References: Message-ID: What does self-heal pending numbers show? On Mon, Mar 4, 2019 at 7:52 PM Hu Bert wrote: > Hi Alberto, > > wow, good hint! We switched from old servers with version 4.1.6 to new > servers (fresh install) with version 5.3 on february 5th. I saw that > there was more network traffic on server side, but didn't watch it on > client side - the traffic went up significantly on both sides, from > about 20-40 MBit/s up to 200 MBit/s, on server side from about 20-40 > MBit/s up to 500 MBit/s. 
Here's a screenshot of the munin graphs: > > network traffic on high iowait client: > https://abload.de/img/client-eth1-traffic76j4i.jpg > network traffic on old servers: > https://abload.de/img/oldservers-eth1nejzt.jpg > network traffic on new servers: > https://abload.de/img/newservers-eth17ojkf.jpg > > Don't know if that's related to our iowait problem, maybe only a > correlation. But we see the same high network traffic with v5.3. > > > Thx, > Hubert > > On Mon, 4 Mar 2019 at 14:57, Alberto Bengoa > wrote: > > > > Hello Hubert, > > > > On Mon, 4 Mar 2019 at 10:56, Hu Bert wrote: > >> > >> Hi Raghavendra, > >> > >> at the moment iowait and cpu consumption is quite low, the main > >> problems appear during the weekend (high traffic, especially on > >> Sunday), so either we have to wait until next Sunday or use a time > >> machine ;-) > >> > > > > Check if your high IO Wait is not related to high network traffic. We > had to leave the 5.3 version due to this issue[1]: > > > > [1] - https://bugzilla.redhat.com/show_bug.cgi?id=1673058 > > > > Cheers, > > Alberto > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From revirii at googlemail.com Mon Mar 4 15:11:15 2019 From: revirii at googlemail.com (Hu Bert) Date: Mon, 4 Mar 2019 16:11:15 +0100 Subject: [Gluster-users] Lots of connections on clients - appropriate values for various thread parameters In-Reply-To: References: Message-ID: Do you mean "gluster volume heal $volname statistics heal-count"? If yes: 0 for both volumes. On Mon, 4 Mar 2019 at 16:08, Amar Tumballi Suryanarayan wrote: > > What does self-heal pending numbers show? > > On Mon, Mar 4, 2019 at 7:52 PM Hu Bert wrote: >> >> Hi Alberto, >> >> wow, good hint!
We switched from old servers with version 4.1.6 to new >> servers (fresh install) with version 5.3 on February 5th. I saw that >> there was more network traffic on server side, but didn't watch it on >> client side - the traffic went up significantly on both sides, from >> about 20-40 MBit/s up to 200 MBit/s, on server side from about 20-40 >> MBit/s up to 500 MBit/s. Here's a screenshot of the munin graphs: >> >> network traffic on high iowait client: >> https://abload.de/img/client-eth1-traffic76j4i.jpg >> network traffic on old servers: https://abload.de/img/oldservers-eth1nejzt.jpg >> network traffic on new servers: https://abload.de/img/newservers-eth17ojkf.jpg >> >> Don't know if that's related to our iowait problem, maybe only a >> correlation. But we see the same high network traffic with v5.3. >> >> >> Thx, >> Hubert >> >> On Mon, 4 Mar 2019 at 14:57, Alberto Bengoa >> wrote: >> > >> > Hello Hubert, >> > >> > On Mon, 4 Mar 2019 at 10:56, Hu Bert wrote: >> >> >> >> Hi Raghavendra, >> >> >> >> at the moment iowait and cpu consumption is quite low, the main >> >> problems appear during the weekend (high traffic, especially on >> >> Sunday), so either we have to wait until next Sunday or use a time >> >> machine ;-) >> >> >> > >> > Check if your high IO Wait is not related to high network traffic.
We had to leave the 5.3 version due to this issue[1]: >> > >> > [1] - https://bugzilla.redhat.com/show_bug.cgi?id=1673058 >> > >> > Cheers, >> > Alberto >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Amar Tumballi (amarts) From srangana at redhat.com Mon Mar 4 17:33:49 2019 From: srangana at redhat.com (Shyam Ranganathan) Date: Mon, 4 Mar 2019 12:33:49 -0500 Subject: [Gluster-users] [Gluster-devel] [Gluster-Maintainers] GlusterFS - 6.0RC - Test days (27th, 28th Feb) In-Reply-To: References: Message-ID: <5f27393d-cefb-aa47-12f8-1597677ffb50@redhat.com> On 3/4/19 10:08 AM, Atin Mukherjee wrote: > > > On Mon, 4 Mar 2019 at 20:33, Amar Tumballi Suryanarayan > > wrote: > > Thanks to those who participated. > > Update at present: > > We found 3 blocker bugs in upgrade scenarios, and hence have marked > release > as pending upon them. We will keep these lists updated about progress. > > > I'd like to clarify that upgrade testing is blocked. So just fixing > these test blocker(s) isn't enough to call release-6 green. We need to > continue and finish the rest of the upgrade tests once the respective > bugs are fixed. Based on the upgrade fixes expected by tomorrow, we will build an RC1 candidate on Wednesday (6-Mar) (tagging early Wed. Eastern TZ). This RC can be used for further testing. > > > > -Amar > > On Mon, Feb 25, 2019 at 11:41 PM Amar Tumballi Suryanarayan < > atumball at redhat.com > wrote: > > > Hi all, > > > > We are calling out our users, and developers to contribute in > validating > > 'glusterfs-6.0rc' build in their usecase. Especially for the cases of > > upgrade, stability, and performance. > > > > Some of the key highlights of the release are listed in release-notes > > draft > > > .
> > Please note that there are some of the features which are being > dropped out > > of this release, and hence making sure your setup is not going to > have an > > issue is critical. Also the default lru-limit option in fuse mount for > > Inodes should help to control the memory usage of client > processes. All the > > good reason to give it a shot in your test setup. > > > > If you are developer using gfapi interface to integrate with other > > projects, you also have some signature changes, so please make > sure your > > project would work with latest release. Or even if you are using a > project > > which depends on gfapi, report the error with new RPMs (if any). > We will > > help fix it. > > > > As part of test days, we want to focus on testing the latest upcoming > > release i.e. GlusterFS-6, and one or the other gluster volunteers > would be > > there on #gluster channel on freenode to assist the people. Some > of the key > > things we are looking as bug reports are: > > > >? ? - > > > >? ? See if upgrade from your current version to 6.0rc is smooth, > and works > >? ? as documented. > >? ? - Report bugs in process, or in documentation if you find mismatch. > >? ? - > > > >? ? Functionality is all as expected for your usecase. > >? ? - No issues with actual application you would run on production > etc. > >? ? - > > > >? ? Performance has not degraded in your usecase. > >? ? - While we have added some performance options to the code, not > all of > >? ? ? ?them are turned on, as they have to be done based on usecases. > >? ? ? ?- Make sure the default setup is at least same as your current > >? ? ? ?version > >? ? ? ?- Try out few options mentioned in release notes (especially, > >? ? ? ?--auto-invalidation=no) and see if it helps performance. > >? ? - > > > >? ? While doing all the above, check below: > >? ? - see if the log files are making sense, and not flooding with some > >? ? ? ??for developer only? type of messages. > >? ? ? ?- get ?profile info? 
output from old and now, and see if > there is > >? ? ? ?anything which is out of normal expectation. Check with us > on the numbers. > >? ? ? ?- get a ?statedump? when there are some issues. Try to make > sense > >? ? ? ?of it, and raise a bug if you don?t understand it completely. > > > > > > > Process > > expected on test days. > > > >? ? - > > > >? ? We have a tracker bug > >? ? [0] > >? ? - We will attach all the ?blocker? bugs to this bug. > >? ? - > > > >? ? Use this link to report bugs, so that we have more metadata around > >? ? given bugzilla. > >? ? - Click Here > >? ? ? > ? > >? ? ? ?[1] > >? ? - > > > >? ? The test cases which are to be tested are listed here in this sheet > >? ? > [2], > >? ? please add, update, and keep it up-to-date to reduce duplicate > efforts > > -- > - Atin (atinm) > > _______________________________________________ > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel > From rgowdapp at redhat.com Mon Mar 4 17:45:59 2019 From: rgowdapp at redhat.com (Raghavendra Gowdappa) Date: Mon, 4 Mar 2019 23:15:59 +0530 Subject: [Gluster-users] "rpc_clnt_ping_timer_expired" errors In-Reply-To: References: <96B07283-D8AB-4F06-909D-E00424625528@cmcc.it> <42758A0E-8BE9-497D-BDE3-55D7DC633BC7@cmcc.it> <6A8CF4A4-98EA-48C3-A059-D60D1B2721C7@cmcc.it> <2CF49168-9C1B-4931-8C34-8157262A137A@cmcc.it> <7A151CC9-A0AE-4A45-B450-A4063D216D9E@cmcc.it> <32D53ECE-3F49-4415-A6EE-241B351AC2BA@cmcc.it> <4685A75B-5978-4338-9C9F-4A02FB40B9BC@cmcc.it> <4D2E6B04-C2E8-4FD5-B72D-E7C05931C6F9@cmcc.it> <4E332A56-B318-4BC1-9F44-84AB4392A5DE@cmcc.it> <832FD362-3B14-40D8-8530-604419300476@cmcc.it> <8D926643-1093-48ED-823F-D8F117F702CF@cmcc.it> <9D0BE438-DF11-4D0A-AB85-B44357032F29@cmcc.it> <5F0AC378-8170-4342-8473-9C17159CAC1D@cmcc.it> <7A50E86D-9E27-4EA7-883B-45E9F973991A@cmcc.it> <58B5DB7F-DCB4-4FBF-B1F7-681315B1613A@cmcc.it> <6327B44F-4E7E-46BB-A74C-70F4457DD1EB@cmcc.it> 
<0167DF4A-8329-4A1A-B439-857DFA6F78BB@cmcc.it> <763F334E-35B8-4729-B8E1-D30866754EEE@cmcc.it> <91DFC9AC-4805-41EB-AC6F-5722E018DD6E@cmcc.it> <8A9752B8-B231-4570-8FF4-8D3D781E7D42@cmcc.it> <47A24804-F975-4EE6-9FA5-67FCDA18D707@cmcc.it> <637FF5D2-D1F4-4686-9D48-646A96F67B96@cmcc.it> <4A87495F-3755-48F7-8507-085869069C64@cmcc.it> <3854BBF6-5B98-4AB3-A67E-E7DE59E69A63@cmcc.it> <313DA021-9173-4899-96B0-831B10B00B61@cmcc.it> <17996AFD-DFC8-40F3-9D09-DB6DDAD5B7D6@cmcc.it> <7074B5D8-955A-4802-A9F3-606C99734417@cmcc.it> <83B84BF9-8334-4230-B6F8-0BC4BFBEFF15@cmcc.it> <133B9AE4-9792-4F72-AD91-D36E7B9EC711@cmcc.it> <6611C4B0-57FD-4390-88B5-BD373555D4C5@cmcc.it> Message-ID: +Gluster Devel , +Gluster-users I would like to point out another issue. Even if what I suggested prevents disconnects, part of the solution would be only symptomatic treatment and doesn't address the root cause of the problem. In most of the ping-timer-expiry issues, the root cause is the increased load on bricks and the inability of bricks to be responsive under high load. So, the actual solution would be doing any or both of the following: * identify the source of increased load and if possible throttle it. Internal heal processes like self-heal, rebalance, quota heal are known to pump traffic into bricks without much throttling (io-threads _might_ do some throttling, but my understanding is its not sufficient). * identify the reason for bricks to become unresponsive during load. This may be fixable issues like not enough event-threads to read from network or difficult to fix issues like fsync on backend fs freezing the process or semi fixable issues (in code) like lock contention. So any genuine effort to fix ping-timer-issues (to be honest most of the times they are not issues related to rpc/network) would involve performance characterization of various subsystems on bricks and clients. 
Various subsystems can include (but are not necessarily limited to) the underlying OS/filesystem, glusterfs processes, CPU consumption, etc. regards, Raghavendra On Mon, Mar 4, 2019 at 9:31 PM Mauro Tridici wrote: > Thank you, let's try! > I will inform you about the effects of the change. > > Regards, > Mauro > > On 4 Mar 2019, at 16:55, Raghavendra Gowdappa wrote: > > > > On Mon, Mar 4, 2019 at 8:54 PM Mauro Tridici > wrote: > >> Hi Raghavendra, >> >> thank you for your reply. >> Yes, you are right. It is a problem that seems to happen randomly. >> At this moment, server.event-threads value is 4. I will try to increase >> this value to 8. Do you think that it could be a valid value? >> > > Yes. We can try with that. You should see at least the frequency of ping-timer > related disconnects reduce with this value (even if it doesn't eliminate > the problem completely). > > >> Regards, >> Mauro >> >> >> On 4 Mar 2019, at 15:36, Raghavendra Gowdappa >> wrote: >> >> >> >> On Mon, Mar 4, 2019 at 8:01 PM Nithya Balachandran >> wrote: >> >>> Hi Mauro, >>> >>> It looks like some problem on s06. Are all your other nodes ok? Can you >>> send us the gluster logs from this node? >>> >>> @Raghavendra G , do you have any idea as to >>> how this can be debugged? Maybe running top? Or debug brick logs? >>> >> >> If we can reproduce the problem, collecting tcpdump on both ends of the >> connection will help. But, one common problem is these bugs are >> inconsistently reproducible and hence we may not be able to capture tcpdump >> at correct intervals. Other than that, we can try to collect some evidence >> that poller threads were busy (waiting on locks). But, not sure what debug >> data provides that information. >> >> From what I know, it's difficult to collect evidence for this issue and we >> could only reason about it. >> >> We can try a workaround though - try increasing server.event-threads and >> see whether ping-timer expiry issues go away with an optimal value.
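The event-threads reasoning in this thread can be illustrated with a toy back-of-the-envelope model. This is not GlusterFS code, and the queue depth and per-request service time below are invented numbers; the point is only that a ping reply stuck behind a long queue of file operations can exceed the 42-second network.ping-timeout when too few poller threads drain the queue.

```python
# Crude model: the ping request waits behind queued requests that are shared
# evenly across event threads. Real epoll scheduling is more subtle; the
# point is the inverse relationship between thread count and worst-case wait.

PING_TIMEOUT = 42.0  # default network.ping-timeout, in seconds

def ping_wait(queued_requests, secs_per_request, event_threads):
    """Approximate time before the ping request gets serviced."""
    return queued_requests * secs_per_request / event_threads

# 2000 queued FOPs at 50 ms each (invented load):
for threads in (2, 4, 8):
    wait = ping_wait(2000, 0.05, threads)
    print(f"{threads} threads -> {wait:.1f}s wait, timer expires: {wait > PING_TIMEOUT}")
# 2 threads -> 50.0s (timer expires); 4 -> 25.0s; 8 -> 12.5s
```

Under this model, going from 2 to 4 threads is exactly the kind of change that would move the worst-case ping wait back under the timeout, which matches the hypothesis being tested in the thread.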
If >> that's the case, it kind of provides proof for our hypothesis. >> >> >>> >>> Regards, >>> Nithya >>> >>> On Mon, 4 Mar 2019 at 15:25, Mauro Tridici >>> wrote: >>> >>>> Hi All, >>>> >>>> some minutes ago I received this message from the NAGIOS server:
>>>>
>>>> ***** Nagios *****
>>>> Notification Type: PROBLEM
>>>> Service: Brick - /gluster/mnt2/brick
>>>> Host: s06
>>>> Address: s06-stg
>>>> State: CRITICAL
>>>> Date/Time: Mon Mar 4 10:25:33 CET 2019
>>>> Additional Info: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
>>>>
>>>> I checked the network, RAM and CPU usage on the s06 node and everything >>>> seems to be ok. >>>> No bricks are in an error state. In /var/log/messages, I detected again a >>>> crash of 'check_vol_utili', which I think is a module used by the NRPE >>>> executable (that is, the NAGIOS client). >>>> >>>> Mar 4 10:15:29 s06 kernel: traps: check_vol_utili[161224] general >>>> protection ip:7facffa0a66d sp:7ffe9f4e6fc0 error:0 in >>>> libglusterfs.so.0.0.1[7facff9b7000+f7000] >>>> Mar 4 10:15:29 s06 abrt-hook-ccpp: Process 161224 (python2.7) of user >>>> 0 killed by SIGSEGV - dumping core >>>> Mar 4 10:15:29 s06 abrt-server: Generating core_backtrace >>>> Mar 4 10:15:29 s06 abrt-server: Error: Unable to open './coredump': No >>>> such file or directory >>>> Mar 4 10:16:01 s06 systemd: Created slice User Slice of root. >>>> Mar 4 10:16:01 s06 systemd: Starting User Slice of root. >>>> Mar 4 10:16:01 s06 systemd: Started Session 201010 of user root. >>>> Mar 4 10:16:01 s06 systemd: Starting Session 201010 of user root. >>>> Mar 4 10:16:01 s06 systemd: Removed slice User Slice of root. >>>> Mar 4 10:16:01 s06 systemd: Stopping User Slice of root.
>>>> Mar 4 10:16:24 s06 abrt-server: Duplicate: UUID >>>> Mar 4 10:16:24 s06 abrt-server: DUP_OF_DIR: >>>> /var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041 >>>> Mar 4 10:16:24 s06 abrt-server: Deleting problem directory >>>> ccpp-2019-03-04-10:15:29-161224 (dup of ccpp-2018-09-25-12:27:42-13041) >>>> Mar 4 10:16:24 s06 abrt-server: Generating core_backtrace >>>> Mar 4 10:16:24 s06 abrt-server: Error: Unable to open './coredump': No >>>> such file or directory >>>> Mar 4 10:16:24 s06 abrt-server: Cannot notify >>>> '/var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041' via uReport: Event >>>> 'report_uReport' exited with 1 >>>> Mar 4 10:16:24 s06 abrt-hook-ccpp: Process 161391 (python2.7) of user >>>> 0 killed by SIGABRT - dumping core >>>> Mar 4 10:16:25 s06 abrt-server: Generating core_backtrace >>>> Mar 4 10:16:25 s06 abrt-server: Error: Unable to open './coredump': No >>>> such file or directory >>>> Mar 4 10:17:01 s06 systemd: Created slice User Slice of root. >>>> >>>> Also, I noticed the following errors that I think are very critical: >>>> >>>> Mar 4 10:21:12 s06 glustershd[20355]: [2019-03-04 09:21:12.954798] C >>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-55: server >>>> 192.168.0.55:49158 has not responded in the last 42 seconds, >>>> disconnecting. >>>> Mar 4 10:22:01 s06 systemd: Created slice User Slice of root. >>>> Mar 4 10:22:01 s06 systemd: Starting User Slice of root. >>>> Mar 4 10:22:01 s06 systemd: Started Session 201017 of user root. >>>> Mar 4 10:22:01 s06 systemd: Starting Session 201017 of user root. >>>> Mar 4 10:22:01 s06 systemd: Removed slice User Slice of root. >>>> Mar 4 10:22:01 s06 systemd: Stopping User Slice of root. >>>> Mar 4 10:22:03 s06 glustershd[20355]: [2019-03-04 09:22:03.964120] C >>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-60: server >>>> 192.168.0.54:49165 has not responded in the last 42 seconds, >>>> disconnecting. >>>> Mar 4 10:23:01 s06 systemd: Created slice User Slice of root. 
>>>> Mar 4 10:23:01 s06 systemd: Starting User Slice of root. >>>> Mar 4 10:23:01 s06 systemd: Started Session 201018 of user root. >>>> Mar 4 10:23:01 s06 systemd: Starting Session 201018 of user root. >>>> Mar 4 10:23:02 s06 systemd: Removed slice User Slice of root. >>>> Mar 4 10:23:02 s06 systemd: Stopping User Slice of root. >>>> Mar 4 10:24:01 s06 systemd: Created slice User Slice of root. >>>> Mar 4 10:24:01 s06 systemd: Starting User Slice of root. >>>> Mar 4 10:24:01 s06 systemd: Started Session 201019 of user root. >>>> Mar 4 10:24:01 s06 systemd: Starting Session 201019 of user root. >>>> Mar 4 10:24:01 s06 systemd: Removed slice User Slice of root. >>>> Mar 4 10:24:01 s06 systemd: Stopping User Slice of root. >>>> Mar 4 10:24:03 s06 glustershd[20355]: [2019-03-04 09:24:03.982502] C >>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-16: server >>>> 192.168.0.52:49158 has not responded in the last 42 seconds, >>>> disconnecting. >>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746109] C >>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-3: server >>>> 192.168.0.51:49153 has not responded in the last 42 seconds, >>>> disconnecting. >>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746215] C >>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-10: server >>>> 192.168.0.52:49156 has not responded in the last 42 seconds, >>>> disconnecting. >>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746260] C >>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-21: server >>>> 192.168.0.51:49159 has not responded in the last 42 seconds, >>>> disconnecting. >>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746296] C >>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-25: server >>>> 192.168.0.52:49161 has not responded in the last 42 seconds, >>>> disconnecting. 
>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746413] C >>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-60: server >>>> 192.168.0.54:49165 has not responded in the last 42 seconds, >>>> disconnecting. >>>> Mar 4 10:24:07 s06 glustershd[20355]: [2019-03-04 09:24:07.982952] C >>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-45: server >>>> 192.168.0.54:49155 has not responded in the last 42 seconds, >>>> disconnecting. >>>> Mar 4 10:24:18 s06 glustershd[20355]: [2019-03-04 09:24:18.990929] C >>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-25: server >>>> 192.168.0.52:49161 has not responded in the last 42 seconds, >>>> disconnecting. >>>> Mar 4 10:24:31 s06 glustershd[20355]: [2019-03-04 09:24:31.995781] C >>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-20: server >>>> 192.168.0.53:49159 has not responded in the last 42 seconds, >>>> disconnecting. >>>> Mar 4 10:25:01 s06 systemd: Created slice User Slice of root. >>>> Mar 4 10:25:01 s06 systemd: Starting User Slice of root. >>>> Mar 4 10:25:01 s06 systemd: Started Session 201020 of user root. >>>> Mar 4 10:25:01 s06 systemd: Starting Session 201020 of user root. >>>> Mar 4 10:25:01 s06 systemd: Removed slice User Slice of root. >>>> Mar 4 10:25:01 s06 systemd: Stopping User Slice of root. >>>> Mar 4 10:25:57 s06 systemd: Created slice User Slice of root. >>>> Mar 4 10:25:57 s06 systemd: Starting User Slice of root. >>>> Mar 4 10:25:57 s06 systemd-logind: New session 201021 of user root. >>>> Mar 4 10:25:57 s06 systemd: Started Session 201021 of user root. >>>> Mar 4 10:25:57 s06 systemd: Starting Session 201021 of user root. >>>> Mar 4 10:26:01 s06 systemd: Started Session 201022 of user root. >>>> Mar 4 10:26:01 s06 systemd: Starting Session 201022 of user root. 
>>>> Mar 4 10:26:21 s06 nrpe[162388]: Error: Could not complete SSL >>>> handshake with 192.168.1.56: 5 >>>> Mar 4 10:27:01 s06 systemd: Started Session 201023 of user root. >>>> Mar 4 10:27:01 s06 systemd: Starting Session 201023 of user root. >>>> Mar 4 10:28:01 s06 systemd: Started Session 201024 of user root. >>>> Mar 4 10:28:01 s06 systemd: Starting Session 201024 of user root. >>>> Mar 4 10:29:01 s06 systemd: Started Session 201025 of user root. >>>> Mar 4 10:29:01 s06 systemd: Starting Session 201025 of user root. >>>> >>>> But, unfortunately, I don't understand why it is happening. >>>> Now, the NAGIOS server shows that s06 status is ok:
>>>>
>>>> ***** Nagios *****
>>>> Notification Type: RECOVERY
>>>> Service: Brick - /gluster/mnt2/brick
>>>> Host: s06
>>>> Address: s06-stg
>>>> State: OK
>>>> Date/Time: Mon Mar 4 10:35:23 CET 2019
>>>> Additional Info: OK: Brick /gluster/mnt2/brick is up
>>>>
>>>> Nothing has changed from the RAM, CPU, and NETWORK point of view. >>>> The /var/log/messages file has been updated: >>>> >>>> Mar 4 10:32:01 s06 systemd: Starting Session 201029 of user root. >>>> Mar 4 10:32:30 s06 glustershd[20355]: [2019-03-04 09:32:30.069082] C >>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-10: server >>>> 192.168.0.52:49156 has not responded in the last 42 seconds, >>>> disconnecting. >>>> Mar 4 10:32:55 s06 glustershd[20355]: [2019-03-04 09:32:55.074689] C >>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-66: server >>>> 192.168.0.54:49167 has not responded in the last 42 seconds, >>>> disconnecting. >>>> Mar 4 10:33:01 s06 systemd: Started Session 201030 of user root. >>>> Mar 4 10:33:01 s06 systemd: Starting Session 201030 of user root. >>>> Mar 4 10:34:01 s06 systemd: Started Session 201031 of user root. >>>> Mar 4 10:34:01 s06 systemd: Starting Session 201031 of user root.
>>>> Mar 4 10:35:01 s06 nrpe[162562]: Could not read request from client >>>> 192.168.1.56, bailing out... >>>> Mar 4 10:35:01 s06 nrpe[162562]: INFO: SSL Socket Shutdown. >>>> Mar 4 10:35:01 s06 systemd: Started Session 201032 of user root. >>>> Mar 4 10:35:01 s06 systemd: Starting Session 201032 of user root. >>>> >>>> Could you please help me to understand what is happening? >>>> Thank you in advance. >>>> >>>> Regards, >>>> Mauro >>>> >>>> >>>> On 1 Mar 2019, at 12:17, Mauro Tridici wrote: >>>> >>>> >>>> Thank you, Milind. >>>> I executed the instructions you suggested: >>>> >>>> - grep 'blocked for' /var/log/messages on s06 returns no output (no >>>> 'blocked' word is detected in the messages file); >>>> - in the /var/log/messages file I can see this kind of error repeated a >>>> lot of times: >>>> >>>> Mar 1 08:43:01 s06 systemd: Starting Session 196071 of user root. >>>> Mar 1 08:43:01 s06 systemd: Removed slice User Slice of root. >>>> Mar 1 08:43:01 s06 systemd: Stopping User Slice of root.
>>>> Mar 1 08:43:02 s06 kernel: traps: check_vol_utili[57091] general >>>> protection ip:7f88e76ee66d sp:7ffe5a5bcc30 error:0 in >>>> libglusterfs.so.0.0.1[7f88e769b000+f7000] >>>> Mar 1 08:43:02 s06 abrt-hook-ccpp: Process 57091 (python2.7) of user 0 >>>> killed by SIGSEGV - dumping core >>>> Mar 1 08:43:02 s06 abrt-server: Generating core_backtrace >>>> Mar 1 08:43:02 s06 abrt-server: Error: Unable to open './coredump': No >>>> such file or directory >>>> Mar 1 08:43:58 s06 abrt-server: Duplicate: UUID >>>> Mar 1 08:43:58 s06 abrt-server: DUP_OF_DIR: >>>> /var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041 >>>> Mar 1 08:43:58 s06 abrt-server: Deleting problem directory >>>> ccpp-2019-03-01-08:43:02-57091 (dup of ccpp-2018-09-25-12:27:42-13041) >>>> Mar 1 08:43:58 s06 dbus[1872]: [system] Activating service >>>> name='org.freedesktop.problems' (using servicehelper) >>>> Mar 1 08:43:58 s06 dbus[1872]: [system] Successfully activated service >>>> 'org.freedesktop.problems' >>>> Mar 1 08:43:58 s06 abrt-server: Generating core_backtrace >>>> Mar 1 08:43:58 s06 abrt-server: Error: Unable to open './coredump': No >>>> such file or directory >>>> Mar 1 08:43:58 s06 abrt-server: Cannot notify >>>> '/var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041' via uReport: Event >>>> 'report_uReport' exited with 1 >>>> Mar 1 08:44:01 s06 systemd: Created slice User Slice of root. >>>> Mar 1 08:44:01 s06 systemd: Starting User Slice of root. >>>> Mar 1 08:44:01 s06 systemd: Started Session 196072 of user root. >>>> Mar 1 08:44:01 s06 systemd: Starting Session 196072 of user root. >>>> Mar 1 08:44:01 s06 systemd: Removed slice User Slice of root. >>>> >>>> - in /var/log/messages file I can see also 4 errors related to other >>>> cluster servers: >>>> >>>> Mar 1 11:05:01 s06 systemd: Starting User Slice of root. >>>> Mar 1 11:05:01 s06 systemd: Started Session 196230 of user root. >>>> Mar 1 11:05:01 s06 systemd: Starting Session 196230 of user root. 
>>>> Mar 1 11:05:01 s06 systemd: Removed slice User Slice of root. >>>> Mar 1 11:05:01 s06 systemd: Stopping User Slice of root. >>>> Mar 1 11:05:59 s06 glustershd[70117]: [2019-03-01 10:05:59.347094] C >>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-33: server >>>> 192.168.0.51:49163 has not responded in the last 42 seconds, >>>> disconnecting. >>>> Mar 1 11:06:01 s06 systemd: Created slice User Slice of root. >>>> Mar 1 11:06:01 s06 systemd: Starting User Slice of root. >>>> Mar 1 11:06:01 s06 systemd: Started Session 196231 of user root. >>>> Mar 1 11:06:01 s06 systemd: Starting Session 196231 of user root. >>>> Mar 1 11:06:01 s06 systemd: Removed slice User Slice of root. >>>> Mar 1 11:06:01 s06 systemd: Stopping User Slice of root. >>>> Mar 1 11:06:12 s06 glustershd[70117]: [2019-03-01 10:06:12.351319] C >>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-1: server >>>> 192.168.0.52:49153 has not responded in the last 42 seconds, >>>> disconnecting. >>>> Mar 1 11:06:38 s06 glustershd[70117]: [2019-03-01 10:06:38.356920] C >>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-7: server >>>> 192.168.0.52:49155 has not responded in the last 42 seconds, >>>> disconnecting. >>>> Mar 1 11:07:01 s06 systemd: Created slice User Slice of root. >>>> Mar 1 11:07:01 s06 systemd: Starting User Slice of root. >>>> Mar 1 11:07:01 s06 systemd: Started Session 196232 of user root. >>>> Mar 1 11:07:01 s06 systemd: Starting Session 196232 of user root. >>>> Mar 1 11:07:01 s06 systemd: Removed slice User Slice of root. >>>> Mar 1 11:07:01 s06 systemd: Stopping User Slice of root. >>>> Mar 1 11:07:36 s06 glustershd[70117]: [2019-03-01 10:07:36.366259] C >>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-0: server >>>> 192.168.0.51:49152 has not responded in the last 42 seconds, >>>> disconnecting. >>>> Mar 1 11:08:01 s06 systemd: Created slice User Slice of root. >>>> >>>> No ?blocked? 
word is in /var/log/messages files on other cluster >>>> servers. >>>> In attachment, the /var/log/messages file from s06 server. >>>> >>>> Thank you in advance, >>>> Mauro >>>> >>>> >>>> >>>> >>>> On 1 Mar 2019, at 11:47, Milind Changire wrote: >>>> >>>> The traces of very high disk activity on the servers are often found in >>>> /var/log/messages >>>> You might want to grep for "blocked for" in /var/log/messages on s06 >>>> and correlate the timestamps to confirm the unresponsiveness as reported in >>>> gluster client logs. >>>> In cases of high disk activity, although the operating system continues >>>> to respond to ICMP pings, the processes writing to disks often get blocked >>>> by a large flush to the disk which could span beyond 42 seconds and hence >>>> result in ping-timer-expiry logs. >>>> >>>> As a side note: >>>> If you indeed find gluster processes being blocked in >>>> /var/log/messages, you might want to tweak the sysctl tunables >>>> vm.dirty_background_ratio or vm.dirty_background_bytes to a smaller value >>>> than the existing one. Please read up more on those tunables before touching >>>> the settings. >>>> >>>> >>>> On Fri, Mar 1, 2019 at 4:06 PM Mauro Tridici >>>> wrote: >>>> >>>>> >>>>> Hi all, >>>>> >>>>> in attachment the client log captured after changing >>>>> network.ping-timeout option.
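[Editor's note] Milind's grep-and-correlate suggestion can be condensed into a small sketch; the sample syslog line below is illustrative, not taken from the attached logs:

```shell
# Extract the timestamps of kernel "blocked for" hung-task reports so they
# can be compared with the ping-timer-expiry timestamps in the client log.
# A here-doc sample stands in for the real /var/log/messages.
extract_blocked() {
    grep "blocked for" "$1" | awk '{print $1, $2, $3}'
}

cat > /tmp/messages.sample <<'EOF'
Mar 1 10:05:12 s06 kernel: INFO: task glusterfsd:70117 blocked for more than 120 seconds.
Mar 1 10:06:01 s06 systemd: Started Session 196231 of user root.
EOF

extract_blocked /tmp/messages.sample
```

A hung-task timestamp shortly before a "has not responded in the last 42 seconds" entry would support the disk-flush theory.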
>>>>> I noticed this error involving server 192.168.0.56 (s06) >>>>> >>>>> [2019-03-01 09:23:36.077287] I [rpc-clnt.c:1962:rpc_clnt_reconfig] >>>>> 0-tier2-client-71: changing ping timeout to 42 (from 0) >>>>> [2019-03-01 09:23:36.078213] I >>>>> [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in >>>>> volfile,continuing >>>>> [2019-03-01 09:23:36.078432] I >>>>> [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in >>>>> volfile,continuing >>>>> [2019-03-01 09:23:36.092357] I >>>>> [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in >>>>> volfile,continuing >>>>> [2019-03-01 09:23:36.094146] I >>>>> [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in >>>>> volfile,continuing >>>>> [2019-03-01 10:06:24.708082] C >>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-50: server >>>>> 192.168.0.56:49156 has not responded in the last 42 seconds, >>>>> disconnecting. >>>>> >>>>> I don't know why it happens; the s06 server seems to be reachable. >>>>> >>>>> [athena_login2][/users/home/sysm02/]> telnet 192.168.0.56 49156 >>>>> Trying 192.168.0.56... >>>>> Connected to 192.168.0.56. >>>>> Escape character is '^]'. >>>>> ^CConnection closed by foreign host. >>>>> [athena_login2][/users/home/sysm02/]> ping 192.168.0.56 >>>>> PING 192.168.0.56 (192.168.0.56) 56(84) bytes of data. >>>>> 64 bytes from 192.168.0.56: icmp_seq=1 ttl=64 time=0.116 ms >>>>> 64 bytes from 192.168.0.56: icmp_seq=2 ttl=64 time=0.101 ms >>>>> >>>>> --- 192.168.0.56 ping statistics --- >>>>> 2 packets transmitted, 2 received, 0% packet loss, time 1528ms >>>>> rtt min/avg/max/mdev = 0.101/0.108/0.116/0.012 ms >>>>> >>>>> [athena_login2][/users/home/sysm02/]> telnet 192.168.0.56 49156 >>>>> Trying 192.168.0.56... >>>>> Connected to 192.168.0.56. >>>>> Escape character is '^]'.
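[Editor's note] Mauro's manual telnet probe can be automated for every brick; a sketch using bash's /dev/tcp (the host/port below are the ones from the log excerpt, adjust for your cluster):

```shell
# Try a plain TCP connect to a brick port and report the result.
# A refused or timed-out connect shows up as UNREACHABLE.
check_port() {
    if timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
        echo "$1:$2 ok"
    else
        echo "$1:$2 UNREACHABLE"
    fi
}

check_port 192.168.0.56 49156
```

Running this in a loop while the errors occur would show whether connectivity to the brick actually drops during the ping-timer expiries.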
>>>>> >>>>> Thank you for your help, >>>>> Mauro >>>>> >>>>> >>>>> >>>>> On 1 Mar 2019, at 10:29, Mauro Tridici wrote: >>>>> >>>>> Hi all, >>>>> >>>>> thank you for the explanation. >>>>> I just changed the network.ping-timeout option to the default value >>>>> (network.ping-timeout=42). >>>>> >>>>> I will check the logs to see if the errors will appear again. >>>>> >>>>> Regards, >>>>> Mauro >>>>> >>>>> On 1 Mar 2019, at 04:43, Milind Changire wrote: >>>>> >>>>> network.ping-timeout should not be set to zero for non-glusterd >>>>> clients. >>>>> glusterd is a special case for which ping-timeout is set to zero via >>>>> /etc/glusterfs/glusterd.vol >>>>> >>>>> Setting network.ping-timeout to zero disables arming of the ping timer >>>>> for connections. This disables testing the connection for responsiveness >>>>> and hence avoids proactive fail-over. >>>>> >>>>> Please reset network.ping-timeout to a non-zero positive value, e.g. 42 >>>>> >>>>> >>>>> On Thu, Feb 28, 2019 at 5:07 PM Nithya Balachandran < >>>>> nbalacha at redhat.com> wrote: >>>>> >>>>>> Adding Raghavendra and Milind to comment on this. >>>>>> >>>>>> What is the effect of setting network.ping-timeout to 0 and should it >>>>>> be set back to 42? >>>>>> Regards, >>>>>> Nithya >>>>>> >>>>>> On Thu, 28 Feb 2019 at 16:01, Mauro Tridici >>>>>> wrote: >>>>>> >>>>>>> Hi Nithya, >>>>>>> >>>>>>> sorry for the delay. >>>>>>> network.ping-timeout has been set to 0 in order to try to solve some >>>>>>> timeout problems, but it didn't help. >>>>>>> I can set it to the default value. >>>>>>> >>>>>>> Can I proceed with the change? >>>>>>> >>>>>>> Thank you, >>>>>>> Mauro >>>>>>> >>>>>>> >>>>>>> On 28 Feb 2019, at 04:41, Nithya Balachandran >>>>>>> wrote: >>>>>>> >>>>>>> Hi Mauro, >>>>>>> >>>>>>> Is network.ping-timeout still set to 0? The default value is 42. Is >>>>>>> there a particular reason why this was changed?
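[Editor's note] For reference, putting the option back to its default, as Milind advises above, is a single volume-set call; VOLNAME is a placeholder for the actual volume name (here apparently tier2):

```shell
# Put network.ping-timeout back to its 42-second default for the volume.
gluster volume set VOLNAME network.ping-timeout 42

# Or drop the custom value entirely so the built-in default applies again:
gluster volume reset VOLNAME network.ping-timeout

# Verify the value now in effect:
gluster volume get VOLNAME network.ping-timeout
```

This is a command fragment to run on a gluster node, not a runnable script here.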
>>>>>>> >>>>>>> Regards, >>>>>>> Nithya >>>>>>> >>>>>>> >>>>>>> On Wed, 27 Feb 2019 at 21:32, Mauro Tridici >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> Hi Xavi, >>>>>>>> >>>>>>>> thank you for the detailed explanation and suggestions. >>>>>>>> Yes, transport.listen-backlog option is still set to 1024. >>>>>>>> >>>>>>>> I will check the network and connectivity status using "ping" and >>>>>>>> "telnet" as soon as the errors come back again. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Mauro >>>>>>>> >>>>>>>> On 27 Feb 2019, at 16:42, Xavi Hernandez < >>>>>>>> jahernan at redhat.com> wrote: >>>>>>>> >>>>>>>> Hi Mauro, >>>>>>>> >>>>>>>> those errors say that the mount point is not connected to some of >>>>>>>> the bricks while executing operations. I see references to 3rd and 6th >>>>>>>> bricks of several disperse sets, which seem to map to server s06. For some >>>>>>>> reason, gluster is having trouble connecting from the client machine to >>>>>>>> that particular server. At the end of the log I see that after a long time a >>>>>>>> reconnect is done to both of them. However, a little later, other bricks from >>>>>>>> s05 get disconnected and a reconnect times out. >>>>>>>> >>>>>>>> That's really odd. It seems as if communication to >>>>>>>> s06 is cut for some time, then restored, and then the same happens to the next >>>>>>>> server. >>>>>>>> >>>>>>>> If the servers are really online and it's only a communication >>>>>>>> issue, it explains why server memory and network usage have increased: if the >>>>>>>> problem only exists between the client and servers, any write made by the >>>>>>>> client will automatically mark the file as damaged, since some of the >>>>>>>> servers have not been updated. Since self-heal runs from the server nodes, >>>>>>>> they will probably be correctly connected to all bricks, which allows them >>>>>>>> to heal the just-damaged file, which increases memory and network usage.
>>>>>>>> >>>>>>>> I guess you still have transport.listen-backlog set to 1024, right? >>>>>>>> >>>>>>>> Just to try to identify if the problem really comes from the network, >>>>>>>> can you check if you lose some pings from the client to all of the servers >>>>>>>> while you are seeing those errors in the log file? >>>>>>>> >>>>>>>> You can also check if during those errors, you can telnet to the >>>>>>>> port of the brick from the client. >>>>>>>> >>>>>>>> Xavi >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Feb 26, 2019 at 10:17 AM Mauro Tridici < >>>>>>>> mauro.tridici at cmcc.it> wrote: >>>>>>>> >>>>>>>>> Hi Nithya, >>>>>>>>> >>>>>>>>> The "df -h" operation is not slow anymore, but no users are using the >>>>>>>>> volume, RAM and NETWORK usage is ok on the client node. >>>>>>>>> >>>>>>>>> I was worried about this kind of warnings/errors: >>>>>>>>> >>>>>>>>> [2019-02-25 10:59:00.664323] W [MSGID: 122035] >>>>>>>>> [ec-common.c:571:ec_child_select] 0-tier2-disperse-6: Executing operation >>>>>>>>> with some subvolumes unavailable (20) >>>>>>>>> >>>>>>>>> [2019-02-26 03:11:35.212603] E >>>>>>>>> [rpc-clnt.c:350:saved_frames_unwind] (--> >>>>>>>>> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> >>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> >>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> >>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] >>>>>>>>> (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) >>>>>>>>> 0-tier2-client-50: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) >>>>>>>>> called at 2019-02-26 03:10:56.549903 (xid=0x106f1c5) >>>>>>>>> >>>>>>>>> [2019-02-26 03:13:03.313831] E >>>>>>>>> [socket.c:2376:socket_connect_finish] 0-tier2-client-50: connection to >>>>>>>>> 192.168.0.56:49156 failed (Timeout della connessione); >>>>>>>>> disconnecting socket >>>>>>>>> >>>>>>>>> It seems that some subvolumes are not available
and 192.168.0.56 >>>>>>>>> server (s06) is not reachable. >>>>>>>>> But gluster servers are up&running and bricks are ok. >>>>>>>>> >>>>>>>>> In attachment the updated tier2.log file. >>>>>>>>> >>>>>>>>> >>>>>>>>> Thank you. >>>>>>>>> Regards, >>>>>>>>> Mauro >>>>>>>>> >>>>>>>>> Il giorno 26 feb 2019, alle ore 04:03, Nithya Balachandran < >>>>>>>>> nbalacha at redhat.com> ha scritto: >>>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I see a lot of EC messages in the log but they don't seem very >>>>>>>>> serious. Xavi, can you take a look? >>>>>>>>> >>>>>>>>> The only errors I see are: >>>>>>>>> [2019-02-25 10:58:45.519871] E >>>>>>>>> [rpc-clnt.c:350:saved_frames_unwind] (--> >>>>>>>>> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> >>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> >>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> >>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] >>>>>>>>> (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) >>>>>>>>> 0-tier2-client-50: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) >>>>>>>>> called at 2019-02-25 10:57:47.429969 (xid=0xd26fe7) >>>>>>>>> [2019-02-25 10:58:51.461493] E >>>>>>>>> [rpc-clnt.c:350:saved_frames_unwind] (--> >>>>>>>>> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> >>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> >>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> >>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] >>>>>>>>> (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) >>>>>>>>> 0-tier2-client-41: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) >>>>>>>>> called at 2019-02-25 10:57:47.499174 (xid=0xf47d6a) >>>>>>>>> [2019-02-25 11:07:57.152874] E >>>>>>>>> [socket.c:2376:socket_connect_finish] 0-tier2-client-70: 
connection to >>>>>>>>> 192.168.0.55:49163 failed (Timeout della connessione); >>>>>>>>> disconnecting socket >>>>>>>>> >>>>>>>>> >>>>>>>>> Is the df -h operation still slow? If yes, can you take a tcpdump >>>>>>>>> of the client while running df -h and send that across? >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Nithya >>>>>>>>> >>>>>>>>> On Mon, 25 Feb 2019 at 17:27, Mauro Tridici >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Sorry, a few minutes after my last mail message, I noticed that the >>>>>>>>>> "df -h" command hung for a while before returning the prompt. >>>>>>>>>> Yesterday, everything was ok in the gluster client log, but, >>>>>>>>>> today, I see a lot of errors (please take a look at the attached file). >>>>>>>>>> >>>>>>>>>> On the client node, I noticed significant RAM and NETWORK usage. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Do you think that the errors have been caused by the client >>>>>>>>>> resource usage? >>>>>>>>>> >>>>>>>>>> Thank you in advance, >>>>>>>>>> Mauro >>>>>>>>>> >>>>>>>>>> >>>> >> > ------------------------- > Mauro Tridici > > Fondazione CMCC > CMCC Supercomputing Center > presso Complesso Ecotekne - Università del Salento - > Strada Prov.le Lecce - Monteroni sn > 73100 Lecce IT > http://www.cmcc.it > > mobile: (+39) 327 5630841 > email: mauro.tridici at cmcc.it > https://it.linkedin.com/in/mauro-tridici-5977238b > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rabhat at redhat.com Mon Mar 4 19:18:54 2019 From: rabhat at redhat.com (FNU Raghavendra Manjunath) Date: Mon, 4 Mar 2019 14:18:54 -0500 Subject: [Gluster-users] Questions and notes to "Simplify recovery steps of corrupted files" In-Reply-To: References: Message-ID: Hi David, Doing a full heal after deleting the gfid entries (and the bad copy) is fine. It is not dangerous.
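[Editor's note] The two GFID entries discussed in this thread follow a fixed layout under .glusterfs: the regular hard link sits at the first two hex characters / next two characters of the GFID, and a corrupted copy additionally gets a quarantine entry. A sketch deriving both paths (brick path and GFID are the ones from David's example below):

```shell
# Print the regular and quarantine .glusterfs paths for a given GFID.
gfid_paths() {
    local brick=$1 gfid=$2
    echo "$brick/.glusterfs/${gfid:0:2}/${gfid:2:2}/$gfid"
    echo "$brick/.glusterfs/quarantine/$gfid"
}

gfid_paths /gluster/brick1/glusterbrick fc36e347-53c7-4a0a-8150-c070143d3b34
```

These are exactly the two files David's find command located, and the ones to remove along with the bad copy before triggering the full heal.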
Regards, Raghavendra On Mon, Mar 4, 2019 at 9:44 AM David Spisla wrote: > Hello Gluster Community, > > I have questions and notes concerning the steps mentioned in > https://github.com/gluster/glusterfs/issues/491 > > " *2. Delete the corrupted files* ": > In my experience there are two GFID files if a copy gets corrupted. > Example: > > > > *$ find /gluster/brick1/glusterbrick/.glusterfs -name > fc36e347-53c7-4a0a-8150-c070143d3b34/gluster/brick1/glusterbrick/.glusterfs/quarantine/fc36e347-53c7-4a0a-8150-c070143d3b34/gluster/brick1/glusterbrick/.glusterfs/fc/36/fc36e347-53c7-4a0a-8150-c070143d3b34* > > Both GFID files have to be deleted. If a copy is NOT corrupted, there seems > to be no GFID file in > *.glusterfs/quarantine . *Even if one executes scrub ondemand, the file is > not there. The file in *.glusterfs/quarantine* appears after one executes > "scrub status". > > " *3. Restore the file* ": > One can alternatively trigger self-heal manually with > *gluster volume heal VOLNAME* > But in my experience this is not working. One has to trigger a full heal: > *gluster volume heal VOLNAME* *full* > > Imagine one wants to restore a copy with manual self-heal. Is it necessary to > set some VOLUME options (stat-prefetch, dht.force-readdirp and > performance.force-readdirp disabled) and mount via FUSE with some special > parameters to heal the file? > In my experience I do only a full heal after deleting the bad copy and > the GFID files. > This seems to be working. Or is it dangerous? > > Regards > David Spisla > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed...
URL: From revirii at googlemail.com Tue Mar 5 06:43:19 2019 From: revirii at googlemail.com (Hu Bert) Date: Tue, 5 Mar 2019 07:43:19 +0100 Subject: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP Message-ID: Good morning, i have a replicate 3 setup with 2 volumes, running on version 5.3 on debian stretch. This morning i upgraded one server to version 5.4 and rebooted the machine; after the restart i noticed that: - no brick process is running - gluster volume status only shows the server itself: gluster volume status workdata Status of volume: workdata Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick gluster1:/gluster/md4/workdata N/A N/A N N/A NFS Server on localhost N/A N/A N N/A - gluster peer status on the server gluster peer status Number of Peers: 2 Hostname: gluster3 Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a State: Peer Rejected (Connected) Hostname: gluster2 Uuid: 162fea82-406a-4f51-81a3-e90235d8da27 State: Peer Rejected (Connected) - gluster peer status on the other 2 servers: gluster peer status Number of Peers: 2 Hostname: gluster1 Uuid: 9a360776-7b58-49ae-831e-a0ce4e4afbef State: Peer Rejected (Connected) Hostname: gluster3 Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a State: Peer in Cluster (Connected) I noticed that, in the brick logs, i see that the public IP is used instead of the LAN IP. brick logs from one of the volumes: rejected node: https://pastebin.com/qkpj10Sd connected nodes: https://pastebin.com/8SxVVYFV Why is the public IP suddenly used instead of the LAN IP? Killing all gluster processes and rebooting (again) didn't help. 
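[Editor's note] A quick way to see which address each peer hostname actually resolves to on the affected node (the gluster1..3 names are the ones from the status output above):

```shell
# Print the resolved address for each peer hostname; a peer that resolves to
# its public address instead of the LAN IP shows up immediately.
# getent consults /etc/hosts and DNS in nsswitch order.
resolve_peer() {
    getent hosts "$1" || echo "$1: unresolvable"
}

for host in gluster1 gluster2 gluster3; do
    resolve_peer "$host"
done
```

Comparing this output across all three nodes shows whether one of them suddenly resolves a peer name differently.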
Thx, Hubert From mchangir at redhat.com Tue Mar 5 06:58:02 2019 From: mchangir at redhat.com (Milind Changire) Date: Tue, 5 Mar 2019 12:28:02 +0530 Subject: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP In-Reply-To: References: Message-ID: There are probably DNS entries or /etc/hosts entries with the public IP Addresses that the host names (gluster1, gluster2, gluster3) are getting resolved to. /etc/resolv.conf would tell which is the default domain searched for the node names and the DNS servers which respond to the queries. On Tue, Mar 5, 2019 at 12:14 PM Hu Bert wrote: > Good morning, > > i have a replicate 3 setup with 2 volumes, running on version 5.3 on > debian stretch. This morning i upgraded one server to version 5.4 and > rebooted the machine; after the restart i noticed that: > > - no brick process is running > - gluster volume status only shows the server itself: > gluster volume status workdata > Status of volume: workdata > Gluster process TCP Port RDMA Port Online > Pid > > ------------------------------------------------------------------------------ > Brick gluster1:/gluster/md4/workdata N/A N/A N > N/A > NFS Server on localhost N/A N/A N > N/A > > - gluster peer status on the server > gluster peer status > Number of Peers: 2 > > Hostname: gluster3 > Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a > State: Peer Rejected (Connected) > > Hostname: gluster2 > Uuid: 162fea82-406a-4f51-81a3-e90235d8da27 > State: Peer Rejected (Connected) > > - gluster peer status on the other 2 servers: > gluster peer status > Number of Peers: 2 > > Hostname: gluster1 > Uuid: 9a360776-7b58-49ae-831e-a0ce4e4afbef > State: Peer Rejected (Connected) > > Hostname: gluster3 > Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a > State: Peer in Cluster (Connected) > > I noticed that, in the brick logs, i see that the public IP is used > instead of the LAN IP. 
brick logs from one of the volumes: > > rejected node: https://pastebin.com/qkpj10Sd > connected nodes: https://pastebin.com/8SxVVYFV > > Why is the public IP suddenly used instead of the LAN IP? Killing all > gluster processes and rebooting (again) didn't help. > > > Thx, > Hubert > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -- Milind -------------- next part -------------- An HTML attachment was scrubbed... URL: From spisla80 at gmail.com Tue Mar 5 07:12:04 2019 From: spisla80 at gmail.com (David Spisla) Date: Tue, 5 Mar 2019 08:12:04 +0100 Subject: [Gluster-users] Questions and notes to "Simplify recovery steps of corrupted files" In-Reply-To: References: Message-ID: Thank you for the clarification. Am Mo., 4. M?rz 2019 um 20:19 Uhr schrieb FNU Raghavendra Manjunath < rabhat at redhat.com>: > Hi David, > > Doing full heal after deleting the gfid entries (and the bad copy) is > fine. It is not dangerous. > > Regards, > Raghavendra > > On Mon, Mar 4, 2019 at 9:44 AM David Spisla wrote: > >> Hello Gluster Community, >> >> I have questions and notes concerning the steps mentioned in >> https://github.com/gluster/glusterfs/issues/491 >> >> " *2. Delete the corrupted files* ": >> In my experience there are two GFID files if a copy gets corrupted. >> Example: >> >> >> >> *$ find /gluster/brick1/glusterbrick/.glusterfs -name >> fc36e347-53c7-4a0a-8150-c070143d3b34/gluster/brick1/glusterbrick/.glusterfs/quarantine/fc36e347-53c7-4a0a-8150-c070143d3b34/gluster/brick1/glusterbrick/.glusterfs/fc/36/fc36e347-53c7-4a0a-8150-c070143d3b34* >> >> Both GFID files has to be deleted. If a copy is NOT corrupted, there >> seems to be no GFID file in >> *.glusterfs/quarantine . *Even one executes scub ondemand, the file is >> not there. The file in *.glusterfs/quarantine* occurs if one executes >> "scrub status". >> >> " *3. 
Restore the file* ": >> One alternatively trigger self heal manually with >> *gluster vo heal VOLNAME* >> But in my experience this is not working. One have to trigger a full heal: >> *gluster vo heal VOLNAME* *full* >> >> Imagine, one will restore a copy with manual self heal. It is neccesary >> to set some VOLUME options (stat-prefetch, dht.force-readdirp and >> performance.force-readdirp disabled) and mount via FUSE with some special >> parameters to heal the file? >> In my experience I do only a full heal after deleting the bad copy and >> the GFID files. >> This seems to be working. Or it is dangerous? >> >> Regards >> David Spisla >> >> >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From revirii at googlemail.com Tue Mar 5 07:18:21 2019 From: revirii at googlemail.com (Hu Bert) Date: Tue, 5 Mar 2019 08:18:21 +0100 Subject: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP In-Reply-To: References: Message-ID: Hi Milind, well, there are such entries, but those haven't been a problem during install and the last kernel update+reboot. The entries look like: PUBLIC_IP gluster2.alpserver.de gluster2 192.168.0.50 gluster1 192.168.0.51 gluster2 192.168.0.52 gluster3 'ping gluster2' resolves to LAN IP; I removed the last entry in the 1st line, did a reboot ... no, didn't help. From /var/log/glusterfs/glusterd.log on gluster 2: [2019-03-05 07:04:36.188128] E [MSGID: 106010] [glusterd-utils.c:3483:glusterd_compare_friend_volume] 0-management: Version of Cksums persistent differ.
local cksum = 3950307018, remote cksum = 455409345 on peer gluster1 [2019-03-05 07:04:36.188314] I [MSGID: 106493] [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to gluster1 (0), ret: 0, op_ret: -1 Interestingly there are no entries in the brick logs of the rejected server. Well, not surprising as no brick process is running. The server gluster1 is still in rejected state. 'gluster volume start workdata force' starts the brick process on gluster1, and some heals are happening on gluster2+3, but via 'gluster volume status workdata' the volumes still aren't complete. gluster1: ------------------------------------------------------------------------------ Brick gluster1:/gluster/md4/workdata 49152 0 Y 2523 Self-heal Daemon on localhost N/A N/A Y 2549 gluster2: Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick gluster2:/gluster/md4/workdata 49153 0 Y 1723 Brick gluster3:/gluster/md4/workdata 49153 0 Y 2068 Self-heal Daemon on localhost N/A N/A Y 1732 Self-heal Daemon on gluster3 N/A N/A Y 2077 Hubert Am Di., 5. M?rz 2019 um 07:58 Uhr schrieb Milind Changire : > > There are probably DNS entries or /etc/hosts entries with the public IP Addresses that the host names (gluster1, gluster2, gluster3) are getting resolved to. > /etc/resolv.conf would tell which is the default domain searched for the node names and the DNS servers which respond to the queries. > > > On Tue, Mar 5, 2019 at 12:14 PM Hu Bert wrote: >> >> Good morning, >> >> i have a replicate 3 setup with 2 volumes, running on version 5.3 on >> debian stretch. 
This morning i upgraded one server to version 5.4 and >> rebooted the machine; after the restart i noticed that: >> >> - no brick process is running >> - gluster volume status only shows the server itself: >> gluster volume status workdata >> Status of volume: workdata >> Gluster process TCP Port RDMA Port Online Pid >> ------------------------------------------------------------------------------ >> Brick gluster1:/gluster/md4/workdata N/A N/A N N/A >> NFS Server on localhost N/A N/A N N/A >> >> - gluster peer status on the server >> gluster peer status >> Number of Peers: 2 >> >> Hostname: gluster3 >> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a >> State: Peer Rejected (Connected) >> >> Hostname: gluster2 >> Uuid: 162fea82-406a-4f51-81a3-e90235d8da27 >> State: Peer Rejected (Connected) >> >> - gluster peer status on the other 2 servers: >> gluster peer status >> Number of Peers: 2 >> >> Hostname: gluster1 >> Uuid: 9a360776-7b58-49ae-831e-a0ce4e4afbef >> State: Peer Rejected (Connected) >> >> Hostname: gluster3 >> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a >> State: Peer in Cluster (Connected) >> >> I noticed that, in the brick logs, i see that the public IP is used >> instead of the LAN IP. brick logs from one of the volumes: >> >> rejected node: https://pastebin.com/qkpj10Sd >> connected nodes: https://pastebin.com/8SxVVYFV >> >> Why is the public IP suddenly used instead of the LAN IP? Killing all >> gluster processes and rebooting (again) didn't help. 
>> >> >> Thx, >> Hubert >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Milind > From revirii at googlemail.com Tue Mar 5 07:23:59 2019 From: revirii at googlemail.com (Hu Bert) Date: Tue, 5 Mar 2019 08:23:59 +0100 Subject: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP In-Reply-To: References: Message-ID: Interestingly: gluster volume status misses gluster1, while heal statistics show gluster1: gluster volume status workdata Status of volume: workdata Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick gluster2:/gluster/md4/workdata 49153 0 Y 1723 Brick gluster3:/gluster/md4/workdata 49153 0 Y 2068 Self-heal Daemon on localhost N/A N/A Y 1732 Self-heal Daemon on gluster3 N/A N/A Y 2077 vs. gluster volume heal workdata statistics heal-count Gathering count of entries to be healed on volume workdata has been successful Brick gluster1:/gluster/md4/workdata Number of entries: 0 Brick gluster2:/gluster/md4/workdata Number of entries: 10745 Brick gluster3:/gluster/md4/workdata Number of entries: 10744 Am Di., 5. M?rz 2019 um 08:18 Uhr schrieb Hu Bert : > > Hi Miling, > > well, there are such entries, but those haven't been a problem during > install and the last kernel update+reboot. The entries look like: > > PUBLIC_IP gluster2.alpserver.de gluster2 > > 192.168.0.50 gluster1 > 192.168.0.51 gluster2 > 192.168.0.52 gluster3 > > 'ping gluster2' resolves to LAN IP; I removed the last entry in the > 1st line, did a reboot ... no, didn't help. From > /var/log/glusterfs/glusterd.log > on gluster 2: > > [2019-03-05 07:04:36.188128] E [MSGID: 106010] > [glusterd-utils.c:3483:glusterd_compare_friend_volume] 0-management: > Version of Cksums persistent differ. 
local cksum = 3950307018, remote > cksum = 455409345 on peer gluster1 > [2019-03-05 07:04:36.188314] I [MSGID: 106493] > [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] 0-glusterd: > Responded to gluster1 (0), ret: 0, op_ret: -1 > > Interestingly there are no entries in the brick logs of the rejected > server. Well, not surprising as no brick process is running. The > server gluster1 is still in rejected state. > > 'gluster volume start workdata force' starts the brick process on > gluster1, and some heals are happening on gluster2+3, but via 'gluster > volume status workdata' the volumes still aren't complete. > > gluster1: > ------------------------------------------------------------------------------ > Brick gluster1:/gluster/md4/workdata 49152 0 Y 2523 > Self-heal Daemon on localhost N/A N/A Y 2549 > > gluster2: > Gluster process TCP Port RDMA Port Online Pid > ------------------------------------------------------------------------------ > Brick gluster2:/gluster/md4/workdata 49153 0 Y 1723 > Brick gluster3:/gluster/md4/workdata 49153 0 Y 2068 > Self-heal Daemon on localhost N/A N/A Y 1732 > Self-heal Daemon on gluster3 N/A N/A Y 2077 > > > Hubert > > Am Di., 5. M?rz 2019 um 07:58 Uhr schrieb Milind Changire : > > > > There are probably DNS entries or /etc/hosts entries with the public IP Addresses that the host names (gluster1, gluster2, gluster3) are getting resolved to. > > /etc/resolv.conf would tell which is the default domain searched for the node names and the DNS servers which respond to the queries. > > > > > > On Tue, Mar 5, 2019 at 12:14 PM Hu Bert wrote: > >> > >> Good morning, > >> > >> i have a replicate 3 setup with 2 volumes, running on version 5.3 on > >> debian stretch. 
This morning i upgraded one server to version 5.4 and > >> rebooted the machine; after the restart i noticed that: > >> > >> - no brick process is running > >> - gluster volume status only shows the server itself: > >> gluster volume status workdata > >> Status of volume: workdata > >> Gluster process TCP Port RDMA Port Online Pid > >> ------------------------------------------------------------------------------ > >> Brick gluster1:/gluster/md4/workdata N/A N/A N N/A > >> NFS Server on localhost N/A N/A N N/A > >> > >> - gluster peer status on the server > >> gluster peer status > >> Number of Peers: 2 > >> > >> Hostname: gluster3 > >> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a > >> State: Peer Rejected (Connected) > >> > >> Hostname: gluster2 > >> Uuid: 162fea82-406a-4f51-81a3-e90235d8da27 > >> State: Peer Rejected (Connected) > >> > >> - gluster peer status on the other 2 servers: > >> gluster peer status > >> Number of Peers: 2 > >> > >> Hostname: gluster1 > >> Uuid: 9a360776-7b58-49ae-831e-a0ce4e4afbef > >> State: Peer Rejected (Connected) > >> > >> Hostname: gluster3 > >> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a > >> State: Peer in Cluster (Connected) > >> > >> I noticed that, in the brick logs, i see that the public IP is used > >> instead of the LAN IP. brick logs from one of the volumes: > >> > >> rejected node: https://pastebin.com/qkpj10Sd > >> connected nodes: https://pastebin.com/8SxVVYFV > >> > >> Why is the public IP suddenly used instead of the LAN IP? Killing all > >> gluster processes and rebooting (again) didn't help. 
> >> > >> > >> Thx, > >> Hubert > >> _______________________________________________ > >> Gluster-users mailing list > >> Gluster-users at gluster.org > >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > -- > > Milind > > From hgowtham at redhat.com Tue Mar 5 07:32:01 2019 From: hgowtham at redhat.com (Hari Gowtham) Date: Tue, 5 Mar 2019 13:02:01 +0530 Subject: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP In-Reply-To: References: Message-ID: Hi, This is a known issue we are working on. As the checksum differs between the updated and non updated node, the peers are getting rejected. The bricks aren't coming because of the same issue. More about the issue: https://bugzilla.redhat.com/show_bug.cgi?id=1685120 On Tue, Mar 5, 2019 at 12:56 PM Hu Bert wrote: > > Interestingly: gluster volume status misses gluster1, while heal > statistics show gluster1: > > gluster volume status workdata > Status of volume: workdata > Gluster process TCP Port RDMA Port Online Pid > ------------------------------------------------------------------------------ > Brick gluster2:/gluster/md4/workdata 49153 0 Y 1723 > Brick gluster3:/gluster/md4/workdata 49153 0 Y 2068 > Self-heal Daemon on localhost N/A N/A Y 1732 > Self-heal Daemon on gluster3 N/A N/A Y 2077 > > vs. > > gluster volume heal workdata statistics heal-count > Gathering count of entries to be healed on volume workdata has been successful > > Brick gluster1:/gluster/md4/workdata > Number of entries: 0 > > Brick gluster2:/gluster/md4/workdata > Number of entries: 10745 > > Brick gluster3:/gluster/md4/workdata > Number of entries: 10744 > > Am Di., 5. M?rz 2019 um 08:18 Uhr schrieb Hu Bert : > > > > Hi Miling, > > > > well, there are such entries, but those haven't been a problem during > > install and the last kernel update+reboot. 
The entries look like: > > > > PUBLIC_IP gluster2.alpserver.de gluster2 > > > > 192.168.0.50 gluster1 > > 192.168.0.51 gluster2 > > 192.168.0.52 gluster3 > > > > 'ping gluster2' resolves to LAN IP; I removed the last entry in the > > 1st line, did a reboot ... no, didn't help. From > > /var/log/glusterfs/glusterd.log > > on gluster 2: > > > > [2019-03-05 07:04:36.188128] E [MSGID: 106010] > > [glusterd-utils.c:3483:glusterd_compare_friend_volume] 0-management: > > Version of Cksums persistent differ. local cksum = 3950307018, remote > > cksum = 455409345 on peer gluster1 > > [2019-03-05 07:04:36.188314] I [MSGID: 106493] > > [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] 0-glusterd: > > Responded to gluster1 (0), ret: 0, op_ret: -1 > > > > Interestingly there are no entries in the brick logs of the rejected > > server. Well, not surprising as no brick process is running. The > > server gluster1 is still in rejected state. > > > > 'gluster volume start workdata force' starts the brick process on > > gluster1, and some heals are happening on gluster2+3, but via 'gluster > > volume status workdata' the volumes still aren't complete. > > > > gluster1: > > ------------------------------------------------------------------------------ > > Brick gluster1:/gluster/md4/workdata 49152 0 Y 2523 > > Self-heal Daemon on localhost N/A N/A Y 2549 > > > > gluster2: > > Gluster process TCP Port RDMA Port Online Pid > > ------------------------------------------------------------------------------ > > Brick gluster2:/gluster/md4/workdata 49153 0 Y 1723 > > Brick gluster3:/gluster/md4/workdata 49153 0 Y 2068 > > Self-heal Daemon on localhost N/A N/A Y 1732 > > Self-heal Daemon on gluster3 N/A N/A Y 2077 > > > > > > Hubert > > > > Am Di., 5. M?rz 2019 um 07:58 Uhr schrieb Milind Changire : > > > > > > There are probably DNS entries or /etc/hosts entries with the public IP Addresses that the host names (gluster1, gluster2, gluster3) are getting resolved to. 
> > > /etc/resolv.conf would tell which is the default domain searched for the node names and the DNS servers which respond to the queries. > > > > > > > > > On Tue, Mar 5, 2019 at 12:14 PM Hu Bert wrote: > > >> > > >> Good morning, > > >> > > >> i have a replicate 3 setup with 2 volumes, running on version 5.3 on > > >> debian stretch. This morning i upgraded one server to version 5.4 and > > >> rebooted the machine; after the restart i noticed that: > > >> [...] > > >> Why is the public IP suddenly used instead of the LAN IP? Killing all > > >> gluster processes and rebooting (again) didn't help.
> > >> > > >> > > >> Thx, > > >> Hubert > > >> _______________________________________________ > > >> Gluster-users mailing list > > >> Gluster-users at gluster.org > > >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > > > > > -- > > > Milind > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Regards, Hari Gowtham. From revirii at googlemail.com Tue Mar 5 07:36:48 2019 From: revirii at googlemail.com (Hu Bert) Date: Tue, 5 Mar 2019 08:36:48 +0100 Subject: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP In-Reply-To: References: Message-ID: Hi Hari, thx for the hint. Do you know when this will be fixed? Is a downgrade 5.4 -> 5.3 a possibility to fix this? Hubert Am Di., 5. M?rz 2019 um 08:32 Uhr schrieb Hari Gowtham : > > Hi, > > This is a known issue we are working on. > As the checksum differs between the updated and non updated node, the > peers are getting rejected. > The bricks aren't coming because of the same issue. > > More about the issue: https://bugzilla.redhat.com/show_bug.cgi?id=1685120 > > On Tue, Mar 5, 2019 at 12:56 PM Hu Bert wrote: > > > > Interestingly: gluster volume status misses gluster1, while heal > > statistics show gluster1: > > > > gluster volume status workdata > > Status of volume: workdata > > Gluster process TCP Port RDMA Port Online Pid > > ------------------------------------------------------------------------------ > > Brick gluster2:/gluster/md4/workdata 49153 0 Y 1723 > > Brick gluster3:/gluster/md4/workdata 49153 0 Y 2068 > > Self-heal Daemon on localhost N/A N/A Y 1732 > > Self-heal Daemon on gluster3 N/A N/A Y 2077 > > > > vs. 
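The checksum mismatch described here is visible in glusterd's on-disk volume store: each volume has a small cksum file holding an "info=<checksum>" line. The path and format below are assumptions based on the log message; verify them on an actual node before relying on this. A self-contained simulation of the comparison that fails:

```shell
# Simulate the per-volume checksum file (normally at
# /var/lib/glusterd/vols/<volname>/cksum -- assumed path) for an
# updated and a non-updated peer, using the values from the
# glusterd.log line quoted in this thread.
dir=$(mktemp -d)
mkdir -p "$dir/gluster1/vols/workdata" "$dir/gluster2/vols/workdata"
echo "info=3950307018" > "$dir/gluster1/vols/workdata/cksum"  # local cksum (5.4 node)
echo "info=455409345"  > "$dir/gluster2/vols/workdata/cksum"  # remote cksum (5.3 node)
a=$(cut -d= -f2 "$dir/gluster1/vols/workdata/cksum")
b=$(cut -d= -f2 "$dir/gluster2/vols/workdata/cksum")
# glusterd rejects the peer when these values differ:
if [ "$a" != "$b" ]; then
    echo "Version of Cksums differ: $a vs $b -> Peer Rejected"
fi
```

On a live cluster, comparing the real cksum files across the peers (for example over ssh) should show which node disagrees.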
> > > > gluster volume heal workdata statistics heal-count > > Gathering count of entries to be healed on volume workdata has been successful > > > > Brick gluster1:/gluster/md4/workdata > > Number of entries: 0 > > > > Brick gluster2:/gluster/md4/workdata > > Number of entries: 10745 > > > > Brick gluster3:/gluster/md4/workdata > > Number of entries: 10744 > > > > Am Di., 5. M?rz 2019 um 08:18 Uhr schrieb Hu Bert : > > > > > > Hi Miling, > > > > > > well, there are such entries, but those haven't been a problem during > > > install and the last kernel update+reboot. The entries look like: > > > > > > PUBLIC_IP gluster2.alpserver.de gluster2 > > > > > > 192.168.0.50 gluster1 > > > 192.168.0.51 gluster2 > > > 192.168.0.52 gluster3 > > > > > > 'ping gluster2' resolves to LAN IP; I removed the last entry in the > > > 1st line, did a reboot ... no, didn't help. From > > > /var/log/glusterfs/glusterd.log > > > on gluster 2: > > > > > > [2019-03-05 07:04:36.188128] E [MSGID: 106010] > > > [glusterd-utils.c:3483:glusterd_compare_friend_volume] 0-management: > > > Version of Cksums persistent differ. local cksum = 3950307018, remote > > > cksum = 455409345 on peer gluster1 > > > [2019-03-05 07:04:36.188314] I [MSGID: 106493] > > > [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] 0-glusterd: > > > Responded to gluster1 (0), ret: 0, op_ret: -1 > > > > > > Interestingly there are no entries in the brick logs of the rejected > > > server. Well, not surprising as no brick process is running. The > > > server gluster1 is still in rejected state. > > > > > > 'gluster volume start workdata force' starts the brick process on > > > gluster1, and some heals are happening on gluster2+3, but via 'gluster > > > volume status workdata' the volumes still aren't complete. 
> > > > > > gluster1: > > > ------------------------------------------------------------------------------ > > > Brick gluster1:/gluster/md4/workdata 49152 0 Y 2523 > > > Self-heal Daemon on localhost N/A N/A Y 2549 > > > > > > gluster2: > > > Gluster process TCP Port RDMA Port Online Pid > > > ------------------------------------------------------------------------------ > > > Brick gluster2:/gluster/md4/workdata 49153 0 Y 1723 > > > Brick gluster3:/gluster/md4/workdata 49153 0 Y 2068 > > > Self-heal Daemon on localhost N/A N/A Y 1732 > > > Self-heal Daemon on gluster3 N/A N/A Y 2077 > > > > > > > > > Hubert > > > > > > Am Di., 5. M?rz 2019 um 07:58 Uhr schrieb Milind Changire : > > > > > > > > There are probably DNS entries or /etc/hosts entries with the public IP Addresses that the host names (gluster1, gluster2, gluster3) are getting resolved to. > > > > /etc/resolv.conf would tell which is the default domain searched for the node names and the DNS servers which respond to the queries. > > > > > > > > > > > > On Tue, Mar 5, 2019 at 12:14 PM Hu Bert wrote: > > > >> > > > >> Good morning, > > > >> > > > >> i have a replicate 3 setup with 2 volumes, running on version 5.3 on > > > >> debian stretch. 
This morning i upgraded one server to version 5.4 and > > > >> rebooted the machine; after the restart i noticed that: > > > >> [...] > > > >> Why is the public IP suddenly used instead of the LAN IP? Killing all > > > >> gluster processes and rebooting (again) didn't help.
> > > >> > > > >> Thx, > > > >> Hubert > > > >> _______________________________________________ > > > >> Gluster-users mailing list > > > >> Gluster-users at gluster.org > > > >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > > > > > -- > > > Milind > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Regards, > Hari Gowtham. From sabose at redhat.com Tue Mar 5 07:50:20 2019 From: sabose at redhat.com (Sahina Bose) Date: Tue, 5 Mar 2019 13:20:20 +0530 Subject: [Gluster-users] [ovirt-users] Re: Advice around ovirt 4.3 / gluster 5.x In-Reply-To: References: Message-ID: Adding gluster ml On Mon, Mar 4, 2019 at 7:17 AM Guillaume Pavese wrote: > > I got that too so upgraded to gluster6-rc0 but still, this morning one engine brick is down : > > [2019-03-04 01:33:22.492206] E [MSGID: 101191] [event-epoll.c:765:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler > [2019-03-04 01:38:34.601381] I [addr.c:54:compare_addr_and_update] 0-/gluster_bricks/engine/engine: allowed = "*", received addr = "10.199.211.5" > [2019-03-04 01:38:34.601410] I [login.c:110:gf_auth] 0-auth/login: allowed user names: 9e360b5b-34d3-4076-bc7e-ed78e4e0dc01 > [2019-03-04 01:38:34.601421] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-engine-server: accepted client from CTX_ID:f7603ec6-9914-408b-85e6-e64e9844e326-GRAPH_ID:0-PID:300490-HOST:ps-inf-int-kvm-fr-305-210.hostics.fr-PC_NAME:engine-client-0-RECON_NO:-0 (version: 6.0rc0) with subvol /gluster_bricks/engine/engine > [2019-03-04 01:38:34.610400] I [MSGID: 115036] [server.c:498:server_rpc_notify] 0-engine-server: disconnecting connection from CTX_ID:f7603ec6-9914-408b-85e6-e64e9844e326-GRAPH_ID:0-PID:300490-HOST:ps-inf-int-kvm-fr-305-210.hostics.fr-PC_NAME:engine-client-0-RECON_NO:-0 > [2019-03-04 01:38:34.610531] I [MSGID: 101055]
[client_t.c:436:gf_client_unref] 0-engine-server: Shutting down connection CTX_ID:f7603ec6-9914-408b-85e6-e64e9844e326-GRAPH_ID:0-PID:300490-HOST:ps-inf-int-kvm-fr-305-210.hostics.fr-PC_NAME:engine-client-0-RECON_NO:-0 > [2019-03-04 01:38:34.610574] E [MSGID: 101191] [event-epoll.c:765:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler > [2019-03-04 01:39:18.520347] I [addr.c:54:compare_addr_and_update] 0-/gluster_bricks/engine/engine: allowed = "*", received addr = "10.199.211.5" > [2019-03-04 01:39:18.520373] I [login.c:110:gf_auth] 0-auth/login: allowed user names: 9e360b5b-34d3-4076-bc7e-ed78e4e0dc01 > [2019-03-04 01:39:18.520383] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-engine-server: accepted client from CTX_ID:f3be82ea-6340-4bd4-afb3-aa9db432f779-GRAPH_ID:0-PID:300885-HOST:ps-inf-int-kvm-fr-305-210.hostics.fr-PC_NAME:engine-client-0-RECON_NO:-0 (version: 6.0rc0) with subvol /gluster_bricks/engine/engine > [2019-03-04 01:39:19.711947] I [MSGID: 115036] [server.c:498:server_rpc_notify] 0-engine-server: disconnecting connection from CTX_ID:f3be82ea-6340-4bd4-afb3-aa9db432f779-GRAPH_ID:0-PID:300885-HOST:ps-inf-int-kvm-fr-305-210.hostics.fr-PC_NAME:engine-client-0-RECON_NO:-0 > [2019-03-04 01:39:19.712431] I [MSGID: 101055] [client_t.c:436:gf_client_unref] 0-engine-server: Shutting down connection CTX_ID:f3be82ea-6340-4bd4-afb3-aa9db432f779-GRAPH_ID:0-PID:300885-HOST:ps-inf-int-kvm-fr-305-210.hostics.fr-PC_NAME:engine-client-0-RECON_NO:-0 > [2019-03-04 01:39:19.712484] E [MSGID: 101191] [event-epoll.c:765:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler > (END) > > > Guillaume Pavese > Ingénieur Système et Réseau > Interactiv-Group > > > On Mon, Mar 4, 2019 at 3:56 AM Endre Karlson wrote: >> >> I have tried bumping to 5.4 now and still getting a lot of "Failed Eventhandler" errors in the logs, any ideas guys? >> >> Den søn. 3. mar. 2019 kl.
09:03 skrev Guillaume Pavese : >>> >>> Gluster 5.4 is released but not yet in the official repository >>> If, like me, you cannot wait for the official release of Gluster 5.4 with the instability bugfixes (planned for around March 12 hopefully), you can use the following repository : >>> >>> For Gluster 5.4-1 : >>> >>> #/etc/yum.repos.d/Gluster5-Testing.repo >>> [Gluster5-Testing] >>> name=Gluster5-Testing $basearch >>> baseurl=https://cbs.centos.org/repos/storage7-gluster-5-testing/os/$basearch/ >>> enabled=1 >>> #metadata_expire=60m >>> gpgcheck=0 >>> >>> >>> If adventurous ;) Gluster 6-rc0 : >>> >>> #/etc/yum.repos.d/Gluster6-Testing.repo >>> [Gluster6-Testing] >>> name=Gluster6-Testing $basearch >>> baseurl=https://cbs.centos.org/repos/storage7-gluster-6-testing/os/$basearch/ >>> enabled=1 >>> #metadata_expire=60m >>> gpgcheck=0 >>> >>> >>> GLHF >>> >>> Guillaume Pavese >>> Ingénieur Système et Réseau >>> Interactiv-Group >>> >>> >>> On Sun, Mar 3, 2019 at 6:16 AM Endre Karlson wrote: >>>> >>>> Hi, should we downgrade / reinstall our cluster? we have a 4 node cluster that's breaking apart daily due to the issues with GlusterFS after upgrading from 4.2.8 that was rock solid. I am wondering why 4.3 was released as a stable version at all??
**FRUSTRATION** >>>> >>>> Endre >>>> _______________________________________________ >>>> Users mailing list -- users at ovirt.org >>>> To unsubscribe send an email to users-leave at ovirt.org >>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/ >>>> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ >>>> List Archives: https://lists.ovirt.org/archives/list/users at ovirt.org/message/3TJKJGGWCANXWZED2WF5ZHTSRS2DVHR2/ > > _______________________________________________ > Users mailing list -- users at ovirt.org > To unsubscribe send an email to users-leave at ovirt.org > Privacy Statement: https://www.ovirt.org/site/privacy-policy/ > oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ > List Archives: https://lists.ovirt.org/archives/list/users at ovirt.org/message/53PH4H7HNDVQOTJSYYUO77KPFUH2TOPT/ From hgowtham at redhat.com Tue Mar 5 08:26:09 2019 From: hgowtham at redhat.com (Hari Gowtham) Date: Tue, 5 Mar 2019 13:56:09 +0530 Subject: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP In-Reply-To: References: Message-ID: There are plans to revert the patch causing this error and rebuild 5.4. This should happen soon; the rebuilt 5.4 should be free of this upgrade issue. In the meantime, you can use 5.3 for this cluster. Downgrading to 5.3 will work if it was just one node that was upgraded to 5.4 and the other nodes are still on 5.3. On Tue, Mar 5, 2019 at 1:07 PM Hu Bert wrote: > > Hi Hari, > > thx for the hint. Do you know when this will be fixed? Is a downgrade > 5.4 -> 5.3 a possibility to fix this? > > Hubert > > Am Di., 5. März 2019 um 08:32 Uhr schrieb Hari Gowtham : > > > > Hi, > > > > This is a known issue we are working on. > > As the checksum differs between the updated and non updated node, the > > peers are getting rejected. > > The bricks aren't coming because of the same issue.
> > > > More about the issue: https://bugzilla.redhat.com/show_bug.cgi?id=1685120 > > > > On Tue, Mar 5, 2019 at 12:56 PM Hu Bert wrote: > > > > > > Interestingly: gluster volume status misses gluster1, while heal > > > statistics show gluster1: > > > > > > gluster volume status workdata > > > Status of volume: workdata > > > Gluster process TCP Port RDMA Port Online Pid > > > ------------------------------------------------------------------------------ > > > Brick gluster2:/gluster/md4/workdata 49153 0 Y 1723 > > > Brick gluster3:/gluster/md4/workdata 49153 0 Y 2068 > > > Self-heal Daemon on localhost N/A N/A Y 1732 > > > Self-heal Daemon on gluster3 N/A N/A Y 2077 > > > > > > vs. > > > > > > gluster volume heal workdata statistics heal-count > > > Gathering count of entries to be healed on volume workdata has been successful > > > > > > Brick gluster1:/gluster/md4/workdata > > > Number of entries: 0 > > > > > > Brick gluster2:/gluster/md4/workdata > > > Number of entries: 10745 > > > > > > Brick gluster3:/gluster/md4/workdata > > > Number of entries: 10744 > > > > > > Am Di., 5. M?rz 2019 um 08:18 Uhr schrieb Hu Bert : > > > > > > > > Hi Miling, > > > > > > > > well, there are such entries, but those haven't been a problem during > > > > install and the last kernel update+reboot. The entries look like: > > > > > > > > PUBLIC_IP gluster2.alpserver.de gluster2 > > > > > > > > 192.168.0.50 gluster1 > > > > 192.168.0.51 gluster2 > > > > 192.168.0.52 gluster3 > > > > > > > > 'ping gluster2' resolves to LAN IP; I removed the last entry in the > > > > 1st line, did a reboot ... no, didn't help. From > > > > /var/log/glusterfs/glusterd.log > > > > on gluster 2: > > > > > > > > [2019-03-05 07:04:36.188128] E [MSGID: 106010] > > > > [glusterd-utils.c:3483:glusterd_compare_friend_volume] 0-management: > > > > Version of Cksums persistent differ. 
local cksum = 3950307018, remote > > > > cksum = 455409345 on peer gluster1 > > > > [2019-03-05 07:04:36.188314] I [MSGID: 106493] > > > > [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] 0-glusterd: > > > > Responded to gluster1 (0), ret: 0, op_ret: -1 > > > > > > > > Interestingly there are no entries in the brick logs of the rejected > > > > server. Well, not surprising as no brick process is running. The > > > > server gluster1 is still in rejected state. > > > > > > > > 'gluster volume start workdata force' starts the brick process on > > > > gluster1, and some heals are happening on gluster2+3, but via 'gluster > > > > volume status workdata' the volumes still aren't complete. > > > > > > > > gluster1: > > > > ------------------------------------------------------------------------------ > > > > Brick gluster1:/gluster/md4/workdata 49152 0 Y 2523 > > > > Self-heal Daemon on localhost N/A N/A Y 2549 > > > > > > > > gluster2: > > > > Gluster process TCP Port RDMA Port Online Pid > > > > ------------------------------------------------------------------------------ > > > > Brick gluster2:/gluster/md4/workdata 49153 0 Y 1723 > > > > Brick gluster3:/gluster/md4/workdata 49153 0 Y 2068 > > > > Self-heal Daemon on localhost N/A N/A Y 1732 > > > > Self-heal Daemon on gluster3 N/A N/A Y 2077 > > > > > > > > > > > > Hubert > > > > > > > > Am Di., 5. M?rz 2019 um 07:58 Uhr schrieb Milind Changire : > > > > > > > > > > There are probably DNS entries or /etc/hosts entries with the public IP Addresses that the host names (gluster1, gluster2, gluster3) are getting resolved to. > > > > > /etc/resolv.conf would tell which is the default domain searched for the node names and the DNS servers which respond to the queries. > > > > > > > > > > > > > > > On Tue, Mar 5, 2019 at 12:14 PM Hu Bert wrote: > > > > >> > > > > >> Good morning, > > > > >> > > > > >> i have a replicate 3 setup with 2 volumes, running on version 5.3 on > > > > >> debian stretch. 
This morning i upgraded one server to version 5.4 and > > > > >> rebooted the machine; after the restart i noticed that: > > > > >> [...] > > > > >> Why is the public IP suddenly used instead of the LAN IP? Killing all > > > > >> gluster processes and rebooting (again) didn't help.
> > > > >> > > > > >> Thx, > > > > >> Hubert > > > > >> _______________________________________________ > > > > >> Gluster-users mailing list > > > > >> Gluster-users at gluster.org > > > > >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > > > > > > > > > -- > > > > > Milind > > > > > > > > _______________________________________________ > > > Gluster-users mailing list > > > Gluster-users at gluster.org > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > -- > > Regards, > > Hari Gowtham. -- Regards, Hari Gowtham. From aspandey at redhat.com Tue Mar 5 08:51:43 2019 From: aspandey at redhat.com (Ashish Pandey) Date: Tue, 5 Mar 2019 03:51:43 -0500 (EST) Subject: [Gluster-users] Gluster : Improvements on "heal info" command In-Reply-To: <1555599631.5547169.1551771451185.JavaMail.zimbra@redhat.com> Message-ID: <1136179364.5559263.1551775903014.JavaMail.zimbra@redhat.com> Hi All, We have observed and heard from gluster users about how long the "heal info" command takes. Even when all we want to know is whether a gluster volume is healthy or not, it takes time to list all the files from all the bricks before we can be sure. Here, we have come up with some options for the "heal info" command which provide a report quickly and reliably. gluster v heal vol info --subvol=[number of the subvol] --brick=[one,all] -------- Problem: The "gluster v heal info" command picks each subvolume and checks the .glusterfs/indices/xattrop folder of every brick of that subvolume to find out if there is any entry which needs to be healed. It picks each entry and takes a lock on it to check its xattrs and find out whether it actually needs heal. This LOCK->CHECK-XATTR->UNLOCK cycle takes a lot of time for each file. Let's consider the two most common cases in which we use "heal info" and try to understand the improvements.
Case -1 : Consider a 4+2 EC volume with all the bricks on 6 different nodes. One brick of the volume is down and a client has written 10000 files on one of the mount points of this volume. Entries for these 10K files will be created in ".glusterfs/indices/xattrop" on the remaining 5 bricks. Now the brick comes UP, and when we use the "heal info" command for this volume, it goes to all the bricks, picks up these 10K file entries and goes through the LOCK->CHECK-XATTR->UNLOCK cycle for all the files. This happens for all the bricks; that means we check 50K files and perform the LOCK->CHECK-XATTR->UNLOCK cycle 50K times, while checking only 10K entries would have been sufficient. It is a very time-consuming operation. If I/O is happening on some new files, we check those files as well, which adds to the time. Here, all we wanted to know is whether our volume has healed and is healthy. Solution : Whenever a brick goes down and comes up and we use the "heal info" command, our *main intention* is to find out if the volume is *healthy* or *unhealthy*. A volume is unhealthy even if one file is not healthy. So, we should scan the bricks one by one, and as soon as we find one brick with entries that need to be healed, we can stop, list those files and say that the volume is not healthy. There is no need to scan the rest of the bricks. That's where the "--brick=[one,all]" option has been introduced. "gluster v heal vol info --brick=[one,all]" "one" - It will scan the bricks sequentially and, as soon as it finds any unhealthy entries, list them and stop scanning the other bricks. "all" - It will act just like the current behavior and provide all the files from all the bricks. If we do not provide this option, the default (current) behavior applies. Case -2 : Consider a 24 x (4+2) EC volume. Let's say one brick from *only one* of the sub volumes has been replaced and a heal has been triggered.
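As a rough illustration of the "--brick=one" early exit proposed for Case-1 (a local simulation with plain directories standing in for the xattrop index, scaled down to 100 entries per brick; this is not glusterd code):

```shell
# Simulated Case-1: 5 bricks hold 100 pending index entries each, and the
# brick that was down (brick-5) holds none. Each entry counted stands in
# for one LOCK->CHECK-XATTR->UNLOCK cycle.
dir=$(mktemp -d)
for i in 0 1 2 3 4; do
    mkdir -p "$dir/brick-$i/.glusterfs/indices/xattrop"
    for n in $(seq 1 100); do
        : > "$dir/brick-$i/.glusterfs/indices/xattrop/file-$n"
    done
done
mkdir -p "$dir/brick-5/.glusterfs/indices/xattrop"

# Current behaviour: count every entry on every brick (500 cycles here).
cycles_all=$(find "$dir"/brick-*/.glusterfs/indices/xattrop -type f | wc -l)

# Proposed --brick=one: stop at the first brick with pending entries --
# one unhealthy file already makes the whole volume unhealthy (100 cycles).
cycles_one=0
for b in "$dir"/brick-*; do
    n=$(find "$b/.glusterfs/indices/xattrop" -type f | wc -l)
    if [ "$n" -gt 0 ]; then cycles_one=$n; break; fi
done
echo "cycles: all=$cycles_all one=$cycles_one"
```

In this simulation the early-exit scan does a fifth of the work while still answering the only question asked: is the volume healthy?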
To know if the volume is in a healthy state, we go to each brick of *each and every sub volume* and check whether there are any entries in the ".glusterfs/indices/xattrop" folder that need heal. If we know which sub volume participated in the brick replacement, we just need to check the health of that sub volume and not query/check the other sub volumes. If several clients are writing a number of files on this volume, an entry for each of these files will be created in .glusterfs/indices/xattrop, and the "heal info" command will go through the LOCK->CHECK-XATTR->UNLOCK cycle for each of them to find out whether it needs heal, which takes a lot of time. In addition to this, a client will also see a performance drop, as it will have to release and take the lock again. Solution: Provide an option to specify the sub volume for which we want to check heal info. "gluster v heal vol info --subvol= " Here, --subvol is given the number of the subvolume we want to check. Example: "gluster v heal vol info --subvol=1 " =================================== Performance Data - A quick performance test done on a standalone system.
Type: Distributed-Disperse Volume ID: ea40eb13-d42c-431c-9c89-0153e834e67e Status: Started Snapshot Count: 0 Number of Bricks: 2 x (4 + 2) = 12 Transport-type: tcp Bricks: Brick1: apandey:/home/apandey/bricks/gluster/vol-1 Brick2: apandey:/home/apandey/bricks/gluster/vol-2 Brick3: apandey:/home/apandey/bricks/gluster/vol-3 Brick4: apandey:/home/apandey/bricks/gluster/vol-4 Brick5: apandey:/home/apandey/bricks/gluster/vol-5 Brick6: apandey:/home/apandey/bricks/gluster/vol-6 Brick7: apandey:/home/apandey/bricks/gluster/new-1 Brick8: apandey:/home/apandey/bricks/gluster/new-2 Brick9: apandey:/home/apandey/bricks/gluster/new-3 Brick10: apandey:/home/apandey/bricks/gluster/new-4 Brick11: apandey:/home/apandey/bricks/gluster/new-5 Brick12: apandey:/home/apandey/bricks/gluster/new-6 Just disabled the shd to get the data - Killed one brick each from two subvolumes and wrote 2000 files on the mount point. [root at apandey vol]# for i in {1..2000};do echo abc >> file-$i; done Started the volume using the force option and got the heal info. Following is the data - [root at apandey glusterfs]# time gluster v heal vol info --brick=one >> /dev/null <<<<<<<< This will scan the bricks one by one and come out as soon as we find the volume is unhealthy. real 0m8.316s user 0m2.241s sys 0m1.278s [root at apandey glusterfs]# [root at apandey glusterfs]# time gluster v heal vol info >> /dev/null <<<<<<<< This is the current behavior. real 0m26.097s user 0m10.868s sys 0m6.198s [root at apandey glusterfs]# =================================== I would like your comments/suggestions on these improvements. Especially, I would like to hear your views on the new syntax of the command - gluster v heal vol info --subvol=[number of the subvol] --brick=[one,all] Note that if we do not provide the new options, the command will behave just like it does right now. Also, this improvement is valid for both AFR and EC. --- Ashish -------------- next part -------------- An HTML attachment was scrubbed...
URL: From rgowdapp at redhat.com Tue Mar 5 09:13:43 2019 From: rgowdapp at redhat.com (Raghavendra Gowdappa) Date: Tue, 5 Mar 2019 14:43:43 +0530 Subject: [Gluster-users] Experiences with FUSE in real world - Presentation at Vault 2019 Message-ID: All, Recently, Manoj, Csaba and I presented on the positives and negatives of implementing file systems in userspace using FUSE [1]. We based the talk on our experiences with Glusterfs, which has FUSE as its native interface. The slides can also be found at [1]. [1] https://www.usenix.org/conference/vault19/presentation/pillai regards, Raghavendra -------------- next part -------------- An HTML attachment was scrubbed... URL: From revirii at googlemail.com Tue Mar 5 10:01:42 2019 From: revirii at googlemail.com (Hu Bert) Date: Tue, 5 Mar 2019 11:01:42 +0100 Subject: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP In-Reply-To: References: Message-ID: fyi: did a downgrade 5.4 -> 5.3 and it worked. all replicas are up and running. Awaiting the updated v5.4. thx :-) Am Di., 5. März 2019 um 09:26 Uhr schrieb Hari Gowtham : > > There are plans to revert the patch causing this error and rebuild 5.4. > This should happen soon; the rebuilt 5.4 should be free of this upgrade issue. > > In the meantime, you can use 5.3 for this cluster. > Downgrading to 5.3 will work if it was just one node that was upgraded to 5.4 > and the other nodes are still on 5.3. > > On Tue, Mar 5, 2019 at 1:07 PM Hu Bert wrote: > > > > Hi Hari, > > > > thx for the hint. Do you know when this will be fixed? Is a downgrade > > 5.4 -> 5.3 a possibility to fix this? > > > > Hubert > > > > Am Di., 5. März 2019 um 08:32 Uhr schrieb Hari Gowtham : > > > > > > Hi, > > > > > > This is a known issue we are working on. > > > As the checksum differs between the updated and non updated node, the > > > peers are getting rejected. > > > The bricks aren't coming because of the same issue.
> > > > > > More about the issue: https://bugzilla.redhat.com/show_bug.cgi?id=1685120 > > > > > > On Tue, Mar 5, 2019 at 12:56 PM Hu Bert wrote: > > > > > > > > Interestingly: gluster volume status misses gluster1, while heal > > > > statistics show gluster1: > > > > > > > > gluster volume status workdata > > > > Status of volume: workdata > > > > Gluster process TCP Port RDMA Port Online Pid > > > > ------------------------------------------------------------------------------ > > > > Brick gluster2:/gluster/md4/workdata 49153 0 Y 1723 > > > > Brick gluster3:/gluster/md4/workdata 49153 0 Y 2068 > > > > Self-heal Daemon on localhost N/A N/A Y 1732 > > > > Self-heal Daemon on gluster3 N/A N/A Y 2077 > > > > > > > > vs. > > > > > > > > gluster volume heal workdata statistics heal-count > > > > Gathering count of entries to be healed on volume workdata has been successful > > > > > > > > Brick gluster1:/gluster/md4/workdata > > > > Number of entries: 0 > > > > > > > > Brick gluster2:/gluster/md4/workdata > > > > Number of entries: 10745 > > > > > > > > Brick gluster3:/gluster/md4/workdata > > > > Number of entries: 10744 > > > > > > > > Am Di., 5. M?rz 2019 um 08:18 Uhr schrieb Hu Bert : > > > > > > > > > > Hi Miling, > > > > > > > > > > well, there are such entries, but those haven't been a problem during > > > > > install and the last kernel update+reboot. The entries look like: > > > > > > > > > > PUBLIC_IP gluster2.alpserver.de gluster2 > > > > > > > > > > 192.168.0.50 gluster1 > > > > > 192.168.0.51 gluster2 > > > > > 192.168.0.52 gluster3 > > > > > > > > > > 'ping gluster2' resolves to LAN IP; I removed the last entry in the > > > > > 1st line, did a reboot ... no, didn't help. From > > > > > /var/log/glusterfs/glusterd.log > > > > > on gluster 2: > > > > > > > > > > [2019-03-05 07:04:36.188128] E [MSGID: 106010] > > > > > [glusterd-utils.c:3483:glusterd_compare_friend_volume] 0-management: > > > > > Version of Cksums persistent differ. 
local cksum = 3950307018, remote > > > > > cksum = 455409345 on peer gluster1 > > > > > [2019-03-05 07:04:36.188314] I [MSGID: 106493] > > > > > [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] 0-glusterd: > > > > > Responded to gluster1 (0), ret: 0, op_ret: -1 > > > > > > > > > > Interestingly there are no entries in the brick logs of the rejected > > > > > server. Well, not surprising as no brick process is running. The > > > > > server gluster1 is still in rejected state. > > > > > > > > > > 'gluster volume start workdata force' starts the brick process on > > > > > gluster1, and some heals are happening on gluster2+3, but via 'gluster > > > > > volume status workdata' the volumes still aren't complete. > > > > > > > > > > gluster1: > > > > > ------------------------------------------------------------------------------ > > > > > Brick gluster1:/gluster/md4/workdata 49152 0 Y 2523 > > > > > Self-heal Daemon on localhost N/A N/A Y 2549 > > > > > > > > > > gluster2: > > > > > Gluster process TCP Port RDMA Port Online Pid > > > > > ------------------------------------------------------------------------------ > > > > > Brick gluster2:/gluster/md4/workdata 49153 0 Y 1723 > > > > > Brick gluster3:/gluster/md4/workdata 49153 0 Y 2068 > > > > > Self-heal Daemon on localhost N/A N/A Y 1732 > > > > > Self-heal Daemon on gluster3 N/A N/A Y 2077 > > > > > > > > > > > > > > > Hubert > > > > > > > > > > Am Di., 5. M?rz 2019 um 07:58 Uhr schrieb Milind Changire : > > > > > > > > > > > > There are probably DNS entries or /etc/hosts entries with the public IP Addresses that the host names (gluster1, gluster2, gluster3) are getting resolved to. > > > > > > /etc/resolv.conf would tell which is the default domain searched for the node names and the DNS servers which respond to the queries. 
> > > > > > > > > > > > > > > > > > On Tue, Mar 5, 2019 at 12:14 PM Hu Bert wrote: > > > > > >> > > > > > >> Good morning, > > > > > >> > > > > > >> i have a replicate 3 setup with 2 volumes, running on version 5.3 on > > > > > >> debian stretch. This morning i upgraded one server to version 5.4 and > > > > > >> rebooted the machine; after the restart i noticed that: > > > > > >> > > > > > >> - no brick process is running > > > > > >> - gluster volume status only shows the server itself: > > > > > >> gluster volume status workdata > > > > > >> Status of volume: workdata > > > > > >> Gluster process TCP Port RDMA Port Online Pid > > > > > >> ------------------------------------------------------------------------------ > > > > > >> Brick gluster1:/gluster/md4/workdata N/A N/A N N/A > > > > > >> NFS Server on localhost N/A N/A N N/A > > > > > >> > > > > > >> - gluster peer status on the server > > > > > >> gluster peer status > > > > > >> Number of Peers: 2 > > > > > >> > > > > > >> Hostname: gluster3 > > > > > >> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a > > > > > >> State: Peer Rejected (Connected) > > > > > >> > > > > > >> Hostname: gluster2 > > > > > >> Uuid: 162fea82-406a-4f51-81a3-e90235d8da27 > > > > > >> State: Peer Rejected (Connected) > > > > > >> > > > > > >> - gluster peer status on the other 2 servers: > > > > > >> gluster peer status > > > > > >> Number of Peers: 2 > > > > > >> > > > > > >> Hostname: gluster1 > > > > > >> Uuid: 9a360776-7b58-49ae-831e-a0ce4e4afbef > > > > > >> State: Peer Rejected (Connected) > > > > > >> > > > > > >> Hostname: gluster3 > > > > > >> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a > > > > > >> State: Peer in Cluster (Connected) > > > > > >> > > > > > >> I noticed that, in the brick logs, i see that the public IP is used > > > > > >> instead of the LAN IP. 
brick logs from one of the volumes: > > > > > >> > > > > > >> rejected node: https://pastebin.com/qkpj10Sd > > > > > >> connected nodes: https://pastebin.com/8SxVVYFV > > > > > >> > > > > > >> Why is the public IP suddenly used instead of the LAN IP? Killing all > > > > > >> gluster processes and rebooting (again) didn't help. > > > > > >> > > > > > >> > > > > > >> Thx, > > > > > >> Hubert > > > > > >> _______________________________________________ > > > > > >> Gluster-users mailing list > > > > > >> Gluster-users at gluster.org > > > > > >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > Milind > > > > > > > > > > _______________________________________________ > > > > Gluster-users mailing list > > > > Gluster-users at gluster.org > > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > > > > > -- > > > Regards, > > > Hari Gowtham. > > > > -- > Regards, > Hari Gowtham. From ville-pekka.vainio at csc.fi Tue Mar 5 13:13:53 2019 From: ville-pekka.vainio at csc.fi (Ville-Pekka Vainio) Date: Tue, 5 Mar 2019 15:13:53 +0200 Subject: [Gluster-users] Constant fuse client crashes "fixed" by setting performance.write-behind: off. Any hope for a 4.1.8 release? Message-ID: <221F501D-887A-47D7-B2FE-7F7A2AAE08C8@csc.fi> Hi all, We?ve seen intermittent Gluster fuse client crashes with Gluster 4.1.7 on CentOS 7.6.1810 when using a Distributed-Replicate volume. Today, for some reason, the fuse client started crashing constantly. I was able to work around the crashes by using performance.write-behind: off, which was mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1671556 The bug report has some patches attached to it. Is there any hope for a 4.1.8 release with these fixes? https://www.gluster.org/release-schedule/ says 4.1.7 is an EOL version. 
Best regards, Ville-Pekka Vainio From khiremat at redhat.com Tue Mar 5 15:13:23 2019 From: khiremat at redhat.com (Kotresh Hiremath Ravishankar) Date: Tue, 5 Mar 2019 20:43:23 +0530 Subject: [Gluster-users] [Gluster-devel] Bitrot: Time of signing depending on the file size??? In-Reply-To: References: Message-ID: Hi David, Thanks for raising the bug. But from the above validation, it's clear that bitrot is not directly involved. Bitrot waits for the last fd to be closed. We will have to investigate the reason for the fd not being closed for large files. Thanks, Kotresh HR On Mon, Mar 4, 2019 at 3:13 PM David Spisla wrote: > Hello Kotresh, > > Yes, the fd was still open for larger files. I could verify this with a > 500MiB file and some smaller files. After a specific time only the fd for > the 500MiB file was still up and the file had no signature yet; for the smaller > files there were no fds and they already had a signature. I don't know the > reason for this. Maybe the client still keeps the fd open? I opened a bug for > this: > https://bugzilla.redhat.com/show_bug.cgi?id=1685023 > > Regards > David > > Am Fr., 1. März 2019 um 18:29 Uhr schrieb Kotresh Hiremath Ravishankar < > khiremat at redhat.com>: > >> Interesting observation! But as discussed in the thread, the bitrot signing >> process depends on a 2 min timeout (by default) after the last fd closes. It >> doesn't have any correlation with the size of the file. >> Did you happen to verify that the fd was still open for large files for >> some reason? >> >> >> >> On Fri, Mar 1, 2019 at 1:19 PM David Spisla wrote: >> >>> Hello folks, >>> >>> I made some observations concerning the bitrot daemon. It seems >>> that the bitrot signer signs files depending on file size. I copied >>> files with different sizes into a volume and I was wondering because the >>> files do not get their signature at the same time (I keep the expiry time at the default >>> of 120). 
Here are some examples: >>> >>> 300 KB file ~2-3 m >>> 70 MB file ~ 40 m >>> 115 MB file ~ 1.5 h >>> 800 MB file ~ 4.5 h >>> >>> What is the expected behaviour here? >>> Why does it take so long to sign an 800MB file? >>> What about 500GB or 1TB? >>> Is there a way to speed up the signing process? >>> >>> My ambition is to understand this observation. >>> >>> Regards >>> David Spisla >>> _______________________________________________ >>> Gluster-devel mailing list >>> Gluster-devel at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-devel >> >> >> >> -- >> Thanks and Regards, >> Kotresh H R >> > -- Thanks and Regards, Kotresh H R -------------- next part -------------- An HTML attachment was scrubbed... URL: From spisla80 at gmail.com Tue Mar 5 15:30:18 2019 From: spisla80 at gmail.com (David Spisla) Date: Tue, 5 Mar 2019 16:30:18 +0100 Subject: [Gluster-users] [Gluster-devel] Bitrot: Time of signing depending on the file size??? In-Reply-To: References: Message-ID: Hello Kotresh, alright, I have updated the "Component" field in the bug to 'core'. I am looking forward to fixing this bug. Regards David Spisla Am Di., 5. März 2019 um 16:13 Uhr schrieb Kotresh Hiremath Ravishankar < khiremat at redhat.com>: > Hi David, > > Thanks for raising the bug. But from the above validation, it's clear that > bitrot is not directly involved. Bitrot waits for the last fd to be closed. We > will have to investigate the reason for the fd not being closed for large files. > > Thanks, > Kotresh HR > > On Mon, Mar 4, 2019 at 3:13 PM David Spisla wrote: > >> Hello Kotresh, >> >> Yes, the fd was still open for larger files. I could verify this with a >> 500MiB file and some smaller files. After a specific time only the fd for >> the 500MiB file was still up and the file had no signature yet; for the smaller >> files there were no fds and they already had a signature. I don't know the >> reason for this. Maybe the client still keeps the fd open? 
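As an aside on the timings quoted in this thread, the effective signing rate can be back-computed (taking the 115 MB case as roughly 1.5 h; all other numbers are from the thread). The resulting rates sit well under the ~250 KB/s ceiling Amudhan reports, and they are not constant across sizes, which fits the eventual finding that the signer waits on the last fd close rather than being limited purely by hashing throughput. A rough sketch:

```python
# Back-of-the-envelope check of the signing times reported in this thread:
# if signing throughput were roughly constant, time should scale with size.
cases = [          # (size in MB, observed signing time in minutes)
    (70, 40),
    (115, 90),     # the "115 MB" case, read as ~1.5 h
    (800, 270),    # 4.5 h
]
for size_mb, minutes in cases:
    kbps = size_mb * 1024 / (minutes * 60)
    print(f"{size_mb:4d} MB in {minutes:3d} min -> ~{kbps:.0f} KB/s")
```

The computed rates land in the ~20-50 KB/s range, so the per-file delay is dominated by something other than hash speed alone.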
I opened a bug for >> this: >> https://bugzilla.redhat.com/show_bug.cgi?id=1685023 >> >> Regards >> David >> >> Am Fr., 1. März 2019 um 18:29 Uhr schrieb Kotresh Hiremath Ravishankar < >> khiremat at redhat.com>: >> >>> Interesting observation! But as discussed in the thread, the bitrot signing >>> process depends on a 2 min timeout (by default) after the last fd closes. It >>> doesn't have any correlation with the size of the file. >>> Did you happen to verify that the fd was still open for large files for >>> some reason? >>> >>> >>> >>> On Fri, Mar 1, 2019 at 1:19 PM David Spisla wrote: >>> >>>> Hello folks, >>>> >>>> I made some observations concerning the bitrot daemon. It seems >>>> that the bitrot signer signs files depending on file size. I copied >>>> files with different sizes into a volume and I was wondering because the >>>> files do not get their signature at the same time (I keep the expiry time at the default >>>> of 120). Here are some examples: >>>> >>>> 300 KB file ~2-3 m >>>> 70 MB file ~ 40 m >>>> 115 MB file ~ 1.5 h >>>> 800 MB file ~ 4.5 h >>>> >>>> What is the expected behaviour here? >>>> Why does it take so long to sign an 800MB file? >>>> What about 500GB or 1TB? >>>> Is there a way to speed up the signing process? >>>> >>>> My ambition is to understand this observation. >>>> >>>> Regards >>>> David Spisla >>>> _______________________________________________ >>>> Gluster-devel mailing list >>>> Gluster-devel at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-devel >>> >>> >>> >>> -- >>> Thanks and Regards, >>> Kotresh H R >>> >> > > -- > Thanks and Regards, > Kotresh H R > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From srangana at redhat.com Tue Mar 5 18:17:19 2019 From: srangana at redhat.com (Shyam Ranganathan) Date: Tue, 5 Mar 2019 13:17:19 -0500 Subject: [Gluster-users] Release 6: Release date update Message-ID: <8c0c5f02-3d31-526a-9c6e-e8221e23cccd@redhat.com> Hi, Release-6 was to be an early March release, and due to finding bugs while performing upgrade testing, is now expected in the week of 18th March, 2019. RC1 builds are expected this week, to contain the required fixes, next week would be testing our RC1 for release fitness before the release. As always, request that users test the RC builds and report back issues they encounter, to help make the release a better quality. Shyam From archon810 at gmail.com Tue Mar 5 18:27:18 2019 From: archon810 at gmail.com (Artem Russakovskii) Date: Tue, 5 Mar 2019 10:27:18 -0800 Subject: [Gluster-users] SLES15 packages for v5.4 In-Reply-To: References: Message-ID: I'm seeing it here now: http://download.opensuse.org/repositories/home:/glusterfs:/SLES15-5/SLE_15/x86_64/ . And yay, OpenSUSE 15 is up too http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/x86_64/ . Finally going to end the log spam and the nasty crashes. Sincerely, Artem -- Founder, Android Police , APK Mirror , Illogical Robot LLC beerpla.net | +ArtemRussakovskii | @ArtemR On Mon, Mar 4, 2019 at 7:03 AM David Spisla wrote: > Hello folks, > > can someone please provide packages for Gluster v5.4 for SLES15? For > CentOS and Ubuntu there are already packages. > > Regards > David Spisla > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From archon810 at gmail.com Tue Mar 5 18:57:59 2019 From: archon810 at gmail.com (Artem Russakovskii) Date: Tue, 5 Mar 2019 10:57:59 -0800 Subject: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP In-Reply-To: References: Message-ID: Noticed the same when upgrading from 5.3 to 5.4, as mentioned. I'm confused though. Is actual replication affected, because the 5.4 server and the 3x 5.3 servers still show heal info as all 4 connected, and the files seem to be replicating correctly as well. So what's actually affected - just the status command, or leaving 5.4 on one of the nodes is doing some damage to the underlying fs? Is it fixable by tweaking transport.socket.ssl-enabled? Does upgrading all servers to 5.4 resolve it, or should we revert back to 5.3? Sincerely, Artem -- Founder, Android Police , APK Mirror , Illogical Robot LLC beerpla.net | +ArtemRussakovskii | @ArtemR On Tue, Mar 5, 2019 at 2:02 AM Hu Bert wrote: > fyi: did a downgrade 5.4 -> 5.3 and it worked. all replicas are up and > running. Awaiting updated v5.4. > > thx :-) > > Am Di., 5. M?rz 2019 um 09:26 Uhr schrieb Hari Gowtham < > hgowtham at redhat.com>: > > > > There are plans to revert the patch causing this error and rebuilt 5.4. > > This should happen faster. the rebuilt 5.4 should be void of this > upgrade issue. > > > > In the meantime, you can use 5.3 for this cluster. > > Downgrading to 5.3 will work if it was just one node that was upgrade to > 5.4 > > and the other nodes are still in 5.3. > > > > On Tue, Mar 5, 2019 at 1:07 PM Hu Bert wrote: > > > > > > Hi Hari, > > > > > > thx for the hint. Do you know when this will be fixed? Is a downgrade > > > 5.4 -> 5.3 a possibility to fix this? > > > > > > Hubert > > > > > > Am Di., 5. M?rz 2019 um 08:32 Uhr schrieb Hari Gowtham < > hgowtham at redhat.com>: > > > > > > > > Hi, > > > > > > > > This is a known issue we are working on. 
> > > > As the checksum differs between the updated and non updated node, the > > > > peers are getting rejected. > > > > The bricks aren't coming because of the same issue. > > > > > > > > More about the issue: > https://bugzilla.redhat.com/show_bug.cgi?id=1685120 > > > > > > > > On Tue, Mar 5, 2019 at 12:56 PM Hu Bert > wrote: > > > > > > > > > > Interestingly: gluster volume status misses gluster1, while heal > > > > > statistics show gluster1: > > > > > > > > > > gluster volume status workdata > > > > > Status of volume: workdata > > > > > Gluster process TCP Port RDMA Port > Online Pid > > > > > > ------------------------------------------------------------------------------ > > > > > Brick gluster2:/gluster/md4/workdata 49153 0 > Y 1723 > > > > > Brick gluster3:/gluster/md4/workdata 49153 0 > Y 2068 > > > > > Self-heal Daemon on localhost N/A N/A > Y 1732 > > > > > Self-heal Daemon on gluster3 N/A N/A > Y 2077 > > > > > > > > > > vs. > > > > > > > > > > gluster volume heal workdata statistics heal-count > > > > > Gathering count of entries to be healed on volume workdata has > been successful > > > > > > > > > > Brick gluster1:/gluster/md4/workdata > > > > > Number of entries: 0 > > > > > > > > > > Brick gluster2:/gluster/md4/workdata > > > > > Number of entries: 10745 > > > > > > > > > > Brick gluster3:/gluster/md4/workdata > > > > > Number of entries: 10744 > > > > > > > > > > Am Di., 5. M?rz 2019 um 08:18 Uhr schrieb Hu Bert < > revirii at googlemail.com>: > > > > > > > > > > > > Hi Miling, > > > > > > > > > > > > well, there are such entries, but those haven't been a problem > during > > > > > > install and the last kernel update+reboot. 
The entries look like: > > > > > > > > > > > > PUBLIC_IP gluster2.alpserver.de gluster2 > > > > > > > > > > > > 192.168.0.50 gluster1 > > > > > > 192.168.0.51 gluster2 > > > > > > 192.168.0.52 gluster3 > > > > > > > > > > > > 'ping gluster2' resolves to LAN IP; I removed the last entry in > the > > > > > > 1st line, did a reboot ... no, didn't help. From > > > > > > /var/log/glusterfs/glusterd.log > > > > > > on gluster 2: > > > > > > > > > > > > [2019-03-05 07:04:36.188128] E [MSGID: 106010] > > > > > > [glusterd-utils.c:3483:glusterd_compare_friend_volume] > 0-management: > > > > > > Version of Cksums persistent differ. local cksum = 3950307018, > remote > > > > > > cksum = 455409345 on peer gluster1 > > > > > > [2019-03-05 07:04:36.188314] I [MSGID: 106493] > > > > > > [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] > 0-glusterd: > > > > > > Responded to gluster1 (0), ret: 0, op_ret: -1 > > > > > > > > > > > > Interestingly there are no entries in the brick logs of the > rejected > > > > > > server. Well, not surprising as no brick process is running. The > > > > > > server gluster1 is still in rejected state. > > > > > > > > > > > > 'gluster volume start workdata force' starts the brick process on > > > > > > gluster1, and some heals are happening on gluster2+3, but via > 'gluster > > > > > > volume status workdata' the volumes still aren't complete. 
> > > > > > > > > > > > gluster1: > > > > > > > ------------------------------------------------------------------------------ > > > > > > Brick gluster1:/gluster/md4/workdata 49152 0 > Y 2523 > > > > > > Self-heal Daemon on localhost N/A N/A > Y 2549 > > > > > > > > > > > > gluster2: > > > > > > Gluster process TCP Port RDMA Port > Online Pid > > > > > > > ------------------------------------------------------------------------------ > > > > > > Brick gluster2:/gluster/md4/workdata 49153 0 > Y 1723 > > > > > > Brick gluster3:/gluster/md4/workdata 49153 0 > Y 2068 > > > > > > Self-heal Daemon on localhost N/A N/A > Y 1732 > > > > > > Self-heal Daemon on gluster3 N/A N/A > Y 2077 > > > > > > > > > > > > > > > > > > Hubert > > > > > > > > > > > > Am Di., 5. M?rz 2019 um 07:58 Uhr schrieb Milind Changire < > mchangir at redhat.com>: > > > > > > > > > > > > > > There are probably DNS entries or /etc/hosts entries with the > public IP Addresses that the host names (gluster1, gluster2, gluster3) are > getting resolved to. > > > > > > > /etc/resolv.conf would tell which is the default domain > searched for the node names and the DNS servers which respond to the > queries. > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 5, 2019 at 12:14 PM Hu Bert < > revirii at googlemail.com> wrote: > > > > > > >> > > > > > > >> Good morning, > > > > > > >> > > > > > > >> i have a replicate 3 setup with 2 volumes, running on version > 5.3 on > > > > > > >> debian stretch. 
This morning i upgraded one server to version > 5.4 and > > > > > > >> rebooted the machine; after the restart i noticed that: > > > > > > >> > > > > > > >> - no brick process is running > > > > > > >> - gluster volume status only shows the server itself: > > > > > > >> gluster volume status workdata > > > > > > >> Status of volume: workdata > > > > > > >> Gluster process TCP Port RDMA > Port Online Pid > > > > > > >> > ------------------------------------------------------------------------------ > > > > > > >> Brick gluster1:/gluster/md4/workdata N/A N/A > N N/A > > > > > > >> NFS Server on localhost N/A N/A > N N/A > > > > > > >> > > > > > > >> - gluster peer status on the server > > > > > > >> gluster peer status > > > > > > >> Number of Peers: 2 > > > > > > >> > > > > > > >> Hostname: gluster3 > > > > > > >> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a > > > > > > >> State: Peer Rejected (Connected) > > > > > > >> > > > > > > >> Hostname: gluster2 > > > > > > >> Uuid: 162fea82-406a-4f51-81a3-e90235d8da27 > > > > > > >> State: Peer Rejected (Connected) > > > > > > >> > > > > > > >> - gluster peer status on the other 2 servers: > > > > > > >> gluster peer status > > > > > > >> Number of Peers: 2 > > > > > > >> > > > > > > >> Hostname: gluster1 > > > > > > >> Uuid: 9a360776-7b58-49ae-831e-a0ce4e4afbef > > > > > > >> State: Peer Rejected (Connected) > > > > > > >> > > > > > > >> Hostname: gluster3 > > > > > > >> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a > > > > > > >> State: Peer in Cluster (Connected) > > > > > > >> > > > > > > >> I noticed that, in the brick logs, i see that the public IP > is used > > > > > > >> instead of the LAN IP. brick logs from one of the volumes: > > > > > > >> > > > > > > >> rejected node: https://pastebin.com/qkpj10Sd > > > > > > >> connected nodes: https://pastebin.com/8SxVVYFV > > > > > > >> > > > > > > >> Why is the public IP suddenly used instead of the LAN IP? 
> Killing all > > > > > > >> gluster processes and rebooting (again) didn't help. > > > > > > >> > > > > > > >> > > > > > > >> Thx, > > > > > > >> Hubert > > > > > > >> _______________________________________________ > > > > > > >> Gluster-users mailing list > > > > > > >> Gluster-users at gluster.org > > > > > > >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > Milind > > > > > > > > > > > > _______________________________________________ > > > > > Gluster-users mailing list > > > > > Gluster-users at gluster.org > > > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > > > > > > > > > -- > > > > Regards, > > > > Hari Gowtham. > > > > > > > > -- > > Regards, > > Hari Gowtham. > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From archon810 at gmail.com Tue Mar 5 19:09:10 2019 From: archon810 at gmail.com (Artem Russakovskii) Date: Tue, 5 Mar 2019 11:09:10 -0800 Subject: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP In-Reply-To: References: Message-ID: Ended up downgrading to 5.3 just in case. Peer status and volume status are OK now. zypper install --oldpackage glusterfs-5.3-lp150.100.1 Loading repository data... Reading installed packages... Resolving package dependencies... 
Problem: glusterfs-5.3-lp150.100.1.x86_64 requires libgfapi0 = 5.3, but this requirement cannot be provided not installable providers: libgfapi0-5.3-lp150.100.1.x86_64[glusterfs] Solution 1: Following actions will be done: downgrade of libgfapi0-5.4-lp150.100.1.x86_64 to libgfapi0-5.3-lp150.100.1.x86_64 downgrade of libgfchangelog0-5.4-lp150.100.1.x86_64 to libgfchangelog0-5.3-lp150.100.1.x86_64 downgrade of libgfrpc0-5.4-lp150.100.1.x86_64 to libgfrpc0-5.3-lp150.100.1.x86_64 downgrade of libgfxdr0-5.4-lp150.100.1.x86_64 to libgfxdr0-5.3-lp150.100.1.x86_64 downgrade of libglusterfs0-5.4-lp150.100.1.x86_64 to libglusterfs0-5.3-lp150.100.1.x86_64 Solution 2: do not install glusterfs-5.3-lp150.100.1.x86_64 Solution 3: break glusterfs-5.3-lp150.100.1.x86_64 by ignoring some of its dependencies Choose from above solutions by number or cancel [1/2/3/c] (c): 1 Resolving dependencies... Resolving package dependencies... The following 6 packages are going to be downgraded: glusterfs libgfapi0 libgfchangelog0 libgfrpc0 libgfxdr0 libglusterfs0 6 packages to downgrade. Sincerely, Artem -- Founder, Android Police , APK Mirror , Illogical Robot LLC beerpla.net | +ArtemRussakovskii | @ArtemR On Tue, Mar 5, 2019 at 10:57 AM Artem Russakovskii wrote: > Noticed the same when upgrading from 5.3 to 5.4, as mentioned. > > I'm confused though. Is actual replication affected, because the 5.4 > server and the 3x 5.3 servers still show heal info as all 4 connected, and > the files seem to be replicating correctly as well. > > So what's actually affected - just the status command, or leaving 5.4 on > one of the nodes is doing some damage to the underlying fs? Is it fixable > by tweaking transport.socket.ssl-enabled? Does upgrading all servers to 5.4 > resolve it, or should we revert back to 5.3? 
> > Sincerely, > Artem > > -- > Founder, Android Police , APK Mirror > , Illogical Robot LLC > beerpla.net | +ArtemRussakovskii > | @ArtemR > > > > On Tue, Mar 5, 2019 at 2:02 AM Hu Bert wrote: > >> fyi: did a downgrade 5.4 -> 5.3 and it worked. all replicas are up and >> running. Awaiting updated v5.4. >> >> thx :-) >> >> Am Di., 5. M?rz 2019 um 09:26 Uhr schrieb Hari Gowtham < >> hgowtham at redhat.com>: >> > >> > There are plans to revert the patch causing this error and rebuilt 5.4. >> > This should happen faster. the rebuilt 5.4 should be void of this >> upgrade issue. >> > >> > In the meantime, you can use 5.3 for this cluster. >> > Downgrading to 5.3 will work if it was just one node that was upgrade >> to 5.4 >> > and the other nodes are still in 5.3. >> > >> > On Tue, Mar 5, 2019 at 1:07 PM Hu Bert wrote: >> > > >> > > Hi Hari, >> > > >> > > thx for the hint. Do you know when this will be fixed? Is a downgrade >> > > 5.4 -> 5.3 a possibility to fix this? >> > > >> > > Hubert >> > > >> > > Am Di., 5. M?rz 2019 um 08:32 Uhr schrieb Hari Gowtham < >> hgowtham at redhat.com>: >> > > > >> > > > Hi, >> > > > >> > > > This is a known issue we are working on. >> > > > As the checksum differs between the updated and non updated node, >> the >> > > > peers are getting rejected. >> > > > The bricks aren't coming because of the same issue. 
>> > > > >> > > > More about the issue: >> https://bugzilla.redhat.com/show_bug.cgi?id=1685120 >> > > > >> > > > On Tue, Mar 5, 2019 at 12:56 PM Hu Bert >> wrote: >> > > > > >> > > > > Interestingly: gluster volume status misses gluster1, while heal >> > > > > statistics show gluster1: >> > > > > >> > > > > gluster volume status workdata >> > > > > Status of volume: workdata >> > > > > Gluster process TCP Port RDMA Port >> Online Pid >> > > > > >> ------------------------------------------------------------------------------ >> > > > > Brick gluster2:/gluster/md4/workdata 49153 0 >> Y 1723 >> > > > > Brick gluster3:/gluster/md4/workdata 49153 0 >> Y 2068 >> > > > > Self-heal Daemon on localhost N/A N/A >> Y 1732 >> > > > > Self-heal Daemon on gluster3 N/A N/A >> Y 2077 >> > > > > >> > > > > vs. >> > > > > >> > > > > gluster volume heal workdata statistics heal-count >> > > > > Gathering count of entries to be healed on volume workdata has >> been successful >> > > > > >> > > > > Brick gluster1:/gluster/md4/workdata >> > > > > Number of entries: 0 >> > > > > >> > > > > Brick gluster2:/gluster/md4/workdata >> > > > > Number of entries: 10745 >> > > > > >> > > > > Brick gluster3:/gluster/md4/workdata >> > > > > Number of entries: 10744 >> > > > > >> > > > > Am Di., 5. M?rz 2019 um 08:18 Uhr schrieb Hu Bert < >> revirii at googlemail.com>: >> > > > > > >> > > > > > Hi Miling, >> > > > > > >> > > > > > well, there are such entries, but those haven't been a problem >> during >> > > > > > install and the last kernel update+reboot. The entries look >> like: >> > > > > > >> > > > > > PUBLIC_IP gluster2.alpserver.de gluster2 >> > > > > > >> > > > > > 192.168.0.50 gluster1 >> > > > > > 192.168.0.51 gluster2 >> > > > > > 192.168.0.52 gluster3 >> > > > > > >> > > > > > 'ping gluster2' resolves to LAN IP; I removed the last entry in >> the >> > > > > > 1st line, did a reboot ... no, didn't help. 
From >> > > > > > /var/log/glusterfs/glusterd.log >> > > > > > on gluster 2: >> > > > > > >> > > > > > [2019-03-05 07:04:36.188128] E [MSGID: 106010] >> > > > > > [glusterd-utils.c:3483:glusterd_compare_friend_volume] >> 0-management: >> > > > > > Version of Cksums persistent differ. local cksum = 3950307018, >> remote >> > > > > > cksum = 455409345 on peer gluster1 >> > > > > > [2019-03-05 07:04:36.188314] I [MSGID: 106493] >> > > > > > [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] >> 0-glusterd: >> > > > > > Responded to gluster1 (0), ret: 0, op_ret: -1 >> > > > > > >> > > > > > Interestingly there are no entries in the brick logs of the >> rejected >> > > > > > server. Well, not surprising as no brick process is running. The >> > > > > > server gluster1 is still in rejected state. >> > > > > > >> > > > > > 'gluster volume start workdata force' starts the brick process >> on >> > > > > > gluster1, and some heals are happening on gluster2+3, but via >> 'gluster >> > > > > > volume status workdata' the volumes still aren't complete. >> > > > > > >> > > > > > gluster1: >> > > > > > >> ------------------------------------------------------------------------------ >> > > > > > Brick gluster1:/gluster/md4/workdata 49152 0 >> Y 2523 >> > > > > > Self-heal Daemon on localhost N/A N/A >> Y 2549 >> > > > > > >> > > > > > gluster2: >> > > > > > Gluster process TCP Port RDMA >> Port Online Pid >> > > > > > >> ------------------------------------------------------------------------------ >> > > > > > Brick gluster2:/gluster/md4/workdata 49153 0 >> Y 1723 >> > > > > > Brick gluster3:/gluster/md4/workdata 49153 0 >> Y 2068 >> > > > > > Self-heal Daemon on localhost N/A N/A >> Y 1732 >> > > > > > Self-heal Daemon on gluster3 N/A N/A >> Y 2077 >> > > > > > >> > > > > > >> > > > > > Hubert >> > > > > > >> > > > > > Am Di., 5. 
M?rz 2019 um 07:58 Uhr schrieb Milind Changire < >> mchangir at redhat.com>: >> > > > > > > >> > > > > > > There are probably DNS entries or /etc/hosts entries with the >> public IP Addresses that the host names (gluster1, gluster2, gluster3) are >> getting resolved to. >> > > > > > > /etc/resolv.conf would tell which is the default domain >> searched for the node names and the DNS servers which respond to the >> queries. >> > > > > > > >> > > > > > > >> > > > > > > On Tue, Mar 5, 2019 at 12:14 PM Hu Bert < >> revirii at googlemail.com> wrote: >> > > > > > >> >> > > > > > >> Good morning, >> > > > > > >> >> > > > > > >> i have a replicate 3 setup with 2 volumes, running on >> version 5.3 on >> > > > > > >> debian stretch. This morning i upgraded one server to >> version 5.4 and >> > > > > > >> rebooted the machine; after the restart i noticed that: >> > > > > > >> >> > > > > > >> - no brick process is running >> > > > > > >> - gluster volume status only shows the server itself: >> > > > > > >> gluster volume status workdata >> > > > > > >> Status of volume: workdata >> > > > > > >> Gluster process TCP Port RDMA >> Port Online Pid >> > > > > > >> >> ------------------------------------------------------------------------------ >> > > > > > >> Brick gluster1:/gluster/md4/workdata N/A N/A >> N N/A >> > > > > > >> NFS Server on localhost N/A N/A >> N N/A >> > > > > > >> >> > > > > > >> - gluster peer status on the server >> > > > > > >> gluster peer status >> > > > > > >> Number of Peers: 2 >> > > > > > >> >> > > > > > >> Hostname: gluster3 >> > > > > > >> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a >> > > > > > >> State: Peer Rejected (Connected) >> > > > > > >> >> > > > > > >> Hostname: gluster2 >> > > > > > >> Uuid: 162fea82-406a-4f51-81a3-e90235d8da27 >> > > > > > >> State: Peer Rejected (Connected) >> > > > > > >> >> > > > > > >> - gluster peer status on the other 2 servers: >> > > > > > >> gluster peer status >> > > > > > >> Number of Peers: 2 >> > > > > > >> >> > 
> > > > >> Hostname: gluster1 >> > > > > > >> Uuid: 9a360776-7b58-49ae-831e-a0ce4e4afbef >> > > > > > >> State: Peer Rejected (Connected) >> > > > > > >> >> > > > > > >> Hostname: gluster3 >> > > > > > >> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a >> > > > > > >> State: Peer in Cluster (Connected) >> > > > > > >> >> > > > > > >> I noticed that, in the brick logs, i see that the public IP >> is used >> > > > > > >> instead of the LAN IP. brick logs from one of the volumes: >> > > > > > >> >> > > > > > >> rejected node: https://pastebin.com/qkpj10Sd >> > > > > > >> connected nodes: https://pastebin.com/8SxVVYFV >> > > > > > >> >> > > > > > >> Why is the public IP suddenly used instead of the LAN IP? >> Killing all >> > > > > > >> gluster processes and rebooting (again) didn't help. >> > > > > > >> >> > > > > > >> >> > > > > > >> Thx, >> > > > > > >> Hubert >> > > > > > >> _______________________________________________ >> > > > > > >> Gluster-users mailing list >> > > > > > >> Gluster-users at gluster.org >> > > > > > >> https://lists.gluster.org/mailman/listinfo/gluster-users >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > -- >> > > > > > > Milind >> > > > > > > >> > > > > _______________________________________________ >> > > > > Gluster-users mailing list >> > > > > Gluster-users at gluster.org >> > > > > https://lists.gluster.org/mailman/listinfo/gluster-users >> > > > >> > > > >> > > > >> > > > -- >> > > > Regards, >> > > > Hari Gowtham. >> > >> > >> > >> > -- >> > Regards, >> > Hari Gowtham. >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abhishpaliwal at gmail.com Wed Mar 6 04:35:44 2019 From: abhishpaliwal at gmail.com (ABHISHEK PALIWAL) Date: Wed, 6 Mar 2019 10:05:44 +0530 Subject: [Gluster-users] Not able to start glusterd Message-ID: Hi Team, I am facing the issue where at the time of starting the glusterd segmentation fault is reported. Below are the logs root at 128:/usr/sbin# ./glusterd --debug [1970-01-01 15:19:43.940386] I [MSGID: 100030] [glusterfsd.c:2691:main] 0-./glusterd: Started running ./glusterd version 5.0 (args: ./glusterd --debug) [1970-01-01 15:19:43.940855] D [logging.c:1833:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5 [1970-01-01 15:19:43.941736] D [MSGID: 0] [glusterfsd.c:747:get_volfp] 0-glusterfsd: loading volume file /etc/glusterfs/glusterd.vol [1970-01-01 15:19:43.945796] D [MSGID: 101097] [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on /usr/lib64/glusterfs/5.0/xlator/mgmt/glusterd.so: undefined symbol: xlator_api. Fall back to old symbols [1970-01-01 15:19:43.946279] I [MSGID: 106478] [glusterd.c:1435:init] 0-management: Maximum allowed open file descriptors set to 65536 [1970-01-01 15:19:43.946419] I [MSGID: 106479] [glusterd.c:1491:init] 0-management: Using /var/lib/glusterd as working directory [1970-01-01 15:19:43.946515] I [MSGID: 106479] [glusterd.c:1497:init] 0-management: Using /var/run/gluster as pid file working directory [1970-01-01 15:19:43.946968] D [MSGID: 0] [glusterd.c:458:glusterd_rpcsvc_options_build] 0-glusterd: listen-backlog value: 10 [1970-01-01 15:19:43.947139] D [rpcsvc.c:2607:rpcsvc_init] 0-rpc-service: RPC service inited. 
[1970-01-01 15:19:43.947241] D [rpcsvc.c:2146:rpcsvc_program_register] 0-rpc-service: New program registered: GF-DUMP, Num: 123451501, Ver: 1, Port: 0 [1970-01-01 15:19:43.947379] D [rpc-transport.c:269:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/5.0/rpc-transport/socket.so [1970-01-01 15:19:43.955198] D [socket.c:4464:socket_init] 0-socket.management: Configued transport.tcp-user-timeout=0 [1970-01-01 15:19:43.955316] D [socket.c:4482:socket_init] 0-socket.management: Reconfigued transport.keepalivecnt=9 [1970-01-01 15:19:43.955415] D [socket.c:4167:ssl_setup_connection_params] 0-socket.management: SSL support on the I/O path is NOT enabled [1970-01-01 15:19:43.955504] D [socket.c:4170:ssl_setup_connection_params] 0-socket.management: SSL support for glusterd is NOT enabled [1970-01-01 15:19:43.955612] D [name.c:572:server_fill_address_family] 0-socket.management: option address-family not specified, defaulting to inet6 [1970-01-01 15:19:43.955928] D [rpc-transport.c:269:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/5.0/rpc-transport/rdma.so [1970-01-01 15:19:43.956079] E [rpc-transport.c:273:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/5.0/rpc-transport/rdma.so: cannot open shared object file: No such file or directory [1970-01-01 15:19:43.956177] W [rpc-transport.c:277:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine [1970-01-01 15:19:43.956270] W [rpcsvc.c:1789:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed [1970-01-01 15:19:43.956362] E [MSGID: 106244] [glusterd.c:1798:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport [1970-01-01 15:19:43.956459] D [rpcsvc.c:2146:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc peer, Num: 1238437, Ver: 2, Port: 0 [1970-01-01 15:19:43.956561] D 
[rpcsvc.c:2146:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc cli read-only, Num: 1238463, Ver: 2, Port: 0 [1970-01-01 15:19:43.956666] D [rpcsvc.c:2146:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc mgmt, Num: 1238433, Ver: 2, Port: 0 [1970-01-01 15:19:43.956758] D [rpcsvc.c:2146:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc mgmt v3, Num: 1238433, Ver: 3, Port: 0 [1970-01-01 15:19:43.956853] D [rpcsvc.c:2146:rpcsvc_program_register] 0-rpc-service: New program registered: Gluster Portmap, Num: 34123456, Ver: 1, Port: 0 [1970-01-01 15:19:43.956946] D [rpcsvc.c:2146:rpcsvc_program_register] 0-rpc-service: New program registered: Gluster Handshake, Num: 14398633, Ver: 2, Port: 0 [1970-01-01 15:19:43.957062] D [rpcsvc.c:2146:rpcsvc_program_register] 0-rpc-service: New program registered: Gluster MGMT Handshake, Num: 1239873, Ver: 1, Port: 0 [1970-01-01 15:19:43.957205] D [rpcsvc.c:2607:rpcsvc_init] 0-rpc-service: RPC service inited. 
[1970-01-01 15:19:43.957303] D [rpcsvc.c:2146:rpcsvc_program_register] 0-rpc-service: New program registered: GF-DUMP, Num: 123451501, Ver: 1, Port: 0 [1970-01-01 15:19:43.957408] D [rpc-transport.c:269:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/5.0/rpc-transport/socket.so [1970-01-01 15:19:43.957563] D [socket.c:4424:socket_init] 0-socket.management: disabling nodelay [1970-01-01 15:19:43.957650] D [socket.c:4464:socket_init] 0-socket.management: Configued transport.tcp-user-timeout=0 [1970-01-01 15:19:43.957738] D [socket.c:4482:socket_init] 0-socket.management: Reconfigued transport.keepalivecnt=9 [1970-01-01 15:19:43.957830] D [socket.c:4167:ssl_setup_connection_params] 0-socket.management: SSL support on the I/O path is NOT enabled [1970-01-01 15:19:43.957922] D [socket.c:4170:ssl_setup_connection_params] 0-socket.management: SSL support for glusterd is NOT enabled [1970-01-01 15:19:43.958186] D [rpcsvc.c:2146:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc cli, Num: 1238463, Ver: 2, Port: 0 [1970-01-01 15:19:43.958280] D [rpcsvc.c:2146:rpcsvc_program_register] 0-rpc-service: New program registered: Gluster Handshake (CLI Getspec), Num: 14398633, Ver: 2, Port: 0 [1970-01-01 15:19:43.958461] D [MSGID: 0] [glusterd-utils.c:7878:glusterd_sm_tr_log_init] 0-glusterd: returning 0 [1970-01-01 15:19:43.958557] D [MSGID: 0] [glusterd.c:1875:init] 0-management: cannot get run-with-valgrind value [1970-01-01 15:19:43.960895] E [MSGID: 101032] [store.c:447:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory] [1970-01-01 15:19:43.961016] D [MSGID: 0] [store.c:452:gf_store_handle_retrieve] 0-: Returning -1 [1970-01-01 15:19:43.961108] D [MSGID: 0] [glusterd-store.c:2169:glusterd_retrieve_op_version] 0-management: Unable to get store handle! 
[1970-01-01 15:19:43.961216] E [MSGID: 101032] [store.c:447:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory] [1970-01-01 15:19:43.961325] D [MSGID: 0] [store.c:452:gf_store_handle_retrieve] 0-: Returning -1 [1970-01-01 15:19:43.961428] D [MSGID: 0] [glusterd-store.c:2345:glusterd_retrieve_uuid] 0-management: Unable to get storehandle! [1970-01-01 15:19:43.961523] D [MSGID: 0] [glusterd-store.c:2366:glusterd_retrieve_uuid] 0-management: Returning -1 [1970-01-01 15:19:43.961617] I [MSGID: 106514] [glusterd-store.c:2304:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 50000 [1970-01-01 15:19:43.962658] D [MSGID: 0] [store.c:432:gf_store_handle_new] 0-: Returning 0 [1970-01-01 15:19:43.962750] D [MSGID: 0] [store.c:452:gf_store_handle_retrieve] 0-: Returning 0 [1970-01-01 15:19:43.963047] D [MSGID: 0] [store.c:515:gf_store_iter_new] 0-: Returning with 0 [1970-01-01 15:19:43.963194] D [MSGID: 0] [store.c:632:gf_store_iter_get_next] 0-: Returning with 0 [1970-01-01 15:19:43.963318] D [MSGID: 0] [store.c:632:gf_store_iter_get_next] 0-: Returning with -1 [1970-01-01 15:19:43.963455] D [MSGID: 0] [store.c:473:gf_store_handle_destroy] 0-: Returning 0 [1970-01-01 15:19:43.963757] D [MSGID: 0] [glusterd-store.c:3546:glusterd_store_retrieve_volumes] 0-management: Returning with 0 [1970-01-01 15:19:43.964159] D [MSGID: 0] [glusterd-store.c:4662:glusterd_store_retrieve_peers] 0-management: Returning with 0 [1970-01-01 15:19:43.964471] I [MSGID: 106194] [glusterd-store.c:3983:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list. 
[1970-01-01 15:19:43.964580] D [MSGID: 0] [glusterd-store.c:4104:glusterd_store_retrieve_snaps] 0-management: Returning with 0 [1970-01-01 15:19:43.964680] D [MSGID: 0] [glusterd-store.c:4894:glusterd_restore] 0-management: Returning 0 [1970-01-01 15:19:43.965060] D [MSGID: 0] [options.c:1225:xlator_option_init_int32] 0-management: option event-threads using set value 1 Final graph: +------------------------------------------------------------------------------+ 1: volume management 2: type mgmt/glusterd 3: option rpc-auth.auth-glusterfs on 4: option rpc-auth.auth-unix on 5: option rpc-auth.auth-null on 6: option rpc-auth-allow-insecure on 7: option transport.listen-backlog 10 8: option event-threads 1 9: option ping-timeout 0 10: option transport.socket.read-fail-log off 11: option transport.socket.keepalive-interval 2 12: option transport.socket.keepalive-time 10 13: option transport-type rdma 14: option working-directory /var/lib/glusterd 15: end-volume 16: +------------------------------------------------------------------------------+ [1970-01-01 15:19:43.966808] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 [1970-01-01 15:19:44.840454] E [rpcsvc.c:513:rpcsvc_request_create] 0-rpc-service: RPC version not supported (XID: 0x0, Ver: 0, Program: 0, ProgVers: 0, Proc: 2) from trans (socket.management) [1970-01-01 15:19:44.840884] D [rpcsvc.c:1416:rpcsvc_error_reply] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn-0xffbac)[0x3fffa12acbe4] (--> /usr/lib64/libgfrpc.so.0(+0xc5f4)[0x3fffa12525f4] (--> /usr/lib64/libgfrpc.so.0(+0xcf00)[0x3fffa1252f00] (--> /usr/lib64/libgfrpc.so.0(+0xd224)[0x3fffa1253224] (--> /usr/lib64/libgfrpc.so.0(+0xd84c)[0x3fffa125384c] ))))) 0-: sending a RPC error reply [1970-01-01 15:19:44.841055] D [logging.c:1805:gf_log_flush_extra_msgs] 0-logging-infra: Log buffer size reduced. 
About to flush 5 extra log messages [1970-01-01 15:19:44.841156] D [logging.c:1808:gf_log_flush_extra_msgs] 0-logging-infra: Just flushed 5 extra log messages pending frames: patchset: git://git.gluster.org/glusterfs.git signal received: 11 time of crash: 1970-01-01 15:19:44 configuration details: argp 1 backtrace 1 dlfcn 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 5.0 /usr/lib64/libglusterfs.so.0(+0x422a4)[0x3fffa12ab2a4] /usr/lib64/libglusterfs.so.0(gf_print_trace-0xf5080)[0x3fffa12b82e0] ./glusterd(glusterfsd_print_trace-0x22fa4)[0x100067ec] linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x3fffa13f0478] /lib64/libc.so.6(xdr_accepted_reply-0x72d3c)[0x3fffa11375cc] /lib64/libc.so.6(xdr_accepted_reply-0x72d9c)[0x3fffa113756c] /lib64/libc.so.6(xdr_union-0x63a94)[0x3fffa1147dd4] /lib64/libc.so.6(xdr_replymsg-0x72c58)[0x3fffa11376e0] /lib64/libc.so.6(xdr_sizeof-0x62a78)[0x3fffa1149120] /usr/lib64/libgfrpc.so.0(+0x9b0c)[0x3fffa124fb0c] /usr/lib64/libgfrpc.so.0(rpcsvc_submit_generic-0x149f4)[0x3fffa125228c] /usr/lib64/libgfrpc.so.0(+0xc614)[0x3fffa1252614] /usr/lib64/libgfrpc.so.0(+0xcf00)[0x3fffa1252f00] /usr/lib64/libgfrpc.so.0(+0xd224)[0x3fffa1253224] /usr/lib64/libgfrpc.so.0(+0xd84c)[0x3fffa125384c] /usr/lib64/libgfrpc.so.0(rpc_transport_notify-0x10eec)[0x3fffa125610c] /usr/lib64/glusterfs/5.0/rpc-transport/socket.so(+0xc09c)[0x3fff9d51709c] /usr/lib64/libglusterfs.so.0(+0xb84bc)[0x3fffa13214bc] /lib64/libpthread.so.0(+0xbb30)[0x3fffa11bdb30] /lib64/libc.so.6(clone-0x9e964)[0x3fffa110817c] --------- Segmentation fault (core dumped) Could you please help me understand what the actual problem is? -- Regards Abhishek Paliwal -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guillaume.pavese at interactiv-group.com Wed Mar 6 05:34:37 2019 From: guillaume.pavese at interactiv-group.com (Guillaume Pavese) Date: Wed, 6 Mar 2019 14:34:37 +0900 Subject: [Gluster-users] Release 6: Release date update In-Reply-To: <8c0c5f02-3d31-526a-9c6e-e8221e23cccd@redhat.com> References: <8c0c5f02-3d31-526a-9c6e-e8221e23cccd@redhat.com> Message-ID: Ready to test, as soon as a build is made. I have been refreshing https://cbs.centos.org/koji/packageinfo?packageID=5, since the last build there, 6.0-0.1.rc0.el7 (built on 22nd Feb), did not include important patches (event handler fails) merged for 5.4-1 since... I think an RC1 would have been warranted for those patches already while waiting for the upgrade bugs to be fixed in an RC2... Guillaume Pavese Systems and Network Engineer Interactiv-Group On Wed, Mar 6, 2019 at 3:17 AM Shyam Ranganathan wrote: > Hi, > > Release-6 was to be an early March release, and due to finding bugs > while performing upgrade testing, is now expected in the week of 18th > March, 2019. > > RC1 builds are expected this week, to contain the required fixes, next > week would be testing our RC1 for release fitness before the release. > > As always, request that users test the RC builds and report back issues > they encounter, to help make the release a better quality. > > Shyam > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From srakonde at redhat.com Wed Mar 6 07:28:08 2019 From: srakonde at redhat.com (Sanju Rakonde) Date: Wed, 6 Mar 2019 12:58:08 +0530 Subject: [Gluster-users] Not able to start glusterd In-Reply-To: References: Message-ID: Abhishek, We need the below information to investigate this issue. 1. gluster --version 2. Please run glusterd in gdb, so that we can capture the backtrace. 
I see some rpc errors in the log, but the backtrace will be more helpful. To run glusterd in gdb, you need to start glusterd in gdb (i.e. gdb glusterd, and then give the command "run -N"). When you see a segmentation fault, please capture the backtrace and paste it here. On Wed, Mar 6, 2019 at 10:07 AM ABHISHEK PALIWAL wrote: > Hi Team, > > I am facing the issue where at the time of starting the glusterd > segmentation fault is reported. > > Below are the logs > > root at 128:/usr/sbin# ./glusterd --debug > [1970-01-01 15:19:43.940386] I [MSGID: 100030] [glusterfsd.c:2691:main] > 0-./glusterd: Started running ./glusterd version 5.0 (args: ./glusterd > --debug) > [1970-01-01 15:19:43.940855] D > [logging.c:1833:__gf_log_inject_timer_event] 0-logging-infra: Starting > timer now. Timeout = 120, current buf size = 5 > [1970-01-01 15:19:43.941736] D [MSGID: 0] [glusterfsd.c:747:get_volfp] > 0-glusterfsd: loading volume file /etc/glusterfs/glusterd.vol > [1970-01-01 15:19:43.945796] D [MSGID: 101097] > [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on > /usr/lib64/glusterfs/5.0/xlator/mgmt/glusterd.so: undefined symbol: > xlator_api. Fall back to old symbols > [1970-01-01 15:19:43.946279] I [MSGID: 106478] [glusterd.c:1435:init] > 0-management: Maximum allowed open file descriptors set to 65536 > [1970-01-01 15:19:43.946419] I [MSGID: 106479] [glusterd.c:1491:init] > 0-management: Using /var/lib/glusterd as working directory > [1970-01-01 15:19:43.946515] I [MSGID: 106479] [glusterd.c:1497:init] > 0-management: Using /var/run/gluster as pid file working directory > [1970-01-01 15:19:43.946968] D [MSGID: 0] > [glusterd.c:458:glusterd_rpcsvc_options_build] 0-glusterd: listen-backlog > value: 10 > [1970-01-01 15:19:43.947139] D [rpcsvc.c:2607:rpcsvc_init] 0-rpc-service: > RPC service inited. 
> [1970-01-01 15:19:43.947241] D [rpcsvc.c:2146:rpcsvc_program_register] > 0-rpc-service: New program registered: GF-DUMP, Num: 123451501, Ver: 1, > Port: 0 > [1970-01-01 15:19:43.947379] D [rpc-transport.c:269:rpc_transport_load] > 0-rpc-transport: attempt to load file > /usr/lib64/glusterfs/5.0/rpc-transport/socket.so > [1970-01-01 15:19:43.955198] D [socket.c:4464:socket_init] > 0-socket.management: Configued transport.tcp-user-timeout=0 > [1970-01-01 15:19:43.955316] D [socket.c:4482:socket_init] > 0-socket.management: Reconfigued transport.keepalivecnt=9 > [1970-01-01 15:19:43.955415] D [socket.c:4167:ssl_setup_connection_params] > 0-socket.management: SSL support on the I/O path is NOT enabled > [1970-01-01 15:19:43.955504] D [socket.c:4170:ssl_setup_connection_params] > 0-socket.management: SSL support for glusterd is NOT enabled > [1970-01-01 15:19:43.955612] D [name.c:572:server_fill_address_family] > 0-socket.management: option address-family not specified, defaulting to > inet6 > [1970-01-01 15:19:43.955928] D [rpc-transport.c:269:rpc_transport_load] > 0-rpc-transport: attempt to load file > /usr/lib64/glusterfs/5.0/rpc-transport/rdma.so > [1970-01-01 15:19:43.956079] E [rpc-transport.c:273:rpc_transport_load] > 0-rpc-transport: /usr/lib64/glusterfs/5.0/rpc-transport/rdma.so: cannot > open shared object file: No such file or directory > [1970-01-01 15:19:43.956177] W [rpc-transport.c:277:rpc_transport_load] > 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not > valid or not found on this machine > [1970-01-01 15:19:43.956270] W [rpcsvc.c:1789:rpcsvc_create_listener] > 0-rpc-service: cannot create listener, initing the transport failed > [1970-01-01 15:19:43.956362] E [MSGID: 106244] [glusterd.c:1798:init] > 0-management: creation of 1 listeners failed, continuing with succeeded > transport > [1970-01-01 15:19:43.956459] D [rpcsvc.c:2146:rpcsvc_program_register] > 0-rpc-service: New program registered: GlusterD svc peer, Num: 1238437, 
> Ver: 2, Port: 0 > [1970-01-01 15:19:43.956561] D [rpcsvc.c:2146:rpcsvc_program_register] > 0-rpc-service: New program registered: GlusterD svc cli read-only, Num: > 1238463, Ver: 2, Port: 0 > [1970-01-01 15:19:43.956666] D [rpcsvc.c:2146:rpcsvc_program_register] > 0-rpc-service: New program registered: GlusterD svc mgmt, Num: 1238433, > Ver: 2, Port: 0 > [1970-01-01 15:19:43.956758] D [rpcsvc.c:2146:rpcsvc_program_register] > 0-rpc-service: New program registered: GlusterD svc mgmt v3, Num: 1238433, > Ver: 3, Port: 0 > [1970-01-01 15:19:43.956853] D [rpcsvc.c:2146:rpcsvc_program_register] > 0-rpc-service: New program registered: Gluster Portmap, Num: 34123456, Ver: > 1, Port: 0 > [1970-01-01 15:19:43.956946] D [rpcsvc.c:2146:rpcsvc_program_register] > 0-rpc-service: New program registered: Gluster Handshake, Num: 14398633, > Ver: 2, Port: 0 > [1970-01-01 15:19:43.957062] D [rpcsvc.c:2146:rpcsvc_program_register] > 0-rpc-service: New program registered: Gluster MGMT Handshake, Num: > 1239873, Ver: 1, Port: 0 > [1970-01-01 15:19:43.957205] D [rpcsvc.c:2607:rpcsvc_init] 0-rpc-service: > RPC service inited. 
> [1970-01-01 15:19:43.957303] D [rpcsvc.c:2146:rpcsvc_program_register] > 0-rpc-service: New program registered: GF-DUMP, Num: 123451501, Ver: 1, > Port: 0 > [1970-01-01 15:19:43.957408] D [rpc-transport.c:269:rpc_transport_load] > 0-rpc-transport: attempt to load file > /usr/lib64/glusterfs/5.0/rpc-transport/socket.so > [1970-01-01 15:19:43.957563] D [socket.c:4424:socket_init] > 0-socket.management: disabling nodelay > [1970-01-01 15:19:43.957650] D [socket.c:4464:socket_init] > 0-socket.management: Configued transport.tcp-user-timeout=0 > [1970-01-01 15:19:43.957738] D [socket.c:4482:socket_init] > 0-socket.management: Reconfigued transport.keepalivecnt=9 > [1970-01-01 15:19:43.957830] D [socket.c:4167:ssl_setup_connection_params] > 0-socket.management: SSL support on the I/O path is NOT enabled > [1970-01-01 15:19:43.957922] D [socket.c:4170:ssl_setup_connection_params] > 0-socket.management: SSL support for glusterd is NOT enabled > [1970-01-01 15:19:43.958186] D [rpcsvc.c:2146:rpcsvc_program_register] > 0-rpc-service: New program registered: GlusterD svc cli, Num: 1238463, Ver: > 2, Port: 0 > [1970-01-01 15:19:43.958280] D [rpcsvc.c:2146:rpcsvc_program_register] > 0-rpc-service: New program registered: Gluster Handshake (CLI Getspec), > Num: 14398633, Ver: 2, Port: 0 > [1970-01-01 15:19:43.958461] D [MSGID: 0] > [glusterd-utils.c:7878:glusterd_sm_tr_log_init] 0-glusterd: returning 0 > [1970-01-01 15:19:43.958557] D [MSGID: 0] [glusterd.c:1875:init] > 0-management: cannot get run-with-valgrind value > [1970-01-01 15:19:43.960895] E [MSGID: 101032] > [store.c:447:gf_store_handle_retrieve] 0-: Path corresponding to > /var/lib/glusterd/glusterd.info. [No such file or directory] > [1970-01-01 15:19:43.961016] D [MSGID: 0] > [store.c:452:gf_store_handle_retrieve] 0-: Returning -1 > [1970-01-01 15:19:43.961108] D [MSGID: 0] > [glusterd-store.c:2169:glusterd_retrieve_op_version] 0-management: Unable > to get store handle! 
> [1970-01-01 15:19:43.961216] E [MSGID: 101032] > [store.c:447:gf_store_handle_retrieve] 0-: Path corresponding to > /var/lib/glusterd/glusterd.info. [No such file or directory] > [1970-01-01 15:19:43.961325] D [MSGID: 0] > [store.c:452:gf_store_handle_retrieve] 0-: Returning -1 > [1970-01-01 15:19:43.961428] D [MSGID: 0] > [glusterd-store.c:2345:glusterd_retrieve_uuid] 0-management: Unable to get > storehandle! > [1970-01-01 15:19:43.961523] D [MSGID: 0] > [glusterd-store.c:2366:glusterd_retrieve_uuid] 0-management: Returning -1 > [1970-01-01 15:19:43.961617] I [MSGID: 106514] > [glusterd-store.c:2304:glusterd_restore_op_version] 0-management: Detected > new install. Setting op-version to maximum : 50000 > [1970-01-01 15:19:43.962658] D [MSGID: 0] > [store.c:432:gf_store_handle_new] 0-: Returning 0 > [1970-01-01 15:19:43.962750] D [MSGID: 0] > [store.c:452:gf_store_handle_retrieve] 0-: Returning 0 > [1970-01-01 15:19:43.963047] D [MSGID: 0] [store.c:515:gf_store_iter_new] > 0-: Returning with 0 > [1970-01-01 15:19:43.963194] D [MSGID: 0] > [store.c:632:gf_store_iter_get_next] 0-: Returning with 0 > [1970-01-01 15:19:43.963318] D [MSGID: 0] > [store.c:632:gf_store_iter_get_next] 0-: Returning with -1 > [1970-01-01 15:19:43.963455] D [MSGID: 0] > [store.c:473:gf_store_handle_destroy] 0-: Returning 0 > [1970-01-01 15:19:43.963757] D [MSGID: 0] > [glusterd-store.c:3546:glusterd_store_retrieve_volumes] 0-management: > Returning with 0 > [1970-01-01 15:19:43.964159] D [MSGID: 0] > [glusterd-store.c:4662:glusterd_store_retrieve_peers] 0-management: > Returning with 0 > [1970-01-01 15:19:43.964471] I [MSGID: 106194] > [glusterd-store.c:3983:glusterd_store_retrieve_missed_snaps_list] > 0-management: No missed snaps list. 
> [1970-01-01 15:19:43.964580] D [MSGID: 0] > [glusterd-store.c:4104:glusterd_store_retrieve_snaps] 0-management: > Returning with 0 > [1970-01-01 15:19:43.964680] D [MSGID: 0] > [glusterd-store.c:4894:glusterd_restore] 0-management: Returning 0 > [1970-01-01 15:19:43.965060] D [MSGID: 0] > [options.c:1225:xlator_option_init_int32] 0-management: option > event-threads using set value 1 > Final graph: > > +------------------------------------------------------------------------------+ > 1: volume management > 2: type mgmt/glusterd > 3: option rpc-auth.auth-glusterfs on > 4: option rpc-auth.auth-unix on > 5: option rpc-auth.auth-null on > 6: option rpc-auth-allow-insecure on > 7: option transport.listen-backlog 10 > 8: option event-threads 1 > 9: option ping-timeout 0 > 10: option transport.socket.read-fail-log off > 11: option transport.socket.keepalive-interval 2 > 12: option transport.socket.keepalive-time 10 > 13: option transport-type rdma > 14: option working-directory /var/lib/glusterd > 15: end-volume > 16: > > +------------------------------------------------------------------------------+ > [1970-01-01 15:19:43.966808] I [MSGID: 101190] > [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 1 > [1970-01-01 15:19:44.840454] E [rpcsvc.c:513:rpcsvc_request_create] > 0-rpc-service: RPC version not supported (XID: 0x0, Ver: 0, Program: 0, > ProgVers: 0, Proc: 2) from trans (socket.management) > [1970-01-01 15:19:44.840884] D [rpcsvc.c:1416:rpcsvc_error_reply] (--> > /usr/lib64/libglusterfs.so.0(_gf_log_callingfn-0xffbac)[0x3fffa12acbe4] > (--> /usr/lib64/libgfrpc.so.0(+0xc5f4)[0x3fffa12525f4] (--> > /usr/lib64/libgfrpc.so.0(+0xcf00)[0x3fffa1252f00] (--> > /usr/lib64/libgfrpc.so.0(+0xd224)[0x3fffa1253224] (--> > /usr/lib64/libgfrpc.so.0(+0xd84c)[0x3fffa125384c] ))))) 0-: sending a RPC > error reply > [1970-01-01 15:19:44.841055] D [logging.c:1805:gf_log_flush_extra_msgs] > 0-logging-infra: Log buffer size reduced. 
About to flush 5 extra log > messages > [1970-01-01 15:19:44.841156] D [logging.c:1808:gf_log_flush_extra_msgs] > 0-logging-infra: Just flushed 5 extra log messages > pending frames: > patchset: git://git.gluster.org/glusterfs.git > signal received: 11 > time of crash: > 1970-01-01 15:19:44 > configuration details: > argp 1 > backtrace 1 > dlfcn 1 > libpthread 1 > llistxattr 1 > setfsid 1 > spinlock 1 > epoll.h 1 > xattr.h 1 > st_atim.tv_nsec 1 > package-string: glusterfs 5.0 > /usr/lib64/libglusterfs.so.0(+0x422a4)[0x3fffa12ab2a4] > /usr/lib64/libglusterfs.so.0(gf_print_trace-0xf5080)[0x3fffa12b82e0] > ./glusterd(glusterfsd_print_trace-0x22fa4)[0x100067ec] > linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x3fffa13f0478] > /lib64/libc.so.6(xdr_accepted_reply-0x72d3c)[0x3fffa11375cc] > /lib64/libc.so.6(xdr_accepted_reply-0x72d9c)[0x3fffa113756c] > /lib64/libc.so.6(xdr_union-0x63a94)[0x3fffa1147dd4] > /lib64/libc.so.6(xdr_replymsg-0x72c58)[0x3fffa11376e0] > /lib64/libc.so.6(xdr_sizeof-0x62a78)[0x3fffa1149120] > /usr/lib64/libgfrpc.so.0(+0x9b0c)[0x3fffa124fb0c] > /usr/lib64/libgfrpc.so.0(rpcsvc_submit_generic-0x149f4)[0x3fffa125228c] > /usr/lib64/libgfrpc.so.0(+0xc614)[0x3fffa1252614] > /usr/lib64/libgfrpc.so.0(+0xcf00)[0x3fffa1252f00] > /usr/lib64/libgfrpc.so.0(+0xd224)[0x3fffa1253224] > /usr/lib64/libgfrpc.so.0(+0xd84c)[0x3fffa125384c] > /usr/lib64/libgfrpc.so.0(rpc_transport_notify-0x10eec)[0x3fffa125610c] > /usr/lib64/glusterfs/5.0/rpc-transport/socket.so(+0xc09c)[0x3fff9d51709c] > /usr/lib64/libglusterfs.so.0(+0xb84bc)[0x3fffa13214bc] > /lib64/libpthread.so.0(+0xbb30)[0x3fffa11bdb30] > /lib64/libc.so.6(clone-0x9e964)[0x3fffa110817c] > --------- > Segmentation fault (core dumped) > > Could you please help me, what actually the problem? 
> > > -- > > > > > Regards > Abhishek Paliwal > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Thanks, Sanju -------------- next part -------------- An HTML attachment was scrubbed... URL: From abhishpaliwal at gmail.com Wed Mar 6 08:33:09 2019 From: abhishpaliwal at gmail.com (ABHISHEK PALIWAL) Date: Wed, 6 Mar 2019 14:03:09 +0530 Subject: [Gluster-users] Not able to start glusterd In-Reply-To: References: Message-ID: Hi Sanju, Thanks for the response. I have resolved the issue. I had updated from 3.7.6 to 5.0, and in the new version RPC comes from libtirpc, but I forgot to enable "--with-libtirpc" in the configuration. After enabling it, I am able to start glusterd. Regards, Abhishek On Wed, Mar 6, 2019 at 12:58 PM Sanju Rakonde wrote: > Abhishek, > > We need the below information to investigate this issue. > 1. gluster --version > 2. Please run glusterd in gdb, so that we can capture the backtrace. I see > some rpc errors in the log, but the backtrace will be more helpful. > To run glusterd in gdb, you need to start glusterd in gdb (i.e. gdb > glusterd, and then give the command "run -N"). When you see a segmentation > fault, please capture the backtrace and paste it here. > > On Wed, Mar 6, 2019 at 10:07 AM ABHISHEK PALIWAL > wrote: > >> Hi Team, >> >> I am facing the issue where at the time of starting the glusterd >> segmentation fault is reported. >> >> Below are the logs >> >> root at 128:/usr/sbin# ./glusterd --debug >> [1970-01-01 15:19:43.940386] I [MSGID: 100030] [glusterfsd.c:2691:main] >> 0-./glusterd: Started running ./glusterd version 5.0 (args: ./glusterd >> --debug) >> [1970-01-01 15:19:43.940855] D >> [logging.c:1833:__gf_log_inject_timer_event] 0-logging-infra: Starting >> timer now. 
Timeout = 120, current buf size = 5 >> [1970-01-01 15:19:43.941736] D [MSGID: 0] [glusterfsd.c:747:get_volfp] >> 0-glusterfsd: loading volume file /etc/glusterfs/glusterd.vol >> [1970-01-01 15:19:43.945796] D [MSGID: 101097] >> [xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on >> /usr/lib64/glusterfs/5.0/xlator/mgmt/glusterd.so: undefined symbol: >> xlator_api. Fall back to old symbols >> [1970-01-01 15:19:43.946279] I [MSGID: 106478] [glusterd.c:1435:init] >> 0-management: Maximum allowed open file descriptors set to 65536 >> [1970-01-01 15:19:43.946419] I [MSGID: 106479] [glusterd.c:1491:init] >> 0-management: Using /var/lib/glusterd as working directory >> [1970-01-01 15:19:43.946515] I [MSGID: 106479] [glusterd.c:1497:init] >> 0-management: Using /var/run/gluster as pid file working directory >> [1970-01-01 15:19:43.946968] D [MSGID: 0] >> [glusterd.c:458:glusterd_rpcsvc_options_build] 0-glusterd: listen-backlog >> value: 10 >> [1970-01-01 15:19:43.947139] D [rpcsvc.c:2607:rpcsvc_init] 0-rpc-service: >> RPC service inited. 
>> [1970-01-01 15:19:43.947241] D [rpcsvc.c:2146:rpcsvc_program_register] >> 0-rpc-service: New program registered: GF-DUMP, Num: 123451501, Ver: 1, >> Port: 0 >> [1970-01-01 15:19:43.947379] D [rpc-transport.c:269:rpc_transport_load] >> 0-rpc-transport: attempt to load file >> /usr/lib64/glusterfs/5.0/rpc-transport/socket.so >> [1970-01-01 15:19:43.955198] D [socket.c:4464:socket_init] >> 0-socket.management: Configued transport.tcp-user-timeout=0 >> [1970-01-01 15:19:43.955316] D [socket.c:4482:socket_init] >> 0-socket.management: Reconfigued transport.keepalivecnt=9 >> [1970-01-01 15:19:43.955415] D >> [socket.c:4167:ssl_setup_connection_params] 0-socket.management: SSL >> support on the I/O path is NOT enabled >> [1970-01-01 15:19:43.955504] D >> [socket.c:4170:ssl_setup_connection_params] 0-socket.management: SSL >> support for glusterd is NOT enabled >> [1970-01-01 15:19:43.955612] D [name.c:572:server_fill_address_family] >> 0-socket.management: option address-family not specified, defaulting to >> inet6 >> [1970-01-01 15:19:43.955928] D [rpc-transport.c:269:rpc_transport_load] >> 0-rpc-transport: attempt to load file >> /usr/lib64/glusterfs/5.0/rpc-transport/rdma.so >> [1970-01-01 15:19:43.956079] E [rpc-transport.c:273:rpc_transport_load] >> 0-rpc-transport: /usr/lib64/glusterfs/5.0/rpc-transport/rdma.so: cannot >> open shared object file: No such file or directory >> [1970-01-01 15:19:43.956177] W [rpc-transport.c:277:rpc_transport_load] >> 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not >> valid or not found on this machine >> [1970-01-01 15:19:43.956270] W [rpcsvc.c:1789:rpcsvc_create_listener] >> 0-rpc-service: cannot create listener, initing the transport failed >> [1970-01-01 15:19:43.956362] E [MSGID: 106244] [glusterd.c:1798:init] >> 0-management: creation of 1 listeners failed, continuing with succeeded >> transport >> [1970-01-01 15:19:43.956459] D [rpcsvc.c:2146:rpcsvc_program_register] >> 0-rpc-service: New program 
registered: GlusterD svc peer, Num: 1238437, >> Ver: 2, Port: 0 >> [1970-01-01 15:19:43.956561] D [rpcsvc.c:2146:rpcsvc_program_register] >> 0-rpc-service: New program registered: GlusterD svc cli read-only, Num: >> 1238463, Ver: 2, Port: 0 >> [1970-01-01 15:19:43.956666] D [rpcsvc.c:2146:rpcsvc_program_register] >> 0-rpc-service: New program registered: GlusterD svc mgmt, Num: 1238433, >> Ver: 2, Port: 0 >> [1970-01-01 15:19:43.956758] D [rpcsvc.c:2146:rpcsvc_program_register] >> 0-rpc-service: New program registered: GlusterD svc mgmt v3, Num: 1238433, >> Ver: 3, Port: 0 >> [1970-01-01 15:19:43.956853] D [rpcsvc.c:2146:rpcsvc_program_register] >> 0-rpc-service: New program registered: Gluster Portmap, Num: 34123456, Ver: >> 1, Port: 0 >> [1970-01-01 15:19:43.956946] D [rpcsvc.c:2146:rpcsvc_program_register] >> 0-rpc-service: New program registered: Gluster Handshake, Num: 14398633, >> Ver: 2, Port: 0 >> [1970-01-01 15:19:43.957062] D [rpcsvc.c:2146:rpcsvc_program_register] >> 0-rpc-service: New program registered: Gluster MGMT Handshake, Num: >> 1239873, Ver: 1, Port: 0 >> [1970-01-01 15:19:43.957205] D [rpcsvc.c:2607:rpcsvc_init] 0-rpc-service: >> RPC service inited. 
>> [1970-01-01 15:19:43.957303] D [rpcsvc.c:2146:rpcsvc_program_register] >> 0-rpc-service: New program registered: GF-DUMP, Num: 123451501, Ver: 1, >> Port: 0 >> [1970-01-01 15:19:43.957408] D [rpc-transport.c:269:rpc_transport_load] >> 0-rpc-transport: attempt to load file >> /usr/lib64/glusterfs/5.0/rpc-transport/socket.so >> [1970-01-01 15:19:43.957563] D [socket.c:4424:socket_init] >> 0-socket.management: disabling nodelay >> [1970-01-01 15:19:43.957650] D [socket.c:4464:socket_init] >> 0-socket.management: Configued transport.tcp-user-timeout=0 >> [1970-01-01 15:19:43.957738] D [socket.c:4482:socket_init] >> 0-socket.management: Reconfigued transport.keepalivecnt=9 >> [1970-01-01 15:19:43.957830] D >> [socket.c:4167:ssl_setup_connection_params] 0-socket.management: SSL >> support on the I/O path is NOT enabled >> [1970-01-01 15:19:43.957922] D >> [socket.c:4170:ssl_setup_connection_params] 0-socket.management: SSL >> support for glusterd is NOT enabled >> [1970-01-01 15:19:43.958186] D [rpcsvc.c:2146:rpcsvc_program_register] >> 0-rpc-service: New program registered: GlusterD svc cli, Num: 1238463, Ver: >> 2, Port: 0 >> [1970-01-01 15:19:43.958280] D [rpcsvc.c:2146:rpcsvc_program_register] >> 0-rpc-service: New program registered: Gluster Handshake (CLI Getspec), >> Num: 14398633, Ver: 2, Port: 0 >> [1970-01-01 15:19:43.958461] D [MSGID: 0] >> [glusterd-utils.c:7878:glusterd_sm_tr_log_init] 0-glusterd: returning 0 >> [1970-01-01 15:19:43.958557] D [MSGID: 0] [glusterd.c:1875:init] >> 0-management: cannot get run-with-valgrind value >> [1970-01-01 15:19:43.960895] E [MSGID: 101032] >> [store.c:447:gf_store_handle_retrieve] 0-: Path corresponding to >> /var/lib/glusterd/glusterd.info. [No such file or directory] >> [1970-01-01 15:19:43.961016] D [MSGID: 0] >> [store.c:452:gf_store_handle_retrieve] 0-: Returning -1 >> [1970-01-01 15:19:43.961108] D [MSGID: 0] >> [glusterd-store.c:2169:glusterd_retrieve_op_version] 0-management: Unable >> to get store handle! 
>> [1970-01-01 15:19:43.961216] E [MSGID: 101032] >> [store.c:447:gf_store_handle_retrieve] 0-: Path corresponding to >> /var/lib/glusterd/glusterd.info. [No such file or directory] >> [1970-01-01 15:19:43.961325] D [MSGID: 0] >> [store.c:452:gf_store_handle_retrieve] 0-: Returning -1 >> [1970-01-01 15:19:43.961428] D [MSGID: 0] >> [glusterd-store.c:2345:glusterd_retrieve_uuid] 0-management: Unable to get >> storehandle! >> [1970-01-01 15:19:43.961523] D [MSGID: 0] >> [glusterd-store.c:2366:glusterd_retrieve_uuid] 0-management: Returning -1 >> [1970-01-01 15:19:43.961617] I [MSGID: 106514] >> [glusterd-store.c:2304:glusterd_restore_op_version] 0-management: Detected >> new install. Setting op-version to maximum : 50000 >> [1970-01-01 15:19:43.962658] D [MSGID: 0] >> [store.c:432:gf_store_handle_new] 0-: Returning 0 >> [1970-01-01 15:19:43.962750] D [MSGID: 0] >> [store.c:452:gf_store_handle_retrieve] 0-: Returning 0 >> [1970-01-01 15:19:43.963047] D [MSGID: 0] [store.c:515:gf_store_iter_new] >> 0-: Returning with 0 >> [1970-01-01 15:19:43.963194] D [MSGID: 0] >> [store.c:632:gf_store_iter_get_next] 0-: Returning with 0 >> [1970-01-01 15:19:43.963318] D [MSGID: 0] >> [store.c:632:gf_store_iter_get_next] 0-: Returning with -1 >> [1970-01-01 15:19:43.963455] D [MSGID: 0] >> [store.c:473:gf_store_handle_destroy] 0-: Returning 0 >> [1970-01-01 15:19:43.963757] D [MSGID: 0] >> [glusterd-store.c:3546:glusterd_store_retrieve_volumes] 0-management: >> Returning with 0 >> [1970-01-01 15:19:43.964159] D [MSGID: 0] >> [glusterd-store.c:4662:glusterd_store_retrieve_peers] 0-management: >> Returning with 0 >> [1970-01-01 15:19:43.964471] I [MSGID: 106194] >> [glusterd-store.c:3983:glusterd_store_retrieve_missed_snaps_list] >> 0-management: No missed snaps list. 
>> [1970-01-01 15:19:43.964580] D [MSGID: 0] >> [glusterd-store.c:4104:glusterd_store_retrieve_snaps] 0-management: >> Returning with 0 >> [1970-01-01 15:19:43.964680] D [MSGID: 0] >> [glusterd-store.c:4894:glusterd_restore] 0-management: Returning 0 >> [1970-01-01 15:19:43.965060] D [MSGID: 0] >> [options.c:1225:xlator_option_init_int32] 0-management: option >> event-threads using set value 1 >> Final graph: >> >> +------------------------------------------------------------------------------+ >> 1: volume management >> 2: type mgmt/glusterd >> 3: option rpc-auth.auth-glusterfs on >> 4: option rpc-auth.auth-unix on >> 5: option rpc-auth.auth-null on >> 6: option rpc-auth-allow-insecure on >> 7: option transport.listen-backlog 10 >> 8: option event-threads 1 >> 9: option ping-timeout 0 >> 10: option transport.socket.read-fail-log off >> 11: option transport.socket.keepalive-interval 2 >> 12: option transport.socket.keepalive-time 10 >> 13: option transport-type rdma >> 14: option working-directory /var/lib/glusterd >> 15: end-volume >> 16: >> >> +------------------------------------------------------------------------------+ >> [1970-01-01 15:19:43.966808] I [MSGID: 101190] >> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >> with index 1 >> [1970-01-01 15:19:44.840454] E [rpcsvc.c:513:rpcsvc_request_create] >> 0-rpc-service: RPC version not supported (XID: 0x0, Ver: 0, Program: 0, >> ProgVers: 0, Proc: 2) from trans (socket.management) >> [1970-01-01 15:19:44.840884] D [rpcsvc.c:1416:rpcsvc_error_reply] (--> >> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn-0xffbac)[0x3fffa12acbe4] >> (--> /usr/lib64/libgfrpc.so.0(+0xc5f4)[0x3fffa12525f4] (--> >> /usr/lib64/libgfrpc.so.0(+0xcf00)[0x3fffa1252f00] (--> >> /usr/lib64/libgfrpc.so.0(+0xd224)[0x3fffa1253224] (--> >> /usr/lib64/libgfrpc.so.0(+0xd84c)[0x3fffa125384c] ))))) 0-: sending a RPC >> error reply >> [1970-01-01 15:19:44.841055] D [logging.c:1805:gf_log_flush_extra_msgs] >> 
0-logging-infra: Log buffer size reduced. About to flush 5 extra log >> messages >> [1970-01-01 15:19:44.841156] D [logging.c:1808:gf_log_flush_extra_msgs] >> 0-logging-infra: Just flushed 5 extra log messages >> pending frames: >> patchset: git://git.gluster.org/glusterfs.git >> signal received: 11 >> time of crash: >> 1970-01-01 15:19:44 >> configuration details: >> argp 1 >> backtrace 1 >> dlfcn 1 >> libpthread 1 >> llistxattr 1 >> setfsid 1 >> spinlock 1 >> epoll.h 1 >> xattr.h 1 >> st_atim.tv_nsec 1 >> package-string: glusterfs 5.0 >> /usr/lib64/libglusterfs.so.0(+0x422a4)[0x3fffa12ab2a4] >> /usr/lib64/libglusterfs.so.0(gf_print_trace-0xf5080)[0x3fffa12b82e0] >> ./glusterd(glusterfsd_print_trace-0x22fa4)[0x100067ec] >> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x3fffa13f0478] >> /lib64/libc.so.6(xdr_accepted_reply-0x72d3c)[0x3fffa11375cc] >> /lib64/libc.so.6(xdr_accepted_reply-0x72d9c)[0x3fffa113756c] >> /lib64/libc.so.6(xdr_union-0x63a94)[0x3fffa1147dd4] >> /lib64/libc.so.6(xdr_replymsg-0x72c58)[0x3fffa11376e0] >> /lib64/libc.so.6(xdr_sizeof-0x62a78)[0x3fffa1149120] >> /usr/lib64/libgfrpc.so.0(+0x9b0c)[0x3fffa124fb0c] >> /usr/lib64/libgfrpc.so.0(rpcsvc_submit_generic-0x149f4)[0x3fffa125228c] >> /usr/lib64/libgfrpc.so.0(+0xc614)[0x3fffa1252614] >> /usr/lib64/libgfrpc.so.0(+0xcf00)[0x3fffa1252f00] >> /usr/lib64/libgfrpc.so.0(+0xd224)[0x3fffa1253224] >> /usr/lib64/libgfrpc.so.0(+0xd84c)[0x3fffa125384c] >> /usr/lib64/libgfrpc.so.0(rpc_transport_notify-0x10eec)[0x3fffa125610c] >> /usr/lib64/glusterfs/5.0/rpc-transport/socket.so(+0xc09c)[0x3fff9d51709c] >> /usr/lib64/libglusterfs.so.0(+0xb84bc)[0x3fffa13214bc] >> /lib64/libpthread.so.0(+0xbb30)[0x3fffa11bdb30] >> /lib64/libc.so.6(clone-0x9e964)[0x3fffa110817c] >> --------- >> Segmentation fault (core dumped) >> >> Could you please help me, what actually the problem? 
>>
>> --
>>
>> Regards
>> Abhishek Paliwal
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
> --
> Thanks,
> Sanju
>

--
Regards
Abhishek Paliwal
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ndevos at redhat.com  Wed Mar  6 08:37:15 2019
From: ndevos at redhat.com (Niels de Vos)
Date: Wed, 6 Mar 2019 09:37:15 +0100
Subject: [Gluster-users] [Gluster-Maintainers] Release 6: Release date update
In-Reply-To:
References: <8c0c5f02-3d31-526a-9c6e-e8221e23cccd@redhat.com>
Message-ID: <20190306083715.GE16424@ndevos-x270>

On Wed, Mar 06, 2019 at 02:34:37PM +0900, Guillaume Pavese wrote:
> Ready to test, as soon as a build is made.
> I have been refreshing
> https://cbs.centos.org/koji/packageinfo?packageID=5 since the last one,
> 6.0-0.1.rc0.el7, built on 22nd Feb, which did not include important patches
> (event handler fails) merged for 5.4-1 since...

If you are interested in testing packages before they get released, I
recommend subscribing to the packaging mailing list. We normally announce
it there when packages are available. For CentOS there is a
centos-release-gluster6 package that provides the yum repository
configuration so that installing/updating is really easy:

https://lists.gluster.org/pipermail/packaging/2019-February/000702.html

Niels

> I think an RC1 would have been warranted for those patches already while
> waiting for the upgrade bugs to be fixed in an RC2...
>
> Guillaume Pavese
> Ingénieur Système et Réseau
> Interactiv-Group
>
>
> On Wed, Mar 6, 2019 at 3:17 AM Shyam Ranganathan
> wrote:
>
> > Hi,
> >
> > Release-6 was to be an early March release, and due to bugs found
> > while performing upgrade testing, is now expected in the week of 18th
> > March, 2019.
> > > > RC1 builds are expected this week, to contain the required fixes; next
> > week we would be testing RC1 for release fitness before the release.
> >
> > As always, we request that users test the RC builds and report back
> > issues they encounter, to help make the release better quality.
> >
> > Shyam
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users
>
> _______________________________________________
> maintainers mailing list
> maintainers at gluster.org
> https://lists.gluster.org/mailman/listinfo/maintainers

From rkavunga at redhat.com  Wed Mar  6 08:57:44 2019
From: rkavunga at redhat.com (RAFI KC)
Date: Wed, 6 Mar 2019 14:27:44 +0530
Subject: [Gluster-users] Not able to start glusterd
In-Reply-To:
References:
Message-ID:

Hi Abhishek,

Good to know that you have resolved your problem. Do you think any more
information should be added to the upgrade doc for a smoother upgrade flow?
It would be great to see a PR to the repo
https://github.com/gluster/glusterdocs/tree/master/docs/Upgrade-Guide
for updating the doc with that information.

Regards
Rafi KC

On 3/6/19 2:03 PM, ABHISHEK PALIWAL wrote:
> Hi Sanju,
>
> Thanks for the response.
>
> I have resolved the issue. I had updated from 3.7.6 to 5.0; in the new
> version RPC comes from libtirpc, but I forgot to enable
> "--with-libtirpc" in the configuration.
>
> After enabling it, I am able to start glusterd.
>
> Regards,
> Abhishek
>
> On Wed, Mar 6, 2019 at 12:58 PM Sanju Rakonde
> wrote:
>
>     Abhishek,
>
>     We need the below information to investigate this issue.
>     1. gluster --version
>     2. Please run glusterd in gdb, so that we can capture the
>     backtrace. I see some rpc errors in the log, but a backtrace will be
>     more helpful.
>     To run glusterd in gdb, you need to start glusterd in gdb (i.e.
>     gdb glusterd, and then give the command "run -N"). When you see a
>     segmentation
fault, please capture the backtrace and
>     paste it here.
>
> On Wed, Mar 6, 2019 at 10:07 AM ABHISHEK PALIWAL
> wrote:
>
>     Hi Team,
>
>     I am facing the issue where at the time of starting the
>     glusterd segmentation fault is reported.
>
>     Below are the logs
>
>     root at 128:/usr/sbin# ./glusterd --debug
>     [... full glusterd debug log and crash backtrace snipped; it is
>     identical to the copy quoted earlier in this thread ...]
>     ---------
>     Segmentation fault (core dumped)
>
>     Could you please help me, what actually the problem?
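Sanju's suggestion earlier in the thread — running glusterd under gdb and grabbing the backtrace at the crash — can be scripted in batch mode. A minimal sketch, assuming a default install path for glusterd; the script name and location are made up for illustration:

```shell
# Write a gdb command file: "run -N" starts glusterd in the foreground,
# and "bt full" prints the full backtrace once SIGSEGV stops execution.
cat > /tmp/glusterd-bt.gdb <<'EOF'
run -N
bt full
EOF
# On the affected node one would then run (not executed here):
#   gdb -q -x /tmp/glusterd-bt.gdb /usr/sbin/glusterd
wc -l < /tmp/glusterd-bt.gdb
```

The command file approach keeps the capture reproducible, which matters when the crash only occurs at service start.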
>
> --
>
> Regards
> Abhishek Paliwal
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From hunter86_bg at yahoo.com  Wed Mar  6 16:51:26 2019
From: hunter86_bg at yahoo.com (Strahil)
Date: Wed, 06 Mar 2019 18:51:26 +0200
Subject: [Gluster-users] Gluster : Improvements on "heal info" command
Message-ID: <1arqcq3vmp99pobus3sb2kaw.1551891086158@email.android.com>

Hi,

This sounds nice. I would like to ask if the order starts from the local
node's bricks first? (I am talking about --brick=one)

Best Regards,
Strahil Nikolov

On Mar 5, 2019 10:51, Ashish Pandey wrote:
>
> Hi All,
>
> We have observed and heard from gluster users about the long time the
> "heal info" command takes.
> Even when all we want to know is whether a gluster volume is healthy or
> not, it takes time to list all the files from all the bricks, after which
> we can be sure if the volume is healthy or not.
> Here, we have come up with some options for the "heal info" command which
> provide a report quickly and reliably.
>
> gluster v heal vol info --subvol=[number of the subvol] --brick=[one,all]
> --------
>
> Problem: The "gluster v heal info" command picks each subvolume and checks
> the .glusterfs/indices/xattrop folder of every brick of that subvolume to
> find out if there is any entry which needs to be healed. It picks the
> entry and takes a lock on that entry to check xattrs to find out if that
> entry actually needs heal or not.
> This LOCK->CHECK-XATTR->UNLOCK cycle takes a lot of time for each file.
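The cost of that per-file cycle multiplies with the brick count. A back-of-envelope sketch using the 10000-file, 4+2 scenario discussed in this thread (illustrative arithmetic only; no real gluster calls involved):

```shell
# One brick of a 4+2 subvolume was down while 10000 files were written,
# so each of the 5 surviving bricks logged an xattrop entry per file.
files=10000
live_bricks=5
cycles=$((files * live_bricks))   # lock cycles "heal info" performs today
# prints: cycles performed: 50000; sufficient: 10000; redundancy: 5x
echo "cycles performed: $cycles; sufficient: $files; redundancy: $((cycles / files))x"
```

The redundancy factor is what the proposed early-exit and per-subvolume options aim to cut down.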
>
> Let's consider the two most common cases for which we use "heal info" and
> try to understand the improvements.
>
> Case 1: Consider a 4+2 EC volume with all the bricks on 6 different nodes.
> A brick of the volume is down and a client has written 10000 files on one
> of the mount points of this volume. Entries for these 10K files will be
> created in ".glusterfs/indices/xattrop" on all of the remaining 5 bricks.
> Now, the brick is UP, and when we use the "heal info" command for this
> volume, it goes to all the bricks, picks these 10K file entries, and goes
> through the LOCK->CHECK-XATTR->UNLOCK cycle for all the files. This happens
> for all the bricks; that means we check 50K files and perform the
> LOCK->CHECK-XATTR->UNLOCK cycle 50K times, while checking only 10K entries
> would have been sufficient. It is a very time-consuming operation. If I/Os
> are happening on some of the new files, we check those files as well, which
> adds to the time. Here, all we wanted to know is whether our volume is
> healed and healthy.
>
> Solution: Whenever a brick goes down and comes up and we use the "heal
> info" command, our *main intention* is to find out if the volume is
> *healthy* or *unhealthy*. A volume is unhealthy even if one file is not
> healthy. So, we should scan the bricks one by one, and as soon as we find
> that one brick has some entries which need to be healed, we can come out,
> list the files, and say the volume is not healthy. There is no need to scan
> the rest of the bricks. That's where the "--brick=[one,all]" option has
> been introduced.
>
> "gluster v heal vol info --brick=[one,all]"
> "one" - It will scan the bricks sequentially and, as soon as it finds any
> unhealthy entries, it will list them and stop scanning other bricks.
> "all" - It will act just like the current behavior and provide all the
> files from all the bricks. If we do not provide this option, the default
> (current) behavior applies.
>
> Case 2: Consider a 24 x (4+2) EC volume.
Let's say one brick from *only one* of the subvolumes has been replaced and a
heal has been triggered.
> To know if the volume is in a healthy state, we go to each brick of *each
> and every subvolume* and check whether there are any entries in the
> ".glusterfs/indices/xattrop" folder which need heal.
> If we know which subvolume participated in the brick replacement, we just
> need to check the health of that subvolume and not query/check the other
> subvolumes.
>
> If several clients are writing a number of files on this volume, an entry
> for each of these files will be created in .glusterfs/indices/xattrop, and
> the "heal info" command will go through the LOCK->CHECK-XATTR->UNLOCK cycle
> to find out if these entries need heal or not, which takes a lot of time.
> In addition to this, a client will also see a performance drop, as it will
> have to release and take the lock again.
>
> Solution: Provide an option to mention the number of the subvolume for
> which we want to check heal info.
>
> "gluster v heal vol info --subvol=[number of the subvol]"
> Here, --subvol will be given the number of the subvolume we want to check.
> Example:
> "gluster v heal vol info --subvol=1"
>
>
> ===================================
> Performance Data -
> A quick performance test done on a standalone system.
> Type: Distributed-Disperse
> Volume ID: ea40eb13-d42c-431c-9c89-0153e834e67e
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 2 x (4 + 2) = 12
> Transport-type: tcp
> Bricks:
> Brick1: apandey:/home/apandey/bricks/gluster/vol-1
> Brick2: apandey:/home/apandey/bricks/gluster/vol-2
> Brick3: apandey:/home/apandey/bricks/gluster/vol-3
> Brick4: apandey:/home/apandey/bricks/gluster/vol-4
> Brick5: apandey:/home/apandey/bricks/gluster/vol-5
> Brick6: apandey:/home/apandey/bricks/gluster/vol-6
> Brick7: apandey:/home/apandey/bricks/gluster/new-1
> Brick8: apandey:/home/apandey/bricks/gluster/new-2
> Brick9: apandey:/home/apandey/bricks/gluster/new-3
> Brick10: apandey:/home/apandey/bricks/gluster/new-4
> Brick11: apandey:/home/apandey/bricks/gluster/new-5
> Brick12: apandey:/home/apandey/bricks/gluster/new-6
>
> Just disabled the shd to get the data -
>
> Killed one brick each from two subvolumes and wrote 2000 files on the
> mount point.
> [root at apandey vol]# for i in {1..2000};do echo abc >> file-$i; done
>
> Start the volume using the force option and get the heal info. Following
> is the data -
>
> [root at apandey glusterfs]# time gluster v heal vol info --brick=one >> /dev/null    <<<<<<<< This scans the bricks one by one and comes out as soon as it finds the volume is unhealthy.
>
> real    0m8.316s
> user    0m2.241s
> sys     0m1.278s
> [root at apandey glusterfs]#
>
> [root at apandey glusterfs]# time gluster v heal vol info >> /dev/null    <<<<<<<< This is the current behavior.
>
> real    0m26.097s
> user    0m10.868s
> sys     0m6.198s
> [root at apandey glusterfs]#
> ===================================
>
> I would like your comments/suggestions on these improvements.
> Especially, I would like to hear about the new syntax of the command -
>
> gluster v heal vol info --subvol=[number of the subvol] --brick=[one,all]
>
> Note that if we do not provide the new options, the command will behave
> just like it does right now.
> Also, this improvement is valid for AFR and EC. > > --- > Ashish > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aspandey at redhat.com Wed Mar 6 17:03:17 2019 From: aspandey at redhat.com (Ashish Pandey) Date: Wed, 6 Mar 2019 12:03:17 -0500 (EST) Subject: [Gluster-users] Gluster : Improvements on "heal info" command In-Reply-To: <1arqcq3vmp99pobus3sb2kaw.1551891086158@email.android.com> References: <1arqcq3vmp99pobus3sb2kaw.1551891086158@email.android.com> Message-ID: <1454335424.6314606.1551891797145.JavaMail.zimbra@redhat.com> No, it is not necessary that the first brick would be the local one. I really don't think starting from the local node will make a difference. The major time is not spent in getting the list of entries from the .glusterfs/indices/xattrop folder; the LOCK->CHECK-XATTR->UNLOCK cycle is what takes most of the time, and that is not going to change even if we start from the local brick. --- Ashish ----- Original Message ----- From: "Strahil" To: "Ashish" , "Gluster" , "Gluster" Sent: Wednesday, March 6, 2019 10:21:26 PM Subject: Re: [Gluster-users] Gluster : Improvements on "heal info" command Hi, This sounds nice. I would like to ask if the order is starting from the local node's bricks first? (I am talking about --brick=one) Best Regards, Strahil Nikolov On Mar 5, 2019 10:51, Ashish Pandey wrote: Hi All, We have observed, and heard from gluster users, that the "heal info" command takes a long time. Even when all we want to know is whether a gluster volume is healthy or not, it takes time to list all the files from all the bricks, after which we can be sure if the volume is healthy or not. Here, we have come up with some options for the "heal info" command which provide a report quickly and reliably.
gluster v heal vol info --subvol=[number of the subvol] --brick=[one,all] -------- Problem: The "gluster v heal info" command picks each subvolume and checks the .glusterfs/indices/xattrop folder of every brick of that subvolume to find out if there is any entry which needs to be healed. It picks each entry and takes a lock on it to check its xattrs and find out if that entry actually needs heal or not. This LOCK->CHECK-XATTR->UNLOCK cycle takes a lot of time for each file. Let's consider the two most often seen cases for which we use "heal info" and try to understand the improvements. Case -1 : Consider a 4+2 EC volume with all the bricks on 6 different nodes. A brick of the volume is down and a client has written 10000 files on one of the mount points of this volume. Entries for these 10K files will be created in ".glusterfs/indices/xattrop" on all of the remaining 5 bricks. Now, the brick is UP, and when we use the "heal info" command for this volume, it goes to all the bricks, picks these 10K file entries, and goes through the LOCK->CHECK-XATTR->UNLOCK cycle for all the files. This happens for all the bricks; that means we check 50K files and perform the LOCK->CHECK-XATTR->UNLOCK cycle 50K times, while checking only 10K entries would have been sufficient. It is a very time-consuming operation. If IOs are happening on some of the new files, we check these files as well, which adds to the time. Here, all we wanted to know was whether our volume has been healed and is healthy. Solution : Whenever a brick goes down and comes up and we use the "heal info" command, our *main intention* is to find out if the volume is *healthy* or *unhealthy*. A volume is unhealthy even if one file is not healthy. So, we should scan the bricks one by one, and as soon as we find that one brick has some entries which require healing, we can come out, list the files, and say the volume is not healthy. There is no need to scan the rest of the bricks. That's where the "--brick=[one,all]" option has been introduced.
"gluster v heal vol info --brick=[one,all]" "one" - It will scan the bricks sequentially and, as soon as it finds any unhealthy entries, it will list them and stop scanning the other bricks. "all" - It will act just like the current behavior and provide all the files from all the bricks. If we do not provide this option, the default (current) behavior will be applicable. Case -2 : Consider a 24 x (4+2) EC volume. Let's say one brick from *only one* of the sub volumes has been replaced and a heal has been triggered. To know if the volume is in a healthy state, we go to each brick of *each and every sub volume* and check if there are any entries in the ".glusterfs/indices/xattrop" folder which need heal or not. If we know which sub volume participated in the brick replacement, we just need to check the health of that sub volume and not query/check the other sub volumes. If several clients are writing a number of files on this volume, an entry for each of these files will be created in .glusterfs/indices/xattrop, and the "heal info" command will go through the LOCK->CHECK-XATTR->UNLOCK cycle to find out if these entries need heal or not, which takes a lot of time. In addition to this, a client will also see a performance drop as it will have to release and take the lock again. Solution: Provide an option to mention the number of the sub volume for which we want to check heal info. "gluster v heal vol info --subvol=<N>" Here, --subvol will be given the number of the subvolume we want to check. Example: "gluster v heal vol info --subvol=1" =================================== Performance Data - A quick performance test done on a standalone system.
Type: Distributed-Disperse Volume ID: ea40eb13-d42c-431c-9c89-0153e834e67e Status: Started Snapshot Count: 0 Number of Bricks: 2 x (4 + 2) = 12 Transport-type: tcp Bricks: Brick1: apandey:/home/apandey/bricks/gluster/vol-1 Brick2: apandey:/home/apandey/bricks/gluster/vol-2 Brick3: apandey:/home/apandey/bricks/gluster/vol-3 Brick4: apandey:/home/apandey/bricks/gluster/vol-4 Brick5: apandey:/home/apandey/bricks/gluster/vol-5 Brick6: apandey:/home/apandey/bricks/gluster/vol-6 Brick7: apandey:/home/apandey/bricks/gluster/new-1 Brick8: apandey:/home/apandey/bricks/gluster/new-2 Brick9: apandey:/home/apandey/bricks/gluster/new-3 Brick10: apandey:/home/apandey/bricks/gluster/new-4 Brick11: apandey:/home/apandey/bricks/gluster/new-5 Brick12: apandey:/home/apandey/bricks/gluster/new-6 Just disabled the shd to get the data - Killed one brick each from two subvolumes and wrote 2000 files on the mount point. [root@apandey vol]# for i in {1..2000};do echo abc >> file-$i; done Started the volume using the force option and got the heal info. Following is the data - [root@apandey glusterfs]# time gluster v heal vol info --brick=one >> /dev/null    <<<<<<<< This will scan the bricks one by one and come out as soon as we find the volume is unhealthy. real 0m8.316s user 0m2.241s sys 0m1.278s [root@apandey glusterfs]# [root@apandey glusterfs]# time gluster v heal vol info >> /dev/null    <<<<<<<< This is the current behavior. real 0m26.097s user 0m10.868s sys 0m6.198s [root@apandey glusterfs]# =================================== I would like your comments/suggestions on these improvements. Especially, I would like to hear about the new syntax of the command - gluster v heal vol info --subvol=[number of the subvol] --brick=[one,all] Note that if we do not provide the new options, the command will behave just like it does right now. Also, this improvement is valid for AFR and EC.
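To make the early-exit idea above concrete, here is a rough model of the two scan modes in Python. This is purely illustrative: the real implementation is in C inside gluster, and the brick names, entry names, and "pending" flag below are invented for the sketch, with a counter standing in for the per-entry LOCK->CHECK-XATTR->UNLOCK cost.

```python
# Illustrative model of "heal info --brick=one" vs. the current full scan.
# Each brick maps to the entries found in its .glusterfs/indices/xattrop
# directory; "pending" stands in for the xattr state that the expensive
# LOCK -> CHECK-XATTR -> UNLOCK cycle would discover.
from dataclasses import dataclass

@dataclass
class Entry:
    name: str
    pending: bool  # True if the xattrs say the entry still needs heal

def scan_all(bricks):
    """Current behavior: run the lock cycle on every entry of every brick."""
    unhealthy, cycles = [], 0
    for brick, entries in bricks.items():
        for e in entries:
            cycles += 1  # one LOCK->CHECK-XATTR->UNLOCK round trip
            if e.pending:
                unhealthy.append((brick, e.name))
    return unhealthy, cycles

def scan_one(bricks):
    """Proposed --brick=one: stop after the first brick that proves the
    volume unhealthy; the remaining bricks are never queried."""
    cycles = 0
    for brick, entries in bricks.items():
        found = []
        for e in entries:
            cycles += 1
            if e.pending:
                found.append((brick, e.name))
        if found:
            return found, cycles  # volume already known to be unhealthy
    return [], cycles

# Mimic the Case-1 shape: every surviving brick holds the same entries.
bricks = {f"brick-{i}": [Entry(f"file-{j}", True) for j in range(3)]
          for i in range(1, 6)}
full, full_cycles = scan_all(bricks)   # 5 bricks x 3 entries = 15 cycles
fast, fast_cycles = scan_one(bricks)   # stops after brick-1: 3 cycles
```

With 5 bricks each holding the same 3 pending entries, the full scan pays 15 lock cycles while the early-exit scan pays only 3, the same shape as the 50K-vs-10K comparison in Case-1.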
--- Ashish _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Thu Mar 7 05:04:42 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Thu, 07 Mar 2019 07:04:42 +0200 Subject: [Gluster-users] Experiences with FUSE in real world - Presentation at Vault 2019 Message-ID: Thanks a lot. Is there a recording of that? Best Regards, Strahil Nikolov On Mar 5, 2019 11:13, Raghavendra Gowdappa wrote: > > All, > > Recently me, Manoj and Csaba presented on positives and negatives of implementing File systems in userspace using FUSE [1]. We had based the talk on our experiences with Glusterfs having FUSE as the native interface. The slides can also be found at [1]. > > [1] https://www.usenix.org/conference/vault19/presentation/pillai > > regards, > Raghavendra -------------- next part -------------- An HTML attachment was scrubbed... URL: From archon810 at gmail.com Thu Mar 7 06:23:01 2019 From: archon810 at gmail.com (Artem Russakovskii) Date: Wed, 6 Mar 2019 22:23:01 -0800 Subject: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP In-Reply-To: References: Message-ID: Is the next release going to be an imminent hotfix, i.e. something like today/tomorrow, or are we talking weeks? Sincerely, Artem -- Founder, Android Police , APK Mirror , Illogical Robot LLC beerpla.net | +ArtemRussakovskii | @ArtemR On Tue, Mar 5, 2019 at 11:09 AM Artem Russakovskii wrote: > Ended up downgrading to 5.3 just in case. Peer status and volume status > are OK now. > > zypper install --oldpackage glusterfs-5.3-lp150.100.1 > Loading repository data... > Reading installed packages... > Resolving package dependencies...
> > Problem: glusterfs-5.3-lp150.100.1.x86_64 requires libgfapi0 = 5.3, but > this requirement cannot be provided > not installable providers: libgfapi0-5.3-lp150.100.1.x86_64[glusterfs] > Solution 1: Following actions will be done: > downgrade of libgfapi0-5.4-lp150.100.1.x86_64 to > libgfapi0-5.3-lp150.100.1.x86_64 > downgrade of libgfchangelog0-5.4-lp150.100.1.x86_64 to > libgfchangelog0-5.3-lp150.100.1.x86_64 > downgrade of libgfrpc0-5.4-lp150.100.1.x86_64 to > libgfrpc0-5.3-lp150.100.1.x86_64 > downgrade of libgfxdr0-5.4-lp150.100.1.x86_64 to > libgfxdr0-5.3-lp150.100.1.x86_64 > downgrade of libglusterfs0-5.4-lp150.100.1.x86_64 to > libglusterfs0-5.3-lp150.100.1.x86_64 > Solution 2: do not install glusterfs-5.3-lp150.100.1.x86_64 > Solution 3: break glusterfs-5.3-lp150.100.1.x86_64 by ignoring some of > its dependencies > > Choose from above solutions by number or cancel [1/2/3/c] (c): 1 > Resolving dependencies... > Resolving package dependencies... > > The following 6 packages are going to be downgraded: > glusterfs libgfapi0 libgfchangelog0 libgfrpc0 libgfxdr0 libglusterfs0 > > 6 packages to downgrade. > > Sincerely, > Artem > > -- > Founder, Android Police , APK Mirror > , Illogical Robot LLC > beerpla.net | +ArtemRussakovskii > | @ArtemR > > > > On Tue, Mar 5, 2019 at 10:57 AM Artem Russakovskii > wrote: > >> Noticed the same when upgrading from 5.3 to 5.4, as mentioned. >> >> I'm confused though. Is actual replication affected, because the 5.4 >> server and the 3x 5.3 servers still show heal info as all 4 connected, and >> the files seem to be replicating correctly as well. >> >> So what's actually affected - just the status command, or leaving 5.4 on >> one of the nodes is doing some damage to the underlying fs? Is it fixable >> by tweaking transport.socket.ssl-enabled? Does upgrading all servers to 5.4 >> resolve it, or should we revert back to 5.3? 
>> >> Sincerely, >> Artem >> >> -- >> Founder, Android Police , APK Mirror >> , Illogical Robot LLC >> beerpla.net | +ArtemRussakovskii >> | @ArtemR >> >> >> >> On Tue, Mar 5, 2019 at 2:02 AM Hu Bert wrote: >> >>> fyi: did a downgrade 5.4 -> 5.3 and it worked. all replicas are up and >>> running. Awaiting updated v5.4. >>> >>> thx :-) >>> >>> Am Di., 5. März 2019 um 09:26 Uhr schrieb Hari Gowtham < >>> hgowtham at redhat.com>: >>> > >>> > There are plans to revert the patch causing this error and rebuild 5.4. >>> > This should happen faster. The rebuilt 5.4 should be void of this >>> upgrade issue. >>> > >>> > In the meantime, you can use 5.3 for this cluster. >>> > Downgrading to 5.3 will work if it was just one node that was upgraded >>> to 5.4 >>> > and the other nodes are still on 5.3. >>> > >>> > On Tue, Mar 5, 2019 at 1:07 PM Hu Bert wrote: >>> > > >>> > > Hi Hari, >>> > > >>> > > thx for the hint. Do you know when this will be fixed? Is a downgrade >>> > > 5.4 -> 5.3 a possibility to fix this? >>> > > >>> > > Hubert >>> > > >>> > > Am Di., 5. März 2019 um 08:32 Uhr schrieb Hari Gowtham < >>> hgowtham at redhat.com>: >>> > > > >>> > > > Hi, >>> > > > >>> > > > This is a known issue we are working on. >>> > > > As the checksum differs between the updated and non-updated node, >>> the >>> > > > peers are getting rejected. >>> > > > The bricks aren't coming up because of the same issue.
>>> > > > >>> > > > More about the issue: >>> https://bugzilla.redhat.com/show_bug.cgi?id=1685120 >>> > > > >>> > > > On Tue, Mar 5, 2019 at 12:56 PM Hu Bert >>> wrote: >>> > > > > >>> > > > > Interestingly: gluster volume status misses gluster1, while heal >>> > > > > statistics show gluster1: >>> > > > > >>> > > > > gluster volume status workdata >>> > > > > Status of volume: workdata >>> > > > > Gluster process TCP Port RDMA Port >>> Online Pid >>> > > > > >>> ------------------------------------------------------------------------------ >>> > > > > Brick gluster2:/gluster/md4/workdata 49153 0 >>> Y 1723 >>> > > > > Brick gluster3:/gluster/md4/workdata 49153 0 >>> Y 2068 >>> > > > > Self-heal Daemon on localhost N/A N/A >>> Y 1732 >>> > > > > Self-heal Daemon on gluster3 N/A N/A >>> Y 2077 >>> > > > > >>> > > > > vs. >>> > > > > >>> > > > > gluster volume heal workdata statistics heal-count >>> > > > > Gathering count of entries to be healed on volume workdata has >>> been successful >>> > > > > >>> > > > > Brick gluster1:/gluster/md4/workdata >>> > > > > Number of entries: 0 >>> > > > > >>> > > > > Brick gluster2:/gluster/md4/workdata >>> > > > > Number of entries: 10745 >>> > > > > >>> > > > > Brick gluster3:/gluster/md4/workdata >>> > > > > Number of entries: 10744 >>> > > > > >>> > > > > Am Di., 5. März 2019 um 08:18 Uhr schrieb Hu Bert < >>> revirii at googlemail.com>: >>> > > > > > >>> > > > > > Hi Milind, >>> > > > > > >>> > > > > > well, there are such entries, but those haven't been a problem >>> during >>> > > > > > install and the last kernel update+reboot. The entries look >>> like: >>> > > > > > >>> > > > > > PUBLIC_IP gluster2.alpserver.de gluster2 >>> > > > > > >>> > > > > > 192.168.0.50 gluster1 >>> > > > > > 192.168.0.51 gluster2 >>> > > > > > 192.168.0.52 gluster3 >>> > > > > > >>> > > > > > 'ping gluster2' resolves to LAN IP; I removed the last entry >>> in the >>> > > > > > 1st line, did a reboot ... no, didn't help.
From >>> > > > > > /var/log/glusterfs/glusterd.log >>> > > > > > on gluster 2: >>> > > > > > >>> > > > > > [2019-03-05 07:04:36.188128] E [MSGID: 106010] >>> > > > > > [glusterd-utils.c:3483:glusterd_compare_friend_volume] >>> 0-management: >>> > > > > > Version of Cksums persistent differ. local cksum = 3950307018, >>> remote >>> > > > > > cksum = 455409345 on peer gluster1 >>> > > > > > [2019-03-05 07:04:36.188314] I [MSGID: 106493] >>> > > > > > [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] >>> 0-glusterd: >>> > > > > > Responded to gluster1 (0), ret: 0, op_ret: -1 >>> > > > > > >>> > > > > > Interestingly there are no entries in the brick logs of the >>> rejected >>> > > > > > server. Well, not surprising as no brick process is running. >>> The >>> > > > > > server gluster1 is still in rejected state. >>> > > > > > >>> > > > > > 'gluster volume start workdata force' starts the brick process >>> on >>> > > > > > gluster1, and some heals are happening on gluster2+3, but via >>> 'gluster >>> > > > > > volume status workdata' the volumes still aren't complete. >>> > > > > > >>> > > > > > gluster1: >>> > > > > > >>> ------------------------------------------------------------------------------ >>> > > > > > Brick gluster1:/gluster/md4/workdata 49152 0 >>> Y 2523 >>> > > > > > Self-heal Daemon on localhost N/A N/A >>> Y 2549 >>> > > > > > >>> > > > > > gluster2: >>> > > > > > Gluster process TCP Port RDMA >>> Port Online Pid >>> > > > > > >>> ------------------------------------------------------------------------------ >>> > > > > > Brick gluster2:/gluster/md4/workdata 49153 0 >>> Y 1723 >>> > > > > > Brick gluster3:/gluster/md4/workdata 49153 0 >>> Y 2068 >>> > > > > > Self-heal Daemon on localhost N/A N/A >>> Y 1732 >>> > > > > > Self-heal Daemon on gluster3 N/A N/A >>> Y 2077 >>> > > > > > >>> > > > > > >>> > > > > > Hubert >>> > > > > > >>> > > > > > Am Di., 5. 
März 2019 um 07:58 Uhr schrieb Milind Changire < >>> mchangir at redhat.com>: >>> > > > > > > >>> > > > > > > There are probably DNS entries or /etc/hosts entries with >>> the public IP Addresses that the host names (gluster1, gluster2, gluster3) >>> are getting resolved to. >>> > > > > > > /etc/resolv.conf would tell which is the default domain >>> searched for the node names and the DNS servers which respond to the >>> queries. >>> > > > > > > >>> > > > > > > >>> > > > > > > On Tue, Mar 5, 2019 at 12:14 PM Hu Bert < >>> revirii at googlemail.com> wrote: >>> > > > > > >> >>> > > > > > >> Good morning, >>> > > > > > >> >>> > > > > > >> i have a replicate 3 setup with 2 volumes, running on >>> version 5.3 on >>> > > > > > >> debian stretch. This morning i upgraded one server to >>> version 5.4 and >>> > > > > > >> rebooted the machine; after the restart i noticed that: >>> > > > > > >> >>> > > > > > >> - no brick process is running >>> > > > > > >> - gluster volume status only shows the server itself: >>> > > > > > >> gluster volume status workdata >>> > > > > > >> Status of volume: workdata >>> > > > > > >> Gluster process TCP Port RDMA >>> Port Online Pid >>> > > > > > >> >>> ------------------------------------------------------------------------------ >>> > > > > > >> Brick gluster1:/gluster/md4/workdata N/A N/A >>> N N/A >>> > > > > > >> NFS Server on localhost N/A N/A >>> N N/A >>> > > > > > >> >>> > > > > > >> - gluster peer status on the server >>> > > > > > >> gluster peer status >>> > > > > > >> Number of Peers: 2 >>> > > > > > >> >>> > > > > > >> Hostname: gluster3 >>> > > > > > >> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a >>> > > > > > >> State: Peer Rejected (Connected) >>> > > > > > >> >>> > > > > > >> Hostname: gluster2 >>> > > > > > >> Uuid: 162fea82-406a-4f51-81a3-e90235d8da27 >>> > > > > > >> State: Peer Rejected (Connected) >>> > > > > > >> >>> > > > > > >> - gluster peer status on the other 2 servers: >>> > > > > > >> gluster peer status >>> > >
> > > >> Number of Peers: 2 >>> > > > > > >> >>> > > > > > >> Hostname: gluster1 >>> > > > > > >> Uuid: 9a360776-7b58-49ae-831e-a0ce4e4afbef >>> > > > > > >> State: Peer Rejected (Connected) >>> > > > > > >> >>> > > > > > >> Hostname: gluster3 >>> > > > > > >> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a >>> > > > > > >> State: Peer in Cluster (Connected) >>> > > > > > >> >>> > > > > > >> I noticed that, in the brick logs, i see that the public IP >>> is used >>> > > > > > >> instead of the LAN IP. brick logs from one of the volumes: >>> > > > > > >> >>> > > > > > >> rejected node: https://pastebin.com/qkpj10Sd >>> > > > > > >> connected nodes: https://pastebin.com/8SxVVYFV >>> > > > > > >> >>> > > > > > >> Why is the public IP suddenly used instead of the LAN IP? >>> Killing all >>> > > > > > >> gluster processes and rebooting (again) didn't help. >>> > > > > > >> >>> > > > > > >> >>> > > > > > >> Thx, >>> > > > > > >> Hubert >>> > > > > > >> _______________________________________________ >>> > > > > > >> Gluster-users mailing list >>> > > > > > >> Gluster-users at gluster.org >>> > > > > > >> https://lists.gluster.org/mailman/listinfo/gluster-users >>> > > > > > > >>> > > > > > > >>> > > > > > > >>> > > > > > > -- >>> > > > > > > Milind >>> > > > > > > >>> > > > > _______________________________________________ >>> > > > > Gluster-users mailing list >>> > > > > Gluster-users at gluster.org >>> > > > > https://lists.gluster.org/mailman/listinfo/gluster-users >>> > > > >>> > > > >>> > > > >>> > > > -- >>> > > > Regards, >>> > > > Hari Gowtham. >>> > >>> > >>> > >>> > -- >>> > Regards, >>> > Hari Gowtham. >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From atumball at redhat.com Thu Mar 7 06:28:39 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Thu, 7 Mar 2019 11:58:39 +0530 Subject: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP In-Reply-To: References: Message-ID: We are talking days. Not weeks. Considering already it is Thursday here. 1 more day for tagging, and packaging. May be ok to expect it on Monday. -Amar On Thu, Mar 7, 2019 at 11:54 AM Artem Russakovskii wrote: > Is the next release going to be an imminent hotfix, i.e. something like > today/tomorrow, or are we talking weeks? > > Sincerely, > Artem -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rgowdapp at redhat.com Thu Mar 7 06:54:01 2019 From: rgowdapp at redhat.com (Raghavendra Gowdappa) Date: Thu, 7 Mar 2019 12:24:01 +0530 Subject: [Gluster-users] Experiences with FUSE in real world - Presentation at Vault 2019 In-Reply-To: References: Message-ID: Unfortunately, there is no recording.
However, we are willing to discuss our findings if you have specific
questions. We can do that in this thread.

On Thu, Mar 7, 2019 at 10:33 AM Strahil wrote:

> Thanks a lot.
> Is there a recording of that?
>
> Best Regards,
> Strahil Nikolov
> On Mar 5, 2019 11:13, Raghavendra Gowdappa wrote:
>
> All,
>
> Recently Manoj, Csaba and I presented on the positives and negatives of
> implementing file systems in userspace using FUSE [1]. We based the talk
> on our experiences with Glusterfs, which has FUSE as its native
> interface. The slides can also be found at [1].
>
> [1] https://www.usenix.org/conference/vault19/presentation/pillai
>
> regards,
> Raghavendra
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From hunter86_bg at yahoo.com  Thu Mar 7 11:22:38 2019
From: hunter86_bg at yahoo.com (Strahil)
Date: Thu, 07 Mar 2019 13:22:38 +0200
Subject: [Gluster-users] Experiences with FUSE in real world - Presentation at Vault 2019
Message-ID: <0qhas4o39y8l9my09wxfritk.1551957758236@email.android.com>

Thanks,

I have nothing in mind - but I know from experience that live sessions
are much more interesting and go into more depth.

Best Regards,
Strahil Nikolov

On Mar 7, 2019 08:54, Raghavendra Gowdappa wrote:
>
> Unfortunately, there is no recording. However, we are willing to discuss
> our findings if you have specific questions. We can do that in this thread.
>
> On Thu, Mar 7, 2019 at 10:33 AM Strahil wrote:
>>
>> Thanks a lot.
>> Is there a recording of that?
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On Mar 5, 2019 11:13, Raghavendra Gowdappa wrote:
>>>
>>> All,
>>>
>>> Recently Manoj, Csaba and I presented on the positives and negatives of
>>> implementing file systems in userspace using FUSE [1]. We based the talk
>>> on our experiences with Glusterfs, which has FUSE as its native
>>> interface. The slides can also be found at [1].
>>>
>>> [1] https://www.usenix.org/conference/vault19/presentation/pillai
>>>
>>> regards,
>>> Raghavendra
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rgowdapp at redhat.com  Thu Mar 7 11:33:07 2019
From: rgowdapp at redhat.com (Raghavendra Gowdappa)
Date: Thu, 7 Mar 2019 17:03:07 +0530
Subject: [Gluster-users] Release 6: Release date update
In-Reply-To: <8c0c5f02-3d31-526a-9c6e-e8221e23cccd@redhat.com>
References: <8c0c5f02-3d31-526a-9c6e-e8221e23cccd@redhat.com>
Message-ID: 

I just found a fix for https://bugzilla.redhat.com/show_bug.cgi?id=1674412.
Since it's a deadlock, I am wondering whether this should be in 6.0. What
do you think?

On Tue, Mar 5, 2019 at 11:47 PM Shyam Ranganathan wrote:

> Hi,
>
> Release-6 was to be an early March release and, due to bugs found
> during upgrade testing, is now expected in the week of 18th March, 2019.
>
> RC1 builds, containing the required fixes, are expected this week; next
> week we will be testing RC1 for release fitness before the release.
>
> As always, we request that users test the RC builds and report back
> issues they encounter, to help improve the release quality.
>
> Shyam
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From srangana at redhat.com  Thu Mar 7 12:15:53 2019
From: srangana at redhat.com (Shyam Ranganathan)
Date: Thu, 7 Mar 2019 07:15:53 -0500
Subject: [Gluster-users] Release 6: Release date update
In-Reply-To: 
References: <8c0c5f02-3d31-526a-9c6e-e8221e23cccd@redhat.com>
Message-ID: 

Bug fixes are always welcome; features or big-ticket changes at this
point in the release cycle are not. I checked the patch and it is a
two-liner in readdir-ahead, so I would backport it (once it gets merged
into master).
Thanks for checking,
Shyam

On 3/7/19 6:33 AM, Raghavendra Gowdappa wrote:
> I just found a fix for
> https://bugzilla.redhat.com/show_bug.cgi?id=1674412. Since it's a
> deadlock, I am wondering whether this should be in 6.0. What do you think?
>
> On Tue, Mar 5, 2019 at 11:47 PM Shyam Ranganathan wrote:
>
> Hi,
>
> Release-6 was to be an early March release and, due to bugs found
> during upgrade testing, is now expected in the week of 18th March, 2019.
>
> RC1 builds, containing the required fixes, are expected this week; next
> week we will be testing RC1 for release fitness before the release.
>
> As always, we request that users test the RC builds and report back
> issues they encounter, to help improve the release quality.
>
> Shyam
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>

From rgowdapp at redhat.com  Thu Mar 7 13:43:32 2019
From: rgowdapp at redhat.com (Raghavendra Gowdappa)
Date: Thu, 7 Mar 2019 19:13:32 +0530
Subject: [Gluster-users] Experiences with FUSE in real world - Presentation at Vault 2019
In-Reply-To: <0qhas4o39y8l9my09wxfritk.1551957758236@email.android.com>
References: <0qhas4o39y8l9my09wxfritk.1551957758236@email.android.com>
Message-ID: 

On Thu, Mar 7, 2019 at 4:51 PM Strahil wrote:

> Thanks,
>
> I have nothing in mind - but I know from experience that live sessions
> are much more interesting and go into more depth.
>
I'll schedule a Bluejeans session on this. Will update the thread with a
date and time.

> Best Regards,
> Strahil Nikolov
> On Mar 7, 2019 08:54, Raghavendra Gowdappa wrote:
>
> Unfortunately, there is no recording. However, we are willing to discuss
> our findings if you have specific questions. We can do that in this thread.
>
> On Thu, Mar 7, 2019 at 10:33 AM Strahil wrote:
>
> Thanks a lot.
> Is there a recording of that?
>
> Best Regards,
> Strahil Nikolov
> On Mar 5, 2019 11:13, Raghavendra Gowdappa wrote:
>
> All,
>
> Recently Manoj, Csaba and I presented on the positives and negatives of
> implementing file systems in userspace using FUSE [1]. We based the talk
> on our experiences with Glusterfs, which has FUSE as its native
> interface. The slides can also be found at [1].
>
> [1] https://www.usenix.org/conference/vault19/presentation/pillai
>
> regards,
> Raghavendra
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jlawrence at squaretrade.com  Thu Mar 7 19:32:14 2019
From: jlawrence at squaretrade.com (Jamie Lawrence)
Date: Thu, 7 Mar 2019 11:32:14 -0800
Subject: [Gluster-users] Cannot write more than 512 bytes to gluster vol
Message-ID: 

I just stood up a new cluster running 4.1.7, my first experience with
version 4. It is a simple replica 3 volume:

gluster v create la1_db_1 replica 3 \
    gluster-10g-1:/gluster-bricks/la1_db_1/la1_db_1 \
    gluster-10g-2:/gluster-bricks/la1_db_1/la1_db_1 \
    gluster-10g-3:/gluster-bricks/la1_db_1/la1_db_1

gluster v set la1_db_1 storage.owner-uid 130
gluster v set la1_db_1 storage.owner-gid 130
gluster v set la1_db_1 server.allow-insecure on
gluster v set la1_db_1 auth.allow [various IPs]

After mounting on a client, everything appears fine until you try to use it.

dd if=/dev/zero of=/path/on/client/foo

will write 512 bytes and then hang until timeout, at which point it
declares "Transport not connected".

Notably, if I mount the volume on one of the gluster machines over the
same interface, it behaves like it should. That led me to investigate
packet filtering, which is configured correctly; in any case, the issue
persists even after flushing all rules on all involved machines.
cli.log contains a lot of: [2019-03-06 17:00:02.893553] I [cli.c:773:main] 0-cli: Started running /sbin/gluster with version 4.1.7 [2019-03-06 17:00:02.897199] I [cli-cmd-volume.c:2375:cli_check_gsync_present] 0-: geo-replication not installed [2019-03-06 17:00:02.897545] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 [2019-03-06 17:00:02.897617] I [socket.c:2632:socket_event_handler] 0-transport: EPOLLERR - disconnecting now [2019-03-06 17:00:02.897678] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: error returned while attempting to connect to host:(null), port:0 [2019-03-06 17:00:02.898244] I [input.c:31:cli_batch] 0-: Exiting with: 0 [2019-03-06 17:00:02.922637] I [cli.c:773:main] 0-cli: Started running /sbin/gluster with version 4.1.7 [2019-03-06 17:00:02.926599] I [cli-cmd-volume.c:2375:cli_check_gsync_present] 0-: geo-replication not installed [2019-03-06 17:00:02.926906] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 [2019-03-06 17:00:02.926956] I [socket.c:2632:socket_event_handler] 0-transport: EPOLLERR - disconnecting now [2019-03-06 17:00:02.927113] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: error returned while attempting to connect to host:(null), port:0 [2019-03-06 17:00:02.927573] I [input.c:31:cli_batch] 0-: Exiting with: 0 The client log is more interesting, I just don't know what to make of it: [2019-03-07 19:18:36.674687] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-0: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:18:36.674726] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-1: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:18:36.674752] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-2: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:18:36.674806] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-la1_db_1-client-0: 
changing port to 49152 (from 0) [2019-03-07 19:18:36.674815] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-la1_db_1-client-1: changing port to 49152 (from 0) [2019-03-07 19:18:36.674927] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-la1_db_1-client-2: changing port to 49152 (from 0) [2019-03-07 19:18:36.675012] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-1: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:18:36.675054] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-0: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:18:36.675155] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-2: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:18:36.675203] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-1: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:18:36.675243] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-0: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:18:36.675306] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-2: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:18:36.675563] I [MSGID: 114046] [client-handshake.c:1095:client_setvolume_cbk] 0-la1_db_1-client-1: Connected to la1_db_1-client-1, attached to remote volume '/gluster-bricks/la1_db_1/la1_db_1'. [2019-03-07 19:18:36.675573] I [MSGID: 108005] [afr-common.c:5336:__afr_handle_child_up_event] 0-la1_db_1-replicate-0: Subvolume 'la1_db_1-client-1' came back up; going online. [2019-03-07 19:18:36.675722] I [MSGID: 114046] [client-handshake.c:1095:client_setvolume_cbk] 0-la1_db_1-client-0: Connected to la1_db_1-client-0, attached to remote volume '/gluster-bricks/la1_db_1/la1_db_1'. [2019-03-07 19:18:36.675728] I [MSGID: 114046] [client-handshake.c:1095:client_setvolume_cbk] 0-la1_db_1-client-2: Connected to la1_db_1-client-2, attached to remote volume '/gluster-bricks/la1_db_1/la1_db_1'. 
[2019-03-07 19:18:36.675743] I [MSGID: 108002] [afr-common.c:5611:afr_notify] 0-la1_db_1-replicate-0: Client-quorum is met [2019-03-07 19:18:36.676564] I [fuse-bridge.c:4294:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.26 [2019-03-07 19:18:36.676578] I [fuse-bridge.c:4927:fuse_graph_sync] 0-fuse: switched to graph 0 [2019-03-07 19:18:36.677623] I [MSGID: 109005] [dht-selfheal.c:2342:dht_selfheal_directory] 0-la1_db_1-dht: Directory selfheal failed: Unable to form layout for directory / [2019-03-07 19:20:01.674361] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-la1_db_1-client-0: server 172.16.0.171:49152 has not responded in the last 42 seconds, disconnecting. [2019-03-07 19:20:01.674440] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-la1_db_1-client-1: server 172.16.0.172:49152 has not responded in the last 42 seconds, disconnecting. [2019-03-07 19:20:01.674458] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-la1_db_1-client-2: server 172.16.0.174:49152 has not responded in the last 42 seconds, disconnecting. [2019-03-07 19:20:01.674455] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-la1_db_1-client-0: disconnected from la1_db_1-client-0. Client process will keep trying to connect to glusterd until brick's port is available [2019-03-07 19:20:01.674530] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-la1_db_1-client-1: disconnected from la1_db_1-client-1. 
Client process will keep trying to connect to glusterd until brick's port is available [2019-03-07 19:20:01.674557] W [MSGID: 108001] [afr-common.c:5618:afr_notify] 0-la1_db_1-replicate-0: Client-quorum is not met [2019-03-07 19:20:01.674776] E [rpc-clnt.c:348:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) 0-la1_db_1-client-0: forced unwinding frame type(GlusterFS 4.x v1) op(LOOKUP(27)) called at 2019-03-07 19:18:47.415290 (xid=0x11) [2019-03-07 19:20:01.674798] W [MSGID: 114031] [client-rpc-fops_v2.c:2540:client4_0_lookup_cbk] 0-la1_db_1-client-0: remote operation failed. Path: /foo (ef9860a9-a366-49a4-9305-08cdb164395b) [Transport endpoint is not connected] [2019-03-07 19:20:01.674808] E [rpc-clnt.c:348:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) 0-la1_db_1-client-1: forced unwinding frame type(GlusterFS 4.x v1) op(LOOKUP(27)) called at 2019-03-07 19:18:47.415303 (xid=0x15) [2019-03-07 19:20:01.674830] W [MSGID: 114031] [client-rpc-fops_v2.c:2540:client4_0_lookup_cbk] 0-la1_db_1-client-1: remote operation failed. 
Path: /foo (ef9860a9-a366-49a4-9305-08cdb164395b) [Transport endpoint is not connected] [2019-03-07 19:20:01.674888] E [rpc-clnt.c:348:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) 0-la1_db_1-client-0: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-03-07 19:19:19.670189 (xid=0x12) [2019-03-07 19:20:01.674902] W [rpc-clnt-ping.c:222:rpc_clnt_ping_cbk] 0-la1_db_1-client-0: socket disconnected [2019-03-07 19:20:01.674916] E [rpc-clnt.c:348:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) 0-la1_db_1-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-03-07 19:19:19.670216 (xid=0x16) [2019-03-07 19:20:01.674932] W [rpc-clnt-ping.c:222:rpc_clnt_ping_cbk] 0-la1_db_1-client-1: socket disconnected [2019-03-07 19:20:01.674935] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-la1_db_1-client-2: disconnected from la1_db_1-client-2. Client process will keep trying to connect to glusterd until brick's port is available [2019-03-07 19:20:01.674947] E [MSGID: 108006] [afr-common.c:5413:__afr_handle_child_down_event] 0-la1_db_1-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. 
[2019-03-07 19:20:01.675040] E [rpc-clnt.c:348:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) 0-la1_db_1-client-2: forced unwinding frame type(GlusterFS 4.x v1) op(LOOKUP(27)) called at 2019-03-07 19:18:47.415315 (xid=0x11) [2019-03-07 19:20:01.675055] W [MSGID: 114031] [client-rpc-fops_v2.c:2540:client4_0_lookup_cbk] 0-la1_db_1-client-2: remote operation failed. Path: /foo (ef9860a9-a366-49a4-9305-08cdb164395b) [Transport endpoint is not connected] [2019-03-07 19:20:01.675077] E [MSGID: 101046] [dht-common.c:1905:dht_revalidate_cbk] 0-la1_db_1-dht: dict is null [2019-03-07 19:20:01.675097] W [fuse-resolve.c:61:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/foo: failed to resolve (Transport endpoint is not connected) [2019-03-07 19:20:01.675164] I [MSGID: 108006] [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up [2019-03-07 19:20:01.675176] I [MSGID: 108006] [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up [2019-03-07 19:20:01.675179] E [MSGID: 101046] [dht-common.c:1502:dht_lookup_dir_cbk] 0-la1_db_1-dht: dict is null [2019-03-07 19:20:01.675194] W [fuse-bridge.c:565:fuse_entry_cbk] 0-glusterfs-fuse: 18: LOOKUP() /foo => -1 (Transport endpoint is not connected) [2019-03-07 19:20:01.675280] I [MSGID: 108006] [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up [2019-03-07 19:20:01.675282] E [rpc-clnt.c:348:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) 0-la1_db_1-client-2: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-03-07 19:19:19.670241 (xid=0x12) [2019-03-07 19:20:01.675308] W [rpc-clnt-ping.c:222:rpc_clnt_ping_cbk] 0-la1_db_1-client-2: socket disconnected [2019-03-07 19:20:01.675310] I [MSGID: 108006] [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up [2019-03-07 19:20:01.675314] E [MSGID: 101046] [dht-common.c:1502:dht_lookup_dir_cbk] 0-la1_db_1-dht: dict is null [2019-03-07 19:20:01.675332] W [fuse-resolve.c:61:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/foo: failed to resolve (Transport endpoint is not connected) [2019-03-07 19:20:01.675379] I [MSGID: 108006] [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up [2019-03-07 19:20:01.675390] I [MSGID: 108006] [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up [2019-03-07 19:20:01.675393] E [MSGID: 101046] [dht-common.c:1502:dht_lookup_dir_cbk] 0-la1_db_1-dht: dict is null [2019-03-07 19:20:01.675405] W [fuse-bridge.c:565:fuse_entry_cbk] 0-glusterfs-fuse: 20: LOOKUP() /foo => -1 (Transport endpoint is not connected) [2019-03-07 19:20:01.675447] I [MSGID: 108006] [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up [2019-03-07 19:20:01.675470] I [MSGID: 108006] [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up [2019-03-07 19:20:01.675472] E [MSGID: 101046] [dht-common.c:1905:dht_revalidate_cbk] 0-la1_db_1-dht: dict is null [2019-03-07 19:20:01.675485] W [fuse-bridge.c:899:fuse_attr_cbk] 0-glusterfs-fuse: 21: LOOKUP() / => -1 (Transport endpoint is not connected) [2019-03-07 19:20:11.675946] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-0: error returned 
while attempting to connect to host:(null), port:0 [2019-03-07 19:20:11.675989] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-1: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:20:11.676040] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-2: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:20:11.676101] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-0: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:20:11.676138] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-1: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:20:11.676181] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-la1_db_1-client-0: changing port to 49152 (from 0) [2019-03-07 19:20:11.676261] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-2: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:20:11.676271] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-la1_db_1-client-1: changing port to 49152 (from 0) [2019-03-07 19:20:11.676293] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-0: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:20:11.676358] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-la1_db_1-client-2: changing port to 49152 (from 0) [2019-03-07 19:20:11.676455] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-0: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:20:11.676477] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-1: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:20:11.676571] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-2: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:20:11.676684] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-1: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:20:11.676753] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-la1_db_1-client-2: error returned while attempting to connect to
host:(null), port:0

This is a 10G network that is otherwise performing flawlessly, and I'm
seeing no errors on the switches; as mentioned, I've tried this both in
our normal configuration and with iptables flushed.

Not entirely sure what to try next; any suggestions?

Thanks,

-j

From amye at redhat.com  Thu Mar 7 21:50:45 2019
From: amye at redhat.com (Amye Scavarda)
Date: Thu, 7 Mar 2019 13:50:45 -0800
Subject: [Gluster-users] Gluster Monthly Newsletter, February 2019
Message-ID: 

Thank you all for giving us feedback in our user survey for February!

Help us test Gluster 6!
https://lists.gluster.org/pipermail/gluster-devel/2019-February/055876.html

Contributors
Top Contributing Companies: Red Hat, Comcast, DataLab, Gentoo Linux,
Facebook, BioDec, Samsung, Etersoft
Top Contributors in February: Yaniv Kaul, Raghavendra G, Nithya B, Amar
Tumballi, Sanju Rakonde, Shyamsundar R

Noteworthy Threads:
[Gluster-users] Memory management, OOM kills and glusterfs
https://lists.gluster.org/pipermail/gluster-users/2019-February/035782.html
[Gluster-users] Code of Conduct Update
https://lists.gluster.org/pipermail/gluster-users/2019-February/035895.html
[Gluster-users] Disabling read-ahead and io-cache for native fuse mounts
https://lists.gluster.org/pipermail/gluster-users/2019-February/035848.html
[Gluster-users] Gluster Container Storage: Release Update
https://lists.gluster.org/pipermail/gluster-users/2019-February/035860.html
[Gluster-devel] I/O performance
https://lists.gluster.org/pipermail/gluster-devel/2019-February/055855.html
[Gluster-devel] Path based Geo-replication
https://lists.gluster.org/pipermail/gluster-devel/2019-February/055836.html
[Gluster-devel] Failing test case ./tests/bugs/distribute/bug-1161311.t
https://lists.gluster.org/pipermail/gluster-devel/2019-February/055842.html
[Gluster-devel] GlusterFs v4.1.5: Need help on bitrot detection
https://lists.gluster.org/pipermail/gluster-devel/2019-February/055859.html
[Gluster-devel] md-cache: May bug found in md-cache.c
https://lists.gluster.org/pipermail/gluster-devel/2019-February/055862.html
[Gluster-devel] [Gluster-Maintainers] glusterfs-6.0rc0 released
https://lists.gluster.org/pipermail/gluster-devel/2019-February/055875.html
[Gluster-devel] GlusterFS - 6.0RC - Test days (27th, 28th Feb)
https://lists.gluster.org/pipermail/gluster-devel/2019-February/055876.html
https://lists.gluster.org/pipermail/gluster-users/2019-March/035938.html
[Gluster-users] Release 6: Release date update
https://lists.gluster.org/pipermail/gluster-users/2019-March/035961.html

Events:
Red Hat Summit, May 4-6, 2019 - https://www.redhat.com/en/summit/2019
Open Source Summit and KubeCon + CloudNativeCon Shanghai, June 24-26, 2019
https://www.lfasiallc.com/events/kubecon-cloudnativecon-china-2019/

--
Amye Scavarda | amye at redhat.com | Gluster Community Lead

From pgurusid at redhat.com  Fri Mar 8 01:09:11 2019
From: pgurusid at redhat.com (Poornima Gurusiddaiah)
Date: Fri, 8 Mar 2019 06:39:11 +0530
Subject: [Gluster-users] Cannot write more than 512 bytes to gluster vol
In-Reply-To: 
References: 
Message-ID: 

From the client log, it looks like the host is null and the port is 0,
hence the client is not able to connect to the bricks (the Gluster
volume). The client connects to the Glusterd daemon on the host
specified in the mount command to get the hosts and ports (the volfile)
on which the bricks are running. Have you set the firewall rules to open
the ports required by Gluster?

Also, can you share the complete client log, preferably at TRACE log
level?

#gluster vol set volname client-log-level TRACE

Please reset it back once you have collected the log, else this can slow
things down and also fill up the logs dir.
#gluster vol reset volname client-log-level Regards, Poornima On Fri, Mar 8, 2019, 1:03 AM Jamie Lawrence wrote: > I just stood up a new cluster running 4.1.7, my first experience with > version 4. It is a simple replica 3 volume: > > gluster v create la1_db_1 replica 3 \ > gluster-10g-1:/gluster-bricks/la1_db_1/la1_db_1 \ > gluster-10g-2:/gluster-bricks/la1_db_1/la1_db_1 \ > gluster-10g-3:/gluster-bricks/la1_db_1/la1_db_1 > > gluster v set la1_db_1 storage.owner-uid 130 > gluster v set la1_db_1 storage.owner-gid 130 > gluster v set la1_db_1 server.allow-insecure on > gluster v set la1_db_1 auth.allow [various IPs] > > After mounting on a client, everything appears fine until you try to use > it. > > dd if=/dev/zero of=/path/on/client/foo > > will write 512 bytes and then hang until timeout, at which point it > declares "Transport not connected". > > Notably, if I mount the volume on one of the gluster machines over the > same interface, it behave like it should. That led me to investigate packet > filtering, which is configured correctly, and in any case, after flushing > all rules on all involved machines, the issue persists. 
> > cli.log contains a lot of: > > [2019-03-06 17:00:02.893553] I [cli.c:773:main] 0-cli: Started running > /sbin/gluster with version 4.1.7 > [2019-03-06 17:00:02.897199] I > [cli-cmd-volume.c:2375:cli_check_gsync_present] 0-: geo-replication not > installed > [2019-03-06 17:00:02.897545] I [MSGID: 101190] > [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 1 > [2019-03-06 17:00:02.897617] I [socket.c:2632:socket_event_handler] > 0-transport: EPOLLERR - disconnecting now > [2019-03-06 17:00:02.897678] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-glusterfs: error returned while attempting to connect to host:(null), > port:0 > [2019-03-06 17:00:02.898244] I [input.c:31:cli_batch] 0-: Exiting with: 0 > [2019-03-06 17:00:02.922637] I [cli.c:773:main] 0-cli: Started running > /sbin/gluster with version 4.1.7 > [2019-03-06 17:00:02.926599] I > [cli-cmd-volume.c:2375:cli_check_gsync_present] 0-: geo-replication not > installed > [2019-03-06 17:00:02.926906] I [MSGID: 101190] > [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 1 > [2019-03-06 17:00:02.926956] I [socket.c:2632:socket_event_handler] > 0-transport: EPOLLERR - disconnecting now > [2019-03-06 17:00:02.927113] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-glusterfs: error returned while attempting to connect to host:(null), > port:0 > [2019-03-06 17:00:02.927573] I [input.c:31:cli_batch] 0-: Exiting with: 0 > > The client log is more interesting, I just don't know what to make of it: > > [2019-03-07 19:18:36.674687] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-0: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:18:36.674726] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-1: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:18:36.674752] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-2: error returned while attempting to connect to > host:(null), port:0 > 
[2019-03-07 19:18:36.674806] I [rpc-clnt.c:2105:rpc_clnt_reconfig] > 0-la1_db_1-client-0: changing port to 49152 (from 0) > [2019-03-07 19:18:36.674815] I [rpc-clnt.c:2105:rpc_clnt_reconfig] > 0-la1_db_1-client-1: changing port to 49152 (from 0) > [2019-03-07 19:18:36.674927] I [rpc-clnt.c:2105:rpc_clnt_reconfig] > 0-la1_db_1-client-2: changing port to 49152 (from 0) > [2019-03-07 19:18:36.675012] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-1: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:18:36.675054] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-0: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:18:36.675155] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-2: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:18:36.675203] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-1: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:18:36.675243] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-0: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:18:36.675306] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-2: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:18:36.675563] I [MSGID: 114046] > [client-handshake.c:1095:client_setvolume_cbk] 0-la1_db_1-client-1: > Connected to la1_db_1-client-1, attached to remote volume > '/gluster-bricks/la1_db_1/la1_db_1'. > [2019-03-07 19:18:36.675573] I [MSGID: 108005] > [afr-common.c:5336:__afr_handle_child_up_event] 0-la1_db_1-replicate-0: > Subvolume 'la1_db_1-client-1' came back up; going online. > [2019-03-07 19:18:36.675722] I [MSGID: 114046] > [client-handshake.c:1095:client_setvolume_cbk] 0-la1_db_1-client-0: > Connected to la1_db_1-client-0, attached to remote volume > '/gluster-bricks/la1_db_1/la1_db_1'. 
> [2019-03-07 19:18:36.675728] I [MSGID: 114046] > [client-handshake.c:1095:client_setvolume_cbk] 0-la1_db_1-client-2: > Connected to la1_db_1-client-2, attached to remote volume > '/gluster-bricks/la1_db_1/la1_db_1'. > [2019-03-07 19:18:36.675743] I [MSGID: 108002] > [afr-common.c:5611:afr_notify] 0-la1_db_1-replicate-0: Client-quorum is met > [2019-03-07 19:18:36.676564] I [fuse-bridge.c:4294:fuse_init] > 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel > 7.26 > [2019-03-07 19:18:36.676578] I [fuse-bridge.c:4927:fuse_graph_sync] > 0-fuse: switched to graph 0 > [2019-03-07 19:18:36.677623] I [MSGID: 109005] > [dht-selfheal.c:2342:dht_selfheal_directory] 0-la1_db_1-dht: Directory > selfheal failed: Unable to form layout for directory / > [2019-03-07 19:20:01.674361] C > [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-la1_db_1-client-0: > server 172.16.0.171:49152 has not responded in the last 42 seconds, > disconnecting. > [2019-03-07 19:20:01.674440] C > [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-la1_db_1-client-1: > server 172.16.0.172:49152 has not responded in the last 42 seconds, > disconnecting. > [2019-03-07 19:20:01.674458] C > [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-la1_db_1-client-2: > server 172.16.0.174:49152 has not responded in the last 42 seconds, > disconnecting. > [2019-03-07 19:20:01.674455] I [MSGID: 114018] > [client.c:2254:client_rpc_notify] 0-la1_db_1-client-0: disconnected from > la1_db_1-client-0. Client process will keep trying to connect to glusterd > until brick's port is available > [2019-03-07 19:20:01.674530] I [MSGID: 114018] > [client.c:2254:client_rpc_notify] 0-la1_db_1-client-1: disconnected from > la1_db_1-client-1. 
Client process will keep trying to connect to glusterd > until brick's port is available > [2019-03-07 19:20:01.674557] W [MSGID: 108001] > [afr-common.c:5618:afr_notify] 0-la1_db_1-replicate-0: Client-quorum is not > met > [2019-03-07 19:20:01.674776] E [rpc-clnt.c:348:saved_frames_unwind] (--> > /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb] > (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be] > (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) > 0-la1_db_1-client-0: forced unwinding frame type(GlusterFS 4.x v1) > op(LOOKUP(27)) called at 2019-03-07 19:18:47.415290 (xid=0x11) > [2019-03-07 19:20:01.674798] W [MSGID: 114031] > [client-rpc-fops_v2.c:2540:client4_0_lookup_cbk] 0-la1_db_1-client-0: > remote operation failed. Path: /foo (ef9860a9-a366-49a4-9305-08cdb164395b) > [Transport endpoint is not connected] > [2019-03-07 19:20:01.674808] E [rpc-clnt.c:348:saved_frames_unwind] (--> > /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb] > (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be] > (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) > 0-la1_db_1-client-1: forced unwinding frame type(GlusterFS 4.x v1) > op(LOOKUP(27)) called at 2019-03-07 19:18:47.415303 (xid=0x15) > [2019-03-07 19:20:01.674830] W [MSGID: 114031] > [client-rpc-fops_v2.c:2540:client4_0_lookup_cbk] 0-la1_db_1-client-1: > remote operation failed. 
Path: /foo (ef9860a9-a366-49a4-9305-08cdb164395b) > [Transport endpoint is not connected] > [2019-03-07 19:20:01.674888] E [rpc-clnt.c:348:saved_frames_unwind] (--> > /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb] > (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be] > (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) > 0-la1_db_1-client-0: forced unwinding frame type(GF-DUMP) op(NULL(2)) > called at 2019-03-07 19:19:19.670189 (xid=0x12) > [2019-03-07 19:20:01.674902] W [rpc-clnt-ping.c:222:rpc_clnt_ping_cbk] > 0-la1_db_1-client-0: socket disconnected > [2019-03-07 19:20:01.674916] E [rpc-clnt.c:348:saved_frames_unwind] (--> > /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb] > (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be] > (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) > 0-la1_db_1-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) > called at 2019-03-07 19:19:19.670216 (xid=0x16) > [2019-03-07 19:20:01.674932] W [rpc-clnt-ping.c:222:rpc_clnt_ping_cbk] > 0-la1_db_1-client-1: socket disconnected > [2019-03-07 19:20:01.674935] I [MSGID: 114018] > [client.c:2254:client_rpc_notify] 0-la1_db_1-client-2: disconnected from > la1_db_1-client-2. Client process will keep trying to connect to glusterd > until brick's port is available > [2019-03-07 19:20:01.674947] E [MSGID: 108006] > [afr-common.c:5413:__afr_handle_child_down_event] 0-la1_db_1-replicate-0: > All subvolumes are down. Going offline until atleast one of them comes back > up. 
> [2019-03-07 19:20:01.675040] E [rpc-clnt.c:348:saved_frames_unwind] (--> > /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb] > (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be] > (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) > 0-la1_db_1-client-2: forced unwinding frame type(GlusterFS 4.x v1) > op(LOOKUP(27)) called at 2019-03-07 19:18:47.415315 (xid=0x11) > [2019-03-07 19:20:01.675055] W [MSGID: 114031] > [client-rpc-fops_v2.c:2540:client4_0_lookup_cbk] 0-la1_db_1-client-2: > remote operation failed. Path: /foo (ef9860a9-a366-49a4-9305-08cdb164395b) > [Transport endpoint is not connected] > [2019-03-07 19:20:01.675077] E [MSGID: 101046] > [dht-common.c:1905:dht_revalidate_cbk] 0-la1_db_1-dht: dict is null > [2019-03-07 19:20:01.675097] W [fuse-resolve.c:61:fuse_resolve_entry_cbk] > 0-fuse: 00000000-0000-0000-0000-000000000001/foo: failed to resolve > (Transport endpoint is not connected) > [2019-03-07 19:20:01.675164] I [MSGID: 108006] > [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up > [2019-03-07 19:20:01.675176] I [MSGID: 108006] > [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up > [2019-03-07 19:20:01.675179] E [MSGID: 101046] > [dht-common.c:1502:dht_lookup_dir_cbk] 0-la1_db_1-dht: dict is null > [2019-03-07 19:20:01.675194] W [fuse-bridge.c:565:fuse_entry_cbk] > 0-glusterfs-fuse: 18: LOOKUP() /foo => -1 (Transport endpoint is not > connected) > [2019-03-07 19:20:01.675280] I [MSGID: 108006] > [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up > [2019-03-07 19:20:01.675282] E [rpc-clnt.c:348:saved_frames_unwind] (--> > /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb] > (--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be] > (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) > 0-la1_db_1-client-2: forced unwinding frame type(GF-DUMP) op(NULL(2)) > called at 2019-03-07 19:19:19.670241 (xid=0x12) > [2019-03-07 19:20:01.675308] W [rpc-clnt-ping.c:222:rpc_clnt_ping_cbk] > 0-la1_db_1-client-2: socket disconnected > [2019-03-07 19:20:01.675310] I [MSGID: 108006] > [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up > [2019-03-07 19:20:01.675314] E [MSGID: 101046] > [dht-common.c:1502:dht_lookup_dir_cbk] 0-la1_db_1-dht: dict is null > [2019-03-07 19:20:01.675332] W [fuse-resolve.c:61:fuse_resolve_entry_cbk] > 0-fuse: 00000000-0000-0000-0000-000000000001/foo: failed to resolve > (Transport endpoint is not connected) > [2019-03-07 19:20:01.675379] I [MSGID: 108006] > [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up > [2019-03-07 19:20:01.675390] I [MSGID: 108006] > [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up > [2019-03-07 19:20:01.675393] E [MSGID: 101046] > [dht-common.c:1502:dht_lookup_dir_cbk] 0-la1_db_1-dht: dict is null > [2019-03-07 19:20:01.675405] W [fuse-bridge.c:565:fuse_entry_cbk] > 0-glusterfs-fuse: 20: LOOKUP() /foo => -1 (Transport endpoint is not > connected) > [2019-03-07 19:20:01.675447] I [MSGID: 108006] > [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up > [2019-03-07 19:20:01.675470] I [MSGID: 108006] > [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up > [2019-03-07 19:20:01.675472] E [MSGID: 101046] > [dht-common.c:1905:dht_revalidate_cbk] 0-la1_db_1-dht: dict is null > [2019-03-07 19:20:01.675485] W [fuse-bridge.c:899:fuse_attr_cbk] > 0-glusterfs-fuse: 21: LOOKUP() / => -1 (Transport 
endpoint is not connected) > [2019-03-07 19:20:11.675946] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-0: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:20:11.675989] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-1: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:20:11.676040] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-2: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:20:11.676101] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-0: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:20:11.676138] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-1: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:20:11.676181] I [rpc-clnt.c:2105:rpc_clnt_reconfig] > 0-la1_db_1-client-0: changing port to 49152 (from 0) > [2019-03-07 19:20:11.676261] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-2: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:20:11.676271] I [rpc-clnt.c:2105:rpc_clnt_reconfig] > 0-la1_db_1-client-1: changing port to 49152 (from 0) > [2019-03-07 19:20:11.676293] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-0: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:20:11.676358] I [rpc-clnt.c:2105:rpc_clnt_reconfig] > 0-la1_db_1-client-2: changing port to 49152 (from 0) > [2019-03-07 19:20:11.676455] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-0: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:20:11.676477] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-1: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:20:11.676571] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-2: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 
19:20:11.676684] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-1: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:20:11.676753] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-2: error returned while attempting to connect to > host:(null), port:0 > > > This is a 10G network that is otherwise performing flawlessly, and I'm > seeing no errors on the switches; as mentioned, I've tried this both in our > normal configuration and with iptables flushed. Not entirely sure what to > try next; any suggestions? > > Thanks, > > -j > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ee.magnesia at gmail.com Fri Mar 8 07:22:58 2019 From: ee.magnesia at gmail.com (Ersen E.) Date: Fri, 8 Mar 2019 10:22:58 +0300 Subject: [Gluster-users] Pre-Historic Gluster RPM's Message-ID: Hi, I do have some RHEL5 clients still. I will update OS's but not now. Meantime I am looking for a way to update at least to latest version available. Is there any web site still keeping RHEL5/Centos RPM's ? Regards, Ersen E. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abhishpaliwal at gmail.com Fri Mar 8 08:49:40 2019 From: abhishpaliwal at gmail.com (ABHISHEK PALIWAL) Date: Fri, 8 Mar 2019 14:19:40 +0530 Subject: [Gluster-users] Glusterfsd crashed with SIGSEGV Message-ID: Hi Team, I am using GlusterFS 5.4, where, after setting up the gluster mount point and trying to access it, glusterfsd crashes and the mount point throws the "Transport endpoint is not connected" error. Here is the gdb log for the core file: warning: Could not load shared library symbols for linux-vdso64.so.1. Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Core was generated by `/usr/sbin/glusterfsd -s 128.224.95.140 --volfile-id gv0.128.224.95.140.tmp-bric'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x00003fff95ab1d48 in _int_malloc (av=av at entry=0x3fff7c000020, bytes=bytes at entry=36) at malloc.c:3327 3327 { [Current thread is 1 (Thread 0x3fff90394160 (LWP 811))] (gdb) (gdb) (gdb) bt #0 0x00003fff95ab1d48 in _int_malloc (av=av at entry=0x3fff7c000020, bytes=bytes at entry=36) at malloc.c:3327 #1 0x00003fff95ab43dc in __GI___libc_malloc (bytes=36) at malloc.c:2921 #2 0x00003fff95b6ffd0 in x_inline (xdrs=0x3fff90391d20, len=) at xdr_sizeof.c:89 #3 0x00003fff95c4d488 in .xdr_gfx_iattx () from /usr/lib64/libgfxdr.so.0 #4 0x00003fff95c4de84 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 #5 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, pp=0x3fff7c132020, size=, proc=) at xdr_ref.c:84 #6 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, objpp=0x3fff7c132020, obj_size=, xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 #7 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 #8 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, pp=0x3fff7c131ea0, size=, proc=) at xdr_ref.c:84 #9 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, objpp=0x3fff7c131ea0, obj_size=, xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 #10 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 #11 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, pp=0x3fff7c131d20, size=, proc=) at xdr_ref.c:84 #12 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, objpp=0x3fff7c131d20, obj_size=, xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 #13 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 #14 
0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, pp=0x3fff7c131ba0, size=, proc=) at xdr_ref.c:84 #15 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, objpp=0x3fff7c131ba0, obj_size=, xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 #16 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 #17 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, pp=0x3fff7c131a20, size=, proc=) at xdr_ref.c:84 #18 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, objpp=0x3fff7c131a20, obj_size=, xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 #19 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 #20 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, pp=0x3fff7c1318a0, size=, proc=) at xdr_ref.c:84 #21 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, objpp=0x3fff7c1318a0, obj_size=, xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 #22 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 #23 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, pp=0x3fff7c131720, size=, proc=) at xdr_ref.c:84 #24 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, objpp=0x3fff7c131720, obj_size=, xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 #25 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 #26 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, pp=0x3fff7c1315a0, size=, proc=) at xdr_ref.c:84 #27 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, objpp=0x3fff7c1315a0, obj_size=, xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 #28 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 #29 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, pp=0x3fff7c131420, size=, proc=) at xdr_ref.c:84 #30 0x00003fff95b6fe04 in __GI_xdr_pointer 
(xdrs=0x3fff90391d20, objpp=0x3fff7c131420, obj_size=, xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 #31 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 #32 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, pp=0x3fff7c1312a0, size=, proc=) at xdr_ref.c:84 #33 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, objpp=0x3fff7c1312a0, obj_size=, xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 The frames are getting repeated; could anyone please help me? -- Regards Abhishek Paliwal -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkeithle at redhat.com Fri Mar 8 12:56:16 2019 From: kkeithle at redhat.com (Kaleb Keithley) Date: Fri, 8 Mar 2019 07:56:16 -0500 Subject: [Gluster-users] Pre-Historic Gluster RPM's In-Reply-To: References: Message-ID: https://download.gluster.org/pub/gluster/glusterfs/old-releases/ On Fri, Mar 8, 2019 at 2:24 AM Ersen E. wrote: > Hi, > > I do have some RHEL5 clients still. I will update OS's but not now. > Meantime I am looking for a way to update at least to latest version > available. > Is there any web site still keeping RHEL5/Centos RPM's ? > > Regards, > Ersen E. > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian at worldcontrol.com Sat Mar 9 01:45:43 2019 From: brian at worldcontrol.com (Brian Litzinger) Date: Fri, 8 Mar 2019 17:45:43 -0800 Subject: [Gluster-users] Possible memory leak via wordpress wordfence plugin behavior in 4.1.16 Message-ID: I have 4 machines running glusterfs and wordpress with the wordfence plugin.
The wordfence plugin in all 4 instances pounds away writing and re-writing the file: /mnt/glusterfs/www/openvpn.net/wp-content/wflogs/config-synced.php This is leading to stale file handle errors reported by glusterfs and, while the request is ultimately handled correctly, this looks to be leading to a memory leak. I think the leak is dict_t structures, based on looking at the state dump, but my knowledge of glusterfs is mostly as a user. -- brian From ravishankar at redhat.com Mon Mar 11 05:00:59 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Mon, 11 Mar 2019 10:30:59 +0530 Subject: [Gluster-users] Possible memory leak via wordpress wordfence plugin behavior in 4.1.16 In-Reply-To: References: Message-ID: <929b2d18-d4f6-36f4-1d8b-a95c292b9c2e@redhat.com> On 09/03/19 7:15 AM, Brian Litzinger wrote: > I have 4 machines running glusterfs and wordpress with the wordfence plugin. > > The wordfence plugin in all 4 instances pounds away writing and > re-writing the file: > > /mnt/glusterfs/www/openvpn.net/wp-content/wflogs/config-synced.php > > This is leading to stale file handle errors reported by glusterfs and, > while the request is ultimately handled correctly, this looks to be > leading to a memory leak. Memory leak of the glusterfs client process? What type of volume (`gluster volume info`) are you using and how many clients are accessing the volume? Can you share the client(s) and the bricks logs when you see the ESTALE error? Also share successive state dump outputs of the process where you suspect dict_t leaks. Thanks, Ravi > I think the leak is dict_t structures, based on looking at the state dump, > but my knowledge of glusterfs is mostly as a user.
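For readers wondering how to act on that request: statedumps triggered with `kill -USR1 <pid>` are written to /var/run/gluster by default, and they contain per-translator memory-accounting sections whose allocation counts can be diffed between two successive dumps. The sketch below assumes the usual statedump memusage layout — section headers of the form `[... - usage-type gf_common_mt_dict_t memusage]` followed by `num_allocs=` lines — so verify those names against your own dumps before relying on it:

```python
import re

# Matches the memory-accounting section header for dict_t allocations,
# e.g. "[mount/fuse.fuse - usage-type gf_common_mt_dict_t memusage]".
SECTION = re.compile(r'\[(.+) - usage-type gf_common_mt_dict_t memusage\]')

def dict_allocs(dump_text):
    """Return {section: num_allocs} for each gf_common_mt_dict_t
    memusage section found in a gluster statedump."""
    allocs, section = {}, None
    for line in dump_text.splitlines():
        line = line.strip()
        m = SECTION.match(line)
        if m:
            section = m.group(1)
        elif section and line.startswith('num_allocs='):
            allocs[section] = int(line.split('=', 1)[1])
            section = None  # only the first num_allocs= per section counts

    return allocs

def growing(first_dump, second_dump):
    """Sections whose live dict_t count grew between two dumps,
    as {section: (before, after)}."""
    before, after = dict_allocs(first_dump), dict_allocs(second_dump)
    return {s: (before.get(s, 0), n)
            for s, n in after.items() if n > before.get(s, 0)}
```

Feeding it the text of two dumps taken some minutes apart flags any translator whose live dict_t count keeps climbing, which is the kind of evidence useful for a leak report.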
> > -- > brian > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From mauro.tridici at cmcc.it Mon Mar 11 09:09:41 2019 From: mauro.tridici at cmcc.it (Mauro Tridici) Date: Mon, 11 Mar 2019 10:09:41 +0100 Subject: [Gluster-users] "rpc_clnt_ping_timer_expired" errors In-Reply-To: References: <96B07283-D8AB-4F06-909D-E00424625528@cmcc.it> <42758A0E-8BE9-497D-BDE3-55D7DC633BC7@cmcc.it> <6A8CF4A4-98EA-48C3-A059-D60D1B2721C7@cmcc.it> <2CF49168-9C1B-4931-8C34-8157262A137A@cmcc.it> <7A151CC9-A0AE-4A45-B450-A4063D216D9E@cmcc.it> <32D53ECE-3F49-4415-A6EE-241B351AC2BA@cmcc.it> <4685A75B-5978-4338-9C9F-4A02FB40B9BC@cmcc.it> <4D2E6B04-C2E8-4FD5-B72D-E7C05931C6F9@cmcc.it> <4E332A56-B318-4BC1-9F44-84AB4392A5DE@cmcc.it> <832FD362-3B14-40D8-8530-604419300476@cmcc.it> <8D926643-1093-48ED-823F-D8F117F702CF@cmcc.it> <9D0BE438-DF11-4D0A-AB85-B44357032F29@cmcc.it> <5F0AC378-8170-4342-8473-9C17159CAC1D@cmcc.it> <7A50E86D-9E27-4EA7-883B-45E9F973991A@cmcc.it> <58B5DB7F-DCB4-4FBF-B1F7-681315B1613A@cmcc.it> <6327B44F-4E7E-46BB-A74C-70F4457DD1EB@cmcc.it> <0167DF4A-8329-4A1A-B439-857DFA6F78BB@cmcc.it> <763F334E-35B8-4729-B8E1-D30866754EEE@cmcc.it> <91DFC9AC-4805-41EB-AC6F-5722E018DD6E@cmcc.it> <8A9752B8-B231-4570-8FF4-8D3D781E7D42@cmcc.it> <47A24804-F975-4EE6-9FA5-67FCDA18D707@cmcc.it> <637FF5D2-D1F4-4686-9D48-646A96F67B96@cmcc.it> <4A87495F-3755-48F7-8507-085869069C64@cmcc.it> <3854BBF6-5B98-4AB3-A67E-E7DE59E69A63@cmcc.it> <313DA021-9173-4899-96B0-831B10B00B61@cmcc.it> <17996AFD-DFC8-40F3-9D09-DB6DDAD5B7D6@cmcc.it> <7074B5D8-955A-4802-A9F3-606C99734417@cmcc.it> <83B84BF9-8334-4230-B6F8-0BC4BFBEFF15@cmcc.it> <133B9AE4-9792-4F72-AD91-D36E7B9EC711@cmcc.it> <6611C4B0-57FD-4390-88B5-BD373555D4C5@cmcc.it> Message-ID: Dear All, do you have any suggestions about the right way to "debug" this issue? In attachment, the updated logs of ?s06" gluster server. 
I noticed a lot of intermittent warning and error messages. Thank you in advance, Mauro > On 4 Mar 2019, at 18:45, Raghavendra Gowdappa wrote: > > > +Gluster Devel , +Gluster-users > > I would like to point out another issue. Even if what I suggested prevents disconnects, part of the solution would be only symptomatic treatment and doesn't address the root cause of the problem. In most of the ping-timer-expiry issues, the root cause is the increased load on bricks and the inability of bricks to be responsive under high load. So, the actual solution would be doing any or both of the following: > * identify the source of increased load and if possible throttle it. Internal heal processes like self-heal, rebalance, quota heal are known to pump traffic into bricks without much throttling (io-threads _might_ do some throttling, but my understanding is its not sufficient). > * identify the reason for bricks to become unresponsive during load. This may be fixable issues like not enough event-threads to read from network or difficult to fix issues like fsync on backend fs freezing the process or semi fixable issues (in code) like lock contention. > > So any genuine effort to fix ping-timer-issues (to be honest most of the times they are not issues related to rpc/network) would involve performance characterization of various subsystems on bricks and clients. Various subsystems can include (but not necessarily limited to), underlying OS/filesystem, glusterfs processes, CPU consumption etc > > regards, > Raghavendra > > On Mon, Mar 4, 2019 at 9:31 PM Mauro Tridici > wrote: > Thank you, let?s try! > I will inform you about the effects of the change. > > Regards, > Mauro > >> On 4 Mar 2019, at 16:55, Raghavendra Gowdappa > wrote: >> >> >> >> On Mon, Mar 4, 2019 at 8:54 PM Mauro Tridici > wrote: >> Hi Raghavendra, >> >> thank you for your reply. >> Yes, you are right. It is a problem that seems to happen randomly. >> At this moment, server.event-threads value is 4. 
I will try to increase this value to 8. Do you think that it could be a valid value ? >> >> Yes. We can try with that. You should see at least frequency of ping-timer related disconnects reduce with this value (even if it doesn't eliminate the problem completely). >> >> >> Regards, >> Mauro >> >> >>> On 4 Mar 2019, at 15:36, Raghavendra Gowdappa > wrote: >>> >>> >>> >>> On Mon, Mar 4, 2019 at 8:01 PM Nithya Balachandran > wrote: >>> Hi Mauro, >>> >>> It looks like some problem on s06. Are all your other nodes ok? Can you send us the gluster logs from this node? >>> >>> @Raghavendra G , do you have any idea as to how this can be debugged? Maybe running top ? Or debug brick logs? >>> >>> If we can reproduce the problem, collecting tcpdump on both ends of connection will help. But, one common problem is these bugs are inconsistently reproducible and hence we may not be able to capture tcpdump at correct intervals. Other than that, we can try to collect some evidence that poller threads were busy (waiting on locks). But, not sure what debug data provides that information. >>> >>> From what I know, its difficult to collect evidence for this issue and we could only reason about it. >>> >>> We can try a workaround though - try increasing server.event-threads and see whether ping-timer expiry issues go away with an optimal value. If that's the case, it kind of provides proof for our hypothesis. >>> >>> >>> >>> Regards, >>> Nithya >>> >>> On Mon, 4 Mar 2019 at 15:25, Mauro Tridici > wrote: >>> Hi All, >>> >>> some minutes ago I received this message from NAGIOS server >>> >>> ***** Nagios ***** >>> >>> Notification Type: PROBLEM >>> >>> Service: Brick - /gluster/mnt2/brick >>> Host: s06 >>> Address: s06-stg >>> State: CRITICAL >>> >>> Date/Time: Mon Mar 4 10:25:33 CET 2019 >>> >>> Additional Info: >>> CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. >>> >>> I checked the network, RAM and CPUs usage on s06 node and everything seems to be ok. 
>>> No bricks are in error state. In /var/log/messages, I detected again a crash of "check_vol_utili", which I think is a module used by the NRPE executable (that is, the NAGIOS client). >>> >>> Mar 4 10:15:29 s06 kernel: traps: check_vol_utili[161224] general protection ip:7facffa0a66d sp:7ffe9f4e6fc0 error:0 in libglusterfs.so.0.0.1[7facff9b7000+f7000] >>> Mar 4 10:15:29 s06 abrt-hook-ccpp: Process 161224 (python2.7) of user 0 killed by SIGSEGV - dumping core >>> Mar 4 10:15:29 s06 abrt-server: Generating core_backtrace >>> Mar 4 10:15:29 s06 abrt-server: Error: Unable to open './coredump': No such file or directory >>> Mar 4 10:16:01 s06 systemd: Created slice User Slice of root. >>> Mar 4 10:16:01 s06 systemd: Starting User Slice of root. >>> Mar 4 10:16:01 s06 systemd: Started Session 201010 of user root. >>> Mar 4 10:16:01 s06 systemd: Starting Session 201010 of user root. >>> Mar 4 10:16:01 s06 systemd: Removed slice User Slice of root. >>> Mar 4 10:16:01 s06 systemd: Stopping User Slice of root. >>> Mar 4 10:16:24 s06 abrt-server: Duplicate: UUID >>> Mar 4 10:16:24 s06 abrt-server: DUP_OF_DIR: /var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041 >>> Mar 4 10:16:24 s06 abrt-server: Deleting problem directory ccpp-2019-03-04-10:15:29-161224 (dup of ccpp-2018-09-25-12:27:42-13041) >>> Mar 4 10:16:24 s06 abrt-server: Generating core_backtrace >>> Mar 4 10:16:24 s06 abrt-server: Error: Unable to open './coredump': No such file or directory >>> Mar 4 10:16:24 s06 abrt-server: Cannot notify '/var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041' via uReport: Event 'report_uReport' exited with 1 >>> Mar 4 10:16:24 s06 abrt-hook-ccpp: Process 161391 (python2.7) of user 0 killed by SIGABRT - dumping core >>> Mar 4 10:16:25 s06 abrt-server: Generating core_backtrace >>> Mar 4 10:16:25 s06 abrt-server: Error: Unable to open './coredump': No such file or directory >>> Mar 4 10:17:01 s06 systemd: Created slice User Slice of root.
>>> >>> Also, I noticed the following errors that I think are very critical: >>> >>> Mar 4 10:21:12 s06 glustershd[20355]: [2019-03-04 09:21:12.954798] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-55: server 192.168.0.55:49158 has not responded in the last 42 seconds, disconnecting. >>> Mar 4 10:22:01 s06 systemd: Created slice User Slice of root. >>> Mar 4 10:22:01 s06 systemd: Starting User Slice of root. >>> Mar 4 10:22:01 s06 systemd: Started Session 201017 of user root. >>> Mar 4 10:22:01 s06 systemd: Starting Session 201017 of user root. >>> Mar 4 10:22:01 s06 systemd: Removed slice User Slice of root. >>> Mar 4 10:22:01 s06 systemd: Stopping User Slice of root. >>> Mar 4 10:22:03 s06 glustershd[20355]: [2019-03-04 09:22:03.964120] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-60: server 192.168.0.54:49165 has not responded in the last 42 seconds, disconnecting. >>> Mar 4 10:23:01 s06 systemd: Created slice User Slice of root. >>> Mar 4 10:23:01 s06 systemd: Starting User Slice of root. >>> Mar 4 10:23:01 s06 systemd: Started Session 201018 of user root. >>> Mar 4 10:23:01 s06 systemd: Starting Session 201018 of user root. >>> Mar 4 10:23:02 s06 systemd: Removed slice User Slice of root. >>> Mar 4 10:23:02 s06 systemd: Stopping User Slice of root. >>> Mar 4 10:24:01 s06 systemd: Created slice User Slice of root. >>> Mar 4 10:24:01 s06 systemd: Starting User Slice of root. >>> Mar 4 10:24:01 s06 systemd: Started Session 201019 of user root. >>> Mar 4 10:24:01 s06 systemd: Starting Session 201019 of user root. >>> Mar 4 10:24:01 s06 systemd: Removed slice User Slice of root. >>> Mar 4 10:24:01 s06 systemd: Stopping User Slice of root. >>> Mar 4 10:24:03 s06 glustershd[20355]: [2019-03-04 09:24:03.982502] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-16: server 192.168.0.52:49158 has not responded in the last 42 seconds, disconnecting. 
>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746109] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-3: server 192.168.0.51:49153 has not responded in the last 42 seconds, disconnecting. >>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746215] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-10: server 192.168.0.52:49156 has not responded in the last 42 seconds, disconnecting. >>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746260] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-21: server 192.168.0.51:49159 has not responded in the last 42 seconds, disconnecting. >>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746296] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-25: server 192.168.0.52:49161 has not responded in the last 42 seconds, disconnecting. >>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746413] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-60: server 192.168.0.54:49165 has not responded in the last 42 seconds, disconnecting. >>> Mar 4 10:24:07 s06 glustershd[20355]: [2019-03-04 09:24:07.982952] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-45: server 192.168.0.54:49155 has not responded in the last 42 seconds, disconnecting. >>> Mar 4 10:24:18 s06 glustershd[20355]: [2019-03-04 09:24:18.990929] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-25: server 192.168.0.52:49161 has not responded in the last 42 seconds, disconnecting. >>> Mar 4 10:24:31 s06 glustershd[20355]: [2019-03-04 09:24:31.995781] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-20: server 192.168.0.53:49159 has not responded in the last 42 seconds, disconnecting. >>> Mar 4 10:25:01 s06 systemd: Created slice User Slice of root. >>> Mar 4 10:25:01 s06 systemd: Starting User Slice of root. >>> Mar 4 10:25:01 s06 systemd: Started Session 201020 of user root. 
>>> Mar 4 10:25:01 s06 systemd: Starting Session 201020 of user root. >>> Mar 4 10:25:01 s06 systemd: Removed slice User Slice of root. >>> Mar 4 10:25:01 s06 systemd: Stopping User Slice of root. >>> Mar 4 10:25:57 s06 systemd: Created slice User Slice of root. >>> Mar 4 10:25:57 s06 systemd: Starting User Slice of root. >>> Mar 4 10:25:57 s06 systemd-logind: New session 201021 of user root. >>> Mar 4 10:25:57 s06 systemd: Started Session 201021 of user root. >>> Mar 4 10:25:57 s06 systemd: Starting Session 201021 of user root. >>> Mar 4 10:26:01 s06 systemd: Started Session 201022 of user root. >>> Mar 4 10:26:01 s06 systemd: Starting Session 201022 of user root. >>> Mar 4 10:26:21 s06 nrpe[162388]: Error: Could not complete SSL handshake with 192.168.1.56 : 5 >>> Mar 4 10:27:01 s06 systemd: Started Session 201023 of user root. >>> Mar 4 10:27:01 s06 systemd: Starting Session 201023 of user root. >>> Mar 4 10:28:01 s06 systemd: Started Session 201024 of user root. >>> Mar 4 10:28:01 s06 systemd: Starting Session 201024 of user root. >>> Mar 4 10:29:01 s06 systemd: Started Session 201025 of user root. >>> Mar 4 10:29:01 s06 systemd: Starting Session 201025 of user root. >>> >>> But, unfortunately, I don?t understand why it is happening. >>> Now, NAGIOS server shows that s06 status is ok: >>> >>> ***** Nagios ***** >>> >>> Notification Type: RECOVERY >>> >>> Service: Brick - /gluster/mnt2/brick >>> Host: s06 >>> Address: s06-stg >>> State: OK >>> >>> Date/Time: Mon Mar 4 10:35:23 CET 2019 >>> >>> Additional Info: >>> OK: Brick /gluster/mnt2/brick is up >>> >>> Nothing is changed from RAM, CPUs, and NETWORK point of view. >>> /var/log/message file has been updated: >>> >>> Mar 4 10:32:01 s06 systemd: Starting Session 201029 of user root. >>> Mar 4 10:32:30 s06 glustershd[20355]: [2019-03-04 09:32:30.069082] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-10: server 192.168.0.52:49156 has not responded in the last 42 seconds, disconnecting. 
>>> Mar 4 10:32:55 s06 glustershd[20355]: [2019-03-04 09:32:55.074689] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-66: server 192.168.0.54:49167 has not responded in the last 42 seconds, disconnecting.
>>> Mar 4 10:33:01 s06 systemd: Started Session 201030 of user root.
>>> Mar 4 10:33:01 s06 systemd: Starting Session 201030 of user root.
>>> Mar 4 10:34:01 s06 systemd: Started Session 201031 of user root.
>>> Mar 4 10:34:01 s06 systemd: Starting Session 201031 of user root.
>>> Mar 4 10:35:01 s06 nrpe[162562]: Could not read request from client 192.168.1.56, bailing out...
>>> Mar 4 10:35:01 s06 nrpe[162562]: INFO: SSL Socket Shutdown.
>>> Mar 4 10:35:01 s06 systemd: Started Session 201032 of user root.
>>> Mar 4 10:35:01 s06 systemd: Starting Session 201032 of user root.
>>>
>>> Could you please help me understand what is happening?
>>> Thank you in advance.
>>>
>>> Regards,
>>> Mauro
>>>
>>>
>>>> On 1 Mar 2019, at 12:17, Mauro Tridici wrote:
>>>>
>>>>
>>>> Thank you, Milind.
>>>> I executed the instructions you suggested:
>>>>
>>>> - grep "blocked for" /var/log/messages on s06 returns no output (no "blocked" word is detected in the messages file);
>>>> - in the /var/log/messages file I can see this kind of error repeated many times:
>>>>
>>>> Mar 1 08:43:01 s06 systemd: Starting Session 196071 of user root.
>>>> Mar 1 08:43:01 s06 systemd: Removed slice User Slice of root.
>>>> Mar 1 08:43:01 s06 systemd: Stopping User Slice of root.
>>>> Mar 1 08:43:02 s06 kernel: traps: check_vol_utili[57091] general protection ip:7f88e76ee66d sp:7ffe5a5bcc30 error:0 in libglusterfs.so.0.0.1[7f88e769b000+f7000]
>>>> Mar 1 08:43:02 s06 abrt-hook-ccpp: Process 57091 (python2.7) of user 0 killed by SIGSEGV - dumping core
>>>> Mar 1 08:43:02 s06 abrt-server: Generating core_backtrace
>>>> Mar 1 08:43:02 s06 abrt-server: Error: Unable to open './coredump': No such file or directory
>>>> Mar 1 08:43:58 s06 abrt-server: Duplicate: UUID
>>>> Mar 1 08:43:58 s06 abrt-server: DUP_OF_DIR: /var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041
>>>> Mar 1 08:43:58 s06 abrt-server: Deleting problem directory ccpp-2019-03-01-08:43:02-57091 (dup of ccpp-2018-09-25-12:27:42-13041)
>>>> Mar 1 08:43:58 s06 dbus[1872]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
>>>> Mar 1 08:43:58 s06 dbus[1872]: [system] Successfully activated service 'org.freedesktop.problems'
>>>> Mar 1 08:43:58 s06 abrt-server: Generating core_backtrace
>>>> Mar 1 08:43:58 s06 abrt-server: Error: Unable to open './coredump': No such file or directory
>>>> Mar 1 08:43:58 s06 abrt-server: Cannot notify '/var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041' via uReport: Event 'report_uReport' exited with 1
>>>> Mar 1 08:44:01 s06 systemd: Created slice User Slice of root.
>>>> Mar 1 08:44:01 s06 systemd: Starting User Slice of root.
>>>> Mar 1 08:44:01 s06 systemd: Started Session 196072 of user root.
>>>> Mar 1 08:44:01 s06 systemd: Starting Session 196072 of user root.
>>>> Mar 1 08:44:01 s06 systemd: Removed slice User Slice of root.
>>>>
>>>> - in the /var/log/messages file I can also see 4 errors related to other cluster servers:
>>>>
>>>> Mar 1 11:05:01 s06 systemd: Starting User Slice of root.
>>>> Mar 1 11:05:01 s06 systemd: Started Session 196230 of user root.
>>>> Mar 1 11:05:01 s06 systemd: Starting Session 196230 of user root.
>>>> Mar 1 11:05:01 s06 systemd: Removed slice User Slice of root.
>>>> Mar 1 11:05:01 s06 systemd: Stopping User Slice of root.
>>>> Mar 1 11:05:59 s06 glustershd[70117]: [2019-03-01 10:05:59.347094] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-33: server 192.168.0.51:49163 has not responded in the last 42 seconds, disconnecting.
>>>> Mar 1 11:06:01 s06 systemd: Created slice User Slice of root.
>>>> Mar 1 11:06:01 s06 systemd: Starting User Slice of root.
>>>> Mar 1 11:06:01 s06 systemd: Started Session 196231 of user root.
>>>> Mar 1 11:06:01 s06 systemd: Starting Session 196231 of user root.
>>>> Mar 1 11:06:01 s06 systemd: Removed slice User Slice of root.
>>>> Mar 1 11:06:01 s06 systemd: Stopping User Slice of root.
>>>> Mar 1 11:06:12 s06 glustershd[70117]: [2019-03-01 10:06:12.351319] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-1: server 192.168.0.52:49153 has not responded in the last 42 seconds, disconnecting.
>>>> Mar 1 11:06:38 s06 glustershd[70117]: [2019-03-01 10:06:38.356920] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-7: server 192.168.0.52:49155 has not responded in the last 42 seconds, disconnecting.
>>>> Mar 1 11:07:01 s06 systemd: Created slice User Slice of root.
>>>> Mar 1 11:07:01 s06 systemd: Starting User Slice of root.
>>>> Mar 1 11:07:01 s06 systemd: Started Session 196232 of user root.
>>>> Mar 1 11:07:01 s06 systemd: Starting Session 196232 of user root.
>>>> Mar 1 11:07:01 s06 systemd: Removed slice User Slice of root.
>>>> Mar 1 11:07:01 s06 systemd: Stopping User Slice of root.
>>>> Mar 1 11:07:36 s06 glustershd[70117]: [2019-03-01 10:07:36.366259] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-0: server 192.168.0.51:49152 has not responded in the last 42 seconds, disconnecting.
>>>> Mar 1 11:08:01 s06 systemd: Created slice User Slice of root.
>>>>
>>>> No "blocked" word is in the /var/log/messages files on the other cluster servers.
>>>> In attachment, the /var/log/messages file from the s06 server.
>>>>
>>>> Thank you in advance,
>>>> Mauro
>>>>
>>>>
>>>>
>>>>
>>>>> On 1 Mar 2019, at 11:47, Milind Changire wrote:
>>>>>
>>>>> The traces of very high disk activity on the servers are often found in /var/log/messages.
>>>>> You might want to grep for "blocked for" in /var/log/messages on s06 and correlate the timestamps to confirm the unresponsiveness as reported in the gluster client logs.
>>>>> In cases of high disk activity, although the operating system continues to respond to ICMP pings, the processes writing to disk often get blocked by a large flush to the disk, which could span beyond 42 seconds and hence result in ping-timer-expiry logs.
>>>>>
>>>>> As a side note:
>>>>> If you indeed find gluster processes being blocked in /var/log/messages, you might want to tweak the sysctl tunables vm.dirty_background_ratio or vm.dirty_background_bytes to a smaller value than the existing one. Please read up more on those tunables before touching the settings.
>>>>>
>>>>>
>>>>> On Fri, Mar 1, 2019 at 4:06 PM Mauro Tridici wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> in attachment, the client log captured after changing the network.ping-timeout option.
>>>>> I noticed this error involving server 192.168.0.56 (s06)
>>>>>
>>>>> [2019-03-01 09:23:36.077287] I [rpc-clnt.c:1962:rpc_clnt_reconfig] 0-tier2-client-71: changing ping timeout to 42 (from 0)
>>>>> [2019-03-01 09:23:36.078213] I [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing
>>>>> [2019-03-01 09:23:36.078432] I [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing
>>>>> [2019-03-01 09:23:36.092357] I [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing
>>>>> [2019-03-01 09:23:36.094146] I [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing
>>>>> [2019-03-01 10:06:24.708082] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-50: server 192.168.0.56:49156 has not responded in the last 42 seconds, disconnecting.
>>>>>
>>>>> I don't know why it happens; the s06 server seems to be reachable.
>>>>>
>>>>> [athena_login2][/users/home/sysm02/]> telnet 192.168.0.56 49156
>>>>> Trying 192.168.0.56...
>>>>> Connected to 192.168.0.56.
>>>>> Escape character is '^]'.
>>>>> ^CConnection closed by foreign host.
>>>>> [athena_login2][/users/home/sysm02/]> ping 192.168.0.56
>>>>> PING 192.168.0.56 (192.168.0.56) 56(84) bytes of data.
>>>>> 64 bytes from 192.168.0.56: icmp_seq=1 ttl=64 time=0.116 ms
>>>>> 64 bytes from 192.168.0.56: icmp_seq=2 ttl=64 time=0.101 ms
>>>>>
>>>>> --- 192.168.0.56 ping statistics ---
>>>>> 2 packets transmitted, 2 received, 0% packet loss, time 1528ms
>>>>> rtt min/avg/max/mdev = 0.101/0.108/0.116/0.012 ms
>>>>>
>>>>> [athena_login2][/users/home/sysm02/]> telnet 192.168.0.56 49156
>>>>> Trying 192.168.0.56...
>>>>> Connected to 192.168.0.56.
>>>>> Escape character is '^]'.
>>>>>
>>>>> Thank you for your help,
>>>>> Mauro
>>>>>
>>>>>
>>>>>
>>>>>> On 1 Mar 2019, at 10:29, Mauro Tridici wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> thank you for the explanation.
>>>>>> I just changed the network.ping-timeout option back to its default value (network.ping-timeout=42).
>>>>>>
>>>>>> I will check the logs to see if the errors appear again.
>>>>>>
>>>>>> Regards,
>>>>>> Mauro
>>>>>>
>>>>>>> On 1 Mar 2019, at 04:43, Milind Changire wrote:
>>>>>>>
>>>>>>> network.ping-timeout should not be set to zero for non-glusterd clients.
>>>>>>> glusterd is a special case for which ping-timeout is set to zero via /etc/glusterfs/glusterd.vol
>>>>>>>
>>>>>>> Setting network.ping-timeout to zero disables arming of the ping timer for connections. This disables testing the connection for responsiveness and hence avoids proactive fail-over.
>>>>>>>
>>>>>>> Please reset network.ping-timeout to a non-zero positive value, e.g. 42.
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Feb 28, 2019 at 5:07 PM Nithya Balachandran wrote:
>>>>>>> Adding Raghavendra and Milind to comment on this.
>>>>>>>
>>>>>>> What is the effect of setting network.ping-timeout to 0, and should it be set back to 42?
>>>>>>> Regards,
>>>>>>> Nithya
>>>>>>>
>>>>>>> On Thu, 28 Feb 2019 at 16:01, Mauro Tridici wrote:
>>>>>>> Hi Nithya,
>>>>>>>
>>>>>>> sorry for the late reply.
>>>>>>> network.ping-timeout had been set to 0 in order to try to solve some timeout problems, but it didn't help.
>>>>>>> I can set it back to the default value.
>>>>>>>
>>>>>>> Can I proceed with the change?
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Mauro
>>>>>>>
>>>>>>>
>>>>>>>> On 28 Feb 2019, at 04:41, Nithya Balachandran wrote:
>>>>>>>>
>>>>>>>> Hi Mauro,
>>>>>>>>
>>>>>>>> Is network.ping-timeout still set to 0? The default value is 42. Is there a particular reason why this was changed?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Nithya
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, 27 Feb 2019 at 21:32, Mauro Tridici wrote:
>>>>>>>>
>>>>>>>> Hi Xavi,
>>>>>>>>
>>>>>>>> thank you for the detailed explanation and suggestions.
>>>>>>>> Yes, the transport.listen-backlog option is still set to 1024.
>>>>>>>>
>>>>>>>> I will check the network and connectivity status using "ping" and "telnet" as soon as the errors come back.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Mauro
>>>>>>>>
>>>>>>>>> On 27 Feb 2019, at 16:42, Xavi Hernandez wrote:
>>>>>>>>>
>>>>>>>>> Hi Mauro,
>>>>>>>>>
>>>>>>>>> those errors say that the mount point is not connected to some of the bricks while executing operations. I see references to the 3rd and 6th bricks of several disperse sets, which seem to map to server s06. For some reason, gluster is having trouble connecting from the client machine to that particular server. At the end of the log I see that, after a long time, a reconnect is done to both of them. However, a little after, other bricks from s05 get disconnected and a reconnect times out.
>>>>>>>>>
>>>>>>>>> That's really odd. It seems as if communication with s06 is cut for some time, then restored, and then the same happens to the next server.
>>>>>>>>>
>>>>>>>>> If the servers are really online and it's only a communication issue, it explains why server memory and network usage have increased: if the problem only exists between the client and the servers, any write made by the client will automatically mark the file as damaged, since some of the servers have not been updated. Since self-heal runs from the server nodes, they will probably be correctly connected to all bricks, which allows them to heal the just-damaged files; this increases memory and network usage.
>>>>>>>>>
>>>>>>>>> I guess you still have transport.listen-backlog set to 1024, right?
>>>>>>>>>
>>>>>>>>> Just to try to identify whether the problem really comes from the network, can you check if you lose some pings from the client to all of the servers while you are seeing those errors in the log file?
>>>>>>>>>
>>>>>>>>> You can also check whether, during those errors, you can telnet to the port of the brick from the client.
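[Editor's note: the ping/telnet checks suggested above can be scripted so they run in a loop and produce timestamped output that correlates with the ping-timer-expired log entries. The sketch below is not from this thread's tooling; it only assumes bash (for the /dev/tcp redirection) and coreutils `timeout`, and the host:port pairs are examples taken from the quoted errors.]

```shell
# Probe a brick's TCP port (the scripted equivalent of the telnet check)
# without needing telnet installed. Prints a timestamped OK/FAIL line
# so the output can be matched against the client log timestamps.
check_brick() {
    host=$1
    port=$2
    if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "$(date -u +%FT%TZ) OK: ${host}:${port} reachable"
    else
        echo "$(date -u +%FT%TZ) FAIL: ${host}:${port} not reachable"
    fi
}

# Example targets: the bricks named in the client log errors above.
for target in 192.168.0.56:49156 192.168.0.55:49163; do
    check_brick "${target%%:*}" "${target##*:}"
done
```

Run from the client while the errors are occurring, e.g. under `watch` or in a `while sleep 5` loop, and compare the FAIL timestamps with the disconnect messages in the mount log.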
>>>>>>>>>
>>>>>>>>> Xavi
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Feb 26, 2019 at 10:17 AM Mauro Tridici wrote:
>>>>>>>>> Hi Nithya,
>>>>>>>>>
>>>>>>>>> the "df -h" operation is no longer slow, but no users are using the volume; RAM and NETWORK usage is OK on the client node.
>>>>>>>>>
>>>>>>>>> I was worried about this kind of warning/error:
>>>>>>>>>
>>>>>>>>> [2019-02-25 10:59:00.664323] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-6: Executing operation with some subvolumes unavailable (20)
>>>>>>>>>
>>>>>>>>> [2019-02-26 03:11:35.212603] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) 0-tier2-client-50: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2019-02-26 03:10:56.549903 (xid=0x106f1c5)
>>>>>>>>>
>>>>>>>>> [2019-02-26 03:13:03.313831] E [socket.c:2376:socket_connect_finish] 0-tier2-client-50: connection to 192.168.0.56:49156 failed (Timeout della connessione); disconnecting socket
>>>>>>>>>
>>>>>>>>> It seems that some subvolumes are not available and the 192.168.0.56 server (s06) is not reachable.
>>>>>>>>> But the gluster servers are up & running and the bricks are OK.
>>>>>>>>>
>>>>>>>>> In attachment, the updated tier2.log file.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thank you.
>>>>>>>>> Regards,
>>>>>>>>> Mauro
>>>>>>>>>
>>>>>>>>>> On 26 Feb 2019, at 04:03, Nithya Balachandran wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I see a lot of EC messages in the log but they don't seem very serious. Xavi, can you take a look?
>>>>>>>>>>
>>>>>>>>>> The only errors I see are:
>>>>>>>>>> [2019-02-25 10:58:45.519871] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) 0-tier2-client-50: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2019-02-25 10:57:47.429969 (xid=0xd26fe7)
>>>>>>>>>> [2019-02-25 10:58:51.461493] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) 0-tier2-client-41: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2019-02-25 10:57:47.499174 (xid=0xf47d6a)
>>>>>>>>>> [2019-02-25 11:07:57.152874] E [socket.c:2376:socket_connect_finish] 0-tier2-client-70: connection to 192.168.0.55:49163 failed (Timeout della connessione); disconnecting socket
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Is the df -h operation still slow? If yes, can you take a tcpdump of the client while running df -h and send that across?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Nithya
>>>>>>>>>>
>>>>>>>>>> On Mon, 25 Feb 2019 at 17:27, Mauro Tridici wrote:
>>>>>>>>>>
>>>>>>>>>> Sorry, a few minutes after my last mail, I noticed that the "df -h" command hung for a while before returning the prompt.
>>>>>>>>>> Yesterday everything was OK in the gluster client log, but today I see a lot of errors (please take a look at the attached file).
>>>>>>>>>>
>>>>>>>>>> On the client node, I detected significant RAM and NETWORK usage.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Do you think that the errors have been caused by the client's resource usage?
>>>>>>>>>>
>>>>>>>>>> Thank you in advance,
>>>>>>>>>> Mauro
>>>>>>>>>>
>>>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: s06-20190311.logs.tar.gz
Type: application/x-gzip
Size: 2338627 bytes
Desc: not available
URL:
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From abhishpaliwal at gmail.com Mon Mar 11 10:09:34 2019
From: abhishpaliwal at gmail.com (ABHISHEK PALIWAL)
Date: Mon, 11 Mar 2019 15:39:34 +0530
Subject: [Gluster-users] Glusterfsd crashed with SIGSEGV
In-Reply-To:
References:
Message-ID:

Hi Team,

Could you please provide some pointers to debug it further.

Regards,
Abhishek

On Fri, Mar 8, 2019 at 2:19 PM ABHISHEK PALIWAL wrote:

> Hi Team,
>
> I am using GlusterFS 5.4; after setting up the gluster mount point, when
> trying to access it, glusterfsd crashes and the mount point throws the
> "Transport endpoint is not connected" error.
>
> Here are the gdb logs for the core file:
>
> warning: Could not load shared library symbols for linux-vdso64.so.1.
> Do you need "set solib-search-path" or "set sysroot"?
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `/usr/sbin/glusterfsd -s 128.224.95.140 --volfile-id
> gv0.128.224.95.140.tmp-bric'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x00003fff95ab1d48 in _int_malloc (av=av at entry=0x3fff7c000020, > bytes=bytes at entry=36) at malloc.c:3327 > 3327 { > [Current thread is 1 (Thread 0x3fff90394160 (LWP 811))] > (gdb) > (gdb) > (gdb) bt > #0 0x00003fff95ab1d48 in _int_malloc (av=av at entry=0x3fff7c000020, > bytes=bytes at entry=36) at malloc.c:3327 > #1 0x00003fff95ab43dc in __GI___libc_malloc (bytes=36) at malloc.c:2921 > #2 0x00003fff95b6ffd0 in x_inline (xdrs=0x3fff90391d20, len= out>) at xdr_sizeof.c:89 > #3 0x00003fff95c4d488 in .xdr_gfx_iattx () from /usr/lib64/libgfxdr.so.0 > #4 0x00003fff95c4de84 in .xdr_gfx_dirplist () from > /usr/lib64/libgfxdr.so.0 > #5 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, > pp=0x3fff7c132020, size=, proc=) at > xdr_ref.c:84 > #6 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, > objpp=0x3fff7c132020, obj_size=, > xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at > xdr_ref.c:135 > #7 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from > /usr/lib64/libgfxdr.so.0 > #8 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, > pp=0x3fff7c131ea0, size=, proc=) at > xdr_ref.c:84 > #9 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, > objpp=0x3fff7c131ea0, obj_size=, > xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at > xdr_ref.c:135 > #10 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from > /usr/lib64/libgfxdr.so.0 > #11 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, > pp=0x3fff7c131d20, size=, proc=) at > xdr_ref.c:84 > #12 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, > objpp=0x3fff7c131d20, obj_size=, > xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at > xdr_ref.c:135 > #13 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from > /usr/lib64/libgfxdr.so.0 > #14 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, > pp=0x3fff7c131ba0, size=, proc=) at > xdr_ref.c:84 > #15 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, > 
objpp=0x3fff7c131ba0, obj_size=, > xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at > xdr_ref.c:135 > #16 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from > /usr/lib64/libgfxdr.so.0 > #17 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, > pp=0x3fff7c131a20, size=, proc=) at > xdr_ref.c:84 > #18 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, > objpp=0x3fff7c131a20, obj_size=, > xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at > xdr_ref.c:135 > #19 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from > /usr/lib64/libgfxdr.so.0 > #20 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, > pp=0x3fff7c1318a0, size=, proc=) at > xdr_ref.c:84 > #21 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, > objpp=0x3fff7c1318a0, obj_size=, > xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at > xdr_ref.c:135 > #22 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from > /usr/lib64/libgfxdr.so.0 > #23 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, > pp=0x3fff7c131720, size=, proc=) at > xdr_ref.c:84 > #24 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, > objpp=0x3fff7c131720, obj_size=, > xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at > xdr_ref.c:135 > #25 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from > /usr/lib64/libgfxdr.so.0 > #26 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, > pp=0x3fff7c1315a0, size=, proc=) at > xdr_ref.c:84 > #27 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, > objpp=0x3fff7c1315a0, obj_size=, > xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at > xdr_ref.c:135 > #28 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from > /usr/lib64/libgfxdr.so.0 > #29 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, > pp=0x3fff7c131420, size=, proc=) at > xdr_ref.c:84 > #30 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, > objpp=0x3fff7c131420, obj_size=, > xdr_obj=@0x3fff95c6a4b0: 
0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at
> xdr_ref.c:135
> #31 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from
> /usr/lib64/libgfxdr.so.0
> #32 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20,
> pp=0x3fff7c1312a0, size=, proc=) at
> xdr_ref.c:84
> #33 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20,
> objpp=0x3fff7c1312a0, obj_size=,
> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at
> xdr_ref.c:135
>
> Frames are getting repeated; could anyone please help me.
> --
> Regards
> Abhishek Paliwal
>

--
Regards
Abhishek Paliwal
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mauryam at gmail.com Mon Mar 11 10:38:37 2019
From: mauryam at gmail.com (Maurya M)
Date: Mon, 11 Mar 2019 16:08:37 +0530
Subject: [Gluster-users] Gluster 4.1 install on AKS (Azure)
Message-ID:

Hi All,

I am trying to install gluster 4.1 on 3 nodes of my AKS cluster using the gluster-kubernetes project. I followed the pre-setup instructions (opening the firewall, providing raw block devices, installing glusterfs-client/server) and started gluster on the node, but I am getting the error below:

Mar 11 10:22:54 aks-agentpool-26682136-0 systemd[1]: Starting GlusterFS, a clustered file-system server...
-- Subject: Unit glusterd.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit glusterd.service has begun starting up.
Mar 11 10:22:54 aks-agentpool-26682136-0 GlusterFS[13694]: [glusterfsd.c:2150:parse_cmdline] 0-glusterfs: ERROR: parsing the volfile failed [No such file or directory]
Mar 11 10:22:54 aks-agentpool-26682136-0 glusterd[13694]: USAGE: /usr/sbin/glusterd [options] [mountpoint]
Mar 11 10:22:54 aks-agentpool-26682136-0 systemd[1]: glusterd.service: Control process exited, code=exited status=255
Mar 11 10:22:54 aks-agentpool-26682136-0 systemd[1]: Failed to start GlusterFS, a clustered file-system server.
-- Subject: Unit glusterd.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit glusterd.service has failed.
--
-- The result is failed.
Mar 11 10:22:54 aks-agentpool-26682136-0 systemd[1]: glusterd.service: Unit entered failed state.
Mar 11 10:22:54 aks-agentpool-26682136-0 systemd[1]: glusterd.service: Failed with result 'exit-code'.

Any ideas how to troubleshoot this? I tried removing, reconfiguring, and deleting the symlink, but I am still not able to proceed with the install.

Thanks in advance for your help & support.

Appreciate it,
Maurya
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From atumball at redhat.com Mon Mar 11 13:38:36 2019
From: atumball at redhat.com (Amar Tumballi Suryanarayan)
Date: Mon, 11 Mar 2019 19:08:36 +0530
Subject: [Gluster-users] Glusterfsd crashed with SIGSEGV
In-Reply-To:
References:
Message-ID:

Hi Abhishek,

Can you check and get back to us?

```
bash# ldd /usr/lib64/libglusterfs.so
bash# ldd /usr/lib64/libgfrpc.so
```

Also, considering you have the core, can you do `(gdb) thr apply all bt full` and pass it on?

Thanks & Regards,
Amar

On Mon, Mar 11, 2019 at 3:41 PM ABHISHEK PALIWAL wrote:

> Hi Team,
>
> Could you please provide some pointers to debug it further.
>
> Regards,
> Abhishek
>
> On Fri, Mar 8, 2019 at 2:19 PM ABHISHEK PALIWAL wrote:
>
>> Hi Team,
>>
>> I am using GlusterFS 5.4; after setting up the gluster mount point, when
>> trying to access it, glusterfsd crashes and the mount point throws the
>> "Transport endpoint is not connected" error.
>>
>> Here are the gdb logs for the core file:
>>
>> warning: Could not load shared library symbols for linux-vdso64.so.1.
>> Do you need "set solib-search-path" or "set sysroot"?
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> Core was generated by `/usr/sbin/glusterfsd -s 128.224.95.140
>> --volfile-id gv0.128.224.95.140.tmp-bric'.
>> Program terminated with signal SIGSEGV, Segmentation fault.
>> [...quoted backtrace trimmed; identical to the one in the original message above...]
>>
>> Frames are getting repeated; could anyone please help me.
>> --
>> Regards
>> Abhishek Paliwal
>
> --
> Regards
> Abhishek Paliwal
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

--
Amar Tumballi (amarts)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rgowdapp at redhat.com Tue Mar 12 04:17:27 2019
From: rgowdapp at redhat.com (Raghavendra Gowdappa)
Date: Tue, 12 Mar 2019 09:47:27 +0530
Subject: [Gluster-users] "rpc_clnt_ping_timer_expired" errors
In-Reply-To:
References:
Message-ID:

Was the suggestion to increase server.event-thread values tried? If yes, what were the results?

On Mon, Mar 11, 2019 at 2:40 PM Mauro Tridici wrote:

> Dear All,
>
> do you have any suggestions about the right way to "debug" this issue?
> In attachment, the updated logs of the "s06" gluster server.
> > I noticed a lot of intermittent warning and error messages. > > Thank you in advance, > Mauro > > > > On 4 Mar 2019, at 18:45, Raghavendra Gowdappa wrote: > > > +Gluster Devel , +Gluster-users > > > I would like to point out another issue. Even if what I suggested prevents > disconnects, part of the solution would be only symptomatic treatment and > doesn't address the root cause of the problem. In most of the > ping-timer-expiry issues, the root cause is the increased load on bricks > and the inability of bricks to be responsive under high load. So, the > actual solution would be doing any or both of the following: > * identify the source of increased load and if possible throttle it. > Internal heal processes like self-heal, rebalance, quota heal are known to > pump traffic into bricks without much throttling (io-threads _might_ do > some throttling, but my understanding is its not sufficient). > * identify the reason for bricks to become unresponsive during load. This > may be fixable issues like not enough event-threads to read from network or > difficult to fix issues like fsync on backend fs freezing the process or > semi fixable issues (in code) like lock contention. > > So any genuine effort to fix ping-timer-issues (to be honest most of the > times they are not issues related to rpc/network) would involve performance > characterization of various subsystems on bricks and clients. Various > subsystems can include (but not necessarily limited to), underlying > OS/filesystem, glusterfs processes, CPU consumption etc > > regards, > Raghavendra > > On Mon, Mar 4, 2019 at 9:31 PM Mauro Tridici > wrote: > >> Thank you, let?s try! >> I will inform you about the effects of the change. >> >> Regards, >> Mauro >> >> On 4 Mar 2019, at 16:55, Raghavendra Gowdappa >> wrote: >> >> >> >> On Mon, Mar 4, 2019 at 8:54 PM Mauro Tridici >> wrote: >> >>> Hi Raghavendra, >>> >>> thank you for your reply. >>> Yes, you are right. It is a problem that seems to happen randomly. 
>>> At this moment, the server.event-threads value is 4. I will try to increase >>> this value to 8. Do you think that it could be a valid value? >>> >> >> Yes. We can try with that. You should at least see the frequency of >> ping-timer-related disconnects reduce with this value (even if it doesn't >> eliminate the problem completely). >> >> >>> Regards, >>> Mauro >>> >>> >>> On 4 Mar 2019, at 15:36, Raghavendra Gowdappa >>> wrote: >>> >>> >>> >>> On Mon, Mar 4, 2019 at 8:01 PM Nithya Balachandran >>> wrote: >>> >>>> Hi Mauro, >>>> >>>> It looks like some problem on s06. Are all your other nodes ok? Can you >>>> send us the gluster logs from this node? >>>> >>>> @Raghavendra G , do you have any idea as to >>>> how this can be debugged? Maybe running top? Or debug brick logs? >>>> >>> >>> If we can reproduce the problem, collecting tcpdump on both ends of the >>> connection will help. But one common problem is that these bugs are >>> inconsistently reproducible, and hence we may not be able to capture tcpdump >>> at the correct intervals. Other than that, we can try to collect some evidence >>> that poller threads were busy (waiting on locks). But I'm not sure what debug >>> data provides that information. >>> >>> From what I know, it's difficult to collect evidence for this issue, and >>> we can only reason about it. >>> >>> We can try a workaround though - try increasing server.event-threads and >>> see whether the ping-timer expiry issues go away with an optimal value. If >>> that's the case, it kind of provides proof for our hypothesis.
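As a concrete sketch of the workaround discussed above: the option is applied with the standard volume-set CLI. The volume name `tier2` is taken from the log excerpts quoted later in this thread, and the value 8 is the one proposed here, not a general recommendation.

```shell
# Inspect the current settings first.
gluster volume get tier2 server.event-threads
gluster volume get tier2 client.event-threads

# Raise the brick-side event (poller) threads, as suggested above.
gluster volume set tier2 server.event-threads 8

# The client-side counterpart can be raised as well if the mount itself is busy.
gluster volume set tier2 client.event-threads 8
```

If the frequency of ping-timer-expired disconnects drops after the change, that supports the busy-poller hypothesis described above.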
>>> >>> >>>> >>>> Regards, >>>> Nithya >>>> >>>> On Mon, 4 Mar 2019 at 15:25, Mauro Tridici >>>> wrote: >>>> >>>>> Hi All, >>>>> >>>>> some minutes ago I received this message from NAGIOS server >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> ****** Nagios *****Notification Type: PROBLEMService: Brick - >>>>> /gluster/mnt2/brickHost: s06Address: s06-stgState: CRITICALDate/Time: Mon >>>>> Mar 4 10:25:33 CET 2019Additional Info:CHECK_NRPE STATE CRITICAL: Socket >>>>> timeout after 10 seconds.* >>>>> >>>>> I checked the network, RAM and CPUs usage on s06 node and everything >>>>> seems to be ok. >>>>> No bricks are in error state. In /var/log/messages, I detected again a >>>>> crash of ?check_vol_utili? that I think it is a module used by NRPE >>>>> executable (that is the NAGIOS client). >>>>> >>>>> Mar 4 10:15:29 s06 kernel: traps: check_vol_utili[161224] general >>>>> protection ip:7facffa0a66d sp:7ffe9f4e6fc0 error:0 in >>>>> libglusterfs.so.0.0.1[7facff9b7000+f7000] >>>>> Mar 4 10:15:29 s06 abrt-hook-ccpp: Process 161224 (python2.7) of user >>>>> 0 killed by SIGSEGV - dumping core >>>>> Mar 4 10:15:29 s06 abrt-server: Generating core_backtrace >>>>> Mar 4 10:15:29 s06 abrt-server: Error: Unable to open './coredump': >>>>> No such file or directory >>>>> Mar 4 10:16:01 s06 systemd: Created slice User Slice of root. >>>>> Mar 4 10:16:01 s06 systemd: Starting User Slice of root. >>>>> Mar 4 10:16:01 s06 systemd: Started Session 201010 of user root. >>>>> Mar 4 10:16:01 s06 systemd: Starting Session 201010 of user root. >>>>> Mar 4 10:16:01 s06 systemd: Removed slice User Slice of root. >>>>> Mar 4 10:16:01 s06 systemd: Stopping User Slice of root. 
>>>>> Mar 4 10:16:24 s06 abrt-server: Duplicate: UUID >>>>> Mar 4 10:16:24 s06 abrt-server: DUP_OF_DIR: >>>>> /var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041 >>>>> Mar 4 10:16:24 s06 abrt-server: Deleting problem directory >>>>> ccpp-2019-03-04-10:15:29-161224 (dup of ccpp-2018-09-25-12:27:42-13041) >>>>> Mar 4 10:16:24 s06 abrt-server: Generating core_backtrace >>>>> Mar 4 10:16:24 s06 abrt-server: Error: Unable to open './coredump': >>>>> No such file or directory >>>>> Mar 4 10:16:24 s06 abrt-server: Cannot notify >>>>> '/var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041' via uReport: Event >>>>> 'report_uReport' exited with 1 >>>>> Mar 4 10:16:24 s06 abrt-hook-ccpp: Process 161391 (python2.7) of user >>>>> 0 killed by SIGABRT - dumping core >>>>> Mar 4 10:16:25 s06 abrt-server: Generating core_backtrace >>>>> Mar 4 10:16:25 s06 abrt-server: Error: Unable to open './coredump': >>>>> No such file or directory >>>>> Mar 4 10:17:01 s06 systemd: Created slice User Slice of root. >>>>> >>>>> Also, I noticed the following errors that I think are very critical: >>>>> >>>>> Mar 4 10:21:12 s06 glustershd[20355]: [2019-03-04 09:21:12.954798] C >>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-55: server >>>>> 192.168.0.55:49158 has not responded in the last 42 seconds, >>>>> disconnecting. >>>>> Mar 4 10:22:01 s06 systemd: Created slice User Slice of root. >>>>> Mar 4 10:22:01 s06 systemd: Starting User Slice of root. >>>>> Mar 4 10:22:01 s06 systemd: Started Session 201017 of user root. >>>>> Mar 4 10:22:01 s06 systemd: Starting Session 201017 of user root. >>>>> Mar 4 10:22:01 s06 systemd: Removed slice User Slice of root. >>>>> Mar 4 10:22:01 s06 systemd: Stopping User Slice of root. >>>>> Mar 4 10:22:03 s06 glustershd[20355]: [2019-03-04 09:22:03.964120] C >>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-60: server >>>>> 192.168.0.54:49165 has not responded in the last 42 seconds, >>>>> disconnecting. 
>>>>> Mar 4 10:23:01 s06 systemd: Created slice User Slice of root. >>>>> Mar 4 10:23:01 s06 systemd: Starting User Slice of root. >>>>> Mar 4 10:23:01 s06 systemd: Started Session 201018 of user root. >>>>> Mar 4 10:23:01 s06 systemd: Starting Session 201018 of user root. >>>>> Mar 4 10:23:02 s06 systemd: Removed slice User Slice of root. >>>>> Mar 4 10:23:02 s06 systemd: Stopping User Slice of root. >>>>> Mar 4 10:24:01 s06 systemd: Created slice User Slice of root. >>>>> Mar 4 10:24:01 s06 systemd: Starting User Slice of root. >>>>> Mar 4 10:24:01 s06 systemd: Started Session 201019 of user root. >>>>> Mar 4 10:24:01 s06 systemd: Starting Session 201019 of user root. >>>>> Mar 4 10:24:01 s06 systemd: Removed slice User Slice of root. >>>>> Mar 4 10:24:01 s06 systemd: Stopping User Slice of root. >>>>> Mar 4 10:24:03 s06 glustershd[20355]: [2019-03-04 09:24:03.982502] C >>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-16: server >>>>> 192.168.0.52:49158 has not responded in the last 42 seconds, >>>>> disconnecting. >>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746109] C >>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-3: server >>>>> 192.168.0.51:49153 has not responded in the last 42 seconds, >>>>> disconnecting. >>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746215] C >>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-10: server >>>>> 192.168.0.52:49156 has not responded in the last 42 seconds, >>>>> disconnecting. >>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746260] C >>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-21: server >>>>> 192.168.0.51:49159 has not responded in the last 42 seconds, >>>>> disconnecting. 
>>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746296] C >>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-25: server >>>>> 192.168.0.52:49161 has not responded in the last 42 seconds, >>>>> disconnecting. >>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746413] C >>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-60: server >>>>> 192.168.0.54:49165 has not responded in the last 42 seconds, >>>>> disconnecting. >>>>> Mar 4 10:24:07 s06 glustershd[20355]: [2019-03-04 09:24:07.982952] C >>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-45: server >>>>> 192.168.0.54:49155 has not responded in the last 42 seconds, >>>>> disconnecting. >>>>> Mar 4 10:24:18 s06 glustershd[20355]: [2019-03-04 09:24:18.990929] C >>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-25: server >>>>> 192.168.0.52:49161 has not responded in the last 42 seconds, >>>>> disconnecting. >>>>> Mar 4 10:24:31 s06 glustershd[20355]: [2019-03-04 09:24:31.995781] C >>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-20: server >>>>> 192.168.0.53:49159 has not responded in the last 42 seconds, >>>>> disconnecting. >>>>> Mar 4 10:25:01 s06 systemd: Created slice User Slice of root. >>>>> Mar 4 10:25:01 s06 systemd: Starting User Slice of root. >>>>> Mar 4 10:25:01 s06 systemd: Started Session 201020 of user root. >>>>> Mar 4 10:25:01 s06 systemd: Starting Session 201020 of user root. >>>>> Mar 4 10:25:01 s06 systemd: Removed slice User Slice of root. >>>>> Mar 4 10:25:01 s06 systemd: Stopping User Slice of root. >>>>> Mar 4 10:25:57 s06 systemd: Created slice User Slice of root. >>>>> Mar 4 10:25:57 s06 systemd: Starting User Slice of root. >>>>> Mar 4 10:25:57 s06 systemd-logind: New session 201021 of user root. >>>>> Mar 4 10:25:57 s06 systemd: Started Session 201021 of user root. >>>>> Mar 4 10:25:57 s06 systemd: Starting Session 201021 of user root. 
>>>>> Mar 4 10:26:01 s06 systemd: Started Session 201022 of user root. >>>>> Mar 4 10:26:01 s06 systemd: Starting Session 201022 of user root. >>>>> Mar 4 10:26:21 s06 nrpe[162388]: Error: Could not complete SSL >>>>> handshake with 192.168.1.56: 5 >>>>> Mar 4 10:27:01 s06 systemd: Started Session 201023 of user root. >>>>> Mar 4 10:27:01 s06 systemd: Starting Session 201023 of user root. >>>>> Mar 4 10:28:01 s06 systemd: Started Session 201024 of user root. >>>>> Mar 4 10:28:01 s06 systemd: Starting Session 201024 of user root. >>>>> Mar 4 10:29:01 s06 systemd: Started Session 201025 of user root. >>>>> Mar 4 10:29:01 s06 systemd: Starting Session 201025 of user root. >>>>> >>>>> But, unfortunately, I don?t understand why it is happening. >>>>> Now, NAGIOS server shows that s06 status is ok: >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> ****** Nagios *****Notification Type: RECOVERYService: Brick - >>>>> /gluster/mnt2/brickHost: s06Address: s06-stgState: OKDate/Time: Mon Mar 4 >>>>> 10:35:23 CET 2019Additional Info:OK: Brick /gluster/mnt2/brick is up* >>>>> >>>>> Nothing is changed from RAM, CPUs, and NETWORK point of view. >>>>> /var/log/message file has been updated: >>>>> >>>>> Mar 4 10:32:01 s06 systemd: Starting Session 201029 of user root. >>>>> Mar 4 10:32:30 s06 glustershd[20355]: [2019-03-04 09:32:30.069082] C >>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-10: server >>>>> 192.168.0.52:49156 has not responded in the last 42 seconds, >>>>> disconnecting. >>>>> Mar 4 10:32:55 s06 glustershd[20355]: [2019-03-04 09:32:55.074689] C >>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-66: server >>>>> 192.168.0.54:49167 has not responded in the last 42 seconds, >>>>> disconnecting. >>>>> Mar 4 10:33:01 s06 systemd: Started Session 201030 of user root. >>>>> Mar 4 10:33:01 s06 systemd: Starting Session 201030 of user root. 
>>>>> Mar 4 10:34:01 s06 systemd: Started Session 201031 of user root. >>>>> Mar 4 10:34:01 s06 systemd: Starting Session 201031 of user root. >>>>> Mar 4 10:35:01 s06 nrpe[162562]: Could not read request from client >>>>> 192.168.1.56, bailing out... >>>>> Mar 4 10:35:01 s06 nrpe[162562]: INFO: SSL Socket Shutdown. >>>>> Mar 4 10:35:01 s06 systemd: Started Session 201032 of user root. >>>>> Mar 4 10:35:01 s06 systemd: Starting Session 201032 of user root. >>>>> >>>>> Could you please help me understand what is happening? >>>>> Thank you in advance. >>>>> >>>>> Regards, >>>>> Mauro >>>>> >>>>> >>>>> On 1 Mar 2019, at 12:17, Mauro Tridici wrote: >>>>> >>>>> >>>>> Thank you, Milind. >>>>> I executed the instructions you suggested: >>>>> >>>>> - grep 'blocked for' /var/log/messages on s06 returns no output (no >>>>> 'blocked' word is detected in the messages file); >>>>> - in the /var/log/messages file I can see this kind of error repeated >>>>> many times: >>>>> >>>>> Mar 1 08:43:01 s06 systemd: Starting Session 196071 of user root. >>>>> Mar 1 08:43:01 s06 systemd: Removed slice User Slice of root. >>>>> Mar 1 08:43:01 s06 systemd: Stopping User Slice of root.
>>>>> Mar 1 08:43:02 s06 kernel: traps: check_vol_utili[57091] general >>>>> protection ip:7f88e76ee66d sp:7ffe5a5bcc30 error:0 in >>>>> libglusterfs.so.0.0.1[7f88e769b000+f7000] >>>>> Mar 1 08:43:02 s06 abrt-hook-ccpp: Process 57091 (python2.7) of user >>>>> 0 killed by SIGSEGV - dumping core >>>>> Mar 1 08:43:02 s06 abrt-server: Generating core_backtrace >>>>> Mar 1 08:43:02 s06 abrt-server: Error: Unable to open './coredump': >>>>> No such file or directory >>>>> Mar 1 08:43:58 s06 abrt-server: Duplicate: UUID >>>>> Mar 1 08:43:58 s06 abrt-server: DUP_OF_DIR: >>>>> /var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041 >>>>> Mar 1 08:43:58 s06 abrt-server: Deleting problem directory >>>>> ccpp-2019-03-01-08:43:02-57091 (dup of ccpp-2018-09-25-12:27:42-13041) >>>>> Mar 1 08:43:58 s06 dbus[1872]: [system] Activating service >>>>> name='org.freedesktop.problems' (using servicehelper) >>>>> Mar 1 08:43:58 s06 dbus[1872]: [system] Successfully activated >>>>> service 'org.freedesktop.problems' >>>>> Mar 1 08:43:58 s06 abrt-server: Generating core_backtrace >>>>> Mar 1 08:43:58 s06 abrt-server: Error: Unable to open './coredump': >>>>> No such file or directory >>>>> Mar 1 08:43:58 s06 abrt-server: Cannot notify >>>>> '/var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041' via uReport: Event >>>>> 'report_uReport' exited with 1 >>>>> Mar 1 08:44:01 s06 systemd: Created slice User Slice of root. >>>>> Mar 1 08:44:01 s06 systemd: Starting User Slice of root. >>>>> Mar 1 08:44:01 s06 systemd: Started Session 196072 of user root. >>>>> Mar 1 08:44:01 s06 systemd: Starting Session 196072 of user root. >>>>> Mar 1 08:44:01 s06 systemd: Removed slice User Slice of root. >>>>> >>>>> - in /var/log/messages file I can see also 4 errors related to other >>>>> cluster servers: >>>>> >>>>> Mar 1 11:05:01 s06 systemd: Starting User Slice of root. >>>>> Mar 1 11:05:01 s06 systemd: Started Session 196230 of user root. >>>>> Mar 1 11:05:01 s06 systemd: Starting Session 196230 of user root. 
>>>>> Mar 1 11:05:01 s06 systemd: Removed slice User Slice of root. >>>>> Mar 1 11:05:01 s06 systemd: Stopping User Slice of root. >>>>> Mar 1 11:05:59 s06 glustershd[70117]: [2019-03-01 10:05:59.347094] C >>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-33: server >>>>> 192.168.0.51:49163 has not responded in the last 42 seconds, >>>>> disconnecting. >>>>> Mar 1 11:06:01 s06 systemd: Created slice User Slice of root. >>>>> Mar 1 11:06:01 s06 systemd: Starting User Slice of root. >>>>> Mar 1 11:06:01 s06 systemd: Started Session 196231 of user root. >>>>> Mar 1 11:06:01 s06 systemd: Starting Session 196231 of user root. >>>>> Mar 1 11:06:01 s06 systemd: Removed slice User Slice of root. >>>>> Mar 1 11:06:01 s06 systemd: Stopping User Slice of root. >>>>> Mar 1 11:06:12 s06 glustershd[70117]: [2019-03-01 10:06:12.351319] C >>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-1: server >>>>> 192.168.0.52:49153 has not responded in the last 42 seconds, >>>>> disconnecting. >>>>> Mar 1 11:06:38 s06 glustershd[70117]: [2019-03-01 10:06:38.356920] C >>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-7: server >>>>> 192.168.0.52:49155 has not responded in the last 42 seconds, >>>>> disconnecting. >>>>> Mar 1 11:07:01 s06 systemd: Created slice User Slice of root. >>>>> Mar 1 11:07:01 s06 systemd: Starting User Slice of root. >>>>> Mar 1 11:07:01 s06 systemd: Started Session 196232 of user root. >>>>> Mar 1 11:07:01 s06 systemd: Starting Session 196232 of user root. >>>>> Mar 1 11:07:01 s06 systemd: Removed slice User Slice of root. >>>>> Mar 1 11:07:01 s06 systemd: Stopping User Slice of root. >>>>> Mar 1 11:07:36 s06 glustershd[70117]: [2019-03-01 10:07:36.366259] C >>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-0: server >>>>> 192.168.0.51:49152 has not responded in the last 42 seconds, >>>>> disconnecting. >>>>> Mar 1 11:08:01 s06 systemd: Created slice User Slice of root. 
>>>>> >>>>> No 'blocked' word appears in the /var/log/messages files on the other >>>>> cluster servers. >>>>> In attachment, the /var/log/messages file from the s06 server. >>>>> >>>>> Thank you in advance, >>>>> Mauro >>>>> >>>>> >>>>> >>>>> >>>>> On 1 Mar 2019, at 11:47, Milind Changire wrote: >>>>> >>>>> The traces of very high disk activity on the servers are often found >>>>> in /var/log/messages. >>>>> You might want to grep for "blocked for" in /var/log/messages on s06 >>>>> and correlate the timestamps to confirm the unresponsiveness as reported in >>>>> the gluster client logs. >>>>> In cases of high disk activity, although the operating system >>>>> continues to respond to ICMP pings, the processes writing to disks often >>>>> get blocked by a large flush to the disk, which could span beyond 42 seconds >>>>> and hence result in ping-timer-expiry logs. >>>>> >>>>> As a side note: >>>>> If you indeed find gluster processes being blocked in >>>>> /var/log/messages, you might want to tweak the sysctl tunables >>>>> vm.dirty_background_ratio or vm.dirty_background_bytes to a smaller value >>>>> than the existing one. Please read up on those tunables before touching >>>>> the settings. >>>>> >>>>> >>>>> On Fri, Mar 1, 2019 at 4:06 PM Mauro Tridici >>>>> wrote: >>>>>> >>>>>> Hi all, >>>>>> >>>>>> in attachment the client log captured after changing the >>>>>> network.ping-timeout option.
>>>>>> I noticed this error involving server 192.168.0.56 (s06) >>>>>> >>>>>> [2019-03-01 09:23:36.077287] I [rpc-clnt.c:1962:rpc_clnt_reconfig] >>>>>> 0-tier2-client-71: changing ping timeout to 42 (from 0) >>>>>> [2019-03-01 09:23:36.078213] I >>>>>> [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in >>>>>> volfile,continuing >>>>>> [2019-03-01 09:23:36.078432] I >>>>>> [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in >>>>>> volfile,continuing >>>>>> [2019-03-01 09:23:36.092357] I >>>>>> [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in >>>>>> volfile,continuing >>>>>> [2019-03-01 09:23:36.094146] I >>>>>> [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in >>>>>> volfile,continuing >>>>>> [2019-03-01 10:06:24.708082] C >>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-50: server >>>>>> 192.168.0.56:49156 has not responded in the last 42 seconds, >>>>>> disconnecting. >>>>>> >>>>>> I don?t know why it happens, s06 server seems to be reachable. >>>>>> >>>>>> [athena_login2][/users/home/sysm02/]> telnet 192.168.0.56 49156 >>>>>> Trying 192.168.0.56... >>>>>> Connected to 192.168.0.56. >>>>>> Escape character is '^]'. >>>>>> ^CConnection closed by foreign host. >>>>>> [athena_login2][/users/home/sysm02/]> ping 192.168.0.56 >>>>>> PING 192.168.0.56 (192.168.0.56) 56(84) bytes of data. >>>>>> 64 bytes from 192.168.0.56: icmp_seq=1 ttl=64 time=0.116 ms >>>>>> 64 bytes from 192.168.0.56: icmp_seq=2 ttl=64 time=0.101 ms >>>>>> >>>>>> --- 192.168.0.56 ping statistics --- >>>>>> 2 packets transmitted, 2 received, 0% packet loss, time 1528ms >>>>>> rtt min/avg/max/mdev = 0.101/0.108/0.116/0.012 ms >>>>>> >>>>>> [athena_login2][/users/home/sysm02/]> telnet 192.168.0.56 49156 >>>>>> Trying 192.168.0.56... >>>>>> Connected to 192.168.0.56. >>>>>> Escape character is '^]'. 
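The manual reachability test above (ping plus telnet to the brick port) can be scripted so it runs repeatedly while the errors occur. A minimal sketch, assuming Python is available on the client; the host and port are the ones from the telnet session above:

```python
import socket

def brick_reachable(host, port, timeout=5.0):
    """TCP-connect to a brick port, equivalent to the manual
    `telnet <host> <port>` test shown above. Returns True when the
    three-way handshake completes within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (endpoint taken from the log excerpt above):
#   brick_reachable("192.168.0.56", 49156)
```

Running this in a loop with timestamps makes it possible to correlate short connectivity drops with the ping-timer-expired messages in the client log.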
>>>>>> >>>>>> Thank you for your help, >>>>>> Mauro >>>>>> >>>>>> >>>>>> >>>>>> On 1 Mar 2019, at 10:29, Mauro Tridici wrote: >>>>>> >>>>>> Hi all, >>>>>> >>>>>> thank you for the explanation. >>>>>> I just changed the network.ping-timeout option to the default value >>>>>> (network.ping-timeout=42). >>>>>> >>>>>> I will check the logs to see if the errors appear again. >>>>>> >>>>>> Regards, >>>>>> Mauro >>>>>> >>>>>> On 1 Mar 2019, at 04:43, Milind Changire wrote: >>>>>> >>>>>> network.ping-timeout should not be set to zero for non-glusterd >>>>>> clients. >>>>>> glusterd is a special case for which ping-timeout is set to zero via >>>>>> /etc/glusterfs/glusterd.vol >>>>>> >>>>>> Setting network.ping-timeout to zero disables arming of the ping >>>>>> timer for connections. This disables testing the connection for >>>>>> responsiveness and hence avoids proactive fail-over. >>>>>> >>>>>> Please reset network.ping-timeout to a non-zero positive value, e.g. 42 >>>>>> >>>>>> >>>>>> On Thu, Feb 28, 2019 at 5:07 PM Nithya Balachandran < >>>>>> nbalacha at redhat.com> wrote: >>>>>> >>>>>>> Adding Raghavendra and Milind to comment on this. >>>>>>> >>>>>>> What is the effect of setting network.ping-timeout to 0, and should >>>>>>> it be set back to 42? >>>>>>> Regards, >>>>>>> Nithya >>>>>>> >>>>>>> On Thu, 28 Feb 2019 at 16:01, Mauro Tridici >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Nithya, >>>>>>>> >>>>>>>> sorry for the delay. >>>>>>>> network.ping-timeout has been set to 0 in order to try to solve >>>>>>>> some timeout problems, but it didn't help. >>>>>>>> I can set it to the default value. >>>>>>>> >>>>>>>> Can I proceed with the change? >>>>>>>> >>>>>>>> Thank you, >>>>>>>> Mauro >>>>>>>> >>>>>>>> >>>>>>>> On 28 Feb 2019, at 04:41, Nithya Balachandran >>>>>>>> wrote: >>>>>>>> >>>>>>>> Hi Mauro, >>>>>>>> >>>>>>>> Is network.ping-timeout still set to 0? The default value is 42. Is >>>>>>>> there a particular reason why this was changed?
>>>>>>>> >>>>>>>> Regards, >>>>>>>> Nithya >>>>>>>> >>>>>>>> >>>>>>>> On Wed, 27 Feb 2019 at 21:32, Mauro Tridici >>>>>>>> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> Hi Xavi, >>>>>>>>> >>>>>>>>> thank you for the detailed explanation and suggestions. >>>>>>>>> Yes, the transport.listen-backlog option is still set to 1024. >>>>>>>>> >>>>>>>>> I will check the network and connectivity status using 'ping' and >>>>>>>>> 'telnet' as soon as the errors come back again. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Mauro >>>>>>>>> >>>>>>>>> On 27 Feb 2019, at 16:42, Xavi Hernandez < >>>>>>>>> jahernan at redhat.com> wrote: >>>>>>>>> >>>>>>>>> Hi Mauro, >>>>>>>>> >>>>>>>>> those errors say that the mount point is not connected to some of >>>>>>>>> the bricks while executing operations. I see references to the 3rd and 6th >>>>>>>>> bricks of several disperse sets, which seem to map to server s06. For some >>>>>>>>> reason, gluster is having trouble connecting from the client machine to >>>>>>>>> that particular server. At the end of the log I see that after a long time a >>>>>>>>> reconnect is done to both of them. However, a little after, other bricks from >>>>>>>>> s05 get disconnected and a reconnect times out. >>>>>>>>> >>>>>>>>> That's really odd. It seems as if the server/communication is cut to >>>>>>>>> s06 for some time, then restored, and then the same happens to the next >>>>>>>>> server. >>>>>>>>> >>>>>>>>> If the servers are really online and it's only a communication >>>>>>>>> issue, it explains why server memory and network usage have increased: if the >>>>>>>>> problem only exists between the client and servers, any write made by the >>>>>>>>> client will automatically mark the file as damaged, since some of the >>>>>>>>> servers have not been updated. Since self-heal runs from the server nodes, >>>>>>>>> they will probably be correctly connected to all bricks, which allows them >>>>>>>>> to heal the just-damaged file, which increases memory and network usage.
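One way to quantify the intermittent per-server disconnects described above is to tally the `rpc_clnt_ping_timer_expired` lines per brick endpoint. A small sketch (the line format is taken from the /var/log/messages excerpts quoted earlier in this thread):

```python
import re
from collections import Counter

# Matches the ping-timer lines quoted in this thread, e.g.:
#   ... C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-55:
#   server 192.168.0.55:49158 has not responded in the last 42 seconds, disconnecting.
PING_EXPIRED = re.compile(
    r"rpc_clnt_ping_timer_expired\]\s+(?P<client>\S+):\s+server\s+"
    r"(?P<endpoint>\d+\.\d+\.\d+\.\d+:\d+) has not responded"
)

def count_disconnects(lines):
    """Tally ping-timer expiries per brick endpoint (IP:port)."""
    counts = Counter()
    for line in lines:
        match = PING_EXPIRED.search(line)
        if match:
            counts[match.group("endpoint")] += 1
    return counts

# Typical use: count_disconnects(open("/var/log/messages")).most_common()
```

If one endpoint (or all bricks of one server, such as s06 above) dominates the counts, that points at that host or its network path rather than at the client.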
>>>>>>>>> >>>>>>>>> I guess you still have transport.listen-backlog set to 1024, right >>>>>>>>> ? >>>>>>>>> >>>>>>>>> Just to try to identify if the problem really comes from network, >>>>>>>>> can you check if you lose some pings from the client to all of the servers >>>>>>>>> while you are seeing those errors in the log file ? >>>>>>>>> >>>>>>>>> You can also check if during those errors, you can telnet to the >>>>>>>>> port of the brick from the client. >>>>>>>>> >>>>>>>>> Xavi >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Feb 26, 2019 at 10:17 AM Mauro Tridici < >>>>>>>>> mauro.tridici at cmcc.it> wrote: >>>>>>>>> >>>>>>>>>> Hi Nithya, >>>>>>>>>> >>>>>>>>>> ?df -h? operation is not still slow, but no users are using the >>>>>>>>>> volume, RAM and NETWORK usage is ok on the client node. >>>>>>>>>> >>>>>>>>>> I was worried about this kind of warnings/errors: >>>>>>>>>> >>>>>>>>>> [2019-02-25 10:59:00.664323] W [MSGID: 122035] >>>>>>>>>> [ec-common.c:571:ec_child_select] 0-tier2-disperse-6: Executing operation >>>>>>>>>> with some subvolumes unavailable (20) >>>>>>>>>> >>>>>>>>>> [2019-02-26 03:11:35.212603] E >>>>>>>>>> [rpc-clnt.c:350:saved_frames_unwind] (--> >>>>>>>>>> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> >>>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> >>>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> >>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] >>>>>>>>>> (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) >>>>>>>>>> 0-tier2-client-50: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) >>>>>>>>>> called at 2019-02-26 03:10:56.549903 (xid=0x106f1c5) >>>>>>>>>> >>>>>>>>>> [2019-02-26 03:13:03.313831] E >>>>>>>>>> [socket.c:2376:socket_connect_finish] 0-tier2-client-50: connection to >>>>>>>>>> 192.168.0.56:49156 failed (Timeout della connessione); >>>>>>>>>> disconnecting socket >>>>>>>>>> 
>>>>>>>>>> It seems that some subvolumes are not available and 192.168.0.56 >>>>>>>>>> server (s06) is not reachable. >>>>>>>>>> But gluster servers are up&running and bricks are ok. >>>>>>>>>> >>>>>>>>>> In attachment the updated tier2.log file. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thank you. >>>>>>>>>> Regards, >>>>>>>>>> Mauro >>>>>>>>>> >>>>>>>>>> Il giorno 26 feb 2019, alle ore 04:03, Nithya Balachandran < >>>>>>>>>> nbalacha at redhat.com> ha scritto: >>>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I see a lot of EC messages in the log but they don't seem very >>>>>>>>>> serious. Xavi, can you take a look? >>>>>>>>>> >>>>>>>>>> The only errors I see are: >>>>>>>>>> [2019-02-25 10:58:45.519871] E >>>>>>>>>> [rpc-clnt.c:350:saved_frames_unwind] (--> >>>>>>>>>> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> >>>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> >>>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> >>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] >>>>>>>>>> (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) >>>>>>>>>> 0-tier2-client-50: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) >>>>>>>>>> called at 2019-02-25 10:57:47.429969 (xid=0xd26fe7) >>>>>>>>>> [2019-02-25 10:58:51.461493] E >>>>>>>>>> [rpc-clnt.c:350:saved_frames_unwind] (--> >>>>>>>>>> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> >>>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> >>>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> >>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] >>>>>>>>>> (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) >>>>>>>>>> 0-tier2-client-41: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) >>>>>>>>>> called at 2019-02-25 10:57:47.499174 (xid=0xf47d6a) >>>>>>>>>> 
[2019-02-25 11:07:57.152874] E >>>>>>>>>> [socket.c:2376:socket_connect_finish] 0-tier2-client-70: connection to >>>>>>>>>> 192.168.0.55:49163 failed (Timeout della connessione); >>>>>>>>>> disconnecting socket >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Is the df -h operation still slow? If yes, can you take a tcpdump >>>>>>>>>> of the client while running df -h and send that across? >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Nithya >>>>>>>>>> >>>>>>>>>> On Mon, 25 Feb 2019 at 17:27, Mauro Tridici < >>>>>>>>>> mauro.tridici at cmcc.it> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Sorry, a few minutes after my last mail message, I noticed that >>>>>>>>>>> the 'df -h' command hung for a while before returning the prompt. >>>>>>>>>>> Yesterday, everything was ok in the gluster client log, but, >>>>>>>>>>> today, I see a lot of errors (please take a look at the attached file). >>>>>>>>>>> >>>>>>>>>>> On the client node, I detected significant RAM and NETWORK usage. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Do you think that the errors have been caused by the client's >>>>>>>>>>> resource usage? >>>>>>>>>>> >>>>>>>>>>> Thank you in advance, >>>>>>>>>>> Mauro >>>>>>>>>>> >>>>>>>>>>> >>>>> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From srakonde at redhat.com Tue Mar 12 05:16:07 2019 From: srakonde at redhat.com (Sanju Rakonde) Date: Tue, 12 Mar 2019 10:46:07 +0530 Subject: [Gluster-users] Gluster 4.1 install on AKS (Azure) In-Reply-To: References: Message-ID: Hi Maurya, Can you please share the glusterd.log with us? It will be stored under the /var/log/glusterfs/ directory. Thanks, Sanju On Mon, Mar 11, 2019 at 4:09 PM Maurya M wrote: > Hi All, > I am trying to install gluster 4.1 on 3 nodes on my AKS cluster using > the gluster-kubernetes project.
> > As i followed the pre-setup instructions with opening the firewall, > providing raw block devices, install glusterfs-client / server & start the > gluster on the node: i am getting this error below: > > Mar 11 10:22:54 aks-agentpool-26682136-0 systemd[1]: Starting GlusterFS, a > clustered file-system server... > -- Subject: Unit glusterd.service has begun start-up > -- Defined-By: systemd > -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel > -- > -- Unit glusterd.service has begun starting up. > Mar 11 10:22:54 aks-agentpool-26682136-0 GlusterFS[13694]: > [glusterfsd.c:2150:parse_cmdline] 0-glusterfs: ERROR: parsing the volfile > failed [No such file or directory] Mar 11 10:22:54 aks-agentpool-26682136-0 > glusterd[13694]: USAGE: /usr/sbin/glusterd [options] [mountpoint] > Mar 11 10:22:54 aks-agentpool-26682136-0 systemd[1]: glusterd.service: > Control process exited, code=exited status=255 > Mar 11 10:22:54 aks-agentpool-26682136-0 systemd[1]: Failed to start > GlusterFS, a clustered file-system server. > -- Subject: Unit glusterd.service has failed > -- Defined-By: systemd > -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel > -- > -- Unit glusterd.service has failed. > -- > -- The result is failed. > Mar 11 10:22:54 aks-agentpool-26682136-0 systemd[1]: glusterd.service: > Unit entered failed state. > Mar 11 10:22:54 aks-agentpool-26682136-0 systemd[1]: glusterd.service: > Failed with result 'exit-code'. > > Any ideas how to troubleshoot this, tried remove / reconfigure / delete > the symlink , but yet i am not able to proceed with the install. > > Thanks in advance for your help & support. > > Appreciate it, > Maurya > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Thanks, Sanju -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abhishpaliwal at gmail.com Tue Mar 12 05:28:46 2019 From: abhishpaliwal at gmail.com (ABHISHEK PALIWAL) Date: Tue, 12 Mar 2019 10:58:46 +0530 Subject: [Gluster-users] Glusterfsd crashed with SIGSEGV In-Reply-To: References: Message-ID: Hi Amar, Below are the requested logs pabhishe at arn-build3$ldd ./sysroot-destdir/usr/lib64/libglusterfs.so not a dynamic executable pabhishe at arn-build3$ldd ./sysroot-destdir/usr/lib64/libgfrpc.so not a dynamic executable root at 128:/# gdb /usr/sbin/glusterd core.1099 GNU gdb (GDB) 7.10.1 Copyright (C) 2015 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "powerpc64-wrs-linux". Type "show configuration" for configuration details. For bug reporting instructions, please see: . Find the GDB manual and other documentation resources online at: . For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /usr/sbin/glusterd...(no debugging symbols found)...done. [New LWP 1109] [New LWP 1101] [New LWP 1105] [New LWP 1110] [New LWP 1099] [New LWP 1107] [New LWP 1119] [New LWP 1103] [New LWP 1112] [New LWP 1116] [New LWP 1104] [New LWP 1239] [New LWP 1106] [New LWP 1111] [New LWP 1108] [New LWP 1117] [New LWP 1102] [New LWP 1118] [New LWP 1100] [New LWP 1114] [New LWP 1113] [New LWP 1115] warning: Could not load shared library symbols for linux-vdso64.so.1. Do you need "set solib-search-path" or "set sysroot"? [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Core was generated by `/usr/sbin/glusterfsd -s 128.224.95.140 --volfile-id gv0.128.224.95.140.tmp-bric'. Program terminated with signal SIGSEGV, Segmentation fault. 
#0 0x00003fffb76a6d48 in _int_malloc (av=av at entry=0x3fffa8000020, bytes=bytes at entry=36) at malloc.c:3327 3327 { [Current thread is 1 (Thread 0x3fffb1689160 (LWP 1109))] (gdb) bt full #0 0x00003fffb76a6d48 in _int_malloc (av=av at entry=0x3fffa8000020, bytes=bytes at entry=36) at malloc.c:3327 nb = idx = bin = victim = size = victim_index = remainder = remainder_size = block = bit = map = fwd = bck = errstr = 0x0 __func__ = "_int_malloc" #1 0x00003fffb76a93dc in __GI___libc_malloc (bytes=36) at malloc.c:2921 ar_ptr = 0x3fffa8000020 victim = hook = __func__ = "__libc_malloc" #2 0x00003fffb7764fd0 in x_inline (xdrs=0x3fffb1686d20, len=) at xdr_sizeof.c:89 len = 36 xdrs = 0x3fffb1686d20 #3 0x00003fffb7842488 in .xdr_gfx_iattx () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #4 0x00003fffb7842e84 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #5 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, pp=0x3fffa81099f0, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa8109aa0 "\265\256\373\200\f\206\361j" stat = #6 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, objpp=0x3fffa81099f0, obj_size=, xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #7 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #8 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, pp=0x3fffa8109870, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa8109920 "\232\373\377\315\352\325\005\271" stat = #9 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, objpp=0x3fffa8109870, obj_size=, xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #10 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#11 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, pp=0x3fffa81096f0, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa81097a0 "\241X\372!\216\256=\342" stat = ---Type to continue, or q to quit--- #12 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, objpp=0x3fffa81096f0, obj_size=, xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #13 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #14 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, pp=0x3fffa8109570, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa8109620 "\265\205\003Vu'\002L" stat = #15 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, objpp=0x3fffa8109570, obj_size=, xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #16 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #17 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, pp=0x3fffa81093f0, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa81094a0 "\200L\027F'\177\366D" stat = #18 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, objpp=0x3fffa81093f0, obj_size=, xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #19 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #20 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, pp=0x3fffa8109270, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa8109320 "\217{dK(\001E\220" stat = #21 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, objpp=0x3fffa8109270, obj_size=, xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #22 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#23 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, pp=0x3fffa81090f0, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa81091a0 "\217\275\067\336\232\300(\005" stat = #24 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, objpp=0x3fffa81090f0, obj_size=, xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #25 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #26 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, pp=0x3fffa8108f70, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa8109020 "\260.\025\b\244\352IT" stat = #27 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, objpp=0x3fffa8108f70, obj_size=, xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #28 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #29 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, pp=0x3fffa8108df0, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa8108ea0 "\212GS\203l\035\n\\" ---Type to continue, or q to quit--- Regards, Abhishek On Mon, Mar 11, 2019 at 7:10 PM Amar Tumballi Suryanarayan < atumball at redhat.com> wrote: > Hi Abhishek, > > Can you check and get back to us? > > ``` > bash# ldd /usr/lib64/libglusterfs.so > bash# ldd /usr/lib64/libgfrpc.so > > ``` > > Also considering you have the core, can you do `(gdb) thr apply all bt > full` and pass it on? > > Thanks & Regards, > Amar > > On Mon, Mar 11, 2019 at 3:41 PM ABHISHEK PALIWAL > wrote: > >> Hi Team, >> >> COuld you please provide some pointer to debug it further. >> >> Regards, >> Abhishek >> >> On Fri, Mar 8, 2019 at 2:19 PM ABHISHEK PALIWAL >> wrote: >> >>> Hi Team, >>> >>> I am using Glusterfs 5.4, where after setting the gluster mount point >>> when trying to access it, glusterfsd is getting crashed and mount point >>> through the "Transport endpoint is not connected error. 
>>> >>> Here I are the gdb log for the core file >>> >>> warning: Could not load shared library symbols for linux-vdso64.so.1. >>> Do you need "set solib-search-path" or "set sysroot"? >>> [Thread debugging using libthread_db enabled] >>> Using host libthread_db library "/lib64/libthread_db.so.1". >>> Core was generated by `/usr/sbin/glusterfsd -s 128.224.95.140 >>> --volfile-id gv0.128.224.95.140.tmp-bric'. >>> Program terminated with signal SIGSEGV, Segmentation fault. >>> #0 0x00003fff95ab1d48 in _int_malloc (av=av at entry=0x3fff7c000020, >>> bytes=bytes at entry=36) at malloc.c:3327 >>> 3327 { >>> [Current thread is 1 (Thread 0x3fff90394160 (LWP 811))] >>> (gdb) >>> (gdb) >>> (gdb) bt >>> #0 0x00003fff95ab1d48 in _int_malloc (av=av at entry=0x3fff7c000020, >>> bytes=bytes at entry=36) at malloc.c:3327 >>> #1 0x00003fff95ab43dc in __GI___libc_malloc (bytes=36) at malloc.c:2921 >>> #2 0x00003fff95b6ffd0 in x_inline (xdrs=0x3fff90391d20, len=>> out>) at xdr_sizeof.c:89 >>> #3 0x00003fff95c4d488 in .xdr_gfx_iattx () from /usr/lib64/libgfxdr.so.0 >>> #4 0x00003fff95c4de84 in .xdr_gfx_dirplist () from >>> /usr/lib64/libgfxdr.so.0 >>> #5 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>> pp=0x3fff7c132020, size=, proc=) at >>> xdr_ref.c:84 >>> #6 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>> objpp=0x3fff7c132020, obj_size=, >>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>> xdr_ref.c:135 >>> #7 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>> /usr/lib64/libgfxdr.so.0 >>> #8 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>> pp=0x3fff7c131ea0, size=, proc=) at >>> xdr_ref.c:84 >>> #9 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>> objpp=0x3fff7c131ea0, obj_size=, >>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>> xdr_ref.c:135 >>> #10 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>> /usr/lib64/libgfxdr.so.0 >>> #11 0x00003fff95b6fc28 in 
__GI_xdr_reference (xdrs=0x3fff90391d20, >>> pp=0x3fff7c131d20, size=, proc=) at >>> xdr_ref.c:84 >>> #12 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>> objpp=0x3fff7c131d20, obj_size=, >>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>> xdr_ref.c:135 >>> #13 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>> /usr/lib64/libgfxdr.so.0 >>> #14 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>> pp=0x3fff7c131ba0, size=, proc=) at >>> xdr_ref.c:84 >>> #15 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>> objpp=0x3fff7c131ba0, obj_size=, >>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>> xdr_ref.c:135 >>> #16 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>> /usr/lib64/libgfxdr.so.0 >>> #17 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>> pp=0x3fff7c131a20, size=, proc=) at >>> xdr_ref.c:84 >>> #18 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>> objpp=0x3fff7c131a20, obj_size=, >>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>> xdr_ref.c:135 >>> #19 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>> /usr/lib64/libgfxdr.so.0 >>> #20 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>> pp=0x3fff7c1318a0, size=, proc=) at >>> xdr_ref.c:84 >>> #21 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>> objpp=0x3fff7c1318a0, obj_size=, >>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>> xdr_ref.c:135 >>> #22 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>> /usr/lib64/libgfxdr.so.0 >>> #23 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>> pp=0x3fff7c131720, size=, proc=) at >>> xdr_ref.c:84 >>> #24 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>> objpp=0x3fff7c131720, obj_size=, >>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>> xdr_ref.c:135 >>> #25 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>> /usr/lib64/libgfxdr.so.0 >>> 
#26 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>> pp=0x3fff7c1315a0, size=, proc=) at >>> xdr_ref.c:84 >>> #27 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>> objpp=0x3fff7c1315a0, obj_size=, >>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>> xdr_ref.c:135 >>> #28 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>> /usr/lib64/libgfxdr.so.0 >>> #29 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>> pp=0x3fff7c131420, size=, proc=) at >>> xdr_ref.c:84 >>> #30 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>> objpp=0x3fff7c131420, obj_size=, >>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>> xdr_ref.c:135 >>> #31 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>> /usr/lib64/libgfxdr.so.0 >>> #32 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>> pp=0x3fff7c1312a0, size=, proc=) at >>> xdr_ref.c:84 >>> #33 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>> objpp=0x3fff7c1312a0, obj_size=, >>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>> xdr_ref.c:135 >>> >>> Frames are getting repeated, could any one please me. >>> -- >>> Regards >>> Abhishek Paliwal >>> >> >> >> -- >> >> >> >> >> Regards >> Abhishek Paliwal >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Amar Tumballi (amarts) > -- Regards Abhishek Paliwal -------------- next part -------------- An HTML attachment was scrubbed... URL: From srakonde at redhat.com Tue Mar 12 05:37:09 2019 From: srakonde at redhat.com (Sanju Rakonde) Date: Tue, 12 Mar 2019 11:07:09 +0530 Subject: [Gluster-users] Gluster 4.1 install on AKS (Azure) In-Reply-To: References: Message-ID: On Tue, Mar 12, 2019 at 10:46 AM Sanju Rakonde wrote: > Hi Maurya, > > Can you please share the glusterd.log with us? 
It will be stored under > /var/log/glusterfs/ directory. > > Thanks, > Sanju > > On Mon, Mar 11, 2019 at 4:09 PM Maurya M wrote: > >> Hi All, >> I am trying to install gluster 4.1 on 3 nodes on my AKS cluster using >> gluster-kubernetes project. >> >> As i followed the pre-setup instructions with opening the firewall, >> providing raw block devices, install glusterfs-client / server & start the >> gluster on the node: i am getting this error below: >> >> Mar 11 10:22:54 aks-agentpool-26682136-0 systemd[1]: Starting GlusterFS, >> a clustered file-system server... >> -- Subject: Unit glusterd.service has begun start-up >> -- Defined-By: systemd >> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel >> -- >> -- Unit glusterd.service has begun starting up. >> Mar 11 10:22:54 aks-agentpool-26682136-0 GlusterFS[13694]: >> [glusterfsd.c:2150:parse_cmdline] 0-glusterfs: ERROR: parsing the volfile >> failed [No such file or directory] >> > The error message is saying that, volfile is not present. Can you check whether you have glusterd volfile? Search for the path of glusterd.vol using find command. and check for volfile in the path specified by the find command. You can use below command to find out the path of glusterd.vol find / -name glusterd.vol > Mar 11 10:22:54 aks-agentpool-26682136-0 glusterd[13694]: USAGE: >> /usr/sbin/glusterd [options] [mountpoint] >> Mar 11 10:22:54 aks-agentpool-26682136-0 systemd[1]: glusterd.service: >> Control process exited, code=exited status=255 >> Mar 11 10:22:54 aks-agentpool-26682136-0 systemd[1]: Failed to start >> GlusterFS, a clustered file-system server. >> -- Subject: Unit glusterd.service has failed >> -- Defined-By: systemd >> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel >> -- >> -- Unit glusterd.service has failed. >> -- >> -- The result is failed. >> Mar 11 10:22:54 aks-agentpool-26682136-0 systemd[1]: glusterd.service: >> Unit entered failed state. 
>> Mar 11 10:22:54 aks-agentpool-26682136-0 systemd[1]: glusterd.service: >> Failed with result 'exit-code'. >> >> Any ideas how to troubleshoot this, tried remove / reconfigure / delete >> the symlink , but yet i am not able to proceed with the install. >> >> Thanks in advance for your help & support. >> >> Appreciate it, >> Maurya >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Thanks, > Sanju > -- Thanks, Sanju -------------- next part -------------- An HTML attachment was scrubbed... URL: From kontakt at taste-of-it.de Tue Mar 12 09:23:49 2019 From: kontakt at taste-of-it.de (Taste-Of-IT) Date: Tue, 12 Mar 2019 09:23:49 +0000 Subject: [Gluster-users] Removing Brick in Distributed GlusterFS Message-ID: <5409b2d08e789e3711cbda99900deb85083e6ff3@taste-of-it.de> Hi, i have a 3 Node Distributed Gluster. I have one Volume over all 3 Nodes / Bricks. I want to remove one Brick and run gluster volume remove-brick start. The Job completes and shows 11960 failures and only transfers 5TB out of 15TB Data. I have still files and folders on this volume on the brick to remove. I actually didnt run the final command with "commit". Both other Nodes have each over 6TB of free Space, so it can hold the remaininge Data from Brick3 theoretically. Need help. thx Taste From spalai at redhat.com Tue Mar 12 09:49:13 2019 From: spalai at redhat.com (Susant Palai) Date: Tue, 12 Mar 2019 15:19:13 +0530 Subject: [Gluster-users] Removing Brick in Distributed GlusterFS In-Reply-To: <5409b2d08e789e3711cbda99900deb85083e6ff3@taste-of-it.de> References: <5409b2d08e789e3711cbda99900deb85083e6ff3@taste-of-it.de> Message-ID: Would it be possible for you to pass the rebalance log file on the node from which you want to remove the brick? 
(location : /var/log/glusterfs/) + the following information: 1 - gluster volume info 2 - gluster volume status 2 - df -h output on all 3 nodes Susant On Tue, Mar 12, 2019 at 3:08 PM Taste-Of-IT wrote: > Hi, > i have a 3 Node Distributed Gluster. I have one Volume over all 3 Nodes / > Bricks. I want to remove one Brick and run gluster volume remove-brick > start. The Job completes and shows 11960 failures and > only transfers 5TB out of 15TB Data. I have still files and folders on this > volume on the brick to remove. I actually didnt run the final command with > "commit". Both other Nodes have each over 6TB of free Space, so it can hold > the remaininge Data from Brick3 theoretically. > > Need help. > thx > Taste > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kontakt at taste-of-it.de Tue Mar 12 11:45:51 2019 From: kontakt at taste-of-it.de (Taste-Of-IT) Date: Tue, 12 Mar 2019 11:45:51 +0000 Subject: [Gluster-users] Removing Brick in Distributed GlusterFS In-Reply-To: References: <5409b2d08e789e3711cbda99900deb85083e6ff3@taste-of-it.de> Message-ID: Hi Susant, and thanks for your fast reply and pointing me to that log. So i was able to find the problem: "dht-rebalance.c:1052:__dht_check_free_space] 0-vol4-dht: Could not find any subvol with space accomodating the file" But Volume Detail and df -h show xTB of free Disk Space and also Free Inodes. Options Reconfigured: performance.client-io-threads: on storage.reserve: 0 performance.parallel-readdir: off performance.readdir-ahead: off auth.allow: 192.168.0.* nfs.disable: off transport.address-family: inet Ok since there is enough disk space on other Bricks and i actually didnt complete brick-remove, can i rerun brick-remove to rebalance last Files and Folders? 
Thanks Taste Am 12.03.2019 10:49:13, schrieb Susant Palai: > Would it be possible for you to pass the rebalance log file on the node from which you want to remove the brick? (location : /var/log/glusterfs/) > > + the following information: > ?1 - gluster volume info? > > ?2 - gluster volume status > > ?2 - df -h output on all 3 nodes > > Susant > > On Tue, Mar 12, 2019 at 3:08 PM Taste-Of-IT <> kontakt at taste-of-it.de> > wrote: > > Hi, > > i have a 3 Node Distributed Gluster. I have one Volume over all 3 Nodes / Bricks.? I want to remove one Brick and run gluster volume remove-brick start. The Job completes and shows 11960 failures and only transfers 5TB out of 15TB Data. I have still files and folders on this volume on the brick to remove. I actually didnt run the final command? with "commit". Both other Nodes have each over 6TB of free Space, so it can hold the remaininge Data from Brick3 theoretically. > > > > Need help. > > thx > > Taste > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kontakt at taste-of-it.de Tue Mar 12 15:18:19 2019 From: kontakt at taste-of-it.de (Taste-Of-IT) Date: Tue, 12 Mar 2019 15:18:19 +0000 Subject: [Gluster-users] Removing Brick in Distributed GlusterFS In-Reply-To: References: <5409b2d08e789e3711cbda99900deb85083e6ff3@taste-of-it.de> Message-ID: <2101a0f0763cc87ac55a91bff11cf350527ce48b@taste-of-it.de> Hi, i found a Bug about this in Version 3.10. I run 3.13.2 - for your Information. As far as i can see, the default of 1% rule is active and not configure 0 = for disable storage.reserve. So what can i do? Finish remove brick? Upgrade to newer Version and rerun rebalance? thx Taste Am 12.03.2019 12:45:51, schrieb Taste-Of-IT: > Hi Susant, > > and thanks for your fast reply and pointing me to that log. 
So i was able to find the problem: "dht-rebalance.c:1052:__dht_check_free_space] 0-vol4-dht: Could not find any subvol with space accomodating the file" > > But Volume Detail and df -h show xTB of free Disk Space and also Free Inodes. > Options Reconfigured: > performance.client-io-threads: on > storage.reserve: 0 > performance.parallel-readdir: off > performance.readdir-ahead: off > auth.allow: 192.168.0.* > nfs.disable: off > transport.address-family: inet > Ok since there is enough disk space on other Bricks and i actually didnt complete brick-remove, can i rerun brick-remove to rebalance last Files and Folders? > > Thanks > > Taste > Am 12.03.2019 10:49:13, schrieb Susant Palai: > > Would it be possible for you to pass the rebalance log file on the node from which you want to remove the brick? (location : /var/log/glusterfs/) > > > > + the following information: > > ?1 - gluster volume info? > > > > ?2 - gluster volume status > > > > ?2 - df -h output on all 3 nodes > > > > Susant > > > > On Tue, Mar 12, 2019 at 3:08 PM Taste-Of-IT <> > kontakt at taste-of-it.de> > > wrote: > > > Hi, > > > i have a 3 Node Distributed Gluster. I have one Volume over all 3 Nodes / Bricks.? I want to remove one Brick and run gluster volume remove-brick start. The Job completes and shows 11960 failures and only transfers 5TB out of 15TB Data. I have still files and folders on this volume on the brick to remove. I actually didnt run the final command? with "commit". Both other Nodes have each over 6TB of free Space, so it can hold the remaininge Data from Brick3 theoretically. > > > > > > Need help. 
> > > thx > > > Taste > > > _______________________________________________ > > > Gluster-users mailing list > > > Gluster-users at gluster.org > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From archon810 at gmail.com Tue Mar 12 17:28:58 2019 From: archon810 at gmail.com (Artem Russakovskii) Date: Tue, 12 Mar 2019 10:28:58 -0700 Subject: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP In-Reply-To: References: Message-ID: Hi Amar, Any updates on this? I'm still not seeing it in OpenSUSE build repos. Maybe later today? Thanks. Sincerely, Artem -- Founder, Android Police , APK Mirror , Illogical Robot LLC beerpla.net | +ArtemRussakovskii | @ArtemR On Wed, Mar 6, 2019 at 10:30 PM Amar Tumballi Suryanarayan < atumball at redhat.com> wrote: > We are talking days. Not weeks. Considering already it is Thursday here. 1 > more day for tagging, and packaging. May be ok to expect it on Monday. > > -Amar > > On Thu, Mar 7, 2019 at 11:54 AM Artem Russakovskii > wrote: > >> Is the next release going to be an imminent hotfix, i.e. something like >> today/tomorrow, or are we talking weeks? >> >> Sincerely, >> Artem >> >> -- >> Founder, Android Police , APK Mirror >> , Illogical Robot LLC >> beerpla.net | +ArtemRussakovskii >> | @ArtemR >> >> >> >> On Tue, Mar 5, 2019 at 11:09 AM Artem Russakovskii >> wrote: >> >>> Ended up downgrading to 5.3 just in case. Peer status and volume status >>> are OK now. >>> >>> zypper install --oldpackage glusterfs-5.3-lp150.100.1 >>> Loading repository data... >>> Reading installed packages... >>> Resolving package dependencies... 
>>> >>> Problem: glusterfs-5.3-lp150.100.1.x86_64 requires libgfapi0 = 5.3, but >>> this requirement cannot be provided >>> not installable providers: libgfapi0-5.3-lp150.100.1.x86_64[glusterfs] >>> Solution 1: Following actions will be done: >>> downgrade of libgfapi0-5.4-lp150.100.1.x86_64 to >>> libgfapi0-5.3-lp150.100.1.x86_64 >>> downgrade of libgfchangelog0-5.4-lp150.100.1.x86_64 to >>> libgfchangelog0-5.3-lp150.100.1.x86_64 >>> downgrade of libgfrpc0-5.4-lp150.100.1.x86_64 to >>> libgfrpc0-5.3-lp150.100.1.x86_64 >>> downgrade of libgfxdr0-5.4-lp150.100.1.x86_64 to >>> libgfxdr0-5.3-lp150.100.1.x86_64 >>> downgrade of libglusterfs0-5.4-lp150.100.1.x86_64 to >>> libglusterfs0-5.3-lp150.100.1.x86_64 >>> Solution 2: do not install glusterfs-5.3-lp150.100.1.x86_64 >>> Solution 3: break glusterfs-5.3-lp150.100.1.x86_64 by ignoring some of >>> its dependencies >>> >>> Choose from above solutions by number or cancel [1/2/3/c] (c): 1 >>> Resolving dependencies... >>> Resolving package dependencies... >>> >>> The following 6 packages are going to be downgraded: >>> glusterfs libgfapi0 libgfchangelog0 libgfrpc0 libgfxdr0 libglusterfs0 >>> >>> 6 packages to downgrade. >>> >>> Sincerely, >>> Artem >>> >>> -- >>> Founder, Android Police , APK Mirror >>> , Illogical Robot LLC >>> beerpla.net | +ArtemRussakovskii >>> | @ArtemR >>> >>> >>> >>> On Tue, Mar 5, 2019 at 10:57 AM Artem Russakovskii >>> wrote: >>> >>>> Noticed the same when upgrading from 5.3 to 5.4, as mentioned. >>>> >>>> I'm confused though. Is actual replication affected, because the 5.4 >>>> server and the 3x 5.3 servers still show heal info as all 4 connected, and >>>> the files seem to be replicating correctly as well. >>>> >>>> So what's actually affected - just the status command, or leaving 5.4 >>>> on one of the nodes is doing some damage to the underlying fs? Is it >>>> fixable by tweaking transport.socket.ssl-enabled? Does upgrading all >>>> servers to 5.4 resolve it, or should we revert back to 5.3? 
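Since the breakage discussed here only appears while the cluster is mixed 5.3/5.4, one quick sanity check is to compare the version string from every node. On a live node each input line would come from something like gluster --version | head -1; the sample strings below are illustrative:

```shell
# same_version takes one version string per node and reports whether the
# cluster is uniform or mixed.
same_version() {
  printf '%s\n' "$@" | sort -u | {
    read -r first
    if read -r _; then echo "mixed"; else echo "uniform: $first"; fi
  }
}

same_version "glusterfs 5.3" "glusterfs 5.3" "glusterfs 5.4"
same_version "glusterfs 5.3" "glusterfs 5.3" "glusterfs 5.3"
```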
>>>> >>>> Sincerely, >>>> Artem >>>> >>>> -- >>>> Founder, Android Police , APK Mirror >>>> , Illogical Robot LLC >>>> beerpla.net | +ArtemRussakovskii >>>> | @ArtemR >>>> >>>> >>>> >>>> On Tue, Mar 5, 2019 at 2:02 AM Hu Bert wrote: >>>> >>>>> fyi: did a downgrade 5.4 -> 5.3 and it worked. all replicas are up and >>>>> running. Awaiting updated v5.4. >>>>> >>>>> thx :-) >>>>> >>>>> Am Di., 5. M?rz 2019 um 09:26 Uhr schrieb Hari Gowtham < >>>>> hgowtham at redhat.com>: >>>>> > >>>>> > There are plans to revert the patch causing this error and rebuilt >>>>> 5.4. >>>>> > This should happen faster. the rebuilt 5.4 should be void of this >>>>> upgrade issue. >>>>> > >>>>> > In the meantime, you can use 5.3 for this cluster. >>>>> > Downgrading to 5.3 will work if it was just one node that was >>>>> upgrade to 5.4 >>>>> > and the other nodes are still in 5.3. >>>>> > >>>>> > On Tue, Mar 5, 2019 at 1:07 PM Hu Bert >>>>> wrote: >>>>> > > >>>>> > > Hi Hari, >>>>> > > >>>>> > > thx for the hint. Do you know when this will be fixed? Is a >>>>> downgrade >>>>> > > 5.4 -> 5.3 a possibility to fix this? >>>>> > > >>>>> > > Hubert >>>>> > > >>>>> > > Am Di., 5. M?rz 2019 um 08:32 Uhr schrieb Hari Gowtham < >>>>> hgowtham at redhat.com>: >>>>> > > > >>>>> > > > Hi, >>>>> > > > >>>>> > > > This is a known issue we are working on. >>>>> > > > As the checksum differs between the updated and non updated >>>>> node, the >>>>> > > > peers are getting rejected. >>>>> > > > The bricks aren't coming because of the same issue. 
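For the "Peer Rejected (Connected)" state itself, the commonly documented recovery is to clear the state directory on the rejected node, keeping only glusterd.info (which holds the node UUID), and re-probe. Sketched below against a scratch directory rather than the live /var/lib/glusterd — back that up before touching it:

```shell
# clean_state_dir removes everything in a glusterd state directory except
# glusterd.info.
clean_state_dir() {
  find "$1" -mindepth 1 -maxdepth 1 ! -name glusterd.info -exec rm -rf {} +
}

# Demo on a throwaway directory laid out like /var/lib/glusterd:
demo=$(mktemp -d)
touch "$demo/glusterd.info"
mkdir -p "$demo/peers" "$demo/vols"
clean_state_dir "$demo"
ls "$demo"          # only glusterd.info remains
rm -rf "$demo"

# Live sequence (rejected node only, after a backup):
#   systemctl stop glusterd
#   clean_state_dir /var/lib/glusterd
#   systemctl start glusterd
#   gluster peer probe <good-node>
#   systemctl restart glusterd
```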
>>>>> > > > >>>>> > > > More about the issue: >>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1685120 >>>>> > > > >>>>> > > > On Tue, Mar 5, 2019 at 12:56 PM Hu Bert >>>>> wrote: >>>>> > > > > >>>>> > > > > Interestingly: gluster volume status misses gluster1, while >>>>> heal >>>>> > > > > statistics show gluster1: >>>>> > > > > >>>>> > > > > gluster volume status workdata >>>>> > > > > Status of volume: workdata >>>>> > > > > Gluster process TCP Port RDMA >>>>> Port Online Pid >>>>> > > > > >>>>> ------------------------------------------------------------------------------ >>>>> > > > > Brick gluster2:/gluster/md4/workdata 49153 0 >>>>> Y 1723 >>>>> > > > > Brick gluster3:/gluster/md4/workdata 49153 0 >>>>> Y 2068 >>>>> > > > > Self-heal Daemon on localhost N/A N/A >>>>> Y 1732 >>>>> > > > > Self-heal Daemon on gluster3 N/A N/A >>>>> Y 2077 >>>>> > > > > >>>>> > > > > vs. >>>>> > > > > >>>>> > > > > gluster volume heal workdata statistics heal-count >>>>> > > > > Gathering count of entries to be healed on volume workdata has >>>>> been successful >>>>> > > > > >>>>> > > > > Brick gluster1:/gluster/md4/workdata >>>>> > > > > Number of entries: 0 >>>>> > > > > >>>>> > > > > Brick gluster2:/gluster/md4/workdata >>>>> > > > > Number of entries: 10745 >>>>> > > > > >>>>> > > > > Brick gluster3:/gluster/md4/workdata >>>>> > > > > Number of entries: 10744 >>>>> > > > > >>>>> > > > > Am Di., 5. M?rz 2019 um 08:18 Uhr schrieb Hu Bert < >>>>> revirii at googlemail.com>: >>>>> > > > > > >>>>> > > > > > Hi Miling, >>>>> > > > > > >>>>> > > > > > well, there are such entries, but those haven't been a >>>>> problem during >>>>> > > > > > install and the last kernel update+reboot. 
The entries look >>>>> like: >>>>> > > > > > >>>>> > > > > > PUBLIC_IP gluster2.alpserver.de gluster2 >>>>> > > > > > >>>>> > > > > > 192.168.0.50 gluster1 >>>>> > > > > > 192.168.0.51 gluster2 >>>>> > > > > > 192.168.0.52 gluster3 >>>>> > > > > > >>>>> > > > > > 'ping gluster2' resolves to LAN IP; I removed the last entry >>>>> in the >>>>> > > > > > 1st line, did a reboot ... no, didn't help. From >>>>> > > > > > /var/log/glusterfs/glusterd.log >>>>> > > > > > on gluster 2: >>>>> > > > > > >>>>> > > > > > [2019-03-05 07:04:36.188128] E [MSGID: 106010] >>>>> > > > > > [glusterd-utils.c:3483:glusterd_compare_friend_volume] >>>>> 0-management: >>>>> > > > > > Version of Cksums persistent differ. local cksum = >>>>> 3950307018, remote >>>>> > > > > > cksum = 455409345 on peer gluster1 >>>>> > > > > > [2019-03-05 07:04:36.188314] I [MSGID: 106493] >>>>> > > > > > [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] >>>>> 0-glusterd: >>>>> > > > > > Responded to gluster1 (0), ret: 0, op_ret: -1 >>>>> > > > > > >>>>> > > > > > Interestingly there are no entries in the brick logs of the >>>>> rejected >>>>> > > > > > server. Well, not surprising as no brick process is running. >>>>> The >>>>> > > > > > server gluster1 is still in rejected state. >>>>> > > > > > >>>>> > > > > > 'gluster volume start workdata force' starts the brick >>>>> process on >>>>> > > > > > gluster1, and some heals are happening on gluster2+3, but >>>>> via 'gluster >>>>> > > > > > volume status workdata' the volumes still aren't complete. 
>>>>> > > > > > >>>>> > > > > > gluster1: >>>>> > > > > > >>>>> ------------------------------------------------------------------------------ >>>>> > > > > > Brick gluster1:/gluster/md4/workdata 49152 0 >>>>> Y 2523 >>>>> > > > > > Self-heal Daemon on localhost N/A N/A >>>>> Y 2549 >>>>> > > > > > >>>>> > > > > > gluster2: >>>>> > > > > > Gluster process TCP Port RDMA >>>>> Port Online Pid >>>>> > > > > > >>>>> ------------------------------------------------------------------------------ >>>>> > > > > > Brick gluster2:/gluster/md4/workdata 49153 0 >>>>> Y 1723 >>>>> > > > > > Brick gluster3:/gluster/md4/workdata 49153 0 >>>>> Y 2068 >>>>> > > > > > Self-heal Daemon on localhost N/A N/A >>>>> Y 1732 >>>>> > > > > > Self-heal Daemon on gluster3 N/A N/A >>>>> Y 2077 >>>>> > > > > > >>>>> > > > > > >>>>> > > > > > Hubert >>>>> > > > > > >>>>> > > > > > Am Di., 5. M?rz 2019 um 07:58 Uhr schrieb Milind Changire < >>>>> mchangir at redhat.com>: >>>>> > > > > > > >>>>> > > > > > > There are probably DNS entries or /etc/hosts entries with >>>>> the public IP Addresses that the host names (gluster1, gluster2, gluster3) >>>>> are getting resolved to. >>>>> > > > > > > /etc/resolv.conf would tell which is the default domain >>>>> searched for the node names and the DNS servers which respond to the >>>>> queries. >>>>> > > > > > > >>>>> > > > > > > >>>>> > > > > > > On Tue, Mar 5, 2019 at 12:14 PM Hu Bert < >>>>> revirii at googlemail.com> wrote: >>>>> > > > > > >> >>>>> > > > > > >> Good morning, >>>>> > > > > > >> >>>>> > > > > > >> i have a replicate 3 setup with 2 volumes, running on >>>>> version 5.3 on >>>>> > > > > > >> debian stretch. 
This morning i upgraded one server to >>>>> version 5.4 and >>>>> > > > > > >> rebooted the machine; after the restart i noticed that: >>>>> > > > > > >> >>>>> > > > > > >> - no brick process is running >>>>> > > > > > >> - gluster volume status only shows the server itself: >>>>> > > > > > >> gluster volume status workdata >>>>> > > > > > >> Status of volume: workdata >>>>> > > > > > >> Gluster process TCP Port >>>>> RDMA Port Online Pid >>>>> > > > > > >> >>>>> ------------------------------------------------------------------------------ >>>>> > > > > > >> Brick gluster1:/gluster/md4/workdata N/A >>>>> N/A N N/A >>>>> > > > > > >> NFS Server on localhost N/A >>>>> N/A N N/A >>>>> > > > > > >> >>>>> > > > > > >> - gluster peer status on the server >>>>> > > > > > >> gluster peer status >>>>> > > > > > >> Number of Peers: 2 >>>>> > > > > > >> >>>>> > > > > > >> Hostname: gluster3 >>>>> > > > > > >> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a >>>>> > > > > > >> State: Peer Rejected (Connected) >>>>> > > > > > >> >>>>> > > > > > >> Hostname: gluster2 >>>>> > > > > > >> Uuid: 162fea82-406a-4f51-81a3-e90235d8da27 >>>>> > > > > > >> State: Peer Rejected (Connected) >>>>> > > > > > >> >>>>> > > > > > >> - gluster peer status on the other 2 servers: >>>>> > > > > > >> gluster peer status >>>>> > > > > > >> Number of Peers: 2 >>>>> > > > > > >> >>>>> > > > > > >> Hostname: gluster1 >>>>> > > > > > >> Uuid: 9a360776-7b58-49ae-831e-a0ce4e4afbef >>>>> > > > > > >> State: Peer Rejected (Connected) >>>>> > > > > > >> >>>>> > > > > > >> Hostname: gluster3 >>>>> > > > > > >> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a >>>>> > > > > > >> State: Peer in Cluster (Connected) >>>>> > > > > > >> >>>>> > > > > > >> I noticed that, in the brick logs, i see that the public >>>>> IP is used >>>>> > > > > > >> instead of the LAN IP. 
brick logs from one of the volumes: >>>>> > > > > > >> >>>>> > > > > > >> rejected node: https://pastebin.com/qkpj10Sd >>>>> > > > > > >> connected nodes: https://pastebin.com/8SxVVYFV >>>>> > > > > > >> >>>>> > > > > > >> Why is the public IP suddenly used instead of the LAN IP? >>>>> Killing all >>>>> > > > > > >> gluster processes and rebooting (again) didn't help. >>>>> > > > > > >> >>>>> > > > > > >> >>>>> > > > > > >> Thx, >>>>> > > > > > >> Hubert >>>>> > > > > > >> _______________________________________________ >>>>> > > > > > >> Gluster-users mailing list >>>>> > > > > > >> Gluster-users at gluster.org >>>>> > > > > > >> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>> > > > > > > >>>>> > > > > > > >>>>> > > > > > > >>>>> > > > > > > -- >>>>> > > > > > > Milind >>>>> > > > > > > >>>>> > > > > _______________________________________________ >>>>> > > > > Gluster-users mailing list >>>>> > > > > Gluster-users at gluster.org >>>>> > > > > https://lists.gluster.org/mailman/listinfo/gluster-users >>>>> > > > >>>>> > > > >>>>> > > > >>>>> > > > -- >>>>> > > > Regards, >>>>> > > > Hari Gowtham. >>>>> > >>>>> > >>>>> > >>>>> > -- >>>>> > Regards, >>>>> > Hari Gowtham. >>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> Gluster-users at gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Amar Tumballi (amarts) > -------------- next part -------------- An HTML attachment was scrubbed... 
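[Editorial note] The "Peer Rejected (Connected)" state together with the "Version of Cksums ... differ" line in glusterd.log above usually means the rejected node's volume configuration has drifted out of sync with the rest of the cluster. Gluster's documentation describes a recovery procedure for exactly this state; a hedged sketch, run on the rejected node only (host name follows the thread — back up /var/lib/glusterd before trying this):

```shell
# On the rejected node: keep the node's UUID (glusterd.info), wipe the
# stale volume configuration, and let a healthy peer re-sync it.
systemctl stop glusterd
cd /var/lib/glusterd
find . -mindepth 1 -maxdepth 1 ! -name glusterd.info -exec rm -rf {} +
systemctl start glusterd
gluster peer probe gluster2      # any peer that is still "in Cluster"
systemctl restart glusterd       # pulls the volume definitions back in
gluster peer status              # should now report "Peer in Cluster"
```

The restart/status step may need to be repeated once before the state settles.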
URL: From srangana at redhat.com Wed Mar 13 02:25:08 2019 From: srangana at redhat.com (Shyam Ranganathan) Date: Tue, 12 Mar 2019 22:25:08 -0400 Subject: [Gluster-users] [Gluster-Maintainers] Release 6: Release date update In-Reply-To: <8c0c5f02-3d31-526a-9c6e-e8221e23cccd@redhat.com> References: <8c0c5f02-3d31-526a-9c6e-e8221e23cccd@redhat.com> Message-ID: On 3/5/19 1:17 PM, Shyam Ranganathan wrote: > Hi, > > Release-6 was to be an early March release, and due to finding bugs > while performing upgrade testing, is now expected in the week of 18th > March, 2019. > > RC1 builds are expected this week, to contain the required fixes, next > week would be testing our RC1 for release fitness before the release. RC1 is tagged, and will mostly be packaged for testing by tomorrow. Expect package details in a day or two, to aid with testing the release. > > As always, request that users test the RC builds and report back issues > they encounter, to help make the release a better quality. > > Shyam > _______________________________________________ > maintainers mailing list > maintainers at gluster.org > https://lists.gluster.org/mailman/listinfo/maintainers > From sankarshan.mukhopadhyay at gmail.com Wed Mar 13 02:37:35 2019 From: sankarshan.mukhopadhyay at gmail.com (Sankarshan Mukhopadhyay) Date: Wed, 13 Mar 2019 08:07:35 +0530 Subject: [Gluster-users] [Gluster-Maintainers] Release 6: Release date update In-Reply-To: References: <8c0c5f02-3d31-526a-9c6e-e8221e23cccd@redhat.com> Message-ID: On Wed, Mar 13, 2019 at 7:55 AM Shyam Ranganathan wrote: > > On 3/5/19 1:17 PM, Shyam Ranganathan wrote: > > Hi, > > > > Release-6 was to be an early March release, and due to finding bugs > > while performing upgrade testing, is now expected in the week of 18th > > March, 2019. > > > > RC1 builds are expected this week, to contain the required fixes, next > > week would be testing our RC1 for release fitness before the release. 
> > RC1 is tagged, and will mostly be packaged for testing by tomorrow. > > Expect package details in a day or two, to aid with testing the release. Would be worth posting it out via Twitter as well. Do we plan to provide any specific guidance on testing particular areas/flows? For example, upgrade tests with some of the combinations - I recollect Amar had published a spreadsheet of items - should we continue with those? From abhishpaliwal at gmail.com Wed Mar 13 04:00:08 2019 From: abhishpaliwal at gmail.com (ABHISHEK PALIWAL) Date: Wed, 13 Mar 2019 09:30:08 +0530 Subject: [Gluster-users] Glusterfsd crashed with SIGSEGV In-Reply-To: References: Message-ID: Hi Amar, did you get time to check the logs? Regards, Abhishek On Tue, Mar 12, 2019 at 10:58 AM ABHISHEK PALIWAL wrote: > Hi Amar, > > Below are the requested logs > > pabhishe at arn-build3$ldd ./sysroot-destdir/usr/lib64/libglusterfs.so > not a dynamic executable > > pabhishe at arn-build3$ldd ./sysroot-destdir/usr/lib64/libgfrpc.so > not a dynamic executable > > root at 128:/# gdb /usr/sbin/glusterd core.1099 > GNU gdb (GDB) 7.10.1 > Copyright (C) 2015 Free Software Foundation, Inc. > License GPLv3+: GNU GPL version 3 or later < > http://gnu.org/licenses/gpl.html> > This is free software: you are free to change and redistribute it. > There is NO WARRANTY, to the extent permitted by law. Type "show copying" > and "show warranty" for details. > This GDB was configured as "powerpc64-wrs-linux". > Type "show configuration" for configuration details. > For bug reporting instructions, please see: > . > Find the GDB manual and other documentation resources online at: > . > For help, type "help". > Type "apropos word" to search for commands related to "word"... > Reading symbols from /usr/sbin/glusterd...(no debugging symbols > found)...done. 
> [New LWP 1109] > [New LWP 1101] > [New LWP 1105] > [New LWP 1110] > [New LWP 1099] > [New LWP 1107] > [New LWP 1119] > [New LWP 1103] > [New LWP 1112] > [New LWP 1116] > [New LWP 1104] > [New LWP 1239] > [New LWP 1106] > [New LWP 1111] > [New LWP 1108] > [New LWP 1117] > [New LWP 1102] > [New LWP 1118] > [New LWP 1100] > [New LWP 1114] > [New LWP 1113] > [New LWP 1115] > > warning: Could not load shared library symbols for linux-vdso64.so.1. > Do you need "set solib-search-path" or "set sysroot"? > [Thread debugging using libthread_db enabled] > Using host libthread_db library "/lib64/libthread_db.so.1". > Core was generated by `/usr/sbin/glusterfsd -s 128.224.95.140 --volfile-id > gv0.128.224.95.140.tmp-bric'. > Program terminated with signal SIGSEGV, Segmentation fault. > #0 0x00003fffb76a6d48 in _int_malloc (av=av at entry=0x3fffa8000020, > bytes=bytes at entry=36) at malloc.c:3327 > 3327 { > [Current thread is 1 (Thread 0x3fffb1689160 (LWP 1109))] > (gdb) bt full > #0 0x00003fffb76a6d48 in _int_malloc (av=av at entry=0x3fffa8000020, > bytes=bytes at entry=36) at malloc.c:3327 > nb = > idx = > bin = > victim = > size = > victim_index = > remainder = > remainder_size = > block = > bit = > map = > fwd = > bck = > errstr = 0x0 > __func__ = "_int_malloc" > #1 0x00003fffb76a93dc in __GI___libc_malloc (bytes=36) at malloc.c:2921 > ar_ptr = 0x3fffa8000020 > victim = > hook = > __func__ = "__libc_malloc" > #2 0x00003fffb7764fd0 in x_inline (xdrs=0x3fffb1686d20, len= out>) at xdr_sizeof.c:89 > len = 36 > xdrs = 0x3fffb1686d20 > #3 0x00003fffb7842488 in .xdr_gfx_iattx () from /usr/lib64/libgfxdr.so.0 > No symbol table info available. > #4 0x00003fffb7842e84 in .xdr_gfx_dirplist () from > /usr/lib64/libgfxdr.so.0 > No symbol table info available. 
> #5 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, > pp=0x3fffa81099f0, size=, proc=) at > xdr_ref.c:84 > loc = 0x3fffa8109aa0 "\265\256\373\200\f\206\361j" > stat = > #6 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, > objpp=0x3fffa81099f0, obj_size=, > xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at > xdr_ref.c:135 > more_data = 1 > #7 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from > /usr/lib64/libgfxdr.so.0 > No symbol table info available. > #8 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, > pp=0x3fffa8109870, size=, proc=) at > xdr_ref.c:84 > loc = 0x3fffa8109920 "\232\373\377\315\352\325\005\271" > stat = > #9 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, > objpp=0x3fffa8109870, obj_size=, > xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at > xdr_ref.c:135 > more_data = 1 > #10 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from > /usr/lib64/libgfxdr.so.0 > No symbol table info available. > #11 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, > pp=0x3fffa81096f0, size=, proc=) at > xdr_ref.c:84 > loc = 0x3fffa81097a0 "\241X\372!\216\256=\342" > stat = > ---Type to continue, or q to quit--- > #12 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, > objpp=0x3fffa81096f0, obj_size=, > xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at > xdr_ref.c:135 > more_data = 1 > #13 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from > /usr/lib64/libgfxdr.so.0 > No symbol table info available. 
> #14 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, > pp=0x3fffa8109570, size=, proc=) at > xdr_ref.c:84 > loc = 0x3fffa8109620 "\265\205\003Vu'\002L" > stat = > #15 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, > objpp=0x3fffa8109570, obj_size=, > xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at > xdr_ref.c:135 > more_data = 1 > #16 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from > /usr/lib64/libgfxdr.so.0 > No symbol table info available. > #17 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, > pp=0x3fffa81093f0, size=, proc=) at > xdr_ref.c:84 > loc = 0x3fffa81094a0 "\200L\027F'\177\366D" > stat = > #18 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, > objpp=0x3fffa81093f0, obj_size=, > xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at > xdr_ref.c:135 > more_data = 1 > #19 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from > /usr/lib64/libgfxdr.so.0 > No symbol table info available. > #20 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, > pp=0x3fffa8109270, size=, proc=) at > xdr_ref.c:84 > loc = 0x3fffa8109320 "\217{dK(\001E\220" > stat = > #21 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, > objpp=0x3fffa8109270, obj_size=, > xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at > xdr_ref.c:135 > more_data = 1 > #22 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from > /usr/lib64/libgfxdr.so.0 > No symbol table info available. > #23 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, > pp=0x3fffa81090f0, size=, proc=) at > xdr_ref.c:84 > loc = 0x3fffa81091a0 "\217\275\067\336\232\300(\005" > stat = > #24 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, > objpp=0x3fffa81090f0, obj_size=, > xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at > xdr_ref.c:135 > more_data = 1 > #25 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from > /usr/lib64/libgfxdr.so.0 > No symbol table info available. 
> #26 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, > pp=0x3fffa8108f70, size=, proc=) at > xdr_ref.c:84 > loc = 0x3fffa8109020 "\260.\025\b\244\352IT" > stat = > #27 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, > objpp=0x3fffa8108f70, obj_size=, > xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at > xdr_ref.c:135 > more_data = 1 > #28 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from > /usr/lib64/libgfxdr.so.0 > No symbol table info available. > #29 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, > pp=0x3fffa8108df0, size=, proc=) at > xdr_ref.c:84 > loc = 0x3fffa8108ea0 "\212GS\203l\035\n\\" > ---Type to continue, or q to quit--- > > > Regards, > Abhishek > > On Mon, Mar 11, 2019 at 7:10 PM Amar Tumballi Suryanarayan < > atumball at redhat.com> wrote: > >> Hi Abhishek, >> >> Can you check and get back to us? >> >> ``` >> bash# ldd /usr/lib64/libglusterfs.so >> bash# ldd /usr/lib64/libgfrpc.so >> >> ``` >> >> Also considering you have the core, can you do `(gdb) thr apply all bt >> full` and pass it on? >> >> Thanks & Regards, >> Amar >> >> On Mon, Mar 11, 2019 at 3:41 PM ABHISHEK PALIWAL >> wrote: >> >>> Hi Team, >>> >>> COuld you please provide some pointer to debug it further. >>> >>> Regards, >>> Abhishek >>> >>> On Fri, Mar 8, 2019 at 2:19 PM ABHISHEK PALIWAL >>> wrote: >>> >>>> Hi Team, >>>> >>>> I am using Glusterfs 5.4, where after setting the gluster mount point >>>> when trying to access it, glusterfsd is getting crashed and mount point >>>> through the "Transport endpoint is not connected error. >>>> >>>> Here I are the gdb log for the core file >>>> >>>> warning: Could not load shared library symbols for linux-vdso64.so.1. >>>> Do you need "set solib-search-path" or "set sysroot"? >>>> [Thread debugging using libthread_db enabled] >>>> Using host libthread_db library "/lib64/libthread_db.so.1". 
>>>> Core was generated by `/usr/sbin/glusterfsd -s 128.224.95.140 >>>> --volfile-id gv0.128.224.95.140.tmp-bric'. >>>> Program terminated with signal SIGSEGV, Segmentation fault. >>>> #0 0x00003fff95ab1d48 in _int_malloc (av=av at entry=0x3fff7c000020, >>>> bytes=bytes at entry=36) at malloc.c:3327 >>>> 3327 { >>>> [Current thread is 1 (Thread 0x3fff90394160 (LWP 811))] >>>> (gdb) >>>> (gdb) >>>> (gdb) bt >>>> #0 0x00003fff95ab1d48 in _int_malloc (av=av at entry=0x3fff7c000020, >>>> bytes=bytes at entry=36) at malloc.c:3327 >>>> #1 0x00003fff95ab43dc in __GI___libc_malloc (bytes=36) at malloc.c:2921 >>>> #2 0x00003fff95b6ffd0 in x_inline (xdrs=0x3fff90391d20, len=>>> out>) at xdr_sizeof.c:89 >>>> #3 0x00003fff95c4d488 in .xdr_gfx_iattx () from >>>> /usr/lib64/libgfxdr.so.0 >>>> #4 0x00003fff95c4de84 in .xdr_gfx_dirplist () from >>>> /usr/lib64/libgfxdr.so.0 >>>> #5 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>> pp=0x3fff7c132020, size=, proc=) at >>>> xdr_ref.c:84 >>>> #6 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>> objpp=0x3fff7c132020, obj_size=, >>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>> xdr_ref.c:135 >>>> #7 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>> /usr/lib64/libgfxdr.so.0 >>>> #8 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>> pp=0x3fff7c131ea0, size=, proc=) at >>>> xdr_ref.c:84 >>>> #9 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>> objpp=0x3fff7c131ea0, obj_size=, >>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>> xdr_ref.c:135 >>>> #10 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>> /usr/lib64/libgfxdr.so.0 >>>> #11 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>> pp=0x3fff7c131d20, size=, proc=) at >>>> xdr_ref.c:84 >>>> #12 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>> objpp=0x3fff7c131d20, obj_size=, >>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) 
at >>>> xdr_ref.c:135 >>>> #13 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>> /usr/lib64/libgfxdr.so.0 >>>> #14 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>> pp=0x3fff7c131ba0, size=, proc=) at >>>> xdr_ref.c:84 >>>> #15 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>> objpp=0x3fff7c131ba0, obj_size=, >>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>> xdr_ref.c:135 >>>> #16 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>> /usr/lib64/libgfxdr.so.0 >>>> #17 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>> pp=0x3fff7c131a20, size=, proc=) at >>>> xdr_ref.c:84 >>>> #18 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>> objpp=0x3fff7c131a20, obj_size=, >>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>> xdr_ref.c:135 >>>> #19 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>> /usr/lib64/libgfxdr.so.0 >>>> #20 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>> pp=0x3fff7c1318a0, size=, proc=) at >>>> xdr_ref.c:84 >>>> #21 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>> objpp=0x3fff7c1318a0, obj_size=, >>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>> xdr_ref.c:135 >>>> #22 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>> /usr/lib64/libgfxdr.so.0 >>>> #23 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>> pp=0x3fff7c131720, size=, proc=) at >>>> xdr_ref.c:84 >>>> #24 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>> objpp=0x3fff7c131720, obj_size=, >>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>> xdr_ref.c:135 >>>> #25 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>> /usr/lib64/libgfxdr.so.0 >>>> #26 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>> pp=0x3fff7c1315a0, size=, proc=) at >>>> xdr_ref.c:84 >>>> #27 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>> objpp=0x3fff7c1315a0, 
obj_size=, >>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>> xdr_ref.c:135 >>>> #28 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>> /usr/lib64/libgfxdr.so.0 >>>> #29 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>> pp=0x3fff7c131420, size=, proc=) at >>>> xdr_ref.c:84 >>>> #30 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>> objpp=0x3fff7c131420, obj_size=, >>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>> xdr_ref.c:135 >>>> #31 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>> /usr/lib64/libgfxdr.so.0 >>>> #32 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>> pp=0x3fff7c1312a0, size=, proc=) at >>>> xdr_ref.c:84 >>>> #33 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>> objpp=0x3fff7c1312a0, obj_size=, >>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>> xdr_ref.c:135 >>>> >>>> Frames are getting repeated, could any one please me. >>>> -- >>>> Regards >>>> Abhishek Paliwal >>>> >>> >>> >>> -- >>> >>> >>> >>> >>> Regards >>> Abhishek Paliwal >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> >> -- >> Amar Tumballi (amarts) >> > > > -- > > > > > Regards > Abhishek Paliwal > -- Regards Abhishek Paliwal -------------- next part -------------- An HTML attachment was scrubbed... 
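[Editorial note] The alternating frames in the backtrace above — `__GI_xdr_pointer` → `.xdr_gfx_dirplist` → `__GI_xdr_reference`, over and over — are the characteristic shape of XDR recursively encoding a linked list of directory entries: each entry's `nextentry` pointer costs one more round of recursion, so the frames repeat once per entry rather than indicating a loop. A hypothetical, simplified sketch of that recursion shape (the type and field names here are illustrative, not the real libgfxdr definitions):

```c
#include <stddef.h>

/* Simplified stand-in for an XDR directory-entry list; the real
 * gfx_dirplist in libgfxdr carries more fields per entry. */
struct dirplist {
    const char      *name;
    struct dirplist *nextentry;
};

/* xdr_pointer()-style traversal: one recursive call per non-NULL
 * "nextentry", so a reply with N entries produces N repeating frames. */
int encode_depth(const struct dirplist *p)
{
    if (p == NULL)
        return 0;                           /* end of the list */
    return 1 + encode_depth(p->nextentry);  /* one frame per entry */
}
```

This is why a crash deep inside such a trace often points at the size of the readdir reply being encoded rather than at the frame where the fault fires.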
URL: From atumball at redhat.com Wed Mar 13 04:20:00 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Wed, 13 Mar 2019 09:50:00 +0530 Subject: [Gluster-users] Glusterfsd crashed with SIGSEGV In-Reply-To: References: Message-ID: Hi Abhishek, Few more questions, > On Tue, Mar 12, 2019 at 10:58 AM ABHISHEK PALIWAL > wrote: > >> Hi Amar, >> >> Below are the requested logs >> >> pabhishe at arn-build3$ldd ./sysroot-destdir/usr/lib64/libglusterfs.so >> not a dynamic executable >> >> pabhishe at arn-build3$ldd ./sysroot-destdir/usr/lib64/libgfrpc.so >> not a dynamic executable >> >> Can you please add a * at the end, so it gets the linked library list from the actual files (ideally this is a symlink, but I expected it to resolve like in Fedora). > root at 128:/# gdb /usr/sbin/glusterd core.1099 >> GNU gdb (GDB) 7.10.1 >> Copyright (C) 2015 Free Software Foundation, Inc. >> License GPLv3+: GNU GPL version 3 or later < >> http://gnu.org/licenses/gpl.html> >> This is free software: you are free to change and redistribute it. >> There is NO WARRANTY, to the extent permitted by law. Type "show copying" >> and "show warranty" for details. >> This GDB was configured as "powerpc64-wrs-linux". >> Type "show configuration" for configuration details. >> For bug reporting instructions, please see: >> . >> Find the GDB manual and other documentation resources online at: >> . >> For help, type "help". >> Type "apropos word" to search for commands related to "word"... >> Reading symbols from /usr/sbin/glusterd...(no debugging symbols >> found)...done. 
>> [New LWP 1109] >> [New LWP 1101] >> [New LWP 1105] >> [New LWP 1110] >> [New LWP 1099] >> [New LWP 1107] >> [New LWP 1119] >> [New LWP 1103] >> [New LWP 1112] >> [New LWP 1116] >> [New LWP 1104] >> [New LWP 1239] >> [New LWP 1106] >> [New LWP 1111] >> [New LWP 1108] >> [New LWP 1117] >> [New LWP 1102] >> [New LWP 1118] >> [New LWP 1100] >> [New LWP 1114] >> [New LWP 1113] >> [New LWP 1115] >> >> warning: Could not load shared library symbols for linux-vdso64.so.1. >> Do you need "set solib-search-path" or "set sysroot"? >> [Thread debugging using libthread_db enabled] >> Using host libthread_db library "/lib64/libthread_db.so.1". >> Core was generated by `/usr/sbin/glusterfsd -s 128.224.95.140 >> --volfile-id gv0.128.224.95.140.tmp-bric'. >> Program terminated with signal SIGSEGV, Segmentation fault. >> #0 0x00003fffb76a6d48 in _int_malloc (av=av at entry=0x3fffa8000020, >> bytes=bytes at entry=36) at malloc.c:3327 >> 3327 { >> [Current thread is 1 (Thread 0x3fffb1689160 (LWP 1109))] >> (gdb) bt full >> > This is backtrace of one particular thread. I need output of command (gdb) thread apply all bt full Also, considering this is a crash in the malloc library call itself, would like to know the details of OS, Kernel version and gcc versions. Regards, Amar #0 0x00003fffb76a6d48 in _int_malloc (av=av at entry=0x3fffa8000020, >> bytes=bytes at entry=36) at malloc.c:3327 >> nb = >> idx = >> bin = >> victim = >> size = >> victim_index = >> remainder = >> remainder_size = >> block = >> bit = >> map = >> fwd = >> bck = >> errstr = 0x0 >> __func__ = "_int_malloc" >> #1 0x00003fffb76a93dc in __GI___libc_malloc (bytes=36) at malloc.c:2921 >> ar_ptr = 0x3fffa8000020 >> victim = >> hook = >> __func__ = "__libc_malloc" >> #2 0x00003fffb7764fd0 in x_inline (xdrs=0x3fffb1686d20, len=> out>) at xdr_sizeof.c:89 >> len = 36 >> xdrs = 0x3fffb1686d20 >> #3 0x00003fffb7842488 in .xdr_gfx_iattx () from /usr/lib64/libgfxdr.so.0 >> No symbol table info available. 
>> #4 0x00003fffb7842e84 in .xdr_gfx_dirplist () from >> /usr/lib64/libgfxdr.so.0 >> No symbol table info available. >> #5 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >> pp=0x3fffa81099f0, size=, proc=) at >> xdr_ref.c:84 >> loc = 0x3fffa8109aa0 "\265\256\373\200\f\206\361j" >> stat = >> #6 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >> objpp=0x3fffa81099f0, obj_size=, >> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >> xdr_ref.c:135 >> more_data = 1 >> #7 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >> /usr/lib64/libgfxdr.so.0 >> No symbol table info available. >> #8 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >> pp=0x3fffa8109870, size=, proc=) at >> xdr_ref.c:84 >> loc = 0x3fffa8109920 "\232\373\377\315\352\325\005\271" >> stat = >> #9 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >> objpp=0x3fffa8109870, obj_size=, >> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >> xdr_ref.c:135 >> more_data = 1 >> #10 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >> /usr/lib64/libgfxdr.so.0 >> No symbol table info available. >> #11 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >> pp=0x3fffa81096f0, size=, proc=) at >> xdr_ref.c:84 >> loc = 0x3fffa81097a0 "\241X\372!\216\256=\342" >> stat = >> ---Type to continue, or q to quit--- >> #12 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >> objpp=0x3fffa81096f0, obj_size=, >> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >> xdr_ref.c:135 >> more_data = 1 >> #13 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >> /usr/lib64/libgfxdr.so.0 >> No symbol table info available. 
>> #14 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >> pp=0x3fffa8109570, size=, proc=) at >> xdr_ref.c:84 >> loc = 0x3fffa8109620 "\265\205\003Vu'\002L" >> stat = >> #15 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >> objpp=0x3fffa8109570, obj_size=, >> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >> xdr_ref.c:135 >> more_data = 1 >> #16 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >> /usr/lib64/libgfxdr.so.0 >> No symbol table info available. >> #17 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >> pp=0x3fffa81093f0, size=, proc=) at >> xdr_ref.c:84 >> loc = 0x3fffa81094a0 "\200L\027F'\177\366D" >> stat = >> #18 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >> objpp=0x3fffa81093f0, obj_size=, >> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >> xdr_ref.c:135 >> more_data = 1 >> #19 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >> /usr/lib64/libgfxdr.so.0 >> No symbol table info available. >> #20 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >> pp=0x3fffa8109270, size=, proc=) at >> xdr_ref.c:84 >> loc = 0x3fffa8109320 "\217{dK(\001E\220" >> stat = >> #21 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >> objpp=0x3fffa8109270, obj_size=, >> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >> xdr_ref.c:135 >> more_data = 1 >> #22 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >> /usr/lib64/libgfxdr.so.0 >> No symbol table info available. 
>> #23 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >> pp=0x3fffa81090f0, size=, proc=) at >> xdr_ref.c:84 >> loc = 0x3fffa81091a0 "\217\275\067\336\232\300(\005" >> stat = >> #24 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >> objpp=0x3fffa81090f0, obj_size=, >> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >> xdr_ref.c:135 >> more_data = 1 >> #25 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >> /usr/lib64/libgfxdr.so.0 >> No symbol table info available. >> #26 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >> pp=0x3fffa8108f70, size=, proc=) at >> xdr_ref.c:84 >> loc = 0x3fffa8109020 "\260.\025\b\244\352IT" >> stat = >> #27 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >> objpp=0x3fffa8108f70, obj_size=, >> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >> xdr_ref.c:135 >> more_data = 1 >> #28 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >> /usr/lib64/libgfxdr.so.0 >> No symbol table info available. >> #29 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >> pp=0x3fffa8108df0, size=, proc=) at >> xdr_ref.c:84 >> loc = 0x3fffa8108ea0 "\212GS\203l\035\n\\" >> ---Type to continue, or q to quit--- >> >> >> Regards, >> Abhishek >> >> On Mon, Mar 11, 2019 at 7:10 PM Amar Tumballi Suryanarayan < >> atumball at redhat.com> wrote: >> >>> Hi Abhishek, >>> >>> Can you check and get back to us? >>> >>> ``` >>> bash# ldd /usr/lib64/libglusterfs.so >>> bash# ldd /usr/lib64/libgfrpc.so >>> >>> ``` >>> >>> Also considering you have the core, can you do `(gdb) thr apply all bt >>> full` and pass it on? >>> >>> Thanks & Regards, >>> Amar >>> >>> On Mon, Mar 11, 2019 at 3:41 PM ABHISHEK PALIWAL < >>> abhishpaliwal at gmail.com> wrote: >>> >>>> Hi Team, >>>> >>>> COuld you please provide some pointer to debug it further. 
>>>> >>>> Regards, >>>> Abhishek >>>> >>>> On Fri, Mar 8, 2019 at 2:19 PM ABHISHEK PALIWAL < >>>> abhishpaliwal at gmail.com> wrote: >>>> >>>>> Hi Team, >>>>> >>>>> I am using Glusterfs 5.4, where after setting the gluster mount point >>>>> when trying to access it, glusterfsd is getting crashed and mount point >>>>> through the "Transport endpoint is not connected error. >>>>> >>>>> Here I are the gdb log for the core file >>>>> >>>>> warning: Could not load shared library symbols for linux-vdso64.so.1. >>>>> Do you need "set solib-search-path" or "set sysroot"? >>>>> [Thread debugging using libthread_db enabled] >>>>> Using host libthread_db library "/lib64/libthread_db.so.1". >>>>> Core was generated by `/usr/sbin/glusterfsd -s 128.224.95.140 >>>>> --volfile-id gv0.128.224.95.140.tmp-bric'. >>>>> Program terminated with signal SIGSEGV, Segmentation fault. >>>>> #0 0x00003fff95ab1d48 in _int_malloc (av=av at entry=0x3fff7c000020, >>>>> bytes=bytes at entry=36) at malloc.c:3327 >>>>> 3327 { >>>>> [Current thread is 1 (Thread 0x3fff90394160 (LWP 811))] >>>>> (gdb) >>>>> (gdb) >>>>> (gdb) bt >>>>> #0 0x00003fff95ab1d48 in _int_malloc (av=av at entry=0x3fff7c000020, >>>>> bytes=bytes at entry=36) at malloc.c:3327 >>>>> #1 0x00003fff95ab43dc in __GI___libc_malloc (bytes=36) at >>>>> malloc.c:2921 >>>>> #2 0x00003fff95b6ffd0 in x_inline (xdrs=0x3fff90391d20, >>>>> len=) at xdr_sizeof.c:89 >>>>> #3 0x00003fff95c4d488 in .xdr_gfx_iattx () from >>>>> /usr/lib64/libgfxdr.so.0 >>>>> #4 0x00003fff95c4de84 in .xdr_gfx_dirplist () from >>>>> /usr/lib64/libgfxdr.so.0 >>>>> #5 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>> pp=0x3fff7c132020, size=, proc=) at >>>>> xdr_ref.c:84 >>>>> #6 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>> objpp=0x3fff7c132020, obj_size=, >>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>> xdr_ref.c:135 >>>>> #7 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>> 
/usr/lib64/libgfxdr.so.0 >>>>> #8 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>> pp=0x3fff7c131ea0, size=, proc=) at >>>>> xdr_ref.c:84 >>>>> #9 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>> objpp=0x3fff7c131ea0, obj_size=, >>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>> xdr_ref.c:135 >>>>> #10 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>> /usr/lib64/libgfxdr.so.0 >>>>> #11 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>> pp=0x3fff7c131d20, size=, proc=) at >>>>> xdr_ref.c:84 >>>>> #12 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>> objpp=0x3fff7c131d20, obj_size=, >>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>> xdr_ref.c:135 >>>>> #13 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>> /usr/lib64/libgfxdr.so.0 >>>>> #14 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>> pp=0x3fff7c131ba0, size=, proc=) at >>>>> xdr_ref.c:84 >>>>> #15 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>> objpp=0x3fff7c131ba0, obj_size=, >>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>> xdr_ref.c:135 >>>>> #16 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>> /usr/lib64/libgfxdr.so.0 >>>>> #17 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>> pp=0x3fff7c131a20, size=, proc=) at >>>>> xdr_ref.c:84 >>>>> #18 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>> objpp=0x3fff7c131a20, obj_size=, >>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>> xdr_ref.c:135 >>>>> #19 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>> /usr/lib64/libgfxdr.so.0 >>>>> #20 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>> pp=0x3fff7c1318a0, size=, proc=) at >>>>> xdr_ref.c:84 >>>>> #21 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>> objpp=0x3fff7c1318a0, obj_size=, >>>>> xdr_obj=@0x3fff95c6a4b0: 
0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>> xdr_ref.c:135 >>>>> #22 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>> /usr/lib64/libgfxdr.so.0 >>>>> #23 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>> pp=0x3fff7c131720, size=, proc=) at >>>>> xdr_ref.c:84 >>>>> #24 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>> objpp=0x3fff7c131720, obj_size=, >>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>> xdr_ref.c:135 >>>>> #25 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>> /usr/lib64/libgfxdr.so.0 >>>>> #26 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>> pp=0x3fff7c1315a0, size=, proc=) at >>>>> xdr_ref.c:84 >>>>> #27 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>> objpp=0x3fff7c1315a0, obj_size=, >>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>> xdr_ref.c:135 >>>>> #28 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>> /usr/lib64/libgfxdr.so.0 >>>>> #29 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>> pp=0x3fff7c131420, size=, proc=) at >>>>> xdr_ref.c:84 >>>>> #30 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>> objpp=0x3fff7c131420, obj_size=, >>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>> xdr_ref.c:135 >>>>> #31 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>> /usr/lib64/libgfxdr.so.0 >>>>> #32 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>> pp=0x3fff7c1312a0, size=, proc=) at >>>>> xdr_ref.c:84 >>>>> #33 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>> objpp=0x3fff7c1312a0, obj_size=, >>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>> xdr_ref.c:135 >>>>> >>>>> Frames are getting repeated, could anyone please help me.
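The repetition in these frames follows from how the rpcgen-generated XDR routines walk a readdirp reply: the directory entries form a singly linked list, and __GI_xdr_pointer calls back into .xdr_gfx_dirplist for each "next" pointer, so every entry in the listing adds one more xdr_gfx_dirplist/xdr_reference/xdr_pointer frame group to the stack. A rough way to gauge how deep that recursion went is to count the repeated frames in a non-interactive backtrace (a sketch only; the binary path and core file name are taken from the session above and may differ):

```shell
# Dump the crashed thread's backtrace in batch mode and count how many
# times the recursive list-decoding frame appears -- one per directory entry.
gdb -batch -ex bt /usr/sbin/glusterfsd core.811 2>/dev/null \
    | grep -c 'xdr_gfx_dirplist'
```

A count in the thousands would point toward stack exhaustion while encoding a very large directory listing rather than ordinary heap corruption, though the crash inside _int_malloc alone does not prove either.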
>>>>> -- >>>>> Regards >>>>> Abhishek Paliwal >>>>> >>>> >>>> >>>> -- >>>> >>>> >>>> >>>> >>>> Regards >>>> Abhishek Paliwal >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >>> >>> -- >>> Amar Tumballi (amarts) >>> >> >> >> -- >> >> >> >> >> Regards >> Abhishek Paliwal >> > > > -- > > > > > Regards > Abhishek Paliwal > -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From abhishpaliwal at gmail.com Wed Mar 13 04:32:51 2019 From: abhishpaliwal at gmail.com (ABHISHEK PALIWAL) Date: Wed, 13 Mar 2019 10:02:51 +0530 Subject: [Gluster-users] Glusterfsd crashed with SIGSEGV In-Reply-To: References: Message-ID: Here are the logs: pabhishe at arn-build3$ldd ./5.4-r0/sysroot-destdir/usr/lib64/libglusterfs.so.* ./5.4-r0/sysroot-destdir/usr/lib64/libglusterfs.so.0: not a dynamic executable ./5.4-r0/sysroot-destdir/usr/lib64/libglusterfs.so.0.0.1: not a dynamic executable pabhishe at arn-build3$ldd ./5.4-r0/sysroot-destdir/usr/lib64/libglusterfs.so.0.0.1 not a dynamic executable For backtraces I have attached the core_logs.txt file. Regards, Abhishek On Wed, Mar 13, 2019 at 9:51 AM Amar Tumballi Suryanarayan < atumball at redhat.com> wrote: > Hi Abhishek, > > Few more questions, > > >> On Tue, Mar 12, 2019 at 10:58 AM ABHISHEK PALIWAL < >> abhishpaliwal at gmail.com> wrote: >> >>> Hi Amar, >>> >>> Below are the requested logs >>> >>> pabhishe at arn-build3$ldd ./sysroot-destdir/usr/lib64/libglusterfs.so >>> not a dynamic executable >>> >>> pabhishe at arn-build3$ldd ./sysroot-destdir/usr/lib64/libgfrpc.so >>> not a dynamic executable >>> >>> > Can you please add a * at the end, so it gets the linked library list from > the actual files (ideally this is a symlink, but I expected it to resolve > like in Fedora). 
> > > >> root at 128:/# gdb /usr/sbin/glusterd core.1099 >>> GNU gdb (GDB) 7.10.1 >>> Copyright (C) 2015 Free Software Foundation, Inc. >>> License GPLv3+: GNU GPL version 3 or later < >>> http://gnu.org/licenses/gpl.html> >>> This is free software: you are free to change and redistribute it. >>> There is NO WARRANTY, to the extent permitted by law. Type "show >>> copying" >>> and "show warranty" for details. >>> This GDB was configured as "powerpc64-wrs-linux". >>> Type "show configuration" for configuration details. >>> For bug reporting instructions, please see: >>> . >>> Find the GDB manual and other documentation resources online at: >>> . >>> For help, type "help". >>> Type "apropos word" to search for commands related to "word"... >>> Reading symbols from /usr/sbin/glusterd...(no debugging symbols >>> found)...done. >>> [New LWP 1109] >>> [New LWP 1101] >>> [New LWP 1105] >>> [New LWP 1110] >>> [New LWP 1099] >>> [New LWP 1107] >>> [New LWP 1119] >>> [New LWP 1103] >>> [New LWP 1112] >>> [New LWP 1116] >>> [New LWP 1104] >>> [New LWP 1239] >>> [New LWP 1106] >>> [New LWP 1111] >>> [New LWP 1108] >>> [New LWP 1117] >>> [New LWP 1102] >>> [New LWP 1118] >>> [New LWP 1100] >>> [New LWP 1114] >>> [New LWP 1113] >>> [New LWP 1115] >>> >>> warning: Could not load shared library symbols for linux-vdso64.so.1. >>> Do you need "set solib-search-path" or "set sysroot"? >>> [Thread debugging using libthread_db enabled] >>> Using host libthread_db library "/lib64/libthread_db.so.1". >>> Core was generated by `/usr/sbin/glusterfsd -s 128.224.95.140 >>> --volfile-id gv0.128.224.95.140.tmp-bric'. >>> Program terminated with signal SIGSEGV, Segmentation fault. >>> #0 0x00003fffb76a6d48 in _int_malloc (av=av at entry=0x3fffa8000020, >>> bytes=bytes at entry=36) at malloc.c:3327 >>> 3327 { >>> [Current thread is 1 (Thread 0x3fffb1689160 (LWP 1109))] >>> (gdb) bt full >>> >> > This is backtrace of one particular thread. 
I need output of command > > (gdb) thread apply all bt full > > > Also, considering this is a crash in the malloc library call itself, would > like to know the details of OS, Kernel version and gcc versions. > > Regards, > Amar > > #0 0x00003fffb76a6d48 in _int_malloc (av=av at entry=0x3fffa8000020, >>> bytes=bytes at entry=36) at malloc.c:3327 >>> nb = >>> idx = >>> bin = >>> victim = >>> size = >>> victim_index = >>> remainder = >>> remainder_size = >>> block = >>> bit = >>> map = >>> fwd = >>> bck = >>> errstr = 0x0 >>> __func__ = "_int_malloc" >>> #1 0x00003fffb76a93dc in __GI___libc_malloc (bytes=36) at malloc.c:2921 >>> ar_ptr = 0x3fffa8000020 >>> victim = >>> hook = >>> __func__ = "__libc_malloc" >>> #2 0x00003fffb7764fd0 in x_inline (xdrs=0x3fffb1686d20, len=>> out>) at xdr_sizeof.c:89 >>> len = 36 >>> xdrs = 0x3fffb1686d20 >>> #3 0x00003fffb7842488 in .xdr_gfx_iattx () from /usr/lib64/libgfxdr.so.0 >>> No symbol table info available. >>> #4 0x00003fffb7842e84 in .xdr_gfx_dirplist () from >>> /usr/lib64/libgfxdr.so.0 >>> No symbol table info available. >>> #5 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>> pp=0x3fffa81099f0, size=, proc=) at >>> xdr_ref.c:84 >>> loc = 0x3fffa8109aa0 "\265\256\373\200\f\206\361j" >>> stat = >>> #6 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>> objpp=0x3fffa81099f0, obj_size=, >>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>> xdr_ref.c:135 >>> more_data = 1 >>> #7 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>> /usr/lib64/libgfxdr.so.0 >>> No symbol table info available. 
>>> #8 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>> pp=0x3fffa8109870, size=, proc=) at >>> xdr_ref.c:84 >>> loc = 0x3fffa8109920 "\232\373\377\315\352\325\005\271" >>> stat = >>> #9 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>> objpp=0x3fffa8109870, obj_size=, >>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>> xdr_ref.c:135 >>> more_data = 1 >>> #10 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>> /usr/lib64/libgfxdr.so.0 >>> No symbol table info available. >>> #11 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>> pp=0x3fffa81096f0, size=, proc=) at >>> xdr_ref.c:84 >>> loc = 0x3fffa81097a0 "\241X\372!\216\256=\342" >>> stat = >>> ---Type to continue, or q to quit--- >>> #12 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>> objpp=0x3fffa81096f0, obj_size=, >>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>> xdr_ref.c:135 >>> more_data = 1 >>> #13 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>> /usr/lib64/libgfxdr.so.0 >>> No symbol table info available. >>> #14 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>> pp=0x3fffa8109570, size=, proc=) at >>> xdr_ref.c:84 >>> loc = 0x3fffa8109620 "\265\205\003Vu'\002L" >>> stat = >>> #15 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>> objpp=0x3fffa8109570, obj_size=, >>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>> xdr_ref.c:135 >>> more_data = 1 >>> #16 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>> /usr/lib64/libgfxdr.so.0 >>> No symbol table info available. 
>>> #17 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>> pp=0x3fffa81093f0, size=, proc=) at >>> xdr_ref.c:84 >>> loc = 0x3fffa81094a0 "\200L\027F'\177\366D" >>> stat = >>> #18 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>> objpp=0x3fffa81093f0, obj_size=, >>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>> xdr_ref.c:135 >>> more_data = 1 >>> #19 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>> /usr/lib64/libgfxdr.so.0 >>> No symbol table info available. >>> #20 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>> pp=0x3fffa8109270, size=, proc=) at >>> xdr_ref.c:84 >>> loc = 0x3fffa8109320 "\217{dK(\001E\220" >>> stat = >>> #21 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>> objpp=0x3fffa8109270, obj_size=, >>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>> xdr_ref.c:135 >>> more_data = 1 >>> #22 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>> /usr/lib64/libgfxdr.so.0 >>> No symbol table info available. >>> #23 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>> pp=0x3fffa81090f0, size=, proc=) at >>> xdr_ref.c:84 >>> loc = 0x3fffa81091a0 "\217\275\067\336\232\300(\005" >>> stat = >>> #24 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>> objpp=0x3fffa81090f0, obj_size=, >>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>> xdr_ref.c:135 >>> more_data = 1 >>> #25 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>> /usr/lib64/libgfxdr.so.0 >>> No symbol table info available. 
>>> #26 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>> pp=0x3fffa8108f70, size=, proc=) at >>> xdr_ref.c:84 >>> loc = 0x3fffa8109020 "\260.\025\b\244\352IT" >>> stat = >>> #27 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>> objpp=0x3fffa8108f70, obj_size=, >>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>> xdr_ref.c:135 >>> more_data = 1 >>> #28 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>> /usr/lib64/libgfxdr.so.0 >>> No symbol table info available. >>> #29 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>> pp=0x3fffa8108df0, size=, proc=) at >>> xdr_ref.c:84 >>> loc = 0x3fffa8108ea0 "\212GS\203l\035\n\\" >>> ---Type to continue, or q to quit--- >>> >>> >>> Regards, >>> Abhishek >>> >>> On Mon, Mar 11, 2019 at 7:10 PM Amar Tumballi Suryanarayan < >>> atumball at redhat.com> wrote: >>> >>>> Hi Abhishek, >>>> >>>> Can you check and get back to us? >>>> >>>> ``` >>>> bash# ldd /usr/lib64/libglusterfs.so >>>> bash# ldd /usr/lib64/libgfrpc.so >>>> >>>> ``` >>>> >>>> Also considering you have the core, can you do `(gdb) thr apply all bt >>>> full` and pass it on? >>>> >>>> Thanks & Regards, >>>> Amar >>>> >>>> On Mon, Mar 11, 2019 at 3:41 PM ABHISHEK PALIWAL < >>>> abhishpaliwal at gmail.com> wrote: >>>> >>>>> Hi Team, >>>>> >>>>> COuld you please provide some pointer to debug it further. >>>>> >>>>> Regards, >>>>> Abhishek >>>>> >>>>> On Fri, Mar 8, 2019 at 2:19 PM ABHISHEK PALIWAL < >>>>> abhishpaliwal at gmail.com> wrote: >>>>> >>>>>> Hi Team, >>>>>> >>>>>> I am using Glusterfs 5.4, where after setting the gluster mount point >>>>>> when trying to access it, glusterfsd is getting crashed and mount point >>>>>> through the "Transport endpoint is not connected error. >>>>>> >>>>>> Here I are the gdb log for the core file >>>>>> >>>>>> warning: Could not load shared library symbols for linux-vdso64.so.1. >>>>>> Do you need "set solib-search-path" or "set sysroot"? 
>>>>>> [Thread debugging using libthread_db enabled] >>>>>> Using host libthread_db library "/lib64/libthread_db.so.1". >>>>>> Core was generated by `/usr/sbin/glusterfsd -s 128.224.95.140 >>>>>> --volfile-id gv0.128.224.95.140.tmp-bric'. >>>>>> Program terminated with signal SIGSEGV, Segmentation fault. >>>>>> #0 0x00003fff95ab1d48 in _int_malloc (av=av at entry=0x3fff7c000020, >>>>>> bytes=bytes at entry=36) at malloc.c:3327 >>>>>> 3327 { >>>>>> [Current thread is 1 (Thread 0x3fff90394160 (LWP 811))] >>>>>> (gdb) >>>>>> (gdb) >>>>>> (gdb) bt >>>>>> #0 0x00003fff95ab1d48 in _int_malloc (av=av at entry=0x3fff7c000020, >>>>>> bytes=bytes at entry=36) at malloc.c:3327 >>>>>> #1 0x00003fff95ab43dc in __GI___libc_malloc (bytes=36) at >>>>>> malloc.c:2921 >>>>>> #2 0x00003fff95b6ffd0 in x_inline (xdrs=0x3fff90391d20, >>>>>> len=) at xdr_sizeof.c:89 >>>>>> #3 0x00003fff95c4d488 in .xdr_gfx_iattx () from >>>>>> /usr/lib64/libgfxdr.so.0 >>>>>> #4 0x00003fff95c4de84 in .xdr_gfx_dirplist () from >>>>>> /usr/lib64/libgfxdr.so.0 >>>>>> #5 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>> pp=0x3fff7c132020, size=, proc=) at >>>>>> xdr_ref.c:84 >>>>>> #6 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>> objpp=0x3fff7c132020, obj_size=, >>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>> xdr_ref.c:135 >>>>>> #7 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>> /usr/lib64/libgfxdr.so.0 >>>>>> #8 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>> pp=0x3fff7c131ea0, size=, proc=) at >>>>>> xdr_ref.c:84 >>>>>> #9 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>> objpp=0x3fff7c131ea0, obj_size=, >>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>> xdr_ref.c:135 >>>>>> #10 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>> /usr/lib64/libgfxdr.so.0 >>>>>> #11 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>> pp=0x3fff7c131d20, size=, 
proc=) at >>>>>> xdr_ref.c:84 >>>>>> #12 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>> objpp=0x3fff7c131d20, obj_size=, >>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>> xdr_ref.c:135 >>>>>> #13 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>> /usr/lib64/libgfxdr.so.0 >>>>>> #14 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>> pp=0x3fff7c131ba0, size=, proc=) at >>>>>> xdr_ref.c:84 >>>>>> #15 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>> objpp=0x3fff7c131ba0, obj_size=, >>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>> xdr_ref.c:135 >>>>>> #16 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>> /usr/lib64/libgfxdr.so.0 >>>>>> #17 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>> pp=0x3fff7c131a20, size=, proc=) at >>>>>> xdr_ref.c:84 >>>>>> #18 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>> objpp=0x3fff7c131a20, obj_size=, >>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>> xdr_ref.c:135 >>>>>> #19 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>> /usr/lib64/libgfxdr.so.0 >>>>>> #20 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>> pp=0x3fff7c1318a0, size=, proc=) at >>>>>> xdr_ref.c:84 >>>>>> #21 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>> objpp=0x3fff7c1318a0, obj_size=, >>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>> xdr_ref.c:135 >>>>>> #22 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>> /usr/lib64/libgfxdr.so.0 >>>>>> #23 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>> pp=0x3fff7c131720, size=, proc=) at >>>>>> xdr_ref.c:84 >>>>>> #24 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>> objpp=0x3fff7c131720, obj_size=, >>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>> xdr_ref.c:135 >>>>>> #25 0x00003fff95c4dec0 in 
.xdr_gfx_dirplist () from >>>>>> /usr/lib64/libgfxdr.so.0 >>>>>> #26 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>> pp=0x3fff7c1315a0, size=, proc=) at >>>>>> xdr_ref.c:84 >>>>>> #27 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>> objpp=0x3fff7c1315a0, obj_size=, >>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>> xdr_ref.c:135 >>>>>> #28 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>> /usr/lib64/libgfxdr.so.0 >>>>>> #29 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>> pp=0x3fff7c131420, size=, proc=) at >>>>>> xdr_ref.c:84 >>>>>> #30 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>> objpp=0x3fff7c131420, obj_size=, >>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>> xdr_ref.c:135 >>>>>> #31 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>> /usr/lib64/libgfxdr.so.0 >>>>>> #32 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>> pp=0x3fff7c1312a0, size=, proc=) at >>>>>> xdr_ref.c:84 >>>>>> #33 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>> objpp=0x3fff7c1312a0, obj_size=, >>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>> xdr_ref.c:135 >>>>>> >>>>>> Frames are getting repeated, could any one please me. >>>>>> -- >>>>>> Regards >>>>>> Abhishek Paliwal >>>>>> >>>>> >>>>> >>>>> -- >>>>> >>>>> >>>>> >>>>> >>>>> Regards >>>>> Abhishek Paliwal >>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> Gluster-users at gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> >>>> >>>> -- >>>> Amar Tumballi (amarts) >>>> >>> >>> >>> -- >>> >>> >>> >>> >>> Regards >>> Abhishek Paliwal >>> >> >> >> -- >> >> >> >> >> Regards >> Abhishek Paliwal >> > > > -- > Amar Tumballi (amarts) > -- Regards Abhishek Paliwal -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- warning: Could not load shared library symbols for linux-vdso64.so.1. Do you need "set solib-search-path" or "set sysroot"? [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Core was generated by `/usr/sbin/glusterfsd -s 128.224.95.140 --volfile-id gv0.128.224.95.140.tmp-bric'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x00003fffb3570d48 in _int_malloc (av=av at entry=0x3fffa4000020, bytes=bytes at entry=36) at malloc.c:3327 3327 { [Current thread is 1 (Thread 0x3fffad553160 (LWP 554))] (gdb) thread apply all bt full Thread 23 (Thread 0x3fffad513160 (LWP 555)): #0 0x00003fffb36ab7b0 in __pthread_cond_wait (cond=0x3fffa8074ff8, mutex=0x3fffa8074fd0) at pthread_cond_wait.c:186 r4 = 128 r7 = 2 r5 = 1 r8 = 2 arg3 = 1 r0 = 221 r3 = 512 r6 = 0 arg4 = 0 __err = __ret = futex_val = 1 buffer = {__routine = @0x3fffb36c8b50: 0x3fffb36ab400 <__condvar_cleanup>, __arg = 0x3fffad5125e0, __canceltype = 16383, __prev = 0x0} cbuffer = {oldtype = 0, cond = 0x3fffa8074ff8, mutex = 0x3fffa8074fd0, bc_seq = 0} err = pshared = 0 pi_flag = 0 val = seq = 0 #1 0x00003fffaf2762ec in ?? () from /usr/lib64/glusterfs/5.4/xlator/features/bitrot-stub.so No symbol table info available. #2 0x00003fffb36a3b30 in start_thread (arg=0x3fffad513160) at pthread_create.c:462 pd = 0x3fffad513160 now = unwind_buf = {cancel_jmp_buf = {{jmp_buf = {8589934592, 70367268262048, 70367268262048, 70367268267296, 3419478345679794793, 7163871753673912110, 7452460622125822566, 8299977387227111790, 8388357161025011712, 0 }, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = pagesize_m1 = sp = freesize = __PRETTY_FUNCTION__ = "start_thread" Backtrace stopped: previous frame inner to this frame (corrupt stack?) 
Thread 22 (Thread 0x3fffac80f160 (LWP 557)): #0 0x00003fffb36ab7b0 in __pthread_cond_wait (cond=0x3fffa807a298, mutex=0x3fffa807a270) at pthread_cond_wait.c:186 r4 = 128 r7 = 2 r5 = 1 r8 = 2 arg3 = 1 r0 = 221 ---Type to continue, or q to quit--- r3 = 512 r6 = 0 arg4 = 0 __err = __ret = futex_val = 1 buffer = {__routine = @0x3fffb36c8b50: 0x3fffb36ab400 <__condvar_cleanup>, __arg = 0x3fffac80e560, __canceltype = 0, __prev = 0x0} cbuffer = {oldtype = 0, cond = 0x3fffa807a298, mutex = 0x3fffa807a270, bc_seq = 0} err = pshared = 0 pi_flag = 0 val = seq = 0 #1 0x00003fffaf2b4a9c in ?? () from /usr/lib64/glusterfs/5.4/xlator/features/changelog.so No symbol table info available. #2 0x00003fffb36a3b30 in start_thread (arg=0x3fffac80f160) at pthread_create.c:462 pd = 0x3fffac80f160 now = unwind_buf = {cancel_jmp_buf = {{jmp_buf = {70367268283192, 70367268283192, 70367268283808, 1, 0, -4995072473058770944, 149, 206158430208, 64, 70367268268048, -3819410108757049344, 0, 0, 0, 0, 88, 64, 70367388068184, 0, 70367268267328, 70367268288960, 268615696, 70367461387880, -4995072473058770944, 101, 171798691840, 15, 70367268268048, -3819410108757049344, 0, 0, 0, 0, 3419478345679794793, 7163871753673900218, -5913212017086300160, 133, 171798691840, 37, 70367268268048, -3819410108757049344, 0, 0, 0, 0, 3419478345679794793, 7163871753673912110, 7452460622125822566, 8299961938229487461, 7813577622142234096, 936748722493063168, 0, 133, 695784701952, 40, 70367268268048, -3819410108757049344, 0, 0, 0, 0, 0, 0, 0}, mask_was_saved = 0}}, priv = {pad = {0xbaadf00d00000000, 0x0, 0x95, 0x8100000000}, data = {prev = 0xbaadf00d00000000, cleanup = 0x0, canceltype = 0}}} not_first_call = pagesize_m1 = sp = freesize = __PRETTY_FUNCTION__ = "start_thread" Backtrace stopped: previous frame inner to this frame (corrupt stack?) 
Thread 21 (Thread 0x3fffacd13160 (LWP 556)): #0 0x00003fffb36ab7b0 in __pthread_cond_wait (cond=0x3fffa8075070, mutex=0x3fffa8075048) at pthread_cond_wait.c:186 r4 = 128 r7 = 2 r5 = 1 r8 = 2 arg3 = 1 r0 = 221 r3 = 512 r6 = 0 arg4 = 0 __err = __ret = futex_val = 1 ---Type to continue, or q to quit--- buffer = {__routine = @0x3fffb36c8b50: 0x3fffb36ab400 <__condvar_cleanup>, __arg = 0x3fffacd125e0, __canceltype = 16383, __prev = 0x0} cbuffer = {oldtype = 0, cond = 0x3fffa8075070, mutex = 0x3fffa8075048, bc_seq = 0} err = pshared = 0 pi_flag = 0 val = seq = 0 #1 0x00003fffaf27385c in ?? () from /usr/lib64/glusterfs/5.4/xlator/features/bitrot-stub.so No symbol table info available. #2 0x00003fffb36a3b30 in start_thread (arg=0x3fffacd13160) at pthread_create.c:462 pd = 0x3fffacd13160 now = unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0 }, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = pagesize_m1 = sp = freesize = __PRETTY_FUNCTION__ = "start_thread" Backtrace stopped: previous frame inner to this frame (corrupt stack?) Thread 20 (Thread 0x3fffaef53160 (LWP 551)): #0 0x00003fffb36ab7b0 in __pthread_cond_wait (cond=0x3fffa803f770, mutex=0x3fffa803f748) at pthread_cond_wait.c:186 r4 = 128 r7 = 2 r5 = 1 r8 = 2 arg3 = 1 r0 = 221 r3 = 512 r6 = 0 arg4 = 0 __err = __ret = futex_val = 1 buffer = {__routine = @0x3fffb36c8b50: 0x3fffb36ab400 <__condvar_cleanup>, __arg = 0x3fffaef525e0, __canceltype = 16383, __prev = 0x0} cbuffer = {oldtype = 0, cond = 0x3fffa803f770, mutex = 0x3fffa803f748, bc_seq = 0} err = pshared = 0 pi_flag = 0 val = seq = 0 #1 0x00003fffb373825c in ?? () from /usr/lib64/libgfrpc.so.0 No symbol table info available. 
#2 0x00003fffb36a3b30 in start_thread (arg=0x3fffaef53160) at pthread_create.c:462 ---Type to continue, or q to quit--- pd = 0x3fffaef53160 now = unwind_buf = {cancel_jmp_buf = {{jmp_buf = {70367268043272, 70367268042552, 70367268042552, 0, 1, 0, 0, 0, 1, 1, 0, 0, 70367268042568, 8589934592, 70367384514912, 4294967297, -4995072473058770944, 309, 16, 0, 1, 0, -1, 0, -1, 0 , 117, 691489734656, 32, 70367268057072, -3819410108757049344, 0, 0, 0, 0}, mask_was_saved = 16383}}, priv = {pad = {0x0, 0x0, 0xbaadf00d00000000, 0x145}, data = {prev = 0x0, cleanup = 0x0, canceltype = -1163005939}}} not_first_call = pagesize_m1 = sp = freesize = __PRETTY_FUNCTION__ = "start_thread" Backtrace stopped: previous frame inner to this frame (corrupt stack?) Thread 19 (Thread 0x3fff966c7160 (LWP 561)): #0 0x00003fffb35b4044 in .__nanosleep () at ../sysdeps/unix/syscall-template.S:84 No locals. #1 0x00003fffb35b3e0c in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:138 ts = {tv_sec = 1, tv_nsec = 79283866} set = {__val = {65536, 0 }} oset = {__val = {18446744066193095191, 0, 0, 0, 0, 70366972896928, 0, 70367388722224, 0, 0, 70367389062912, 0, 0, 0, 0, 0}} result = #2 0x00003fffaf356678 in ?? () from /usr/lib64/glusterfs/5.4/xlator/storage/posix.so No symbol table info available. #3 0x00003fffb36a3b30 in start_thread (arg=0x3fff966c7160) at pthread_create.c:462 pd = 0x3fff966c7160 now = unwind_buf = {cancel_jmp_buf = {{jmp_buf = {70367389029784, 70367389029760, 70367389029688, 70367389029664, 0, 70367389029640, 0, 0, 0, 459, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 196, 0 , 1023, 0, 32, 0, 599, 0, 0, 0, 3, 0, 4, 0, 0, 0, 1, 0, 0, 0, 0, 0, 599, 0, 0, 0, 139, 0, 0, 0, 6375, 0, 0}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = pagesize_m1 = sp = freesize = __PRETTY_FUNCTION__ = "start_thread" #4 0x0000000000000000 in ?? () No symbol table info available. 
Thread 18 (Thread 0x3fff94ec7160 (LWP 564)): #0 0x00003fffb36ab7b0 in __pthread_cond_wait (cond=0x3fffa80843e8, mutex=0x3fffa80843c0) at pthread_cond_wait.c:186 r4 = 128 r7 = 2 r5 = 1 r8 = 2 arg3 = 1 r0 = 221 ---Type to continue, or q to quit--- r3 = 512 r6 = 0 arg4 = 0 __err = __ret = futex_val = 1 buffer = {__routine = @0x3fffb36c8b50: 0x3fffb36ab400 <__condvar_cleanup>, __arg = 0x3fff94ec64e0, __canceltype = 0, __prev = 0x0} cbuffer = {oldtype = 0, cond = 0x3fffa80843e8, mutex = 0x3fffa80843c0, bc_seq = 0} err = pshared = 0 pi_flag = 0 val = seq = 0 #1 0x00003fffaf35699c in ?? () from /usr/lib64/glusterfs/5.4/xlator/storage/posix.so No symbol table info available. #2 0x00003fffaf356cc4 in ?? () from /usr/lib64/glusterfs/5.4/xlator/storage/posix.so No symbol table info available. #3 0x00003fffb36a3b30 in start_thread (arg=0x3fff94ec7160) at pthread_create.c:462 pd = 0x3fff94ec7160 now = unwind_buf = {cancel_jmp_buf = {{jmp_buf = {70367268323904, 70367268323904, 1215, 608, 607, 607, 70367268323968, 8589934592, 0, 1, 0, 0, 0, 0, 75374666, 0, 72057594037927936, 70366956122464, 72057594037927936, 70367268357728, 70367268324832, 36, 25694, 5, 72018011619328, 0, 0, 100, 4096, 0, 263, 395262763, 308, 258307626, 308, 258307626, 0, 0, 0, 7539334392389520705, -8557774763185766717, 0, 0, 0, 0, 70366947733856, 70367268324272, 70367268324272, 0, 1, 0, 0, 0, 1, 1, 0, 0, 70367268324288, 8589934592, 3, 80384, 0, 128849018890, 70366964511072}, mask_was_saved = 16777216}}, priv = {pad = {0x3fff966c7160, 0x100000000000001, 0x0, 0x1ff}, data = {prev = 0x3fff966c7160, cleanup = 0x100000000000001, canceltype = 0}}} not_first_call = pagesize_m1 = sp = freesize = __PRETTY_FUNCTION__ = "start_thread" #4 0x00003fffa8080c50 in ?? () No symbol table info available. Backtrace stopped: previous frame inner to this frame (corrupt stack?) Thread 17 (Thread 0x3fff95ec7160 (LWP 562)): #0 0x00003fffb35b4044 in .__nanosleep () at ../sysdeps/unix/syscall-template.S:84 No locals. 
#1 0x00003fffb35b3e0c in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:138 ts = {tv_sec = 12, tv_nsec = 78972105} set = {__val = {65536, 0 }} oset = {__val = {18446744066193095191, 0, 0, 0, 0, 0, 0, 68719476736, 0, 70366964507520, 256, 0, 0, 0, 0, 0}} result = #2 0x00003fffaf355e08 in ?? () from /usr/lib64/glusterfs/5.4/xlator/storage/posix.so No symbol table info available. #3 0x00003fffb36a3b30 in start_thread (arg=0x3fff95ec7160) at pthread_create.c:462 ---Type to continue, or q to quit--- pd = 0x3fff95ec7160 now = unwind_buf = {cancel_jmp_buf = {{jmp_buf = {70367389029784, 70367389029760, 70367389029688, 70367389029664, 0, 70367389029640, 0, 0, 0, 459, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 196, 0 , 1023, 0, 32, 0, 599, 0, 0, 0, 3, 0, 4, 0, 0, 0, 1, 0, 0, 0, 0, 0, 599, 0, 0, 0, 139, 0, 0, 0, 6375, 0, 0}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = pagesize_m1 = sp = freesize = __PRETTY_FUNCTION__ = "start_thread" #4 0x0000000000000000 in ?? () No symbol table info available. Thread 16 (Thread 0x3fffb1c9e160 (LWP 546)): #0 0x00003fffb36b19a0 in do_sigwait (set=, set at entry=0x3fffb1c9d660, sig=sig at entry=0x3fffb1c9d6e0) at ../sysdeps/unix/sysv/linux/sigwait.c:61 r4 = 0 r7 = 2 arg2 = 0 r5 = 0 r8 = 2 arg3 = 0 r0 = 176 r3 = 4 r6 = 8 arg4 = 8 ret = tmpset = {__val = {0, 0, 0, 0, 0, 0, 0, 70367459258040, 268515760, 268516016, 268516264, 268612384, 70367432038576, 268478552, 70367432005088, 0}} err = #1 0x00003fffb36b1a88 in __sigwait (set=0x3fffb1c9d660, sig=0x3fffb1c9d6e0) at ../sysdeps/unix/sysv/linux/sigwait.c:96 oldtype = 0 result = sig = 0x3fffb1c9d6e0 set = 0x3fffb1c9d660 #2 0x000000001000a82c in .glusterfs_sigwaiter () No symbol table info available. 
#3 0x00003fffb36a3b30 in start_thread (arg=0x3fffb1c9e160) at pthread_create.c:462 pd = 0x3fffb1c9e160 now = unwind_buf = {cancel_jmp_buf = {{jmp_buf = {8674967983565600856, 70367459482624, 8674967983542736380, 0, 0, 70367423623168, 70367432008224, 8388608, 70367459442720, 70367774066608, 268616336, 70367459468248, 268606096, 3, 0, 70367459468264, 70367774065808, 70367774065864, 4001536, 70367459443736, 70367432005440, -3187653500, 0 }, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = pagesize_m1 = sp = ---Type to continue, or q to quit--- freesize = __PRETTY_FUNCTION__ = "start_thread" #4 0x00003fffb35ee17c in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:96 No locals. Thread 15 (Thread 0x3fffb249e160 (LWP 545)): #0 0x00003fffb36b1150 in .__nanosleep () at ../sysdeps/unix/syscall-template.S:84 No locals. #1 0x00003fffb37a3b9c in ?? () from /usr/lib64/libglusterfs.so.0 No symbol table info available. #2 0x00003fffb36a3b30 in start_thread (arg=0x3fffb249e160) at pthread_create.c:462 pd = 0x3fffb249e160 now = unwind_buf = {cancel_jmp_buf = {{jmp_buf = {-1, 0, -1, 0 , 49, 1, 0, 0, 0, 0, 369, 70367461387792, 70367267785088, 0, 70367461387880, 0, 0, 70367461387920, 0, 0, 70367461387960, 0, 0, 70367461388000, 0, 0, 70367461388040, 0, 0, 70367461388080, 0, 0, 70367461388120, 0, 0}, mask_was_saved = 16383}}, priv = {pad = {0x0, 0x3fffb38a2fa8, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x3fffb38a2fa8, canceltype = 0}}} not_first_call = pagesize_m1 = sp = freesize = __PRETTY_FUNCTION__ = "start_thread" Backtrace stopped: previous frame inner to this frame (corrupt stack?) 
Thread 14 (Thread 0x3fffad653160 (LWP 553)): #0 0x00003fffb36ab7b0 in __pthread_cond_wait (cond=0x3fffa804ed00, mutex=0x3fffa804ecd8) at pthread_cond_wait.c:186 r4 = 128 r7 = 2 r5 = 25 r8 = 2 arg3 = 25 r0 = 221 r3 = 512 r6 = 0 arg4 = 0 __err = __ret = futex_val = 25 buffer = {__routine = @0x3fffb36c8b50: 0x3fffb36ab400 <__condvar_cleanup>, __arg = 0x3fffad6525e0, __canceltype = 16383, __prev = 0x0} cbuffer = {oldtype = 0, cond = 0x3fffa804ed00, mutex = 0x3fffa804ecd8, bc_seq = 0} err = pshared = 0 pi_flag = 0 val = seq = 12 #1 0x00003fffaf0a41d0 in ?? () from /usr/lib64/glusterfs/5.4/xlator/features/index.so ---Type to continue, or q to quit--- No symbol table info available. #2 0x00003fffb36a3b30 in start_thread (arg=0x3fffad653160) at pthread_create.c:462 pd = 0x3fffad653160 now = unwind_buf = {cancel_jmp_buf = {{jmp_buf = {-7104728266551728278, 70367268105416, 70367268105416, 0, 1, 0, 0, 0, 25, 13, 12, 12, 70367268105432, 8589934592, 70367268106968, 70367268106344, 70367268105864, 0, 70367358300512, 0, -4995072473058770944, 101, 682899800064, 1, 70367268135648, -3819410108757049344, 0, 0, 0, 0, 124603186227970048, 0, 0, 37, 70367267824576, 0, 0, 117, 196092424128, 18, 70367268092080, -3819410108757049344, 0, 0, 0, 0, 8390898194479146030, 7018422612882190964, 8719174134808772608, 0, 0, 277, 3405691582, 0, 70367267785088, 34359738368, 268862464, 3, 4294967299, 1, 70367268105960, 70367268107992, 0, 0}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x3fffa804f6d8}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = pagesize_m1 = sp = freesize = __PRETTY_FUNCTION__ = "start_thread" Backtrace stopped: previous frame inner to this frame (corrupt stack?) Thread 13 (Thread 0x3fff96fff160 (LWP 560)): #0 0x00003fffb35e4764 in .__select () at ../sysdeps/unix/syscall-template.S:84 No locals. #1 0x00003fffaf2b4df0 in ?? () from /usr/lib64/glusterfs/5.4/xlator/features/changelog.so No symbol table info available. 
#2 0x00003fffb36a3b30 in start_thread (arg=0x3fff96fff160) at pthread_create.c:462 pd = 0x3fff96fff160 now = unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0, 0, 0, 0, 0, 70367268283136, 70367268283136, 0, 0, 0, 0, 0, 70367268283192, 70367268283192, 70367268283808, 1, 0, -4995072473058770944, 149, 206158430208, 64, 70367268268048, -3819410108757049344, 0, 0, 0, 0, 88, 64, 70367388068184, 0, 70367268267328, 70367268288960, 268615696, 70367461387880, -4995072473058770944, 101, 171798691840, 15, 70367268268048, -3819410108757049344, 0, 0, 0, 0, 3419478345679794793, 7163871753673900218, -5913212017086300160, 133, 171798691840, 37, 70367268268048, -3819410108757049344, 0, 0, 0, 0, 3419478345679794793, 7163871753673912110, 7452460622125822566, 8299961938229487461, 7813577622142234096, 936748722493063168, 0}, mask_was_saved = 0}}, priv = {pad = {0x28, 0x3fffa8076810, 0xcafebabe00000000, 0x0}, data = {prev = 0x28, cleanup = 0x3fffa8076810, canceltype = -889275714}}} not_first_call = pagesize_m1 = sp = freesize = __PRETTY_FUNCTION__ = "start_thread"
#3 0x0000000000000000 in ?? ()
No symbol table info available.
Thread 12 (Thread 0x3fffb049e160 (LWP 549)):
#0 0x00003fffb36abccc in __pthread_cond_timedwait (cond=0x100705a8, mutex=0x10070580, abstime=0x3fffb049d670) at pthread_cond_timedwait.c:198 r4 = 393 r7 = 0 arg5 = 0 arg2 = r5 = 26 r8 = 4294967295 arg6 = 4294967295 arg3 = 26 r0 = 221 r3 = 516 r6 = 70367406839408 arg4 = 70367406839408 arg1 = 268895660 __err = __ret = futex_val = 26 buffer = {__routine = @0x3fffb36c8b50: 0x3fffb36ab400 <__condvar_cleanup>, __arg = 0x3fffb049d540, __canceltype = 0, __prev = 0x0} cbuffer = {oldtype = 0, cond = 0x100705a8, mutex = 0x10070580, bc_seq = 6} result = 0 pshared = 0 pi_flag = 0 err = val = seq = 12
#1 0x00003fffb37dee54 in ?? () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#2 0x00003fffb37dfd4c in ?? () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#3 0x00003fffb36a3b30 in start_thread (arg=0x3fffb049e160) at pthread_create.c:462 pd = 0x3fffb049e160 now = unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0 , 268872368, 70367460585420, 70367415227936, 70367459256832, 0, 70367133610272, 1, 70367459451856, 1, 70366462478528, 70367267922320, 1, 0, 70367457788880, 70367415261360, 0, 0, 70367406845952, 70367415231008, 8388608, 70367459442720, 268872128, 268872128, 70367459468248, 70367461341232, 3, 0, 70367459468264, 70367774066048, 70367774066104, 4001536, 268872128, 70367133610160, 70367460585448, 0, 0, 70367457788880, 70367460585448, 0, 1107313796, 0, 0, 0, 0, 0, 0, 0, 0, 0}, mask_was_saved = 0}}, priv = {pad = { 0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = pagesize_m1 = sp = freesize = __PRETTY_FUNCTION__ = "start_thread"
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Thread 11 (Thread 0x3fff977ff160 (LWP 559)):
#0 0x00003fffb35e4764 in .__select () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1 0x00003fffaf2b4df0 in ?? () from /usr/lib64/glusterfs/5.4/xlator/features/changelog.so
No symbol table info available.
#2 0x00003fffb36a3b30 in start_thread (arg=0x3fff977ff160) at pthread_create.c:462 pd = 0x3fff977ff160 now = unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0, 0, 0, 0, 0, 70367268283136, 70367268283136, 0, 0, 0, 0, 0, 70367268283192, 70367268283192, 70367268283808, 1, 0, -4995072473058770944, 149, 206158430208, 64, 70367268268048, -3819410108757049344, 0, 0, 0, 0, 88, 64, 70367388068184, 0, 70367268267328, 70367268288960, 268615696, 70367461387880, -4995072473058770944, 101, 171798691840, 15, 70367268268048, -3819410108757049344, 0, 0, 0, 0, 3419478345679794793, 7163871753673900218, -5913212017086300160, 133, 171798691840, 37, 70367268268048, -3819410108757049344, 0, 0, 0, 0, 3419478345679794793, 7163871753673912110, 7452460622125822566, 8299961938229487461, 7813577622142234096, 936748722493063168, 0}, mask_was_saved = 0}}, priv = {pad = {0x28, 0x3fffa8076810, 0xcafebabe00000000, 0x0}, data = {prev = 0x28, cleanup = 0x3fffa8076810, canceltype = -889275714}}} not_first_call = pagesize_m1 = sp = freesize = __PRETTY_FUNCTION__ = "start_thread"
#3 0x0000000000000000 in ?? ()
No symbol table info available.
Thread 10 (Thread 0x3fffb149e160 (LWP 547)):
#0 0x00003fffb35b4044 in .__nanosleep () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1 0x00003fffb35b3e0c in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:138 ts = {tv_sec = 6, tv_nsec = 35069955} set = {__val = {65536, 0 }} oset = {__val = {18446744066193095191, 0, 0, 0, 0, 0, 70367423608448, 8388608, 70367423616680, 70367423616664, 70367423608456, 0, 70367461387752, 70367461387824, 70367461387864, 3133061822}} result =
#2 0x00003fffb37c675c in ?? () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#3 0x00003fffb36a3b30 in start_thread (arg=0x3fffb149e160) at pthread_create.c:462 pd = 0x3fffb149e160 now = unwind_buf = not_first_call = pagesize_m1 = sp = freesize = __PRETTY_FUNCTION__ = "start_thread"
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Thread 9 (Thread 0x3fffae753160 (LWP 552)):
#0 0x00003fffb36ab7b0 in __pthread_cond_wait (cond=0x3fffa803fa50, mutex=0x3fffa803fa28) at pthread_cond_wait.c:186 r4 = 128 r7 = 2 r5 = 27457 r8 = 2 arg3 = 27457 r0 = 221 r3 = 512 r6 = 0 arg4 = 0 __err = __ret = futex_val = 27457 buffer = {__routine = @0x3fffb36c8b50: 0x3fffb36ab400 <__condvar_cleanup>, __arg = 0x3fffae7525e0, __canceltype = 16383, __prev = 0x0} cbuffer = {oldtype = 0, cond = 0x3fffa803fa50, mutex = 0x3fffa803fa28, bc_seq = 0} err = pshared = 0 pi_flag = 0 val = seq = 13728
#1 0x00003fffb373825c in ?? () from /usr/lib64/libgfrpc.so.0
No symbol table info available.
#2 0x00003fffb36a3b30 in start_thread (arg=0x3fffae753160) at pthread_create.c:462 pd = 0x3fffae753160 now = unwind_buf = {cancel_jmp_buf = {{jmp_buf = {70367268036144, 70367268043288, 70367268043288, 0, 1, 0, 0, 0, 27457, 13729, 13728, 13728, 70367268043304, 8589934592, 70367376126304, 4294967297, -4995072473058770944, 309, 16, 0, 1, 0, -1, 0, -1, 0 , 117, 691489734656, 32, 70367268057072, -3819410108757049344, 0, 0, 0, 0}, mask_was_saved = 16383}}, priv = {pad = {0x0, 0x0, 0xbaadf00d00000000, 0x145}, data = {prev = 0x0, cleanup = 0x0, canceltype = -1163005939}}} not_first_call = pagesize_m1 = sp = freesize = __PRETTY_FUNCTION__ = "start_thread"
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Thread 8 (Thread 0x3fffafba2160 (LWP 550)):
#0 0x00003fffb35ee8f4 in .epoll_wait () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1 0x00003fffb3807730 in ?? () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#2 0x00003fffb36a3b30 in start_thread (arg=0x3fffafba2160) at pthread_create.c:462 pd = 0x3fffafba2160 now = unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0 }, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = pagesize_m1 = sp = freesize = __PRETTY_FUNCTION__ = "start_thread"
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Thread 7 (Thread 0x3fff9452c160 (LWP 674)):
#0 0x00003fffb36abccc in __pthread_cond_timedwait (cond=0x3fffa805c908, mutex=0x3fffa805c8e0, abstime=0x3fff9452b6d0) at pthread_cond_timedwait.c:198 r4 = 393 r7 = 0 arg5 = 0 arg2 = r5 = 26867 r8 = 4294967295 arg6 = 4294967295 arg3 = 26867 r0 = 221 r3 = 516 r6 = 70366937659088 arg4 = 70366937659088 arg1 = 70367268161804 __err = __ret = futex_val = 26867 buffer = {__routine = @0x3fffb36c8b50: 0x3fffb36ab400 <__condvar_cleanup>, __arg = 0x3fff9452b590, __canceltype = 16383, __prev = 0x0} cbuffer = {oldtype = 0, cond = 0x3fffa805c908, mutex = 0x3fffa805c8e0, bc_seq = 0} result = 0 pshared = 0 pi_flag = 0 err = val = seq = 13432
#1 0x00003fffaf13d380 in ?? () from /usr/lib64/glusterfs/5.4/xlator/performance/io-threads.so
No symbol table info available.
#2 0x00003fffb36a3b30 in start_thread (arg=0x3fff9452c160) at pthread_create.c:462 pd = 0x3fff9452c160 now = unwind_buf = {cancel_jmp_buf = {{jmp_buf = {70367268161864, 70367268161880, 70367268161880, 70367268161896, 70367268161896, 70367268161912, 70367268161912, 70367268161928, 70367268161928, 70367268161944, 70367268161944, 70367268161960, 70367268161960, 70367268161976, 70367268161976, 70367268161992, 70367268161992, 70367268162008, 70367268162008, 70367268162024, 70367268162024, 70367268162040, 70367268162040, 68719476752, 68719476737, 4294967296, 0, 0, 0, 0, 0, 0, 4096, 0, 262144, 0, 0, 72057594037927936, 70367267891312, 262144, 282574488338432, 0, 0, 0, -4995072473058770944, 309, 16, 0, 1, 0, -1, 0, -1, 0 }, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = pagesize_m1 = sp = freesize = __PRETTY_FUNCTION__ = "start_thread"
#3 0x0000000000000000 in ?? ()
No symbol table info available.
Thread 6 (Thread 0x3fffb38d5000 (LWP 544)):
#0 0x00003fffb36a5084 in pthread_join (threadid=70367397421408, thread_return=0x0) at pthread_join.c:90 r4 = 0 r7 = 2 arg2 = 0 r5 = 550 r8 = 2 arg3 = 550 r0 = 221 r3 = 512 r6 = 0 arg4 = 0 arg1 = 70367397421616 __err = __ret = __tid = 550 _buffer = {__routine = @0x3fffb36c8478: 0x3fffb36a4f70 , __arg = 0x3fffafba2588, __canceltype = 1735157363, __prev = 0x0} oldtype = 0 self = 0x3fffb38d5000 result = 0
#1 0x00003fffb3806be0 in ?? () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#2 0x00003fffb37c5214 in .event_dispatch () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#3 0x0000000010006358 in .main ()
No symbol table info available.
Thread 5 (Thread 0x3fff9456c160 (LWP 671)):
#0 0x00003fffb36abccc in __pthread_cond_timedwait (cond=0x3fffa805c908, mutex=0x3fffa805c8e0, abstime=0x3fff9456b6d0) at pthread_cond_timedwait.c:198 r4 = 393 r7 = 0 arg5 = 0 arg2 = r5 = 26865 r8 = 4294967295 arg6 = 4294967295 arg3 = 26865 r0 = 221 r3 = 516 r6 = 70366937921232 arg4 = 70366937921232 arg1 = 70367268161804 __err = __ret = futex_val = 26865 buffer = {__routine = @0x3fffb36c8b50: 0x3fffb36ab400 <__condvar_cleanup>, __arg = 0x3fff9456b590, __canceltype = 16383, __prev = 0x0} cbuffer = {oldtype = 0, cond = 0x3fffa805c908, mutex = 0x3fffa805c8e0, bc_seq = 0} result = 0 pshared = 0 pi_flag = 0 err = val = seq = 13431
#1 0x00003fffaf13d380 in ?? () from /usr/lib64/glusterfs/5.4/xlator/performance/io-threads.so
No symbol table info available.
#2 0x00003fffb36a3b30 in start_thread (arg=0x3fff9456c160) at pthread_create.c:462 pd = 0x3fff9456c160 now = unwind_buf = {cancel_jmp_buf = {{jmp_buf = {70367268161864, 70367268161880, 70367268161880, 70367268161896, 70367268161896, 70367268161912, 70367268161912, 70367268161928, 70367268161928, 70367268161944, 70367268161944, 70367268161960, 70367268161960, 70367268161976, 70367268161976, 70367268161992, 70367268161992, 70367268162008, 70367268162008, 70367268162024, 70367268162024, 70367268162040, 70367268162040, 68719476752, 68719476737, 4294967296, 0, 0, 0, 0, 0, 0, 4096, 0, 262144, 0, 0, 72057594037927936, 70367267891312, 262144, 282574488338432, 0, 0, 0, -4995072473058770944, 309, 16, 0, 1, 0, -1, 0, -1, 0 }, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = pagesize_m1 = sp = freesize = __PRETTY_FUNCTION__ = "start_thread"
#3 0x0000000000000000 in ?? ()
No symbol table info available.
Thread 4 (Thread 0x3fff956c7160 (LWP 563)):
#0 0x00003fffb36abccc in __pthread_cond_timedwait (cond=0x3fffa8084250, mutex=0x3fffa8084280, abstime=0x3fff956c66a0) at pthread_cond_timedwait.c:198 r4 = 393 r7 = 0 arg5 = 0 arg2 = r5 = 1215 r8 = 4294967295 arg6 = 4294967295 arg3 = 1215 r0 = 221 r3 = 516 r6 = 70366956119712 arg4 = 70366956119712 arg1 = 70367268323924 __err = __ret = futex_val = 1215 buffer = {__routine = @0x3fffb36c8b50: 0x3fffb36ab400 <__condvar_cleanup>, __arg = 0x3fff956c6560, __canceltype = 0, __prev = 0x0} cbuffer = {oldtype = 0, cond = 0x3fffa8084250, mutex = 0x3fffa8084280, bc_seq = 0} result = 0 pshared = 0 pi_flag = 0 err = val = seq = 607
#1 0x00003fffaf351278 in ?? () from /usr/lib64/glusterfs/5.4/xlator/storage/posix.so
No symbol table info available.
#2 0x00003fffb36a3b30 in start_thread (arg=0x3fff956c7160) at pthread_create.c:462 pd = 0x3fff956c7160 now = unwind_buf = {cancel_jmp_buf = {{jmp_buf = {75374666, 0, 72057594037927936, 70366956122464, 72057594037927936, 70367268357728, 70367268324832, 36, 25694, 5, 72018011619328, 0, 0, 100, 4096, 0, 263, 395262763, 308, 258307626, 308, 258307626, 0, 0, 0, 7539334392389520705, -8557774763185766717, 0, 0, 0, 0, 70366947733856, 70367268324272, 70367268324272, 0, 1, 0, 0, 0, 1, 1, 0, 0, 70367268324288, 8589934592, 3, 80384, 0, 128849018890, 70366964511072, 72057594037927937, 0, 70366972899680, 72057594037927937, 0, 511, 2194728288356, 0, -4995072473058770944, 341, 193273528320, 256, 70367268310096, -3819410108757049344}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x3132382e3232342e, 0x39352e3134300000}, data = {prev = 0x0, cleanup = 0x0, canceltype = 825374766}}} not_first_call = pagesize_m1 = sp = freesize = __PRETTY_FUNCTION__ = "start_thread"
#3 0x0000000000000000 in ?? ()
No symbol table info available.
Thread 3 (Thread 0x3fff97fff160 (LWP 558)):
#0 0x00003fffb35e4764 in .__select () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1 0x00003fffaf2b4df0 in ?? () from /usr/lib64/glusterfs/5.4/xlator/features/changelog.so
No symbol table info available.
#2 0x00003fffb36a3b30 in start_thread (arg=0x3fff97fff160) at pthread_create.c:462 pd = 0x3fff97fff160 now = unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0, 0, 0, 0, 0, 70367268283136, 70367268283136, 0, 0, 0, 0, 0, 70367268283192, 70367268283192, 70367268283808, 1, 0, -4995072473058770944, 149, 206158430208, 64, 70367268268048, -3819410108757049344, 0, 0, 0, 0, 88, 64, 70367388068184, 0, 70367268267328, 70367268288960, 268615696, 70367461387880, -4995072473058770944, 101, 171798691840, 15, 70367268268048, -3819410108757049344, 0, 0, 0, 0, 3419478345679794793, 7163871753673900218, -5913212017086300160, 133, 171798691840, 37, 70367268268048, -3819410108757049344, 0, 0, 0, 0, 3419478345679794793, 7163871753673912110, 7452460622125822566, 8299961938229487461, 7813577622142234096, 936748722493063168, 0}, mask_was_saved = 0}}, priv = {pad = {0x28, 0x3fffa8076810, 0xcafebabe00000000, 0x0}, data = {prev = 0x28, cleanup = 0x3fffa8076810, canceltype = -889275714}}} not_first_call = pagesize_m1 = sp = freesize = __PRETTY_FUNCTION__ = "start_thread"
#3 0x0000000000000000 in ?? ()
No symbol table info available.
Thread 2 (Thread 0x3fffb0c9e160 (LWP 548)):
#0 0x00003fffb36abccc in __pthread_cond_timedwait (cond=0x100705a8, mutex=0x10070580, abstime=0x3fffb0c9d670) at pthread_cond_timedwait.c:198 r4 = 393 r7 = 0 arg5 = 0 arg2 = r5 = 25 r8 = 4294967295 arg6 = 4294967295 arg3 = 25 r0 = 221 r3 = 516 r6 = 70367415228016 arg4 = 70367415228016 arg1 = 268895660 __err = __ret = futex_val = 25 buffer = {__routine = @0x3fffb36c8b50: 0x3fffb36ab400 <__condvar_cleanup>, __arg = 0x3fffb0c9d540, __canceltype = 16383, __prev = 0x0} cbuffer = {oldtype = 0, cond = 0x100705a8, mutex = 0x10070580, bc_seq = 6} result = 0 pshared = 0 pi_flag = 0 err = val = seq = 12
#1 0x00003fffb37dee54 in ??
() from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#2 0x00003fffb37dfd4c in ?? () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#3 0x00003fffb36a3b30 in start_thread (arg=0x3fffb0c9e160) at pthread_create.c:462 pd = 0x3fffb0c9e160 now = unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0 , 268872368, 70367460585420, 70367415227936, 70367459256832, 0, 70367133610272, 1, 70367459451856, 1, 70366462478528, 70367267922320, 1, 0, 70367457788880, 70367415261360, 0, 0, 70367406845952, 70367415231008, 8388608, 70367459442720, 268872128, 268872128, 70367459468248, 70367461341232, 3, 0, 70367459468264, 70367774066048, 70367774066104, 4001536, 268872128, 70367133610160, 70367460585448, 0, 0, 70367457788880, 70367460585448, 0, 1107313796, 0, 0, 0, 0, 0, 0, 0, 0, 0}, mask_was_saved = 0}}, priv = {pad = { 0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = pagesize_m1 = sp = freesize = __PRETTY_FUNCTION__ = "start_thread"
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Thread 1 (Thread 0x3fffad553160 (LWP 554)):
#0 0x00003fffb3570d48 in _int_malloc (av=av@entry=0x3fffa4000020, bytes=bytes@entry=36) at malloc.c:3327 nb = idx = bin = victim = size = victim_index = remainder = remainder_size = block = bit = map = fwd = bck = errstr = 0x0 __func__ = "_int_malloc"
#1 0x00003fffb35733dc in __GI___libc_malloc (bytes=36) at malloc.c:2921 ar_ptr = 0x3fffa4000020 victim = hook = __func__ = "__libc_malloc"
#2 0x00003fffb362efd0 in x_inline (xdrs=0x3fffad550d20, len=) at xdr_sizeof.c:89 len = 36 xdrs = 0x3fffad550d20
#3 0x00003fffb370c488 in .xdr_gfx_iattx () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#4 0x00003fffb370ce84 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#5 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e9c10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e9cc0 "\275\270^m\371j\233O" stat =
#6 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e9c10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#7 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#8 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e9a90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e9b40 "\241\315\264rh<\b\274" stat =
#9 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e9a90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#10 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#11 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e9910, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e99c0 "\266+W$\331o6\310" stat =
#12 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e9910, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#13 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#14 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e9790, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e9840 "\262hF[\200\331\236\336" stat =
#15 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e9790, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#16 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#17 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e9610, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e96c0 "\224T\fu\021Iw\021" stat =
#18 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e9610, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#19 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#20 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e9490, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e9540 "\237n(\216\211\246[\261" stat =
#21 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e9490, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#22 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#23 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e9310, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e93c0 "\235N~8\213&\221\356" stat =
#24 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e9310, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#25 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#26 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e9190, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e9240 "\221\351\207\227\a \346" stat =
#27 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e9190, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#28 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#29 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e9010, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e90c0 "\241\250?\337\037\355\v\r" stat =
#30 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e9010, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#31 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#32 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e8e90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e8f40 "\261\033(\201\022%q," stat =
#33 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e8e90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#34 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#35 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e8d10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e8dc0 "\273\372\242K\351T\035\360" stat =
#36 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e8d10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#37 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#38 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e8b90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e8c40 "\200z~h\330-\342}" stat =
#39 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e8b90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#40 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#41 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e8a10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e8ac0 "\264\016\246~\222\031z7" stat =
#42 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e8a10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#43 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#44 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e8890, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e8940 "\202
#45 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e8890, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#46 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#47 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e8710, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e87c0 "\227\341\252" stat =
#48 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e8710, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#49 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#50 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e8590, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e8640 "\206\322F,=
#51 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e8590, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#52 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#53 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e8410, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e84c0 "\272Y\210ot7\004j" stat =
#54 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e8410, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#55 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#56 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e8290, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e8340 "\200\306\001\317\375\a\307\206" stat =
#57 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e8290, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#58 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#59 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e8110, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e81c0 "\203\067\327\224x\037\370\021" stat =
#60 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e8110, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#61 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#62 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e7f90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e8040 "\203\344p\225\033\322\345W" stat =
#63 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e7f90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#64 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#65 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e7e10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e7ec0 "\233\234\326iS\306\236\277" stat =
#66 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e7e10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#67 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#68 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e7c90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e7d40 "\211\350\030R\344g#\303" stat =
#69 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e7c90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#70 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#71 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e7b10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e7bc0 "\245=\277\374\036M|\202" stat =
#72 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e7b10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#73 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#74 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e7990, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e7a40 "\206N\261\372\320\341\371\365" stat =
#75 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e7990, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#76 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#77 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e7810, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e78c0 "\224\271\271\216|\204~%" stat =
#78 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e7810, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#79 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#80 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e7690, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e7740 "\224\351\333\274\354A-\233" stat =
#81 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e7690, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#82 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#83 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e7510, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e75c0 "\220\345\357\325" stat =
#84 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e7510, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#85 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#86 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e7390, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e7440 "\212\216I\320\006\244\335\032" stat =
#87 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e7390, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#88 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#89 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e7210, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e72c0 "\201\b[\314X6[\273" stat =
#90 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e7210, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#91 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#92 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e7090, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e7140 "\214#2$\210>\303\f" stat =
#93 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e7090, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#94 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#95 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e6f10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e6fc0 "\250\067\071\327\244\334lx" stat =
#96 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e6f10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#97 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#98 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e6d90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e6e40 "\236\250\337\202\246\307\003\367" stat =
#99 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e6d90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1
#100 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#101 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e6c10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e6cc0 "\222E\305\310>\362<\300" stat = #102 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e6c10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #103 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #104 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e6a90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e6b40 "\225C\346\322P\322f-" stat = #105 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e6a90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #106 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #107 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e6910, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e69c0 "\214W85\017\033\273\200" stat = #108 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e6910, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #109 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #110 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e6790, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e6840 "\276U\v(\n\301\360$" ---Type to continue, or q to quit--- stat = #111 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e6790, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #112 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#113 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e6610, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e66c0 "\261\271\005j\334<9Y" stat = #114 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e6610, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #115 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #116 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e6490, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e6540 "\215jE2\222\067\070\004" stat = #117 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e6490, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #118 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #119 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e6310, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e63c0 "\236\307n\021n\314\003\213" stat = #120 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e6310, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #121 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #122 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e6190, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e6240 "\234n\363T\021\235\243\312" stat = #123 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e6190, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #124 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#125 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e6010, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e60c0 "\202\222A\233\225\001\327U" stat = #126 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e6010, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #127 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 ---Type to continue, or q to quit--- No symbol table info available. #128 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e5e90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e5f40 "\274\344\270t\346%r\021" stat = #129 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e5e90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #130 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #131 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e5d10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e5dc0 "\215\agF\332\337\310W" stat = #132 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e5d10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #133 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #134 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e5b90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e5c40 "\234\066\022\270\226\t\221\065" stat = #135 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e5b90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #136 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#137 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e5a10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e5ac0 "\237\210\006o[lc\335" stat = #138 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e5a10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #139 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #140 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e5890, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e5940 "\202\326\256[\256!\341g" stat = #141 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e5890, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #142 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #143 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e5710, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e57c0 "\226\262V\r\325\037;p" stat = #144 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e5710, obj_size=, ---Type to continue, or q to quit--- xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #145 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #146 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e5590, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e5640 "\223\017uz\213\243\302\247" stat = #147 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e5590, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #148 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#149 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e5410, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e54c0 "\264\302\346Z\330\231\363\060" stat = #150 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e5410, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #151 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #152 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e5290, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e5340 "\210s\223" stat = #153 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e5290, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #154 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #155 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e5110, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e51c0 "\204\204\333\254\330>\376\070" stat = #156 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e5110, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #157 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #158 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e4f90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e5040 "\223\254+j\250\266\263l" stat = #159 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e4f90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #160 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#161 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e4e10, size=, proc=) at xdr_ref.c:84 ---Type to continue, or q to quit--- loc = 0x3fffa40e4ec0 "\223\062\203\341q\237ke" stat = #162 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e4e10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #163 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #164 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e4c90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e4d40 "\231\362\336x\303\267\221\213" stat = #165 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e4c90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #166 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #167 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e4b10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e4bc0 "\276R\036\201w\r\304\236" stat = #168 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e4b10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #169 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #170 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e4990, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e4a40 "\211\373E\335;\017/*" stat = #171 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e4990, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #172 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#173 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e4810, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e48c0 "\226=\227\311\006\311\372\333" stat = #174 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e4810, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #175 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #176 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e4690, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e4740 "\220sT0\031\303\274+" stat = #177 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e4690, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 ---Type to continue, or q to quit--- #178 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #179 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e4510, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e45c0 "\222\313\062\250\264)\256\070" stat = #180 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e4510, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #181 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #182 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e4390, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e4440 "\252~2 \203\223\234T" stat = #183 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e4390, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #184 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#185 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e4210, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e42c0 "\256D\t\257_Nk\016" stat = #186 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e4210, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #187 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #188 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e4090, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e4140 "\262\272\256\212\362\020\365\226" stat = #189 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e4090, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #190 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #191 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e3f10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e3fc0 "\227bN\323\201T\022\320" stat = #192 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e3f10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #193 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #194 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e3d90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e3e40 "\206 \204\217\350\213\236M" stat = ---Type to continue, or q to quit--- #195 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e3d90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #196 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#197 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e3c10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e3cc0 "\201\355\370v\023\320\204\375" stat = #198 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e3c10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #199 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #200 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e3a90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e3b40 "\200n+\246\245\317-\247" stat = #201 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e3a90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #202 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #203 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e3910, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e39c0 "\274;\263l\350\257\205\060" stat = #204 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e3910, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #205 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #206 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e3790, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e3840 "\231\312:\316\346\345\245," stat = #207 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e3790, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #208 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#209 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e3610, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e36c0 "\251\321RL\306N\324~" stat = #210 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e3610, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #211 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. ---Type to continue, or q to quit--- #212 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e3490, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e3540 "\225W\320\300\334\327/ " stat = #213 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e3490, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #214 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #215 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e3310, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e33c0 "\210&\\2E\023+H" stat = #216 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e3310, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #217 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #218 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e3190, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e3240 "\275C\335\217\215\033E\303" stat = #219 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e3190, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #220 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#221 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e3010, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e30c0 "\225\252\265b\344\237+\r" stat = #222 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e3010, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #223 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #224 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e2e90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e2f40 "\224\341\066+ \226\241\205" stat = #225 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e2e90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #226 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #227 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e2d10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e2dc0 "\217\222\032\263IX19" stat = #228 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e2d10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 ---Type to continue, or q to quit--- more_data = 1 #229 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #230 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e2b90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e2c40 "\261\223\312\217\352I\021\222" stat = #231 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e2b90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #232 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#233 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e2a10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e2ac0 "\246\316\372\217\277\341\213\244" stat = #234 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e2a10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #235 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #236 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e2890, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e2940 "\205v`p\320a\225\364" stat = #237 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e2890, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #238 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #239 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e2710, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e27c0 "\233\025o\347\060\215e\023" stat = #240 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e2710, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #241 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #242 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e2590, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e2640 "\270\253\255O+\260\214)" stat = #243 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e2590, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #244 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#245 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e2410, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e24c0 "\256JQ6\330\226\317M" ---Type to continue, or q to quit--- stat = #246 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e2410, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #247 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #248 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e2290, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e2340 "\240eJ`\313&\t\245" stat = #249 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e2290, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #250 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #251 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e2110, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e21c0 "\237\260F\345\004\307\020\357" stat = #252 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e2110, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #253 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #254 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e1f90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e2040 "\200\336\062\333\201\"`@" stat = #255 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e1f90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #256 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#257 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e1e10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e1ec0 "\250\207\236\067\066\062\210\260" stat = #258 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e1e10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #259 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #260 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e1c90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e1d40 "\244\064\206\260\342\344\221\224" stat = #261 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e1c90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #262 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 ---Type to continue, or q to quit--- No symbol table info available. #263 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e1b10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e1bc0 "\242\003\033\361\025!Q\250" stat = #264 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e1b10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #265 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #266 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e1990, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e1a40 "\202\337/\243\367MD\243" stat = #267 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e1990, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #268 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#269 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e1810, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e18c0 "\255W\234\020\250kC\240" stat = #270 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e1810, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #271 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #272 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e1690, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e1740 "\207V\235E\362\363\203n" stat = #273 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e1690, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #274 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #275 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e1510, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e15c0 "\211\340\210\261$\005\250#" stat = #276 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e1510, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #277 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #278 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e1390, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40e1440 "\217K\a\003d\r\r*" stat = #279 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e1390, obj_size=, ---Type to continue, or q to quit--- xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #280 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#281 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e1210, size=<optimized out>, proc=<optimized out>) at xdr_ref.c:84
        loc = 0x3fffa40e12c0 "\233\027\354\247\256R\227'"
        stat = <optimized out>
#282 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e1210, obj_size=<optimized out>, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135
        more_data = 1
#283 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#284 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e1090, size=<optimized out>, proc=<optimized out>) at xdr_ref.c:84
        loc = 0x3fffa40e1140 "\246\061\177\032}17\232"
        stat = <optimized out>
#285 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e1090, obj_size=<optimized out>, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135
        more_data = 1
#286 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#287 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40e0f10, size=<optimized out>, proc=<optimized out>) at xdr_ref.c:84
        loc = 0x3fffa40e0fc0 "\241\060\023\v\026\221S\370"
        stat = <optimized out>
#288 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40e0f10, obj_size=<optimized out>, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135
        more_data = 1
#289 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
[frames #290 through #532 repeat the same three-frame cycle -- __GI_xdr_reference (xdr_ref.c:84) -> __GI_xdr_pointer (xdr_ref.c:135, more_data = 1) -> .xdr_gfx_dirplist (/usr/lib64/libgfxdr.so.0) -- one cycle per directory-list entry, differing only in buffer addresses; gdb pager prompts ("---Type <return> to continue, or q <return> to quit---") omitted]
#533 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d9410, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d94c0 "\276j\026\342>1\244\335" stat = #534 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d9410, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #535 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #536 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d9290, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d9340 "\213\367\a\270\227:\233\334" stat = #537 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d9290, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #538 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #539 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d9110, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d91c0 "\234\333J\331qOaD" stat = #540 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d9110, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #541 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #542 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d8f90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d9040 "\255\304\300\032[\207ux" stat = #543 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d8f90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #544 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#545 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d8e10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d8ec0 "\210\244\346\263\037yx\027" stat = #546 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d8e10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #547 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #548 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d8c90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d8d40 "\257\356\263\370\335\200\242]" stat = #549 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d8c90, obj_size=, ---Type to continue, or q to quit--- xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #550 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #551 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d8b10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d8bc0 "\272j\357^\377M\267K" stat = #552 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d8b10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #553 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #554 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d8990, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d8a40 "\265\v\025\002\231x\210=" stat = #555 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d8990, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #556 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#557 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d8810, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d88c0 "\247vE\350S\022R\337" stat = #558 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d8810, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #559 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #560 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d8690, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d8740 "\232\353\374\334\240\315$j" stat = #561 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d8690, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #562 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #563 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d8510, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d85c0 "\215Q\v\200s\320\066H" stat = #564 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d8510, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #565 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #566 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d8390, size=, proc=) at xdr_ref.c:84 ---Type to continue, or q to quit--- loc = 0x3fffa40d8440 "\232F`kTw\313\221" stat = #567 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d8390, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #568 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#569 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d8210, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d82c0 "\250\227!\251w\354\276e" stat = #570 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d8210, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #571 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #572 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d8090, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d8140 "\210\352\370\225\221Y\243B" stat = #573 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d8090, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #574 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #575 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d7f10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d7fc0 "\255\061F\363\206\277\004\320" stat = #576 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d7f10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #577 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #578 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d7d90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d7e40 "\200\217\025k5:\263[" stat = #579 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d7d90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #580 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#581 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d7c10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d7cc0 "\267\211n~\356\364\062\200" stat = #582 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d7c10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 ---Type to continue, or q to quit--- #583 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #584 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d7a90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d7b40 "\276\342\326\f\236Ty(" stat = #585 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d7a90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #586 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #587 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d7910, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d79c0 "\231\342\230\a\364\031\232\376" stat = #588 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d7910, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #589 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #590 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d7790, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d7840 "\262\264_\313\327\341\364\f" stat = #591 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d7790, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #592 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#593 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d7610, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d76c0 "\237\061[\\\306\371iQ" stat = #594 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d7610, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #595 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #596 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d7490, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d7540 "\216P\240xV\270\365-" stat = #597 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d7490, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #598 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #599 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d7310, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d73c0 "\212\233\060cI\204;x" stat = ---Type to continue, or q to quit--- #600 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d7310, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #601 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #602 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d7190, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d7240 "\275av\323\273\004g\304" stat = #603 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d7190, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #604 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#605 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d7010, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d70c0 "\276XQ\310\070\067~\311" stat = #606 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d7010, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #607 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #608 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d6e90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d6f40 "\243\231\270>\366/A\331" stat = #609 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d6e90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #610 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #611 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d6d10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d6dc0 "\246\345*\220\177j\233\340" stat = #612 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d6d10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #613 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #614 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d6b90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d6c40 "\227\325N\320\227b\306k" stat = #615 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d6b90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #616 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
---Type to continue, or q to quit--- #617 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d6a10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d6ac0 "\207_:6\277\367 #618 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d6a10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #619 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #620 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d6890, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d6940 "\274\314.\216\002f\344." stat = #621 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d6890, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #622 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #623 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d6710, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d67c0 "\256zq\321\353\346\341\276" stat = #624 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d6710, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #625 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #626 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d6590, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d6640 "\246\001\r\262\351\061\064\n" stat = #627 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d6590, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #628 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#629 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d6410, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d64c0 "\277\357\253\237\266'\341\305" stat = #630 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d6410, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #631 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #632 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d6290, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d6340 "\245\233:\002D#\261\366" stat = #633 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d6290, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 ---Type to continue, or q to quit--- more_data = 1 #634 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #635 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d6110, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d61c0 "\263\002N\033\330\035\022\353" stat = #636 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d6110, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #637 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #638 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d5f90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d6040 "\215\314\375$\336\200\060\377" stat = #639 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d5f90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #640 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#641 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d5e10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d5ec0 "\276;,F\032\304#\003" stat = #642 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d5e10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #643 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #644 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d5c90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d5d40 "\251H+\300\301\217\366\263" stat = #645 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d5c90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #646 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #647 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d5b10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d5bc0 "\205\326w\235\310\022\361\375" stat = #648 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d5b10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #649 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #650 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d5990, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d5a40 "\202\t\252\r4\"\241\226" ---Type to continue, or q to quit--- stat = #651 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d5990, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #652 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#653 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d5810, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d58c0 "\211\250TR\230\316\322;" stat = #654 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d5810, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #655 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #656 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d5690, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d5740 "\275\027\207$\031\061\207\266" stat = #657 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d5690, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #658 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #659 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d5510, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d55c0 "\243*ew)t\222\064" stat = #660 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d5510, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #661 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #662 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d5390, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d5440 "\247\067\335\360\342\236\207 " stat = #663 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d5390, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #664 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#665 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d5210, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d52c0 "\265y\r\347\261\060\204G" stat = #666 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d5210, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #667 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 ---Type to continue, or q to quit--- No symbol table info available. #668 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d5090, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d5140 "\241\315^\327v*\006\232" stat = #669 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d5090, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #670 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #671 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d4f10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d4fc0 "\234\353(S=\214\267v" stat = #672 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d4f10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #673 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #674 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d4d90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d4e40 "\240F\320Z\214\277\262W" stat = #675 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d4d90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #676 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#677 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d4c10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d4cc0 "\262}\341\266\031\071\b\244" stat = #678 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d4c10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #679 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #680 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d4a90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d4b40 "\244\214\334\b\325t\205\316" stat = #681 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d4a90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #682 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #683 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d4910, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d49c0 "\220!\b!\264W\377\251" stat = #684 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d4910, obj_size=, ---Type to continue, or q to quit--- xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #685 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #686 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d4790, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40d4840 "\251.\373e\232|\323\235" stat = #687 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d4790, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #688 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#689 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d4610, size=<optimized out>, proc=<optimized out>) at xdr_ref.c:84
        loc = 0x3fffa40d46c0 "\223\066\351\004\277\333\022Y"
        stat = <optimized out>
#690 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d4610, obj_size=<optimized out>, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135
        more_data = 1
#691 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#692 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40d4490, size=<optimized out>, proc=<optimized out>) at xdr_ref.c:84
        loc = 0x3fffa40d4540 "\252\303\005\372\311$\226\267"
        stat = <optimized out>
#693 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40d4490, obj_size=<optimized out>, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135
        more_data = 1
#694 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
[frames #695-#937 repeat the same three-frame cycle, __GI_xdr_reference (xdr_ref.c:84) -> __GI_xdr_pointer (xdr_ref.c:135) -> .xdr_gfx_dirplist (/usr/lib64/libgfxdr.so.0), one cycle per entry of the XDR-decoded gfx_dirplist linked list, differing only in successive stack addresses]
#938 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40cc990, size=<optimized out>, proc=<optimized out>) at xdr_ref.c:84
        loc = 0x3fffa40cca40 "\275\005\217/\317a\365"
        stat = <optimized out>
#939 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40cc990, obj_size=<optimized out>, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135
        more_data = 1
#940 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#941 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40cc810, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40cc8c0 "\222\210XN\335\305\247\252" stat = #942 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40cc810, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #943 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #944 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40cc690, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40cc740 "\240\371\364N\216T\252\364" stat = #945 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40cc690, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #946 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #947 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40cc510, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40cc5c0 "\266\343\356\362a\a\250\261" stat = #948 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40cc510, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #949 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #950 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40cc390, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40cc440 "\214\006e\035QNp\210" stat = #951 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40cc390, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #952 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#953 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40cc210, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40cc2c0 "\221\206\333\020\265X!{" stat = #954 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40cc210, obj_size=, ---Type to continue, or q to quit--- xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #955 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #956 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40cc090, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40cc140 "\231z #957 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40cc090, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #958 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #959 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40cbf10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40cbfc0 "\235x\263\243i\001\240\275" stat = #960 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40cbf10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #961 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #962 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40cbd90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40cbe40 "\253\207\311x#\347H\376" stat = #963 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40cbd90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #964 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#965 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40cbc10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40cbcc0 "\215\315p\235\354\206\370L" stat = #966 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40cbc10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #967 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #968 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40cba90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40cbb40 "\246&\330oA\322\357\211" stat = #969 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40cba90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #970 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #971 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40cb910, size=, proc=) at xdr_ref.c:84 ---Type to continue, or q to quit--- loc = 0x3fffa40cb9c0 "\257\313\252\376\322\064v\006" stat = #972 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40cb910, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #973 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #974 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40cb790, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40cb840 "\212\022\f\317\231\343\330[" stat = #975 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40cb790, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #976 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#977 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40cb610, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40cb6c0 "\240\314\234n\204\324\217/" stat = #978 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40cb610, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #979 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #980 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40cb490, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40cb540 "\271\235O\256':\222\207" stat = #981 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40cb490, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #982 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #983 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40cb310, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40cb3c0 "\257X\234\254x\022\030\027" stat = #984 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40cb310, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #985 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #986 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40cb190, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40cb240 "\255\330\334\071J\245l\374" stat = #987 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40cb190, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 ---Type to continue, or q to quit--- #988 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#989 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40cb010, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40cb0c0 "\217\063\275\337)\365\225\231" stat = #990 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40cb010, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #991 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #992 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40cae90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40caf40 "\224vA\365C\245p\326" stat = #993 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40cae90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #994 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #995 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40cad10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40cadc0 "\201\033\222?[\314nw" stat = #996 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40cad10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #997 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #998 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40cab90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40cac40 "\247\236\023\333\200\220Ql" stat = #999 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40cab90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1000 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#1001 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40caa10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40caac0 "\220[\020\036\213V6!" stat = #1002 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40caa10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1003 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1004 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40ca890, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40ca940 "\234k:\036" stat = ---Type to continue, or q to quit--- #1005 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40ca890, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1006 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1007 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40ca710, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40ca7c0 "\210`\230\371x\302p\362" stat = #1008 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40ca710, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1009 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1010 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40ca590, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40ca640 "\207k=\254\333\244\r2" stat = #1011 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40ca590, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1012 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#1013 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40ca410, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40ca4c0 "\240!\242]0/\260;" stat = #1014 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40ca410, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1015 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1016 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40ca290, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40ca340 "\267\001\001\224\203\216\223\200" stat = #1017 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40ca290, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1018 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1019 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40ca110, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40ca1c0 "\243!\306G\222\303V\372" stat = #1020 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40ca110, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1021 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. ---Type to continue, or q to quit--- #1022 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40c9f90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40ca040 "\264\314\003\375\a\263\260Q" stat = #1023 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40c9f90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1024 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#1025 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40c9e10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40c9ec0 "\277\027\241\034\375)\207\r" stat = #1026 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40c9e10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1027 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1028 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40c9c90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40c9d40 "\213\231H\003\235\322\226\305" stat = #1029 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40c9c90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1030 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1031 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40c9b10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40c9bc0 "\243Q&&\200\a\324\241" stat = #1032 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40c9b10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1033 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1034 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40c9990, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40c9a40 "\215\234$\243\350\030\243\242" stat = #1035 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40c9990, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1036 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#1037 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40c9810, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40c98c0 "\224\206\207\361\343O8s" stat = #1038 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40c9810, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 ---Type to continue, or q to quit--- more_data = 1 #1039 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1040 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40c9690, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40c9740 "\254s\233\177\245I\372-" stat = #1041 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40c9690, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1042 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1043 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40c9510, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40c95c0 "\220\270\310\201\035\033}\t" stat = #1044 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40c9510, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1045 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1046 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40c9390, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40c9440 "\217j|\261\265O\211<" stat = #1047 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40c9390, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1048 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#1049 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40c9210, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40c92c0 "\241\305\016\254g`\273\274" stat = #1050 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40c9210, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1051 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1052 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40c9090, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40c9140 "\250\226\260.P\026\071\036" stat = #1053 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40c9090, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1054 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1055 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40c8f10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40c8fc0 "\204\376\017\035\335\033\253\206" ---Type to continue, or q to quit--- stat = #1056 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40c8f10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1057 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1058 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40c8d90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40c8e40 "\201\206\064Z>\240a" stat = #1059 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40c8d90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1060 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#1061 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40c8c10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40c8cc0 "\252%\264\201\332\336(\277" stat = #1062 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40c8c10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1063 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1064 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40c8a90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40c8b40 "\216\372\231\252o\223\255$" stat = #1065 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40c8a90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1066 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1067 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40c8910, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40c89c0 "\230Y\302\b\365E\225y" stat = #1068 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40c8910, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1069 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1070 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40c8790, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40c8840 "\224@\v\371\244\351\"\234" stat = #1071 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40c8790, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1072 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 ---Type to continue, or q to quit--- No symbol table info available. 
#1073 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40c8610, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40c86c0 "\227\017\261R$\257uH" stat = #1074 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40c8610, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1075 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1076 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40c8490, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40c8540 "\230$&\322\321\344\205\301" stat = #1077 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40c8490, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1078 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1079 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40c8310, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40c83c0 "\275\304\255\344\300\001\211\365" stat = #1080 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40c8310, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1081 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1082 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40c8190, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40c8240 "\256\333\333\347\070\304\343\251" stat = #1083 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40c8190, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1084 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#1085 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40c8010, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40c80c0 "\255\345\224x\260\331\252\005" stat = #1086 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40c8010, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1087 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1088 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40c7e90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40c7f40 "\213\212em\234\376\337\233" stat = #1089 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40c7e90, obj_size=, ---Type to continue, or q to quit--- xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1090 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1091 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40c7d10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40c7dc0 "\260\206vH\213\202\235\357" stat = #1092 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40c7d10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1093 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1094 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40c7b90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40c7c40 "\255\v\324\301%\206\252\357" stat = #1095 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40c7b90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1096 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#1097 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40c7a10, size=<optimized out>, proc=<optimized out>) at xdr_ref.c:84
        loc = 0x3fffa40c7ac0 "\271\221;C\002\200\265Q"
        stat = <optimized out>
#1098 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40c7a10, obj_size=<optimized out>,
    xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135
        more_data = 1
#1099 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#1100 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40c7890, size=<optimized out>, proc=<optimized out>) at xdr_ref.c:84
        loc = 0x3fffa40c7940 "\235\257\267\305\031(\f\207"
        stat = <optimized out>
#1101 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40c7890, obj_size=<optimized out>,
    xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135
        more_data = 1
#1102 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
[frames #1103 through #1345 repeat the same three-frame cycle, __GI_xdr_reference (xdr_ref.c:84) -> __GI_xdr_pointer (xdr_ref.c:135, more_data = 1) -> .xdr_gfx_dirplist (libgfxdr.so.0), one cycle per directory entry, with only the pp/objpp buffer addresses changing]
#1346 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bfd90, size=<optimized out>, proc=<optimized out>) at xdr_ref.c:84
        loc = 0x3fffa40bfe40 "\250\366$\335\023\230z("
        stat = <optimized out>
#1347 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bfd90, obj_size=<optimized out>,
    xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135
        more_data = 1
#1348 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0
No symbol table info available.
#1349 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bfc10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bfcc0 "\270\200\373j\362\227\305\b" stat = #1350 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bfc10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1351 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1352 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bfa90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bfb40 "\257x\341^;\347\300\357" stat = #1353 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bfa90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1354 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1355 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bf910, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bf9c0 "\255_|\261\221\201\267\064" stat = #1356 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bf910, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1357 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1358 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bf790, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bf840 "\270\312r\263~\240\020\231" stat = #1359 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bf790, obj_size=, ---Type to continue, or q to quit--- xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1360 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#1361 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bf610, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bf6c0 "\242~Rd.{EL" stat = #1362 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bf610, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1363 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1364 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bf490, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bf540 "\227]\261-\213u\242\305" stat = #1365 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bf490, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1366 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1367 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bf310, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bf3c0 "\235\244\273\020\364Lue" stat = #1368 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bf310, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1369 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1370 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bf190, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bf240 "\200\035\250\020f\373\273\002" stat = #1371 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bf190, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1372 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#1373 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bf010, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bf0c0 "\262;\257\337uw6#" stat = #1374 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bf010, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1375 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1376 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bee90, size=, proc=) at xdr_ref.c:84 ---Type to continue, or q to quit--- loc = 0x3fffa40bef40 "\217D\267\370\067\230HS" stat = #1377 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bee90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1378 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1379 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bed10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bedc0 "\256\230\332\327\251G\026l" stat = #1380 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bed10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1381 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1382 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40beb90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bec40 "\223R\035\r\363\004\016\321" stat = #1383 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40beb90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1384 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#1385 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bea10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40beac0 "\224F\016\363\313\356A6" stat = #1386 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bea10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1387 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1388 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40be890, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40be940 "\237E\257\316=\344\061:" stat = #1389 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40be890, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1390 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1391 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40be710, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40be7c0 "\266>\242j\032]t{" stat = #1392 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40be710, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 ---Type to continue, or q to quit--- #1393 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1394 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40be590, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40be640 "\225q\241\032\312\272\276z" stat = #1395 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40be590, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1396 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#1397 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40be410, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40be4c0 "\247\242\036\234'\357\021s" stat = #1398 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40be410, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1399 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1400 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40be290, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40be340 "\227\004c\266\020c$(" stat = #1401 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40be290, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1402 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1403 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40be110, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40be1c0 "\200\217\221\035\244u\214j" stat = #1404 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40be110, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1405 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1406 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bdf90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40be040 "\252\265\346]\031\065\374\221" stat = #1407 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bdf90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1408 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#1409 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bde10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bdec0 "\213h ---Type to continue, or q to quit--- #1410 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bde10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1411 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1412 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bdc90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bdd40 "\253\375\310\027\033\317t\233" stat = #1413 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bdc90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1414 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1415 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bdb10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bdbc0 "\261D\237\325\006X\277\323" stat = #1416 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bdb10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1417 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1418 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bd990, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bda40 "\211\375\212\026\320\021\307\\" stat = #1419 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bd990, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1420 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#1421 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bd810, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bd8c0 "\262\375gUz\027\037\"" stat = #1422 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bd810, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1423 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1424 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bd690, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bd740 "\255\066\212_C\206\253S" stat = #1425 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bd690, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1426 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. ---Type to continue, or q to quit--- #1427 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bd510, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bd5c0 "\226~\341i{(x\217" stat = #1428 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bd510, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1429 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1430 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bd390, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bd440 "\207\312\005\200\364L\202\326" stat = #1431 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bd390, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1432 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#1433 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bd210, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bd2c0 "\263g\326\353t\241s\215" stat = #1434 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bd210, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1435 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1436 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bd090, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bd140 "\245\035\227\273o\300^\001" stat = #1437 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bd090, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1438 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1439 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bcf10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bcfc0 "\267\001\241\265\027!\275\335" stat = #1440 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bcf10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1441 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1442 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bcd90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bce40 "\211\245\024C\267\212X\247" stat = #1443 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bcd90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 ---Type to continue, or q to quit--- more_data = 1 #1444 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#1445 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bcc10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bccc0 "\272_<#^\024d6" stat = #1446 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bcc10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1447 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1448 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bca90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bcb40 "\250A\r\324\341\360" stat = #1449 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bca90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1450 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1451 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bc910, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bc9c0 "\203!n\264\b#{k" stat = #1452 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bc910, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1453 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1454 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bc790, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bc840 "\261\031)9\004]hl" stat = #1455 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bc790, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1456 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#1457 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bc610, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bc6c0 "\217\030Y\310\274\344\203\317" stat = #1458 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bc610, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1459 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1460 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bc490, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bc540 "\251\354\337\263\321\067B" ---Type to continue, or q to quit--- stat = #1461 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bc490, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1462 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1463 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bc310, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bc3c0 "\237&?\307\215)\332\267" stat = #1464 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bc310, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1465 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1466 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bc190, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bc240 "\265\304H\241%\341*\035" stat = #1467 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bc190, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1468 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#1469 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bc010, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bc0c0 "\226h\272b$T\275\303" stat = #1470 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bc010, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1471 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1472 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bbe90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bbf40 "\261\221\312a\225\367/\312" stat = #1473 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bbe90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1474 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1475 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bbd10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bbdc0 "\223\200^,\223\037}\241" stat = #1476 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bbd10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1477 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 ---Type to continue, or q to quit--- No symbol table info available. #1478 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bbb90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bbc40 "\216\303l\245d\332\223'" stat = #1479 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bbb90, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1480 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#1481 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bba10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bbac0 "\215Pr\263\374vF\315" stat = #1482 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bba10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1483 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1484 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bb890, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bb940 "\220\036\177\016\311A\273\266" stat = #1485 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bb890, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1486 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1487 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bb760, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bb7c0 "\215v)\232K\206t\t" stat = #1488 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bb760, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1489 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1490 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bb630, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bb690 "\257\257\252P\n\310\343\257" stat = #1491 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bb630, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1492 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#1493 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bb500, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bb560 "\241\071P/\271\305q;" stat = #1494 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bb500, obj_size=, ---Type to continue, or q to quit--- xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1495 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1496 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bb3d0, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bb430 "\237\027a\025\017%\205\006" stat = #1497 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bb3d0, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1498 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1499 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bb2a0, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bb300 "\205#\202\065\fay\026" stat = #1500 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bb2a0, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1501 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1502 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bb170, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bb1d0 "\263\066@\003\276}" stat = #1503 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bb170, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1504 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#1505 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bb040, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bb0a0 "\257\260Q\313<\320\351\245" stat = #1506 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bb040, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1507 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1508 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40baf10, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40baf70 "\266\277\215\367\201\200\315e" stat = #1509 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40baf10, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1510 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1511 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bade0, size=, proc=) at xdr_ref.c:84 ---Type to continue, or q to quit--- loc = 0x3fffa40bae40 "\237\064\367\311\313\355\265\217" stat = #1512 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bade0, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1513 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1514 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40bacb0, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40bad10 "\233N\n\267{\377x7" stat = #1515 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40bacb0, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1516 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
[frames #1517 through #1624 elided: the same three-frame cycle repeats, __GI_xdr_reference (...) at xdr_ref.c:84, __GI_xdr_pointer (...) at xdr_ref.c:135, and .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0, with only the heap addresses (pp, loc, objpp) changing from frame to frame; interleaved "---Type to continue, or q to quit---" pager prompts dropped]
#1625 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40b80c0, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40b8120 "\224\250\322]+\024\247\245" stat = #1626 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40b80c0, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1627 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1628 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40b7f90, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40b7ff0 "\233{d~@A>\235" stat = #1629 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40b7f90, obj_size=, ---Type to continue, or q to quit--- xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1630 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1631 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40b7e60, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40b7ec0 "\225\016\237\257\"d!\210" stat = #1632 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40b7e60, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1633 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1634 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffa40b7c30, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40b7d90 "\377\377\377\377\377\377\377\377" stat = #1635 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffa40b7c30, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1636 0x00003fffb370cec0 in .xdr_gfx_dirplist () from /usr/lib64/libgfxdr.so.0 No symbol table info available. 
#1637 0x00003fffb362ec28 in __GI_xdr_reference (xdrs=0x3fffad550d20, pp=0x3fffad550fe0, size=, proc=) at xdr_ref.c:84 loc = 0x3fffa40b7b60 "" stat = #1638 0x00003fffb362ee04 in __GI_xdr_pointer (xdrs=0x3fffad550d20, objpp=0x3fffad550fe0, obj_size=, xdr_obj=@0x3fffb37294b0: 0x3fffb370cdc0 <.xdr_gfx_dirplist>) at xdr_ref.c:135 more_data = 1 #1639 0x00003fffb3711348 in .xdr_gfx_readdirp_rsp () from /usr/lib64/libgfxdr.so.0 No symbol table info available. #1640 0x00003fffb362f120 in __GI_xdr_sizeof (func=, data=) at xdr_sizeof.c:157 x = {x_op = XDR_ENCODE, x_ops = 0x3fffad550cd0, x_public = 0x3fffa8029d00 "", x_private = 0x3fffa403fba0 "", x_base = 0x24 , x_handy = 110908} ops = {x_getlong = @0x3fffb3691b20: 0x3fffb362eea0 , x_putlong = @0x3fffb3691b80: 0x3fffb362f030 <.x_putlong>, x_getbytes = @0x3fffb3691b20: 0x3fffb362eea0 , x_putbytes = @0x3fffb3691ad8: 0x3fffb362ee30 , x_getpostn = @0x3fffb3691af0: 0x3fffb362ee60 , x_setpostn = @0x3fffb3691b08: 0x3fffb362ee80 , x_inline = @0x3fffb3691b68: 0x3fffb362ef60 , x_destroy = @0x3fffb3691b50: 0x3fffb362eef0 , x_getint32 = @0x3fffb3691b20: 0x3fffb362eea0 , x_putint32 = @0x3fffb3691b38: 0x3fffb362eec0 } stat = #1641 0x00003fffaef8d92c in ?? () from /usr/lib64/glusterfs/5.4/xlator/protocol/server.so No symbol table info available. #1642 0x00003fffaef8db88 in ?? () from /usr/lib64/glusterfs/5.4/xlator/protocol/server.so No symbol table info available. #1643 0x00003fffaefe54c0 in ?? () from /usr/lib64/glusterfs/5.4/xlator/protocol/server.so No symbol table info available. #1644 0x00003fffaf04955c in ?? () from /usr/lib64/glusterfs/5.4/xlator/debug/io-stats.so No symbol table info available. ---Type to continue, or q to quit--- #1645 0x00003fffaf0f0984 in ?? () from /usr/lib64/glusterfs/5.4/xlator/features/marker.so No symbol table info available. #1646 0x00003fffb383a098 in .default_readdirp_cbk () from /usr/lib64/libglusterfs.so.0 No symbol table info available. #1647 0x00003fffaf150e94 in ?? 
() from /usr/lib64/glusterfs/5.4/xlator/features/upcall.so No symbol table info available. #1648 0x00003fffaf20757c in ?? () from /usr/lib64/glusterfs/5.4/xlator/features/locks.so No symbol table info available. #1649 0x00003fffaf25542c in ?? () from /usr/lib64/glusterfs/5.4/xlator/features/access-control.so No symbol table info available. #1650 0x00003fffaf27fbe0 in ?? () from /usr/lib64/glusterfs/5.4/xlator/features/bitrot-stub.so No symbol table info available. #1651 0x00003fffaf385ef0 in ?? () from /usr/lib64/glusterfs/5.4/xlator/storage/posix.so No symbol table info available. #1652 0x00003fffaf386990 in ?? () from /usr/lib64/glusterfs/5.4/xlator/storage/posix.so No symbol table info available. #1653 0x00003fffb384315c in .default_readdirp () from /usr/lib64/libglusterfs.so.0 No symbol table info available. #1654 0x00003fffb384315c in .default_readdirp () from /usr/lib64/libglusterfs.so.0 No symbol table info available. #1655 0x00003fffaf279b28 in ?? () from /usr/lib64/glusterfs/5.4/xlator/features/bitrot-stub.so No symbol table info available. #1656 0x00003fffaf25056c in ?? () from /usr/lib64/glusterfs/5.4/xlator/features/access-control.so No symbol table info available. #1657 0x00003fffaf1fdaa4 in ?? () from /usr/lib64/glusterfs/5.4/xlator/features/locks.so No symbol table info available. #1658 0x00003fffb384315c in .default_readdirp () from /usr/lib64/libglusterfs.so.0 No symbol table info available. #1659 0x00003fffb384315c in .default_readdirp () from /usr/lib64/libglusterfs.so.0 No symbol table info available. #1660 0x00003fffb384315c in .default_readdirp () from /usr/lib64/libglusterfs.so.0 No symbol table info available. #1661 0x00003fffaf15c37c in ?? () from /usr/lib64/glusterfs/5.4/xlator/features/upcall.so No symbol table info available. #1662 0x00003fffb38668a4 in .default_readdirp_resume () from /usr/lib64/libglusterfs.so.0 No symbol table info available. 
#1663 0x00003fffb37c0c64 in .call_resume_wind () from /usr/lib64/libglusterfs.so.0 No symbol table info available. #1664 0x00003fffb37c1694 in .call_resume () from /usr/lib64/libglusterfs.so.0 No symbol table info available. #1665 0x00003fffaf13d4c8 in ?? () from /usr/lib64/glusterfs/5.4/xlator/performance/io-threads.so No symbol table info available. #1666 0x00003fffb36a3b30 in start_thread (arg=0x3fffad553160) at pthread_create.c:462 pd = 0x3fffad553160 now = ---Type to continue, or q to quit--- unwind_buf = {cancel_jmp_buf = {{jmp_buf = {70367268161864, 70367268161880, 70367268161880, 70367268161896, 70367268161896, 70367268161912, 70367268161912, 70367268161928, 70367268161928, 70367268161944, 70367268161944, 70367268161960, 70367268161960, 70367268161976, 70367268161976, 70367268161992, 70367268161992, 70367268162008, 70367268162008, 70367268162024, 70367268162024, 70367268162040, 70367268162040, 68719476752, 68719476737, 4294967296, 0, 0, 0, 0, 0, 0, 4096, 0, 262144, 0, 0, 72057594037927936, 70367267891312, 262144, 282574488338432, 0, 0, 0, -4995072473058770944, 309, 16, 0, 1, 0, -1, 0, -1, 0 }, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = pagesize_m1 = sp = freesize = __PRETTY_FUNCTION__ = "start_thread" Backtrace stopped: previous frame inner to this frame (corrupt stack?) 
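A backtrace captured interactively, as above, keeps getting interrupted by gdb's pager ("---Type to continue, or q to quit---"). One way to collect the full `thread apply all bt full` output in a single file is gdb's batch mode with pagination disabled; this is only a sketch, the command-file name `gdb-bt.cmds` is invented here, and the binary and core paths are the ones quoted in this thread, so adjust them for your system:

```shell
# Sketch: prepare a gdb command file that turns the pager off and dumps
# full backtraces for every thread, so nothing is truncated or prompted.
cat > gdb-bt.cmds <<'EOF'
set pagination off
thread apply all bt full
EOF

# Then run it non-interactively against the core (not executed here;
# binary and core names are taken from the messages in this thread):
#   gdb -batch -x gdb-bt.cmds /usr/sbin/glusterfsd core.1099 > backtrace.txt 2>&1

cat gdb-bt.cmds
```

Note that resolving the `?? ()` frames above additionally requires the matching debug symbols for the glusterfs xlator libraries to be installed.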
(gdb) (gdb) (gdb) (gdb) From abhishpaliwal at gmail.com Wed Mar 13 05:12:56 2019 From: abhishpaliwal at gmail.com (ABHISHEK PALIWAL) Date: Wed, 13 Mar 2019 10:42:56 +0530 Subject: [Gluster-users] Glusterfsd crashed with SIGSEGV In-Reply-To: References: Message-ID: logs for libgfrpc.so pabhishe at arn-build3$ldd ./5.4-r0/packages-split/glusterfs/usr/lib64/libgfrpc.so.* ./5.4-r0/packages-split/glusterfs/usr/lib64/libgfrpc.so.0: not a dynamic executable ./5.4-r0/packages-split/glusterfs/usr/lib64/libgfrpc.so.0.0.1: not a dynamic executable On Wed, Mar 13, 2019 at 10:02 AM ABHISHEK PALIWAL wrote: > Here are the logs: > > > pabhishe at arn-build3$ldd > ./5.4-r0/sysroot-destdir/usr/lib64/libglusterfs.so.* > ./5.4-r0/sysroot-destdir/usr/lib64/libglusterfs.so.0: > not a dynamic executable > ./5.4-r0/sysroot-destdir/usr/lib64/libglusterfs.so.0.0.1: > not a dynamic executable > pabhishe at arn-build3$ldd > ./5.4-r0/sysroot-destdir/usr/lib64/libglusterfs.so.0.0.1 > not a dynamic executable > > > For backtraces I have attached the core_logs.txt file. > > Regards, > Abhishek > > On Wed, Mar 13, 2019 at 9:51 AM Amar Tumballi Suryanarayan < > atumball at redhat.com> wrote: > >> Hi Abhishek, >> >> Few more questions, >> >> >>> On Tue, Mar 12, 2019 at 10:58 AM ABHISHEK PALIWAL < >>> abhishpaliwal at gmail.com> wrote: >>> >>>> Hi Amar, >>>> >>>> Below are the requested logs >>>> >>>> pabhishe at arn-build3$ldd ./sysroot-destdir/usr/lib64/libglusterfs.so >>>> not a dynamic executable >>>> >>>> pabhishe at arn-build3$ldd ./sysroot-destdir/usr/lib64/libgfrpc.so >>>> not a dynamic executable >>>> >>>> >> Can you please add a * at the end, so it gets the linked library list >> from the actual files (ideally this is a symlink, but I expected it to >> resolve like in Fedora). >> >> >> >>> root at 128:/# gdb /usr/sbin/glusterd core.1099 >>>> GNU gdb (GDB) 7.10.1 >>>> Copyright (C) 2015 Free Software Foundation, Inc. 
>>>> License GPLv3+: GNU GPL version 3 or later < >>>> http://gnu.org/licenses/gpl.html> >>>> This is free software: you are free to change and redistribute it. >>>> There is NO WARRANTY, to the extent permitted by law. Type "show >>>> copying" >>>> and "show warranty" for details. >>>> This GDB was configured as "powerpc64-wrs-linux". >>>> Type "show configuration" for configuration details. >>>> For bug reporting instructions, please see: >>>> . >>>> Find the GDB manual and other documentation resources online at: >>>> . >>>> For help, type "help". >>>> Type "apropos word" to search for commands related to "word"... >>>> Reading symbols from /usr/sbin/glusterd...(no debugging symbols >>>> found)...done. >>>> [New LWP 1109] >>>> [New LWP 1101] >>>> [New LWP 1105] >>>> [New LWP 1110] >>>> [New LWP 1099] >>>> [New LWP 1107] >>>> [New LWP 1119] >>>> [New LWP 1103] >>>> [New LWP 1112] >>>> [New LWP 1116] >>>> [New LWP 1104] >>>> [New LWP 1239] >>>> [New LWP 1106] >>>> [New LWP 1111] >>>> [New LWP 1108] >>>> [New LWP 1117] >>>> [New LWP 1102] >>>> [New LWP 1118] >>>> [New LWP 1100] >>>> [New LWP 1114] >>>> [New LWP 1113] >>>> [New LWP 1115] >>>> >>>> warning: Could not load shared library symbols for linux-vdso64.so.1. >>>> Do you need "set solib-search-path" or "set sysroot"? >>>> [Thread debugging using libthread_db enabled] >>>> Using host libthread_db library "/lib64/libthread_db.so.1". >>>> Core was generated by `/usr/sbin/glusterfsd -s 128.224.95.140 >>>> --volfile-id gv0.128.224.95.140.tmp-bric'. >>>> Program terminated with signal SIGSEGV, Segmentation fault. >>>> #0 0x00003fffb76a6d48 in _int_malloc (av=av at entry=0x3fffa8000020, >>>> bytes=bytes at entry=36) at malloc.c:3327 >>>> 3327 { >>>> [Current thread is 1 (Thread 0x3fffb1689160 (LWP 1109))] >>>> (gdb) bt full >>>> >>> >> This is backtrace of one particular thread. 
I need output of command >> >> (gdb) thread apply all bt full >> >> >> Also, considering this is a crash in the malloc library call itself, >> would like to know the details of OS, Kernel version and gcc versions. >> >> Regards, >> Amar >> >> #0 0x00003fffb76a6d48 in _int_malloc (av=av at entry=0x3fffa8000020, >>>> bytes=bytes at entry=36) at malloc.c:3327 >>>> nb = >>>> idx = >>>> bin = >>>> victim = >>>> size = >>>> victim_index = >>>> remainder = >>>> remainder_size = >>>> block = >>>> bit = >>>> map = >>>> fwd = >>>> bck = >>>> errstr = 0x0 >>>> __func__ = "_int_malloc" >>>> #1 0x00003fffb76a93dc in __GI___libc_malloc (bytes=36) at malloc.c:2921 >>>> ar_ptr = 0x3fffa8000020 >>>> victim = >>>> hook = >>>> __func__ = "__libc_malloc" >>>> #2 0x00003fffb7764fd0 in x_inline (xdrs=0x3fffb1686d20, len=>>> out>) at xdr_sizeof.c:89 >>>> len = 36 >>>> xdrs = 0x3fffb1686d20 >>>> #3 0x00003fffb7842488 in .xdr_gfx_iattx () from >>>> /usr/lib64/libgfxdr.so.0 >>>> No symbol table info available. >>>> #4 0x00003fffb7842e84 in .xdr_gfx_dirplist () from >>>> /usr/lib64/libgfxdr.so.0 >>>> No symbol table info available. >>>> #5 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>> pp=0x3fffa81099f0, size=, proc=) at >>>> xdr_ref.c:84 >>>> loc = 0x3fffa8109aa0 "\265\256\373\200\f\206\361j" >>>> stat = >>>> #6 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>> objpp=0x3fffa81099f0, obj_size=, >>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>> xdr_ref.c:135 >>>> more_data = 1 >>>> #7 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>> /usr/lib64/libgfxdr.so.0 >>>> No symbol table info available. 
>>>> #8 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>> pp=0x3fffa8109870, size=, proc=) at >>>> xdr_ref.c:84 >>>> loc = 0x3fffa8109920 "\232\373\377\315\352\325\005\271" >>>> stat = >>>> #9 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>> objpp=0x3fffa8109870, obj_size=, >>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>> xdr_ref.c:135 >>>> more_data = 1 >>>> #10 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>> /usr/lib64/libgfxdr.so.0 >>>> No symbol table info available. >>>> #11 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>> pp=0x3fffa81096f0, size=, proc=) at >>>> xdr_ref.c:84 >>>> loc = 0x3fffa81097a0 "\241X\372!\216\256=\342" >>>> stat = >>>> ---Type to continue, or q to quit--- >>>> #12 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>> objpp=0x3fffa81096f0, obj_size=, >>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>> xdr_ref.c:135 >>>> more_data = 1 >>>> #13 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>> /usr/lib64/libgfxdr.so.0 >>>> No symbol table info available. >>>> #14 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>> pp=0x3fffa8109570, size=, proc=) at >>>> xdr_ref.c:84 >>>> loc = 0x3fffa8109620 "\265\205\003Vu'\002L" >>>> stat = >>>> #15 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>> objpp=0x3fffa8109570, obj_size=, >>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>> xdr_ref.c:135 >>>> more_data = 1 >>>> #16 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>> /usr/lib64/libgfxdr.so.0 >>>> No symbol table info available. 
>>>> #17 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>> pp=0x3fffa81093f0, size=, proc=) at >>>> xdr_ref.c:84 >>>> loc = 0x3fffa81094a0 "\200L\027F'\177\366D" >>>> stat = >>>> #18 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>> objpp=0x3fffa81093f0, obj_size=, >>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>> xdr_ref.c:135 >>>> more_data = 1 >>>> #19 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>> /usr/lib64/libgfxdr.so.0 >>>> No symbol table info available. >>>> #20 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>> pp=0x3fffa8109270, size=, proc=) at >>>> xdr_ref.c:84 >>>> loc = 0x3fffa8109320 "\217{dK(\001E\220" >>>> stat = >>>> #21 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>> objpp=0x3fffa8109270, obj_size=, >>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>> xdr_ref.c:135 >>>> more_data = 1 >>>> #22 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>> /usr/lib64/libgfxdr.so.0 >>>> No symbol table info available. >>>> #23 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>> pp=0x3fffa81090f0, size=, proc=) at >>>> xdr_ref.c:84 >>>> loc = 0x3fffa81091a0 "\217\275\067\336\232\300(\005" >>>> stat = >>>> #24 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>> objpp=0x3fffa81090f0, obj_size=, >>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>> xdr_ref.c:135 >>>> more_data = 1 >>>> #25 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>> /usr/lib64/libgfxdr.so.0 >>>> No symbol table info available. 
>>>> #26 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>> pp=0x3fffa8108f70, size=, proc=) at >>>> xdr_ref.c:84 >>>> loc = 0x3fffa8109020 "\260.\025\b\244\352IT" >>>> stat = >>>> #27 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>> objpp=0x3fffa8108f70, obj_size=, >>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>> xdr_ref.c:135 >>>> more_data = 1 >>>> #28 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>> /usr/lib64/libgfxdr.so.0 >>>> No symbol table info available. >>>> #29 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>> pp=0x3fffa8108df0, size=, proc=) at >>>> xdr_ref.c:84 >>>> loc = 0x3fffa8108ea0 "\212GS\203l\035\n\\" >>>> ---Type to continue, or q to quit--- >>>> >>>> >>>> Regards, >>>> Abhishek >>>> >>>> On Mon, Mar 11, 2019 at 7:10 PM Amar Tumballi Suryanarayan < >>>> atumball at redhat.com> wrote: >>>> >>>>> Hi Abhishek, >>>>> >>>>> Can you check and get back to us? >>>>> >>>>> ``` >>>>> bash# ldd /usr/lib64/libglusterfs.so >>>>> bash# ldd /usr/lib64/libgfrpc.so >>>>> >>>>> ``` >>>>> >>>>> Also considering you have the core, can you do `(gdb) thr apply all bt >>>>> full` and pass it on? >>>>> >>>>> Thanks & Regards, >>>>> Amar >>>>> >>>>> On Mon, Mar 11, 2019 at 3:41 PM ABHISHEK PALIWAL < >>>>> abhishpaliwal at gmail.com> wrote: >>>>> >>>>>> Hi Team, >>>>>> >>>>>> COuld you please provide some pointer to debug it further. >>>>>> >>>>>> Regards, >>>>>> Abhishek >>>>>> >>>>>> On Fri, Mar 8, 2019 at 2:19 PM ABHISHEK PALIWAL < >>>>>> abhishpaliwal at gmail.com> wrote: >>>>>> >>>>>>> Hi Team, >>>>>>> >>>>>>> I am using Glusterfs 5.4, where after setting the gluster mount >>>>>>> point when trying to access it, glusterfsd is getting crashed and mount >>>>>>> point through the "Transport endpoint is not connected error. >>>>>>> >>>>>>> Here I are the gdb log for the core file >>>>>>> >>>>>>> warning: Could not load shared library symbols for linux-vdso64.so.1. 
>>>>>>> Do you need "set solib-search-path" or "set sysroot"? >>>>>>> [Thread debugging using libthread_db enabled] >>>>>>> Using host libthread_db library "/lib64/libthread_db.so.1". >>>>>>> Core was generated by `/usr/sbin/glusterfsd -s 128.224.95.140 >>>>>>> --volfile-id gv0.128.224.95.140.tmp-bric'. >>>>>>> Program terminated with signal SIGSEGV, Segmentation fault. >>>>>>> #0 0x00003fff95ab1d48 in _int_malloc (av=av at entry=0x3fff7c000020, >>>>>>> bytes=bytes at entry=36) at malloc.c:3327 >>>>>>> 3327 { >>>>>>> [Current thread is 1 (Thread 0x3fff90394160 (LWP 811))] >>>>>>> (gdb) >>>>>>> (gdb) >>>>>>> (gdb) bt >>>>>>> #0 0x00003fff95ab1d48 in _int_malloc (av=av at entry=0x3fff7c000020, >>>>>>> bytes=bytes at entry=36) at malloc.c:3327 >>>>>>> #1 0x00003fff95ab43dc in __GI___libc_malloc (bytes=36) at >>>>>>> malloc.c:2921 >>>>>>> #2 0x00003fff95b6ffd0 in x_inline (xdrs=0x3fff90391d20, >>>>>>> len=) at xdr_sizeof.c:89 >>>>>>> #3 0x00003fff95c4d488 in .xdr_gfx_iattx () from >>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>> #4 0x00003fff95c4de84 in .xdr_gfx_dirplist () from >>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>> #5 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>> pp=0x3fff7c132020, size=, proc=) at >>>>>>> xdr_ref.c:84 >>>>>>> #6 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>> objpp=0x3fff7c132020, obj_size=, >>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>>> xdr_ref.c:135 >>>>>>> #7 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>> #8 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>> pp=0x3fff7c131ea0, size=, proc=) at >>>>>>> xdr_ref.c:84 >>>>>>> #9 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>> objpp=0x3fff7c131ea0, obj_size=, >>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>>> xdr_ref.c:135 >>>>>>> #10 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>> /usr/lib64/libgfxdr.so.0 
>>>>>>> #11 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>> pp=0x3fff7c131d20, size=, proc=) at >>>>>>> xdr_ref.c:84 >>>>>>> #12 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>> objpp=0x3fff7c131d20, obj_size=, >>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>>> xdr_ref.c:135 >>>>>>> #13 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>> #14 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>> pp=0x3fff7c131ba0, size=, proc=) at >>>>>>> xdr_ref.c:84 >>>>>>> #15 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>> objpp=0x3fff7c131ba0, obj_size=, >>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>>> xdr_ref.c:135 >>>>>>> #16 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>> #17 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>> pp=0x3fff7c131a20, size=, proc=) at >>>>>>> xdr_ref.c:84 >>>>>>> #18 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>> objpp=0x3fff7c131a20, obj_size=, >>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>>> xdr_ref.c:135 >>>>>>> #19 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>> #20 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>> pp=0x3fff7c1318a0, size=, proc=) at >>>>>>> xdr_ref.c:84 >>>>>>> #21 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>> objpp=0x3fff7c1318a0, obj_size=, >>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>>> xdr_ref.c:135 >>>>>>> #22 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>> #23 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>> pp=0x3fff7c131720, size=, proc=) at >>>>>>> xdr_ref.c:84 >>>>>>> #24 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>> 
objpp=0x3fff7c131720, obj_size=, >>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>>> xdr_ref.c:135 >>>>>>> #25 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>> #26 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>> pp=0x3fff7c1315a0, size=, proc=) at >>>>>>> xdr_ref.c:84 >>>>>>> #27 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>> objpp=0x3fff7c1315a0, obj_size=, >>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>>> xdr_ref.c:135 >>>>>>> #28 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>> #29 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>> pp=0x3fff7c131420, size=, proc=) at >>>>>>> xdr_ref.c:84 >>>>>>> #30 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>> objpp=0x3fff7c131420, obj_size=, >>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>>> xdr_ref.c:135 >>>>>>> #31 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>> #32 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>> pp=0x3fff7c1312a0, size=, proc=) at >>>>>>> xdr_ref.c:84 >>>>>>> #33 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>> objpp=0x3fff7c1312a0, obj_size=, >>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>>> xdr_ref.c:135 >>>>>>> >>>>>>> Frames are getting repeated, could any one please me. 
>>>>>>> -- >>>>>>> Regards >>>>>>> Abhishek Paliwal >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Regards >>>>>> Abhishek Paliwal >>>>>> _______________________________________________ >>>>>> Gluster-users mailing list >>>>>> Gluster-users at gluster.org >>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>> >>>>> >>>>> >>>>> -- >>>>> Amar Tumballi (amarts) >>>>> >>>> >>>> >>>> -- >>>> >>>> >>>> >>>> >>>> Regards >>>> Abhishek Paliwal >>>> >>> >>> >>> -- >>> >>> >>> >>> >>> Regards >>> Abhishek Paliwal >>> >> >> >> -- >> Amar Tumballi (amarts) >> > > > -- > > > > > Regards > Abhishek Paliwal > -- Regards Abhishek Paliwal -------------- next part -------------- An HTML attachment was scrubbed... URL: From abhishpaliwal at gmail.com Wed Mar 13 05:24:32 2019 From: abhishpaliwal at gmail.com (ABHISHEK PALIWAL) Date: Wed, 13 Mar 2019 10:54:32 +0530 Subject: [Gluster-users] Glusterfsd crashed with SIGSEGV In-Reply-To: References: Message-ID: Hi Amar, this problem seems to be a configuration issue due to librpc. Could you please let me know what configuration I need to use?
Regards, Abhishek On Wed, Mar 13, 2019 at 10:42 AM ABHISHEK PALIWAL wrote: > logs for libgfrpc.so > > pabhishe at arn-build3$ldd > ./5.4-r0/packages-split/glusterfs/usr/lib64/libgfrpc.so.* > ./5.4-r0/packages-split/glusterfs/usr/lib64/libgfrpc.so.0: > not a dynamic executable > ./5.4-r0/packages-split/glusterfs/usr/lib64/libgfrpc.so.0.0.1: > not a dynamic executable > > > On Wed, Mar 13, 2019 at 10:02 AM ABHISHEK PALIWAL > wrote: > >> Here are the logs: >> >> >> pabhishe at arn-build3$ldd >> ./5.4-r0/sysroot-destdir/usr/lib64/libglusterfs.so.* >> ./5.4-r0/sysroot-destdir/usr/lib64/libglusterfs.so.0: >> not a dynamic executable >> ./5.4-r0/sysroot-destdir/usr/lib64/libglusterfs.so.0.0.1: >> not a dynamic executable >> pabhishe at arn-build3$ldd >> ./5.4-r0/sysroot-destdir/usr/lib64/libglusterfs.so.0.0.1 >> not a dynamic executable >> >> >> For backtraces I have attached the core_logs.txt file. >> >> Regards, >> Abhishek >> >> On Wed, Mar 13, 2019 at 9:51 AM Amar Tumballi Suryanarayan < >> atumball at redhat.com> wrote: >> >>> Hi Abhishek, >>> >>> Few more questions, >>> >>> >>>> On Tue, Mar 12, 2019 at 10:58 AM ABHISHEK PALIWAL < >>>> abhishpaliwal at gmail.com> wrote: >>>> >>>>> Hi Amar, >>>>> >>>>> Below are the requested logs >>>>> >>>>> pabhishe at arn-build3$ldd ./sysroot-destdir/usr/lib64/libglusterfs.so >>>>> not a dynamic executable >>>>> >>>>> pabhishe at arn-build3$ldd ./sysroot-destdir/usr/lib64/libgfrpc.so >>>>> not a dynamic executable >>>>> >>>>> >>> Can you please add a * at the end, so it gets the linked library list >>> from the actual files (ideally this is a symlink, but I expected it to >>> resolve like in Fedora). >>> >>> >>> >>>> root at 128:/# gdb /usr/sbin/glusterd core.1099 >>>>> GNU gdb (GDB) 7.10.1 >>>>> Copyright (C) 2015 Free Software Foundation, Inc. >>>>> License GPLv3+: GNU GPL version 3 or later < >>>>> http://gnu.org/licenses/gpl.html> >>>>> This is free software: you are free to change and redistribute it. 
>>>>> There is NO WARRANTY, to the extent permitted by law. Type "show >>>>> copying" >>>>> and "show warranty" for details. >>>>> This GDB was configured as "powerpc64-wrs-linux". >>>>> Type "show configuration" for configuration details. >>>>> For bug reporting instructions, please see: >>>>> . >>>>> Find the GDB manual and other documentation resources online at: >>>>> . >>>>> For help, type "help". >>>>> Type "apropos word" to search for commands related to "word"... >>>>> Reading symbols from /usr/sbin/glusterd...(no debugging symbols >>>>> found)...done. >>>>> [New LWP 1109] >>>>> [New LWP 1101] >>>>> [New LWP 1105] >>>>> [New LWP 1110] >>>>> [New LWP 1099] >>>>> [New LWP 1107] >>>>> [New LWP 1119] >>>>> [New LWP 1103] >>>>> [New LWP 1112] >>>>> [New LWP 1116] >>>>> [New LWP 1104] >>>>> [New LWP 1239] >>>>> [New LWP 1106] >>>>> [New LWP 1111] >>>>> [New LWP 1108] >>>>> [New LWP 1117] >>>>> [New LWP 1102] >>>>> [New LWP 1118] >>>>> [New LWP 1100] >>>>> [New LWP 1114] >>>>> [New LWP 1113] >>>>> [New LWP 1115] >>>>> >>>>> warning: Could not load shared library symbols for linux-vdso64.so.1. >>>>> Do you need "set solib-search-path" or "set sysroot"? >>>>> [Thread debugging using libthread_db enabled] >>>>> Using host libthread_db library "/lib64/libthread_db.so.1". >>>>> Core was generated by `/usr/sbin/glusterfsd -s 128.224.95.140 >>>>> --volfile-id gv0.128.224.95.140.tmp-bric'. >>>>> Program terminated with signal SIGSEGV, Segmentation fault. >>>>> #0 0x00003fffb76a6d48 in _int_malloc (av=av at entry=0x3fffa8000020, >>>>> bytes=bytes at entry=36) at malloc.c:3327 >>>>> 3327 { >>>>> [Current thread is 1 (Thread 0x3fffb1689160 (LWP 1109))] >>>>> (gdb) bt full >>>>> >>>> >>> This is backtrace of one particular thread. I need output of command >>> >>> (gdb) thread apply all bt full >>> >>> >>> Also, considering this is a crash in the malloc library call itself, >>> would like to know the details of OS, Kernel version and gcc versions. 
>>> >>> Regards, >>> Amar >>> >>> #0 0x00003fffb76a6d48 in _int_malloc (av=av at entry=0x3fffa8000020, >>>>> bytes=bytes at entry=36) at malloc.c:3327 >>>>> nb = >>>>> idx = >>>>> bin = >>>>> victim = >>>>> size = >>>>> victim_index = >>>>> remainder = >>>>> remainder_size = >>>>> block = >>>>> bit = >>>>> map = >>>>> fwd = >>>>> bck = >>>>> errstr = 0x0 >>>>> __func__ = "_int_malloc" >>>>> #1 0x00003fffb76a93dc in __GI___libc_malloc (bytes=36) at >>>>> malloc.c:2921 >>>>> ar_ptr = 0x3fffa8000020 >>>>> victim = >>>>> hook = >>>>> __func__ = "__libc_malloc" >>>>> #2 0x00003fffb7764fd0 in x_inline (xdrs=0x3fffb1686d20, >>>>> len=) at xdr_sizeof.c:89 >>>>> len = 36 >>>>> xdrs = 0x3fffb1686d20 >>>>> #3 0x00003fffb7842488 in .xdr_gfx_iattx () from >>>>> /usr/lib64/libgfxdr.so.0 >>>>> No symbol table info available. >>>>> #4 0x00003fffb7842e84 in .xdr_gfx_dirplist () from >>>>> /usr/lib64/libgfxdr.so.0 >>>>> No symbol table info available. >>>>> #5 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>>> pp=0x3fffa81099f0, size=, proc=) at >>>>> xdr_ref.c:84 >>>>> loc = 0x3fffa8109aa0 "\265\256\373\200\f\206\361j" >>>>> stat = >>>>> #6 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>>> objpp=0x3fffa81099f0, obj_size=, >>>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>>> xdr_ref.c:135 >>>>> more_data = 1 >>>>> #7 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>>> /usr/lib64/libgfxdr.so.0 >>>>> No symbol table info available. 
>>>>> #8 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>>> pp=0x3fffa8109870, size=, proc=) at >>>>> xdr_ref.c:84 >>>>> loc = 0x3fffa8109920 "\232\373\377\315\352\325\005\271" >>>>> stat = >>>>> #9 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>>> objpp=0x3fffa8109870, obj_size=, >>>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>>> xdr_ref.c:135 >>>>> more_data = 1 >>>>> #10 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>>> /usr/lib64/libgfxdr.so.0 >>>>> No symbol table info available. >>>>> #11 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>>> pp=0x3fffa81096f0, size=, proc=) at >>>>> xdr_ref.c:84 >>>>> loc = 0x3fffa81097a0 "\241X\372!\216\256=\342" >>>>> stat = >>>>> ---Type to continue, or q to quit--- >>>>> #12 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>>> objpp=0x3fffa81096f0, obj_size=, >>>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>>> xdr_ref.c:135 >>>>> more_data = 1 >>>>> #13 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>>> /usr/lib64/libgfxdr.so.0 >>>>> No symbol table info available. >>>>> #14 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>>> pp=0x3fffa8109570, size=, proc=) at >>>>> xdr_ref.c:84 >>>>> loc = 0x3fffa8109620 "\265\205\003Vu'\002L" >>>>> stat = >>>>> #15 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>>> objpp=0x3fffa8109570, obj_size=, >>>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>>> xdr_ref.c:135 >>>>> more_data = 1 >>>>> #16 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>>> /usr/lib64/libgfxdr.so.0 >>>>> No symbol table info available. 
>>>>> #17 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>>> pp=0x3fffa81093f0, size=, proc=) at >>>>> xdr_ref.c:84 >>>>> loc = 0x3fffa81094a0 "\200L\027F'\177\366D" >>>>> stat = >>>>> #18 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>>> objpp=0x3fffa81093f0, obj_size=, >>>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>>> xdr_ref.c:135 >>>>> more_data = 1 >>>>> #19 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>>> /usr/lib64/libgfxdr.so.0 >>>>> No symbol table info available. >>>>> #20 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>>> pp=0x3fffa8109270, size=, proc=) at >>>>> xdr_ref.c:84 >>>>> loc = 0x3fffa8109320 "\217{dK(\001E\220" >>>>> stat = >>>>> #21 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>>> objpp=0x3fffa8109270, obj_size=, >>>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>>> xdr_ref.c:135 >>>>> more_data = 1 >>>>> #22 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>>> /usr/lib64/libgfxdr.so.0 >>>>> No symbol table info available. >>>>> #23 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>>> pp=0x3fffa81090f0, size=, proc=) at >>>>> xdr_ref.c:84 >>>>> loc = 0x3fffa81091a0 "\217\275\067\336\232\300(\005" >>>>> stat = >>>>> #24 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>>> objpp=0x3fffa81090f0, obj_size=, >>>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>>> xdr_ref.c:135 >>>>> more_data = 1 >>>>> #25 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>>> /usr/lib64/libgfxdr.so.0 >>>>> No symbol table info available. 
>>>>> #26 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>>> pp=0x3fffa8108f70, size=, proc=) at >>>>> xdr_ref.c:84 >>>>> loc = 0x3fffa8109020 "\260.\025\b\244\352IT" >>>>> stat = >>>>> #27 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>>> objpp=0x3fffa8108f70, obj_size=, >>>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>>> xdr_ref.c:135 >>>>> more_data = 1 >>>>> #28 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>>> /usr/lib64/libgfxdr.so.0 >>>>> No symbol table info available. >>>>> #29 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>>> pp=0x3fffa8108df0, size=, proc=) at >>>>> xdr_ref.c:84 >>>>> loc = 0x3fffa8108ea0 "\212GS\203l\035\n\\" >>>>> ---Type to continue, or q to quit--- >>>>> >>>>> >>>>> Regards, >>>>> Abhishek >>>>> >>>>> On Mon, Mar 11, 2019 at 7:10 PM Amar Tumballi Suryanarayan < >>>>> atumball at redhat.com> wrote: >>>>> >>>>>> Hi Abhishek, >>>>>> >>>>>> Can you check and get back to us? >>>>>> >>>>>> ``` >>>>>> bash# ldd /usr/lib64/libglusterfs.so >>>>>> bash# ldd /usr/lib64/libgfrpc.so >>>>>> >>>>>> ``` >>>>>> >>>>>> Also considering you have the core, can you do `(gdb) thr apply all >>>>>> bt full` and pass it on? >>>>>> >>>>>> Thanks & Regards, >>>>>> Amar >>>>>> >>>>>> On Mon, Mar 11, 2019 at 3:41 PM ABHISHEK PALIWAL < >>>>>> abhishpaliwal at gmail.com> wrote: >>>>>> >>>>>>> Hi Team, >>>>>>> >>>>>>> COuld you please provide some pointer to debug it further. >>>>>>> >>>>>>> Regards, >>>>>>> Abhishek >>>>>>> >>>>>>> On Fri, Mar 8, 2019 at 2:19 PM ABHISHEK PALIWAL < >>>>>>> abhishpaliwal at gmail.com> wrote: >>>>>>> >>>>>>>> Hi Team, >>>>>>>> >>>>>>>> I am using Glusterfs 5.4, where after setting the gluster mount >>>>>>>> point when trying to access it, glusterfsd is getting crashed and mount >>>>>>>> point through the "Transport endpoint is not connected error. 
>>>>>>>> >>>>>>>> Here I are the gdb log for the core file >>>>>>>> >>>>>>>> warning: Could not load shared library symbols for >>>>>>>> linux-vdso64.so.1. >>>>>>>> Do you need "set solib-search-path" or "set sysroot"? >>>>>>>> [Thread debugging using libthread_db enabled] >>>>>>>> Using host libthread_db library "/lib64/libthread_db.so.1". >>>>>>>> Core was generated by `/usr/sbin/glusterfsd -s 128.224.95.140 >>>>>>>> --volfile-id gv0.128.224.95.140.tmp-bric'. >>>>>>>> Program terminated with signal SIGSEGV, Segmentation fault. >>>>>>>> #0 0x00003fff95ab1d48 in _int_malloc (av=av at entry=0x3fff7c000020, >>>>>>>> bytes=bytes at entry=36) at malloc.c:3327 >>>>>>>> 3327 { >>>>>>>> [Current thread is 1 (Thread 0x3fff90394160 (LWP 811))] >>>>>>>> (gdb) >>>>>>>> (gdb) >>>>>>>> (gdb) bt >>>>>>>> #0 0x00003fff95ab1d48 in _int_malloc (av=av at entry=0x3fff7c000020, >>>>>>>> bytes=bytes at entry=36) at malloc.c:3327 >>>>>>>> #1 0x00003fff95ab43dc in __GI___libc_malloc (bytes=36) at >>>>>>>> malloc.c:2921 >>>>>>>> #2 0x00003fff95b6ffd0 in x_inline (xdrs=0x3fff90391d20, >>>>>>>> len=) at xdr_sizeof.c:89 >>>>>>>> #3 0x00003fff95c4d488 in .xdr_gfx_iattx () from >>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>> #4 0x00003fff95c4de84 in .xdr_gfx_dirplist () from >>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>> #5 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>>> pp=0x3fff7c132020, size=, proc=) at >>>>>>>> xdr_ref.c:84 >>>>>>>> #6 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>> objpp=0x3fff7c132020, obj_size=, >>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>>>> xdr_ref.c:135 >>>>>>>> #7 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>> #8 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>>> pp=0x3fff7c131ea0, size=, proc=) at >>>>>>>> xdr_ref.c:84 >>>>>>>> #9 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>> objpp=0x3fff7c131ea0, obj_size=, 
>>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>>>> xdr_ref.c:135 >>>>>>>> #10 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>> #11 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>>> pp=0x3fff7c131d20, size=, proc=) at >>>>>>>> xdr_ref.c:84 >>>>>>>> #12 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>> objpp=0x3fff7c131d20, obj_size=, >>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>>>> xdr_ref.c:135 >>>>>>>> #13 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>> #14 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>>> pp=0x3fff7c131ba0, size=, proc=) at >>>>>>>> xdr_ref.c:84 >>>>>>>> #15 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>> objpp=0x3fff7c131ba0, obj_size=, >>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>>>> xdr_ref.c:135 >>>>>>>> #16 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>> #17 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>>> pp=0x3fff7c131a20, size=, proc=) at >>>>>>>> xdr_ref.c:84 >>>>>>>> #18 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>> objpp=0x3fff7c131a20, obj_size=, >>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>>>> xdr_ref.c:135 >>>>>>>> #19 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>> #20 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>>> pp=0x3fff7c1318a0, size=, proc=) at >>>>>>>> xdr_ref.c:84 >>>>>>>> #21 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>> objpp=0x3fff7c1318a0, obj_size=, >>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>>>> xdr_ref.c:135 >>>>>>>> #22 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>> 
#23 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>>> pp=0x3fff7c131720, size=, proc=) at >>>>>>>> xdr_ref.c:84 >>>>>>>> #24 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>> objpp=0x3fff7c131720, obj_size=, >>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>>>> xdr_ref.c:135 >>>>>>>> #25 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>> #26 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>>> pp=0x3fff7c1315a0, size=, proc=) at >>>>>>>> xdr_ref.c:84 >>>>>>>> #27 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>> objpp=0x3fff7c1315a0, obj_size=, >>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>>>> xdr_ref.c:135 >>>>>>>> #28 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>> #29 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>>> pp=0x3fff7c131420, size=, proc=) at >>>>>>>> xdr_ref.c:84 >>>>>>>> #30 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>> objpp=0x3fff7c131420, obj_size=, >>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>>>> xdr_ref.c:135 >>>>>>>> #31 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>> #32 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>>> pp=0x3fff7c1312a0, size=, proc=) at >>>>>>>> xdr_ref.c:84 >>>>>>>> #33 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>> objpp=0x3fff7c1312a0, obj_size=, >>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) at >>>>>>>> xdr_ref.c:135 >>>>>>>> >>>>>>>> Frames are getting repeated, could any one please me. 
>>>>>>>> -- >>>>>>>> Regards >>>>>>>> Abhishek Paliwal >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Regards >>>>>>> Abhishek Paliwal >>>>>>> _______________________________________________ >>>>>>> Gluster-users mailing list >>>>>>> Gluster-users at gluster.org >>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>> >>>>>> >>>>>> -- >>>>>> Amar Tumballi (amarts) >>>>>> >>>>> >>>>> >>>>> -- >>>>> >>>>> >>>>> >>>>> >>>>> Regards >>>>> Abhishek Paliwal >>>>> >>>> >>>> >>>> -- >>>> >>>> >>>> >>>> >>>> Regards >>>> Abhishek Paliwal >>>> >>> >>> >>> -- >>> Amar Tumballi (amarts) >>> >> >> >> -- >> >> >> >> >> Regards >> Abhishek Paliwal >> > > > -- > > > > > Regards > Abhishek Paliwal > -- Regards Abhishek Paliwal -------------- next part -------------- An HTML attachment was scrubbed... URL: From atumball at redhat.com Wed Mar 13 06:25:23 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Wed, 13 Mar 2019 11:55:23 +0530 Subject: [Gluster-users] Glusterfsd crashed with SIGSEGV In-Reply-To: References: Message-ID: We recommend using 'tirpc' in the later releases; pass '--with-tirpc' while running ./configure. On Wed, Mar 13, 2019 at 10:55 AM ABHISHEK PALIWAL wrote: > Hi Amar, > > this problem seems to be configuration issue due to librpc. > > Could you please let me know what should be configuration I need to use?
> > Regards, > Abhishek > > On Wed, Mar 13, 2019 at 10:42 AM ABHISHEK PALIWAL > wrote: > >> logs for libgfrpc.so >> >> pabhishe at arn-build3$ldd >> ./5.4-r0/packages-split/glusterfs/usr/lib64/libgfrpc.so.* >> ./5.4-r0/packages-split/glusterfs/usr/lib64/libgfrpc.so.0: >> not a dynamic executable >> ./5.4-r0/packages-split/glusterfs/usr/lib64/libgfrpc.so.0.0.1: >> not a dynamic executable >> >> >> On Wed, Mar 13, 2019 at 10:02 AM ABHISHEK PALIWAL < >> abhishpaliwal at gmail.com> wrote: >> >>> Here are the logs: >>> >>> >>> pabhishe at arn-build3$ldd >>> ./5.4-r0/sysroot-destdir/usr/lib64/libglusterfs.so.* >>> ./5.4-r0/sysroot-destdir/usr/lib64/libglusterfs.so.0: >>> not a dynamic executable >>> ./5.4-r0/sysroot-destdir/usr/lib64/libglusterfs.so.0.0.1: >>> not a dynamic executable >>> pabhishe at arn-build3$ldd >>> ./5.4-r0/sysroot-destdir/usr/lib64/libglusterfs.so.0.0.1 >>> not a dynamic executable >>> >>> >>> For backtraces I have attached the core_logs.txt file. >>> >>> Regards, >>> Abhishek >>> >>> On Wed, Mar 13, 2019 at 9:51 AM Amar Tumballi Suryanarayan < >>> atumball at redhat.com> wrote: >>> >>>> Hi Abhishek, >>>> >>>> Few more questions, >>>> >>>> >>>>> On Tue, Mar 12, 2019 at 10:58 AM ABHISHEK PALIWAL < >>>>> abhishpaliwal at gmail.com> wrote: >>>>> >>>>>> Hi Amar, >>>>>> >>>>>> Below are the requested logs >>>>>> >>>>>> pabhishe at arn-build3$ldd ./sysroot-destdir/usr/lib64/libglusterfs.so >>>>>> not a dynamic executable >>>>>> >>>>>> pabhishe at arn-build3$ldd ./sysroot-destdir/usr/lib64/libgfrpc.so >>>>>> not a dynamic executable >>>>>> >>>>>> >>>> Can you please add a * at the end, so it gets the linked library list >>>> from the actual files (ideally this is a symlink, but I expected it to >>>> resolve like in Fedora). >>>> >>>> >>>> >>>>> root at 128:/# gdb /usr/sbin/glusterd core.1099 >>>>>> GNU gdb (GDB) 7.10.1 >>>>>> Copyright (C) 2015 Free Software Foundation, Inc. 
>>>>>> License GPLv3+: GNU GPL version 3 or later < >>>>>> http://gnu.org/licenses/gpl.html> >>>>>> This is free software: you are free to change and redistribute it. >>>>>> There is NO WARRANTY, to the extent permitted by law. Type "show >>>>>> copying" >>>>>> and "show warranty" for details. >>>>>> This GDB was configured as "powerpc64-wrs-linux". >>>>>> Type "show configuration" for configuration details. >>>>>> For bug reporting instructions, please see: >>>>>> . >>>>>> Find the GDB manual and other documentation resources online at: >>>>>> . >>>>>> For help, type "help". >>>>>> Type "apropos word" to search for commands related to "word"... >>>>>> Reading symbols from /usr/sbin/glusterd...(no debugging symbols >>>>>> found)...done. >>>>>> [New LWP 1109] >>>>>> [New LWP 1101] >>>>>> [New LWP 1105] >>>>>> [New LWP 1110] >>>>>> [New LWP 1099] >>>>>> [New LWP 1107] >>>>>> [New LWP 1119] >>>>>> [New LWP 1103] >>>>>> [New LWP 1112] >>>>>> [New LWP 1116] >>>>>> [New LWP 1104] >>>>>> [New LWP 1239] >>>>>> [New LWP 1106] >>>>>> [New LWP 1111] >>>>>> [New LWP 1108] >>>>>> [New LWP 1117] >>>>>> [New LWP 1102] >>>>>> [New LWP 1118] >>>>>> [New LWP 1100] >>>>>> [New LWP 1114] >>>>>> [New LWP 1113] >>>>>> [New LWP 1115] >>>>>> >>>>>> warning: Could not load shared library symbols for linux-vdso64.so.1. >>>>>> Do you need "set solib-search-path" or "set sysroot"? >>>>>> [Thread debugging using libthread_db enabled] >>>>>> Using host libthread_db library "/lib64/libthread_db.so.1". >>>>>> Core was generated by `/usr/sbin/glusterfsd -s 128.224.95.140 >>>>>> --volfile-id gv0.128.224.95.140.tmp-bric'. >>>>>> Program terminated with signal SIGSEGV, Segmentation fault. >>>>>> #0 0x00003fffb76a6d48 in _int_malloc (av=av at entry=0x3fffa8000020, >>>>>> bytes=bytes at entry=36) at malloc.c:3327 >>>>>> 3327 { >>>>>> [Current thread is 1 (Thread 0x3fffb1689160 (LWP 1109))] >>>>>> (gdb) bt full >>>>>> >>>>> >>>> This is backtrace of one particular thread. 
I need output of command >>>> >>>> (gdb) thread apply all bt full >>>> >>>> >>>> Also, considering this is a crash in the malloc library call itself, >>>> would like to know the details of OS, Kernel version and gcc versions. >>>> >>>> Regards, >>>> Amar >>>> >>>> #0 0x00003fffb76a6d48 in _int_malloc (av=av at entry=0x3fffa8000020, >>>>>> bytes=bytes at entry=36) at malloc.c:3327 >>>>>> nb = >>>>>> idx = >>>>>> bin = >>>>>> victim = >>>>>> size = >>>>>> victim_index = >>>>>> remainder = >>>>>> remainder_size = >>>>>> block = >>>>>> bit = >>>>>> map = >>>>>> fwd = >>>>>> bck = >>>>>> errstr = 0x0 >>>>>> __func__ = "_int_malloc" >>>>>> #1 0x00003fffb76a93dc in __GI___libc_malloc (bytes=36) at >>>>>> malloc.c:2921 >>>>>> ar_ptr = 0x3fffa8000020 >>>>>> victim = >>>>>> hook = >>>>>> __func__ = "__libc_malloc" >>>>>> #2 0x00003fffb7764fd0 in x_inline (xdrs=0x3fffb1686d20, >>>>>> len=) at xdr_sizeof.c:89 >>>>>> len = 36 >>>>>> xdrs = 0x3fffb1686d20 >>>>>> #3 0x00003fffb7842488 in .xdr_gfx_iattx () from >>>>>> /usr/lib64/libgfxdr.so.0 >>>>>> No symbol table info available. >>>>>> #4 0x00003fffb7842e84 in .xdr_gfx_dirplist () from >>>>>> /usr/lib64/libgfxdr.so.0 >>>>>> No symbol table info available. >>>>>> #5 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>>>> pp=0x3fffa81099f0, size=, proc=) at >>>>>> xdr_ref.c:84 >>>>>> loc = 0x3fffa8109aa0 "\265\256\373\200\f\206\361j" >>>>>> stat = >>>>>> #6 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>>>> objpp=0x3fffa81099f0, obj_size=, >>>>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>>>> xdr_ref.c:135 >>>>>> more_data = 1 >>>>>> #7 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>>>> /usr/lib64/libgfxdr.so.0 >>>>>> No symbol table info available. 
>>>>>> #8 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>>>> pp=0x3fffa8109870, size=, proc=) at >>>>>> xdr_ref.c:84 >>>>>> loc = 0x3fffa8109920 "\232\373\377\315\352\325\005\271" >>>>>> stat = >>>>>> #9 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>>>> objpp=0x3fffa8109870, obj_size=, >>>>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>>>> xdr_ref.c:135 >>>>>> more_data = 1 >>>>>> #10 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>>>> /usr/lib64/libgfxdr.so.0 >>>>>> No symbol table info available. >>>>>> #11 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>>>> pp=0x3fffa81096f0, size=, proc=) at >>>>>> xdr_ref.c:84 >>>>>> loc = 0x3fffa81097a0 "\241X\372!\216\256=\342" >>>>>> stat = >>>>>> ---Type to continue, or q to quit--- >>>>>> #12 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>>>> objpp=0x3fffa81096f0, obj_size=, >>>>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>>>> xdr_ref.c:135 >>>>>> more_data = 1 >>>>>> #13 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>>>> /usr/lib64/libgfxdr.so.0 >>>>>> No symbol table info available. >>>>>> #14 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>>>> pp=0x3fffa8109570, size=, proc=) at >>>>>> xdr_ref.c:84 >>>>>> loc = 0x3fffa8109620 "\265\205\003Vu'\002L" >>>>>> stat = >>>>>> #15 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>>>> objpp=0x3fffa8109570, obj_size=, >>>>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>>>> xdr_ref.c:135 >>>>>> more_data = 1 >>>>>> #16 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>>>> /usr/lib64/libgfxdr.so.0 >>>>>> No symbol table info available. 
>>>>>> #17 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>>>> pp=0x3fffa81093f0, size=, proc=) at >>>>>> xdr_ref.c:84 >>>>>> loc = 0x3fffa81094a0 "\200L\027F'\177\366D" >>>>>> stat = >>>>>> #18 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>>>> objpp=0x3fffa81093f0, obj_size=, >>>>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>>>> xdr_ref.c:135 >>>>>> more_data = 1 >>>>>> #19 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>>>> /usr/lib64/libgfxdr.so.0 >>>>>> No symbol table info available. >>>>>> #20 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>>>> pp=0x3fffa8109270, size=, proc=) at >>>>>> xdr_ref.c:84 >>>>>> loc = 0x3fffa8109320 "\217{dK(\001E\220" >>>>>> stat = >>>>>> #21 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>>>> objpp=0x3fffa8109270, obj_size=, >>>>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>>>> xdr_ref.c:135 >>>>>> more_data = 1 >>>>>> #22 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>>>> /usr/lib64/libgfxdr.so.0 >>>>>> No symbol table info available. >>>>>> #23 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>>>> pp=0x3fffa81090f0, size=, proc=) at >>>>>> xdr_ref.c:84 >>>>>> loc = 0x3fffa81091a0 "\217\275\067\336\232\300(\005" >>>>>> stat = >>>>>> #24 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>>>> objpp=0x3fffa81090f0, obj_size=, >>>>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>>>> xdr_ref.c:135 >>>>>> more_data = 1 >>>>>> #25 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>>>> /usr/lib64/libgfxdr.so.0 >>>>>> No symbol table info available. 
>>>>>> #26 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>>>> pp=0x3fffa8108f70, size=, proc=) at >>>>>> xdr_ref.c:84 >>>>>> loc = 0x3fffa8109020 "\260.\025\b\244\352IT" >>>>>> stat = >>>>>> #27 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>>>> objpp=0x3fffa8108f70, obj_size=, >>>>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>>>> xdr_ref.c:135 >>>>>> more_data = 1 >>>>>> #28 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>>>> /usr/lib64/libgfxdr.so.0 >>>>>> No symbol table info available. >>>>>> #29 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>>>> pp=0x3fffa8108df0, size=, proc=) at >>>>>> xdr_ref.c:84 >>>>>> loc = 0x3fffa8108ea0 "\212GS\203l\035\n\\" >>>>>> ---Type to continue, or q to quit--- >>>>>> >>>>>> >>>>>> Regards, >>>>>> Abhishek >>>>>> >>>>>> On Mon, Mar 11, 2019 at 7:10 PM Amar Tumballi Suryanarayan < >>>>>> atumball at redhat.com> wrote: >>>>>> >>>>>>> Hi Abhishek, >>>>>>> >>>>>>> Can you check and get back to us? >>>>>>> >>>>>>> ``` >>>>>>> bash# ldd /usr/lib64/libglusterfs.so >>>>>>> bash# ldd /usr/lib64/libgfrpc.so >>>>>>> >>>>>>> ``` >>>>>>> >>>>>>> Also considering you have the core, can you do `(gdb) thr apply all >>>>>>> bt full` and pass it on? >>>>>>> >>>>>>> Thanks & Regards, >>>>>>> Amar >>>>>>> >>>>>>> On Mon, Mar 11, 2019 at 3:41 PM ABHISHEK PALIWAL < >>>>>>> abhishpaliwal at gmail.com> wrote: >>>>>>> >>>>>>>> Hi Team, >>>>>>>> >>>>>>>> COuld you please provide some pointer to debug it further. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Abhishek >>>>>>>> >>>>>>>> On Fri, Mar 8, 2019 at 2:19 PM ABHISHEK PALIWAL < >>>>>>>> abhishpaliwal at gmail.com> wrote: >>>>>>>> >>>>>>>>> Hi Team, >>>>>>>>> >>>>>>>>> I am using Glusterfs 5.4, where after setting the gluster mount >>>>>>>>> point when trying to access it, glusterfsd is getting crashed and mount >>>>>>>>> point through the "Transport endpoint is not connected error. 
>>>>>>>>> >>>>>>>>> Here I are the gdb log for the core file >>>>>>>>> >>>>>>>>> warning: Could not load shared library symbols for >>>>>>>>> linux-vdso64.so.1. >>>>>>>>> Do you need "set solib-search-path" or "set sysroot"? >>>>>>>>> [Thread debugging using libthread_db enabled] >>>>>>>>> Using host libthread_db library "/lib64/libthread_db.so.1". >>>>>>>>> Core was generated by `/usr/sbin/glusterfsd -s 128.224.95.140 >>>>>>>>> --volfile-id gv0.128.224.95.140.tmp-bric'. >>>>>>>>> Program terminated with signal SIGSEGV, Segmentation fault. >>>>>>>>> #0 0x00003fff95ab1d48 in _int_malloc (av=av at entry=0x3fff7c000020, >>>>>>>>> bytes=bytes at entry=36) at malloc.c:3327 >>>>>>>>> 3327 { >>>>>>>>> [Current thread is 1 (Thread 0x3fff90394160 (LWP 811))] >>>>>>>>> (gdb) >>>>>>>>> (gdb) >>>>>>>>> (gdb) bt >>>>>>>>> #0 0x00003fff95ab1d48 in _int_malloc (av=av at entry=0x3fff7c000020, >>>>>>>>> bytes=bytes at entry=36) at malloc.c:3327 >>>>>>>>> #1 0x00003fff95ab43dc in __GI___libc_malloc (bytes=36) at >>>>>>>>> malloc.c:2921 >>>>>>>>> #2 0x00003fff95b6ffd0 in x_inline (xdrs=0x3fff90391d20, >>>>>>>>> len=) at xdr_sizeof.c:89 >>>>>>>>> #3 0x00003fff95c4d488 in .xdr_gfx_iattx () from >>>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>>> #4 0x00003fff95c4de84 in .xdr_gfx_dirplist () from >>>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>>> #5 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>>>> pp=0x3fff7c132020, size=, proc=) at >>>>>>>>> xdr_ref.c:84 >>>>>>>>> #6 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>>> objpp=0x3fff7c132020, obj_size=, >>>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) >>>>>>>>> at xdr_ref.c:135 >>>>>>>>> #7 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>>> #8 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>>>> pp=0x3fff7c131ea0, size=, proc=) at >>>>>>>>> xdr_ref.c:84 >>>>>>>>> #9 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, 
>>>>>>>>> objpp=0x3fff7c131ea0, obj_size=, >>>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) >>>>>>>>> at xdr_ref.c:135 >>>>>>>>> #10 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>>> #11 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>>>> pp=0x3fff7c131d20, size=, proc=) at >>>>>>>>> xdr_ref.c:84 >>>>>>>>> #12 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>>> objpp=0x3fff7c131d20, obj_size=, >>>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) >>>>>>>>> at xdr_ref.c:135 >>>>>>>>> #13 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>>> #14 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>>>> pp=0x3fff7c131ba0, size=, proc=) at >>>>>>>>> xdr_ref.c:84 >>>>>>>>> #15 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>>> objpp=0x3fff7c131ba0, obj_size=, >>>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) >>>>>>>>> at xdr_ref.c:135 >>>>>>>>> #16 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>>> #17 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>>>> pp=0x3fff7c131a20, size=, proc=) at >>>>>>>>> xdr_ref.c:84 >>>>>>>>> #18 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>>> objpp=0x3fff7c131a20, obj_size=, >>>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) >>>>>>>>> at xdr_ref.c:135 >>>>>>>>> #19 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>>> #20 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>>>> pp=0x3fff7c1318a0, size=, proc=) at >>>>>>>>> xdr_ref.c:84 >>>>>>>>> #21 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>>> objpp=0x3fff7c1318a0, obj_size=, >>>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) >>>>>>>>> at xdr_ref.c:135 >>>>>>>>> #22 
0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>>> #23 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>>>> pp=0x3fff7c131720, size=, proc=) at >>>>>>>>> xdr_ref.c:84 >>>>>>>>> #24 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>>> objpp=0x3fff7c131720, obj_size=, >>>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) >>>>>>>>> at xdr_ref.c:135 >>>>>>>>> #25 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>>> #26 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>>>> pp=0x3fff7c1315a0, size=, proc=) at >>>>>>>>> xdr_ref.c:84 >>>>>>>>> #27 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>>> objpp=0x3fff7c1315a0, obj_size=, >>>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) >>>>>>>>> at xdr_ref.c:135 >>>>>>>>> #28 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>>> #29 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>>>> pp=0x3fff7c131420, size=, proc=) at >>>>>>>>> xdr_ref.c:84 >>>>>>>>> #30 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>>> objpp=0x3fff7c131420, obj_size=, >>>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) >>>>>>>>> at xdr_ref.c:135 >>>>>>>>> #31 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>>> #32 0x00003fff95b6fc28 in __GI_xdr_reference (xdrs=0x3fff90391d20, >>>>>>>>> pp=0x3fff7c1312a0, size=, proc=) at >>>>>>>>> xdr_ref.c:84 >>>>>>>>> #33 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>>> objpp=0x3fff7c1312a0, obj_size=, >>>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) >>>>>>>>> at xdr_ref.c:135 >>>>>>>>> >>>>>>>>> Frames are getting repeated, could any one please me. 
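[On the question of why the frames repeat: the recurring `__GI_xdr_pointer` / `__GI_xdr_reference` / `.xdr_gfx_dirplist` triples are the XDR routines walking a readdirp reply. `gfx_dirplist` is a linked list — each directory entry carries a pointer to the next — and sunrpc-style XDR encodes such a list by recursing once per entry, so a large directory produces one frame triple per entry. A toy Python sketch of that shape (illustrative only; `Dirent` and `xdr_dirplist` are made-up names, not the actual GlusterFS code):]

```python
class Dirent:
    """Toy stand-in for gfx_dirplist: one directory entry plus a
    'nextentry' pointer to the following one (hypothetical names,
    not the actual GlusterFS structures)."""
    def __init__(self, name, nextentry=None):
        self.name = name
        self.nextentry = nextentry

def xdr_dirplist(out, entry, depth=0):
    """Encode the chain the way sunrpc's xdr_pointer does: emit a
    'more data follows' flag, encode the entry, then recurse into the
    next pointer. One stack frame per list node -- which is why the
    backtrace repeats the same frames over and over."""
    if entry is None:
        out.append(0)               # more_data = 0: end of list
        return depth
    out.append(1)                   # more_data = 1
    out.append(entry.name)          # encode this entry's payload
    return xdr_dirplist(out, entry.nextentry, depth + 1)

# A directory with 500 entries produces a 500-deep recursion.
head = None
for i in range(500):
    head = Dirent(f"file-{i}", head)

buf = []
frames = xdr_dirplist(buf, head)
print(frames)   # 500 -- one recursive frame per directory entry
```

[So the depth of the repetition just tracks the number of entries in the reply; note the actual crash above is in `_int_malloc` underneath `x_inline`/`xdr_sizeof`, and the repeated frames are only the list being walked on the way there.]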
>>>>>>>>> -- >>>>>>>>> Regards >>>>>>>>> Abhishek Paliwal >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Regards >>>>>>>> Abhishek Paliwal >>>>>>>> _______________________________________________ >>>>>>>> Gluster-users mailing list >>>>>>>> Gluster-users at gluster.org >>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Amar Tumballi (amarts) >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Regards >>>>>> Abhishek Paliwal >>>>>> >>>>> >>>>> >>>>> -- >>>>> >>>>> >>>>> >>>>> >>>>> Regards >>>>> Abhishek Paliwal >>>>> >>>> >>>> >>>> -- >>>> Amar Tumballi (amarts) >>>> >>> >>> >>> -- >>> >>> >>> >>> >>> Regards >>> Abhishek Paliwal >>> >> >> >> -- >> >> >> >> >> Regards >> Abhishek Paliwal >> > > > -- > > > > > Regards > Abhishek Paliwal > -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From abhishpaliwal at gmail.com Wed Mar 13 07:20:53 2019 From: abhishpaliwal at gmail.com (ABHISHEK PALIWAL) Date: Wed, 13 Mar 2019 12:50:53 +0530 Subject: [Gluster-users] Glusterfsd crashed with SIGSEGV In-Reply-To: References: Message-ID: Are you sure it's '--with-tirpc'? Because with it I am getting WARNING: QA Issue: glusterfs: configure was passed unrecognised options: --with-tirpc [unknown-configure-option] I also tried with '--with-libtirpc', but the result was the same. Regards, Abhishek On Wed, Mar 13, 2019 at 11:56 AM Amar Tumballi Suryanarayan < atumball at redhat.com> wrote: > We recommend to use 'tirpc' in the later releases. use '--with-tirpc' > while running ./configure > > On Wed, Mar 13, 2019 at 10:55 AM ABHISHEK PALIWAL > wrote: > >> Hi Amar, >> >> this problem seems to be configuration issue due to librpc. >> >> Could you please let me know what should be configuration I need to use?
>> >> Regards, >> Abhishek >> >> On Wed, Mar 13, 2019 at 10:42 AM ABHISHEK PALIWAL < >> abhishpaliwal at gmail.com> wrote: >> >>> logs for libgfrpc.so >>> >>> pabhishe at arn-build3$ldd >>> ./5.4-r0/packages-split/glusterfs/usr/lib64/libgfrpc.so.* >>> ./5.4-r0/packages-split/glusterfs/usr/lib64/libgfrpc.so.0: >>> not a dynamic executable >>> ./5.4-r0/packages-split/glusterfs/usr/lib64/libgfrpc.so.0.0.1: >>> not a dynamic executable >>> >>> >>> On Wed, Mar 13, 2019 at 10:02 AM ABHISHEK PALIWAL < >>> abhishpaliwal at gmail.com> wrote: >>> >>>> Here are the logs: >>>> >>>> >>>> pabhishe at arn-build3$ldd >>>> ./5.4-r0/sysroot-destdir/usr/lib64/libglusterfs.so.* >>>> ./5.4-r0/sysroot-destdir/usr/lib64/libglusterfs.so.0: >>>> not a dynamic executable >>>> ./5.4-r0/sysroot-destdir/usr/lib64/libglusterfs.so.0.0.1: >>>> not a dynamic executable >>>> pabhishe at arn-build3$ldd >>>> ./5.4-r0/sysroot-destdir/usr/lib64/libglusterfs.so.0.0.1 >>>> not a dynamic executable >>>> >>>> >>>> For backtraces I have attached the core_logs.txt file. >>>> >>>> Regards, >>>> Abhishek >>>> >>>> On Wed, Mar 13, 2019 at 9:51 AM Amar Tumballi Suryanarayan < >>>> atumball at redhat.com> wrote: >>>> >>>>> Hi Abhishek, >>>>> >>>>> Few more questions, >>>>> >>>>> >>>>>> On Tue, Mar 12, 2019 at 10:58 AM ABHISHEK PALIWAL < >>>>>> abhishpaliwal at gmail.com> wrote: >>>>>> >>>>>>> Hi Amar, >>>>>>> >>>>>>> Below are the requested logs >>>>>>> >>>>>>> pabhishe at arn-build3$ldd ./sysroot-destdir/usr/lib64/libglusterfs.so >>>>>>> not a dynamic executable >>>>>>> >>>>>>> pabhishe at arn-build3$ldd ./sysroot-destdir/usr/lib64/libgfrpc.so >>>>>>> not a dynamic executable >>>>>>> >>>>>>> >>>>> Can you please add a * at the end, so it gets the linked library list >>>>> from the actual files (ideally this is a symlink, but I expected it to >>>>> resolve like in Fedora). 
>>>>> >>>>> >>>>> >>>>>> root at 128:/# gdb /usr/sbin/glusterd core.1099 >>>>>>> GNU gdb (GDB) 7.10.1 >>>>>>> Copyright (C) 2015 Free Software Foundation, Inc. >>>>>>> License GPLv3+: GNU GPL version 3 or later < >>>>>>> http://gnu.org/licenses/gpl.html> >>>>>>> This is free software: you are free to change and redistribute it. >>>>>>> There is NO WARRANTY, to the extent permitted by law. Type "show >>>>>>> copying" >>>>>>> and "show warranty" for details. >>>>>>> This GDB was configured as "powerpc64-wrs-linux". >>>>>>> Type "show configuration" for configuration details. >>>>>>> For bug reporting instructions, please see: >>>>>>> . >>>>>>> Find the GDB manual and other documentation resources online at: >>>>>>> . >>>>>>> For help, type "help". >>>>>>> Type "apropos word" to search for commands related to "word"... >>>>>>> Reading symbols from /usr/sbin/glusterd...(no debugging symbols >>>>>>> found)...done. >>>>>>> [New LWP 1109] >>>>>>> [New LWP 1101] >>>>>>> [New LWP 1105] >>>>>>> [New LWP 1110] >>>>>>> [New LWP 1099] >>>>>>> [New LWP 1107] >>>>>>> [New LWP 1119] >>>>>>> [New LWP 1103] >>>>>>> [New LWP 1112] >>>>>>> [New LWP 1116] >>>>>>> [New LWP 1104] >>>>>>> [New LWP 1239] >>>>>>> [New LWP 1106] >>>>>>> [New LWP 1111] >>>>>>> [New LWP 1108] >>>>>>> [New LWP 1117] >>>>>>> [New LWP 1102] >>>>>>> [New LWP 1118] >>>>>>> [New LWP 1100] >>>>>>> [New LWP 1114] >>>>>>> [New LWP 1113] >>>>>>> [New LWP 1115] >>>>>>> >>>>>>> warning: Could not load shared library symbols for linux-vdso64.so.1. >>>>>>> Do you need "set solib-search-path" or "set sysroot"? >>>>>>> [Thread debugging using libthread_db enabled] >>>>>>> Using host libthread_db library "/lib64/libthread_db.so.1". >>>>>>> Core was generated by `/usr/sbin/glusterfsd -s 128.224.95.140 >>>>>>> --volfile-id gv0.128.224.95.140.tmp-bric'. >>>>>>> Program terminated with signal SIGSEGV, Segmentation fault. 
>>>>>>> #0 0x00003fffb76a6d48 in _int_malloc (av=av at entry=0x3fffa8000020, >>>>>>> bytes=bytes at entry=36) at malloc.c:3327 >>>>>>> 3327 { >>>>>>> [Current thread is 1 (Thread 0x3fffb1689160 (LWP 1109))] >>>>>>> (gdb) bt full >>>>>>> >>>>>> >>>>> This is backtrace of one particular thread. I need output of command >>>>> >>>>> (gdb) thread apply all bt full >>>>> >>>>> >>>>> Also, considering this is a crash in the malloc library call itself, >>>>> would like to know the details of OS, Kernel version and gcc versions. >>>>> >>>>> Regards, >>>>> Amar >>>>> >>>>> #0 0x00003fffb76a6d48 in _int_malloc (av=av at entry=0x3fffa8000020, >>>>>>> bytes=bytes at entry=36) at malloc.c:3327 >>>>>>> nb = >>>>>>> idx = >>>>>>> bin = >>>>>>> victim = >>>>>>> size = >>>>>>> victim_index = >>>>>>> remainder = >>>>>>> remainder_size = >>>>>>> block = >>>>>>> bit = >>>>>>> map = >>>>>>> fwd = >>>>>>> bck = >>>>>>> errstr = 0x0 >>>>>>> __func__ = "_int_malloc" >>>>>>> #1 0x00003fffb76a93dc in __GI___libc_malloc (bytes=36) at >>>>>>> malloc.c:2921 >>>>>>> ar_ptr = 0x3fffa8000020 >>>>>>> victim = >>>>>>> hook = >>>>>>> __func__ = "__libc_malloc" >>>>>>> #2 0x00003fffb7764fd0 in x_inline (xdrs=0x3fffb1686d20, >>>>>>> len=) at xdr_sizeof.c:89 >>>>>>> len = 36 >>>>>>> xdrs = 0x3fffb1686d20 >>>>>>> #3 0x00003fffb7842488 in .xdr_gfx_iattx () from >>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>> No symbol table info available. >>>>>>> #4 0x00003fffb7842e84 in .xdr_gfx_dirplist () from >>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>> No symbol table info available. 
>>>>>>> #5 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>>>>> pp=0x3fffa81099f0, size=, proc=) at >>>>>>> xdr_ref.c:84 >>>>>>> loc = 0x3fffa8109aa0 "\265\256\373\200\f\206\361j" >>>>>>> stat = >>>>>>> #6 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>>>>> objpp=0x3fffa81099f0, obj_size=, >>>>>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>>>>> xdr_ref.c:135 >>>>>>> more_data = 1 >>>>>>> #7 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>> No symbol table info available. >>>>>>> #8 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>>>>> pp=0x3fffa8109870, size=, proc=) at >>>>>>> xdr_ref.c:84 >>>>>>> loc = 0x3fffa8109920 "\232\373\377\315\352\325\005\271" >>>>>>> stat = >>>>>>> #9 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>>>>> objpp=0x3fffa8109870, obj_size=, >>>>>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>>>>> xdr_ref.c:135 >>>>>>> more_data = 1 >>>>>>> #10 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>> No symbol table info available. >>>>>>> #11 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>>>>> pp=0x3fffa81096f0, size=, proc=) at >>>>>>> xdr_ref.c:84 >>>>>>> loc = 0x3fffa81097a0 "\241X\372!\216\256=\342" >>>>>>> stat = >>>>>>> ---Type to continue, or q to quit--- >>>>>>> #12 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>>>>> objpp=0x3fffa81096f0, obj_size=, >>>>>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>>>>> xdr_ref.c:135 >>>>>>> more_data = 1 >>>>>>> #13 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>> No symbol table info available. 
>>>>>>> #14 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>>>>> pp=0x3fffa8109570, size=, proc=) at >>>>>>> xdr_ref.c:84 >>>>>>> loc = 0x3fffa8109620 "\265\205\003Vu'\002L" >>>>>>> stat = >>>>>>> #15 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>>>>> objpp=0x3fffa8109570, obj_size=, >>>>>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>>>>> xdr_ref.c:135 >>>>>>> more_data = 1 >>>>>>> #16 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>> No symbol table info available. >>>>>>> #17 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>>>>> pp=0x3fffa81093f0, size=, proc=) at >>>>>>> xdr_ref.c:84 >>>>>>> loc = 0x3fffa81094a0 "\200L\027F'\177\366D" >>>>>>> stat = >>>>>>> #18 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>>>>> objpp=0x3fffa81093f0, obj_size=, >>>>>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>>>>> xdr_ref.c:135 >>>>>>> more_data = 1 >>>>>>> #19 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>> No symbol table info available. >>>>>>> #20 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>>>>> pp=0x3fffa8109270, size=, proc=) at >>>>>>> xdr_ref.c:84 >>>>>>> loc = 0x3fffa8109320 "\217{dK(\001E\220" >>>>>>> stat = >>>>>>> #21 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>>>>> objpp=0x3fffa8109270, obj_size=, >>>>>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>>>>> xdr_ref.c:135 >>>>>>> more_data = 1 >>>>>>> #22 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>> No symbol table info available. 
>>>>>>> #23 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>>>>> pp=0x3fffa81090f0, size=, proc=) at >>>>>>> xdr_ref.c:84 >>>>>>> loc = 0x3fffa81091a0 "\217\275\067\336\232\300(\005" >>>>>>> stat = >>>>>>> #24 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>>>>> objpp=0x3fffa81090f0, obj_size=, >>>>>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>>>>> xdr_ref.c:135 >>>>>>> more_data = 1 >>>>>>> #25 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>> No symbol table info available. >>>>>>> #26 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>>>>> pp=0x3fffa8108f70, size=, proc=) at >>>>>>> xdr_ref.c:84 >>>>>>> loc = 0x3fffa8109020 "\260.\025\b\244\352IT" >>>>>>> stat = >>>>>>> #27 0x00003fffb7764e04 in __GI_xdr_pointer (xdrs=0x3fffb1686d20, >>>>>>> objpp=0x3fffa8108f70, obj_size=, >>>>>>> xdr_obj=@0x3fffb785f4b0: 0x3fffb7842dc0 <.xdr_gfx_dirplist>) at >>>>>>> xdr_ref.c:135 >>>>>>> more_data = 1 >>>>>>> #28 0x00003fffb7842ec0 in .xdr_gfx_dirplist () from >>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>> No symbol table info available. >>>>>>> #29 0x00003fffb7764c28 in __GI_xdr_reference (xdrs=0x3fffb1686d20, >>>>>>> pp=0x3fffa8108df0, size=, proc=) at >>>>>>> xdr_ref.c:84 >>>>>>> loc = 0x3fffa8108ea0 "\212GS\203l\035\n\\" >>>>>>> ---Type to continue, or q to quit--- >>>>>>> >>>>>>> >>>>>>> Regards, >>>>>>> Abhishek >>>>>>> >>>>>>> On Mon, Mar 11, 2019 at 7:10 PM Amar Tumballi Suryanarayan < >>>>>>> atumball at redhat.com> wrote: >>>>>>> >>>>>>>> Hi Abhishek, >>>>>>>> >>>>>>>> Can you check and get back to us? >>>>>>>> >>>>>>>> ``` >>>>>>>> bash# ldd /usr/lib64/libglusterfs.so >>>>>>>> bash# ldd /usr/lib64/libgfrpc.so >>>>>>>> >>>>>>>> ``` >>>>>>>> >>>>>>>> Also considering you have the core, can you do `(gdb) thr apply all >>>>>>>> bt full` and pass it on? 
>>>>>>>> >>>>>>>> Thanks & Regards, >>>>>>>> Amar >>>>>>>> >>>>>>>> On Mon, Mar 11, 2019 at 3:41 PM ABHISHEK PALIWAL < >>>>>>>> abhishpaliwal at gmail.com> wrote: >>>>>>>> >>>>>>>>> Hi Team, >>>>>>>>> >>>>>>>>> COuld you please provide some pointer to debug it further. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Abhishek >>>>>>>>> >>>>>>>>> On Fri, Mar 8, 2019 at 2:19 PM ABHISHEK PALIWAL < >>>>>>>>> abhishpaliwal at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Hi Team, >>>>>>>>>> >>>>>>>>>> I am using Glusterfs 5.4, where after setting the gluster mount >>>>>>>>>> point when trying to access it, glusterfsd is getting crashed and mount >>>>>>>>>> point through the "Transport endpoint is not connected error. >>>>>>>>>> >>>>>>>>>> Here I are the gdb log for the core file >>>>>>>>>> >>>>>>>>>> warning: Could not load shared library symbols for >>>>>>>>>> linux-vdso64.so.1. >>>>>>>>>> Do you need "set solib-search-path" or "set sysroot"? >>>>>>>>>> [Thread debugging using libthread_db enabled] >>>>>>>>>> Using host libthread_db library "/lib64/libthread_db.so.1". >>>>>>>>>> Core was generated by `/usr/sbin/glusterfsd -s 128.224.95.140 >>>>>>>>>> --volfile-id gv0.128.224.95.140.tmp-bric'. >>>>>>>>>> Program terminated with signal SIGSEGV, Segmentation fault. 
>>>>>>>>>> #0 0x00003fff95ab1d48 in _int_malloc (av=av at entry=0x3fff7c000020, >>>>>>>>>> bytes=bytes at entry=36) at malloc.c:3327 >>>>>>>>>> 3327 { >>>>>>>>>> [Current thread is 1 (Thread 0x3fff90394160 (LWP 811))] >>>>>>>>>> (gdb) >>>>>>>>>> (gdb) >>>>>>>>>> (gdb) bt >>>>>>>>>> #0 0x00003fff95ab1d48 in _int_malloc (av=av at entry=0x3fff7c000020, >>>>>>>>>> bytes=bytes at entry=36) at malloc.c:3327 >>>>>>>>>> #1 0x00003fff95ab43dc in __GI___libc_malloc (bytes=36) at >>>>>>>>>> malloc.c:2921 >>>>>>>>>> #2 0x00003fff95b6ffd0 in x_inline (xdrs=0x3fff90391d20, >>>>>>>>>> len=) at xdr_sizeof.c:89 >>>>>>>>>> #3 0x00003fff95c4d488 in .xdr_gfx_iattx () from >>>>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>>>> #4 0x00003fff95c4de84 in .xdr_gfx_dirplist () from >>>>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>>>> #5 0x00003fff95b6fc28 in __GI_xdr_reference >>>>>>>>>> (xdrs=0x3fff90391d20, pp=0x3fff7c132020, size=, >>>>>>>>>> proc=) at xdr_ref.c:84 >>>>>>>>>> #6 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>>>> objpp=0x3fff7c132020, obj_size=, >>>>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) >>>>>>>>>> at xdr_ref.c:135 >>>>>>>>>> #7 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>>>> #8 0x00003fff95b6fc28 in __GI_xdr_reference >>>>>>>>>> (xdrs=0x3fff90391d20, pp=0x3fff7c131ea0, size=, >>>>>>>>>> proc=) at xdr_ref.c:84 >>>>>>>>>> #9 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>>>> objpp=0x3fff7c131ea0, obj_size=, >>>>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) >>>>>>>>>> at xdr_ref.c:135 >>>>>>>>>> #10 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>>>> #11 0x00003fff95b6fc28 in __GI_xdr_reference >>>>>>>>>> (xdrs=0x3fff90391d20, pp=0x3fff7c131d20, size=, >>>>>>>>>> proc=) at xdr_ref.c:84 >>>>>>>>>> #12 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>>>> objpp=0x3fff7c131d20, obj_size=, 
>>>>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) >>>>>>>>>> at xdr_ref.c:135 >>>>>>>>>> #13 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>>>> #14 0x00003fff95b6fc28 in __GI_xdr_reference >>>>>>>>>> (xdrs=0x3fff90391d20, pp=0x3fff7c131ba0, size=, >>>>>>>>>> proc=) at xdr_ref.c:84 >>>>>>>>>> #15 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>>>> objpp=0x3fff7c131ba0, obj_size=, >>>>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) >>>>>>>>>> at xdr_ref.c:135 >>>>>>>>>> #16 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>>>> #17 0x00003fff95b6fc28 in __GI_xdr_reference >>>>>>>>>> (xdrs=0x3fff90391d20, pp=0x3fff7c131a20, size=, >>>>>>>>>> proc=) at xdr_ref.c:84 >>>>>>>>>> #18 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>>>> objpp=0x3fff7c131a20, obj_size=, >>>>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) >>>>>>>>>> at xdr_ref.c:135 >>>>>>>>>> #19 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>>>> #20 0x00003fff95b6fc28 in __GI_xdr_reference >>>>>>>>>> (xdrs=0x3fff90391d20, pp=0x3fff7c1318a0, size=, >>>>>>>>>> proc=) at xdr_ref.c:84 >>>>>>>>>> #21 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>>>> objpp=0x3fff7c1318a0, obj_size=, >>>>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) >>>>>>>>>> at xdr_ref.c:135 >>>>>>>>>> #22 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>>>> #23 0x00003fff95b6fc28 in __GI_xdr_reference >>>>>>>>>> (xdrs=0x3fff90391d20, pp=0x3fff7c131720, size=, >>>>>>>>>> proc=) at xdr_ref.c:84 >>>>>>>>>> #24 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>>>> objpp=0x3fff7c131720, obj_size=, >>>>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) >>>>>>>>>> at xdr_ref.c:135 >>>>>>>>>> #25 
0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>>>> #26 0x00003fff95b6fc28 in __GI_xdr_reference >>>>>>>>>> (xdrs=0x3fff90391d20, pp=0x3fff7c1315a0, size=, >>>>>>>>>> proc=) at xdr_ref.c:84 >>>>>>>>>> #27 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>>>> objpp=0x3fff7c1315a0, obj_size=, >>>>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) >>>>>>>>>> at xdr_ref.c:135 >>>>>>>>>> #28 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>>>> #29 0x00003fff95b6fc28 in __GI_xdr_reference >>>>>>>>>> (xdrs=0x3fff90391d20, pp=0x3fff7c131420, size=, >>>>>>>>>> proc=) at xdr_ref.c:84 >>>>>>>>>> #30 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>>>> objpp=0x3fff7c131420, obj_size=, >>>>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) >>>>>>>>>> at xdr_ref.c:135 >>>>>>>>>> #31 0x00003fff95c4dec0 in .xdr_gfx_dirplist () from >>>>>>>>>> /usr/lib64/libgfxdr.so.0 >>>>>>>>>> #32 0x00003fff95b6fc28 in __GI_xdr_reference >>>>>>>>>> (xdrs=0x3fff90391d20, pp=0x3fff7c1312a0, size=, >>>>>>>>>> proc=) at xdr_ref.c:84 >>>>>>>>>> #33 0x00003fff95b6fe04 in __GI_xdr_pointer (xdrs=0x3fff90391d20, >>>>>>>>>> objpp=0x3fff7c1312a0, obj_size=, >>>>>>>>>> xdr_obj=@0x3fff95c6a4b0: 0x3fff95c4ddc0 <.xdr_gfx_dirplist>) >>>>>>>>>> at xdr_ref.c:135 >>>>>>>>>> >>>>>>>>>> Frames are getting repeated, could any one please me. 
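[On the `ldd ... not a dynamic executable` outputs quoted earlier in this message: ldd typically prints that when the file is not a dynamic ELF object it can interpret on the current host — for example a library cross-compiled for another target (the cores in this thread come from a `powerpc64-wrs-linux` build) but inspected on the build machine. A quick way to see what a file actually is, sketched in Python (illustrative only; `file <lib>` or `readelf -h <lib>` do the same job properly):]

```python
import struct

def elf_info(data: bytes):
    """Return (bitness, e_machine) for an ELF blob, or None if the
    bytes are not ELF at all. ldd prints 'not a dynamic executable'
    when it cannot treat the file as a loadable dynamic object --
    e.g. a library built for another architecture."""
    if len(data) < 20 or data[:4] != b"\x7fELF":
        return None
    bitness = 64 if data[4] == 2 else 32          # EI_CLASS byte
    endian = "<" if data[5] == 1 else ">"         # EI_DATA byte
    (machine,) = struct.unpack_from(endian + "H", data, 18)  # e_machine
    return bitness, machine

EM_PPC64 = 21  # ELF e_machine value for 64-bit PowerPC

# Minimal fake big-endian ppc64 ELF header, for illustration only.
fake_ppc64 = (b"\x7fELF\x02\x02\x01" + b"\x00" * 9
              + struct.pack(">HH", 2, EM_PPC64))
print(elf_info(fake_ppc64))    # (64, 21): a 64-bit PowerPC object
print(elf_info(b"#!/bin/sh"))  # None: not an ELF object at all
```

[If the libraries report a foreign architecture, running ldd on them on the build host will say "not a dynamic executable" even though they are perfectly valid for the target.]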
>>>>>>>>>> -- >>>>>>>>>> Regards >>>>>>>>>> Abhishek Paliwal >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Regards >>>>>>>>> Abhishek Paliwal >>>>>>>>> _______________________________________________ >>>>>>>>> Gluster-users mailing list >>>>>>>>> Gluster-users at gluster.org >>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Amar Tumballi (amarts) >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Regards >>>>>>> Abhishek Paliwal >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Regards >>>>>> Abhishek Paliwal >>>>>> >>>>> >>>>> >>>>> -- >>>>> Amar Tumballi (amarts) >>>>> >>>> >>>> >>>> -- >>>> >>>> >>>> >>>> >>>> Regards >>>> Abhishek Paliwal >>>> >>> >>> >>> -- >>> >>> >>> >>> >>> Regards >>> Abhishek Paliwal >>> >> >> >> -- >> >> >> >> >> Regards >> Abhishek Paliwal >> > > > -- > Amar Tumballi (amarts) > -- Regards Abhishek Paliwal -------------- next part -------------- An HTML attachment was scrubbed... URL: From spalai at redhat.com Wed Mar 13 07:33:35 2019 From: spalai at redhat.com (Susant Palai) Date: Wed, 13 Mar 2019 13:03:35 +0530 Subject: [Gluster-users] Removing Brick in Distributed GlusterFS In-Reply-To: References: <5409b2d08e789e3711cbda99900deb85083e6ff3@taste-of-it.de> Message-ID: On Tue, Mar 12, 2019 at 5:16 PM Taste-Of-IT wrote: > Hi Susant, > > and thanks for your fast reply and pointing me to that log. So i was able > to find the problem: "dht-rebalance.c:1052:__dht_check_free_space] > 0-vol4-dht: Could not find any subvol with space accomodating the file" > > But Volume Detail and df -h show xTB of free Disk Space and also Free > Inodes. 
> > Options Reconfigured: > performance.client-io-threads: on > storage.reserve: 0 > performance.parallel-readdir: off > performance.readdir-ahead: off > auth.allow: 192.168.0.* > nfs.disable: off > transport.address-family: inet > > Ok since there is enough disk space on other Bricks and i actually didnt > complete brick-remove, can i rerun brick-remove to rebalance last Files and > Folders? > Ideally, the error should not have been seen with disk space available on the target nodes. You can start remove-brick again and it should move out the remaining set of files to the other bricks. > > Thanks > Taste > > > Am 12.03.2019 10:49:13, schrieb Susant Palai: > > Would it be possible for you to pass the rebalance log file on the node > from which you want to remove the brick? (location : > /var/log/glusterfs/) > > + the following information: > 1 - gluster volume info > 2 - gluster volume status > 2 - df -h output on all 3 nodes > > > Susant > > On Tue, Mar 12, 2019 at 3:08 PM Taste-Of-IT > wrote: > > Hi, > i have a 3 Node Distributed Gluster. I have one Volume over all 3 Nodes / > Bricks. I want to remove one Brick and run gluster volume remove-brick > start. The Job completes and shows 11960 failures and > only transfers 5TB out of 15TB Data. I have still files and folders on this > volume on the brick to remove. I actually didnt run the final command with > "commit". Both other Nodes have each over 6TB of free Space, so it can hold > the remaininge Data from Brick3 theoretically. > > Need help. > thx > Taste > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From spalai at redhat.com Wed Mar 13 07:35:31 2019 From: spalai at redhat.com (Susant Palai) Date: Wed, 13 Mar 2019 13:05:31 +0530 Subject: [Gluster-users] Removing Brick in Distributed GlusterFS In-Reply-To: <2101a0f0763cc87ac55a91bff11cf350527ce48b@taste-of-it.de> References: <5409b2d08e789e3711cbda99900deb85083e6ff3@taste-of-it.de> <2101a0f0763cc87ac55a91bff11cf350527ce48b@taste-of-it.de> Message-ID: On Tue, Mar 12, 2019 at 8:48 PM Taste-Of-IT wrote: > Hi, > > i found a Bug about this in Version 3.10. I run 3.13.2 - for your > Information. As far as i can see, the default of 1% rule is active and not > configure 0 = for disable storage.reserve. > > Let me verify this bug on release 6 and will update you. (But my recommendation will be to not disable it as that could lead to other problems.) > So what can i do? Finish remove brick? Upgrade to newer Version and rerun > rebalance? > > thx > Taste > > Am 12.03.2019 12:45:51, schrieb Taste-Of-IT: > > Hi Susant, > > and thanks for your fast reply and pointing me to that log. So i was able > to find the problem: "dht-rebalance.c:1052:__dht_check_free_space] > 0-vol4-dht: Could not find any subvol with space accomodating the file" > > But Volume Detail and df -h show xTB of free Disk Space and also Free > Inodes. > > Options Reconfigured: > performance.client-io-threads: on > storage.reserve: 0 > performance.parallel-readdir: off > performance.readdir-ahead: off > auth.allow: 192.168.0.* > nfs.disable: off > transport.address-family: inet > > Ok since there is enough disk space on other Bricks and i actually didnt > complete brick-remove, can i rerun brick-remove to rebalance last Files and > Folders? > > Thanks > Taste > > > Am 12.03.2019 10:49:13, schrieb Susant Palai: > > Would it be possible for you to pass the rebalance log file on the node > from which you want to remove the brick? 
(location : > /var/log/glusterfs/) > > + the following information: > 1 - gluster volume info > 2 - gluster volume status > 3 - df -h output on all 3 nodes > > > Susant > > On Tue, Mar 12, 2019 at 3:08 PM Taste-Of-IT > wrote: > > Hi, > I have a 3 Node Distributed Gluster. I have one Volume over all 3 Nodes / > Bricks. I want to remove one Brick and ran gluster volume remove-brick > start. The Job completes and shows 11960 failures and > only transfers 5TB out of 15TB Data. I still have files and folders on this > volume on the brick to remove. I actually didn't run the final command with > "commit". Both other Nodes each have over 6TB of free Space, so they can hold > the remaining Data from Brick3 theoretically. > > Need help. > thx > Taste > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From kontakt at taste-of-it.de Wed Mar 13 09:09:00 2019 From: kontakt at taste-of-it.de (Taste-Of-IT) Date: Wed, 13 Mar 2019 09:09:00 +0000 Subject: [Gluster-users] Removing Brick in Distributed GlusterFS In-Reply-To: References: <5409b2d08e789e3711cbda99900deb85083e6ff3@taste-of-it.de> Message-ID: <9cca5e42313f2d021ffc9de3f43eb0dbd0266d2f@taste-of-it.de> Hi, I stopped the remove-brick operation. Then I upgraded Debian to Stretch, because the Gluster repository for Jessie and latest GlusterFS 4.0 threw an HTTP 404 error, which I could not fix in time. So I upgraded to Stretch and then to the latest GlusterFS 4.02.
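For context on the "Could not find any subvol with space accomodating the file" failure discussed in this thread: the decision rebalance makes per file boils down to a free-space test on each candidate subvolume. A rough sketch of that idea follows. Everything here is an illustrative assumption — the function name, units and the reserve handling are made up, and this is not the actual code from dht-rebalance.c:

```python
# Rough sketch of a per-subvolume free-space test (illustrative only;
# names, units and reserve handling are assumptions, not gluster code).

def pick_subvol(subvols, file_size, reserve_pct=1.0):
    """Return the first subvolume that can take file_size bytes while
    keeping at least reserve_pct percent of its capacity free."""
    for name, total, free in subvols:
        reserve = total * reserve_pct / 100.0
        if free - file_size >= reserve:
            return name
    return None  # rebalance then counts the file as a failure

# Numbers from this thread: two remaining bricks, 32.6 TB total,
# 3.3 TB free each.
subvols = [("brick1", 32.6e12, 3.3e12), ("brick2", 32.6e12, 3.3e12)]

print(pick_subvol(subvols, 40e9))   # a 40 GB file still fits
print(pick_subvol(subvols, 4e12))   # a 4 TB file does not -> None
```

In a model like this, setting storage.reserve to 0 should make the reserve term drop out entirely, which is why a failure reported while free space is still available reads like a bug rather than expected behaviour.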
Then I ran remove-brick again, which led to the same error. > > Brick1 and Brick2 have a total Disk of 32.6TB; both have 3.3TB of free Disk > now. Brick3, which should be removed, has a total of 16.3TB with 7.7TB free. The files to > move are between a few KB and over 40GB. So approx. 7TB has to move, > which indeed could not be stored on 3.3TB*2, but as I understand it, rebalance should > move files until the free Diskspace on Brick1 and Brick2 is nearly zero. Right? > Ok, I will add a Temp-Disk and move xTB out of the Volume. > > All in all, I think it's still a bug. > > Thx. > > On 13.03.2019 08:33:35, Susant Palai wrote: > On Tue, Mar 12, 2019 at 5:16 PM Taste-Of-IT <> kontakt at taste-of-it.de> > wrote: > > Hi Susant, > > > > and thanks for your fast reply and for pointing me to that log. So I was able to find the problem: "dht-rebalance.c:1052:__dht_check_free_space] 0-vol4-dht: Could not find any subvol with space accomodating the file" > > > > But Volume Detail and df -h show xTB of free Disk Space and also Free Inodes. > > Options Reconfigured: > > performance.client-io-threads: on > > storage.reserve: 0 > > performance.parallel-readdir: off > > performance.readdir-ahead: off > > auth.allow: 192.168.0.* > > nfs.disable: off > > transport.address-family: inet > > Ok since there is enough disk space on other Bricks and I actually didn't complete brick-remove, can I rerun brick-remove to rebalance the last Files and Folders? > > > > > Ideally, the error should not have been seen with disk space available on the target nodes. You can start remove-brick again and it should move out the remaining set of files to the other bricks. > > > > Thanks > > > > Taste > > On 12.03.2019 10:49:13, Susant Palai wrote: > > > Would it be possible for you to pass the rebalance log file on the node from which you want to remove the brick? (location : /var/log/glusterfs/) > > > > > > + the following information: > > > 1 - gluster volume info
> > > > > > 2 - gluster volume status > > > > > > 3 - df -h output on all 3 nodes > > > > > > Susant > > > > > > On Tue, Mar 12, 2019 at 3:08 PM Taste-Of-IT <> > > kontakt at taste-of-it.de> > > > wrote: > > > > Hi, > > > > I have a 3 Node Distributed Gluster. I have one Volume over all 3 Nodes / Bricks. I want to remove one Brick and ran gluster volume remove-brick start. The Job completes and shows 11960 failures and only transfers 5TB out of 15TB Data. I still have files and folders on this volume on the brick to remove. I actually didn't run the final command with "commit". Both other Nodes each have over 6TB of free Space, so they can hold the remaining Data from Brick3 theoretically. > > > > > > > > Need help. > > > > thx > > > > Taste > > > > _______________________________________________ > > > > Gluster-users mailing list > > > > Gluster-users at gluster.org > > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From spalai at redhat.com Wed Mar 13 09:40:12 2019 From: spalai at redhat.com (Susant Palai) Date: Wed, 13 Mar 2019 15:10:12 +0530 Subject: [Gluster-users] Removing Brick in Distributed GlusterFS In-Reply-To: <9cca5e42313f2d021ffc9de3f43eb0dbd0266d2f@taste-of-it.de> References: <5409b2d08e789e3711cbda99900deb85083e6ff3@taste-of-it.de> <9cca5e42313f2d021ffc9de3f43eb0dbd0266d2f@taste-of-it.de> Message-ID:
Then I ran remove-brick again, which led to the > same error. > > Brick1 and Brick2 have a total Disk of 32.6TB; both have 3.3TB of free Disk > now. Brick3, which should be removed, has a total of 16.3TB with 7.7TB free. The files to > move are between a few KB and over 40GB. So approx. 7TB has to move, > which indeed could not be stored on 3.3TB*2, but as I understand it, rebalance should > move files until the free Diskspace on Brick1 and Brick2 is nearly zero. Right? > Ok, I will add a Temp-Disk and move xTB out of the Volume. > > All in all, I think it's still a bug. > Ok, then please file a bug with the details and we can discuss there. Susant > Thx. > > On 13.03.2019 08:33:35, Susant Palai wrote: > > > > On Tue, Mar 12, 2019 at 5:16 PM Taste-Of-IT > wrote: > > Hi Susant, > > and thanks for your fast reply and for pointing me to that log. So I was able > to find the problem: "dht-rebalance.c:1052:__dht_check_free_space] > 0-vol4-dht: Could not find any subvol with space accomodating the file" > > But Volume Detail and df -h show xTB of free Disk Space and also Free > Inodes. > > Options Reconfigured: > performance.client-io-threads: on > storage.reserve: 0 > performance.parallel-readdir: off > performance.readdir-ahead: off > auth.allow: 192.168.0.* > nfs.disable: off > transport.address-family: inet > > Ok since there is enough disk space on other Bricks and I actually didn't > complete brick-remove, can I rerun brick-remove to rebalance the last Files and > Folders? > > > Ideally, the error should not have been seen with disk space available on > the target nodes. You can start remove-brick again and it should move out > the remaining set of files to the other bricks. > > > > Thanks > Taste > > On 12.03.2019 10:49:13, Susant Palai wrote: > > Would it be possible for you to pass the rebalance log file on the node > from which you want to remove the brick?
(location : > /var/log/glusterfs/) > > + the following information: > 1 - gluster volume info > 2 - gluster volume status > 3 - df -h output on all 3 nodes > > > Susant > > On Tue, Mar 12, 2019 at 3:08 PM Taste-Of-IT > wrote: > > Hi, > I have a 3 Node Distributed Gluster. I have one Volume over all 3 Nodes / > Bricks. I want to remove one Brick and ran gluster volume remove-brick > start. The Job completes and shows 11960 failures and > only transfers 5TB out of 15TB Data. I still have files and folders on this > volume on the brick to remove. I actually didn't run the final command with > "commit". Both other Nodes each have over 6TB of free Space, so they can hold > the remaining Data from Brick3 theoretically. > > Need help. > thx > Taste > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed...
URL: From mauro.tridici at cmcc.it Wed Mar 13 10:25:40 2019 From: mauro.tridici at cmcc.it (Mauro Tridici) Date: Wed, 13 Mar 2019 11:25:40 +0100 Subject: [Gluster-users] "rpc_clnt_ping_timer_expired" errors In-Reply-To: References: <96B07283-D8AB-4F06-909D-E00424625528@cmcc.it> <42758A0E-8BE9-497D-BDE3-55D7DC633BC7@cmcc.it> <6A8CF4A4-98EA-48C3-A059-D60D1B2721C7@cmcc.it> <2CF49168-9C1B-4931-8C34-8157262A137A@cmcc.it> <7A151CC9-A0AE-4A45-B450-A4063D216D9E@cmcc.it> <32D53ECE-3F49-4415-A6EE-241B351AC2BA@cmcc.it> <4685A75B-5978-4338-9C9F-4A02FB40B9BC@cmcc.it> <4D2E6B04-C2E8-4FD5-B72D-E7C05931C6F9@cmcc.it> <4E332A56-B318-4BC1-9F44-84AB4392A5DE@cmcc.it> <832FD362-3B14-40D8-8530-604419300476@cmcc.it> <8D926643-1093-48ED-823F-D8F117F702CF@cmcc.it> <9D0BE438-DF11-4D0A-AB85-B44357032F29@cmcc.it> <5F0AC378-8170-4342-8473-9C17159CAC1D@cmcc.it> <7A50E86D-9E27-4EA7-883B-45E9F973991A@cmcc.it> <58B5DB7F-DCB4-4FBF-B1F7-681315B1613A@cmcc.it> <6327B44F-4E7E-46BB-A74C-70F4457DD1EB@cmcc.it> <0167DF4A-8329-4A1A-B439-857DFA6F78BB@cmcc.it> <763F334E-35B8-4729-B8E1-D30866754EEE@cmcc.it> <91DFC9AC-4805-41EB-AC6F-5722E018DD6E@cmcc.it> <8A9752B8-B231-4570-8FF4-8D3D781E7D42@cmcc.it> <47A24804-F975-4EE6-9FA5-67FCDA18D707@cmcc.it> <637FF5D2-D1F4-4686-9D48-646A96F67B96@cmcc.it> <4A87495F-3755-48F7-8507-085869069C64@cmcc.it> <3854BBF6-5B98-4AB3-A67E-E7DE59E69A63@cmcc.it> <313DA021-9173-4899-96B0-831B10B00B61@cmcc.it> <17996AFD-DFC8-40F3-9D09-DB6DDAD5B7D6@cmcc.it> <7074B5D8-955A-4802-A9F3-606C99734417@cmcc.it> <83B84BF9-8334-4230-B6F8-0BC4BFBEFF15@cmcc.it> <133B9AE4-9792-4F72-AD91-D36E7B9EC711@cmcc.it> <6611C4B0-57FD-4390-88B5-BD373555D4C5@cmcc.it> Message-ID: Hi Raghavendra, Yes, server.event-thread has been changed from 4 to 8. During last days, I noticed that the error events are still here although they have been considerably reduced. 
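The per-severity sweep described here can be sketched as a small script. Everything in it is an assumption made for illustration: it relies only on gluster's usual log prefix ("[YYYY-MM-DD HH:MM:SS.ffffff] <LEVEL> ", optionally preceded by a filename from grep), and the helper names are made up:

```python
import re
from collections import Counter

# Matches "glustershd.log:[2019-03-13 06:22:36.226847] E ..." and
# captures the hour prefix plus the severity letter (E/W/C/I).
SEVERITY = re.compile(
    r"^(?:[\w.-]+:)?\[(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2}\.\d+\] ([EWCI]) "
)

def tally(lines, hour_prefix):
    """Count E/W/C/I entries whose timestamp starts with hour_prefix,
    e.g. '2019-03-13 06'."""
    counts = Counter()
    for line in lines:
        m = SEVERITY.match(line)
        if m and m.group(1) == hour_prefix:
            counts[m.group(2)] += 1
    return counts

sample = [
    "glustershd.log:[2019-03-13 06:22:36.226847] E [MSGID: 114031] remote operation failed",
    "glustershd.log:[2019-03-13 06:14:31.421576] W [MSGID: 122035] some subvolumes unavailable",
    "glustershd.log:[2019-03-13 02:21:29.126279] C [rpc-clnt-ping.c:166] has not responded",
]
counts = tally(sample, "2019-03-13 06")
print(counts["E"], counts["W"], counts["C"])  # 1 1 0
```

Compared with one grep per severity, this gives all the counts in a single pass and makes it easy to bucket by hour.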
So, I used the grep command against the log files in order to give you a global view of the warning, error and critical events that appeared today at 06:xx (I hope it may be useful). I collected the info from the s06 gluster server, but the behaviour is almost the same on the other gluster servers.

ERRORS:
CWD: /var/log/glusterfs
COMMAND: grep " E " *.log |grep "2019-03-13 06:"

(I can see a lot of this kind of message in the same period, but I'm notifying you of only one record for each type of error)

glusterd.log:[2019-03-13 06:12:35.982863] E [MSGID: 101042] [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of /var/run/gluster/tier2_quota_list/

glustershd.log:[2019-03-13 06:14:28.666562] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f4a71ddcebb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f4a71ba1d9e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f4a71ba1ebe] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f4a71ba3640] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f4a71ba4130] ))))) 0-tier2-client-55: forced unwinding frame type(GlusterFS 3.3) op(INODELK(29)) called at 2019-03-13 06:14:14.858441 (xid=0x17fddb50)

glustershd.log:[2019-03-13 06:17:48.883825] E [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to 192.168.0.55:49158 failed (Connection timed out); disconnecting socket
glustershd.log:[2019-03-13 06:19:58.931798] E [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to 192.168.0.55:49158 failed (Connection timed out); disconnecting socket
glustershd.log:[2019-03-13 06:22:08.979829] E [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to 192.168.0.55:49158 failed (Connection timed out); disconnecting socket
glustershd.log:[2019-03-13 06:22:36.226847] E [MSGID: 114031] [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote operation failed [Transport endpoint is not connected]
glustershd.log:[2019-03-13 06:22:36.306669] E [MSGID: 114031] [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote operation failed [Transport endpoint is not connected]
glustershd.log:[2019-03-13 06:22:36.385257] E [MSGID: 114031] [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote operation failed [Transport endpoint is not connected]

WARNINGS:
CWD: /var/log/glusterfs
COMMAND: grep " W " *.log |grep "2019-03-13 06:"

(I can see a lot of this kind of message in the same period, but I'm notifying you of only one record for each type of warning)

glustershd.log:[2019-03-13 06:14:28.666772] W [MSGID: 114031] [client-rpc-fops.c:1080:client3_3_getxattr_cbk] 0-tier2-client-55: remote operation failed. Path: (b6b35d0f-f34d-4c25-bbe8-74bde0248d7e). Key: (null) [Transport endpoint is not connected]
glustershd.log:[2019-03-13 06:14:31.421576] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (2)
glustershd.log:[2019-03-13 06:15:31.547417] W [MSGID: 122032] [ec-heald.c:266:ec_shd_index_sweep] 0-tier2-disperse-9: unable to get index-dir on tier2-client-55 [Operation now in progress]
quota-mount-tier2.log:[2019-03-13 06:12:36.116277] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-glusterfs: option 'address-family' is deprecated, preferred is 'transport.address-family', continuing with correction
quota-mount-tier2.log:[2019-03-13 06:12:36.198430] W [MSGID: 101174] [graph.c:363:_log_if_unknown_option] 0-tier2-readdir-ahead: option 'parallel-readdir' is not recognized
quota-mount-tier2.log:[2019-03-13 06:12:37.945007] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f340892be25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55ef010164b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55ef0101632b] ) 0-: received signum (15), shutting down

CRITICALS:
CWD: /var/log/glusterfs
COMMAND: grep " C " *.log |grep "2019-03-13 06:"

no
critical errors at 06:xx; only one critical error during the day:

[root at s06 glusterfs]# grep " C " *.log |grep "2019-03-13"
glustershd.log:[2019-03-13 02:21:29.126279] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-55: server 192.168.0.55:49158 has not responded in the last 42 seconds, disconnecting.

Thank you very much for your help.

Regards,
Mauro

> On 12 Mar 2019, at 05:17, Raghavendra Gowdappa wrote: > > Was the suggestion to increase server.event-thread values tried? If yes, what were the results? > > On Mon, Mar 11, 2019 at 2:40 PM Mauro Tridici > wrote: > Dear All, > > do you have any suggestions about the right way to "debug" this issue? > In attachment, the updated logs of the "s06" gluster server. > > I noticed a lot of intermittent warning and error messages. > > Thank you in advance, > Mauro > > > >> On 4 Mar 2019, at 18:45, Raghavendra Gowdappa > wrote: >> >> >> +Gluster Devel , +Gluster-users >> >> I would like to point out another issue. Even if what I suggested prevents disconnects, part of the solution would be only symptomatic treatment and doesn't address the root cause of the problem. In most of the ping-timer-expiry issues, the root cause is the increased load on bricks and the inability of bricks to be responsive under high load. So, the actual solution would be doing either or both of the following: >> * identify the source of increased load and if possible throttle it. Internal heal processes like self-heal, rebalance, quota heal are known to pump traffic into bricks without much throttling (io-threads _might_ do some throttling, but my understanding is it's not sufficient). >> * identify the reason for bricks to become unresponsive during load. This may be fixable issues like not enough event-threads to read from the network, or difficult-to-fix issues like fsync on the backend fs freezing the process, or semi-fixable issues (in code) like lock contention.
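As background for this discussion, the client-side timeout that produces the "has not responded in the last 42 seconds, disconnecting" lines can be modeled roughly as follows. This is a toy, polled model with made-up names; the real logic in rpc-clnt-ping.c is event-driven, and the 42-second value is gluster's network.ping-timeout default:

```python
# Toy model of a client-side ping timer (illustrative names only).
PING_TIMEOUT = 42.0  # seconds; network.ping-timeout default

class Connection:
    def __init__(self, now=0.0):
        self.last_response = now
        self.connected = True

    def on_pong(self, now):
        # any reply from the brick resets the timer
        self.last_response = now

    def check(self, now):
        # a brick that stays silent past the timeout gets disconnected,
        # which is what the CRITICAL log line records
        if self.connected and now - self.last_response > PING_TIMEOUT:
            self.connected = False
        return self.connected

c = Connection(now=0.0)
c.on_pong(now=10.0)
print(c.check(now=30.0))   # 20s of silence -> still connected: True
print(c.check(now=60.0))   # 50s of silence -> disconnected: False
```

The model makes the root-cause point above concrete: raising server.event-threads only helps if it lets the brick answer pings before the deadline; a brick stalled for longer than the timeout (e.g. on a blocking fsync) disconnects regardless.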
>> >> So any genuine effort to fix ping-timer issues (to be honest, most of the time they are not issues related to rpc/network) would involve performance characterization of various subsystems on bricks and clients. Various subsystems can include (but are not necessarily limited to) the underlying OS/filesystem, glusterfs processes, CPU consumption etc. >> >> regards, >> Raghavendra >> >> On Mon, Mar 4, 2019 at 9:31 PM Mauro Tridici > wrote: >> Thank you, let's try! >> I will inform you about the effects of the change. >> >> Regards, >> Mauro >> >>> On 4 Mar 2019, at 16:55, Raghavendra Gowdappa > wrote: >>> >>> >>> >>> On Mon, Mar 4, 2019 at 8:54 PM Mauro Tridici > wrote: >>> Hi Raghavendra, >>> >>> thank you for your reply. >>> Yes, you are right. It is a problem that seems to happen randomly. >>> At this moment, the server.event-threads value is 4. I will try to increase this value to 8. Do you think that it could be a valid value? >>> >>> Yes. We can try with that. You should at least see the frequency of ping-timer-related disconnects reduce with this value (even if it doesn't eliminate the problem completely). >>> >>> >>> Regards, >>> Mauro >>> >>> >>>> On 4 Mar 2019, at 15:36, Raghavendra Gowdappa > wrote: >>>> >>>> >>>> >>>> On Mon, Mar 4, 2019 at 8:01 PM Nithya Balachandran > wrote: >>>> Hi Mauro, >>>> >>>> It looks like some problem on s06. Are all your other nodes ok? Can you send us the gluster logs from this node? >>>> >>>> @Raghavendra G , do you have any idea as to how this can be debugged? Maybe running top? Or debug brick logs? >>>> >>>> If we can reproduce the problem, collecting tcpdump on both ends of the connection will help. But one common problem is that these bugs are inconsistently reproducible, and hence we may not be able to capture tcpdump at the correct intervals. Other than that, we can try to collect some evidence that poller threads were busy (waiting on locks). But I'm not sure what debug data provides that information.
>>>> >>>> From what I know, it's difficult to collect evidence for this issue and we can only reason about it. >>>> >>>> We can try a workaround though - try increasing server.event-threads and see whether the ping-timer expiry issues go away with an optimal value. If that's the case, it kind of provides proof for our hypothesis. >>>> >>>> >>>> >>>> Regards, >>>> Nithya >>>> >>>> On Mon, 4 Mar 2019 at 15:25, Mauro Tridici > wrote: >>>> Hi All, >>>> >>>> a few minutes ago I received this message from the NAGIOS server >>>> >>>> ***** Nagios ***** >>>> >>>> Notification Type: PROBLEM >>>> >>>> Service: Brick - /gluster/mnt2/brick >>>> Host: s06 >>>> Address: s06-stg >>>> State: CRITICAL >>>> >>>> Date/Time: Mon Mar 4 10:25:33 CET 2019 >>>> >>>> Additional Info: >>>> CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. >>>> >>>> I checked the network, RAM and CPU usage on the s06 node and everything seems to be ok. >>>> No bricks are in an error state. In /var/log/messages, I again detected a crash of 'check_vol_utili', which I think is a module used by the NRPE executable (that is, the NAGIOS client). >>>> >>>> Mar 4 10:15:29 s06 kernel: traps: check_vol_utili[161224] general protection ip:7facffa0a66d sp:7ffe9f4e6fc0 error:0 in libglusterfs.so.0.0.1[7facff9b7000+f7000] >>>> Mar 4 10:15:29 s06 abrt-hook-ccpp: Process 161224 (python2.7) of user 0 killed by SIGSEGV - dumping core >>>> Mar 4 10:15:29 s06 abrt-server: Generating core_backtrace >>>> Mar 4 10:15:29 s06 abrt-server: Error: Unable to open './coredump': No such file or directory >>>> Mar 4 10:16:01 s06 systemd: Created slice User Slice of root. >>>> Mar 4 10:16:01 s06 systemd: Starting User Slice of root. >>>> Mar 4 10:16:01 s06 systemd: Started Session 201010 of user root. >>>> Mar 4 10:16:01 s06 systemd: Starting Session 201010 of user root. >>>> Mar 4 10:16:01 s06 systemd: Removed slice User Slice of root. >>>> Mar 4 10:16:01 s06 systemd: Stopping User Slice of root.
>>>> Mar 4 10:16:24 s06 abrt-server: Duplicate: UUID >>>> Mar 4 10:16:24 s06 abrt-server: DUP_OF_DIR: /var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041 >>>> Mar 4 10:16:24 s06 abrt-server: Deleting problem directory ccpp-2019-03-04-10:15:29-161224 (dup of ccpp-2018-09-25-12:27:42-13041) >>>> Mar 4 10:16:24 s06 abrt-server: Generating core_backtrace >>>> Mar 4 10:16:24 s06 abrt-server: Error: Unable to open './coredump': No such file or directory >>>> Mar 4 10:16:24 s06 abrt-server: Cannot notify '/var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041' via uReport: Event 'report_uReport' exited with 1 >>>> Mar 4 10:16:24 s06 abrt-hook-ccpp: Process 161391 (python2.7) of user 0 killed by SIGABRT - dumping core >>>> Mar 4 10:16:25 s06 abrt-server: Generating core_backtrace >>>> Mar 4 10:16:25 s06 abrt-server: Error: Unable to open './coredump': No such file or directory >>>> Mar 4 10:17:01 s06 systemd: Created slice User Slice of root. >>>> >>>> Also, I noticed the following errors that I think are very critical: >>>> >>>> Mar 4 10:21:12 s06 glustershd[20355]: [2019-03-04 09:21:12.954798] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-55: server 192.168.0.55:49158 has not responded in the last 42 seconds, disconnecting. >>>> Mar 4 10:22:01 s06 systemd: Created slice User Slice of root. >>>> Mar 4 10:22:01 s06 systemd: Starting User Slice of root. >>>> Mar 4 10:22:01 s06 systemd: Started Session 201017 of user root. >>>> Mar 4 10:22:01 s06 systemd: Starting Session 201017 of user root. >>>> Mar 4 10:22:01 s06 systemd: Removed slice User Slice of root. >>>> Mar 4 10:22:01 s06 systemd: Stopping User Slice of root. >>>> Mar 4 10:22:03 s06 glustershd[20355]: [2019-03-04 09:22:03.964120] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-60: server 192.168.0.54:49165 has not responded in the last 42 seconds, disconnecting. >>>> Mar 4 10:23:01 s06 systemd: Created slice User Slice of root. >>>> Mar 4 10:23:01 s06 systemd: Starting User Slice of root. 
>>>> Mar 4 10:23:01 s06 systemd: Started Session 201018 of user root. >>>> Mar 4 10:23:01 s06 systemd: Starting Session 201018 of user root. >>>> Mar 4 10:23:02 s06 systemd: Removed slice User Slice of root. >>>> Mar 4 10:23:02 s06 systemd: Stopping User Slice of root. >>>> Mar 4 10:24:01 s06 systemd: Created slice User Slice of root. >>>> Mar 4 10:24:01 s06 systemd: Starting User Slice of root. >>>> Mar 4 10:24:01 s06 systemd: Started Session 201019 of user root. >>>> Mar 4 10:24:01 s06 systemd: Starting Session 201019 of user root. >>>> Mar 4 10:24:01 s06 systemd: Removed slice User Slice of root. >>>> Mar 4 10:24:01 s06 systemd: Stopping User Slice of root. >>>> Mar 4 10:24:03 s06 glustershd[20355]: [2019-03-04 09:24:03.982502] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-16: server 192.168.0.52:49158 has not responded in the last 42 seconds, disconnecting. >>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746109] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-3: server 192.168.0.51:49153 has not responded in the last 42 seconds, disconnecting. >>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746215] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-10: server 192.168.0.52:49156 has not responded in the last 42 seconds, disconnecting. >>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746260] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-21: server 192.168.0.51:49159 has not responded in the last 42 seconds, disconnecting. >>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746296] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-25: server 192.168.0.52:49161 has not responded in the last 42 seconds, disconnecting. 
>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746413] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-60: server 192.168.0.54:49165 has not responded in the last 42 seconds, disconnecting. >>>> Mar 4 10:24:07 s06 glustershd[20355]: [2019-03-04 09:24:07.982952] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-45: server 192.168.0.54:49155 has not responded in the last 42 seconds, disconnecting. >>>> Mar 4 10:24:18 s06 glustershd[20355]: [2019-03-04 09:24:18.990929] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-25: server 192.168.0.52:49161 has not responded in the last 42 seconds, disconnecting. >>>> Mar 4 10:24:31 s06 glustershd[20355]: [2019-03-04 09:24:31.995781] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-20: server 192.168.0.53:49159 has not responded in the last 42 seconds, disconnecting. >>>> Mar 4 10:25:01 s06 systemd: Created slice User Slice of root. >>>> Mar 4 10:25:01 s06 systemd: Starting User Slice of root. >>>> Mar 4 10:25:01 s06 systemd: Started Session 201020 of user root. >>>> Mar 4 10:25:01 s06 systemd: Starting Session 201020 of user root. >>>> Mar 4 10:25:01 s06 systemd: Removed slice User Slice of root. >>>> Mar 4 10:25:01 s06 systemd: Stopping User Slice of root. >>>> Mar 4 10:25:57 s06 systemd: Created slice User Slice of root. >>>> Mar 4 10:25:57 s06 systemd: Starting User Slice of root. >>>> Mar 4 10:25:57 s06 systemd-logind: New session 201021 of user root. >>>> Mar 4 10:25:57 s06 systemd: Started Session 201021 of user root. >>>> Mar 4 10:25:57 s06 systemd: Starting Session 201021 of user root. >>>> Mar 4 10:26:01 s06 systemd: Started Session 201022 of user root. >>>> Mar 4 10:26:01 s06 systemd: Starting Session 201022 of user root. >>>> Mar 4 10:26:21 s06 nrpe[162388]: Error: Could not complete SSL handshake with 192.168.1.56 : 5 >>>> Mar 4 10:27:01 s06 systemd: Started Session 201023 of user root. 
>>>> Mar 4 10:27:01 s06 systemd: Starting Session 201023 of user root. >>>> Mar 4 10:28:01 s06 systemd: Started Session 201024 of user root. >>>> Mar 4 10:28:01 s06 systemd: Starting Session 201024 of user root. >>>> Mar 4 10:29:01 s06 systemd: Started Session 201025 of user root. >>>> Mar 4 10:29:01 s06 systemd: Starting Session 201025 of user root. >>>> >>>> But, unfortunately, I don?t understand why it is happening. >>>> Now, NAGIOS server shows that s06 status is ok: >>>> >>>> ***** Nagios ***** >>>> >>>> Notification Type: RECOVERY >>>> >>>> Service: Brick - /gluster/mnt2/brick >>>> Host: s06 >>>> Address: s06-stg >>>> State: OK >>>> >>>> Date/Time: Mon Mar 4 10:35:23 CET 2019 >>>> >>>> Additional Info: >>>> OK: Brick /gluster/mnt2/brick is up >>>> >>>> Nothing is changed from RAM, CPUs, and NETWORK point of view. >>>> /var/log/message file has been updated: >>>> >>>> Mar 4 10:32:01 s06 systemd: Starting Session 201029 of user root. >>>> Mar 4 10:32:30 s06 glustershd[20355]: [2019-03-04 09:32:30.069082] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-10: server 192.168.0.52:49156 has not responded in the last 42 seconds, disconnecting. >>>> Mar 4 10:32:55 s06 glustershd[20355]: [2019-03-04 09:32:55.074689] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-66: server 192.168.0.54:49167 has not responded in the last 42 seconds, disconnecting. >>>> Mar 4 10:33:01 s06 systemd: Started Session 201030 of user root. >>>> Mar 4 10:33:01 s06 systemd: Starting Session 201030 of user root. >>>> Mar 4 10:34:01 s06 systemd: Started Session 201031 of user root. >>>> Mar 4 10:34:01 s06 systemd: Starting Session 201031 of user root. >>>> Mar 4 10:35:01 s06 nrpe[162562]: Could not read request from client 192.168.1.56, bailing out... >>>> Mar 4 10:35:01 s06 nrpe[162562]: INFO: SSL Socket Shutdown. >>>> Mar 4 10:35:01 s06 systemd: Started Session 201032 of user root. >>>> Mar 4 10:35:01 s06 systemd: Starting Session 201032 of user root. 
>>>> >>>> Could you please help me to understand what it?s happening ? >>>> Thank you in advance. >>>> >>>> Rergards, >>>> Mauro >>>> >>>> >>>>> On 1 Mar 2019, at 12:17, Mauro Tridici > wrote: >>>>> >>>>> >>>>> Thank you, Milind. >>>>> I executed the instructions you suggested: >>>>> >>>>> - grep ?blocked for? /var/log/messages on s06 returns no output (no ?blocked? word is detected in messages file); >>>>> - in /var/log/messages file I can see this kind of error repeated for a lot of times: >>>>> >>>>> Mar 1 08:43:01 s06 systemd: Starting Session 196071 of user root. >>>>> Mar 1 08:43:01 s06 systemd: Removed slice User Slice of root. >>>>> Mar 1 08:43:01 s06 systemd: Stopping User Slice of root. >>>>> Mar 1 08:43:02 s06 kernel: traps: check_vol_utili[57091] general protection ip:7f88e76ee66d sp:7ffe5a5bcc30 error:0 in libglusterfs.so.0.0.1[7f88e769b000+f7000] >>>>> Mar 1 08:43:02 s06 abrt-hook-ccpp: Process 57091 (python2.7) of user 0 killed by SIGSEGV - dumping core >>>>> Mar 1 08:43:02 s06 abrt-server: Generating core_backtrace >>>>> Mar 1 08:43:02 s06 abrt-server: Error: Unable to open './coredump': No such file or directory >>>>> Mar 1 08:43:58 s06 abrt-server: Duplicate: UUID >>>>> Mar 1 08:43:58 s06 abrt-server: DUP_OF_DIR: /var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041 >>>>> Mar 1 08:43:58 s06 abrt-server: Deleting problem directory ccpp-2019-03-01-08:43:02-57091 (dup of ccpp-2018-09-25-12:27:42-13041) >>>>> Mar 1 08:43:58 s06 dbus[1872]: [system] Activating service name='org.freedesktop.problems' (using servicehelper) >>>>> Mar 1 08:43:58 s06 dbus[1872]: [system] Successfully activated service 'org.freedesktop.problems' >>>>> Mar 1 08:43:58 s06 abrt-server: Generating core_backtrace >>>>> Mar 1 08:43:58 s06 abrt-server: Error: Unable to open './coredump': No such file or directory >>>>> Mar 1 08:43:58 s06 abrt-server: Cannot notify '/var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041' via uReport: Event 'report_uReport' exited with 1 >>>>> Mar 1 08:44:01 s06 
systemd: Created slice User Slice of root. >>>>> Mar 1 08:44:01 s06 systemd: Starting User Slice of root. >>>>> Mar 1 08:44:01 s06 systemd: Started Session 196072 of user root. >>>>> Mar 1 08:44:01 s06 systemd: Starting Session 196072 of user root. >>>>> Mar 1 08:44:01 s06 systemd: Removed slice User Slice of root. >>>>> >>>>> - in /var/log/messages file I can see also 4 errors related to other cluster servers: >>>>> >>>>> Mar 1 11:05:01 s06 systemd: Starting User Slice of root. >>>>> Mar 1 11:05:01 s06 systemd: Started Session 196230 of user root. >>>>> Mar 1 11:05:01 s06 systemd: Starting Session 196230 of user root. >>>>> Mar 1 11:05:01 s06 systemd: Removed slice User Slice of root. >>>>> Mar 1 11:05:01 s06 systemd: Stopping User Slice of root. >>>>> Mar 1 11:05:59 s06 glustershd[70117]: [2019-03-01 10:05:59.347094] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-33: server 192.168.0.51:49163 has not responded in the last 42 seconds, disconnecting. >>>>> Mar 1 11:06:01 s06 systemd: Created slice User Slice of root. >>>>> Mar 1 11:06:01 s06 systemd: Starting User Slice of root. >>>>> Mar 1 11:06:01 s06 systemd: Started Session 196231 of user root. >>>>> Mar 1 11:06:01 s06 systemd: Starting Session 196231 of user root. >>>>> Mar 1 11:06:01 s06 systemd: Removed slice User Slice of root. >>>>> Mar 1 11:06:01 s06 systemd: Stopping User Slice of root. >>>>> Mar 1 11:06:12 s06 glustershd[70117]: [2019-03-01 10:06:12.351319] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-1: server 192.168.0.52:49153 has not responded in the last 42 seconds, disconnecting. >>>>> Mar 1 11:06:38 s06 glustershd[70117]: [2019-03-01 10:06:38.356920] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-7: server 192.168.0.52:49155 has not responded in the last 42 seconds, disconnecting. >>>>> Mar 1 11:07:01 s06 systemd: Created slice User Slice of root. >>>>> Mar 1 11:07:01 s06 systemd: Starting User Slice of root. 
>>>>> Mar 1 11:07:01 s06 systemd: Started Session 196232 of user root. >>>>> Mar 1 11:07:01 s06 systemd: Starting Session 196232 of user root. >>>>> Mar 1 11:07:01 s06 systemd: Removed slice User Slice of root. >>>>> Mar 1 11:07:01 s06 systemd: Stopping User Slice of root. >>>>> Mar 1 11:07:36 s06 glustershd[70117]: [2019-03-01 10:07:36.366259] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-0: server 192.168.0.51:49152 has not responded in the last 42 seconds, disconnecting. >>>>> Mar 1 11:08:01 s06 systemd: Created slice User Slice of root. >>>>> >>>>> No 'blocked' word is in /var/log/messages files on other cluster servers. >>>>> In attachment, the /var/log/messages file from s06 server. >>>>> >>>>> Thank you in advance, >>>>> Mauro >>>>> >>>>> >>>>> >>>>> >>>>>> On 1 Mar 2019, at 11:47, Milind Changire > wrote: >>>>>> >>>>>> The traces of very high disk activity on the servers are often found in /var/log/messages >>>>>> You might want to grep for "blocked for" in /var/log/messages on s06 and correlate the timestamps to confirm the unresponsiveness as reported in gluster client logs. >>>>>> In cases of high disk activity, although the operating system continues to respond to ICMP pings, the processes writing to disks often get blocked due to a large flush to the disk which could span beyond 42 seconds and hence result in ping-timer-expiry logs. >>>>>> >>>>>> As a side note: >>>>>> If you indeed find gluster processes being blocked in /var/log/messages, you might want to tweak sysctl tunables called vm.dirty_background_ratio or vm.dirty_background_bytes to a smaller value than the existing one. Please read up more on those tunables before touching the settings. >>>>>> >>>>>> >>>>>> On Fri, Mar 1, 2019 at 4:06 PM Mauro Tridici > wrote: >>>>>> >>>>>> Hi all, >>>>>> >>>>>> in attachment the client log captured after changing network.ping-timeout option.
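Milind's side note above about vm.dirty_background_ratio / vm.dirty_background_bytes can be checked non-destructively before changing anything. A minimal read-only sketch (assumes a Linux host; the tunable names are the ones quoted above, and no new values are being recommended here):

```python
# Read-only inspection of the Linux writeback tunables mentioned above.
# This sketch only prints current values; it does not change anything.
from pathlib import Path

def read_vm_tunable(name):
    """Return the current value of /proc/sys/vm/<name>, or None if absent."""
    p = Path("/proc/sys/vm") / name
    return p.read_text().strip() if p.exists() else None

for name in ("dirty_background_ratio", "dirty_background_bytes", "dirty_ratio"):
    print(name, "=", read_vm_tunable(name))
```

Lowering vm.dirty_background_ratio makes the kernel start background writeback earlier, so flushes are smaller and less likely to stall a brick process for tens of seconds; as Milind says, read up on the tunables before actually writing new values (e.g. via sysctl -w).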
>>>>>> I noticed this error involving server 192.168.0.56 (s06) >>>>>> >>>>>> [2019-03-01 09:23:36.077287] I [rpc-clnt.c:1962:rpc_clnt_reconfig] 0-tier2-client-71: changing ping timeout to 42 (from 0) >>>>>> [2019-03-01 09:23:36.078213] I [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing >>>>>> [2019-03-01 09:23:36.078432] I [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing >>>>>> [2019-03-01 09:23:36.092357] I [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing >>>>>> [2019-03-01 09:23:36.094146] I [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing >>>>>> [2019-03-01 10:06:24.708082] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-50: server 192.168.0.56:49156 has not responded in the last 42 seconds, disconnecting. >>>>>> >>>>>> I don't know why it happens; the s06 server seems to be reachable. >>>>>> >>>>>> [athena_login2][/users/home/sysm02/]> telnet 192.168.0.56 49156 >>>>>> Trying 192.168.0.56... >>>>>> Connected to 192.168.0.56. >>>>>> Escape character is '^]'. >>>>>> ^CConnection closed by foreign host. >>>>>> [athena_login2][/users/home/sysm02/]> ping 192.168.0.56 >>>>>> PING 192.168.0.56 (192.168.0.56) 56(84) bytes of data. >>>>>> 64 bytes from 192.168.0.56 : icmp_seq=1 ttl=64 time=0.116 ms >>>>>> 64 bytes from 192.168.0.56 : icmp_seq=2 ttl=64 time=0.101 ms >>>>>> >>>>>> --- 192.168.0.56 ping statistics --- >>>>>> 2 packets transmitted, 2 received, 0% packet loss, time 1528ms >>>>>> rtt min/avg/max/mdev = 0.101/0.108/0.116/0.012 ms >>>>>> >>>>>> [athena_login2][/users/home/sysm02/]> telnet 192.168.0.56 49156 >>>>>> Trying 192.168.0.56... >>>>>> Connected to 192.168.0.56. >>>>>> Escape character is '^]'.
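The manual telnet/ping checks above only probe one brick endpoint at a time. A small sketch that does the same TCP-connect test from Python so it can be looped over every brick while the errors are occurring (the host/port list is just the example from the log above, not a full brick map):

```python
# Sketch of the telnet-style reachability check, loopable over many bricks.
import socket

def brick_reachable(host, port, timeout=5.0):
    """Try a plain TCP connect to host:port, as telnet does; True on success."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

bricks = [("192.168.0.56", 49156)]  # example endpoint from the log above
for host, port in bricks:
    ok = brick_reachable(host, port, timeout=1.0)
    print(host, port, "reachable" if ok else "NOT reachable")
```

Note that a successful TCP connect (like the successful telnet above) only shows the socket is accepted; a brick blocked on disk I/O can still fail to answer RPC pings within the 42-second window.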
>>>>>> >>>>>> Thank you for your help, >>>>>> Mauro >>>>>> >>>>>> >>>>>> >>>>>>> On 1 Mar 2019, at 10:29, Mauro Tridici > wrote: >>>>>>> >>>>>>> Hi all, >>>>>>> >>>>>>> thank you for the explanation. >>>>>>> I just changed the network.ping-timeout option to the default value (network.ping-timeout=42). >>>>>>> >>>>>>> I will check the logs to see if the errors will appear again. >>>>>>> >>>>>>> Regards, >>>>>>> Mauro >>>>>>> >>>>>>>> On 1 Mar 2019, at 04:43, Milind Changire > wrote: >>>>>>>> >>>>>>>> network.ping-timeout should not be set to zero for non-glusterd clients. >>>>>>>> glusterd is a special case for which ping-timeout is set to zero via /etc/glusterfs/glusterd.vol >>>>>>>> >>>>>>>> Setting network.ping-timeout to zero disables arming of the ping timer for connections. This disables testing the connection for responsiveness and hence avoids proactive fail-over. >>>>>>>> >>>>>>>> Please reset network.ping-timeout to a non-zero positive value, eg. 42 >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Feb 28, 2019 at 5:07 PM Nithya Balachandran > wrote: >>>>>>>> Adding Raghavendra and Milind to comment on this. >>>>>>>> >>>>>>>> What is the effect of setting network.ping-timeout to 0 and should it be set back to 42? >>>>>>>> Regards, >>>>>>>> Nithya >>>>>>>> >>>>>>>> On Thu, 28 Feb 2019 at 16:01, Mauro Tridici > wrote: >>>>>>>> Hi Nithya, >>>>>>>> >>>>>>>> sorry for the delay. >>>>>>>> network.ping-timeout has been set to 0 in order to try to solve some timeout problems, but it didn't help. >>>>>>>> I can set it to the default value. >>>>>>>> >>>>>>>> Can I proceed with the change? >>>>>>>> >>>>>>>> Thank you, >>>>>>>> Mauro >>>>>>>> >>>>>>>> >>>>>>>>> On 28 Feb 2019, at 04:41, Nithya Balachandran > wrote: >>>>>>>>> >>>>>>>>> Hi Mauro, >>>>>>>>> >>>>>>>>> Is network.ping-timeout still set to 0? The default value is 42. Is there a particular reason why this was changed?
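Milind's description of the ping timer quoted above can be illustrated with a toy model. This is NOT GlusterFS code, just a sketch of the idea: a connection is declared dead when no response arrives within network.ping-timeout, and a timeout of 0 means the timer is never armed.

```python
# Toy ping-timer illustration (not the GlusterFS implementation).
import time

class PingTimer:
    def __init__(self, timeout):
        self.timeout = timeout              # seconds; 0 disables the timer
        self.last_response = time.monotonic()

    def pong(self):
        """Record a response from the peer."""
        self.last_response = time.monotonic()

    def expired(self):
        if self.timeout == 0:               # ping-timeout=0: never expires
            return False
        return time.monotonic() - self.last_response > self.timeout

t = PingTimer(timeout=0.05)
t.pong()
time.sleep(0.1)                             # no pong within 0.05s
print("expired:", t.expired())
```

With timeout=0, expired() never returns True, which mirrors why ping-timeout=0 disables proactive fail-over: the client simply never declares the connection dead, even when the server has stopped responding.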
>>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Nithya >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, 27 Feb 2019 at 21:32, Mauro Tridici > wrote: >>>>>>>>> >>>>>>>>> Hi Xavi, >>>>>>>>> >>>>>>>>> thank you for the detailed explanation and suggestions. >>>>>>>>> Yes, transport.listen-backlog option is still set to 1024. >>>>>>>>> >>>>>>>>> I will check the network and connectivity status using 'ping' and 'telnet' as soon as the errors come back again. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Mauro >>>>>>>>> >>>>>>>>>> On 27 Feb 2019, at 16:42, Xavi Hernandez > wrote: >>>>>>>>>> >>>>>>>>>> Hi Mauro, >>>>>>>>>> >>>>>>>>>> those errors say that the mount point is not connected to some of the bricks while executing operations. I see references to 3rd and 6th bricks of several disperse sets, which seem to map to server s06. For some reason, gluster is having trouble connecting from the client machine to that particular server. At the end of the log I see that after a long time a reconnect is done to both of them. However, a little after, other bricks from s05 get disconnected and a reconnect times out. >>>>>>>>>> >>>>>>>>>> That's really odd. It seems as if communication to s06 is cut for some time, then restored, and then the same happens to the next server. >>>>>>>>>> >>>>>>>>>> If the servers are really online and it's only a communication issue, it explains why server memory and network usage have increased: if the problem only exists between the client and servers, any write made by the client will automatically mark the file as damaged, since some of the servers have not been updated. Since self-heal runs from the server nodes, they will probably be correctly connected to all bricks, which allows them to heal the just damaged file, which increases memory and network usage. >>>>>>>>>> >>>>>>>>>> I guess you still have transport.listen-backlog set to 1024, right ?
>>>>>>>>>> >>>>>>>>>> Just to try to identify if the problem really comes from the network, can you check if you lose some pings from the client to all of the servers while you are seeing those errors in the log file ? >>>>>>>>>> >>>>>>>>>> You can also check if during those errors, you can telnet to the port of the brick from the client. >>>>>>>>>> >>>>>>>>>> Xavi >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, Feb 26, 2019 at 10:17 AM Mauro Tridici > wrote: >>>>>>>>>> Hi Nithya, >>>>>>>>>> >>>>>>>>>> the 'df -h' operation is not slow now, but no users are using the volume, RAM and NETWORK usage is ok on the client node. >>>>>>>>>> >>>>>>>>>> I was worried about these kinds of warnings/errors: >>>>>>>>>> >>>>>>>>>> [2019-02-25 10:59:00.664323] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-6: Executing operation with some subvolumes unavailable (20) >>>>>>>>>> >>>>>>>>>> [2019-02-26 03:11:35.212603] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) 0-tier2-client-50: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2019-02-26 03:10:56.549903 (xid=0x106f1c5) >>>>>>>>>> >>>>>>>>>> [2019-02-26 03:13:03.313831] E [socket.c:2376:socket_connect_finish] 0-tier2-client-50: connection to 192.168.0.56:49156 failed (Timeout della connessione); disconnecting socket >>>>>>>>>> >>>>>>>>>> It seems that some subvolumes are not available and 192.168.0.56 server (s06) is not reachable. >>>>>>>>>> But gluster servers are up&running and bricks are ok. >>>>>>>>>> >>>>>>>>>> In attachment the updated tier2.log file. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thank you.
>>>>>>>>>> Regards, >>>>>>>>>> Mauro >>>>>>>>>> >>>>>>>>>>> On 26 Feb 2019, at 04:03, Nithya Balachandran > wrote: >>>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> I see a lot of EC messages in the log but they don't seem very serious. Xavi, can you take a look? >>>>>>>>>>> >>>>>>>>>>> The only errors I see are: >>>>>>>>>>> [2019-02-25 10:58:45.519871] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) 0-tier2-client-50: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2019-02-25 10:57:47.429969 (xid=0xd26fe7) >>>>>>>>>>> [2019-02-25 10:58:51.461493] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) 0-tier2-client-41: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2019-02-25 10:57:47.499174 (xid=0xf47d6a) >>>>>>>>>>> [2019-02-25 11:07:57.152874] E [socket.c:2376:socket_connect_finish] 0-tier2-client-70: connection to 192.168.0.55:49163 failed (Timeout della connessione); disconnecting socket >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Is the df -h operation still slow? If yes, can you take a tcpdump of the client while running df -h and send that across?
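To correlate client-side events like these with /var/log/messages on the servers, it helps to extract just the critical ping-timer lines with their timestamps. A hedged sketch (the regex is written against the exact log lines quoted in this thread; adjust it if your log format differs):

```python
# Extract ping-timer-expiry events from gluster client log lines.
import re

PING_EXPIRED = re.compile(
    r"\[(?P<ts>[^\]]+)\]\s+C\s+\[rpc-clnt-ping\.c:\d+:rpc_clnt_ping_timer_expired\]"
    r"\s+(?P<xlator>\S+):\s+server\s+(?P<server>\S+)\s+has not responded"
)

def ping_expiry_events(lines):
    """Return a dict per matching line: timestamp, client xlator, server:port."""
    return [m.groupdict() for m in (PING_EXPIRED.search(l) for l in lines) if m]

sample = [
    "[2019-03-01 10:06:24.708082] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] "
    "0-tier2-client-50: server 192.168.0.56:49156 has not responded in the last 42 seconds, disconnecting.",
    "[2019-03-01 09:23:36.078213] I [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] "
    "0-glusterfs: No change in volfile,continuing",
]
for ev in ping_expiry_events(sample):
    print(ev["ts"], ev["xlator"], ev["server"])
```

The same idea — a grep plus a timestamp — is what Milind suggests doing on the server side with the "blocked for" messages; matching the two lists by time shows whether a brick was stuck in disk writeback when the client gave up on it.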
>>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Nithya >>>>>>>>>>> >>>>>>>>>>> On Mon, 25 Feb 2019 at 17:27, Mauro Tridici > wrote: >>>>>>>>>>> >>>>>>>>>>> Sorry, some minutes after my last mail message, I noticed that the 'df -h' command hung for a while before returning the prompt. >>>>>>>>>>> Yesterday, everything was ok in the gluster client log, but, today, I see a lot of errors (please, take a look at the attached file). >>>>>>>>>>> >>>>>>>>>>> On the client node, I detected significant RAM and NETWORK usage. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Do you think that the errors have been caused by the client resource usage? >>>>>>>>>>> >>>>>>>>>>> Thank you in advance, >>>>>>>>>>> Mauro >>>>>>>>>>> >>>> >> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From valerio.luccio at nyu.edu Wed Mar 13 14:43:03 2019 From: valerio.luccio at nyu.edu (Valerio Luccio) Date: Wed, 13 Mar 2019 10:43:03 -0400 Subject: [Gluster-users] ganesha-gfapi Message-ID: Hi all, I recently started mounting my gluster from another server using NFS. I started ganesha and my ganesha-gfapi.log file is filled with the following message: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7384) [0x7f1c299b2384] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xae3e) [0x7f1c29bc3e3e] -->/lib64/libglusterfs.so.0(dict_ref+0x5d) [0x7f1c379092ad] ) 0-dict: dict is NULL [Invalid argument] Which sometimes is followed by: E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler Has anyone seen this ? What can be done about it ? Thanks, -- Valerio Luccio (212) 998-8736 Center for Brain Imaging 4 Washington Place, Room 157 New York University New York, NY 10003 "In an open world, who needs windows or gates ?" -------------- next part -------------- An HTML attachment was scrubbed...
URL: From revirii at googlemail.com Wed Mar 13 15:06:12 2019 From: revirii at googlemail.com (Hu Bert) Date: Wed, 13 Mar 2019 16:06:12 +0100 Subject: [Gluster-users] ganesha-gfapi In-Reply-To: References: Message-ID: Hi Valerio, this is an already known "behaviour" and maybe a bug: https://bugzilla.redhat.com/show_bug.cgi?id=1674225 Regards, Hubert On Wed, 13 Mar 2019 at 15:43, Valerio Luccio wrote: > > Hi all, > > I recently started mounting my gluster from another server using NFS. I started ganesha and my ganesha-gfapi.log file is filled with the following message: > > W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7384) [0x7f1c299b2384] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xae3e) [0x7f1c29bc3e3e] -->/lib64/libglusterfs.so.0(dict_ref+0x5d) [0x7f1c379092ad] ) 0-dict: dict is NULL [Invalid argument] > > Which sometimes is followed by: > > E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler > > Has anyone seen this ? What can be done about it ? > > Thanks, > > -- > Valerio Luccio (212) 998-8736 > Center for Brain Imaging 4 Washington Place, Room 157 > New York University New York, NY 10003 > > "In an open world, who needs windows or gates ?" > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From archon810 at gmail.com Wed Mar 13 16:34:14 2019 From: archon810 at gmail.com (Artem Russakovskii) Date: Wed, 13 Mar 2019 09:34:14 -0700 Subject: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP In-Reply-To: References: Message-ID: Wednesday now with no update :-/ Sincerely, Artem -- Founder, Android Police , APK Mirror , Illogical Robot LLC beerpla.net | +ArtemRussakovskii | @ArtemR On Tue, Mar 12, 2019 at 10:28 AM Artem Russakovskii wrote: > Hi Amar, > > Any updates on this?
I'm still not seeing it in OpenSUSE build repos. > Maybe later today? > > Thanks. > > Sincerely, > Artem > > -- > Founder, Android Police , APK Mirror > , Illogical Robot LLC > beerpla.net | +ArtemRussakovskii > | @ArtemR > > > > On Wed, Mar 6, 2019 at 10:30 PM Amar Tumballi Suryanarayan < > atumball at redhat.com> wrote: > >> We are talking days. Not weeks. Considering already it is Thursday here. >> 1 more day for tagging, and packaging. May be ok to expect it on Monday. >> >> -Amar >> >> On Thu, Mar 7, 2019 at 11:54 AM Artem Russakovskii >> wrote: >> >>> Is the next release going to be an imminent hotfix, i.e. something like >>> today/tomorrow, or are we talking weeks? >>> >>> Sincerely, >>> Artem >>> >>> -- >>> Founder, Android Police , APK Mirror >>> , Illogical Robot LLC >>> beerpla.net | +ArtemRussakovskii >>> | @ArtemR >>> >>> >>> >>> On Tue, Mar 5, 2019 at 11:09 AM Artem Russakovskii >>> wrote: >>> >>>> Ended up downgrading to 5.3 just in case. Peer status and volume status >>>> are OK now. >>>> >>>> zypper install --oldpackage glusterfs-5.3-lp150.100.1 >>>> Loading repository data... >>>> Reading installed packages... >>>> Resolving package dependencies... 
>>>> >>>> Problem: glusterfs-5.3-lp150.100.1.x86_64 requires libgfapi0 = 5.3, but >>>> this requirement cannot be provided >>>> not installable providers: libgfapi0-5.3-lp150.100.1.x86_64[glusterfs] >>>> Solution 1: Following actions will be done: >>>> downgrade of libgfapi0-5.4-lp150.100.1.x86_64 to >>>> libgfapi0-5.3-lp150.100.1.x86_64 >>>> downgrade of libgfchangelog0-5.4-lp150.100.1.x86_64 to >>>> libgfchangelog0-5.3-lp150.100.1.x86_64 >>>> downgrade of libgfrpc0-5.4-lp150.100.1.x86_64 to >>>> libgfrpc0-5.3-lp150.100.1.x86_64 >>>> downgrade of libgfxdr0-5.4-lp150.100.1.x86_64 to >>>> libgfxdr0-5.3-lp150.100.1.x86_64 >>>> downgrade of libglusterfs0-5.4-lp150.100.1.x86_64 to >>>> libglusterfs0-5.3-lp150.100.1.x86_64 >>>> Solution 2: do not install glusterfs-5.3-lp150.100.1.x86_64 >>>> Solution 3: break glusterfs-5.3-lp150.100.1.x86_64 by ignoring some of >>>> its dependencies >>>> >>>> Choose from above solutions by number or cancel [1/2/3/c] (c): 1 >>>> Resolving dependencies... >>>> Resolving package dependencies... >>>> >>>> The following 6 packages are going to be downgraded: >>>> glusterfs libgfapi0 libgfchangelog0 libgfrpc0 libgfxdr0 libglusterfs0 >>>> >>>> 6 packages to downgrade. >>>> >>>> Sincerely, >>>> Artem >>>> >>>> -- >>>> Founder, Android Police , APK Mirror >>>> , Illogical Robot LLC >>>> beerpla.net | +ArtemRussakovskii >>>> | @ArtemR >>>> >>>> >>>> >>>> On Tue, Mar 5, 2019 at 10:57 AM Artem Russakovskii >>>> wrote: >>>> >>>>> Noticed the same when upgrading from 5.3 to 5.4, as mentioned. >>>>> >>>>> I'm confused though. Is actual replication affected, because the 5.4 >>>>> server and the 3x 5.3 servers still show heal info as all 4 connected, and >>>>> the files seem to be replicating correctly as well. >>>>> >>>>> So what's actually affected - just the status command, or leaving 5.4 >>>>> on one of the nodes is doing some damage to the underlying fs? Is it >>>>> fixable by tweaking transport.socket.ssl-enabled? 
Does upgrading all >>>>> servers to 5.4 resolve it, or should we revert back to 5.3? >>>>> >>>>> Sincerely, >>>>> Artem >>>>> >>>>> -- >>>>> Founder, Android Police , APK Mirror >>>>> , Illogical Robot LLC >>>>> beerpla.net | +ArtemRussakovskii >>>>> | @ArtemR >>>>> >>>>> >>>>> >>>>> On Tue, Mar 5, 2019 at 2:02 AM Hu Bert wrote: >>>>> >>>>>> fyi: did a downgrade 5.4 -> 5.3 and it worked. all replicas are up and >>>>>> running. Awaiting updated v5.4. >>>>>> >>>>>> thx :-) >>>>>> >>>>>> On Tue, 5 Mar 2019 at 09:26, Hari Gowtham < >>>>>> hgowtham at redhat.com> wrote: >>>>>> > >>>>>> > There are plans to revert the patch causing this error and rebuild >>>>>> 5.4. >>>>>> > This should happen faster. The rebuilt 5.4 should be void of this >>>>>> upgrade issue. >>>>>> > >>>>>> > In the meantime, you can use 5.3 for this cluster. >>>>>> > Downgrading to 5.3 will work if it was just one node that was >>>>>> upgraded to 5.4 >>>>>> > and the other nodes are still in 5.3. >>>>>> > >>>>>> > On Tue, Mar 5, 2019 at 1:07 PM Hu Bert >>>>>> wrote: >>>>>> > > >>>>>> > > Hi Hari, >>>>>> > > >>>>>> > > thx for the hint. Do you know when this will be fixed? Is a >>>>>> downgrade >>>>>> > > 5.4 -> 5.3 a possibility to fix this? >>>>>> > > >>>>>> > > Hubert >>>>>> > > >>>>>> > > On Tue, 5 Mar 2019 at 08:32, Hari Gowtham < >>>>>> hgowtham at redhat.com> wrote: >>>>>> > > > >>>>>> > > > Hi, >>>>>> > > > >>>>>> > > > This is a known issue we are working on. >>>>>> > > > As the checksum differs between the updated and non-updated >>>>>> node, the >>>>>> > > > peers are getting rejected. >>>>>> > > > The bricks aren't coming up because of the same issue.
>>>>>> > > > >>>>>> > > > More about the issue: >>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1685120 >>>>>> > > > >>>>>> > > > On Tue, Mar 5, 2019 at 12:56 PM Hu Bert >>>>>> wrote: >>>>>> > > > > >>>>>> > > > > Interestingly: gluster volume status misses gluster1, while >>>>>> heal >>>>>> > > > > statistics show gluster1: >>>>>> > > > > >>>>>> > > > > gluster volume status workdata >>>>>> > > > > Status of volume: workdata >>>>>> > > > > Gluster process TCP Port RDMA >>>>>> Port Online Pid >>>>>> > > > > >>>>>> ------------------------------------------------------------------------------ >>>>>> > > > > Brick gluster2:/gluster/md4/workdata 49153 0 >>>>>> Y 1723 >>>>>> > > > > Brick gluster3:/gluster/md4/workdata 49153 0 >>>>>> Y 2068 >>>>>> > > > > Self-heal Daemon on localhost N/A N/A >>>>>> Y 1732 >>>>>> > > > > Self-heal Daemon on gluster3 N/A N/A >>>>>> Y 2077 >>>>>> > > > > >>>>>> > > > > vs. >>>>>> > > > > >>>>>> > > > > gluster volume heal workdata statistics heal-count >>>>>> > > > > Gathering count of entries to be healed on volume workdata >>>>>> has been successful >>>>>> > > > > >>>>>> > > > > Brick gluster1:/gluster/md4/workdata >>>>>> > > > > Number of entries: 0 >>>>>> > > > > >>>>>> > > > > Brick gluster2:/gluster/md4/workdata >>>>>> > > > > Number of entries: 10745 >>>>>> > > > > >>>>>> > > > > Brick gluster3:/gluster/md4/workdata >>>>>> > > > > Number of entries: 10744 >>>>>> > > > > >>>>>> > > > > On Tue, 5 Mar 2019 at 08:18, Hu Bert < >>>>>> revirii at googlemail.com> wrote: >>>>>> > > > > > >>>>>> > > > > > Hi Milind, >>>>>> > > > > > >>>>>> > > > > > well, there are such entries, but those haven't been a >>>>>> problem during >>>>>> > > > > > install and the last kernel update+reboot.
The entries look >>>>>> like: >>>>>> > > > > > >>>>>> > > > > > PUBLIC_IP gluster2.alpserver.de gluster2 >>>>>> > > > > > >>>>>> > > > > > 192.168.0.50 gluster1 >>>>>> > > > > > 192.168.0.51 gluster2 >>>>>> > > > > > 192.168.0.52 gluster3 >>>>>> > > > > > >>>>>> > > > > > 'ping gluster2' resolves to LAN IP; I removed the last >>>>>> entry in the >>>>>> > > > > > 1st line, did a reboot ... no, didn't help. From >>>>>> > > > > > /var/log/glusterfs/glusterd.log >>>>>> > > > > > on gluster 2: >>>>>> > > > > > >>>>>> > > > > > [2019-03-05 07:04:36.188128] E [MSGID: 106010] >>>>>> > > > > > [glusterd-utils.c:3483:glusterd_compare_friend_volume] >>>>>> 0-management: >>>>>> > > > > > Version of Cksums persistent differ. local cksum = >>>>>> 3950307018, remote >>>>>> > > > > > cksum = 455409345 on peer gluster1 >>>>>> > > > > > [2019-03-05 07:04:36.188314] I [MSGID: 106493] >>>>>> > > > > > [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] >>>>>> 0-glusterd: >>>>>> > > > > > Responded to gluster1 (0), ret: 0, op_ret: -1 >>>>>> > > > > > >>>>>> > > > > > Interestingly there are no entries in the brick logs of the >>>>>> rejected >>>>>> > > > > > server. Well, not surprising as no brick process is >>>>>> running. The >>>>>> > > > > > server gluster1 is still in rejected state. >>>>>> > > > > > >>>>>> > > > > > 'gluster volume start workdata force' starts the brick >>>>>> process on >>>>>> > > > > > gluster1, and some heals are happening on gluster2+3, but >>>>>> via 'gluster >>>>>> > > > > > volume status workdata' the volumes still aren't complete. 
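The /etc/hosts lines above are exactly the situation where a hostname can resolve differently depending on which entry wins. A small sketch that flags any name mapped to more than one address (203.0.113.10 is a stand-in for the elided PUBLIC_IP; the rest mirrors the file shown above):

```python
# Flag /etc/hosts names that map to more than one address.
from collections import defaultdict

def hosts_by_name(hosts_text):
    """Map each hostname in an /etc/hosts-style text to the set of its IPs."""
    mapping = defaultdict(set)
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()    # drop comments and blanks
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping[name].add(ip)
    return mapping

sample = """\
203.0.113.10 gluster2.alpserver.de gluster2
192.168.0.50 gluster1
192.168.0.51 gluster2
192.168.0.52 gluster3
"""
for name, ips in hosts_by_name(sample).items():
    if len(ips) > 1:
        print(name, "maps to multiple addresses:", sorted(ips))
```

Typically the first matching line wins on lookup, so a public-IP line for gluster2 can shadow the LAN entry — consistent with the brick logs showing the public IP instead of the LAN IP.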
>>>>>> > > > > > >>>>>> > > > > > gluster1: >>>>>> > > > > > >>>>>> ------------------------------------------------------------------------------ >>>>>> > > > > > Brick gluster1:/gluster/md4/workdata 49152 0 >>>>>> Y 2523 >>>>>> > > > > > Self-heal Daemon on localhost N/A N/A >>>>>> Y 2549 >>>>>> > > > > > >>>>>> > > > > > gluster2: >>>>>> > > > > > Gluster process TCP Port RDMA >>>>>> Port Online Pid >>>>>> > > > > > >>>>>> ------------------------------------------------------------------------------ >>>>>> > > > > > Brick gluster2:/gluster/md4/workdata 49153 0 >>>>>> Y 1723 >>>>>> > > > > > Brick gluster3:/gluster/md4/workdata 49153 0 >>>>>> Y 2068 >>>>>> > > > > > Self-heal Daemon on localhost N/A N/A >>>>>> Y 1732 >>>>>> > > > > > Self-heal Daemon on gluster3 N/A N/A >>>>>> Y 2077 >>>>>> > > > > > >>>>>> > > > > > >>>>>> > > > > > Hubert >>>>>> > > > > > >>>>>> > > > > > On Tue, 5 Mar 2019 at 07:58, Milind Changire < >>>>>> mchangir at redhat.com> wrote: >>>>>> > > > > > > >>>>>> > > > > > > There are probably DNS entries or /etc/hosts entries with >>>>>> the public IP Addresses that the host names (gluster1, gluster2, gluster3) >>>>>> are getting resolved to. >>>>>> > > > > > > /etc/resolv.conf would tell which is the default domain >>>>>> searched for the node names and the DNS servers which respond to the >>>>>> queries. >>>>>> > > > > > > >>>>>> > > > > > > >>>>>> > > > > > > On Tue, Mar 5, 2019 at 12:14 PM Hu Bert < >>>>>> revirii at googlemail.com> wrote: >>>>>> > > > > > >> >>>>>> > > > > > >> Good morning, >>>>>> > > > > > >> >>>>>> > > > > > >> i have a replicate 3 setup with 2 volumes, running on >>>>>> version 5.3 on >>>>>> > > > > > >> debian stretch.
This morning i upgraded one server to >>>>>> version 5.4 and >>>>>> > > > > > >> rebooted the machine; after the restart i noticed that: >>>>>> > > > > > >> >>>>>> > > > > > >> - no brick process is running >>>>>> > > > > > >> - gluster volume status only shows the server itself: >>>>>> > > > > > >> gluster volume status workdata >>>>>> > > > > > >> Status of volume: workdata >>>>>> > > > > > >> Gluster process TCP Port >>>>>> RDMA Port Online Pid >>>>>> > > > > > >> >>>>>> ------------------------------------------------------------------------------ >>>>>> > > > > > >> Brick gluster1:/gluster/md4/workdata N/A >>>>>> N/A N N/A >>>>>> > > > > > >> NFS Server on localhost N/A >>>>>> N/A N N/A >>>>>> > > > > > >> >>>>>> > > > > > >> - gluster peer status on the server >>>>>> > > > > > >> gluster peer status >>>>>> > > > > > >> Number of Peers: 2 >>>>>> > > > > > >> >>>>>> > > > > > >> Hostname: gluster3 >>>>>> > > > > > >> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a >>>>>> > > > > > >> State: Peer Rejected (Connected) >>>>>> > > > > > >> >>>>>> > > > > > >> Hostname: gluster2 >>>>>> > > > > > >> Uuid: 162fea82-406a-4f51-81a3-e90235d8da27 >>>>>> > > > > > >> State: Peer Rejected (Connected) >>>>>> > > > > > >> >>>>>> > > > > > >> - gluster peer status on the other 2 servers: >>>>>> > > > > > >> gluster peer status >>>>>> > > > > > >> Number of Peers: 2 >>>>>> > > > > > >> >>>>>> > > > > > >> Hostname: gluster1 >>>>>> > > > > > >> Uuid: 9a360776-7b58-49ae-831e-a0ce4e4afbef >>>>>> > > > > > >> State: Peer Rejected (Connected) >>>>>> > > > > > >> >>>>>> > > > > > >> Hostname: gluster3 >>>>>> > > > > > >> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a >>>>>> > > > > > >> State: Peer in Cluster (Connected) >>>>>> > > > > > >> >>>>>> > > > > > >> I noticed that, in the brick logs, i see that the public >>>>>> IP is used >>>>>> > > > > > >> instead of the LAN IP. 
brick logs from one of the >>>>>> volumes: >>>>>> > > > > > >> >>>>>> > > > > > >> rejected node: https://pastebin.com/qkpj10Sd >>>>>> > > > > > >> connected nodes: https://pastebin.com/8SxVVYFV >>>>>> > > > > > >> >>>>>> > > > > > >> Why is the public IP suddenly used instead of the LAN >>>>>> IP? Killing all >>>>>> > > > > > >> gluster processes and rebooting (again) didn't help. >>>>>> > > > > > >> >>>>>> > > > > > >> >>>>>> > > > > > >> Thx, >>>>>> > > > > > >> Hubert >>>>>> > > > > > >> _______________________________________________ >>>>>> > > > > > >> Gluster-users mailing list >>>>>> > > > > > >> Gluster-users at gluster.org >>>>>> > > > > > >> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>> > > > > > > >>>>>> > > > > > > >>>>>> > > > > > > >>>>>> > > > > > > -- >>>>>> > > > > > > Milind >>>>>> > > > > > > >>>>>> > > > > _______________________________________________ >>>>>> > > > > Gluster-users mailing list >>>>>> > > > > Gluster-users at gluster.org >>>>>> > > > > https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>> > > > >>>>>> > > > >>>>>> > > > >>>>>> > > > -- >>>>>> > > > Regards, >>>>>> > > > Hari Gowtham. >>>>>> > >>>>>> > >>>>>> > >>>>>> > -- >>>>>> > Regards, >>>>>> > Hari Gowtham. >>>>>> _______________________________________________ >>>>>> Gluster-users mailing list >>>>>> Gluster-users at gluster.org >>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>> >>>>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> >> -- >> Amar Tumballi (amarts) >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guillaume.pavese at interactiv-group.com Thu Mar 14 02:20:23 2019 From: guillaume.pavese at interactiv-group.com (Guillaume Pavese) Date: Thu, 14 Mar 2019 11:20:23 +0900 Subject: [Gluster-users] [gluster-packaging] [Gluster-Maintainers] glusterfs-6.0rc1 released In-Reply-To: References: <1536588589.15.1552443884989.JavaMail.jenkins@jenkins-el7.rht.gluster.org> <20190313093835.GN3535@ndevos-x270> Message-ID: putting users at gluster.org in the loop Guillaume Pavese Ingénieur Système et Réseau Interactiv-Group On Thu, Mar 14, 2019 at 11:04 AM Guillaume Pavese < guillaume.pavese at interactiv-group.com> wrote: > Hi, I am testing gluster6-rc1 on a replica 3 oVirt cluster (engine full > replica 3 and 2 other volume replica + arbiter). They were on Gluster6-rc0. > I upgraded one host that was having the "0-epoll: Failed to dispatch > handler" bug for one of its volumes, but now all three volumes are down! > "gluster peer status" now shows its 2 other peers as connected but > rejected. Should I upgrade the other nodes? They are still on Gluster6-rc0 > > > Guillaume Pavese > Ingénieur Système et Réseau > Interactiv-Group > > > On Wed, Mar 13, 2019 at 6:38 PM Niels de Vos wrote: > >> On Wed, Mar 13, 2019 at 02:24:44AM +0000, jenkins at build.gluster.org >> wrote: >> > SRC: >> https://build.gluster.org/job/release-new/81/artifact/glusterfs-6.0rc1.tar.gz >> > HASH: >> https://build.gluster.org/job/release-new/81/artifact/glusterfs-6.0rc1.sha512sum >> >> Packages from the CentOS Storage SIG will become available shortly in >> the testing repository. Please use these packages to enable the repo and >> install the glusterfs components in a 2nd step. >> >> el7: >> https://cbs.centos.org/kojifiles/work/tasks/3263/723263/centos-release-gluster6-0.9-1.el7.centos.noarch.rpm >> el6 >> : >> >> https://cbs.centos.org/kojifiles/work/tasks/3265/723265/centos-release-gluster6-0.9-1.el6.centos.noarch.rpm >> >> Once installed, the testing repo is enabled.
Everything should be >> available. >> >> It is highly appreciated to let me know some results of the testing! >> >> Thanks, >> Niels >> _______________________________________________ >> packaging mailing list >> packaging at gluster.org >> https://lists.gluster.org/mailman/listinfo/packaging >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amukherj at redhat.com Thu Mar 14 02:44:38 2019 From: amukherj at redhat.com (Atin Mukherjee) Date: Thu, 14 Mar 2019 08:14:38 +0530 Subject: [Gluster-users] [Gluster-Maintainers] [gluster-packaging] glusterfs-6.0rc1 released In-Reply-To: References: <1536588589.15.1552443884989.JavaMail.jenkins@jenkins-el7.rht.gluster.org> <20190313093835.GN3535@ndevos-x270> Message-ID: If you were on rc0 and upgraded to rc1, then you are hitting BZ 1684029 I believe. Can you please upgrade all the nodes to rc1, bump up the op-version to 60000 (if not already done) and then restart glusterd services to see if the peer rejection goes away? On Thu, Mar 14, 2019 at 7:51 AM Guillaume Pavese < guillaume.pavese at interactiv-group.com> wrote: > putting users at gluster.org in the loop > > Guillaume Pavese > Ingénieur Système et Réseau > Interactiv-Group > > > On Thu, Mar 14, 2019 at 11:04 AM Guillaume Pavese < > guillaume.pavese at interactiv-group.com> wrote: > >> Hi, I am testing gluster6-rc1 on a replica 3 oVirt cluster (engine full >> replica 3 and 2 other volume replica + arbiter). They were on Gluster6-rc0. >> I upgraded one host that was having the "0-epoll: Failed to dispatch >> handler" bug for one of its volumes, but now all three volumes are down! >> "gluster peer status" now shows its 2 other peers as connected but >> rejected. Should I upgrade the other nodes?
They are still on Gluster6-rc0 >> >> >> Guillaume Pavese >> Ingénieur Système et Réseau >> Interactiv-Group >> >> >> On Wed, Mar 13, 2019 at 6:38 PM Niels de Vos wrote: >> >>> On Wed, Mar 13, 2019 at 02:24:44AM +0000, jenkins at build.gluster.org >>> wrote: >>> > SRC: >>> https://build.gluster.org/job/release-new/81/artifact/glusterfs-6.0rc1.tar.gz >>> > HASH: >>> https://build.gluster.org/job/release-new/81/artifact/glusterfs-6.0rc1.sha512sum >>> >>> Packages from the CentOS Storage SIG will become available shortly in >>> the testing repository. Please use these packages to enable the repo and >>> install the glusterfs components in a 2nd step. >>> >>> el7: >>> https://cbs.centos.org/kojifiles/work/tasks/3263/723263/centos-release-gluster6-0.9-1.el7.centos.noarch.rpm >>> el6 >>> : >>> >>> https://cbs.centos.org/kojifiles/work/tasks/3265/723265/centos-release-gluster6-0.9-1.el6.centos.noarch.rpm >>> >>> Once installed, the testing repo is enabled. Everything should be >>> available. >>> >>> It is highly appreciated to let me know some results of the testing! >>> >>> Thanks, >>> Niels >>> _______________________________________________ >>> packaging mailing list >>> packaging at gluster.org >>> https://lists.gluster.org/mailman/listinfo/packaging >>> >> _______________________________________________ > maintainers mailing list > maintainers at gluster.org > https://lists.gluster.org/mailman/listinfo/maintainers > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From rgowdapp at redhat.com Thu Mar 14 03:57:57 2019 From: rgowdapp at redhat.com (Raghavendra Gowdappa) Date: Thu, 14 Mar 2019 09:27:57 +0530 Subject: [Gluster-users] "rpc_clnt_ping_timer_expired" errors In-Reply-To: References: <96B07283-D8AB-4F06-909D-E00424625528@cmcc.it> <42758A0E-8BE9-497D-BDE3-55D7DC633BC7@cmcc.it> <6A8CF4A4-98EA-48C3-A059-D60D1B2721C7@cmcc.it> <2CF49168-9C1B-4931-8C34-8157262A137A@cmcc.it> <7A151CC9-A0AE-4A45-B450-A4063D216D9E@cmcc.it> <32D53ECE-3F49-4415-A6EE-241B351AC2BA@cmcc.it> <4685A75B-5978-4338-9C9F-4A02FB40B9BC@cmcc.it> <4D2E6B04-C2E8-4FD5-B72D-E7C05931C6F9@cmcc.it> <4E332A56-B318-4BC1-9F44-84AB4392A5DE@cmcc.it> <832FD362-3B14-40D8-8530-604419300476@cmcc.it> <8D926643-1093-48ED-823F-D8F117F702CF@cmcc.it> <9D0BE438-DF11-4D0A-AB85-B44357032F29@cmcc.it> <5F0AC378-8170-4342-8473-9C17159CAC1D@cmcc.it> <7A50E86D-9E27-4EA7-883B-45E9F973991A@cmcc.it> <58B5DB7F-DCB4-4FBF-B1F7-681315B1613A@cmcc.it> <6327B44F-4E7E-46BB-A74C-70F4457DD1EB@cmcc.it> <0167DF4A-8329-4A1A-B439-857DFA6F78BB@cmcc.it> <763F334E-35B8-4729-B8E1-D30866754EEE@cmcc.it> <91DFC9AC-4805-41EB-AC6F-5722E018DD6E@cmcc.it> <8A9752B8-B231-4570-8FF4-8D3D781E7D42@cmcc.it> <47A24804-F975-4EE6-9FA5-67FCDA18D707@cmcc.it> <637FF5D2-D1F4-4686-9D48-646A96F67B96@cmcc.it> <4A87495F-3755-48F7-8507-085869069C64@cmcc.it> <3854BBF6-5B98-4AB3-A67E-E7DE59E69A63@cmcc.it> <313DA021-9173-4899-96B0-831B10B00B61@cmcc.it> <17996AFD-DFC8-40F3-9D09-DB6DDAD5B7D6@cmcc.it> <7074B5D8-955A-4802-A9F3-606C99734417@cmcc.it> <83B84BF9-8334-4230-B6F8-0BC4BFBEFF15@cmcc.it> <133B9AE4-9792-4F72-AD91-D36E7B9EC711@cmcc.it> <6611C4B0-57FD-4390-88B5-BD373555D4C5@cmcc.it> Message-ID: On Wed, Mar 13, 2019 at 3:55 PM Mauro Tridici wrote: > Hi Raghavendra, > > Yes, server.event-thread has been changed from 4 to 8. > Was client.event-thread value too changed to 8? If not, I would like to know the results of including this tuning too. 
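The event-thread values under discussion can be inspected and changed with the gluster CLI from any node. A minimal sketch, assuming the volume name "tier2" used elsewhere in this thread; GLUSTER defaults to `echo gluster` here so the commands are only printed, point it at the real binary on a storage node to actually apply them:

```shell
# Dry-run by default: GLUSTER just echoes the commands it would run.
# Set GLUSTER=gluster on a storage node to execute them for real.
GLUSTER=${GLUSTER:-echo gluster}
$GLUSTER volume get tier2 client.event-threads
$GLUSTER volume get tier2 server.event-threads
$GLUSTER volume set tier2 client.event-threads 8
```

Whether a change takes effect without remounting clients should be checked against the documentation of the Gluster release in use.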
Also, if possible, can you get the output of following command from problematic clients and bricks (during the duration when load tends to be high and ping-timer-expiry is seen)? # top -bHd 3 This will help us to know CPU utilization of event-threads. And I forgot to ask, what version of Glusterfs are you using? During last days, I noticed that the error events are still here although > they have been considerably reduced. > > So, I used grep command against the log files in order to provide you a > global vision about the warning, error and critical events appeared today > at 06:xx (may be useful I hope). > I collected the info from s06 gluster server, but the behaviour is the the > almost the same on the other gluster servers. > > *ERRORS: * > *CWD: /var/log/glusterfs * > *COMMAND: grep " E " *.log |grep "2019-03-13 06:"* > > (I can see a lot of this kind of message in the same period but I'm > notifying you only one record for each type of error) > > glusterd.log:[2019-03-13 06:12:35.982863] E [MSGID: 101042] > [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of > /var/run/gluster/tier2_quota_list/ > > glustershd.log:[2019-03-13 06:14:28.666562] E > [rpc-clnt.c:350:saved_frames_unwind] (--> > /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f4a71ddcebb] (--> > /lib64/libgfr > pc.so.0(saved_frames_unwind+0x1de)[0x7f4a71ba1d9e] (--> > /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f4a71ba1ebe] (--> > /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup > +0x90)[0x7f4a71ba3640] (--> > /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f4a71ba4130] ))))) > 0-tier2-client-55: forced unwinding frame type(GlusterFS 3.3) > op(INODELK(29)) > called at 2019-03-13 06:14:14.858441 (xid=0x17fddb50) > > glustershd.log:[2019-03-13 06:17:48.883825] E > [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to > 192.168.0.55:49158 failed (Connection timed out); disco > nnecting socket > glustershd.log:[2019-03-13 06:19:58.931798] E > 
[socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to > 192.168.0.55:49158 failed (Connection timed out); disco > nnecting socket > glustershd.log:[2019-03-13 06:22:08.979829] E > [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to > 192.168.0.55:49158 failed (Connection timed out); disco > nnecting socket > glustershd.log:[2019-03-13 06:22:36.226847] E [MSGID: 114031] > [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote > operation failed [Transport endpoint > is not connected] > glustershd.log:[2019-03-13 06:22:36.306669] E [MSGID: 114031] > [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote > operation failed [Transport endpoint > is not connected] > glustershd.log:[2019-03-13 06:22:36.385257] E [MSGID: 114031] > [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote > operation failed [Transport endpoint > is not connected] > > *WARNINGS:* > *CWD: /var/log/glusterfs * > *COMMAND: grep " W " *.log |grep "2019-03-13 06:"* > > (I can see a lot of this kind of message in the same period but I'm > notifying you only one record for each type of warnings) > > glustershd.log:[2019-03-13 06:14:28.666772] W [MSGID: 114031] > [client-rpc-fops.c:1080:client3_3_getxattr_cbk] 0-tier2-client-55: remote > operation failed. Path: 0f-f34d-4c25-bbe8-74bde0248d7e> (b6b35d0f-f34d-4c25-bbe8-74bde0248d7e). 
> Key: (null) [Transport endpoint is not connected] > > glustershd.log:[2019-03-13 06:14:31.421576] W [MSGID: 122035] > [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation > with some subvolumes unavailable (2) > > glustershd.log:[2019-03-13 06:15:31.547417] W [MSGID: 122032] > [ec-heald.c:266:ec_shd_index_sweep] 0-tier2-disperse-9: unable to get > index-dir on tier2-client-55 [Operation > now in progress] > > quota-mount-tier2.log:[2019-03-13 06:12:36.116277] W [MSGID: 101002] > [options.c:995:xl_opt_validate] 0-glusterfs: option 'address-family' is > deprecated, preferred is 'trans > port.address-family', continuing with correction > quota-mount-tier2.log:[2019-03-13 06:12:36.198430] W [MSGID: 101174] > [graph.c:363:_log_if_unknown_option] 0-tier2-readdir-ahead: option > 'parallel-readdir' is not recognized > quota-mount-tier2.log:[2019-03-13 06:12:37.945007] W > [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) > [0x7f340892be25] -->/usr/sbin/glusterfs(gluste > rfs_sigwaiter+0xe5) [0x55ef010164b5] > -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55ef0101632b] ) 0-: > received signum (15), shutting down > > *CRITICALS:* > *CWD: /var/log/glusterfs * > *COMMAND: grep " C " *.log |grep "2019-03-13 06:"* > > no critical errors at 06:xx > only one critical error during the day > > *[root at s06 glusterfs]# grep " C " *.log |grep "2019-03-13"* > glustershd.log:[2019-03-13 02:21:29.126279] C > [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-55: server > 192.168.0.55:49158 has not responded in the last 42 seconds, > disconnecting. > > > Thank you very much for your help. > Regards, > Mauro > > On 12 Mar 2019, at 05:17, Raghavendra Gowdappa > wrote: > > Was the suggestion to increase server.event-thread values tried? If yes, > what were the results? > > On Mon, Mar 11, 2019 at 2:40 PM Mauro Tridici > wrote: > >> Dear All, >> >> do you have any suggestions about the right way to "debug" this issue? 
>> In attachment, the updated logs of "s06" gluster server. >> >> I noticed a lot of intermittent warning and error messages. >> >> Thank you in advance, >> Mauro >> >> >> >> On 4 Mar 2019, at 18:45, Raghavendra Gowdappa >> wrote: >> >> >> +Gluster Devel , +Gluster-users >> >> >> I would like to point out another issue. Even if what I suggested >> prevents disconnects, part of the solution would be only symptomatic >> treatment and doesn't address the root cause of the problem. In most of the >> ping-timer-expiry issues, the root cause is the increased load on bricks >> and the inability of bricks to be responsive under high load. So, the >> actual solution would be doing any or both of the following: >> * identify the source of increased load and if possible throttle it. >> Internal heal processes like self-heal, rebalance, quota heal are known to >> pump traffic into bricks without much throttling (io-threads _might_ do >> some throttling, but my understanding is its not sufficient). >> * identify the reason for bricks to become unresponsive during load. This >> may be fixable issues like not enough event-threads to read from network or >> difficult to fix issues like fsync on backend fs freezing the process or >> semi fixable issues (in code) like lock contention. >> >> So any genuine effort to fix ping-timer-issues (to be honest most of the >> times they are not issues related to rpc/network) would involve performance >> characterization of various subsystems on bricks and clients. Various >> subsystems can include (but not necessarily limited to), underlying >> OS/filesystem, glusterfs processes, CPU consumption etc >> >> regards, >> Raghavendra >> >> On Mon, Mar 4, 2019 at 9:31 PM Mauro Tridici >> wrote: >> >>> Thank you, let's try! >>> I will inform you about the effects of the change.
>>> >>> Regards, >>> Mauro >>> >>> On 4 Mar 2019, at 16:55, Raghavendra Gowdappa >>> wrote: >>> >>> >>> >>> On Mon, Mar 4, 2019 at 8:54 PM Mauro Tridici >>> wrote: >>> >>>> Hi Raghavendra, >>>> >>>> thank you for your reply. >>>> Yes, you are right. It is a problem that seems to happen randomly. >>>> At this moment, server.event-threads value is 4. I will try to increase >>>> this value to 8. Do you think that it could be a valid value ? >>>> >>> >>> Yes. We can try with that. You should see at least frequency of >>> ping-timer related disconnects reduce with this value (even if it doesn't >>> eliminate the problem completely). >>> >>> >>>> Regards, >>>> Mauro >>>> >>>> >>>> On 4 Mar 2019, at 15:36, Raghavendra Gowdappa >>>> wrote: >>>> >>>> >>>> >>>> On Mon, Mar 4, 2019 at 8:01 PM Nithya Balachandran >>>> wrote: >>>> >>>>> Hi Mauro, >>>>> >>>>> It looks like some problem on s06. Are all your other nodes ok? Can >>>>> you send us the gluster logs from this node? >>>>> >>>>> @Raghavendra G , do you have any idea as to >>>>> how this can be debugged? Maybe running top ? Or debug brick logs? >>>>> >>>> >>>> If we can reproduce the problem, collecting tcpdump on both ends of >>>> connection will help. But, one common problem is these bugs are >>>> inconsistently reproducible and hence we may not be able to capture tcpdump >>>> at correct intervals. Other than that, we can try to collect some evidence >>>> that poller threads were busy (waiting on locks). But, not sure what debug >>>> data provides that information. >>>> >>>> From what I know, its difficult to collect evidence for this issue and >>>> we could only reason about it. >>>> >>>> We can try a workaround though - try increasing server.event-threads >>>> and see whether ping-timer expiry issues go away with an optimal value. If >>>> that's the case, it kind of provides proof for our hypothesis. 
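One quick way to test the event-thread hypothesis from existing logs is to tally the ping-timer expiries per brick endpoint and see whether they cluster on particular servers. A small sketch; the here-doc holds two sample lines quoted in this thread, and in practice /var/log/messages or glustershd.log would be fed in instead:

```shell
# Count how often each brick endpoint triggered a ping-timer expiry.
counts=$(awk '/rpc_clnt_ping_timer_expired/ {
    for (i = 1; i <= NF; i++)
        if ($i == "server") { print $(i + 1); break }
}' <<'EOF' | sort | uniq -c
Mar 4 10:21:12 s06 glustershd[20355]: [2019-03-04 09:21:12.954798] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-55: server 192.168.0.55:49158 has not responded in the last 42 seconds, disconnecting.
Mar 4 10:22:03 s06 glustershd[20355]: [2019-03-04 09:22:03.964120] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-60: server 192.168.0.54:49165 has not responded in the last 42 seconds, disconnecting.
EOF
)
printf '%s\n' "$counts"
```

Endpoints that dominate the tally point at the brick processes (or their hosts) worth profiling first.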
>>>> >>>>> >>>>> Regards, >>>>> Nithya >>>>> >>>>> On Mon, 4 Mar 2019 at 15:25, Mauro Tridici >>>>> wrote: >>>>> >>>>>> Hi All, >>>>>> >>>>>> some minutes ago I received this message from NAGIOS server >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> ****** Nagios *****Notification Type: PROBLEMService: Brick - >>>>>> /gluster/mnt2/brickHost: s06Address: s06-stgState: CRITICALDate/Time: Mon >>>>>> Mar 4 10:25:33 CET 2019Additional Info:CHECK_NRPE STATE CRITICAL: Socket >>>>>> timeout after 10 seconds.* >>>>>> >>>>>> I checked the network, RAM and CPUs usage on s06 node and everything >>>>>> seems to be ok. >>>>>> No bricks are in error state. In /var/log/messages, I detected again >>>>>> a crash of "check_vol_utili" that I think it is a module used by NRPE >>>>>> executable (that is the NAGIOS client). >>>>>> >>>>>> Mar 4 10:15:29 s06 kernel: traps: check_vol_utili[161224] general >>>>>> protection ip:7facffa0a66d sp:7ffe9f4e6fc0 error:0 in >>>>>> libglusterfs.so.0.0.1[7facff9b7000+f7000] >>>>>> Mar 4 10:15:29 s06 abrt-hook-ccpp: Process 161224 (python2.7) of >>>>>> user 0 killed by SIGSEGV - dumping core >>>>>> Mar 4 10:15:29 s06 abrt-server: Generating core_backtrace >>>>>> Mar 4 10:15:29 s06 abrt-server: Error: Unable to open './coredump': >>>>>> No such file or directory >>>>>> Mar 4 10:16:01 s06 systemd: Created slice User Slice of root. >>>>>> Mar 4 10:16:01 s06 systemd: Starting User Slice of root. >>>>>> Mar 4 10:16:01 s06 systemd: Started Session 201010 of user root. >>>>>> Mar 4 10:16:01 s06 systemd: Starting Session 201010 of user root. >>>>>> Mar 4 10:16:01 s06 systemd: Removed slice User Slice of root. >>>>>> Mar 4 10:16:01 s06 systemd: Stopping User Slice of root.
>>>>>> Mar 4 10:16:24 s06 abrt-server: Duplicate: UUID >>>>>> Mar 4 10:16:24 s06 abrt-server: DUP_OF_DIR: >>>>>> /var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041 >>>>>> Mar 4 10:16:24 s06 abrt-server: Deleting problem directory >>>>>> ccpp-2019-03-04-10:15:29-161224 (dup of ccpp-2018-09-25-12:27:42-13041) >>>>>> Mar 4 10:16:24 s06 abrt-server: Generating core_backtrace >>>>>> Mar 4 10:16:24 s06 abrt-server: Error: Unable to open './coredump': >>>>>> No such file or directory >>>>>> Mar 4 10:16:24 s06 abrt-server: Cannot notify >>>>>> '/var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041' via uReport: Event >>>>>> 'report_uReport' exited with 1 >>>>>> Mar 4 10:16:24 s06 abrt-hook-ccpp: Process 161391 (python2.7) of >>>>>> user 0 killed by SIGABRT - dumping core >>>>>> Mar 4 10:16:25 s06 abrt-server: Generating core_backtrace >>>>>> Mar 4 10:16:25 s06 abrt-server: Error: Unable to open './coredump': >>>>>> No such file or directory >>>>>> Mar 4 10:17:01 s06 systemd: Created slice User Slice of root. >>>>>> >>>>>> Also, I noticed the following errors that I think are very critical: >>>>>> >>>>>> Mar 4 10:21:12 s06 glustershd[20355]: [2019-03-04 09:21:12.954798] C >>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-55: server >>>>>> 192.168.0.55:49158 has not responded in the last 42 seconds, >>>>>> disconnecting. >>>>>> Mar 4 10:22:01 s06 systemd: Created slice User Slice of root. >>>>>> Mar 4 10:22:01 s06 systemd: Starting User Slice of root. >>>>>> Mar 4 10:22:01 s06 systemd: Started Session 201017 of user root. >>>>>> Mar 4 10:22:01 s06 systemd: Starting Session 201017 of user root. >>>>>> Mar 4 10:22:01 s06 systemd: Removed slice User Slice of root. >>>>>> Mar 4 10:22:01 s06 systemd: Stopping User Slice of root. >>>>>> Mar 4 10:22:03 s06 glustershd[20355]: [2019-03-04 09:22:03.964120] C >>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-60: server >>>>>> 192.168.0.54:49165 has not responded in the last 42 seconds, >>>>>> disconnecting. 
>>>>>> Mar 4 10:23:01 s06 systemd: Created slice User Slice of root. >>>>>> Mar 4 10:23:01 s06 systemd: Starting User Slice of root. >>>>>> Mar 4 10:23:01 s06 systemd: Started Session 201018 of user root. >>>>>> Mar 4 10:23:01 s06 systemd: Starting Session 201018 of user root. >>>>>> Mar 4 10:23:02 s06 systemd: Removed slice User Slice of root. >>>>>> Mar 4 10:23:02 s06 systemd: Stopping User Slice of root. >>>>>> Mar 4 10:24:01 s06 systemd: Created slice User Slice of root. >>>>>> Mar 4 10:24:01 s06 systemd: Starting User Slice of root. >>>>>> Mar 4 10:24:01 s06 systemd: Started Session 201019 of user root. >>>>>> Mar 4 10:24:01 s06 systemd: Starting Session 201019 of user root. >>>>>> Mar 4 10:24:01 s06 systemd: Removed slice User Slice of root. >>>>>> Mar 4 10:24:01 s06 systemd: Stopping User Slice of root. >>>>>> Mar 4 10:24:03 s06 glustershd[20355]: [2019-03-04 09:24:03.982502] C >>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-16: server >>>>>> 192.168.0.52:49158 has not responded in the last 42 seconds, >>>>>> disconnecting. >>>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746109] C >>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-3: server >>>>>> 192.168.0.51:49153 has not responded in the last 42 seconds, >>>>>> disconnecting. >>>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746215] C >>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-10: server >>>>>> 192.168.0.52:49156 has not responded in the last 42 seconds, >>>>>> disconnecting. >>>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746260] C >>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-21: server >>>>>> 192.168.0.51:49159 has not responded in the last 42 seconds, >>>>>> disconnecting. 
>>>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746296] C >>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-25: server >>>>>> 192.168.0.52:49161 has not responded in the last 42 seconds, >>>>>> disconnecting. >>>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746413] C >>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-60: server >>>>>> 192.168.0.54:49165 has not responded in the last 42 seconds, >>>>>> disconnecting. >>>>>> Mar 4 10:24:07 s06 glustershd[20355]: [2019-03-04 09:24:07.982952] C >>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-45: server >>>>>> 192.168.0.54:49155 has not responded in the last 42 seconds, >>>>>> disconnecting. >>>>>> Mar 4 10:24:18 s06 glustershd[20355]: [2019-03-04 09:24:18.990929] C >>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-25: server >>>>>> 192.168.0.52:49161 has not responded in the last 42 seconds, >>>>>> disconnecting. >>>>>> Mar 4 10:24:31 s06 glustershd[20355]: [2019-03-04 09:24:31.995781] C >>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-20: server >>>>>> 192.168.0.53:49159 has not responded in the last 42 seconds, >>>>>> disconnecting. >>>>>> Mar 4 10:25:01 s06 systemd: Created slice User Slice of root. >>>>>> Mar 4 10:25:01 s06 systemd: Starting User Slice of root. >>>>>> Mar 4 10:25:01 s06 systemd: Started Session 201020 of user root. >>>>>> Mar 4 10:25:01 s06 systemd: Starting Session 201020 of user root. >>>>>> Mar 4 10:25:01 s06 systemd: Removed slice User Slice of root. >>>>>> Mar 4 10:25:01 s06 systemd: Stopping User Slice of root. >>>>>> Mar 4 10:25:57 s06 systemd: Created slice User Slice of root. >>>>>> Mar 4 10:25:57 s06 systemd: Starting User Slice of root. >>>>>> Mar 4 10:25:57 s06 systemd-logind: New session 201021 of user root. >>>>>> Mar 4 10:25:57 s06 systemd: Started Session 201021 of user root. 
>>>>>> Mar 4 10:25:57 s06 systemd: Starting Session 201021 of user root. >>>>>> Mar 4 10:26:01 s06 systemd: Started Session 201022 of user root. >>>>>> Mar 4 10:26:01 s06 systemd: Starting Session 201022 of user root. >>>>>> Mar 4 10:26:21 s06 nrpe[162388]: Error: Could not complete SSL >>>>>> handshake with 192.168.1.56: 5 >>>>>> Mar 4 10:27:01 s06 systemd: Started Session 201023 of user root. >>>>>> Mar 4 10:27:01 s06 systemd: Starting Session 201023 of user root. >>>>>> Mar 4 10:28:01 s06 systemd: Started Session 201024 of user root. >>>>>> Mar 4 10:28:01 s06 systemd: Starting Session 201024 of user root. >>>>>> Mar 4 10:29:01 s06 systemd: Started Session 201025 of user root. >>>>>> Mar 4 10:29:01 s06 systemd: Starting Session 201025 of user root. >>>>>> >>>>>> But, unfortunately, I don't understand why it is happening. >>>>>> Now, NAGIOS server shows that s06 status is ok: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> ****** Nagios *****Notification Type: RECOVERYService: Brick - >>>>>> /gluster/mnt2/brickHost: s06Address: s06-stgState: OKDate/Time: Mon Mar 4 >>>>>> 10:35:23 CET 2019Additional Info:OK: Brick /gluster/mnt2/brick is up* >>>>>> >>>>>> Nothing is changed from RAM, CPUs, and NETWORK point of view. >>>>>> /var/log/message file has been updated: >>>>>> >>>>>> Mar 4 10:32:01 s06 systemd: Starting Session 201029 of user root. >>>>>> Mar 4 10:32:30 s06 glustershd[20355]: [2019-03-04 09:32:30.069082] C >>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-10: server >>>>>> 192.168.0.52:49156 has not responded in the last 42 seconds, >>>>>> disconnecting. >>>>>> Mar 4 10:32:55 s06 glustershd[20355]: [2019-03-04 09:32:55.074689] C >>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-66: server >>>>>> 192.168.0.54:49167 has not responded in the last 42 seconds, >>>>>> disconnecting. >>>>>> Mar 4 10:33:01 s06 systemd: Started Session 201030 of user root.
>>>>>> Mar 4 10:33:01 s06 systemd: Starting Session 201030 of user root. >>>>>> Mar 4 10:34:01 s06 systemd: Started Session 201031 of user root. >>>>>> Mar 4 10:34:01 s06 systemd: Starting Session 201031 of user root. >>>>>> Mar 4 10:35:01 s06 nrpe[162562]: Could not read request from client >>>>>> 192.168.1.56, bailing out... >>>>>> Mar 4 10:35:01 s06 nrpe[162562]: INFO: SSL Socket Shutdown. >>>>>> Mar 4 10:35:01 s06 systemd: Started Session 201032 of user root. >>>>>> Mar 4 10:35:01 s06 systemd: Starting Session 201032 of user root. >>>>>> >>>>>> Could you please help me to understand what it's happening ? >>>>>> Thank you in advance. >>>>>> >>>>>> Regards, >>>>>> Mauro >>>>>> >>>>>> >>>>>> On 1 Mar 2019, at 12:17, Mauro Tridici wrote: >>>>>> >>>>>> >>>>>> Thank you, Milind. >>>>>> I executed the instructions you suggested: >>>>>> >>>>>> - grep "blocked for" /var/log/messages on s06 returns no output (no >>>>>> "blocked" word is detected in messages file); >>>>>> - in /var/log/messages file I can see this kind of error repeated for >>>>>> a lot of times: >>>>>> >>>>>> Mar 1 08:43:01 s06 systemd: Starting Session 196071 of user root. >>>>>> Mar 1 08:43:01 s06 systemd: Removed slice User Slice of root. >>>>>> Mar 1 08:43:01 s06 systemd: Stopping User Slice of root.
>>>>>> Mar 1 08:43:02 s06 kernel: traps: check_vol_utili[57091] general >>>>>> protection ip:7f88e76ee66d sp:7ffe5a5bcc30 error:0 in >>>>>> libglusterfs.so.0.0.1[7f88e769b000+f7000] >>>>>> Mar 1 08:43:02 s06 abrt-hook-ccpp: Process 57091 (python2.7) of user >>>>>> 0 killed by SIGSEGV - dumping core >>>>>> Mar 1 08:43:02 s06 abrt-server: Generating core_backtrace >>>>>> Mar 1 08:43:02 s06 abrt-server: Error: Unable to open './coredump': >>>>>> No such file or directory >>>>>> Mar 1 08:43:58 s06 abrt-server: Duplicate: UUID >>>>>> Mar 1 08:43:58 s06 abrt-server: DUP_OF_DIR: >>>>>> /var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041 >>>>>> Mar 1 08:43:58 s06 abrt-server: Deleting problem directory >>>>>> ccpp-2019-03-01-08:43:02-57091 (dup of ccpp-2018-09-25-12:27:42-13041) >>>>>> Mar 1 08:43:58 s06 dbus[1872]: [system] Activating service >>>>>> name='org.freedesktop.problems' (using servicehelper) >>>>>> Mar 1 08:43:58 s06 dbus[1872]: [system] Successfully activated >>>>>> service 'org.freedesktop.problems' >>>>>> Mar 1 08:43:58 s06 abrt-server: Generating core_backtrace >>>>>> Mar 1 08:43:58 s06 abrt-server: Error: Unable to open './coredump': >>>>>> No such file or directory >>>>>> Mar 1 08:43:58 s06 abrt-server: Cannot notify >>>>>> '/var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041' via uReport: Event >>>>>> 'report_uReport' exited with 1 >>>>>> Mar 1 08:44:01 s06 systemd: Created slice User Slice of root. >>>>>> Mar 1 08:44:01 s06 systemd: Starting User Slice of root. >>>>>> Mar 1 08:44:01 s06 systemd: Started Session 196072 of user root. >>>>>> Mar 1 08:44:01 s06 systemd: Starting Session 196072 of user root. >>>>>> Mar 1 08:44:01 s06 systemd: Removed slice User Slice of root. >>>>>> >>>>>> - in /var/log/messages file I can see also 4 errors related to other >>>>>> cluster servers: >>>>>> >>>>>> Mar 1 11:05:01 s06 systemd: Starting User Slice of root. >>>>>> Mar 1 11:05:01 s06 systemd: Started Session 196230 of user root. 
>>>>>> Mar 1 11:05:01 s06 systemd: Starting Session 196230 of user root. >>>>>> Mar 1 11:05:01 s06 systemd: Removed slice User Slice of root. >>>>>> Mar 1 11:05:01 s06 systemd: Stopping User Slice of root. >>>>>> Mar 1 11:05:59 s06 glustershd[70117]: [2019-03-01 10:05:59.347094] C >>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-33: server >>>>>> 192.168.0.51:49163 has not responded in the last 42 seconds, >>>>>> disconnecting. >>>>>> Mar 1 11:06:01 s06 systemd: Created slice User Slice of root. >>>>>> Mar 1 11:06:01 s06 systemd: Starting User Slice of root. >>>>>> Mar 1 11:06:01 s06 systemd: Started Session 196231 of user root. >>>>>> Mar 1 11:06:01 s06 systemd: Starting Session 196231 of user root. >>>>>> Mar 1 11:06:01 s06 systemd: Removed slice User Slice of root. >>>>>> Mar 1 11:06:01 s06 systemd: Stopping User Slice of root. >>>>>> Mar 1 11:06:12 s06 glustershd[70117]: [2019-03-01 10:06:12.351319] C >>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-1: server >>>>>> 192.168.0.52:49153 has not responded in the last 42 seconds, >>>>>> disconnecting. >>>>>> Mar 1 11:06:38 s06 glustershd[70117]: [2019-03-01 10:06:38.356920] C >>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-7: server >>>>>> 192.168.0.52:49155 has not responded in the last 42 seconds, >>>>>> disconnecting. >>>>>> Mar 1 11:07:01 s06 systemd: Created slice User Slice of root. >>>>>> Mar 1 11:07:01 s06 systemd: Starting User Slice of root. >>>>>> Mar 1 11:07:01 s06 systemd: Started Session 196232 of user root. >>>>>> Mar 1 11:07:01 s06 systemd: Starting Session 196232 of user root. >>>>>> Mar 1 11:07:01 s06 systemd: Removed slice User Slice of root. >>>>>> Mar 1 11:07:01 s06 systemd: Stopping User Slice of root. 
>>>>>> Mar 1 11:07:36 s06 glustershd[70117]: [2019-03-01 10:07:36.366259] C >>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-0: server >>>>>> 192.168.0.51:49152 has not responded in the last 42 seconds, >>>>>> disconnecting. >>>>>> Mar 1 11:08:01 s06 systemd: Created slice User Slice of root. >>>>>> >>>>>> No "blocked" word is in /var/log/messages files on other cluster >>>>>> servers. >>>>>> In attachment, the /var/log/messages file from s06 server. >>>>>> >>>>>> Thank you in advance, >>>>>> Mauro >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 1 Mar 2019, at 11:47, Milind Changire wrote: >>>>>> >>>>>> The traces of very high disk activity on the servers are often found >>>>>> in /var/log/messages >>>>>> You might want to grep for "blocked for" in /var/log/messages on s06 >>>>>> and correlate the timestamps to confirm the unresponsiveness as reported in >>>>>> gluster client logs. >>>>>> In cases of high disk activity, although the operating system >>>>>> continues to respond to ICMP pings, the processes writing to disks often >>>>>> get blocked to a large flush to the disk which could span beyond 42 seconds >>>>>> and hence result in ping-timer-expiry logs. >>>>>> >>>>>> As a side note: >>>>>> If you indeed find gluster processes being blocked in >>>>>> /var/log/messages, you might want to tweak sysctl tunables called >>>>>> vm.dirty_background_ratio or vm.dirty_background_bytes to a smaller value >>>>>> than the existing. Please read up more on those tunables before touching >>>>>> the settings. >>>>>> >>>>>> >>>>>> On Fri, Mar 1, 2019 at 4:06 PM Mauro Tridici >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> Hi all, >>>>>>> >>>>>>> in attachment the client log captured after changing >>>>>>> network.ping-timeout option.
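Milind's side note above about vm.dirty_background_ratio and vm.dirty_background_bytes can be checked read-only before touching anything. A Linux-only sketch that reads the current writeback thresholds from /proc; the commented sysctl line is purely illustrative (not a recommended value) and would need root:

```shell
# Read the current writeback thresholds; lowering them makes the kernel
# start background flushing earlier, smoothing out large bursty flushes.
out=$(for t in dirty_background_ratio dirty_background_bytes dirty_ratio; do
    printf '%s = %s\n' "$t" "$(cat /proc/sys/vm/$t)"
done)
printf '%s\n' "$out"
# Illustrative change (as root, after reading the kernel vm documentation):
#   sysctl -w vm.dirty_background_ratio=5
```

Note that dirty_background_ratio and dirty_background_bytes are mutually exclusive: setting one resets the other to 0.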
>>>>>>> I noticed this error involving server 192.168.0.56 (s06) >>>>>>> >>>>>>> [2019-03-01 09:23:36.077287] I [rpc-clnt.c:1962:rpc_clnt_reconfig] >>>>>>> 0-tier2-client-71: changing ping timeout to 42 (from 0) >>>>>>> [2019-03-01 09:23:36.078213] I >>>>>>> [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in >>>>>>> volfile,continuing >>>>>>> [2019-03-01 09:23:36.078432] I >>>>>>> [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in >>>>>>> volfile,continuing >>>>>>> [2019-03-01 09:23:36.092357] I >>>>>>> [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in >>>>>>> volfile,continuing >>>>>>> [2019-03-01 09:23:36.094146] I >>>>>>> [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in >>>>>>> volfile,continuing >>>>>>> [2019-03-01 10:06:24.708082] C >>>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-50: server >>>>>>> 192.168.0.56:49156 has not responded in the last 42 seconds, >>>>>>> disconnecting. >>>>>>> >>>>>>> I don't know why it happens, s06 server seems to be reachable. >>>>>>> >>>>>>> [athena_login2][/users/home/sysm02/]> telnet 192.168.0.56 49156 >>>>>>> Trying 192.168.0.56... >>>>>>> Connected to 192.168.0.56. >>>>>>> Escape character is '^]'. >>>>>>> ^CConnection closed by foreign host. >>>>>>> [athena_login2][/users/home/sysm02/]> ping 192.168.0.56 >>>>>>> PING 192.168.0.56 (192.168.0.56) 56(84) bytes of data. >>>>>>> 64 bytes from 192.168.0.56: icmp_seq=1 ttl=64 time=0.116 ms >>>>>>> 64 bytes from 192.168.0.56: icmp_seq=2 ttl=64 time=0.101 ms >>>>>>> >>>>>>> --- 192.168.0.56 ping statistics --- >>>>>>> 2 packets transmitted, 2 received, 0% packet loss, time 1528ms >>>>>>> rtt min/avg/max/mdev = 0.101/0.108/0.116/0.012 ms >>>>>>> >>>>>>> [athena_login2][/users/home/sysm02/]> telnet 192.168.0.56 49156 >>>>>>> Trying 192.168.0.56... >>>>>>> Connected to 192.168.0.56. >>>>>>> Escape character is '^]'.
>>>>>>> >>>>>>> Thank you for your help, >>>>>>> Mauro >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 1 Mar 2019, at 10:29, Mauro Tridici >>>>>>> wrote: >>>>>>> >>>>>>> Hi all, >>>>>>> >>>>>>> thank you for the explanation. >>>>>>> I just changed network.ping-timeout option to default value >>>>>>> (network.ping-timeout=42). >>>>>>> >>>>>>> I will check the logs to see if the errors will appear again. >>>>>>> >>>>>>> Regards, >>>>>>> Mauro >>>>>>> >>>>>>> On 1 Mar 2019, at 04:43, Milind Changire >>>>>>> wrote: >>>>>>> >>>>>>> network.ping-timeout should not be set to zero for non-glusterd >>>>>>> clients. >>>>>>> glusterd is a special case for which ping-timeout is set to zero via >>>>>>> /etc/glusterfs/glusterd.vol >>>>>>> >>>>>>> Setting network.ping-timeout to zero disables arming of the ping >>>>>>> timer for connections. This disables testing the connection for >>>>>>> responsiveness and hence avoids proactive fail-over. >>>>>>> >>>>>>> Please reset network.ping-timeout to a non-zero positive value, eg. >>>>>>> 42 >>>>>>> >>>>>>> >>>>>>> On Thu, Feb 28, 2019 at 5:07 PM Nithya Balachandran < >>>>>>> nbalacha at redhat.com> wrote: >>>>>>> >>>>>>>> Adding Raghavendra and Milind to comment on this. >>>>>>>> >>>>>>>> What is the effect of setting network.ping-timeout to 0 and should >>>>>>>> it be set back to 42? >>>>>>>> Regards, >>>>>>>> Nithya >>>>>>>> >>>>>>>> On Thu, 28 Feb 2019 at 16:01, Mauro Tridici >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi Nithya, >>>>>>>>> >>>>>>>>> sorry for the late. >>>>>>>>> network.ping-timeout has been set to 0 in order to try to solve >>>>>>>>> some timeout problems, but it didn't help. >>>>>>>>> I can set it to the default value. >>>>>>>>> >>>>>>>>> Can I proceed with the change? >>>>>>>>> >>>>>>>>> Thank you, >>>>>>>>> Mauro >>>>>>>>> >>>>>>>>> >>>>>>>>> On 28 Feb 2019, at 04:41, Nithya Balachandran >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Hi Mauro, >>>>>>>>> >>>>>>>>> Is network.ping-timeout still set to 0. The default value is 42.
>>>>>>>>> Is there a particular reason why this was changed? >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Nithya >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, 27 Feb 2019 at 21:32, Mauro Tridici >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hi Xavi, >>>>>>>>>> >>>>>>>>>> thank you for the detailed explanation and suggestions. >>>>>>>>>> Yes, transport.listen-backlog option is still set to 1024. >>>>>>>>>> >>>>>>>>>> I will check the network and connectivity status using "ping" and >>>>>>>>>> "telnet" as soon as the errors come back again. >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Mauro >>>>>>>>>> >>>>>>>>>> On 27 Feb 2019, at 16:42, Xavi Hernandez < >>>>>>>>>> jahernan at redhat.com> wrote: >>>>>>>>>> >>>>>>>>>> Hi Mauro, >>>>>>>>>> >>>>>>>>>> those errors say that the mount point is not connected to some of >>>>>>>>>> the bricks while executing operations. I see references to 3rd and 6th >>>>>>>>>> bricks of several disperse sets, which seem to map to server s06. For some >>>>>>>>>> reason, gluster is having trouble connecting from the client machine to >>>>>>>>>> that particular server. At the end of the log I see that after a long time a >>>>>>>>>> reconnect is done to both of them. However, a little after, other bricks from >>>>>>>>>> the s05 get disconnected and a reconnect times out. >>>>>>>>>> >>>>>>>>>> That's really odd. It seems as if server/communication is cut >>>>>>>>>> to s06 for some time, then restored, and then the same happens to the next >>>>>>>>>> server. >>>>>>>>>> >>>>>>>>>> If the servers are really online and it's only a communication >>>>>>>>>> issue, it explains why server memory and network usage have increased: if the >>>>>>>>>> problem only exists between the client and servers, any write made by the >>>>>>>>>> client will automatically mark the file as damaged, since some of the >>>>>>>>>> servers have not been updated.
Since self-heal runs from the server nodes, >>>>>>>>>> they will probably be correctly connected to all bricks, which allows them >>>>>>>>>> to heal the just damaged file, which increases memory and network usage. >>>>>>>>>> >>>>>>>>>> I guess you still have transport.listen-backlog set to 1024, >>>>>>>>>> right ? >>>>>>>>>> >>>>>>>>>> Just to try to identify if the problem really comes from the network, >>>>>>>>>> can you check if you lose some pings from the client to all of the servers >>>>>>>>>> while you are seeing those errors in the log file ? >>>>>>>>>> >>>>>>>>>> You can also check if, during those errors, you can telnet to the >>>>>>>>>> port of the brick from the client. >>>>>>>>>> >>>>>>>>>> Xavi >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, Feb 26, 2019 at 10:17 AM Mauro Tridici < >>>>>>>>>> mauro.tridici at cmcc.it> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Nithya, >>>>>>>>>>> >>>>>>>>>>> the "df -h" operation is no longer slow, but no users are using the >>>>>>>>>>> volume, RAM and NETWORK usage is ok on the client node.
>>>>>>>>>>> >>>>>>>>>>> I was worried about these kinds of warnings/errors: >>>>>>>>>>> >>>>>>>>>>> [2019-02-25 10:59:00.664323] W [MSGID: 122035] >>>>>>>>>>> [ec-common.c:571:ec_child_select] 0-tier2-disperse-6: Executing operation >>>>>>>>>>> with some subvolumes unavailable (20) >>>>>>>>>>> >>>>>>>>>>> [2019-02-26 03:11:35.212603] E >>>>>>>>>>> [rpc-clnt.c:350:saved_frames_unwind] (--> >>>>>>>>>>> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> >>>>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> >>>>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> >>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] >>>>>>>>>>> (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) >>>>>>>>>>> 0-tier2-client-50: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) >>>>>>>>>>> called at 2019-02-26 03:10:56.549903 (xid=0x106f1c5) >>>>>>>>>>> >>>>>>>>>>> [2019-02-26 03:13:03.313831] E >>>>>>>>>>> [socket.c:2376:socket_connect_finish] 0-tier2-client-50: connection to >>>>>>>>>>> 192.168.0.56:49156 failed (Timeout della connessione); >>>>>>>>>>> disconnecting socket >>>>>>>>>>> >>>>>>>>>>> It seems that some subvolumes are not available and 192.168.0.56 >>>>>>>>>>> server (s06) is not reachable. >>>>>>>>>>> But gluster servers are up&running and bricks are ok. >>>>>>>>>>> >>>>>>>>>>> In attachment the updated tier2.log file. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thank you. >>>>>>>>>>> Regards, >>>>>>>>>>> Mauro >>>>>>>>>>> >>>>>>>>>>> On 26 Feb 2019, at 04:03, Nithya Balachandran < >>>>>>>>>>> nbalacha at redhat.com> wrote: >>>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> I see a lot of EC messages in the log but they don't seem very >>>>>>>>>>> serious. Xavi, can you take a look?
>>>>>>>>>>> >>>>>>>>>>> The only errors I see are: >>>>>>>>>>> [2019-02-25 10:58:45.519871] E >>>>>>>>>>> [rpc-clnt.c:350:saved_frames_unwind] (--> >>>>>>>>>>> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> >>>>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> >>>>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> >>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] >>>>>>>>>>> (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) >>>>>>>>>>> 0-tier2-client-50: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) >>>>>>>>>>> called at 2019-02-25 10:57:47.429969 (xid=0xd26fe7) >>>>>>>>>>> [2019-02-25 10:58:51.461493] E >>>>>>>>>>> [rpc-clnt.c:350:saved_frames_unwind] (--> >>>>>>>>>>> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> >>>>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> >>>>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> >>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] >>>>>>>>>>> (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) >>>>>>>>>>> 0-tier2-client-41: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) >>>>>>>>>>> called at 2019-02-25 10:57:47.499174 (xid=0xf47d6a) >>>>>>>>>>> [2019-02-25 11:07:57.152874] E >>>>>>>>>>> [socket.c:2376:socket_connect_finish] 0-tier2-client-70: connection to >>>>>>>>>>> 192.168.0.55:49163 failed (Timeout della connessione); >>>>>>>>>>> disconnecting socket >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Is the df -h operation still slow? If yes, can you take a >>>>>>>>>>> tcpdump of the client while running df -h and send that across? 
>>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Nithya >>>>>>>>>>> >>>>>>>>>>> On Mon, 25 Feb 2019 at 17:27, Mauro Tridici < >>>>>>>>>>> mauro.tridici at cmcc.it> wrote: >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Sorry, some minutes after my last mail message, I noticed that >>>>>>>>>>>> the "df -h" command hung for a while before returning the prompt. >>>>>>>>>>>> Yesterday, everything was ok in the gluster client log, but, >>>>>>>>>>>> today, I see a lot of errors (please, take a look at the attached file). >>>>>>>>>>>> >>>>>>>>>>>> On the client node, I detected significant RAM and NETWORK usage. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Do you think that the errors have been caused by the client's >>>>>>>>>>>> resource usage? >>>>>>>>>>>> >>>>>>>>>>>> Thank you in advance, >>>>>>>>>>>> Mauro >>>>>>>>>>>> >>>>>>>>>>>> >>>>>> >>> >>> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From revirii at googlemail.com Thu Mar 14 07:02:35 2019 From: revirii at googlemail.com (Hu Bert) Date: Thu, 14 Mar 2019 08:02:35 +0100 Subject: [Gluster-users] ganesha-gfapi In-Reply-To: References: Message-ID: btw.: re-adding the list ;-) there's another bug: https://bugzilla.redhat.com/show_bug.cgi?id=1671603 nfs-ganesha is not mentioned there - maybe adding some relevant log entries or stack traces to the bug report might be good. for me setting 'performance.parallel-readdir off' solved the issue; a developer told me that a fix will find its way into a 5.x update. On Wed, 13
Mar 2019 at 16:34, Valerio Luccio wrote: > > On 3/13/19 11:06 AM, Hu Bert wrote: > > > Hi Valerio, > > > > this is an already known "behaviour" and maybe a bug: > > https://urldefense.proofpoint.com/v2/url?u=https-3A__bugzilla.redhat.com_show-5Fbug.cgi-3Fid-3D1674225&d=DwIFaQ&c=slrrB7dE8n7gBJbeO0g-IQ&r=zZK0dca4HNf-XwnAN9ais1C3ncS0n2x39pF7yr-muHY&m=t4iNWel0wQK63nIiDxlIQnBc8ZjPgVrh-qv-YOSY54o&s=xaZD_8hLEKFghjpbd9rMA99dqNbO7ZMMLVuBJunp-3s&e= > > > > > > Regards, > > Hubert > > Thanks Hubert, > > my nfs-ganesha crashes on a regular basis and I wonder if the two are > related. > > -- > Valerio Luccio (212) 998-8736 > Center for Brain Imaging 4 Washington Place, Room 157 > New York University New York, NY 10003 > > "In an open world, who needs windows or gates ?" > From guillaume.pavese at interactiv-group.com Thu Mar 14 08:03:03 2019 From: guillaume.pavese at interactiv-group.com (Guillaume Pavese) Date: Thu, 14 Mar 2019 17:03:03 +0900 Subject: [Gluster-users] [Gluster-Maintainers] [gluster-packaging] glusterfs-6.0rc1 released In-Reply-To: References: <1536588589.15.1552443884989.JavaMail.jenkins@jenkins-el7.rht.gluster.org> <20190313093835.GN3535@ndevos-x270> Message-ID: That worked, thanks for your help. Guillaume Pavese Ingénieur Système et Réseau Interactiv-Group On Thu, Mar 14, 2019 at 11:44 AM Atin Mukherjee wrote: > If you were on rc0 and upgraded to rc1, then you are hitting BZ 1684029 I > believe. Can you please upgrade all the nodes to rc1, bump up the > op-version to 60000 (if not already done) and then restart glusterd > services to see if the peer rejection goes away?
> > On Thu, Mar 14, 2019 at 7:51 AM Guillaume Pavese < > guillaume.pavese at interactiv-group.com> wrote: > >> putting users at gluster.org in the loop >> >> Guillaume Pavese >> Ingénieur Système et Réseau >> Interactiv-Group >> >> >> On Thu, Mar 14, 2019 at 11:04 AM Guillaume Pavese < >> guillaume.pavese at interactiv-group.com> wrote: >> >>> Hi, I am testing gluster6-rc1 on a replica 3 oVirt cluster (engine full >>> replica 3 and 2 other volume replica + arbiter). They were on Gluster6-rc0. >>> I upgraded one host that was having the "0-epoll: Failed to dispatch >>> handler" bug for one of its volumes, but now all three volumes are down! >>> "gluster peer status" now shows its 2 other peers as connected but >>> rejected. Should I upgrade the other nodes? They are still on Gluster6-rc0. >>> >>> >>> Guillaume Pavese >>> Ingénieur Système et Réseau >>> Interactiv-Group >>> >>> >>> On Wed, Mar 13, 2019 at 6:38 PM Niels de Vos wrote: >>> >>>> On Wed, Mar 13, 2019 at 02:24:44AM +0000, jenkins at build.gluster.org >>>> wrote: >>>> > SRC: >>>> https://build.gluster.org/job/release-new/81/artifact/glusterfs-6.0rc1.tar.gz >>>> > HASH: >>>> https://build.gluster.org/job/release-new/81/artifact/glusterfs-6.0rc1.sha512sum >>>> >>>> Packages from the CentOS Storage SIG will become available shortly in >>>> the testing repository. Please use these packages to enable the repo and >>>> install the glusterfs components in a 2nd step. >>>> >>>> el7: >>>> https://cbs.centos.org/kojifiles/work/tasks/3263/723263/centos-release-gluster6-0.9-1.el7.centos.noarch.rpm >>>> el6: >>>> >>>> https://cbs.centos.org/kojifiles/work/tasks/3265/723265/centos-release-gluster6-0.9-1.el6.centos.noarch.rpm >>>> >>>> Once installed, the testing repo is enabled. Everything should be >>>> available. >>>> >>>> It is highly appreciated to let me know some results of the testing!
>>>> >>>> Thanks, >>>> Niels >>>> _______________________________________________ >>>> packaging mailing list >>>> packaging at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/packaging >>>> >>> _______________________________________________ >> maintainers mailing list >> maintainers at gluster.org >> https://lists.gluster.org/mailman/listinfo/maintainers >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mauro.tridici at cmcc.it Thu Mar 14 10:08:21 2019 From: mauro.tridici at cmcc.it (Mauro Tridici) Date: Thu, 14 Mar 2019 11:08:21 +0100 Subject: [Gluster-users] "rpc_clnt_ping_timer_expired" errors In-Reply-To: References: <96B07283-D8AB-4F06-909D-E00424625528@cmcc.it> <42758A0E-8BE9-497D-BDE3-55D7DC633BC7@cmcc.it> <6A8CF4A4-98EA-48C3-A059-D60D1B2721C7@cmcc.it> <2CF49168-9C1B-4931-8C34-8157262A137A@cmcc.it> <7A151CC9-A0AE-4A45-B450-A4063D216D9E@cmcc.it> <32D53ECE-3F49-4415-A6EE-241B351AC2BA@cmcc.it> <4685A75B-5978-4338-9C9F-4A02FB40B9BC@cmcc.it> <4D2E6B04-C2E8-4FD5-B72D-E7C05931C6F9@cmcc.it> <4E332A56-B318-4BC1-9F44-84AB4392A5DE@cmcc.it> <832FD362-3B14-40D8-8530-604419300476@cmcc.it> <8D926643-1093-48ED-823F-D8F117F702CF@cmcc.it> <9D0BE438-DF11-4D0A-AB85-B44357032F29@cmcc.it> <5F0AC378-8170-4342-8473-9C17159CAC1D@cmcc.it> <7A50E86D-9E27-4EA7-883B-45E9F973991A@cmcc.it> <58B5DB7F-DCB4-4FBF-B1F7-681315B1613A@cmcc.it> <6327B44F-4E7E-46BB-A74C-70F4457DD1EB@cmcc.it> <0167DF4A-8329-4A1A-B439-857DFA6F78BB@cmcc.it> <763F334E-35B8-4729-B8E1-D30866754EEE@cmcc.it> <91DFC9AC-4805-41EB-AC6F-5722E018DD6E@cmcc.it> <8A9752B8-B231-4570-8FF4-8D3D781E7D42@cmcc.it> <47A24804-F975-4EE6-9FA5-67FCDA18D707@cmcc.it> <637FF5D2-D1F4-4686-9D48-646A96F67B96@cmcc.it> <4A87495F-3755-48F7-8507-085869069C64@cmcc.it> <3854BBF6-5B98-4AB3-A67E-E7DE59E69A63@cmcc.it> <313DA021-9173-4899-96B0-831B10B00B61@cmcc.it> <17996AFD-DFC8-40F3-9D09-DB6DDAD5B7D6@cmcc.it> <7074B5D8-955A-4802-A9F3-606C99734417@cmcc.it> <83B84BF9-8334-4230-B6F8-0BC4BFBEFF15@cmcc.it> 
<133B9AE4-9792-4F72-AD91-D36E7B9EC711@cmcc.it> <6611C4B0-57FD-4390-88B5-BD373555D4C5@cmcc.it> Message-ID: Hi Raghavendra, I just changed the client option value to 8. I will check the volume behaviour during the next hours. The GlusterFS version is 3.12.14. I will provide you the logs as soon as the activity load is high. Thank you, Mauro > On 14 Mar 2019, at 04:57, Raghavendra Gowdappa wrote: > > > > On Wed, Mar 13, 2019 at 3:55 PM Mauro Tridici > wrote: > Hi Raghavendra, > > Yes, server.event-thread has been changed from 4 to 8. > > Was the client.event-thread value also changed to 8? If not, I would like to know the results of including this tuning too. Also, if possible, can you get the output of the following command from problematic clients and bricks (during the period when load tends to be high and ping-timer-expiry is seen)? > > # top -bHd 3 > > This will help us to know the CPU utilization of event-threads. > > And I forgot to ask, what version of Glusterfs are you using? > > Over the last few days, I noticed that the error events are still here although they have been considerably reduced. > > So, I used the grep command against the log files in order to give you an overview of the warning, error and critical events that appeared today at 06:xx (may be useful I hope). > I collected the info from the s06 gluster server, but the behaviour is almost the same on the other gluster servers.
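The per-severity greps described here (one pass each for " E ", " W " and " C ") can be collapsed into a single scan that tallies all three at once. A minimal sketch over an abbreviated sample of the log lines quoted in this thread; the file path is a placeholder for the real /var/log/glusterfs directory:

```shell
# One awk pass instead of three greps: count E/W/C entries.
# Sample lines abbreviated from this thread; point the awk command
# at the real log files (e.g. /var/log/glusterfs/*.log).
cat > /tmp/gluster-sample.log <<'EOF'
[2019-03-13 06:14:28.666562] E [rpc-clnt.c:350:saved_frames_unwind] 0-tier2-client-55: forced unwinding frame
[2019-03-13 06:14:31.421576] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (2)
[2019-03-13 02:21:29.126279] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-55: server has not responded in the last 42 seconds
EOF
# Field 3 of a gluster log line is the severity letter (E, W, C, I, ...)
awk '$3 ~ /^[EWC]$/ { n[$3]++ } END { for (s in n) print s, n[s] }' /tmp/gluster-sample.log | sort
```

Adding a `grep "2019-03-13 06:"` in front restricts the count to one hour, matching the filtering shown below.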
> > ERRORS: > CWD: /var/log/glusterfs > COMMAND: grep " E " *.log |grep "2019-03-13 06:" > > (I can see a lot of this kind of message in the same period but I'm notifying you only one record for each type of error) > > glusterd.log:[2019-03-13 06:12:35.982863] E [MSGID: 101042] [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of /var/run/gluster/tier2_quota_list/ > > glustershd.log:[2019-03-13 06:14:28.666562] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f4a71ddcebb] (--> /lib64/libgfr > pc.so.0(saved_frames_unwind+0x1de)[0x7f4a71ba1d9e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f4a71ba1ebe] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup > +0x90)[0x7f4a71ba3640] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f4a71ba4130] ))))) 0-tier2-client-55: forced unwinding frame type(GlusterFS 3.3) op(INODELK(29)) > called at 2019-03-13 06:14:14.858441 (xid=0x17fddb50) > > glustershd.log:[2019-03-13 06:17:48.883825] E [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to 192.168.0.55:49158 failed (Connection timed out); disco > nnecting socket > glustershd.log:[2019-03-13 06:19:58.931798] E [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to 192.168.0.55:49158 failed (Connection timed out); disco > nnecting socket > glustershd.log:[2019-03-13 06:22:08.979829] E [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to 192.168.0.55:49158 failed (Connection timed out); disco > nnecting socket > glustershd.log:[2019-03-13 06:22:36.226847] E [MSGID: 114031] [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote operation failed [Transport endpoint > is not connected] > glustershd.log:[2019-03-13 06:22:36.306669] E [MSGID: 114031] [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote operation failed [Transport endpoint > is not connected] > glustershd.log:[2019-03-13 06:22:36.385257] E [MSGID: 114031] 
[client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote operation failed [Transport endpoint > is not connected] > > WARNINGS: > CWD: /var/log/glusterfs > COMMAND: grep " W " *.log |grep "2019-03-13 06:" > > (I can see a lot of this kind of message in the same period but I'm notifying you only one record for each type of warnings) > > glustershd.log:[2019-03-13 06:14:28.666772] W [MSGID: 114031] [client-rpc-fops.c:1080:client3_3_getxattr_cbk] 0-tier2-client-55: remote operation failed. Path: 0f-f34d-4c25-bbe8-74bde0248d7e> (b6b35d0f-f34d-4c25-bbe8-74bde0248d7e). Key: (null) [Transport endpoint is not connected] > > glustershd.log:[2019-03-13 06:14:31.421576] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (2) > > glustershd.log:[2019-03-13 06:15:31.547417] W [MSGID: 122032] [ec-heald.c:266:ec_shd_index_sweep] 0-tier2-disperse-9: unable to get index-dir on tier2-client-55 [Operation > now in progress] > > quota-mount-tier2.log:[2019-03-13 06:12:36.116277] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-glusterfs: option 'address-family' is deprecated, preferred is 'trans > port.address-family', continuing with correction > quota-mount-tier2.log:[2019-03-13 06:12:36.198430] W [MSGID: 101174] [graph.c:363:_log_if_unknown_option] 0-tier2-readdir-ahead: option 'parallel-readdir' is not recognized > quota-mount-tier2.log:[2019-03-13 06:12:37.945007] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f340892be25] -->/usr/sbin/glusterfs(gluste > rfs_sigwaiter+0xe5) [0x55ef010164b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55ef0101632b] ) 0-: received signum (15), shutting down > > CRITICALS: > CWD: /var/log/glusterfs > COMMAND: grep " C " *.log |grep "2019-03-13 06:" > > no critical errors at 06:xx > only one critical error during the day > > [root at s06 glusterfs]# grep " C " *.log |grep "2019-03-13" > glustershd.log:[2019-03-13 
02:21:29.126279] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-55: server 192.168.0.55:49158 has not responded in the last 42 seconds, disconnecting. > > > Thank you very much for your help. > Regards, > Mauro > >> On 12 Mar 2019, at 05:17, Raghavendra Gowdappa > wrote: >> >> Was the suggestion to increase server.event-thread values tried? If yes, what were the results? >> >> On Mon, Mar 11, 2019 at 2:40 PM Mauro Tridici > wrote: >> Dear All, >> >> do you have any suggestions about the right way to "debug" this issue? >> In attachment, the updated logs of the "s06" gluster server. >> >> I noticed a lot of intermittent warning and error messages. >> >> Thank you in advance, >> Mauro >> >> >> >>> On 4 Mar 2019, at 18:45, Raghavendra Gowdappa > wrote: >>> >>> >>> +Gluster Devel , +Gluster-users >>> >>> I would like to point out another issue. Even if what I suggested prevents disconnects, part of the solution would be only symptomatic treatment and doesn't address the root cause of the problem. In most of the ping-timer-expiry issues, the root cause is the increased load on bricks and the inability of bricks to be responsive under high load. So, the actual solution would be doing either or both of the following: >>> * identify the source of increased load and if possible throttle it. Internal heal processes like self-heal, rebalance, quota heal are known to pump traffic into bricks without much throttling (io-threads _might_ do some throttling, but my understanding is it's not sufficient). >>> * identify the reason for bricks to become unresponsive during load. This may be fixable issues like not enough event-threads to read from the network, or difficult to fix issues like fsync on the backend fs freezing the process, or semi-fixable issues (in code) like lock contention.
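Evidence for the second point (event threads saturating) can come from thread-level CPU usage, which is why `top -bHd 3` output is requested later in the thread. A rough sketch that flags hot poller threads in `top` batch output; the sample lines and the `epoll` thread-name pattern are assumptions for illustration, not captured from this cluster:

```shell
# Flag event-poller threads pinned near 100% CPU in `top -bH` output.
# Capture real data with something like: top -bHd 3 -n 1 > /tmp/top.sample
# The lines below are fabricated samples for illustration only.
cat > /tmp/top.sample <<'EOF'
20355 root      20   0 2000m 100m  10m R 99.7  0.3  12:34.56 glfs_epoll000
20356 root      20   0 2000m 100m  10m S  3.0  0.3   0:12.34 glfs_epoll001
EOF
# Field 9 is %CPU, field 12 the thread name in this top layout.
awk '$12 ~ /epoll/ && $9 + 0 > 90 { print $12, "busy:", $9 "% CPU" }' /tmp/top.sample
```

A poller thread stuck near 100% while disconnect errors appear would support raising the event-thread counts, as suggested below.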
>>> >>> So any genuine effort to fix ping-timer-issues (to be honest most of the times they are not issues related to rpc/network) would involve performance characterization of various subsystems on bricks and clients. Various subsystems can include (but not necessarily limited to), underlying OS/filesystem, glusterfs processes, CPU consumption etc >>> >>> regards, >>> Raghavendra >>> >>> On Mon, Mar 4, 2019 at 9:31 PM Mauro Tridici > wrote: >>> Thank you, let?s try! >>> I will inform you about the effects of the change. >>> >>> Regards, >>> Mauro >>> >>>> On 4 Mar 2019, at 16:55, Raghavendra Gowdappa > wrote: >>>> >>>> >>>> >>>> On Mon, Mar 4, 2019 at 8:54 PM Mauro Tridici > wrote: >>>> Hi Raghavendra, >>>> >>>> thank you for your reply. >>>> Yes, you are right. It is a problem that seems to happen randomly. >>>> At this moment, server.event-threads value is 4. I will try to increase this value to 8. Do you think that it could be a valid value ? >>>> >>>> Yes. We can try with that. You should see at least frequency of ping-timer related disconnects reduce with this value (even if it doesn't eliminate the problem completely). >>>> >>>> >>>> Regards, >>>> Mauro >>>> >>>> >>>>> On 4 Mar 2019, at 15:36, Raghavendra Gowdappa > wrote: >>>>> >>>>> >>>>> >>>>> On Mon, Mar 4, 2019 at 8:01 PM Nithya Balachandran > wrote: >>>>> Hi Mauro, >>>>> >>>>> It looks like some problem on s06. Are all your other nodes ok? Can you send us the gluster logs from this node? >>>>> >>>>> @Raghavendra G , do you have any idea as to how this can be debugged? Maybe running top ? Or debug brick logs? >>>>> >>>>> If we can reproduce the problem, collecting tcpdump on both ends of connection will help. But, one common problem is these bugs are inconsistently reproducible and hence we may not be able to capture tcpdump at correct intervals. Other than that, we can try to collect some evidence that poller threads were busy (waiting on locks). But, not sure what debug data provides that information. 
>>>>> >>>>> From what I know, its difficult to collect evidence for this issue and we could only reason about it. >>>>> >>>>> We can try a workaround though - try increasing server.event-threads and see whether ping-timer expiry issues go away with an optimal value. If that's the case, it kind of provides proof for our hypothesis. >>>>> >>>>> >>>>> >>>>> Regards, >>>>> Nithya >>>>> >>>>> On Mon, 4 Mar 2019 at 15:25, Mauro Tridici > wrote: >>>>> Hi All, >>>>> >>>>> some minutes ago I received this message from NAGIOS server >>>>> >>>>> ***** Nagios ***** >>>>> >>>>> Notification Type: PROBLEM >>>>> >>>>> Service: Brick - /gluster/mnt2/brick >>>>> Host: s06 >>>>> Address: s06-stg >>>>> State: CRITICAL >>>>> >>>>> Date/Time: Mon Mar 4 10:25:33 CET 2019 >>>>> >>>>> Additional Info: >>>>> CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. >>>>> >>>>> I checked the network, RAM and CPUs usage on s06 node and everything seems to be ok. >>>>> No bricks are in error state. In /var/log/messages, I detected again a crash of ?check_vol_utili? that I think it is a module used by NRPE executable (that is the NAGIOS client). >>>>> >>>>> Mar 4 10:15:29 s06 kernel: traps: check_vol_utili[161224] general protection ip:7facffa0a66d sp:7ffe9f4e6fc0 error:0 in libglusterfs.so.0.0.1[7facff9b7000+f7000] >>>>> Mar 4 10:15:29 s06 abrt-hook-ccpp: Process 161224 (python2.7) of user 0 killed by SIGSEGV - dumping core >>>>> Mar 4 10:15:29 s06 abrt-server: Generating core_backtrace >>>>> Mar 4 10:15:29 s06 abrt-server: Error: Unable to open './coredump': No such file or directory >>>>> Mar 4 10:16:01 s06 systemd: Created slice User Slice of root. >>>>> Mar 4 10:16:01 s06 systemd: Starting User Slice of root. >>>>> Mar 4 10:16:01 s06 systemd: Started Session 201010 of user root. >>>>> Mar 4 10:16:01 s06 systemd: Starting Session 201010 of user root. >>>>> Mar 4 10:16:01 s06 systemd: Removed slice User Slice of root. >>>>> Mar 4 10:16:01 s06 systemd: Stopping User Slice of root. 
>>>>> Mar 4 10:16:24 s06 abrt-server: Duplicate: UUID >>>>> Mar 4 10:16:24 s06 abrt-server: DUP_OF_DIR: /var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041 >>>>> Mar 4 10:16:24 s06 abrt-server: Deleting problem directory ccpp-2019-03-04-10:15:29-161224 (dup of ccpp-2018-09-25-12:27:42-13041) >>>>> Mar 4 10:16:24 s06 abrt-server: Generating core_backtrace >>>>> Mar 4 10:16:24 s06 abrt-server: Error: Unable to open './coredump': No such file or directory >>>>> Mar 4 10:16:24 s06 abrt-server: Cannot notify '/var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041' via uReport: Event 'report_uReport' exited with 1 >>>>> Mar 4 10:16:24 s06 abrt-hook-ccpp: Process 161391 (python2.7) of user 0 killed by SIGABRT - dumping core >>>>> Mar 4 10:16:25 s06 abrt-server: Generating core_backtrace >>>>> Mar 4 10:16:25 s06 abrt-server: Error: Unable to open './coredump': No such file or directory >>>>> Mar 4 10:17:01 s06 systemd: Created slice User Slice of root. >>>>> >>>>> Also, I noticed the following errors that I think are very critical: >>>>> >>>>> Mar 4 10:21:12 s06 glustershd[20355]: [2019-03-04 09:21:12.954798] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-55: server 192.168.0.55:49158 has not responded in the last 42 seconds, disconnecting. >>>>> Mar 4 10:22:01 s06 systemd: Created slice User Slice of root. >>>>> Mar 4 10:22:01 s06 systemd: Starting User Slice of root. >>>>> Mar 4 10:22:01 s06 systemd: Started Session 201017 of user root. >>>>> Mar 4 10:22:01 s06 systemd: Starting Session 201017 of user root. >>>>> Mar 4 10:22:01 s06 systemd: Removed slice User Slice of root. >>>>> Mar 4 10:22:01 s06 systemd: Stopping User Slice of root. >>>>> Mar 4 10:22:03 s06 glustershd[20355]: [2019-03-04 09:22:03.964120] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-60: server 192.168.0.54:49165 has not responded in the last 42 seconds, disconnecting. >>>>> Mar 4 10:23:01 s06 systemd: Created slice User Slice of root. 
>>>>> Mar 4 10:23:01 s06 systemd: Starting User Slice of root. >>>>> Mar 4 10:23:01 s06 systemd: Started Session 201018 of user root. >>>>> Mar 4 10:23:01 s06 systemd: Starting Session 201018 of user root. >>>>> Mar 4 10:23:02 s06 systemd: Removed slice User Slice of root. >>>>> Mar 4 10:23:02 s06 systemd: Stopping User Slice of root. >>>>> Mar 4 10:24:01 s06 systemd: Created slice User Slice of root. >>>>> Mar 4 10:24:01 s06 systemd: Starting User Slice of root. >>>>> Mar 4 10:24:01 s06 systemd: Started Session 201019 of user root. >>>>> Mar 4 10:24:01 s06 systemd: Starting Session 201019 of user root. >>>>> Mar 4 10:24:01 s06 systemd: Removed slice User Slice of root. >>>>> Mar 4 10:24:01 s06 systemd: Stopping User Slice of root. >>>>> Mar 4 10:24:03 s06 glustershd[20355]: [2019-03-04 09:24:03.982502] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-16: server 192.168.0.52:49158 has not responded in the last 42 seconds, disconnecting. >>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746109] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-3: server 192.168.0.51:49153 has not responded in the last 42 seconds, disconnecting. >>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746215] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-10: server 192.168.0.52:49156 has not responded in the last 42 seconds, disconnecting. >>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746260] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-21: server 192.168.0.51:49159 has not responded in the last 42 seconds, disconnecting. >>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746296] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-25: server 192.168.0.52:49161 has not responded in the last 42 seconds, disconnecting. 
>>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746413] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-60: server 192.168.0.54:49165 has not responded in the last 42 seconds, disconnecting. >>>>> Mar 4 10:24:07 s06 glustershd[20355]: [2019-03-04 09:24:07.982952] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-45: server 192.168.0.54:49155 has not responded in the last 42 seconds, disconnecting. >>>>> Mar 4 10:24:18 s06 glustershd[20355]: [2019-03-04 09:24:18.990929] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-25: server 192.168.0.52:49161 has not responded in the last 42 seconds, disconnecting. >>>>> Mar 4 10:24:31 s06 glustershd[20355]: [2019-03-04 09:24:31.995781] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-20: server 192.168.0.53:49159 has not responded in the last 42 seconds, disconnecting. >>>>> Mar 4 10:25:01 s06 systemd: Created slice User Slice of root. >>>>> Mar 4 10:25:01 s06 systemd: Starting User Slice of root. >>>>> Mar 4 10:25:01 s06 systemd: Started Session 201020 of user root. >>>>> Mar 4 10:25:01 s06 systemd: Starting Session 201020 of user root. >>>>> Mar 4 10:25:01 s06 systemd: Removed slice User Slice of root. >>>>> Mar 4 10:25:01 s06 systemd: Stopping User Slice of root. >>>>> Mar 4 10:25:57 s06 systemd: Created slice User Slice of root. >>>>> Mar 4 10:25:57 s06 systemd: Starting User Slice of root. >>>>> Mar 4 10:25:57 s06 systemd-logind: New session 201021 of user root. >>>>> Mar 4 10:25:57 s06 systemd: Started Session 201021 of user root. >>>>> Mar 4 10:25:57 s06 systemd: Starting Session 201021 of user root. >>>>> Mar 4 10:26:01 s06 systemd: Started Session 201022 of user root. >>>>> Mar 4 10:26:01 s06 systemd: Starting Session 201022 of user root. >>>>> Mar 4 10:26:21 s06 nrpe[162388]: Error: Could not complete SSL handshake with 192.168.1.56 : 5 >>>>> Mar 4 10:27:01 s06 systemd: Started Session 201023 of user root. 
>>>>> Mar 4 10:27:01 s06 systemd: Starting Session 201023 of user root. >>>>> Mar 4 10:28:01 s06 systemd: Started Session 201024 of user root. >>>>> Mar 4 10:28:01 s06 systemd: Starting Session 201024 of user root. >>>>> Mar 4 10:29:01 s06 systemd: Started Session 201025 of user root. >>>>> Mar 4 10:29:01 s06 systemd: Starting Session 201025 of user root. >>>>> >>>>> But, unfortunately, I don?t understand why it is happening. >>>>> Now, NAGIOS server shows that s06 status is ok: >>>>> >>>>> ***** Nagios ***** >>>>> >>>>> Notification Type: RECOVERY >>>>> >>>>> Service: Brick - /gluster/mnt2/brick >>>>> Host: s06 >>>>> Address: s06-stg >>>>> State: OK >>>>> >>>>> Date/Time: Mon Mar 4 10:35:23 CET 2019 >>>>> >>>>> Additional Info: >>>>> OK: Brick /gluster/mnt2/brick is up >>>>> >>>>> Nothing is changed from RAM, CPUs, and NETWORK point of view. >>>>> /var/log/message file has been updated: >>>>> >>>>> Mar 4 10:32:01 s06 systemd: Starting Session 201029 of user root. >>>>> Mar 4 10:32:30 s06 glustershd[20355]: [2019-03-04 09:32:30.069082] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-10: server 192.168.0.52:49156 has not responded in the last 42 seconds, disconnecting. >>>>> Mar 4 10:32:55 s06 glustershd[20355]: [2019-03-04 09:32:55.074689] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-66: server 192.168.0.54:49167 has not responded in the last 42 seconds, disconnecting. >>>>> Mar 4 10:33:01 s06 systemd: Started Session 201030 of user root. >>>>> Mar 4 10:33:01 s06 systemd: Starting Session 201030 of user root. >>>>> Mar 4 10:34:01 s06 systemd: Started Session 201031 of user root. >>>>> Mar 4 10:34:01 s06 systemd: Starting Session 201031 of user root. >>>>> Mar 4 10:35:01 s06 nrpe[162562]: Could not read request from client 192.168.1.56, bailing out... >>>>> Mar 4 10:35:01 s06 nrpe[162562]: INFO: SSL Socket Shutdown. >>>>> Mar 4 10:35:01 s06 systemd: Started Session 201032 of user root. 
>>>>> Mar 4 10:35:01 s06 systemd: Starting Session 201032 of user root. >>>>> >>>>> Could you please help me to understand what is happening? >>>>> Thank you in advance. >>>>> >>>>> Regards, >>>>> Mauro >>>>> >>>>> >>>>>> On 1 Mar 2019, at 12:17, Mauro Tridici > wrote: >>>>>> >>>>>> >>>>>> Thank you, Milind. >>>>>> I executed the instructions you suggested: >>>>>> >>>>>> - grep "blocked for" /var/log/messages on s06 returns no output (no "blocked" word is detected in the messages file); >>>>>> - in the /var/log/messages file I can see this kind of error repeated many times: >>>>>> >>>>>> Mar 1 08:43:01 s06 systemd: Starting Session 196071 of user root. >>>>>> Mar 1 08:43:01 s06 systemd: Removed slice User Slice of root. >>>>>> Mar 1 08:43:01 s06 systemd: Stopping User Slice of root. >>>>>> Mar 1 08:43:02 s06 kernel: traps: check_vol_utili[57091] general protection ip:7f88e76ee66d sp:7ffe5a5bcc30 error:0 in libglusterfs.so.0.0.1[7f88e769b000+f7000] >>>>>> Mar 1 08:43:02 s06 abrt-hook-ccpp: Process 57091 (python2.7) of user 0 killed by SIGSEGV - dumping core >>>>>> Mar 1 08:43:02 s06 abrt-server: Generating core_backtrace >>>>>> Mar 1 08:43:02 s06 abrt-server: Error: Unable to open './coredump': No such file or directory >>>>>> Mar 1 08:43:58 s06 abrt-server: Duplicate: UUID >>>>>> Mar 1 08:43:58 s06 abrt-server: DUP_OF_DIR: /var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041 >>>>>> Mar 1 08:43:58 s06 abrt-server: Deleting problem directory ccpp-2019-03-01-08:43:02-57091 (dup of ccpp-2018-09-25-12:27:42-13041) >>>>>> Mar 1 08:43:58 s06 dbus[1872]: [system] Activating service name='org.freedesktop.problems' (using servicehelper) >>>>>> Mar 1 08:43:58 s06 dbus[1872]: [system] Successfully activated service 'org.freedesktop.problems' >>>>>> Mar 1 08:43:58 s06 abrt-server: Generating core_backtrace >>>>>> Mar 1 08:43:58 s06 abrt-server: Error: Unable to open './coredump': No such file or directory >>>>>> Mar 1 08:43:58 s06 abrt-server: Cannot notify 
'/var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041' via uReport: Event 'report_uReport' exited with 1 >>>>>> Mar 1 08:44:01 s06 systemd: Created slice User Slice of root. >>>>>> Mar 1 08:44:01 s06 systemd: Starting User Slice of root. >>>>>> Mar 1 08:44:01 s06 systemd: Started Session 196072 of user root. >>>>>> Mar 1 08:44:01 s06 systemd: Starting Session 196072 of user root. >>>>>> Mar 1 08:44:01 s06 systemd: Removed slice User Slice of root. >>>>>> >>>>>> - in the /var/log/messages file I can also see 4 errors related to other cluster servers: >>>>>> >>>>>> Mar 1 11:05:01 s06 systemd: Starting User Slice of root. >>>>>> Mar 1 11:05:01 s06 systemd: Started Session 196230 of user root. >>>>>> Mar 1 11:05:01 s06 systemd: Starting Session 196230 of user root. >>>>>> Mar 1 11:05:01 s06 systemd: Removed slice User Slice of root. >>>>>> Mar 1 11:05:01 s06 systemd: Stopping User Slice of root. >>>>>> Mar 1 11:05:59 s06 glustershd[70117]: [2019-03-01 10:05:59.347094] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-33: server 192.168.0.51:49163 has not responded in the last 42 seconds, disconnecting. >>>>>> Mar 1 11:06:01 s06 systemd: Created slice User Slice of root. >>>>>> Mar 1 11:06:01 s06 systemd: Starting User Slice of root. >>>>>> Mar 1 11:06:01 s06 systemd: Started Session 196231 of user root. >>>>>> Mar 1 11:06:01 s06 systemd: Starting Session 196231 of user root. >>>>>> Mar 1 11:06:01 s06 systemd: Removed slice User Slice of root. >>>>>> Mar 1 11:06:01 s06 systemd: Stopping User Slice of root. >>>>>> Mar 1 11:06:12 s06 glustershd[70117]: [2019-03-01 10:06:12.351319] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-1: server 192.168.0.52:49153 has not responded in the last 42 seconds, disconnecting. >>>>>> Mar 1 11:06:38 s06 glustershd[70117]: [2019-03-01 10:06:38.356920] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-7: server 192.168.0.52:49155 has not responded in the last 42 seconds, disconnecting. 
>>>>>> Mar 1 11:07:01 s06 systemd: Created slice User Slice of root. >>>>>> Mar 1 11:07:01 s06 systemd: Starting User Slice of root. >>>>>> Mar 1 11:07:01 s06 systemd: Started Session 196232 of user root. >>>>>> Mar 1 11:07:01 s06 systemd: Starting Session 196232 of user root. >>>>>> Mar 1 11:07:01 s06 systemd: Removed slice User Slice of root. >>>>>> Mar 1 11:07:01 s06 systemd: Stopping User Slice of root. >>>>>> Mar 1 11:07:36 s06 glustershd[70117]: [2019-03-01 10:07:36.366259] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-0: server 192.168.0.51:49152 has not responded in the last 42 seconds, disconnecting. >>>>>> Mar 1 11:08:01 s06 systemd: Created slice User Slice of root. >>>>>> >>>>>> No "blocked" word appears in the /var/log/messages files on the other cluster servers. >>>>>> In attachment, the /var/log/messages file from the s06 server. >>>>>> >>>>>> Thank you in advance, >>>>>> Mauro >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> On 1 Mar 2019, at 11:47, Milind Changire > wrote: >>>>>>> >>>>>>> The traces of very high disk activity on the servers are often found in /var/log/messages. >>>>>>> You might want to grep for "blocked for" in /var/log/messages on s06 and correlate the timestamps to confirm the unresponsiveness as reported in the gluster client logs. >>>>>>> In cases of high disk activity, although the operating system continues to respond to ICMP pings, the processes writing to disks often get blocked due to a large flush to the disk, which could span beyond 42 seconds and hence result in ping-timer-expiry logs. >>>>>>> >>>>>>> As a side note: >>>>>>> If you indeed find gluster processes being blocked in /var/log/messages, you might want to tweak the sysctl tunables called vm.dirty_background_ratio or vm.dirty_background_bytes to a smaller value than the existing one. Please read up more on those tunables before touching the settings. 
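As a concrete illustration of the tunables mentioned above, the current writeback thresholds can be inspected read-only before deciding on any change. The value 5 below is purely an illustrative assumption, not a recommendation; consult the kernel sysctl documentation first.

```shell
# Read-only inspection of the current writeback thresholds (percent of RAM).
cat /proc/sys/vm/dirty_background_ratio
cat /proc/sys/vm/dirty_ratio

# Illustrative only (requires root; 5 is an assumed example value):
#   sysctl -w vm.dirty_background_ratio=5
# To persist across reboots, a drop-in such as /etc/sysctl.d/90-writeback.conf
# (hypothetical file name) could contain:
#   vm.dirty_background_ratio = 5
```

A smaller background ratio makes the kernel start flushing dirty pages earlier, so individual flushes are smaller and less likely to block processes for tens of seconds.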
>>>>>>> >>>>>>> On Fri, Mar 1, 2019 at 4:06 PM Mauro Tridici > wrote: >>>>>>> >>>>>>> Hi all, >>>>>>> >>>>>>> in attachment, the client log captured after changing the network.ping-timeout option. >>>>>>> I noticed this error involving server 192.168.0.56 (s06): >>>>>>> >>>>>>> [2019-03-01 09:23:36.077287] I [rpc-clnt.c:1962:rpc_clnt_reconfig] 0-tier2-client-71: changing ping timeout to 42 (from 0) >>>>>>> [2019-03-01 09:23:36.078213] I [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing >>>>>>> [2019-03-01 09:23:36.078432] I [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing >>>>>>> [2019-03-01 09:23:36.092357] I [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing >>>>>>> [2019-03-01 09:23:36.094146] I [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing >>>>>>> [2019-03-01 10:06:24.708082] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-50: server 192.168.0.56:49156 has not responded in the last 42 seconds, disconnecting. >>>>>>> >>>>>>> I don't know why it happens; the s06 server seems to be reachable. >>>>>>> >>>>>>> [athena_login2][/users/home/sysm02/]> telnet 192.168.0.56 49156 >>>>>>> Trying 192.168.0.56... >>>>>>> Connected to 192.168.0.56. >>>>>>> Escape character is '^]'. >>>>>>> ^CConnection closed by foreign host. >>>>>>> [athena_login2][/users/home/sysm02/]> ping 192.168.0.56 >>>>>>> PING 192.168.0.56 (192.168.0.56) 56(84) bytes of data. >>>>>>> 64 bytes from 192.168.0.56 : icmp_seq=1 ttl=64 time=0.116 ms >>>>>>> 64 bytes from 192.168.0.56 : icmp_seq=2 ttl=64 time=0.101 ms >>>>>>> >>>>>>> --- 192.168.0.56 ping statistics --- >>>>>>> 2 packets transmitted, 2 received, 0% packet loss, time 1528ms >>>>>>> rtt min/avg/max/mdev = 0.101/0.108/0.116/0.012 ms >>>>>>> >>>>>>> [athena_login2][/users/home/sysm02/]> telnet 192.168.0.56 49156 >>>>>>> Trying 192.168.0.56... >>>>>>> Connected to 192.168.0.56. 
>>>>>>> Escape character is '^]'. >>>>>>> >>>>>>> Thank you for your help, >>>>>>> Mauro >>>>>>> >>>>>>> >>>>>>> >>>>>>>> On 1 Mar 2019, at 10:29, Mauro Tridici > wrote: >>>>>>>> >>>>>>>> Hi all, >>>>>>>> >>>>>>>> thank you for the explanation. >>>>>>>> I just changed the network.ping-timeout option to the default value (network.ping-timeout=42). >>>>>>>> >>>>>>>> I will check the logs to see if the errors appear again. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Mauro >>>>>>>> >>>>>>>>> On 1 Mar 2019, at 04:43, Milind Changire > wrote: >>>>>>>>> >>>>>>>>> network.ping-timeout should not be set to zero for non-glusterd clients. >>>>>>>>> glusterd is a special case for which ping-timeout is set to zero via /etc/glusterfs/glusterd.vol >>>>>>>>> >>>>>>>>> Setting network.ping-timeout to zero disables arming of the ping timer for connections. This disables testing the connection for responsiveness and hence avoids proactive fail-over. >>>>>>>>> >>>>>>>>> Please reset network.ping-timeout to a non-zero positive value, e.g. 42 >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Feb 28, 2019 at 5:07 PM Nithya Balachandran > wrote: >>>>>>>>> Adding Raghavendra and Milind to comment on this. >>>>>>>>> >>>>>>>>> What is the effect of setting network.ping-timeout to 0 and should it be set back to 42? >>>>>>>>> Regards, >>>>>>>>> Nithya >>>>>>>>> >>>>>>>>> On Thu, 28 Feb 2019 at 16:01, Mauro Tridici > wrote: >>>>>>>>> Hi Nithya, >>>>>>>>> >>>>>>>>> sorry for the late reply. >>>>>>>>> network.ping-timeout has been set to 0 in order to try to solve some timeout problems, but it didn't help. >>>>>>>>> I can set it to the default value. >>>>>>>>> >>>>>>>>> Can I proceed with the change? >>>>>>>>> >>>>>>>>> Thank you, >>>>>>>>> Mauro >>>>>>>>> >>>>>>>>> >>>>>>>>>> On 28 Feb 2019, at 04:41, Nithya Balachandran > wrote: >>>>>>>>>> >>>>>>>>>> Hi Mauro, >>>>>>>>>> >>>>>>>>>> Is network.ping-timeout still set to 0? The default value is 42. Is there a particular reason why this was changed? 
>>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Nithya >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, 27 Feb 2019 at 21:32, Mauro Tridici > wrote: >>>>>>>>>> >>>>>>>>>> Hi Xavi, >>>>>>>>>> >>>>>>>>>> thank you for the detailed explanation and suggestions. >>>>>>>>>> Yes, the transport.listen-backlog option is still set to 1024. >>>>>>>>>> >>>>>>>>>> I will check the network and connectivity status using "ping" and "telnet" as soon as the errors come back again. >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Mauro >>>>>>>>>> >>>>>>>>>>> On 27 Feb 2019, at 16:42, Xavi Hernandez > wrote: >>>>>>>>>>> >>>>>>>>>>> Hi Mauro, >>>>>>>>>>> >>>>>>>>>>> those errors say that the mount point is not connected to some of the bricks while executing operations. I see references to the 3rd and 6th bricks of several disperse sets, which seem to map to server s06. For some reason, gluster is having trouble connecting from the client machine to that particular server. At the end of the log I see that after a long time a reconnect is done to both of them. However, a little after, other bricks from s05 get disconnected and a reconnect times out. >>>>>>>>>>> >>>>>>>>>>> That's really odd. It seems as if the server/communication is cut to s06 for some time, then restored, and then the same happens to the next server. >>>>>>>>>>> >>>>>>>>>>> If the servers are really online and it's only a communication issue, it explains why server memory and network usage have increased: if the problem only exists between the client and servers, any write made by the client will automatically mark the file as damaged, since some of the servers have not been updated. Since self-heal runs from the server nodes, they will probably be correctly connected to all bricks, which allows them to heal the just-damaged file, which increases memory and network usage. >>>>>>>>>>> >>>>>>>>>>> I guess you still have transport.listen-backlog set to 1024, right? 
>>>>>>>>>>> >>>>>>>>>>> Just to try to identify if the problem really comes from the network, can you check if you lose some pings from the client to all of the servers while you are seeing those errors in the log file? >>>>>>>>>>> >>>>>>>>>>> You can also check if, during those errors, you can telnet to the port of the brick from the client. >>>>>>>>>>> >>>>>>>>>>> Xavi >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Feb 26, 2019 at 10:17 AM Mauro Tridici > wrote: >>>>>>>>>>> Hi Nithya, >>>>>>>>>>> >>>>>>>>>>> the "df -h" operation is not slow at the moment, but no users are using the volume; RAM and NETWORK usage is ok on the client node. >>>>>>>>>>> >>>>>>>>>>> I was worried about these kinds of warnings/errors: >>>>>>>>>>> >>>>>>>>>>> [2019-02-25 10:59:00.664323] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-6: Executing operation with some subvolumes unavailable (20) >>>>>>>>>>> >>>>>>>>>>> [2019-02-26 03:11:35.212603] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) 0-tier2-client-50: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2019-02-26 03:10:56.549903 (xid=0x106f1c5) >>>>>>>>>>> >>>>>>>>>>> [2019-02-26 03:13:03.313831] E [socket.c:2376:socket_connect_finish] 0-tier2-client-50: connection to 192.168.0.56:49156 failed (Timeout della connessione); disconnecting socket >>>>>>>>>>> >>>>>>>>>>> It seems that some subvolumes are not available and the 192.168.0.56 server (s06) is not reachable. >>>>>>>>>>> But the gluster servers are up&running and the bricks are ok. >>>>>>>>>>> >>>>>>>>>>> In attachment, the updated tier2.log file. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thank you. 
>>>>>>>>>>> Regards, >>>>>>>>>>> Mauro >>>>>>>>>>> >>>>>>>>>>>> On 26 Feb 2019, at 04:03, Nithya Balachandran > wrote: >>>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> I see a lot of EC messages in the log but they don't seem very serious. Xavi, can you take a look? >>>>>>>>>>>> >>>>>>>>>>>> The only errors I see are: >>>>>>>>>>>> [2019-02-25 10:58:45.519871] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) 0-tier2-client-50: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2019-02-25 10:57:47.429969 (xid=0xd26fe7) >>>>>>>>>>>> [2019-02-25 10:58:51.461493] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) 0-tier2-client-41: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2019-02-25 10:57:47.499174 (xid=0xf47d6a) >>>>>>>>>>>> [2019-02-25 11:07:57.152874] E [socket.c:2376:socket_connect_finish] 0-tier2-client-70: connection to 192.168.0.55:49163 failed (Timeout della connessione); disconnecting socket >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Is the df -h operation still slow? If yes, can you take a tcpdump of the client while running df -h and send that across? 
>>>>>>>>>>>> >>>>>>>>>>>> Regards, >>>>>>>>>>>> Nithya >>>>>>>>>>>> >>>>>>>>>>>> On Mon, 25 Feb 2019 at 17:27, Mauro Tridici > wrote: >>>>>>>>>>>> >>>>>>>>>>>> Sorry, a few minutes after my last mail message, I noticed that the "df -h" command hung for a while before returning the prompt. >>>>>>>>>>>> Yesterday, everything was ok in the gluster client log, but, today, I see a lot of errors (please, take a look at the attached file). >>>>>>>>>>>> >>>>>>>>>>>> On the client node, I detected significant RAM and NETWORK usage. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Do you think that the errors have been caused by the client resource usage? >>>>>>>>>>>> >>>>>>>>>>>> Thank you in advance, >>>>>>>>>>>> Mauro >>>>>>>>>>>> >>>>> >>> >>> >>> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rgowdapp at redhat.com Thu Mar 14 12:31:44 2019 From: rgowdapp at redhat.com (Raghavendra Gowdappa) Date: Thu, 14 Mar 2019 18:01:44 +0530 Subject: [Gluster-users] "rpc_clnt_ping_timer_expired" errors In-Reply-To: References: <96B07283-D8AB-4F06-909D-E00424625528@cmcc.it> <42758A0E-8BE9-497D-BDE3-55D7DC633BC7@cmcc.it> <6A8CF4A4-98EA-48C3-A059-D60D1B2721C7@cmcc.it> <2CF49168-9C1B-4931-8C34-8157262A137A@cmcc.it> <7A151CC9-A0AE-4A45-B450-A4063D216D9E@cmcc.it> <32D53ECE-3F49-4415-A6EE-241B351AC2BA@cmcc.it> <4685A75B-5978-4338-9C9F-4A02FB40B9BC@cmcc.it> <4D2E6B04-C2E8-4FD5-B72D-E7C05931C6F9@cmcc.it> <4E332A56-B318-4BC1-9F44-84AB4392A5DE@cmcc.it> <832FD362-3B14-40D8-8530-604419300476@cmcc.it> <8D926643-1093-48ED-823F-D8F117F702CF@cmcc.it> <9D0BE438-DF11-4D0A-AB85-B44357032F29@cmcc.it> <5F0AC378-8170-4342-8473-9C17159CAC1D@cmcc.it> <7A50E86D-9E27-4EA7-883B-45E9F973991A@cmcc.it> <58B5DB7F-DCB4-4FBF-B1F7-681315B1613A@cmcc.it> <6327B44F-4E7E-46BB-A74C-70F4457DD1EB@cmcc.it> <0167DF4A-8329-4A1A-B439-857DFA6F78BB@cmcc.it> <763F334E-35B8-4729-B8E1-D30866754EEE@cmcc.it> <91DFC9AC-4805-41EB-AC6F-5722E018DD6E@cmcc.it> 
<8A9752B8-B231-4570-8FF4-8D3D781E7D42@cmcc.it> <47A24804-F975-4EE6-9FA5-67FCDA18D707@cmcc.it> <637FF5D2-D1F4-4686-9D48-646A96F67B96@cmcc.it> <4A87495F-3755-48F7-8507-085869069C64@cmcc.it> <3854BBF6-5B98-4AB3-A67E-E7DE59E69A63@cmcc.it> <313DA021-9173-4899-96B0-831B10B00B61@cmcc.it> <17996AFD-DFC8-40F3-9D09-DB6DDAD5B7D6@cmcc.it> <7074B5D8-955A-4802-A9F3-606C99734417@cmcc.it> <83B84BF9-8334-4230-B6F8-0BC4BFBEFF15@cmcc.it> <133B9AE4-9792-4F72-AD91-D36E7B9EC711@cmcc.it> <6611C4B0-57FD-4390-88B5-BD373555D4C5@cmcc.it> Message-ID: Thanks Mauro. On Thu, Mar 14, 2019 at 3:38 PM Mauro Tridici wrote: > Hi Raghavendra, > > I just changed the client option value to 8. > I will check the volume behaviour during the next hours. > > The GlusterFS version is 3.12.14. > > I will provide you the logs as soon as the activity load is high. > Thank you, > Mauro > > On 14 Mar 2019, at 04:57, Raghavendra Gowdappa > wrote: > > > > On Wed, Mar 13, 2019 at 3:55 PM Mauro Tridici > wrote: > >> Hi Raghavendra, >> >> Yes, server.event-thread has been changed from 4 to 8. >> > > Was the client.event-thread value also changed to 8? If not, I would like to > know the results of including this tuning too. Also, if possible, can you > get the output of the following command from problematic clients and bricks > (during the duration when load tends to be high and ping-timer-expiry is > seen)? > > # top -bHd 3 > > This will help us to know the CPU utilization of the event-threads. > > And I forgot to ask, what version of Glusterfs are you using? > > During the last few days, I noticed that the error events are still there, although >> they have been considerably reduced. >> >> So, I used the grep command against the log files in order to provide you an >> overview of the warning, error, and critical events that appeared today >> at 06:xx (I hope it may be useful). >> I collected the info from the s06 gluster server, but the behaviour is >> almost the same on the other gluster servers. 
>> >> *ERRORS: * >> *CWD: /var/log/glusterfs * >> *COMMAND: grep " E " *.log |grep "2019-03-13 06:"* >> >> (I can see many messages of this kind in the same period, but I'm >> reporting only one record for each type of error) >> >> glusterd.log:[2019-03-13 06:12:35.982863] E [MSGID: 101042] >> [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of >> /var/run/gluster/tier2_quota_list/ >> >> glustershd.log:[2019-03-13 06:14:28.666562] E >> [rpc-clnt.c:350:saved_frames_unwind] (--> >> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f4a71ddcebb] (--> >> /lib64/libgfr >> pc.so.0(saved_frames_unwind+0x1de)[0x7f4a71ba1d9e] (--> >> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f4a71ba1ebe] (--> >> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup >> +0x90)[0x7f4a71ba3640] (--> >> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f4a71ba4130] ))))) >> 0-tier2-client-55: forced unwinding frame type(GlusterFS 3.3) >> op(INODELK(29)) >> called at 2019-03-13 06:14:14.858441 (xid=0x17fddb50) >> >> glustershd.log:[2019-03-13 06:17:48.883825] E >> [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to >> 192.168.0.55:49158 failed (Connection timed out); disco >> nnecting socket >> glustershd.log:[2019-03-13 06:19:58.931798] E >> [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to >> 192.168.0.55:49158 failed (Connection timed out); disco >> nnecting socket >> glustershd.log:[2019-03-13 06:22:08.979829] E >> [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to >> 192.168.0.55:49158 failed (Connection timed out); disco >> nnecting socket >> glustershd.log:[2019-03-13 06:22:36.226847] E [MSGID: 114031] >> [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote >> operation failed [Transport endpoint >> is not connected] >> glustershd.log:[2019-03-13 06:22:36.306669] E [MSGID: 114031] >> [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote >> operation failed [Transport endpoint 
>> is not connected] >> glustershd.log:[2019-03-13 06:22:36.385257] E [MSGID: 114031] >> [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote >> operation failed [Transport endpoint >> is not connected] >> >> *WARNINGS:* >> *CWD: /var/log/glusterfs * >> *COMMAND: grep " W " *.log |grep "2019-03-13 06:"* >> >> (I can see many messages of this kind in the same period, but I'm >> reporting only one record for each type of warning) >> >> glustershd.log:[2019-03-13 06:14:28.666772] W [MSGID: 114031] >> [client-rpc-fops.c:1080:client3_3_getxattr_cbk] 0-tier2-client-55: remote >> operation failed. Path: > 0f-f34d-4c25-bbe8-74bde0248d7e> (b6b35d0f-f34d-4c25-bbe8-74bde0248d7e). >> Key: (null) [Transport endpoint is not connected] >> >> glustershd.log:[2019-03-13 06:14:31.421576] W [MSGID: 122035] >> [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation >> with some subvolumes unavailable (2) >> >> glustershd.log:[2019-03-13 06:15:31.547417] W [MSGID: 122032] >> [ec-heald.c:266:ec_shd_index_sweep] 0-tier2-disperse-9: unable to get >> index-dir on tier2-client-55 [Operation >> now in progress] >> >> quota-mount-tier2.log:[2019-03-13 06:12:36.116277] W [MSGID: 101002] >> [options.c:995:xl_opt_validate] 0-glusterfs: option 'address-family' is >> deprecated, preferred is 'trans >> port.address-family', continuing with correction >> quota-mount-tier2.log:[2019-03-13 06:12:36.198430] W [MSGID: 101174] >> [graph.c:363:_log_if_unknown_option] 0-tier2-readdir-ahead: option >> 'parallel-readdir' is not recognized >> quota-mount-tier2.log:[2019-03-13 06:12:37.945007] W >> [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) >> [0x7f340892be25] -->/usr/sbin/glusterfs(gluste >> rfs_sigwaiter+0xe5) [0x55ef010164b5] >> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55ef0101632b] ) 0-: >> received signum (15), shutting down >> >> *CRITICALS:* >> *CWD: /var/log/glusterfs * >> *COMMAND: grep " C " *.log |grep "2019-03-13 
06:"* >> >> no critical errors at 06:xx >> only one critical error during the day >> >> *[root at s06 glusterfs]# grep " C " *.log |grep "2019-03-13"* >> glustershd.log:[2019-03-13 02:21:29.126279] C >> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-55: server >> 192.168.0.55:49158 has not responded in the last 42 seconds, >> disconnecting. >> >> >> Thank you very much for your help. >> Regards, >> Mauro >> >> On 12 Mar 2019, at 05:17, Raghavendra Gowdappa >> wrote: >> >> Was the suggestion to increase server.event-thread values tried? If yes, >> what were the results? >> >> On Mon, Mar 11, 2019 at 2:40 PM Mauro Tridici >> wrote: >> >>> Dear All, >>> >>> do you have any suggestions about the right way to "debug" this issue? >>> In attachment, the updated logs of the "s06" gluster server. >>> >>> I noticed a lot of intermittent warning and error messages. >>> >>> Thank you in advance, >>> Mauro >>> >>> >>> >>> On 4 Mar 2019, at 18:45, Raghavendra Gowdappa >>> wrote: >>> >>> >>> +Gluster Devel , +Gluster-users >>> >>> >>> I would like to point out another issue. Even if what I suggested >>> prevents disconnects, the solution would be only symptomatic >>> treatment and wouldn't address the root cause of the problem. In most of the >>> ping-timer-expiry issues, the root cause is the increased load on bricks >>> and the inability of bricks to be responsive under high load. So, the >>> actual solution would be doing either or both of the following: >>> * identify the source of increased load and, if possible, throttle it. >>> Internal heal processes like self-heal, rebalance, quota heal are known to >>> pump traffic into bricks without much throttling (io-threads _might_ do >>> some throttling, but my understanding is it's not sufficient). >>> * identify the reason for bricks to become unresponsive during load. 
>>> These may be fixable issues like not enough event-threads to read from >>> the network, or difficult-to-fix issues like fsync on the backend fs freezing the >>> process, or semi-fixable issues (in code) like lock contention. >>> >>> So any genuine effort to fix ping-timer-issues (to be honest, most of the >>> time they are not issues related to rpc/network) would involve performance >>> characterization of various subsystems on bricks and clients. Various >>> subsystems can include (but are not necessarily limited to) the underlying >>> OS/filesystem, glusterfs processes, CPU consumption, etc. >>> >>> regards, >>> Raghavendra >>> >>> On Mon, Mar 4, 2019 at 9:31 PM Mauro Tridici >>> wrote: >>> >>>> Thank you, let's try! >>>> I will inform you about the effects of the change. >>>> >>>> Regards, >>>> Mauro >>>> >>>> On 4 Mar 2019, at 16:55, Raghavendra Gowdappa >>>> wrote: >>>> >>>> >>>> >>>> On Mon, Mar 4, 2019 at 8:54 PM Mauro Tridici >>>> wrote: >>>> >>>>> Hi Raghavendra, >>>>> >>>>> thank you for your reply. >>>>> Yes, you are right. It is a problem that seems to happen randomly. >>>>> At this moment, the server.event-threads value is 4. I will try to >>>>> increase this value to 8. Do you think that it could be a valid value? >>>> >>>> Yes. We can try with that. You should at least see the frequency of >>>> ping-timer-related disconnects reduce with this value (even if it doesn't >>>> eliminate the problem completely). >>>> >>>>> Regards, >>>>> Mauro >>>>> >>>>> >>>>> On 4 Mar 2019, at 15:36, Raghavendra Gowdappa >>>>> wrote: >>>>> >>>>> >>>>> >>>>> On Mon, Mar 4, 2019 at 8:01 PM Nithya Balachandran < >>>>> nbalacha at redhat.com> wrote: >>>>> >>>>>> Hi Mauro, >>>>>> >>>>>> It looks like there is some problem on s06. Are all your other nodes ok? Can >>>>>> you send us the gluster logs from this node? >>>>>> >>>>>> @Raghavendra G , do you have any idea as >>>>>> to how this can be debugged? Maybe running top? Or debug brick logs? 
>>>>>> >>>>> If we can reproduce the problem, collecting tcpdump on both ends of the >>>>> connection will help. But one common problem is that these bugs are >>>>> inconsistently reproducible and hence we may not be able to capture tcpdump >>>>> at the correct intervals. Other than that, we can try to collect some evidence >>>>> that poller threads were busy (waiting on locks). But I'm not sure which debug >>>>> data provides that information. >>>>> >>>>> From what I know, it's difficult to collect evidence for this issue and >>>>> we could only reason about it. >>>>> >>>>> We can try a workaround though - try increasing server.event-threads >>>>> and see whether ping-timer expiry issues go away with an optimal value. If >>>>> that's the case, it kind of provides proof for our hypothesis. >>>>> >>>>> >>>>>> >>>>>> Regards, >>>>>> Nithya >>>>>> >>>>>> On Mon, 4 Mar 2019 at 15:25, Mauro Tridici >>>>>> wrote: >>>>>> >>>>>>> Hi All, >>>>>>> >>>>>>> a few minutes ago I received this message from the NAGIOS server >>>>>>> >>>>>>> >>>>>>> ***** Nagios ***** >>>>>>> Notification Type: PROBLEM >>>>>>> Service: Brick - /gluster/mnt2/brick >>>>>>> Host: s06 >>>>>>> Address: s06-stg >>>>>>> State: CRITICAL >>>>>>> Date/Time: Mon Mar 4 10:25:33 CET 2019 >>>>>>> Additional Info: >>>>>>> CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. >>>>>>> >>>>>>> I checked the network, RAM, and CPU usage on the s06 node and everything >>>>>>> seems to be ok. >>>>>>> No bricks are in an error state. In /var/log/messages, I detected again >>>>>>> a crash of "check_vol_utili", which I think is a module used by the NRPE >>>>>>> executable (that is, the NAGIOS client). 
>>>>>>> >>>>>>> Mar 4 10:15:29 s06 kernel: traps: check_vol_utili[161224] general >>>>>>> protection ip:7facffa0a66d sp:7ffe9f4e6fc0 error:0 in >>>>>>> libglusterfs.so.0.0.1[7facff9b7000+f7000] >>>>>>> Mar 4 10:15:29 s06 abrt-hook-ccpp: Process 161224 (python2.7) of >>>>>>> user 0 killed by SIGSEGV - dumping core >>>>>>> Mar 4 10:15:29 s06 abrt-server: Generating core_backtrace >>>>>>> Mar 4 10:15:29 s06 abrt-server: Error: Unable to open './coredump': >>>>>>> No such file or directory >>>>>>> Mar 4 10:16:01 s06 systemd: Created slice User Slice of root. >>>>>>> Mar 4 10:16:01 s06 systemd: Starting User Slice of root. >>>>>>> Mar 4 10:16:01 s06 systemd: Started Session 201010 of user root. >>>>>>> Mar 4 10:16:01 s06 systemd: Starting Session 201010 of user root. >>>>>>> Mar 4 10:16:01 s06 systemd: Removed slice User Slice of root. >>>>>>> Mar 4 10:16:01 s06 systemd: Stopping User Slice of root. >>>>>>> Mar 4 10:16:24 s06 abrt-server: Duplicate: UUID >>>>>>> Mar 4 10:16:24 s06 abrt-server: DUP_OF_DIR: >>>>>>> /var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041 >>>>>>> Mar 4 10:16:24 s06 abrt-server: Deleting problem directory >>>>>>> ccpp-2019-03-04-10:15:29-161224 (dup of ccpp-2018-09-25-12:27:42-13041) >>>>>>> Mar 4 10:16:24 s06 abrt-server: Generating core_backtrace >>>>>>> Mar 4 10:16:24 s06 abrt-server: Error: Unable to open './coredump': >>>>>>> No such file or directory >>>>>>> Mar 4 10:16:24 s06 abrt-server: Cannot notify >>>>>>> '/var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041' via uReport: Event >>>>>>> 'report_uReport' exited with 1 >>>>>>> Mar 4 10:16:24 s06 abrt-hook-ccpp: Process 161391 (python2.7) of >>>>>>> user 0 killed by SIGABRT - dumping core >>>>>>> Mar 4 10:16:25 s06 abrt-server: Generating core_backtrace >>>>>>> Mar 4 10:16:25 s06 abrt-server: Error: Unable to open './coredump': >>>>>>> No such file or directory >>>>>>> Mar 4 10:17:01 s06 systemd: Created slice User Slice of root. 
>>>>>>> >>>>>>> Also, I noticed the following errors that I think are very critical: >>>>>>> >>>>>>> Mar 4 10:21:12 s06 glustershd[20355]: [2019-03-04 09:21:12.954798] >>>>>>> C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-55: >>>>>>> server 192.168.0.55:49158 has not responded in the last 42 seconds, >>>>>>> disconnecting. >>>>>>> Mar 4 10:22:01 s06 systemd: Created slice User Slice of root. >>>>>>> Mar 4 10:22:01 s06 systemd: Starting User Slice of root. >>>>>>> Mar 4 10:22:01 s06 systemd: Started Session 201017 of user root. >>>>>>> Mar 4 10:22:01 s06 systemd: Starting Session 201017 of user root. >>>>>>> Mar 4 10:22:01 s06 systemd: Removed slice User Slice of root. >>>>>>> Mar 4 10:22:01 s06 systemd: Stopping User Slice of root. >>>>>>> Mar 4 10:22:03 s06 glustershd[20355]: [2019-03-04 09:22:03.964120] >>>>>>> C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-60: >>>>>>> server 192.168.0.54:49165 has not responded in the last 42 seconds, >>>>>>> disconnecting. >>>>>>> Mar 4 10:23:01 s06 systemd: Created slice User Slice of root. >>>>>>> Mar 4 10:23:01 s06 systemd: Starting User Slice of root. >>>>>>> Mar 4 10:23:01 s06 systemd: Started Session 201018 of user root. >>>>>>> Mar 4 10:23:01 s06 systemd: Starting Session 201018 of user root. >>>>>>> Mar 4 10:23:02 s06 systemd: Removed slice User Slice of root. >>>>>>> Mar 4 10:23:02 s06 systemd: Stopping User Slice of root. >>>>>>> Mar 4 10:24:01 s06 systemd: Created slice User Slice of root. >>>>>>> Mar 4 10:24:01 s06 systemd: Starting User Slice of root. >>>>>>> Mar 4 10:24:01 s06 systemd: Started Session 201019 of user root. >>>>>>> Mar 4 10:24:01 s06 systemd: Starting Session 201019 of user root. >>>>>>> Mar 4 10:24:01 s06 systemd: Removed slice User Slice of root. >>>>>>> Mar 4 10:24:01 s06 systemd: Stopping User Slice of root. 
>>>>>>> Mar 4 10:24:03 s06 glustershd[20355]: [2019-03-04 09:24:03.982502] >>>>>>> C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-16: >>>>>>> server 192.168.0.52:49158 has not responded in the last 42 seconds, >>>>>>> disconnecting. >>>>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746109] C >>>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-3: server >>>>>>> 192.168.0.51:49153 has not responded in the last 42 seconds, >>>>>>> disconnecting. >>>>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746215] C >>>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-10: server >>>>>>> 192.168.0.52:49156 has not responded in the last 42 seconds, >>>>>>> disconnecting. >>>>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746260] C >>>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-21: server >>>>>>> 192.168.0.51:49159 has not responded in the last 42 seconds, >>>>>>> disconnecting. >>>>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746296] C >>>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-25: server >>>>>>> 192.168.0.52:49161 has not responded in the last 42 seconds, >>>>>>> disconnecting. >>>>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746413] C >>>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-60: server >>>>>>> 192.168.0.54:49165 has not responded in the last 42 seconds, >>>>>>> disconnecting. >>>>>>> Mar 4 10:24:07 s06 glustershd[20355]: [2019-03-04 09:24:07.982952] >>>>>>> C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-45: >>>>>>> server 192.168.0.54:49155 has not responded in the last 42 seconds, >>>>>>> disconnecting. 
>>>>>>> Mar 4 10:24:18 s06 glustershd[20355]: [2019-03-04 09:24:18.990929] >>>>>>> C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-25: >>>>>>> server 192.168.0.52:49161 has not responded in the last 42 seconds, >>>>>>> disconnecting. >>>>>>> Mar 4 10:24:31 s06 glustershd[20355]: [2019-03-04 09:24:31.995781] >>>>>>> C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-20: >>>>>>> server 192.168.0.53:49159 has not responded in the last 42 seconds, >>>>>>> disconnecting. >>>>>>> Mar 4 10:25:01 s06 systemd: Created slice User Slice of root. >>>>>>> Mar 4 10:25:01 s06 systemd: Starting User Slice of root. >>>>>>> Mar 4 10:25:01 s06 systemd: Started Session 201020 of user root. >>>>>>> Mar 4 10:25:01 s06 systemd: Starting Session 201020 of user root. >>>>>>> Mar 4 10:25:01 s06 systemd: Removed slice User Slice of root. >>>>>>> Mar 4 10:25:01 s06 systemd: Stopping User Slice of root. >>>>>>> Mar 4 10:25:57 s06 systemd: Created slice User Slice of root. >>>>>>> Mar 4 10:25:57 s06 systemd: Starting User Slice of root. >>>>>>> Mar 4 10:25:57 s06 systemd-logind: New session 201021 of user root. >>>>>>> Mar 4 10:25:57 s06 systemd: Started Session 201021 of user root. >>>>>>> Mar 4 10:25:57 s06 systemd: Starting Session 201021 of user root. >>>>>>> Mar 4 10:26:01 s06 systemd: Started Session 201022 of user root. >>>>>>> Mar 4 10:26:01 s06 systemd: Starting Session 201022 of user root. >>>>>>> Mar 4 10:26:21 s06 nrpe[162388]: Error: Could not complete SSL >>>>>>> handshake with 192.168.1.56: 5 >>>>>>> Mar 4 10:27:01 s06 systemd: Started Session 201023 of user root. >>>>>>> Mar 4 10:27:01 s06 systemd: Starting Session 201023 of user root. >>>>>>> Mar 4 10:28:01 s06 systemd: Started Session 201024 of user root. >>>>>>> Mar 4 10:28:01 s06 systemd: Starting Session 201024 of user root. >>>>>>> Mar 4 10:29:01 s06 systemd: Started Session 201025 of user root. >>>>>>> Mar 4 10:29:01 s06 systemd: Starting Session 201025 of user root. 
>>>>>>>
>>>>>>> But, unfortunately, I don't understand why it is happening.
>>>>>>> Now, the NAGIOS server shows that s06 status is ok:
>>>>>>>
>>>>>>> ***** Nagios *****
>>>>>>> Notification Type: RECOVERY
>>>>>>> Service: Brick - /gluster/mnt2/brick
>>>>>>> Host: s06
>>>>>>> Address: s06-stg
>>>>>>> State: OK
>>>>>>> Date/Time: Mon Mar 4 10:35:23 CET 2019
>>>>>>> Additional Info: OK: Brick /gluster/mnt2/brick is up
>>>>>>>
>>>>>>> Nothing has changed from the RAM, CPU, and NETWORK point of view.
>>>>>>> The /var/log/messages file has been updated:
>>>>>>>
>>>>>>> Mar 4 10:32:01 s06 systemd: Starting Session 201029 of user root.
>>>>>>> Mar 4 10:32:30 s06 glustershd[20355]: [2019-03-04 09:32:30.069082] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-10: server 192.168.0.52:49156 has not responded in the last 42 seconds, disconnecting.
>>>>>>> Mar 4 10:32:55 s06 glustershd[20355]: [2019-03-04 09:32:55.074689] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-66: server 192.168.0.54:49167 has not responded in the last 42 seconds, disconnecting.
>>>>>>> Mar 4 10:33:01 s06 systemd: Started Session 201030 of user root.
>>>>>>> Mar 4 10:33:01 s06 systemd: Starting Session 201030 of user root.
>>>>>>> Mar 4 10:34:01 s06 systemd: Started Session 201031 of user root.
>>>>>>> Mar 4 10:34:01 s06 systemd: Starting Session 201031 of user root.
>>>>>>> Mar 4 10:35:01 s06 nrpe[162562]: Could not read request from client 192.168.1.56, bailing out...
>>>>>>> Mar 4 10:35:01 s06 nrpe[162562]: INFO: SSL Socket Shutdown.
>>>>>>> Mar 4 10:35:01 s06 systemd: Started Session 201032 of user root.
>>>>>>> Mar 4 10:35:01 s06 systemd: Starting Session 201032 of user root.
>>>>>>>
>>>>>>> Could you please help me to understand what is happening?
>>>>>>> Thank you in advance.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Mauro
>>>>>>>
>>>>>>> On 1 Mar 2019, at 12:17, Mauro Tridici wrote:
>>>>>>>
>>>>>>> Thank you, Milind.
>>>>>>> I executed the instructions you suggested:
>>>>>>>
>>>>>>> - grep 'blocked for' /var/log/messages on s06 returns no output (no 'blocked' word is detected in the messages file);
>>>>>>> - in the /var/log/messages file I can see this kind of error repeated many times:
>>>>>>>
>>>>>>> Mar 1 08:43:01 s06 systemd: Starting Session 196071 of user root.
>>>>>>> Mar 1 08:43:01 s06 systemd: Removed slice User Slice of root.
>>>>>>> Mar 1 08:43:01 s06 systemd: Stopping User Slice of root.
>>>>>>> Mar 1 08:43:02 s06 kernel: traps: check_vol_utili[57091] general protection ip:7f88e76ee66d sp:7ffe5a5bcc30 error:0 in libglusterfs.so.0.0.1[7f88e769b000+f7000]
>>>>>>> Mar 1 08:43:02 s06 abrt-hook-ccpp: Process 57091 (python2.7) of user 0 killed by SIGSEGV - dumping core
>>>>>>> Mar 1 08:43:02 s06 abrt-server: Generating core_backtrace
>>>>>>> Mar 1 08:43:02 s06 abrt-server: Error: Unable to open './coredump': No such file or directory
>>>>>>> Mar 1 08:43:58 s06 abrt-server: Duplicate: UUID
>>>>>>> Mar 1 08:43:58 s06 abrt-server: DUP_OF_DIR: /var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041
>>>>>>> Mar 1 08:43:58 s06 abrt-server: Deleting problem directory ccpp-2019-03-01-08:43:02-57091 (dup of ccpp-2018-09-25-12:27:42-13041)
>>>>>>> Mar 1 08:43:58 s06 dbus[1872]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
>>>>>>> Mar 1 08:43:58 s06 dbus[1872]: [system] Successfully activated service 'org.freedesktop.problems'
>>>>>>> Mar 1 08:43:58 s06 abrt-server: Generating core_backtrace
>>>>>>> Mar 1 08:43:58 s06 abrt-server: Error: Unable to open './coredump': No such file or directory
>>>>>>> Mar 1 08:43:58 s06 abrt-server: Cannot notify '/var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041' via uReport:
Event >>>>>>> 'report_uReport' exited with 1 >>>>>>> Mar 1 08:44:01 s06 systemd: Created slice User Slice of root. >>>>>>> Mar 1 08:44:01 s06 systemd: Starting User Slice of root. >>>>>>> Mar 1 08:44:01 s06 systemd: Started Session 196072 of user root. >>>>>>> Mar 1 08:44:01 s06 systemd: Starting Session 196072 of user root. >>>>>>> Mar 1 08:44:01 s06 systemd: Removed slice User Slice of root. >>>>>>> >>>>>>> - in /var/log/messages file I can see also 4 errors related to other >>>>>>> cluster servers: >>>>>>> >>>>>>> Mar 1 11:05:01 s06 systemd: Starting User Slice of root. >>>>>>> Mar 1 11:05:01 s06 systemd: Started Session 196230 of user root. >>>>>>> Mar 1 11:05:01 s06 systemd: Starting Session 196230 of user root. >>>>>>> Mar 1 11:05:01 s06 systemd: Removed slice User Slice of root. >>>>>>> Mar 1 11:05:01 s06 systemd: Stopping User Slice of root. >>>>>>> Mar 1 11:05:59 s06 glustershd[70117]: [2019-03-01 10:05:59.347094] >>>>>>> C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-33: >>>>>>> server 192.168.0.51:49163 has not responded in the last 42 seconds, >>>>>>> disconnecting. >>>>>>> Mar 1 11:06:01 s06 systemd: Created slice User Slice of root. >>>>>>> Mar 1 11:06:01 s06 systemd: Starting User Slice of root. >>>>>>> Mar 1 11:06:01 s06 systemd: Started Session 196231 of user root. >>>>>>> Mar 1 11:06:01 s06 systemd: Starting Session 196231 of user root. >>>>>>> Mar 1 11:06:01 s06 systemd: Removed slice User Slice of root. >>>>>>> Mar 1 11:06:01 s06 systemd: Stopping User Slice of root. >>>>>>> Mar 1 11:06:12 s06 glustershd[70117]: [2019-03-01 10:06:12.351319] >>>>>>> C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-1: >>>>>>> server 192.168.0.52:49153 has not responded in the last 42 seconds, >>>>>>> disconnecting. 
>>>>>>> Mar 1 11:06:38 s06 glustershd[70117]: [2019-03-01 10:06:38.356920] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-7: server 192.168.0.52:49155 has not responded in the last 42 seconds, disconnecting.
>>>>>>> Mar 1 11:07:01 s06 systemd: Created slice User Slice of root.
>>>>>>> Mar 1 11:07:01 s06 systemd: Starting User Slice of root.
>>>>>>> Mar 1 11:07:01 s06 systemd: Started Session 196232 of user root.
>>>>>>> Mar 1 11:07:01 s06 systemd: Starting Session 196232 of user root.
>>>>>>> Mar 1 11:07:01 s06 systemd: Removed slice User Slice of root.
>>>>>>> Mar 1 11:07:01 s06 systemd: Stopping User Slice of root.
>>>>>>> Mar 1 11:07:36 s06 glustershd[70117]: [2019-03-01 10:07:36.366259] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-0: server 192.168.0.51:49152 has not responded in the last 42 seconds, disconnecting.
>>>>>>> Mar 1 11:08:01 s06 systemd: Created slice User Slice of root.
>>>>>>>
>>>>>>> No 'blocked' word is in the /var/log/messages files on the other cluster servers.
>>>>>>> Attached is the /var/log/messages file from the s06 server.
>>>>>>>
>>>>>>> Thank you in advance,
>>>>>>> Mauro
>>>>>>>
>>>>>>> On 1 Mar 2019, at 11:47, Milind Changire wrote:
>>>>>>>
>>>>>>> The traces of very high disk activity on the servers are often found in /var/log/messages.
>>>>>>> You might want to grep for "blocked for" in /var/log/messages on s06 and correlate the timestamps to confirm the unresponsiveness as reported in the gluster client logs.
>>>>>>> In cases of high disk activity, although the operating system continues to respond to ICMP pings, the processes writing to disks often get blocked by a large flush to the disk which could span beyond 42 seconds and hence result in ping-timer-expiry logs.
>>>>>>>
>>>>>>> As a side note:
>>>>>>> If you indeed find gluster processes being blocked in /var/log/messages, you might want to tweak the sysctl tunables vm.dirty_background_ratio or vm.dirty_background_bytes to a value smaller than the existing one. Please read up more on those tunables before touching the settings.
>>>>>>>
>>>>>>> On Fri, Mar 1, 2019 at 4:06 PM Mauro Tridici wrote:
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> Attached is the client log captured after changing the network.ping-timeout option.
>>>>>>>> I noticed this error involving server 192.168.0.56 (s06):
>>>>>>>>
>>>>>>>> [2019-03-01 09:23:36.077287] I [rpc-clnt.c:1962:rpc_clnt_reconfig] 0-tier2-client-71: changing ping timeout to 42 (from 0)
>>>>>>>> [2019-03-01 09:23:36.078213] I [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing
>>>>>>>> [2019-03-01 09:23:36.078432] I [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing
>>>>>>>> [2019-03-01 09:23:36.092357] I [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing
>>>>>>>> [2019-03-01 09:23:36.094146] I [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing
>>>>>>>> [2019-03-01 10:06:24.708082] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-50: server 192.168.0.56:49156 has not responded in the last 42 seconds, disconnecting.
>>>>>>>>
>>>>>>>> I don't know why it happens; the s06 server seems to be reachable.
>>>>>>>>
>>>>>>>> [athena_login2][/users/home/sysm02/]> telnet 192.168.0.56 49156
>>>>>>>> Trying 192.168.0.56...
>>>>>>>> Connected to 192.168.0.56.
>>>>>>>> Escape character is '^]'.
>>>>>>>> ^CConnection closed by foreign host.
>>>>>>>> [athena_login2][/users/home/sysm02/]> ping 192.168.0.56 >>>>>>>> PING 192.168.0.56 (192.168.0.56) 56(84) bytes of data. >>>>>>>> 64 bytes from 192.168.0.56: icmp_seq=1 ttl=64 time=0.116 ms >>>>>>>> 64 bytes from 192.168.0.56: icmp_seq=2 ttl=64 time=0.101 ms >>>>>>>> >>>>>>>> --- 192.168.0.56 ping statistics --- >>>>>>>> 2 packets transmitted, 2 received, 0% packet loss, time 1528ms >>>>>>>> rtt min/avg/max/mdev = 0.101/0.108/0.116/0.012 ms >>>>>>>> >>>>>>>> [athena_login2][/users/home/sysm02/]> telnet 192.168.0.56 49156 >>>>>>>> Trying 192.168.0.56... >>>>>>>> Connected to 192.168.0.56. >>>>>>>> Escape character is '^]'. >>>>>>>> >>>>>>>> Thank you for your help, >>>>>>>> Mauro >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 1 Mar 2019, at 10:29, Mauro Tridici >>>>>>>> wrote: >>>>>>>> >>>>>>>> Hi all, >>>>>>>> >>>>>>>> thank you for the explanation. >>>>>>>> I just changed network.ping-timeout option to default value >>>>>>>> (network.ping-timeout=42). >>>>>>>> >>>>>>>> I will check the logs to see if the errors will appear again. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Mauro >>>>>>>> >>>>>>>> On 1 Mar 2019, at 04:43, Milind Changire >>>>>>>> wrote: >>>>>>>> >>>>>>>> network.ping-timeout should not be set to zero for non-glusterd >>>>>>>> clients. >>>>>>>> glusterd is a special case for which ping-timeout is set to zero >>>>>>>> via /etc/glusterfs/glusterd.vol >>>>>>>> >>>>>>>> Setting network.ping-timeout to zero disables arming of the ping >>>>>>>> timer for connections. This disables testing the connection for >>>>>>>> responsiveness and hence avoids proactive fail-over. >>>>>>>> >>>>>>>> Please reset network.ping-timeout to a non-zero positive value, eg. >>>>>>>> 42 >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Feb 28, 2019 at 5:07 PM Nithya Balachandran < >>>>>>>> nbalacha at redhat.com> wrote: >>>>>>>> >>>>>>>>> Adding Raghavendra and Milind to comment on this. 
>>>>>>>>>
>>>>>>>>> What is the effect of setting network.ping-timeout to 0, and should it be set back to 42?
>>>>>>>>> Regards,
>>>>>>>>> Nithya
>>>>>>>>>
>>>>>>>>> On Thu, 28 Feb 2019 at 16:01, Mauro Tridici wrote:
>>>>>>>>>
>>>>>>>>>> Hi Nithya,
>>>>>>>>>>
>>>>>>>>>> sorry for the late reply.
>>>>>>>>>> network.ping-timeout has been set to 0 in order to try to solve some timeout problems, but it didn't help.
>>>>>>>>>> I can set it to the default value.
>>>>>>>>>>
>>>>>>>>>> Can I proceed with the change?
>>>>>>>>>>
>>>>>>>>>> Thank you,
>>>>>>>>>> Mauro
>>>>>>>>>>
>>>>>>>>>> On 28 Feb 2019, at 04:41, Nithya Balachandran <nbalacha at redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Mauro,
>>>>>>>>>>
>>>>>>>>>> Is network.ping-timeout still set to 0? The default value is 42. Is there a particular reason why this was changed?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Nithya
>>>>>>>>>>
>>>>>>>>>> On Wed, 27 Feb 2019 at 21:32, Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Xavi,
>>>>>>>>>>>
>>>>>>>>>>> thank you for the detailed explanation and suggestions.
>>>>>>>>>>> Yes, the transport.listen-backlog option is still set to 1024.
>>>>>>>>>>>
>>>>>>>>>>> I will check the network and connectivity status using 'ping' and 'telnet' as soon as the errors come back again.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Mauro
>>>>>>>>>>>
>>>>>>>>>>> On 27 Feb 2019, at 16:42, Xavi Hernandez <jahernan at redhat.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Mauro,
>>>>>>>>>>>
>>>>>>>>>>> those errors say that the mount point is not connected to some of the bricks while executing operations. I see references to the 3rd and 6th bricks of several disperse sets, which seem to map to server s06.
For some reason, gluster is having trouble connecting from the client machine to that particular server. At the end of the log I see that, after a long time, a reconnect is done to both of them. However, a little after, other bricks from s05 get disconnected and a reconnect times out.
>>>>>>>>>>>
>>>>>>>>>>> That's really odd. It seems as if the server/communication is cut to s06 for some time, then restored, and then the same happens to the next server.
>>>>>>>>>>>
>>>>>>>>>>> If the servers are really online and it's only a communication issue, it explains why server memory and network usage have increased: if the problem only exists between the client and servers, any write made by the client will automatically mark the file as damaged, since some of the servers have not been updated. Since self-heal runs from the server nodes, they will probably be correctly connected to all bricks, which allows them to heal the just-damaged file, which increases memory and network usage.
>>>>>>>>>>>
>>>>>>>>>>> I guess you still have transport.listen-backlog set to 1024, right?
>>>>>>>>>>>
>>>>>>>>>>> Just to try to identify if the problem really comes from the network, can you check if you lose some pings from the client to all of the servers while you are seeing those errors in the log file?
>>>>>>>>>>>
>>>>>>>>>>> You can also check if, during those errors, you can telnet to the port of the brick from the client.
>>>>>>>>>>>
>>>>>>>>>>> Xavi
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Feb 26, 2019 at 10:17 AM Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Nithya,
>>>>>>>>>>>>
>>>>>>>>>>>> the 'df -h' operation is no longer slow, but no users are using the volume; RAM and NETWORK usage is ok on the client node.
>>>>>>>>>>>> >>>>>>>>>>>> I was worried about this kind of warnings/errors: >>>>>>>>>>>> >>>>>>>>>>>> [2019-02-25 10:59:00.664323] W [MSGID: 122035] >>>>>>>>>>>> [ec-common.c:571:ec_child_select] 0-tier2-disperse-6: Executing operation >>>>>>>>>>>> with some subvolumes unavailable (20) >>>>>>>>>>>> >>>>>>>>>>>> [2019-02-26 03:11:35.212603] E >>>>>>>>>>>> [rpc-clnt.c:350:saved_frames_unwind] (--> >>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> >>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> >>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> >>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] >>>>>>>>>>>> (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) >>>>>>>>>>>> 0-tier2-client-50: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) >>>>>>>>>>>> called at 2019-02-26 03:10:56.549903 (xid=0x106f1c5) >>>>>>>>>>>> >>>>>>>>>>>> [2019-02-26 03:13:03.313831] E >>>>>>>>>>>> [socket.c:2376:socket_connect_finish] 0-tier2-client-50: connection to >>>>>>>>>>>> 192.168.0.56:49156 failed (Timeout della connessione); >>>>>>>>>>>> disconnecting socket >>>>>>>>>>>> >>>>>>>>>>>> It seems that some subvolumes are not available and >>>>>>>>>>>> 192.168.0.56 server (s06) is not reachable. >>>>>>>>>>>> But gluster servers are up&running and bricks are ok. >>>>>>>>>>>> >>>>>>>>>>>> In attachment the updated tier2.log file. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thank you. >>>>>>>>>>>> Regards, >>>>>>>>>>>> Mauro >>>>>>>>>>>> >>>>>>>>>>>> Il giorno 26 feb 2019, alle ore 04:03, Nithya Balachandran < >>>>>>>>>>>> nbalacha at redhat.com> ha scritto: >>>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> I see a lot of EC messages in the log but they don't seem very >>>>>>>>>>>> serious. Xavi, can you take a look? 
>>>>>>>>>>>> >>>>>>>>>>>> The only errors I see are: >>>>>>>>>>>> [2019-02-25 10:58:45.519871] E >>>>>>>>>>>> [rpc-clnt.c:350:saved_frames_unwind] (--> >>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> >>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> >>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> >>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] >>>>>>>>>>>> (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) >>>>>>>>>>>> 0-tier2-client-50: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) >>>>>>>>>>>> called at 2019-02-25 10:57:47.429969 (xid=0xd26fe7) >>>>>>>>>>>> [2019-02-25 10:58:51.461493] E >>>>>>>>>>>> [rpc-clnt.c:350:saved_frames_unwind] (--> >>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> >>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> >>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> >>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] >>>>>>>>>>>> (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) >>>>>>>>>>>> 0-tier2-client-41: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) >>>>>>>>>>>> called at 2019-02-25 10:57:47.499174 (xid=0xf47d6a) >>>>>>>>>>>> [2019-02-25 11:07:57.152874] E >>>>>>>>>>>> [socket.c:2376:socket_connect_finish] 0-tier2-client-70: connection to >>>>>>>>>>>> 192.168.0.55:49163 failed (Timeout della connessione); >>>>>>>>>>>> disconnecting socket >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Is the df -h operation still slow? If yes, can you take a >>>>>>>>>>>> tcpdump of the client while running df -h and send that across? 
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Nithya
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, 25 Feb 2019 at 17:27, Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry, some minutes after my last mail message, I noticed that the 'df -h' command hung for a while before returning the prompt.
>>>>>>>>>>>>> Yesterday, everything was ok in the gluster client log, but, today, I see a lot of errors (please take a look at the attached file).
>>>>>>>>>>>>>
>>>>>>>>>>>>> On the client node, I detected high RAM and NETWORK usage.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do you think that the errors have been caused by the client resource usage?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you in advance,
>>>>>>>>>>>>> Mauro
>>>>>>>>>>>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From srangana at redhat.com  Fri Mar 15 13:28:00 2019
From: srangana at redhat.com (Shyam Ranganathan)
Date: Fri, 15 Mar 2019 09:28:00 -0400
Subject: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP
In-Reply-To: 
References: 
Message-ID: <2ed171d2-df68-ada3-e0de-53f19cb79520@redhat.com>

We created a 5.5 release tag, and it is under packaging now. It should be
packaged and ready for testing early next week and should be released
close to mid-week next week.

Thanks,
Shyam
On 3/13/19 12:34 PM, Artem Russakovskii wrote:
> Wednesday now with no update :-/
> 
> Sincerely,
> Artem
> 
> --
> Founder, Android Police, APK Mirror, Illogical Robot LLC
> beerpla.net | +ArtemRussakovskii | @ArtemR
> 
> 
> On Tue, Mar 12, 2019 at 10:28 AM Artem Russakovskii wrote:
> 
>     Hi Amar,
> 
>     Any updates on this? I'm still not seeing it in OpenSUSE build
>     repos. Maybe later today?
> 
>     Thanks.
> 
>     Sincerely,
>     Artem
> 
>     --
>     Founder, Android Police, APK Mirror, Illogical Robot LLC
>     beerpla.net | +ArtemRussakovskii | @ArtemR
> 
> 
>     On Wed, Mar 6, 2019 at 10:30 PM Amar Tumballi Suryanarayan wrote:
> 
>         We are talking days. Not weeks. Considering already it is
>         Thursday here. 1 more day for tagging, and packaging. May be ok
>         to expect it on Monday.
> 
>         -Amar
> 
>         On Thu, Mar 7, 2019 at 11:54 AM Artem Russakovskii wrote:
> 
>             Is the next release going to be an imminent hotfix, i.e.
>             something like today/tomorrow, or are we talking weeks?
> 
>             Sincerely,
>             Artem
> 
>             --
>             Founder, Android Police, APK Mirror, Illogical Robot LLC
>             beerpla.net | +ArtemRussakovskii | @ArtemR
> 
> 
>             On Tue, Mar 5, 2019 at 11:09 AM Artem Russakovskii wrote:
> 
>                 Ended up downgrading to 5.3 just in case. Peer status
>                 and volume status are OK now.
> 
>                 zypper install --oldpackage glusterfs-5.3-lp150.100.1
>                 Loading repository data...
>                 Reading installed packages...
>                 Resolving package dependencies...
> 
>                 Problem: glusterfs-5.3-lp150.100.1.x86_64 requires libgfapi0 = 5.3, but this requirement cannot be provided
>                   not installable providers: libgfapi0-5.3-lp150.100.1.x86_64[glusterfs]
>                  Solution 1: Following actions will be done:
>                   downgrade of libgfapi0-5.4-lp150.100.1.x86_64 to libgfapi0-5.3-lp150.100.1.x86_64
>                   downgrade of libgfchangelog0-5.4-lp150.100.1.x86_64 to libgfchangelog0-5.3-lp150.100.1.x86_64
>                   downgrade of libgfrpc0-5.4-lp150.100.1.x86_64 to libgfrpc0-5.3-lp150.100.1.x86_64
>                   downgrade of libgfxdr0-5.4-lp150.100.1.x86_64 to libgfxdr0-5.3-lp150.100.1.x86_64
>                   downgrade of libglusterfs0-5.4-lp150.100.1.x86_64 to libglusterfs0-5.3-lp150.100.1.x86_64
>                  Solution 2: do not install glusterfs-5.3-lp150.100.1.x86_64
>                  Solution 3: break glusterfs-5.3-lp150.100.1.x86_64 by ignoring some of its dependencies
> 
>                 Choose from above solutions by number or cancel [1/2/3/c] (c): 1
>                 Resolving dependencies...
>                 Resolving package dependencies...
> 
>                 The following 6 packages are going to be downgraded:
>                   glusterfs libgfapi0 libgfchangelog0 libgfrpc0 libgfxdr0 libglusterfs0
> 
>                 6 packages to downgrade.
> 
>                 Sincerely,
>                 Artem
> 
>                 --
>                 Founder, Android Police, APK Mirror, Illogical Robot LLC
>                 beerpla.net | +ArtemRussakovskii | @ArtemR
> 
> 
>                 On Tue, Mar 5, 2019 at 10:57 AM Artem Russakovskii wrote:
> 
>                     Noticed the same when upgrading from 5.3 to 5.4, as mentioned.
> 
>                     I'm confused though. Is actual replication affected, because
>                     the 5.4 server and the 3x 5.3 servers still show heal info as
>                     all 4 connected, and the files seem to be replicating correctly
>                     as well.
> 
>                     So what's actually affected - just the status command, or is
>                     leaving 5.4 on one of the nodes doing some damage to the
>                     underlying fs? Is it fixable by tweaking
>                     transport.socket.ssl-enabled? Does upgrading all servers to 5.4
>                     resolve it, or should we revert back to 5.3?
> 
>                     Sincerely,
>                     Artem
> 
>                     --
>                     Founder, Android Police, APK Mirror, Illogical Robot LLC
>                     beerpla.net | +ArtemRussakovskii | @ArtemR
> 
>                     On Tue, Mar 5, 2019 at 2:02 AM Hu Bert wrote:
> 
>                         fyi: did a downgrade 5.4 -> 5.3 and it worked. all replicas are up and
>                         running. Awaiting updated v5.4.
> 
>                         thx :-)
> 
>                         On Tue, 5 Mar 2019 at 09:26, Hari Gowtham wrote:
>                         >
>                         > There are plans to revert the patch causing this error and rebuild 5.4.
>                         > This should happen faster. The rebuilt 5.4 should be void of this upgrade issue.
>                         >
>                         > In the meantime, you can use 5.3 for this cluster.
>                         > Downgrading to 5.3 will work if it was just one node that was upgraded to 5.4
>                         > and the other nodes are still on 5.3.
>                         >
>                         > On Tue, Mar 5, 2019 at 1:07 PM Hu Bert wrote:
>                         > >
>                         > > Hi Hari,
>                         > >
>                         > > thx for the hint. Do you know when this will be fixed? Is a downgrade
>                         > > 5.4 -> 5.3 a possibility to fix this?
>                         > >
>                         > > Hubert
>                         > >
>                         > > Am Di., 5.
März 2019 um 08:32 Uhr schrieb Hari Gowtham:
> > > >
> > > > Hi,
> > > >
> > > > This is a known issue we are working on.
> > > > As the checksum differs between the updated and non-updated nodes, the
> > > > peers are getting rejected.
> > > > The bricks aren't coming up because of the same issue.
> > > >
> > > > More about the issue: https://bugzilla.redhat.com/show_bug.cgi?id=1685120
> > > >
> > > > On Tue, Mar 5, 2019 at 12:56 PM Hu Bert wrote:
> > > > >
> > > > > Interestingly: gluster volume status misses gluster1, while heal
> > > > > statistics show gluster1:
> > > > >
> > > > > gluster volume status workdata
> > > > > Status of volume: workdata
> > > > > Gluster process                             TCP Port  RDMA Port  Online  Pid
> > > > > ------------------------------------------------------------------------------
> > > > > Brick gluster2:/gluster/md4/workdata        49153     0          Y       1723
> > > > > Brick gluster3:/gluster/md4/workdata        49153     0          Y       2068
> > > > > Self-heal Daemon on localhost               N/A       N/A        Y       1732
> > > > > Self-heal Daemon on gluster3                N/A       N/A        Y       2077
> > > > >
> > > > > vs.
> > > > >
> > > > > gluster volume heal workdata statistics heal-count
> > > > > Gathering count of entries to be healed on volume workdata has been successful
> > > > >
> > > > > Brick gluster1:/gluster/md4/workdata
> > > > > Number of entries: 0
> > > > >
> > > > > Brick gluster2:/gluster/md4/workdata
> > > > > Number of entries: 10745
> > > > >
> > > > > Brick gluster3:/gluster/md4/workdata
> > > > > Number of entries: 10744
> > > > >
> > > > > On Tue, 5 Mar 2019 at 08:18, Hu Bert wrote:
> > > > > >
> > > > > > Hi Milind,
> > > > > >
> > > > > > well, there are such entries, but those haven't been a problem during
> > > > > > install and the last kernel update+reboot. The entries look like:
> > > > > >
> > > > > > PUBLIC_IP
gluster2.alpserver.de > gluster2 > > > > > > > > > > > > 192.168.0.50 gluster1 > > > > > > 192.168.0.51 gluster2 > > > > > > 192.168.0.52 gluster3 > > > > > > > > > > > > 'ping gluster2' resolves to LAN IP; I > removed the last entry in the > > > > > > 1st line, did a reboot ... no, didn't > help. From > > > > > > /var/log/glusterfs/glusterd.log > > > > > > on gluster 2: > > > > > > > > > > > > [2019-03-05 07:04:36.188128] E [MSGID: > 106010] > > > > > > > [glusterd-utils.c:3483:glusterd_compare_friend_volume] > 0-management: > > > > > > Version of Cksums persistent differ. > local cksum = 3950307018, remote > > > > > > cksum = 455409345 on peer gluster1 > > > > > > [2019-03-05 07:04:36.188314] I [MSGID: > 106493] > > > > > > > [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] > 0-glusterd: > > > > > > Responded to gluster1 (0), ret: 0, > op_ret: -1 > > > > > > > > > > > > Interestingly there are no entries in > the brick logs of the rejected > > > > > > server. Well, not surprising as no > brick process is running. The > > > > > > server gluster1 is still in rejected > state. > > > > > > > > > > > > 'gluster volume start workdata force' > starts the brick process on > > > > > > gluster1, and some heals are happening > on gluster2+3, but via 'gluster > > > > > > volume status workdata' the volumes > still aren't complete. > > > > > > > > > > > > gluster1: > > > > > > > ------------------------------------------------------------------------------ > > > > > > Brick gluster1:/gluster/md4/workdata > 49152 0 Y 2523 > > > > > > Self-heal Daemon on localhost > N/A N/A Y 2549 > > > > > > > > > > > > gluster2: > > > > > > Gluster process > TCP Port RDMA Port Online Pid > > > > > > > ------------------------------------------------------------------------------ > > > > > > Brick gluster2:/gluster/md4/workdata > 49153 0 Y 1723
> > > > > > Brick gluster3:/gluster/md4/workdata > 49153 0 Y 2068 > > > > > > Self-heal Daemon on localhost > N/A N/A Y 1732 > > > > > > Self-heal Daemon on gluster3 > N/A N/A Y 2077 > > > > > > > > > > > > > > > > > > Hubert > > > > > > > > > > > > Am Di., 5. März 2019 um 07:58 Uhr > schrieb Milind Changire >: > > > > > > > > > > > > > > There are probably DNS entries or > /etc/hosts entries with the public IP Addresses > that the host names (gluster1, gluster2, > gluster3) are getting resolved to. > > > > > > > /etc/resolv.conf would tell which is > the default domain searched for the node names > and the DNS servers which respond to the queries. > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 5, 2019 at 12:14 PM Hu > Bert > wrote: > > > > > > >> > > > > > > >> Good morning, > > > > > > >> > > > > > > >> i have a replicate 3 setup with 2 > volumes, running on version 5.3 on > > > > > > >> debian stretch. This morning i > upgraded one server to version 5.4 and > > > > > > >> rebooted the machine; after the > restart i noticed that: > > > > > > >> > > > > > > >> - no brick process is running > > > > > > >> - gluster volume status only shows > the server itself: > > > > > > >> gluster volume status workdata > > > > > > >> Status of volume: workdata > > > > > > >> Gluster process > TCP Port RDMA Port Online Pid > > > > > > >> > ------------------------------------------------------------------------------ > > > > > > >> Brick > gluster1:/gluster/md4/workdata N/A > N/A N N/A > > > > > > >> NFS Server on localhost > N/A N/A N N/A
> > > > > > >> > > > > > > >> - gluster peer status on the server > > > > > > >> gluster peer status > > > > > > >> Number of Peers: 2 > > > > > > >> > > > > > > >> Hostname: gluster3 > > > > > > >> Uuid: > c7b4a448-ca6a-4051-877f-788f9ee9bc4a > > > > > > >> State: Peer Rejected (Connected) > > > > > > >> > > > > > > >> Hostname: gluster2 > > > > > > >> Uuid: > 162fea82-406a-4f51-81a3-e90235d8da27 > > > > > > >> State: Peer Rejected (Connected) > > > > > > >> > > > > > > >> - gluster peer status on the other > 2 servers: > > > > > > >> gluster peer status > > > > > > >> Number of Peers: 2 > > > > > > >> > > > > > > >> Hostname: gluster1 > > > > > > >> Uuid: > 9a360776-7b58-49ae-831e-a0ce4e4afbef > > > > > > >> State: Peer Rejected (Connected) > > > > > > >> > > > > > > >> Hostname: gluster3 > > > > > > >> Uuid: > c7b4a448-ca6a-4051-877f-788f9ee9bc4a > > > > > > >> State: Peer in Cluster (Connected) > > > > > > >> > > > > > > >> I noticed that, in the brick logs, > i see that the public IP is used > > > > > > >> instead of the LAN IP. brick logs > from one of the volumes: > > > > > > >> > > > > > > >> rejected node: > https://pastebin.com/qkpj10Sd > > > > > > >> connected nodes: > https://pastebin.com/8SxVVYFV > > > > > > >> > > > > > > >> Why is the public IP suddenly used > instead of the LAN IP? Killing all > > > > > > >> gluster processes and rebooting > (again) didn't help.
> > > > > > >> > > > > > > >> > > > > > > >> Thx, > > > > > > >> Hubert > > > > > > >> > _______________________________________________ > > > > > > >> Gluster-users mailing list > > > > > > >> Gluster-users at gluster.org > > > > > > > >> > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > > > > > > > -- > > > > > > > Milind > > > > > > > > > > -- > > > > Regards, > > > > Hari Gowtham. > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > -- > Amar Tumballi (amarts) From kontakt at taste-of-it.de Fri Mar 15 15:53:19 2019 From: kontakt at taste-of-it.de (Taste-Of-IT) Date: Fri, 15 Mar 2019 15:53:19 +0000 Subject: [Gluster-users] Removing Brick in Distributed GlusterFS In-Reply-To: <9cca5e42313f2d021ffc9de3f43eb0dbd0266d2f@taste-of-it.de> References: <5409b2d08e789e3711cbda99900deb85083e6ff3@taste-of-it.de> <9cca5e42313f2d021ffc9de3f43eb0dbd0266d2f@taste-of-it.de> Message-ID: Hi, ok, this seems to be a bug. I have now updated from 3.x to 4.x to 5.x, the latest Debian releases. After each upgrade I ran remove-brick, and the problem is still the same: Gluster ignores the storage.reserve option. On the positive side, the performance of rebalance seems to have improved :). 
Rather than waste more time on this, I will next move the necessary data to an external disk and rerun remove-brick. I hope this will be solved in a future version of Gluster. Nice WE Taste -------------- next part -------------- An HTML attachment was scrubbed... URL: From davide.obbi at booking.com Sun Mar 17 19:22:34 2019 From: davide.obbi at booking.com (Davide Obbi) Date: Sun, 17 Mar 2019 20:22:34 +0100 Subject: [Gluster-users] geo-replication - OSError: [Errno 1] Operation not permitted - failing with socket files? Message-ID: Hi, I am trying to understand why geo-replication during the "History Crawl" starts failing on each of the three bricks, one after the other. I have enabled DEBUG for all the logs configurable by the geo-replication command. Running glusterfs v4.16, the behaviour is as follows: - the "History Crawl" worked fine for about one hour; it actually replicated some files and folders, although most of them look empty - at some point it becomes faulty, tries to start on another brick, goes faulty there, and so on - in the logs, the Python exception mentioned above is raised: [2019-03-17 18:52:49.565040] E [syncdutils(worker /var/lib/heketi/mounts/vg_b088aec908c959c75674e01fb8598c21/brick_f90f425ecb89c3eec6ef2ef4a2f0a973/brick):332:log_raise_exception] : FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
    func(args)
  File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
    local.service_loop(remote)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1291, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1569, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1469, in changelogs_batch_process
    self.process(batch)
  File
"/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1203, in process_change
    failures = self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in __call__
    raise res
OSError: [Errno 1] Operation not permitted
- The operation before the exception: [2019-03-17 18:52:49.545103] D [master(worker /var/lib/heketi/mounts/vg_b088aec908c959c75674e01fb8598c21/brick_f90f425ecb89c3eec6ef2ef4a2f0a973/brick):1186:process_change] _GMaster: entries: [{'uid': 7575, 'gfid': 'e1ad7c98-f32a-4e48-9902-cc75840de7c3', 'gid': 100, 'mode': 49536, 'entry': '.gfid/5219e4b8-a1f3-4a4e-b9c7-c9b129abe671/.control_f7c33270dc9db9234d005406a13deb4375459715.6lvofzOuVnfAwOwY', 'op': 'MKNOD'}, {'gfid': 'e1ad7c98-f32a-4e48-9902-cc75840de7c3', 'entry': '.gfid/5219e4b8-a1f3-4a4e-b9c7-c9b129abe671/.control_f7c33270dc9db9234d005406a13deb4375459715', 'stat': {'atime': 1552661403.3846507, 'gid': 100, 'mtime': 1552661403.3846507, 'uid': 7575, 'mode': 49536}, 'link': None, 'op': 'LINK'}, {'gfid': 'e1ad7c98-f32a-4e48-9902-cc75840de7c3', 'entry': '.gfid/5219e4b8-a1f3-4a4e-b9c7-c9b129abe671/.control_f7c33270dc9db9234d005406a13deb4375459715.6lvofzOuVnfAwOwY', 'op': 'UNLINK'}] [2019-03-17 18:52:49.548614] D [repce(worker /var/lib/heketi/mounts/vg_b088aec908c959c75674e01fb8598c21/brick_f90f425ecb89c3eec6ef2ef4a2f0a973/brick):179:push] RepceClient: call 56917:140179359156032:1552848769.55 entry_ops([{'uid': 7575, 'gfid': 'e1ad7c98-f32a-4e48-9902-cc75840de7c3', 'gid': 100, 'mode': 49536, 'entry': '.gfid/5219e4b8-a1f3-4a4e-b9c7-c9b129abe671/.control_f7c33270dc9db9234d005406a13deb4375459715.6lvofzOuVnfAwOwY', 'op': 'MKNOD'}, {'gfid': '*e1ad7c98-f32a-4e48-9902-cc75840de7c3*', 'entry': '.gfid/5219e4b8-a1f3-4a4e-b9c7-c9b129abe671/.control_f7c33270dc9db9234d005406a13deb4375459715', 'stat': {'atime': 1552661403.3846507, 'gid': 100, 'mtime': 1552661403.3846507, 'uid': 7575, 'mode': 49536}, 'link': None, 'op': 'LINK'}, {'gfid': 'e1ad7c98-f32a-4e48-9902-cc75840de7c3', 'entry': '.gfid/5219e4b8-a1f3-4a4e-b9c7-c9b129abe671/*.control_f7c33270dc9db9234d005406a13deb4375459715.6lvofzOuVnfAwOwY*', 'op': 'UNLINK'}],) ... - The highlighted gfid points to these control files, which are unix sockets, as per below:
srw------- 2 pippo users 0 Mar 14 16:32 .control_31c3a99664c1f956f949311e58434037e6a52d22
srw------- 2 pippo users 0 Mar 14 16:33 .control_a9b82937042529bca677b9f43eba9eb02ca7c5ee
srw------- 2 pippo users 0 Mar 14 16:32 .control_f429221460d52570066d9f25521011fe7e081cf5
srw------- 2 pippo users 0 Mar 15 15:50 .control_f7c33270dc9db9234d005406a13deb4375459715
So it seems geo-replication should at least be skipping such files rather than raising an exception? Am I the first experiencing this behaviour? thanks in advance Davide -------------- next part -------------- An HTML attachment was scrubbed... URL: From atumball at redhat.com Mon Mar 18 05:23:00 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Mon, 18 Mar 2019 10:53:00 +0530 Subject: [Gluster-users] Removing Brick in Distributed GlusterFS In-Reply-To: References: <5409b2d08e789e3711cbda99900deb85083e6ff3@taste-of-it.de> <9cca5e42313f2d021ffc9de3f43eb0dbd0266d2f@taste-of-it.de> Message-ID: On Fri, Mar 15, 2019 at 9:24 PM Taste-Of-IT wrote: > Hi, > > ok, this seems to be a bug. I have now updated from 3.x to 4.x to 5.x, the latest > Debian releases. After each upgrade I ran remove-brick, and the problem is > still the same: Gluster ignores the storage.reserve option. On the positive side, the > performance of rebalance seems to have improved :). Rather than waste more > time on this, I will next move the necessary data to an external disk and rerun remove-brick. > I hope this will be solved in a future version of Gluster. 
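The mode value 49536 that appears on every failing entry in the geo-replication log above decodes to a unix socket, which matches the srw------- listing. A quick illustration with Python's stat module; the filter at the end is only a sketch of the skip being suggested (hypothetical helper, not actual gsyncd code):

```python
import stat

# Mode value taken from the entry_ops batches in the log above
# (49536 == 0o140600: socket type bits plus 0600 permissions).
mode = 49536

print(stat.S_ISSOCK(mode))      # True - the .control_* files are unix sockets
print(oct(stat.S_IMODE(mode)))  # 0o600 - matches the srw------- listing

# Hypothetical filter: drop socket entries from a batch before
# replaying MKNOD/LINK operations on the slave.
def replayable(entries):
    return [e for e in entries if not stat.S_ISSOCK(e.get("mode", 0))]

batch = [
    {"op": "MKNOD", "mode": 49536},   # unix socket -> skipped
    {"op": "CREATE", "mode": 33188},  # regular file (0o100644) -> kept
]
print([e["op"] for e in replayable(batch)])  # ['CREATE']
```

Short-lived sockets of this kind exist only while their creating process runs, so skipping them (as the poster suggests) would lose nothing durable on the slave side.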
> > Sure, we will look into this, and see what can be done on this in the coming releases. Thanks for trying out higher versions. -Amar > Nice WE > Taste > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From revirii at googlemail.com Mon Mar 18 08:54:51 2019 From: revirii at googlemail.com (Hu Bert) Date: Mon, 18 Mar 2019 09:54:51 +0100 Subject: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP In-Reply-To: <2ed171d2-df68-ada3-e0de-53f19cb79520@redhat.com> References: <2ed171d2-df68-ada3-e0de-53f19cb79520@redhat.com> Message-ID: Good morning :-) for debian the packages are there: https://download.gluster.org/pub/gluster/glusterfs/5/5.5/Debian/stretch/amd64/apt/pool/main/g/glusterfs/ I'll do an upgrade of a test installation 5.3 -> 5.5 and see if there are some errors etc. and report back. btw: no release notes for 5.4 and 5.5 so far? https://docs.gluster.org/en/latest/release-notes/ ? Am Fr., 15. März 2019 um 14:28 Uhr schrieb Shyam Ranganathan : > > We created a 5.5 release tag, and it is under packaging now. It should > be packaged and ready for testing early next week and should be released > close to mid-week next week. > > Thanks, > Shyam > On 3/13/19 12:34 PM, Artem Russakovskii wrote: > > Wednesday now with no update :-/ > > > > Sincerely, > > Artem > > > > -- > > Founder, Android Police , APK Mirror > > , Illogical Robot LLC > > beerpla.net | +ArtemRussakovskii > > | @ArtemR > > > > > > > > On Tue, Mar 12, 2019 at 10:28 AM Artem Russakovskii > > wrote: > > > > Hi Amar, > > > > Any updates on this? I'm still not seeing it in OpenSUSE build > > repos. Maybe later today? > > > > Thanks. 
> > > > Sincerely, > > Artem > > > > -- > > Founder, Android Police , APK Mirror > > , Illogical Robot LLC > > beerpla.net | +ArtemRussakovskii > > | @ArtemR > > > > > > > > On Wed, Mar 6, 2019 at 10:30 PM Amar Tumballi Suryanarayan > > > wrote: > > > > We are talking days. Not weeks. Considering already it is > > Thursday here. 1 more day for tagging, and packaging. May be ok > > to expect it on Monday. > > > > -Amar > > > > On Thu, Mar 7, 2019 at 11:54 AM Artem Russakovskii > > > wrote: > > > > Is the next release going to be an imminent hotfix, i.e. > > something like today/tomorrow, or are we talking weeks? > > > > Sincerely, > > Artem > > > > -- > > Founder, Android Police , APK > > Mirror , Illogical Robot LLC > > beerpla.net | +ArtemRussakovskii > > | @ArtemR > > > > > > > > On Tue, Mar 5, 2019 at 11:09 AM Artem Russakovskii > > > wrote: > > > > Ended up downgrading to 5.3 just in case. Peer status > > and volume status are OK now. > > > > zypper install --oldpackage glusterfs-5.3-lp150.100.1 > > Loading repository data... > > Reading installed packages... > > Resolving package dependencies... 
> > > > Problem: glusterfs-5.3-lp150.100.1.x86_64 requires > > libgfapi0 = 5.3, but this requirement cannot be provided > > not installable providers: > > libgfapi0-5.3-lp150.100.1.x86_64[glusterfs] > > Solution 1: Following actions will be done: > > downgrade of libgfapi0-5.4-lp150.100.1.x86_64 to > > libgfapi0-5.3-lp150.100.1.x86_64 > > downgrade of libgfchangelog0-5.4-lp150.100.1.x86_64 to > > libgfchangelog0-5.3-lp150.100.1.x86_64 > > downgrade of libgfrpc0-5.4-lp150.100.1.x86_64 to > > libgfrpc0-5.3-lp150.100.1.x86_64 > > downgrade of libgfxdr0-5.4-lp150.100.1.x86_64 to > > libgfxdr0-5.3-lp150.100.1.x86_64 > > downgrade of libglusterfs0-5.4-lp150.100.1.x86_64 to > > libglusterfs0-5.3-lp150.100.1.x86_64 > > Solution 2: do not install glusterfs-5.3-lp150.100.1.x86_64 > > Solution 3: break glusterfs-5.3-lp150.100.1.x86_64 by > > ignoring some of its dependencies > > > > Choose from above solutions by number or cancel > > [1/2/3/c] (c): 1 > > Resolving dependencies... > > Resolving package dependencies... > > > > The following 6 packages are going to be downgraded: > > glusterfs libgfapi0 libgfchangelog0 libgfrpc0 > > libgfxdr0 libglusterfs0 > > > > 6 packages to downgrade. > > > > Sincerely, > > Artem > > > > -- > > Founder, Android Police > > , APK Mirror > > , Illogical Robot LLC > > beerpla.net | +ArtemRussakovskii > > | @ArtemR > > > > > > > > On Tue, Mar 5, 2019 at 10:57 AM Artem Russakovskii > > > wrote: > > > > Noticed the same when upgrading from 5.3 to 5.4, as > > mentioned. > > > > I'm confused though. Is actual replication affected, > > because the 5.4 server and the 3x 5.3 servers still > > show heal info as all 4 connected, and the files > > seem to be replicating correctly as well. > > > > So what's actually affected - just the status > > command, or leaving 5.4 on one of the nodes is doing > > some damage to the underlying fs? Is it fixable by > > tweaking transport.socket.ssl-enabled? 
Does > > upgrading all servers to 5.4 resolve it, or should > > we revert back to 5.3? > > > > Sincerely, > > Artem > > > > -- > > Founder, Android Police > > , APK Mirror > > , Illogical Robot LLC > > beerpla.net | > > +ArtemRussakovskii > > > > | @ArtemR > > > > > > On Tue, Mar 5, 2019 at 2:02 AM Hu Bert > > > > wrote: > > > > fyi: did a downgrade 5.4 -> 5.3 and it worked. > > all replicas are up and > > running. Awaiting updated v5.4. > > > > thx :-) > > > > Am Di., 5. März 2019 um 09:26 Uhr schrieb Hari > > Gowtham > >: > > > > > > There are plans to revert the patch causing > > this error and rebuilt 5.4. > > > This should happen faster. the rebuilt 5.4 > > should be void of this upgrade issue. > > > > > > In the meantime, you can use 5.3 for this cluster. > > > Downgrading to 5.3 will work if it was just > > one node that was upgrade to 5.4 > > > and the other nodes are still in 5.3. > > > > > > On Tue, Mar 5, 2019 at 1:07 PM Hu Bert > > > > wrote: > > > > > > > > > Hi Hari, > > > > > > > > > > thx for the hint. Do you know when this will > > be fixed? Is a downgrade > > > > > 5.4 -> 5.3 a possibility to fix this? > > > > > > > > > > Hubert > > > > > > > > > > Am Di., 5. März 2019 um 08:32 Uhr schrieb > > Hari Gowtham > >: > > > > > > > > > > > > Hi, > > > > > > > > > > > > This is a known issue we are working on. > > > > > > As the checksum differs between the > > updated and non updated node, the > > > > > > peers are getting rejected. > > > > > > The bricks aren't coming because of the > > same issue. 
> > [...]
> > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > -- > > Amar Tumballi (amarts) > > > > > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From revirii at googlemail.com Mon Mar 18 10:23:51 2019 From: revirii at googlemail.com (Hu Bert) Date: Mon, 18 Mar 2019 11:23:51 +0100 Subject: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP In-Reply-To: References: <2ed171d2-df68-ada3-e0de-53f19cb79520@redhat.com> Message-ID: update: upgrade from 5.3 -> 5.5 in a replicate 3 test setup with 2 volumes done. In 'gluster peer status' the peers stay connected during the upgrade, no 'peer rejected' messages. No cksum mismatches in the logs. Looks good :-) Am Mo., 18. März 2019 um 09:54 Uhr schrieb Hu Bert : > > Good morning :-) > > for debian the packages are there: > https://download.gluster.org/pub/gluster/glusterfs/5/5.5/Debian/stretch/amd64/apt/pool/main/g/glusterfs/ > > I'll do an upgrade of a test installation 5.3 -> 5.5 and see if there > are some errors etc. and report back. > > btw: no release notes for 5.4 and 5.5 so far? > https://docs.gluster.org/en/latest/release-notes/ ? > > Am Fr., 15. März 2019 um 14:28 Uhr schrieb Shyam Ranganathan > : > > > > We created a 5.5 release tag, and it is under packaging now. 
It should > > be packaged and ready for testing early next week and should be released > > close to mid-week next week. > > > > Thanks, > > Shyam > > On 3/13/19 12:34 PM, Artem Russakovskii wrote: > > > Wednesday now with no update :-/ > > > > > > Sincerely, > > > Artem > > > > > > -- > > > Founder, Android Police , APK Mirror > > > , Illogical Robot LLC > > > beerpla.net | +ArtemRussakovskii > > > | @ArtemR > > > > > > > > > > > > On Tue, Mar 12, 2019 at 10:28 AM Artem Russakovskii > > > wrote: > > > > > > Hi Amar, > > > > > > Any updates on this? I'm still not seeing it in OpenSUSE build > > > repos. Maybe later today? > > > > > > Thanks. > > > > > > Sincerely, > > > Artem > > > > > > -- > > > Founder, Android Police , APK Mirror > > > , Illogical Robot LLC > > > beerpla.net | +ArtemRussakovskii > > > | @ArtemR > > > > > > > > > > > > On Wed, Mar 6, 2019 at 10:30 PM Amar Tumballi Suryanarayan > > > > wrote: > > > > > > We are talking days. Not weeks. Considering already it is > > > Thursday here. 1 more day for tagging, and packaging. May be ok > > > to expect it on Monday. > > > > > > -Amar > > > > > > On Thu, Mar 7, 2019 at 11:54 AM Artem Russakovskii > > > > wrote: > > > > > > Is the next release going to be an imminent hotfix, i.e. > > > something like today/tomorrow, or are we talking weeks? > > > > > > Sincerely, > > > Artem > > > > > > -- > > > Founder, Android Police , APK > > > Mirror , Illogical Robot LLC > > > beerpla.net | +ArtemRussakovskii > > > | @ArtemR > > > > > > > > > > > > On Tue, Mar 5, 2019 at 11:09 AM Artem Russakovskii > > > > wrote: > > > > > > Ended up downgrading to 5.3 just in case. Peer status > > > and volume status are OK now. > > > > > > zypper install --oldpackage glusterfs-5.3-lp150.100.1 > > > Loading repository data... > > > Reading installed packages... > > > Resolving package dependencies... 
> > > > > > Problem: glusterfs-5.3-lp150.100.1.x86_64 requires > > > libgfapi0 = 5.3, but this requirement cannot be provided > > > not installable providers: > > > libgfapi0-5.3-lp150.100.1.x86_64[glusterfs] > > > Solution 1: Following actions will be done: > > > downgrade of libgfapi0-5.4-lp150.100.1.x86_64 to > > > libgfapi0-5.3-lp150.100.1.x86_64 > > > downgrade of libgfchangelog0-5.4-lp150.100.1.x86_64 to > > > libgfchangelog0-5.3-lp150.100.1.x86_64 > > > downgrade of libgfrpc0-5.4-lp150.100.1.x86_64 to > > > libgfrpc0-5.3-lp150.100.1.x86_64 > > > downgrade of libgfxdr0-5.4-lp150.100.1.x86_64 to > > > libgfxdr0-5.3-lp150.100.1.x86_64 > > > downgrade of libglusterfs0-5.4-lp150.100.1.x86_64 to > > > libglusterfs0-5.3-lp150.100.1.x86_64 > > > Solution 2: do not install glusterfs-5.3-lp150.100.1.x86_64 > > > Solution 3: break glusterfs-5.3-lp150.100.1.x86_64 by > > > ignoring some of its dependencies > > > > > > Choose from above solutions by number or cancel > > > [1/2/3/c] (c): 1 > > > Resolving dependencies... > > > Resolving package dependencies... > > > > > > The following 6 packages are going to be downgraded: > > > glusterfs libgfapi0 libgfchangelog0 libgfrpc0 > > > libgfxdr0 libglusterfs0 > > > > > > 6 packages to downgrade. > > > > > > Sincerely, > > > Artem > > > > > > -- > > > Founder, Android Police > > > , APK Mirror > > > , Illogical Robot LLC > > > beerpla.net | +ArtemRussakovskii > > > | @ArtemR > > > > > > > > > > > > On Tue, Mar 5, 2019 at 10:57 AM Artem Russakovskii > > > > wrote: > > > > > > Noticed the same when upgrading from 5.3 to 5.4, as > > > mentioned. > > > > > > I'm confused though. Is actual replication affected, > > > because the 5.4 server and the 3x 5.3 servers still > > > show heal info as all 4 connected, and the files > > > seem to be replicating correctly as well. > > > > > > So what's actually affected - just the status > > > command, or leaving 5.4 on one of the nodes is doing > > > some damage to the underlying fs? 
Is it fixable by > > > tweaking transport.socket.ssl-enabled? Does > > > upgrading all servers to 5.4 resolve it, or should > > > we revert back to 5.3? > > > > > > Sincerely, > > > Artem > > > > > > -- > > > Founder, Android Police > > > , APK Mirror > > > , Illogical Robot LLC > > > beerpla.net | > > > +ArtemRussakovskii > > > > > > | @ArtemR > > > > > > > > > On Tue, Mar 5, 2019 at 2:02 AM Hu Bert > > > > > > wrote: > > > > > > fyi: did a downgrade 5.4 -> 5.3 and it worked. > > > all replicas are up and > > > running. Awaiting updated v5.4. > > > > > > thx :-) > > > > > > Am Di., 5. M?rz 2019 um 09:26 Uhr schrieb Hari > > > Gowtham > > >: > > > > > > > > There are plans to revert the patch causing > > > this error and rebuilt 5.4. > > > > This should happen faster. the rebuilt 5.4 > > > should be void of this upgrade issue. > > > > > > > > In the meantime, you can use 5.3 for this cluster. > > > > Downgrading to 5.3 will work if it was just > > > one node that was upgrade to 5.4 > > > > and the other nodes are still in 5.3. > > > > > > > > On Tue, Mar 5, 2019 at 1:07 PM Hu Bert > > > > > > wrote: > > > > > > > > > > Hi Hari, > > > > > > > > > > thx for the hint. Do you know when this will > > > be fixed? Is a downgrade > > > > > 5.4 -> 5.3 a possibility to fix this? > > > > > > > > > > Hubert > > > > > > > > > > Am Di., 5. M?rz 2019 um 08:32 Uhr schrieb > > > Hari Gowtham > > >: > > > > > > > > > > > > Hi, > > > > > > > > > > > > This is a known issue we are working on. > > > > > > As the checksum differs between the > > > updated and non updated node, the > > > > > > peers are getting rejected. > > > > > > The bricks aren't coming because of the > > > same issue. 
> > > > > > > > > > > > More about the issue: > > > https://bugzilla.redhat.com/show_bug.cgi?id=1685120 > > > > > > > > > > > > On Tue, Mar 5, 2019 at 12:56 PM Hu Bert > > > > > > wrote: > > > > > > > > > > > > > > Interestingly: gluster volume status > > > misses gluster1, while heal > > > > > > > statistics show gluster1: > > > > > > > > > > > > > > gluster volume status workdata > > > > > > > Status of volume: workdata > > > > > > > Gluster process > > > TCP Port RDMA Port Online Pid > > > > > > > > > > ------------------------------------------------------------------------------ > > > > > > > Brick gluster2:/gluster/md4/workdata > > > 49153 0 Y 1723 > > > > > > > Brick gluster3:/gluster/md4/workdata > > > 49153 0 Y 2068 > > > > > > > Self-heal Daemon on localhost > > > N/A N/A Y 1732 > > > > > > > Self-heal Daemon on gluster3 > > > N/A N/A Y 2077 > > > > > > > > > > > > > > vs. > > > > > > > > > > > > > > gluster volume heal workdata statistics > > > heal-count > > > > > > > Gathering count of entries to be healed > > > on volume workdata has been successful > > > > > > > > > > > > > > Brick gluster1:/gluster/md4/workdata > > > > > > > Number of entries: 0 > > > > > > > > > > > > > > Brick gluster2:/gluster/md4/workdata > > > > > > > Number of entries: 10745 > > > > > > > > > > > > > > Brick gluster3:/gluster/md4/workdata > > > > > > > Number of entries: 10744 > > > > > > > > > > > > > > Am Di., 5. M?rz 2019 um 08:18 Uhr > > > schrieb Hu Bert > > >: > > > > > > > > > > > > > > > > Hi Miling, > > > > > > > > > > > > > > > > well, there are such entries, but > > > those haven't been a problem during > > > > > > > > install and the last kernel > > > update+reboot. 
The entries look like: > > > > > > > > > > > > > > > > PUBLIC_IP gluster2.alpserver.de > > > gluster2 > > > > > > > > > > > > > > > > 192.168.0.50 gluster1 > > > > > > > > 192.168.0.51 gluster2 > > > > > > > > 192.168.0.52 gluster3 > > > > > > > > > > > > > > > > 'ping gluster2' resolves to LAN IP; I > > > removed the last entry in the > > > > > > > > 1st line, did a reboot ... no, didn't > > > help. From > > > > > > > > /var/log/glusterfs/glusterd.log > > > > > > > > on gluster 2: > > > > > > > > > > > > > > > > [2019-03-05 07:04:36.188128] E [MSGID: > > > 106010] > > > > > > > > > > > [glusterd-utils.c:3483:glusterd_compare_friend_volume] > > > 0-management: > > > > > > > > Version of Cksums persistent differ. > > > local cksum = 3950307018, remote > > > > > > > > cksum = 455409345 on peer gluster1 > > > > > > > > [2019-03-05 07:04:36.188314] I [MSGID: > > > 106493] > > > > > > > > > > > [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] > > > 0-glusterd: > > > > > > > > Responded to gluster1 (0), ret: 0, > > > op_ret: -1 > > > > > > > > > > > > > > > > Interestingly there are no entries in > > > the brick logs of the rejected > > > > > > > > server. Well, not surprising as no > > > brick process is running. The > > > > > > > > server gluster1 is still in rejected > > > state. > > > > > > > > > > > > > > > > 'gluster volume start workdata force' > > > starts the brick process on > > > > > > > > gluster1, and some heals are happening > > > on gluster2+3, but via 'gluster > > > > > > > > volume status workdata' the volumes > > > still aren't complete. 
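[Editor's note] The "Version of Cksums ... differ" error quoted above can be verified by pulling the two checksum values out of the glusterd log line and comparing them. Below is a minimal sketch that parses the exact sample line from this thread; on a live node you would grep /var/log/glusterfs/glusterd.log instead (and glusterd's stored per-volume checksum lives under /var/lib/glusterd/vols/<volname>/ — compare it across peers the same way).

```shell
# Sample log line taken verbatim from the message above.
logline='Version of Cksums persistent differ. local cksum = 3950307018, remote cksum = 455409345 on peer gluster1'

# Extract the numeric checksum values.
local_cksum=$(printf '%s\n' "$logline" | grep -o 'local cksum = [0-9]*' | grep -o '[0-9]*$')
remote_cksum=$(printf '%s\n' "$logline" | grep -o 'remote cksum = [0-9]*' | grep -o '[0-9]*$')

# Matching checksums mean the peer would be accepted; differing ones
# produce the "Peer Rejected" state discussed in this thread.
if [ "$local_cksum" = "$remote_cksum" ]; then
    echo "cksums match"
else
    echo "cksum mismatch: local=$local_cksum remote=$remote_cksum"
fi
```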
> > > > > > > > > > > > > > > > gluster1: > > > > > > > > > > > ------------------------------------------------------------------------------ > > > > > > > > Brick gluster1:/gluster/md4/workdata > > > 49152 0 Y 2523 > > > > > > > > Self-heal Daemon on localhost > > > N/A N/A Y 2549 > > > > > > > > > > > > > > > > gluster2: > > > > > > > > Gluster process > > > TCP Port RDMA Port Online Pid > > > > > > > > > > > ------------------------------------------------------------------------------ > > > > > > > > Brick gluster2:/gluster/md4/workdata > > > 49153 0 Y 1723 > > > > > > > > Brick gluster3:/gluster/md4/workdata > > > 49153 0 Y 2068 > > > > > > > > Self-heal Daemon on localhost > > > N/A N/A Y 1732 > > > > > > > > Self-heal Daemon on gluster3 > > > N/A N/A Y 2077 > > > > > > > > > > > > > > > > > > > > > > > > Hubert > > > > > > > > > > > > > > > > Am Di., 5. M?rz 2019 um 07:58 Uhr > > > schrieb Milind Changire > > >: > > > > > > > > > > > > > > > > > > There are probably DNS entries or > > > /etc/hosts entries with the public IP Addresses > > > that the host names (gluster1, gluster2, > > > gluster3) are getting resolved to. > > > > > > > > > /etc/resolv.conf would tell which is > > > the default domain searched for the node names > > > and the DNS servers which respond to the queries. > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 5, 2019 at 12:14 PM Hu > > > Bert > > > wrote: > > > > > > > > >> > > > > > > > > >> Good morning, > > > > > > > > >> > > > > > > > > >> i have a replicate 3 setup with 2 > > > volumes, running on version 5.3 on > > > > > > > > >> debian stretch. 
This morning i > > > upgraded one server to version 5.4 and > > > > > > > > >> rebooted the machine; after the > > > restart i noticed that: > > > > > > > > >> > > > > > > > > >> - no brick process is running > > > > > > > > >> - gluster volume status only shows > > > the server itself: > > > > > > > > >> gluster volume status workdata > > > > > > > > >> Status of volume: workdata > > > > > > > > >> Gluster process > > > TCP Port RDMA Port Online Pid > > > > > > > > >> > > > ------------------------------------------------------------------------------ > > > > > > > > >> Brick > > > gluster1:/gluster/md4/workdata N/A > > > N/A N N/A > > > > > > > > >> NFS Server on localhost > > > N/A N/A N N/A > > > > > > > > >> > > > > > > > > >> - gluster peer status on the server > > > > > > > > >> gluster peer status > > > > > > > > >> Number of Peers: 2 > > > > > > > > >> > > > > > > > > >> Hostname: gluster3 > > > > > > > > >> Uuid: > > > c7b4a448-ca6a-4051-877f-788f9ee9bc4a > > > > > > > > >> State: Peer Rejected (Connected) > > > > > > > > >> > > > > > > > > >> Hostname: gluster2 > > > > > > > > >> Uuid: > > > 162fea82-406a-4f51-81a3-e90235d8da27 > > > > > > > > >> State: Peer Rejected (Connected) > > > > > > > > >> > > > > > > > > >> - gluster peer status on the other > > > 2 servers: > > > > > > > > >> gluster peer status > > > > > > > > >> Number of Peers: 2 > > > > > > > > >> > > > > > > > > >> Hostname: gluster1 > > > > > > > > >> Uuid: > > > 9a360776-7b58-49ae-831e-a0ce4e4afbef > > > > > > > > >> State: Peer Rejected (Connected) > > > > > > > > >> > > > > > > > > >> Hostname: gluster3 > > > > > > > > >> Uuid: > > > c7b4a448-ca6a-4051-877f-788f9ee9bc4a > > > > > > > > >> State: Peer in Cluster (Connected) > > > > > > > > >> > > > > > > > > >> I noticed that, in the brick logs, > > > i see that the public IP is used > > > > > > > > >> instead of the LAN IP. 
brick logs > > > from one of the volumes: > > > > > > > > >> > > > > > > > > >> rejected node: > > > https://pastebin.com/qkpj10Sd > > > > > > > > >> connected nodes: > > > https://pastebin.com/8SxVVYFV > > > > > > > > >> > > > > > > > > >> Why is the public IP suddenly used > > > instead of the LAN IP? Killing all > > > > > > > > >> gluster processes and rebooting > > > (again) didn't help. > > > > > > > > >> > > > > > > > > >> > > > > > > > > >> Thx, > > > > > > > > >> Hubert > > > > > > > > >> > > > _______________________________________________ > > > > > > > > >> Gluster-users mailing list > > > > > > > > >> Gluster-users at gluster.org > > > > > > > > > > > >> > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > Milind > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > Gluster-users mailing list > > > > > > > Gluster-users at gluster.org > > > > > > > > > > > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > Regards, > > > > > > Hari Gowtham. > > > > > > > > > > > > > > > > -- > > > > Regards, > > > > Hari Gowtham. 
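[Editor's note] The name-resolution problem Hubert describes — a public-IP /etc/hosts line shadowing the LAN one — comes from hosts-file lookup returning the *first* matching entry. A small self-contained sketch (hypothetical documentation-range IP in place of the real public IP; it builds a throwaway sample file rather than touching the real /etc/hosts):

```shell
# Build a sample hosts file shaped like the one quoted above.
hosts_file=$(mktemp)
cat > "$hosts_file" <<'EOF'
203.0.113.12   gluster2.alpserver.de gluster2
192.168.0.50   gluster1
192.168.0.51   gluster2
192.168.0.52   gluster3
EOF

# First line whose name fields contain "gluster2" wins the lookup,
# so the public-IP line shadows the LAN entry further down.
resolved=$(awk '{for (i = 2; i <= NF; i++) if ($i == "gluster2") {print $1; exit}}' "$hosts_file")
echo "gluster2 resolves (first match) to $resolved"
rm -f "$hosts_file"
```

Removing the alias from the first line (or reordering so the LAN entry comes first) flips the result to 192.168.0.51, which is the fix Hubert attempted.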
> > > _______________________________________________ > > > Gluster-users mailing list > > > Gluster-users at gluster.org > > > > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > _______________________________________________ > > > Gluster-users mailing list > > > Gluster-users at gluster.org > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > > > > > -- > > > Amar Tumballi (amarts) > > > > > > > > > _______________________________________________ > > > Gluster-users mailing list > > > Gluster-users at gluster.org > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users From atumball at redhat.com Mon Mar 18 10:34:56 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Mon, 18 Mar 2019 16:04:56 +0530 Subject: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP In-Reply-To: References: <2ed171d2-df68-ada3-e0de-53f19cb79520@redhat.com> Message-ID: Hi Hu Bert, Appreciate the feedback. Also are the other boiling issues related to logs fixed now? -Amar On Mon, Mar 18, 2019 at 3:54 PM Hu Bert wrote: > update: upgrade from 5.3 -> 5.5 in a replicate 3 test setup with 2 > volumes done. In 'gluster peer status' the peers stay connected during > the upgrade, no 'peer rejected' messages. No cksum mismatches in the > logs. Looks good :-) > > Am Mo., 18. M?rz 2019 um 09:54 Uhr schrieb Hu Bert >: > > > > Good morning :-) > > > > for debian the packages are there: > > > https://download.gluster.org/pub/gluster/glusterfs/5/5.5/Debian/stretch/amd64/apt/pool/main/g/glusterfs/ > > > > I'll do an upgrade of a test installation 5.3 -> 5.5 and see if there > > are some errors etc. and report back. > > > > btw: no release notes for 5.4 and 5.5 so far? > > https://docs.gluster.org/en/latest/release-notes/ ? 
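[Editor's note] The post-upgrade check Hubert reports ("peers stay connected ..., no 'peer rejected' messages") can be scripted. This sketch parses a sample of `gluster peer status` output (the healthy shape from this thread); on a live node you would replace the here-doc with the real command's output:

```shell
# Sample healthy output; substitute: status=$(gluster peer status)
status=$(cat <<'EOF'
Number of Peers: 2

Hostname: gluster3
Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a
State: Peer in Cluster (Connected)

Hostname: gluster2
Uuid: 162fea82-406a-4f51-81a3-e90235d8da27
State: Peer in Cluster (Connected)
EOF
)

# Any "Peer Rejected" line indicates the cksum-mismatch condition
# described earlier in this thread.
rejected=$(printf '%s\n' "$status" | grep -c 'Peer Rejected')
if [ "$rejected" -eq 0 ]; then
    echo "upgrade check OK: no rejected peers"
else
    echo "WARNING: $rejected peer(s) in Rejected state"
fi
```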
> > > > Am Fr., 15. M?rz 2019 um 14:28 Uhr schrieb Shyam Ranganathan > > : > > > > > > We created a 5.5 release tag, and it is under packaging now. It should > > > be packaged and ready for testing early next week and should be > released > > > close to mid-week next week. > > > > > > Thanks, > > > Shyam > > > On 3/13/19 12:34 PM, Artem Russakovskii wrote: > > > > Wednesday now with no update :-/ > > > > > > > > Sincerely, > > > > Artem > > > > > > > > -- > > > > Founder, Android Police , APK Mirror > > > > , Illogical Robot LLC > > > > beerpla.net | +ArtemRussakovskii > > > > | @ArtemR > > > > > > > > > > > > > > > > On Tue, Mar 12, 2019 at 10:28 AM Artem Russakovskii < > archon810 at gmail.com > > > > > wrote: > > > > > > > > Hi Amar, > > > > > > > > Any updates on this? I'm still not seeing it in OpenSUSE build > > > > repos. Maybe later today? > > > > > > > > Thanks. > > > > > > > > Sincerely, > > > > Artem > > > > > > > > -- > > > > Founder, Android Police , APK > Mirror > > > > , Illogical Robot LLC > > > > beerpla.net | +ArtemRussakovskii > > > > | @ArtemR > > > > > > > > > > > > > > > > On Wed, Mar 6, 2019 at 10:30 PM Amar Tumballi Suryanarayan > > > > > wrote: > > > > > > > > We are talking days. Not weeks. Considering already it is > > > > Thursday here. 1 more day for tagging, and packaging. May be > ok > > > > to expect it on Monday. > > > > > > > > -Amar > > > > > > > > On Thu, Mar 7, 2019 at 11:54 AM Artem Russakovskii > > > > > wrote: > > > > > > > > Is the next release going to be an imminent hotfix, i.e. > > > > something like today/tomorrow, or are we talking weeks? > > > > > > > > Sincerely, > > > > Artem > > > > > > > > -- > > > > Founder, Android Police , > APK > > > > Mirror , Illogical Robot LLC > > > > beerpla.net | +ArtemRussakovskii > > > > | @ArtemR > > > > > > > > > > > > > > > > On Tue, Mar 5, 2019 at 11:09 AM Artem Russakovskii > > > > > > wrote: > > > > > > > > Ended up downgrading to 5.3 just in case. 
Peer status > > > > and volume status are OK now. > > > > > > > > zypper install --oldpackage glusterfs-5.3-lp150.100.1 > > > > Loading repository data... > > > > Reading installed packages... > > > > Resolving package dependencies... > > > > > > > > Problem: glusterfs-5.3-lp150.100.1.x86_64 requires > > > > libgfapi0 = 5.3, but this requirement cannot be > provided > > > > not installable providers: > > > > libgfapi0-5.3-lp150.100.1.x86_64[glusterfs] > > > > Solution 1: Following actions will be done: > > > > downgrade of libgfapi0-5.4-lp150.100.1.x86_64 to > > > > libgfapi0-5.3-lp150.100.1.x86_64 > > > > downgrade of > libgfchangelog0-5.4-lp150.100.1.x86_64 to > > > > libgfchangelog0-5.3-lp150.100.1.x86_64 > > > > downgrade of libgfrpc0-5.4-lp150.100.1.x86_64 to > > > > libgfrpc0-5.3-lp150.100.1.x86_64 > > > > downgrade of libgfxdr0-5.4-lp150.100.1.x86_64 to > > > > libgfxdr0-5.3-lp150.100.1.x86_64 > > > > downgrade of libglusterfs0-5.4-lp150.100.1.x86_64 > to > > > > libglusterfs0-5.3-lp150.100.1.x86_64 > > > > Solution 2: do not install > glusterfs-5.3-lp150.100.1.x86_64 > > > > Solution 3: break glusterfs-5.3-lp150.100.1.x86_64 > by > > > > ignoring some of its dependencies > > > > > > > > Choose from above solutions by number or cancel > > > > [1/2/3/c] (c): 1 > > > > Resolving dependencies... > > > > Resolving package dependencies... > > > > > > > > The following 6 packages are going to be downgraded: > > > > glusterfs libgfapi0 libgfchangelog0 libgfrpc0 > > > > libgfxdr0 libglusterfs0 > > > > > > > > 6 packages to downgrade. > > > > > > > > Sincerely, > > > > Artem > > > > > > > > -- > > > > Founder, Android Police > > > > , APK Mirror > > > > , Illogical Robot LLC > > > > beerpla.net | > +ArtemRussakovskii > > > > | > @ArtemR > > > > > > > > > > > > > > > > On Tue, Mar 5, 2019 at 10:57 AM Artem Russakovskii > > > > > > wrote: > > > > > > > > Noticed the same when upgrading from 5.3 to 5.4, > as > > > > mentioned. > > > > > > > > I'm confused though. 
Is actual replication > affected, > > > > because the 5.4 server and the 3x 5.3 servers > still > > > > show heal info as all 4 connected, and the files > > > > seem to be replicating correctly as well. > > > > > > > > So what's actually affected - just the status > > > > command, or leaving 5.4 on one of the nodes is > doing > > > > some damage to the underlying fs? Is it fixable > by > > > > tweaking transport.socket.ssl-enabled? Does > > > > upgrading all servers to 5.4 resolve it, or > should > > > > we revert back to 5.3? > > > > > > > > Sincerely, > > > > Artem > > > > > > > > -- > > > > Founder, Android Police > > > > , APK Mirror > > > > , Illogical Robot LLC > > > > beerpla.net | > > > > +ArtemRussakovskii > > > > > > > > | @ArtemR > > > > > > > > > > > > On Tue, Mar 5, 2019 at 2:02 AM Hu Bert > > > > > > > > wrote: > > > > > > > > fyi: did a downgrade 5.4 -> 5.3 and it > worked. > > > > all replicas are up and > > > > running. Awaiting updated v5.4. > > > > > > > > thx :-) > > > > > > > > Am Di., 5. M?rz 2019 um 09:26 Uhr schrieb > Hari > > > > Gowtham > > > >: > > > > > > > > > > There are plans to revert the patch causing > > > > this error and rebuilt 5.4. > > > > > This should happen faster. the rebuilt 5.4 > > > > should be void of this upgrade issue. > > > > > > > > > > In the meantime, you can use 5.3 for this > cluster. > > > > > Downgrading to 5.3 will work if it was just > > > > one node that was upgrade to 5.4 > > > > > and the other nodes are still in 5.3. > > > > > > > > > > On Tue, Mar 5, 2019 at 1:07 PM Hu Bert > > > > > > > > wrote: > > > > > > > > > > > > Hi Hari, > > > > > > > > > > > > thx for the hint. Do you know when this > will > > > > be fixed? Is a downgrade > > > > > > 5.4 -> 5.3 a possibility to fix this? > > > > > > > > > > > > Hubert > > > > > > > > > > > > Am Di., 5. 
M?rz 2019 um 08:32 Uhr schrieb > > > > Hari Gowtham > > > >: > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > This is a known issue we are working > on. > > > > > > > As the checksum differs between the > > > > updated and non updated node, the > > > > > > > peers are getting rejected. > > > > > > > The bricks aren't coming because of the > > > > same issue. > > > > > > > > > > > > > > More about the issue: > > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1685120 > > > > > > > > > > > > > > On Tue, Mar 5, 2019 at 12:56 PM Hu Bert > > > > > > > > wrote: > > > > > > > > > > > > > > > > Interestingly: gluster volume status > > > > misses gluster1, while heal > > > > > > > > statistics show gluster1: > > > > > > > > > > > > > > > > gluster volume status workdata > > > > > > > > Status of volume: workdata > > > > > > > > Gluster process > > > > TCP Port RDMA Port Online Pid > > > > > > > > > > > > > ------------------------------------------------------------------------------ > > > > > > > > Brick gluster2:/gluster/md4/workdata > > > > 49153 0 Y 1723 > > > > > > > > Brick gluster3:/gluster/md4/workdata > > > > 49153 0 Y 2068 > > > > > > > > Self-heal Daemon on localhost > > > > N/A N/A Y 1732 > > > > > > > > Self-heal Daemon on gluster3 > > > > N/A N/A Y 2077 > > > > > > > > > > > > > > > > vs. > > > > > > > > > > > > > > > > gluster volume heal workdata > statistics > > > > heal-count > > > > > > > > Gathering count of entries to be > healed > > > > on volume workdata has been successful > > > > > > > > > > > > > > > > Brick gluster1:/gluster/md4/workdata > > > > > > > > Number of entries: 0 > > > > > > > > > > > > > > > > Brick gluster2:/gluster/md4/workdata > > > > > > > > Number of entries: 10745 > > > > > > > > > > > > > > > > Brick gluster3:/gluster/md4/workdata > > > > > > > > Number of entries: 10744 > > > > > > > > > > > > > > > > Am Di., 5. 
M?rz 2019 um 08:18 Uhr > > > > schrieb Hu Bert > > > >: > > > > > > > > > > > > > > > > > > Hi Miling, > > > > > > > > > > > > > > > > > > well, there are such entries, but > > > > those haven't been a problem during > > > > > > > > > install and the last kernel > > > > update+reboot. The entries look like: > > > > > > > > > > > > > > > > > > PUBLIC_IP gluster2.alpserver.de > > > > gluster2 > > > > > > > > > > > > > > > > > > 192.168.0.50 gluster1 > > > > > > > > > 192.168.0.51 gluster2 > > > > > > > > > 192.168.0.52 gluster3 > > > > > > > > > > > > > > > > > > 'ping gluster2' resolves to LAN > IP; I > > > > removed the last entry in the > > > > > > > > > 1st line, did a reboot ... no, > didn't > > > > help. From > > > > > > > > > /var/log/glusterfs/glusterd.log > > > > > > > > > on gluster 2: > > > > > > > > > > > > > > > > > > [2019-03-05 07:04:36.188128] E > [MSGID: > > > > 106010] > > > > > > > > > > > > > > [glusterd-utils.c:3483:glusterd_compare_friend_volume] > > > > 0-management: > > > > > > > > > Version of Cksums persistent > differ. > > > > local cksum = 3950307018, remote > > > > > > > > > cksum = 455409345 on peer gluster1 > > > > > > > > > [2019-03-05 07:04:36.188314] I > [MSGID: > > > > 106493] > > > > > > > > > > > > > > [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] > > > > 0-glusterd: > > > > > > > > > Responded to gluster1 (0), ret: 0, > > > > op_ret: -1 > > > > > > > > > > > > > > > > > > Interestingly there are no entries > in > > > > the brick logs of the rejected > > > > > > > > > server. Well, not surprising as no > > > > brick process is running. The > > > > > > > > > server gluster1 is still in > rejected > > > > state. > > > > > > > > > > > > > > > > > > 'gluster volume start workdata > force' > > > > starts the brick process on > > > > > > > > > gluster1, and some heals are > happening > > > > on gluster2+3, but via 'gluster > > > > > > > > > volume status workdata' the volumes > > > > still aren't complete. 
> > > > > > > > > > > > > > > > > > gluster1: > > > > > > > > > > > > > > ------------------------------------------------------------------------------ > > > > > > > > > Brick > gluster1:/gluster/md4/workdata > > > > 49152 0 Y 2523 > > > > > > > > > Self-heal Daemon on localhost > > > > N/A N/A Y 2549 > > > > > > > > > > > > > > > > > > gluster2: > > > > > > > > > Gluster process > > > > TCP Port RDMA Port Online Pid > > > > > > > > > > > > > > ------------------------------------------------------------------------------ > > > > > > > > > Brick > gluster2:/gluster/md4/workdata > > > > 49153 0 Y 1723 > > > > > > > > > Brick > gluster3:/gluster/md4/workdata > > > > 49153 0 Y 2068 > > > > > > > > > Self-heal Daemon on localhost > > > > N/A N/A Y 1732 > > > > > > > > > Self-heal Daemon on gluster3 > > > > N/A N/A Y 2077 > > > > > > > > > > > > > > > > > > > > > > > > > > > Hubert > > > > > > > > > > > > > > > > > > Am Di., 5. M?rz 2019 um 07:58 Uhr > > > > schrieb Milind Changire > > > >: > > > > > > > > > > > > > > > > > > > > There are probably DNS entries or > > > > /etc/hosts entries with the public IP > Addresses > > > > that the host names (gluster1, gluster2, > > > > gluster3) are getting resolved to. > > > > > > > > > > /etc/resolv.conf would tell > which is > > > > the default domain searched for the node > names > > > > and the DNS servers which respond to the > queries. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 5, 2019 at 12:14 PM > Hu > > > > Bert > > > > wrote: > > > > > > > > > >> > > > > > > > > > >> Good morning, > > > > > > > > > >> > > > > > > > > > >> i have a replicate 3 setup with > 2 > > > > volumes, running on version 5.3 on > > > > > > > > > >> debian stretch. 
This morning i > > > > upgraded one server to version 5.4 and > > > > > > > > > >> rebooted the machine; after the > > > > restart i noticed that: > > > > > > > > > >> > > > > > > > > > >> - no brick process is running > > > > > > > > > >> - gluster volume status only > shows > > > > the server itself: > > > > > > > > > >> gluster volume status workdata > > > > > > > > > >> Status of volume: workdata > > > > > > > > > >> Gluster process > > > > TCP Port RDMA Port Online Pid > > > > > > > > > >> > > > > > ------------------------------------------------------------------------------ > > > > > > > > > >> Brick > > > > gluster1:/gluster/md4/workdata N/A > > > > N/A N N/A > > > > > > > > > >> NFS Server on localhost > > > > N/A N/A N N/A > > > > > > > > > >> > > > > > > > > > >> - gluster peer status on the > server > > > > > > > > > >> gluster peer status > > > > > > > > > >> Number of Peers: 2 > > > > > > > > > >> > > > > > > > > > >> Hostname: gluster3 > > > > > > > > > >> Uuid: > > > > c7b4a448-ca6a-4051-877f-788f9ee9bc4a > > > > > > > > > >> State: Peer Rejected (Connected) > > > > > > > > > >> > > > > > > > > > >> Hostname: gluster2 > > > > > > > > > >> Uuid: > > > > 162fea82-406a-4f51-81a3-e90235d8da27 > > > > > > > > > >> State: Peer Rejected (Connected) > > > > > > > > > >> > > > > > > > > > >> - gluster peer status on the > other > > > > 2 servers: > > > > > > > > > >> gluster peer status > > > > > > > > > >> Number of Peers: 2 > > > > > > > > > >> > > > > > > > > > >> Hostname: gluster1 > > > > > > > > > >> Uuid: > > > > 9a360776-7b58-49ae-831e-a0ce4e4afbef > > > > > > > > > >> State: Peer Rejected (Connected) > > > > > > > > > >> > > > > > > > > > >> Hostname: gluster3 > > > > > > > > > >> Uuid: > > > > c7b4a448-ca6a-4051-877f-788f9ee9bc4a > > > > > > > > > >> State: Peer in Cluster > (Connected) > > > > > > > > > >> > > > > > > > > > >> I noticed that, in the brick > logs, > > > > i see that the public IP is used > > > > > > > > > >> instead of the LAN IP. 
brick > logs > > > > from one of the volumes: > > > > > > > > > >> > > > > > > > > > >> rejected node: > > > > https://pastebin.com/qkpj10Sd > > > > > > > > > >> connected nodes: > > > > https://pastebin.com/8SxVVYFV > > > > > > > > > >> > > > > > > > > > >> Why is the public IP suddenly > used > > > > instead of the LAN IP? Killing all > > > > > > > > > >> gluster processes and rebooting > > > > (again) didn't help. > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> Thx, > > > > > > > > > >> Hubert > > > > > > > > > >> > > > > > _______________________________________________ > > > > > > > > > >> Gluster-users mailing list > > > > > > > > > >> Gluster-users at gluster.org > > > > > > > > > > > > > >> > > > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > Milind > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > Gluster-users mailing list > > > > > > > > Gluster-users at gluster.org > > > > > > > > > > > > > > > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > Regards, > > > > > > > Hari Gowtham. > > > > > > > > > > > > > > > > > > > > -- > > > > > Regards, > > > > > Hari Gowtham. 
> > > > > _______________________________________________ > > > > Gluster-users mailing list > > > > Gluster-users at gluster.org > > > > > > > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > _______________________________________________ > > > > Gluster-users mailing list > > > > Gluster-users at gluster.org Gluster-users at gluster.org> > > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > > > > > > > > > -- > > > > Amar Tumballi (amarts) > > > > > > > > > > > > _______________________________________________ > > > > Gluster-users mailing list > > > > Gluster-users at gluster.org > > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > _______________________________________________ > > > Gluster-users mailing list > > > Gluster-users at gluster.org > > > https://lists.gluster.org/mailman/listinfo/gluster-users > -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rightkicktech at gmail.com Mon Mar 18 10:52:42 2019 From: rightkicktech at gmail.com (Alex K) Date: Mon, 18 Mar 2019 12:52:42 +0200 Subject: [Gluster-users] Gluster and bonding In-Reply-To: References: Message-ID: Performed some tests simulating the setup on OVS. When using mode 6 I had mixed results for both scenarios (see below): [image: image.png] There were times that hosts were not able to reach each other (simple ping tests) and other time where hosts were able to reach each other with ping but gluster volumes were down due to connectivity issues being reported (endpoint is not connected). systemctl restart network usually resolved the gluster connectivity issue. This was regardless of the scenario (interlink or not). I will need to do some more tests. On Tue, Feb 26, 2019 at 4:14 PM Alex K wrote: > > Thank you to all for your suggestions. > > I came here since only gluster was having issues to start. 
Ping and other
> networking services were showing everything fine, so I guess there is
> something at gluster that does not like what I tried to do.
> Unfortunately I have this system in production and I cannot experiment. It
> was a customer request to add redundancy to the switch and I went with what
> I assumed would work.
> I guess I have to have the switches stacked, but the current ones do not
> support this. They are just simple managed switches.
>
> Multiple IPs per peer could be a solution.
> I will search a little more and in case I have something I will get back.
>
> On Tue, Feb 26, 2019 at 6:52 AM Strahil wrote:
>
>> Hi Alex,
>>
>> As per the following ( https://
>> community.cisco.com/t5/switching/lacp-load-balancing-in-2-switches-part-of-3750-stack-switch/td-p/2268111
>> ) your switches need to be stacked in order to support LACP with your setup.
>> Yet, I'm not sure if balance-alb will work with 2 separate switches -
>> maybe some special configuration is needed?
>> As far as I know gluster can have multiple IPs matched to a single peer,
>> but I'm not sure if having 2 separate networks will be used as
>> active-backup or active-active.
>>
>> Someone more experienced should jump in.
>>
>> Best Regards,
>> Strahil Nikolov
>> On Feb 25, 2019 12:43, Alex K wrote:
>>
>> Hi All,
>>
>> I was asking if it is possible to have the two separate cables connected
>> to two different physical switches. When trying mode 6 or mode 1 in this
>> setup gluster was refusing to start the volumes, giving me "transport
>> endpoint is not connected".
>>
>> server1: cable1 ---------------- switch1 --------------------- server2:
>> cable1
>> |
>> server1: cable2 ---------------- switch2 --------------------- server2:
>> cable2
>>
>> Both switches are connected with each other also. This is done to achieve
>> redundancy for the switches.
>> When disconnecting cable2 from both servers, then gluster was happy.
>> What could be the problem?
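[Editor's note] When debugging which bonding mode a node actually came up in (mode 6 / balance-alb vs. mode 1 / active-backup), the kernel exposes the live state in /proc/net/bonding/bond0. A sketch that parses a sample of that file (the sample contents and interface names eth0/eth1 are illustrative; the real file only exists on a host with a bond configured):

```shell
# Sample of /proc/net/bonding/bond0, trimmed to the relevant lines.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Bonding Mode: adaptive load balancing (balance-alb)
MII Status: up
Slave Interface: eth0
MII Status: up
Slave Interface: eth1
MII Status: down
EOF

# Report the negotiated mode and count link-down slaves; a down slave
# on only one switch would explain intermittent reachability like the
# OVS test results above.
mode=$(awk -F': ' '/^Bonding Mode/ {print $2}' "$sample")
down_slaves=$(grep -c 'MII Status: down' "$sample")
echo "mode: $mode"
echo "slaves with link down: $down_slaves"
rm -f "$sample"
```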
>>
>> Thanx,
>> Alex
>>
>>
>> On Mon, Feb 25, 2019 at 11:32 AM Jorick Astrego
>> wrote:
>>
>> Hi,
>>
>> We use bonding mode 6 (balance-alb) for GlusterFS traffic
>>
>>
>> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html/administration_guide/network4
>>
>> Preferred bonding mode for Red Hat Gluster Storage client is mode 6
>> (balance-alb), this allows client to transmit writes in parallel on
>> separate NICs much of the time.
>>
>> Regards,
>>
>> Jorick Astrego
>> On 2/25/19 5:41 AM, Dmitry Melekhov wrote:
>>
>> 23.02.2019 19:54, Alex K writes:
>>
>> Hi all,
>>
>> I have a replica 3 setup where each server was configured with dual
>> interfaces in mode 6 bonding. All cables were connected to one common
>> network switch.
>>
>> To add redundancy to the switch, and avoid it being a single point of
>> failure, I connected each second cable of each server to a second switch.
>> This turned out not to function, as gluster was refusing to start the volume,
>> logging "transport endpoint is disconnected", although all nodes were able
>> to reach each other (ping) in the storage network. I switched the mode to
>> mode 1 (active/passive) and initially it worked, but following a reboot of
>> the whole cluster the same issue appeared. Gluster is not starting the volumes.
>>
>> Isn't active/passive supposed to work like that? Can one have such a
>> redundant network setup, or are there other recommended approaches?
>>
>>
>> Yes, we use LACP, I guess this is mode 4 (we use teamd); it is, no
>> doubt, the best way.
>>
>>
>> Thanx,
>> Alex
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>> -------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png Type: image/png Size: 31291 bytes Desc: not available URL: From revirii at googlemail.com Mon Mar 18 12:41:17 2019 From: revirii at googlemail.com (Hu Bert) Date: Mon, 18 Mar 2019 13:41:17 +0100 Subject: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP In-Reply-To: References: <2ed171d2-df68-ada3-e0de-53f19cb79520@redhat.com> Message-ID: Hi Amar, if you refer to this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1674225 : in the test setup i haven't seen those entries, while copying & deleting a few GBs of data. For a final statement we have to wait until i updated our live gluster servers - could take place on tuesday or wednesday. Maybe other users can do an update to 5.4 as well and report back here. Hubert Am Mo., 18. M?rz 2019 um 11:36 Uhr schrieb Amar Tumballi Suryanarayan : > > Hi Hu Bert, > > Appreciate the feedback. Also are the other boiling issues related to logs fixed now? > > -Amar > > On Mon, Mar 18, 2019 at 3:54 PM Hu Bert wrote: >> >> update: upgrade from 5.3 -> 5.5 in a replicate 3 test setup with 2 >> volumes done. In 'gluster peer status' the peers stay connected during >> the upgrade, no 'peer rejected' messages. No cksum mismatches in the >> logs. Looks good :-) >> >> Am Mo., 18. M?rz 2019 um 09:54 Uhr schrieb Hu Bert : >> > >> > Good morning :-) >> > >> > for debian the packages are there: >> > https://download.gluster.org/pub/gluster/glusterfs/5/5.5/Debian/stretch/amd64/apt/pool/main/g/glusterfs/ >> > >> > I'll do an upgrade of a test installation 5.3 -> 5.5 and see if there >> > are some errors etc. and report back. >> > >> > btw: no release notes for 5.4 and 5.5 so far? >> > https://docs.gluster.org/en/latest/release-notes/ ? >> > >> > Am Fr., 15. M?rz 2019 um 14:28 Uhr schrieb Shyam Ranganathan >> > : >> > > >> > > We created a 5.5 release tag, and it is under packaging now. 
It should >> > > be packaged and ready for testing early next week and should be released >> > > close to mid-week next week. >> > > >> > > Thanks, >> > > Shyam >> > > On 3/13/19 12:34 PM, Artem Russakovskii wrote: >> > > > Wednesday now with no update :-/ >> > > > >> > > > Sincerely, >> > > > Artem >> > > > >> > > > -- >> > > > Founder, Android Police , APK Mirror >> > > > , Illogical Robot LLC >> > > > beerpla.net | +ArtemRussakovskii >> > > > | @ArtemR >> > > > >> > > > >> > > > >> > > > On Tue, Mar 12, 2019 at 10:28 AM Artem Russakovskii > > > > > wrote: >> > > > >> > > > Hi Amar, >> > > > >> > > > Any updates on this? I'm still not seeing it in OpenSUSE build >> > > > repos. Maybe later today? >> > > > >> > > > Thanks. >> > > > >> > > > Sincerely, >> > > > Artem >> > > > >> > > > -- >> > > > Founder, Android Police , APK Mirror >> > > > , Illogical Robot LLC >> > > > beerpla.net | +ArtemRussakovskii >> > > > | @ArtemR >> > > > >> > > > >> > > > >> > > > On Wed, Mar 6, 2019 at 10:30 PM Amar Tumballi Suryanarayan >> > > > > wrote: >> > > > >> > > > We are talking days. Not weeks. Considering already it is >> > > > Thursday here. 1 more day for tagging, and packaging. May be ok >> > > > to expect it on Monday. >> > > > >> > > > -Amar >> > > > >> > > > On Thu, Mar 7, 2019 at 11:54 AM Artem Russakovskii >> > > > > wrote: >> > > > >> > > > Is the next release going to be an imminent hotfix, i.e. >> > > > something like today/tomorrow, or are we talking weeks? >> > > > >> > > > Sincerely, >> > > > Artem >> > > > >> > > > -- >> > > > Founder, Android Police , APK >> > > > Mirror , Illogical Robot LLC >> > > > beerpla.net | +ArtemRussakovskii >> > > > | @ArtemR >> > > > >> > > > >> > > > >> > > > On Tue, Mar 5, 2019 at 11:09 AM Artem Russakovskii >> > > > > wrote: >> > > > >> > > > Ended up downgrading to 5.3 just in case. Peer status >> > > > and volume status are OK now. 
>> > > > >> > > > zypper install --oldpackage glusterfs-5.3-lp150.100.1 >> > > > Loading repository data... >> > > > Reading installed packages... >> > > > Resolving package dependencies... >> > > > >> > > > Problem: glusterfs-5.3-lp150.100.1.x86_64 requires >> > > > libgfapi0 = 5.3, but this requirement cannot be provided >> > > > not installable providers: >> > > > libgfapi0-5.3-lp150.100.1.x86_64[glusterfs] >> > > > Solution 1: Following actions will be done: >> > > > downgrade of libgfapi0-5.4-lp150.100.1.x86_64 to >> > > > libgfapi0-5.3-lp150.100.1.x86_64 >> > > > downgrade of libgfchangelog0-5.4-lp150.100.1.x86_64 to >> > > > libgfchangelog0-5.3-lp150.100.1.x86_64 >> > > > downgrade of libgfrpc0-5.4-lp150.100.1.x86_64 to >> > > > libgfrpc0-5.3-lp150.100.1.x86_64 >> > > > downgrade of libgfxdr0-5.4-lp150.100.1.x86_64 to >> > > > libgfxdr0-5.3-lp150.100.1.x86_64 >> > > > downgrade of libglusterfs0-5.4-lp150.100.1.x86_64 to >> > > > libglusterfs0-5.3-lp150.100.1.x86_64 >> > > > Solution 2: do not install glusterfs-5.3-lp150.100.1.x86_64 >> > > > Solution 3: break glusterfs-5.3-lp150.100.1.x86_64 by >> > > > ignoring some of its dependencies >> > > > >> > > > Choose from above solutions by number or cancel >> > > > [1/2/3/c] (c): 1 >> > > > Resolving dependencies... >> > > > Resolving package dependencies... >> > > > >> > > > The following 6 packages are going to be downgraded: >> > > > glusterfs libgfapi0 libgfchangelog0 libgfrpc0 >> > > > libgfxdr0 libglusterfs0 >> > > > >> > > > 6 packages to downgrade. >> > > > >> > > > Sincerely, >> > > > Artem >> > > > >> > > > -- >> > > > Founder, Android Police >> > > > , APK Mirror >> > > > , Illogical Robot LLC >> > > > beerpla.net | +ArtemRussakovskii >> > > > | @ArtemR >> > > > >> > > > >> > > > >> > > > On Tue, Mar 5, 2019 at 10:57 AM Artem Russakovskii >> > > > > wrote: >> > > > >> > > > Noticed the same when upgrading from 5.3 to 5.4, as >> > > > mentioned. >> > > > >> > > > I'm confused though. 
Is actual replication affected, >> > > > because the 5.4 server and the 3x 5.3 servers still >> > > > show heal info as all 4 connected, and the files >> > > > seem to be replicating correctly as well. >> > > > >> > > > So what's actually affected - just the status >> > > > command, or leaving 5.4 on one of the nodes is doing >> > > > some damage to the underlying fs? Is it fixable by >> > > > tweaking transport.socket.ssl-enabled? Does >> > > > upgrading all servers to 5.4 resolve it, or should >> > > > we revert back to 5.3? >> > > > >> > > > Sincerely, >> > > > Artem >> > > > >> > > > -- >> > > > Founder, Android Police >> > > > , APK Mirror >> > > > , Illogical Robot LLC >> > > > beerpla.net | >> > > > +ArtemRussakovskii >> > > > >> > > > | @ArtemR >> > > > >> > > > >> > > > On Tue, Mar 5, 2019 at 2:02 AM Hu Bert >> > > > > > > > > wrote: >> > > > >> > > > fyi: did a downgrade 5.4 -> 5.3 and it worked. >> > > > all replicas are up and >> > > > running. Awaiting updated v5.4. >> > > > >> > > > thx :-) >> > > > >> > > > Am Di., 5. M?rz 2019 um 09:26 Uhr schrieb Hari >> > > > Gowtham > > > > >: >> > > > > >> > > > > There are plans to revert the patch causing >> > > > this error and rebuilt 5.4. >> > > > > This should happen faster. the rebuilt 5.4 >> > > > should be void of this upgrade issue. >> > > > > >> > > > > In the meantime, you can use 5.3 for this cluster. >> > > > > Downgrading to 5.3 will work if it was just >> > > > one node that was upgrade to 5.4 >> > > > > and the other nodes are still in 5.3. >> > > > > >> > > > > On Tue, Mar 5, 2019 at 1:07 PM Hu Bert >> > > > > > > > > wrote: >> > > > > > >> > > > > > Hi Hari, >> > > > > > >> > > > > > thx for the hint. Do you know when this will >> > > > be fixed? Is a downgrade >> > > > > > 5.4 -> 5.3 a possibility to fix this? >> > > > > > >> > > > > > Hubert >> > > > > > >> > > > > > Am Di., 5. 
M?rz 2019 um 08:32 Uhr schrieb >> > > > Hari Gowtham > > > > >: >> > > > > > > >> > > > > > > Hi, >> > > > > > > >> > > > > > > This is a known issue we are working on. >> > > > > > > As the checksum differs between the >> > > > updated and non updated node, the >> > > > > > > peers are getting rejected. >> > > > > > > The bricks aren't coming because of the >> > > > same issue. >> > > > > > > >> > > > > > > More about the issue: >> > > > https://bugzilla.redhat.com/show_bug.cgi?id=1685120 >> > > > > > > >> > > > > > > On Tue, Mar 5, 2019 at 12:56 PM Hu Bert >> > > > > > > > > wrote: >> > > > > > > > >> > > > > > > > Interestingly: gluster volume status >> > > > misses gluster1, while heal >> > > > > > > > statistics show gluster1: >> > > > > > > > >> > > > > > > > gluster volume status workdata >> > > > > > > > Status of volume: workdata >> > > > > > > > Gluster process >> > > > TCP Port RDMA Port Online Pid >> > > > > > > > >> > > > ------------------------------------------------------------------------------ >> > > > > > > > Brick gluster2:/gluster/md4/workdata >> > > > 49153 0 Y 1723 >> > > > > > > > Brick gluster3:/gluster/md4/workdata >> > > > 49153 0 Y 2068 >> > > > > > > > Self-heal Daemon on localhost >> > > > N/A N/A Y 1732 >> > > > > > > > Self-heal Daemon on gluster3 >> > > > N/A N/A Y 2077 >> > > > > > > > >> > > > > > > > vs. >> > > > > > > > >> > > > > > > > gluster volume heal workdata statistics >> > > > heal-count >> > > > > > > > Gathering count of entries to be healed >> > > > on volume workdata has been successful >> > > > > > > > >> > > > > > > > Brick gluster1:/gluster/md4/workdata >> > > > > > > > Number of entries: 0 >> > > > > > > > >> > > > > > > > Brick gluster2:/gluster/md4/workdata >> > > > > > > > Number of entries: 10745 >> > > > > > > > >> > > > > > > > Brick gluster3:/gluster/md4/workdata >> > > > > > > > Number of entries: 10744 >> > > > > > > > >> > > > > > > > Am Di., 5. 
M?rz 2019 um 08:18 Uhr >> > > > schrieb Hu Bert > > > > >: >> > > > > > > > > >> > > > > > > > > Hi Miling, >> > > > > > > > > >> > > > > > > > > well, there are such entries, but >> > > > those haven't been a problem during >> > > > > > > > > install and the last kernel >> > > > update+reboot. The entries look like: >> > > > > > > > > >> > > > > > > > > PUBLIC_IP gluster2.alpserver.de >> > > > gluster2 >> > > > > > > > > >> > > > > > > > > 192.168.0.50 gluster1 >> > > > > > > > > 192.168.0.51 gluster2 >> > > > > > > > > 192.168.0.52 gluster3 >> > > > > > > > > >> > > > > > > > > 'ping gluster2' resolves to LAN IP; I >> > > > removed the last entry in the >> > > > > > > > > 1st line, did a reboot ... no, didn't >> > > > help. From >> > > > > > > > > /var/log/glusterfs/glusterd.log >> > > > > > > > > on gluster 2: >> > > > > > > > > >> > > > > > > > > [2019-03-05 07:04:36.188128] E [MSGID: >> > > > 106010] >> > > > > > > > > >> > > > [glusterd-utils.c:3483:glusterd_compare_friend_volume] >> > > > 0-management: >> > > > > > > > > Version of Cksums persistent differ. >> > > > local cksum = 3950307018, remote >> > > > > > > > > cksum = 455409345 on peer gluster1 >> > > > > > > > > [2019-03-05 07:04:36.188314] I [MSGID: >> > > > 106493] >> > > > > > > > > >> > > > [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] >> > > > 0-glusterd: >> > > > > > > > > Responded to gluster1 (0), ret: 0, >> > > > op_ret: -1 >> > > > > > > > > >> > > > > > > > > Interestingly there are no entries in >> > > > the brick logs of the rejected >> > > > > > > > > server. Well, not surprising as no >> > > > brick process is running. The >> > > > > > > > > server gluster1 is still in rejected >> > > > state. 
>> > > > > > > > > >> > > > > > > > > 'gluster volume start workdata force' >> > > > starts the brick process on >> > > > > > > > > gluster1, and some heals are happening >> > > > on gluster2+3, but via 'gluster >> > > > > > > > > volume status workdata' the volumes >> > > > still aren't complete. >> > > > > > > > > >> > > > > > > > > gluster1: >> > > > > > > > > >> > > > ------------------------------------------------------------------------------ >> > > > > > > > > Brick gluster1:/gluster/md4/workdata >> > > > 49152 0 Y 2523 >> > > > > > > > > Self-heal Daemon on localhost >> > > > N/A N/A Y 2549 >> > > > > > > > > >> > > > > > > > > gluster2: >> > > > > > > > > Gluster process >> > > > TCP Port RDMA Port Online Pid >> > > > > > > > > >> > > > ------------------------------------------------------------------------------ >> > > > > > > > > Brick gluster2:/gluster/md4/workdata >> > > > 49153 0 Y 1723 >> > > > > > > > > Brick gluster3:/gluster/md4/workdata >> > > > 49153 0 Y 2068 >> > > > > > > > > Self-heal Daemon on localhost >> > > > N/A N/A Y 1732 >> > > > > > > > > Self-heal Daemon on gluster3 >> > > > N/A N/A Y 2077 >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > Hubert >> > > > > > > > > >> > > > > > > > > Am Di., 5. M?rz 2019 um 07:58 Uhr >> > > > schrieb Milind Changire > > > > >: >> > > > > > > > > > >> > > > > > > > > > There are probably DNS entries or >> > > > /etc/hosts entries with the public IP Addresses >> > > > that the host names (gluster1, gluster2, >> > > > gluster3) are getting resolved to. >> > > > > > > > > > /etc/resolv.conf would tell which is >> > > > the default domain searched for the node names >> > > > and the DNS servers which respond to the queries. 
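The first-match shadowing Milind describes can be sketched on the shell. All names and addresses below are made up for illustration (203.0.113.0/24 is a documentation range), not taken from this cluster:

```shell
# /etc/hosts lookup is top-down and the first match wins: a public address
# on an earlier line shadows a later LAN entry for the same name.
# Hypothetical file standing in for /etc/hosts:
cat > /tmp/hosts.example <<'EOF'
203.0.113.10 gluster2.example.net gluster2
192.168.0.51 gluster2
EOF
# Print the address of the first line whose name fields include "gluster2"
first_match=$(awk '{for (i = 2; i <= NF; i++) if ($i == "gluster2") { print $1; exit }}' /tmp/hosts.example)
echo "gluster2 resolves first to: $first_match"
```

If the first address printed is the public one, that is likely the address glusterd resolves the peer name to, matching the behaviour reported above.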
>> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > On Tue, Mar 5, 2019 at 12:14 PM Hu >> > > > Bert > > > > > wrote: >> > > > > > > > > >> >> > > > > > > > > >> Good morning, >> > > > > > > > > >> >> > > > > > > > > >> i have a replicate 3 setup with 2 >> > > > volumes, running on version 5.3 on >> > > > > > > > > >> debian stretch. This morning i >> > > > upgraded one server to version 5.4 and >> > > > > > > > > >> rebooted the machine; after the >> > > > restart i noticed that: >> > > > > > > > > >> >> > > > > > > > > >> - no brick process is running >> > > > > > > > > >> - gluster volume status only shows >> > > > the server itself: >> > > > > > > > > >> gluster volume status workdata >> > > > > > > > > >> Status of volume: workdata >> > > > > > > > > >> Gluster process >> > > > TCP Port RDMA Port Online Pid >> > > > > > > > > >> >> > > > ------------------------------------------------------------------------------ >> > > > > > > > > >> Brick >> > > > gluster1:/gluster/md4/workdata N/A >> > > > N/A N N/A >> > > > > > > > > >> NFS Server on localhost >> > > > N/A N/A N N/A >> > > > > > > > > >> >> > > > > > > > > >> - gluster peer status on the server >> > > > > > > > > >> gluster peer status >> > > > > > > > > >> Number of Peers: 2 >> > > > > > > > > >> >> > > > > > > > > >> Hostname: gluster3 >> > > > > > > > > >> Uuid: >> > > > c7b4a448-ca6a-4051-877f-788f9ee9bc4a >> > > > > > > > > >> State: Peer Rejected (Connected) >> > > > > > > > > >> >> > > > > > > > > >> Hostname: gluster2 >> > > > > > > > > >> Uuid: >> > > > 162fea82-406a-4f51-81a3-e90235d8da27 >> > > > > > > > > >> State: Peer Rejected (Connected) >> > > > > > > > > >> >> > > > > > > > > >> - gluster peer status on the other >> > > > 2 servers: >> > > > > > > > > >> gluster peer status >> > > > > > > > > >> Number of Peers: 2 >> > > > > > > > > >> >> > > > > > > > > >> Hostname: gluster1 >> > > > > > > > > >> Uuid: >> > > > 9a360776-7b58-49ae-831e-a0ce4e4afbef >> > > > > > > > > >> State: 
Peer Rejected (Connected) >> > > > > > > > > >> >> > > > > > > > > >> Hostname: gluster3 >> > > > > > > > > >> Uuid: >> > > > c7b4a448-ca6a-4051-877f-788f9ee9bc4a >> > > > > > > > > >> State: Peer in Cluster (Connected) >> > > > > > > > > >> >> > > > > > > > > >> I noticed that, in the brick logs, >> > > > i see that the public IP is used >> > > > > > > > > >> instead of the LAN IP. brick logs >> > > > from one of the volumes: >> > > > > > > > > >> >> > > > > > > > > >> rejected node: >> > > > https://pastebin.com/qkpj10Sd >> > > > > > > > > >> connected nodes: >> > > > https://pastebin.com/8SxVVYFV >> > > > > > > > > >> >> > > > > > > > > >> Why is the public IP suddenly used >> > > > instead of the LAN IP? Killing all >> > > > > > > > > >> gluster processes and rebooting >> > > > (again) didn't help. >> > > > > > > > > >> >> > > > > > > > > >> >> > > > > > > > > >> Thx, >> > > > > > > > > >> Hubert >> > > > > > > > > >> >> > > > _______________________________________________ >> > > > > > > > > >> Gluster-users mailing list >> > > > > > > > > >> Gluster-users at gluster.org >> > > > >> > > > > > > > > >> >> > > > https://lists.gluster.org/mailman/listinfo/gluster-users >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > -- >> > > > > > > > > > Milind >> > > > > > > > > > >> > > > > > > > >> > > > _______________________________________________ >> > > > > > > > Gluster-users mailing list >> > > > > > > > Gluster-users at gluster.org >> > > > >> > > > > > > > >> > > > https://lists.gluster.org/mailman/listinfo/gluster-users >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > -- >> > > > > > > Regards, >> > > > > > > Hari Gowtham. >> > > > > >> > > > > >> > > > > >> > > > > -- >> > > > > Regards, >> > > > > Hari Gowtham. 
>> > > > _______________________________________________ >> > > > Gluster-users mailing list >> > > > Gluster-users at gluster.org >> > > > >> > > > https://lists.gluster.org/mailman/listinfo/gluster-users >> > > > >> > > > _______________________________________________ >> > > > Gluster-users mailing list >> > > > Gluster-users at gluster.org >> > > > https://lists.gluster.org/mailman/listinfo/gluster-users >> > > > >> > > > >> > > > >> > > > -- >> > > > Amar Tumballi (amarts) >> > > > >> > > > >> > > > _______________________________________________ >> > > > Gluster-users mailing list >> > > > Gluster-users at gluster.org >> > > > https://lists.gluster.org/mailman/listinfo/gluster-users >> > > > >> > > _______________________________________________ >> > > Gluster-users mailing list >> > > Gluster-users at gluster.org >> > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Amar Tumballi (amarts) From valerio.luccio at nyu.edu Mon Mar 18 15:07:46 2019 From: valerio.luccio at nyu.edu (Valerio Luccio) Date: Mon, 18 Mar 2019 11:07:46 -0400 Subject: [Gluster-users] NFS export of gluster - solution Message-ID: <4db533c1-3710-31b0-64ca-486c19fb4a63@nyu.edu> So, I recently started NFS exporting of my gluster so that I could mount it from a legacy Mac OS X server. Every 24/36 hours the export seemed to freeze, causing the server to seize up. The ganesha log was filled with errors related to RQUOTA. Frank Filz of the nfs-ganesha project suggested that I'd try setting "Enable_RQUOTA = false;" in the NFS_CORE_PARAM config block of the ganesha.conf file and that seems to have done the trick, 5 days and counting without a problem. -- Valerio Luccio (212) 998-8736 Center for Brain Imaging 4 Washington Place, Room 157 New York University New York, NY 10003 "In an open world, who needs windows or gates ?"
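For reference, Valerio's workaround as a ganesha.conf fragment. The option name and block are exactly as he reports; if your distribution's ganesha.conf already has an NFS_CORE_PARAM block, merge the setting into it rather than adding a second block:

```
# ganesha.conf
NFS_CORE_PARAM {
    # Disable RQUOTA handling, per Valerio's report of RQUOTA errors
    # freezing the NFS export every 24-36 hours.
    Enable_RQUOTA = false;
}
```

nfs-ganesha reads its configuration at startup, so a service restart is needed for the change to take effect.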
From brandon at thinkhuge.net Mon Mar 18 16:46:09 2019 From: brandon at thinkhuge.net (brandon at thinkhuge.net) Date: Mon, 18 Mar 2019 09:46:09 -0700 Subject: [Gluster-users] Transport endpoint is not connected failures in 5.3 under high I/O load Message-ID: <122f01d4ddaa$177772f0$466658d0$@thinkhuge.net> Hello list, We are having critical failures under load with CentOS 7 glusterfs 5.3: our servers lose their local mount point with the error "Transport endpoint is not connected". Not sure if it is related, but the logs are full of the following message. [2019-03-18 14:00:02.656876] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler We operate multiple separate glusterfs distributed clusters of about 6-8 nodes. Our 2 biggest, separate, and most I/O-active glusterfs clusters are both having the issue. We are trying to use glusterfs as a unified file system for pureftpd backup services for a VPS service. We have a relatively small backup window over the weekend where all our servers back up at the same time. When backups start early on Saturday, they cause a sustained massive amount of FTP file upload I/O for around 48 hours while all the compressed backup files are uploaded. For our London 8-node cluster, for example, there are currently about 45 TB of uploads in ~48 hours. We do have some other smaller issues with directory listing under this load too, but it had been working for a couple of years since 3.x; now that we've updated recently, all servers randomly lose their glusterfs mount with the "Transport endpoint is not connected" issue. Our glusterfs servers are all mostly the same, with small variations. Mostly they are Supermicro, E3 CPU, 16 GB RAM, LSI RAID10 HDD (with and without BBU). Drive arrays vary between 4-16 SATA3 HDDs per node depending on whether the servers are older or newer. Firmware is kept up to date, and we run the latest LSI compiled driver.
the newer 16 drive backup servers have 2 x 1Gbit LACP teamed interfaces also. [root at lonbaknode3 ~]# uname -r 3.10.0-957.5.1.el7.x86_64 [root at lonbaknode3 ~]# rpm -qa |grep gluster centos-release-gluster5-1.0-1.el7.centos.noarch glusterfs-libs-5.3-2.el7.x86_64 glusterfs-api-5.3-2.el7.x86_64 glusterfs-5.3-2.el7.x86_64 glusterfs-cli-5.3-2.el7.x86_64 glusterfs-client-xlators-5.3-2.el7.x86_64 glusterfs-server-5.3-2.el7.x86_64 glusterfs-fuse-5.3-2.el7.x86_64 [root at lonbaknode3 ~]# [root at lonbaknode3 ~]# gluster volume info all Volume Name: volbackups Type: Distribute Volume ID: 32bf4fe9-5450-49f8-b6aa-05471d3bdffa Status: Started Snapshot Count: 0 Number of Bricks: 8 Transport-type: tcp Bricks: Brick1: lonbaknode3.domain.net:/lvbackups/brick Brick2: lonbaknode4.domain.net:/lvbackups/brick Brick3: lonbaknode5.domain.net:/lvbackups/brick Brick4: lonbaknode6.domain.net:/lvbackups/brick Brick5: lonbaknode7.domain.net:/lvbackups/brick Brick6: lonbaknode8.domain.net:/lvbackups/brick Brick7: lonbaknode9.domain.net:/lvbackups/brick Brick8: lonbaknode10.domain.net:/lvbackups/brick Options Reconfigured: transport.address-family: inet nfs.disable: on cluster.min-free-disk: 1% performance.cache-size: 8GB performance.cache-max-file-size: 128MB diagnostics.brick-log-level: WARNING diagnostics.brick-sys-log-level: WARNING client.event-threads: 3 performance.client-io-threads: on performance.io-thread-count: 24 network.inode-lru-limit: 1048576 performance.parallel-readdir: on performance.cache-invalidation: on performance.md-cache-timeout: 600 features.cache-invalidation: on features.cache-invalidation-timeout: 600 [root at lonbaknode3 ~]# Mount output shows the following: lonbaknode3.domain.net:/volbackups on /home/volbackups type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=1 31072) If you notice anything in our volume or mount settings above missing or otherwise bad feel free to let us know. Still learning this glusterfs. 
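One knob that comes up elsewhere in this thread for fuse crashes under heavy write load is performance.write-behind. A minimal sketch of toggling it — the volume name is taken from the 'gluster volume info' output above, and the block is guarded so it is a no-op on machines without the gluster CLI:

```shell
# Sketch only -- try on a non-production volume first, since this changes
# client-side write caching behavior.
VOL=volbackups  # volume name from the output above
if command -v gluster >/dev/null 2>&1; then
    gluster volume set "$VOL" performance.write-behind off
    # 'volume get' confirms the effective value afterwards
    gluster volume get "$VOL" performance.write-behind
else
    echo "gluster CLI not found; commands shown for reference only"
fi
```

Re-enabling is the same command with `on`, should throughput suffer noticeably with the cache disabled.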
I tried searching for any recommended performance settings, but it's not always clear which setting is most applicable or beneficial to our workload. I have just found this post that looks like it is the same issue. https://lists.gluster.org/pipermail/gluster-users/2019-March/035958.html We have not yet tried the suggestion of "performance.write-behind: off", but we will do so if that is recommended. Could someone knowledgeable advise anything for these issues? If any more information is needed, do let us know. From brandon at thinkhuge.net Mon Mar 18 19:33:44 2019 From: brandon at thinkhuge.net (brandon at thinkhuge.net) Date: Mon, 18 Mar 2019 12:33:44 -0700 Subject: [Gluster-users] Constant fuse client crashes "fixed" by setting performance.write-behind: off. Any hope for a 4.1.8 release? Message-ID: <153001d4ddc1$80a1eff0$81e5cfd0$@thinkhuge.net> Hello Ville-Pekka and list, I believe we are experiencing similar gluster fuse client crashes on 5.3 as mentioned here. This morning I made a post in that regard. https://lists.gluster.org/pipermail/gluster-users/2019-March/036036.html Has this "performance.write-behind: off" setting continued to be all you needed to work around the issue? Thanks, Brandon From vbellur at redhat.com Mon Mar 18 21:02:00 2019 From: vbellur at redhat.com (Vijay Bellur) Date: Mon, 18 Mar 2019 14:02:00 -0700 Subject: [Gluster-users] [Gluster-devel] Different glusterfs clients's data not consistent. In-Reply-To: References: Message-ID: On Mon, Mar 18, 2019 at 1:21 PM ?? <994506334 at qq.com> wrote: > Three nodes: node1, node2, node3 > > Steps: > > 1. gluster volume create volume_test node1:/brick1 > 2. gluster volume set volume_test cluster.server-quorum-ratio 51 > 3. gluster volume set volume_test cluster.server-quorum-type server > 4. On node1, mount -t glusterfs node1:/volume_test /mnt.
> 5. On node2, mount -t glusterfs node2:/volume_test /mnt. > 6. On node1, killall glusterd > 7. On node2, gluster volume add-brick volume_test node2:/brick2 > 8. On node2, mkdir /mnt/test > 9. touch /mnt/test/file1 on two nodes. > > On node1, found /brick1/file1. But on node2, also found /brick2/file1. > Can you please check the output of stat on file1 in both the bricks? There's a good likelihood that one of them is a link file [1]. > > I don't want to set cluster.server-quorum-ratio to 100. > Could you help me to solve this problem? > If it is a link file, a volume rebalance operation might provide the behavior you are looking for. Regards, Vijay [1] http://pl.atyp.us/hekafs.org/index.php/2012/03/glusterfs-algorithms-distribution/ From jthottan at redhat.com Tue Mar 19 03:55:18 2019 From: jthottan at redhat.com (Jiffin Thottan) Date: Mon, 18 Mar 2019 23:55:18 -0400 (EDT) Subject: [Gluster-users] NFS export of gluster - solution In-Reply-To: <4db533c1-3710-31b0-64ca-486c19fb4a63@nyu.edu> References: <4db533c1-3710-31b0-64ca-486c19fb4a63@nyu.edu> Message-ID: <864377152.13828951.1552967718384.JavaMail.zimbra@redhat.com> Thanks Valerio for sharing the information ----- Original Message ----- From: "Valerio Luccio" To: "gluster-users" Sent: Monday, March 18, 2019 8:37:46 PM Subject: [Gluster-users] NFS export of gluster - solution So, I recently started NFS exporting of my gluster so that I could mount it from a legacy Mac OS X server. Every 24/36 hours the export seemed to freeze, causing the server to seize up. The ganesha log was filled with errors related to RQUOTA. Frank Filz of the nfs-ganesha project suggested that I'd try setting "Enable_RQUOTA = false;" in the NFS_CORE_PARAM config block of the ganesha.conf file and that seems to have done the trick, 5
-- Valerio Luccio (212) 998-8736 Center for Brain Imaging 4 Washington Place, Room 157 New York University New York, NY 10003 "In an open world, who needs windows or gates ?" _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users From sankarshan.mukhopadhyay at gmail.com Tue Mar 19 04:37:36 2019 From: sankarshan.mukhopadhyay at gmail.com (Sankarshan Mukhopadhyay) Date: Tue, 19 Mar 2019 10:07:36 +0530 Subject: [Gluster-users] NFS export of gluster - solution In-Reply-To: <864377152.13828951.1552967718384.JavaMail.zimbra@redhat.com> References: <4db533c1-3710-31b0-64ca-486c19fb4a63@nyu.edu> <864377152.13828951.1552967718384.JavaMail.zimbra@redhat.com> Message-ID: On Tue, Mar 19, 2019 at 9:25 AM Jiffin Thottan wrote: > > Thanks Valerio for sharing the information > > ----- Original Message ----- > From: "Valerio Luccio" > To: "gluster-users" > Sent: Monday, March 18, 2019 8:37:46 PM > Subject: [Gluster-users] NFS export of gluster - solution > > So, I recently start NFS exporting of my gluster so that I could mount > it from a legacy Mac OS X server. Every 24/36 hours the export seemed to > freeze causing the server to seize up. The ganesha log was filled with > errors related to RQUOTA. Frank Filz of the nfs-ganesha suggested that > I'd try setting "Enable_RQUOTA = false;" in the NFS_CORE_PARAM config > block of the ganesha.conf file and that seems to have done the trick, 5 > days and counting without a problem. > Does this configuration change need to be updated in any existing documentation (for Gluster, nfs-ganesha)? From atumball at redhat.com Tue Mar 19 04:46:02 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Tue, 19 Mar 2019 10:16:02 +0530 Subject: [Gluster-users] Constant fuse client crashes "fixed" by setting performance.write-behind: off. Any hope for a 4.1.8 release? 
In-Reply-To: <153001d4ddc1$80a1eff0$81e5cfd0$@thinkhuge.net> References: <153001d4ddc1$80a1eff0$81e5cfd0$@thinkhuge.net> Message-ID: Due to this issue, along with a few other logging issues, we did make a glusterfs-5.5 release, which has the fix for this particular crash. Regards, Amar On Tue, 19 Mar, 2019, 1:04 AM , wrote: > Hello Ville-Pekka and list, > > I believe we are experiencing similar gluster fuse client crashes on 5.3 > as mentioned here. This morning I made a post in that regard. > > > > https://lists.gluster.org/pipermail/gluster-users/2019-March/036036.html > > > > Has this "performance.write-behind: off" setting continued to be all you > needed to work around the issue? > > > > Thanks, > > Brandon > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From happe at nbi.dk Tue Mar 19 12:09:01 2019 From: happe at nbi.dk (Hans Henrik Happe) Date: Tue, 19 Mar 2019 13:09:01 +0100 Subject: [Gluster-users] Proposal to mark few features as Deprecated / SunSet from Version 5.0 In-Reply-To: References: Message-ID: Hi, Looking into something else I fell over this proposal. Being a shop that is going into "Leaving GlusterFS" mode, I thought I would give my two cents. While being partially an HPC shop with a few Lustre filesystems, we chose GlusterFS for an archiving solution (2-3 PB), because we could find files in the underlying ZFS filesystems if GlusterFS went sour. We have used the access to the underlying files plenty, because of the continuous instability of GlusterFS. Meanwhile, Lustre has been almost effortless to run, and mainly for that reason we are planning to move away from GlusterFS. Reading this proposal kind of underlined that "Leaving GlusterFS" is the right thing to do.
While I never understood why GlusterFS has been in feature-crazy mode instead of stabilizing mode, taking away crucial features I don't get. With RoCE, RDMA is getting mainstream. Quotas are very useful, even though the current implementation is not perfect. Tiering also makes so much sense, but, for large files, not on a per-file level. To be honest, we only use quotas. We got scared of trying out new performance features that potentially would open up a new batch of issues. Sorry for being such a buzzkill. I really wanted it to be different. Cheers, Hans Henrik On 19/07/2018 08.56, Amar Tumballi wrote: > > Hi all, > > Over the last 12 years of Gluster, we have developed many features, and > continue to support most of them till now. But along the way, we have > figured out better methods of doing things. Also, we are not actively > maintaining some of these features. > > We are now thinking of cleaning up some of these "unsupported" > features, and marking them as "SunSet" (i.e., they would be totally taken out > of the codebase in following releases) in the next upcoming release, v5.0. The > release notes will provide options for smoothly migrating to the > supported configurations. > > If you are using any of these features, do let us know, so that we can > help you with "migration". Also, we are happy to guide new developers > to work on those components which are not actively being maintained by > the current set of developers. > > > List of features hitting sunset: > > > "cluster/stripe" translator: > > This translator was developed very early in the evolution of > GlusterFS, and addressed one of the very common questions about a > distributed FS, which is "What happens if one of my files is bigger > than the available brick? Say, I have a 2 TB hard drive, exported in > glusterfs, and my file is 3 TB." While it solved the purpose, it was very > hard to handle failure scenarios, and give a really good experience to > our users with this feature.
Over time, Gluster solved the problem > with its "Shard" feature, which solves the problem in a much better > way, and provides a much better solution with the existing, well-supported > stack. Hence the proposal for Deprecation. > > If you are using this feature, then do write to us, as it needs a > proper migration from the existing volume to a new fully supported volume > type before you upgrade. > > > "storage/bd" translator: > > This feature got into the code base 5 years back with this patch > [1]. The plan was to use a block device > directly as a brick, which would help to handle disk-image storage > much more easily in glusterfs. > > As the feature is not getting more contributions, and we are not seeing > any user traction on this, we would like to propose it for Deprecation. > > If you are using the feature, plan to move to a supported gluster > volume configuration, and have your setup "supported" before upgrading > to your new gluster version. > > > "RDMA" transport support: > > Gluster started supporting RDMA while ib-verbs was still new, and very > high-end infra around that time was using InfiniBand. Engineers did > work with Mellanox, and got the technology into GlusterFS for better > data migration and data copy. Since current-day kernels support very good > speed with the IPoIB module itself, and there is no more bandwidth for > experts in this area to maintain the feature, we recommend migrating > over to a TCP (IP based) network for your volume. > > If you are successfully using the RDMA transport, do get in touch with us > to prioritize the migration plan for your volume. The plan is to work on > this after the release, so by version 6.0, we will have cleaner > transport code, which just needs to support one type. > > > "Tiering" feature > > Gluster's tiering feature was planned to provide an option > to keep your "hot" data in a different location than your cold data, so > one can get better performance.
While we saw some users for the
> feature, it needs much more attention to be completely bug free. At
> this time, we do not have any active maintainers for the feature,
> and hence suggest taking it out of the 'supported' tag.
>
> If you are willing to take it up, and maintain it, do let us know, and
> we are happy to assist you.
>
> If you are already using the tiering feature, before upgrading, make sure
> to do gluster volume tier detach on all the bricks before upgrading to
> the next release. Also, we recommend you use features like dm-cache on
> your LVM setup to get the best performance from bricks.
>
>
> 'Quota'
>
> This is a call out for the 'Quota' feature, to let you all know that it
> will be in 'no new development' state. While this feature is 'actively'
> in use by many people, the challenges we have in the accounting mechanisms
> involved have made it hard to achieve good performance with the
> feature. Also, the amount of extended attribute get/set operations
> while using the feature is not very ideal. Hence we recommend our
> users to move towards setting quota on backend bricks directly (i.e.,
> XFS project quota), or to use different volumes for different
> directories etc.
>
> As the feature wouldn't be deprecated immediately, the feature doesn't
> need a migration plan when you upgrade to a newer version, but if you
> are a new user, we wouldn't recommend the quota feature. By the
> release dates, we will be publishing our best-alternatives guide for
> gluster's current quota feature.
>
> Note that if you want to contribute to the feature, we have a project
> quota based issue open
> [2]. Happy to get
> contributions, and help in getting a newer approach to Quota.
>
>
> ------------------------------------------------------------------------
>
> These are our set of initial features which we propose to take out of
> 'fully' supported features.
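[Editorial note: for readers planning the migrations described above, the commands involved look roughly like the following sketch. Volume name, brick path, shard block size, and project ID are all hypothetical, and exact CLI syntax varies between Gluster releases, so verify against `gluster volume help` and `man xfs_quota` on your version.]

```shell
# 1. stripe -> shard: sharding is just a volume option on a plain
#    distributed/replicated volume (block size here is illustrative).
gluster volume set bigvol features.shard on
gluster volume set bigvol features.shard-block-size 64MB

# 2. Tiering: detach the hot tier before upgrading past the release
#    that removes the feature.
gluster volume tier bigvol detach start
gluster volume tier bigvol detach status
gluster volume tier bigvol detach commit

# 3. Quota alternative: XFS project quota set directly on a backend
#    brick directory (project ID 42 is arbitrary; the filesystem must
#    be mounted with prjquota).
xfs_quota -x -c 'project -s -p /bricks/brick1/archive 42' /bricks/brick1
xfs_quota -x -c 'limit -p bhard=10g 42' /bricks/brick1
```

Note that the stripe-to-shard step still requires copying data into a newly sharded volume; enabling the option does not re-layout existing files.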
While we are in the process of making the
> user/developer experience of the project much better by providing a
> well-maintained codebase, we may come up with a few more features
> which we may possibly consider to move out of support, and hence keep
> watching this space.
>
> [1] - http://review.gluster.org/4809
>
> [2] - https://github.com/gluster/glusterfs/issues/184
>
> Regards,
>
> Vijay, Shyam, Amar
>
> *
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jim.kinney at gmail.com  Tue Mar 19 12:51:26 2019
From: jim.kinney at gmail.com (Jim Kinney)
Date: Tue, 19 Mar 2019 08:51:26 -0400
Subject: [Gluster-users] Proposal to mark few features as Deprecated /
	SunSet from Version 5.0
In-Reply-To:
References:
Message-ID:

For my uses, the RDMA transport is essential. Much of my storage is used
for HPC systems and IB is the network layer. We still use v3.12.

Issues with glusterfs fuse mounts cause issues with python file open for
write. We have to use nfs to avoid this.

Really want to see better back-end tools to facilitate cleaning up of
glusterfs failures. If the system is going to use hard-linked IDs, we need
a mapping of ID to file to fix things. That option is now on for all
exports. It should be the default. If a host is down and users delete
files by the thousands, gluster _never_ catches up. Finding path names for
IDs across even a 40TB mount, much less the 200+TB one, is a slow process.
A network outage of 2 minutes and one system didn't get the call to
recursively delete several dozen directories, each with several thousand
files.

On March 19, 2019 8:09:01 AM EDT, Hans Henrik Happe wrote:
>Hi,
>
>Looking into something else I fell over this proposal. Being a shop
>that
>are going into "Leaving GlusterFS" mode, I thought I would give my two
>cents.
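[Editorial note: on Jim's point about mapping a hard-linked ID (GFID) back to a filename: on a brick, every regular file has a hardlink under `.glusterfs/<aa>/<bb>/<full-gfid>`, so a path can be recovered by searching for files that share that inode. A minimal sketch follows; the brick layout is simulated in a temporary directory and the GFID value is made up.]

```shell
# Simulate a brick: .glusterfs/<aa>/<bb>/<gfid> is a hardlink to the
# user-visible file, sharing its inode.
brick=$(mktemp -d)
gfid=d81f9ffa-1234-5678-9abc-def012345678
mkdir -p "$brick/data/project" "$brick/.glusterfs/d8/1f"
echo "payload" > "$brick/data/project/report.txt"
ln "$brick/data/project/report.txt" "$brick/.glusterfs/d8/1f/$gfid"

# Resolve GFID -> path by matching the inode, excluding the
# .glusterfs tree itself.
find "$brick" -samefile "$brick/.glusterfs/d8/1f/$gfid" \
    -not -path '*/.glusterfs/*'
```

This inode scan visits every file on the brick, which is consistent with Jim's observation that resolving IDs on multi-hundred-TB mounts is slow; the gfid2path xattr option he alludes to exists precisely to avoid such scans.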
>[...]

-- 
Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From atumball at redhat.com  Tue Mar 19 13:10:26 2019
From: atumball at redhat.com (Amar Tumballi Suryanarayan)
Date: Tue, 19 Mar 2019 18:40:26 +0530
Subject: [Gluster-users] Proposal to mark few features as Deprecated /
	SunSet from Version 5.0
In-Reply-To:
References:
Message-ID:

Hi Hans,

Thanks for the honest feedback. Appreciate this.

On Tue, Mar 19, 2019 at 5:39 PM Hans Henrik Happe wrote:

> Hi,
>
> Looking into something else I fell over this proposal.
Being a shop that
> are going into "Leaving GlusterFS" mode, I thought I would give my two
> cents.
>
> While being partially an HPC shop with a few Lustre filesystems, we chose
> GlusterFS for an archiving solution (2-3 PB), because we could find files
> in the underlying ZFS filesystems if GlusterFS went sour.
>
> We have used the access to the underlying files plenty, because of the
> continuous instability of GlusterFS. Meanwhile, Lustre has been almost
> effortless to run, and mainly for that reason we are planning to move away
> from GlusterFS.
>
> Reading this proposal kind of underlined that "Leaving GlusterFS" is the
> right thing to do. While I never understood why GlusterFS has been in
> feature-crazy mode instead of stabilizing mode, taking away crucial
> features is something I don't get. With RoCE, RDMA is getting mainstream.
> Quotas are very useful, even though the current implementation is not
> perfect. Tiering also makes so much sense, but, for large files, not on a
> per-file level.

It is a fair concern to raise, and removing existing features is not a
good thing most of the time. But one thing we noticed over the years is
that the features which we develop and do not take to completion cause
the major heartburn. People think a feature is present, and it has
already been a few years since it was introduced, but if the developers
are not working on it, users will always feel that the product doesn't
work, because that one feature didn't work.

Other than Quota in the proposal email, for all other features, even
though we have *some* users, we are inclined towards deprecating them,
considering the project's overall goals of stability in the longer run.

> To be honest we only use quotas. We got scared of trying out new
> performance features that potentially would open up a new bag of issues.

About Quota, we heard enough voices, so we will make sure we keep it.
The original email was a 'Proposal', and hence these opinions matter for
the decision.
> Sorry for being such a buzzkill. I really wanted it to be different.

We hear you. Please let us know one thing: which versions have you
tried?

We hope that in the coming months, our recent focus on stability and
technical-debt reduction will help you take another look at Gluster.

> Cheers,
> Hans Henrik
>
> On 19/07/2018 08.56, Amar Tumballi wrote:
> [...]

-- 
Amar Tumballi (amarts)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From atumball at redhat.com  Tue Mar 19 13:12:25 2019
From: atumball at redhat.com (Amar Tumballi Suryanarayan)
Date: Tue, 19 Mar 2019 18:42:25 +0530
Subject: [Gluster-users] Proposal to mark few features as Deprecated /
	SunSet from Version 5.0
In-Reply-To:
References:
Message-ID:

Hi Jim,

On Tue, Mar 19, 2019 at 6:21 PM Jim Kinney wrote:

> Issues with glusterfs fuse mounts cause issues with python file open for
> write. We have to use nfs to avoid this.
>
> Really want to see better back-end tools to facilitate cleaning up of
> glusterfs failures. If the system is going to use hard-linked IDs, we
> need a mapping of ID to file to fix things. That option is now on for all
> exports. It should be the default. If a host is down and users delete
> files by the thousands, gluster _never_ catches up. Finding path names
> for IDs across even a 40TB mount, much less the 200+TB one, is a slow
> process. A network outage of 2 minutes and one system didn't get the
> call to recursively delete several dozen directories, each with several
> thousand files.

Are you talking about some issues in the geo-replication module, or some
other application using the native mount? Happy to take the discussion
forward about these issues. Are there any bugs open on this?

Thanks,
Amar

> On March 19, 2019 8:09:01 AM EDT, Hans Henrik Happe wrote:
>>
>> Hi,
>>
>> Looking into something else I fell over this proposal. Being a shop that
>> are going into "Leaving GlusterFS" mode, I thought I would give my two
>> cents.
>> [...]
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>> --
>> Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity.

-- 
Amar Tumballi (amarts)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From happe at nbi.dk  Tue Mar 19 14:12:15 2019
From: happe at nbi.dk (Hans Henrik Happe)
Date: Tue, 19 Mar 2019 15:12:15 +0100
Subject: [Gluster-users] Proposal to mark few features as Deprecated /
	SunSet from Version 5.0
In-Reply-To:
References:
Message-ID:

On 19/03/2019 14.10, Amar Tumballi Suryanarayan wrote:
> Hi Hans,
>
> Thanks for the honest feedback. Appreciate this.
>
> On Tue, Mar 19, 2019 at 5:39 PM Hans Henrik Happe wrote:
>
> > Hi,
> >
> > Looking into something else I fell over this proposal.
Being a > shop that are going into "Leaving GlusterFS" mode, I thought I > would give my two cents. > > While being partially an HPC shop with a few Lustre filesystems,? > we chose GlusterFS for an archiving solution (2-3 PB), because we > could find files in the underlying ZFS filesystems if GlusterFS > went sour. > > We have used the access to the underlying files plenty, because of > the continuous instability of GlusterFS'. Meanwhile, Lustre have > been almost effortless to run and mainly for that reason we are > planning to move away from GlusterFS. > > Reading this proposal kind of underlined that "Leaving GluserFS" > is the right thing to do. While I never understood why GlusterFS > has been in feature crazy mode instead of stabilizing mode, taking > away crucial features I don't get. With RoCE, RDMA is getting > mainstream. Quotas are very useful, even though the current > implementation are not perfect. Tiering also makes so much sense, > but, for large files, not on a per-file level. > > > It is a right concern to raise, and removing the existing features is > not a good thing most of the times. But, one thing we noticed over the > years is, the features which we develop, and not take to completion > cause the major heart-burn. People think it is present, and it is > already few years since its introduced, but if the developers are not > working on it, users would always feel that the product doesn't work, > because that one feature didn't work.? > > Other than Quota in the proposal email, for all other features, even > though we have *some* users, we are inclined towards deprecating them, > considering projects overall goals of stability in the longer run. > ? > > To be honest we only use quotas. We got scared of trying out new > performance features that potentially would open up a new back of > issues. > > About Quota, we heard enough voices, so we will make sure we keep it. 
> The original email was 'Proposal', and hence these opinions matter for > decision. > > Sorry for being such a buzzkill. I really wanted it to be different. > > We hear you. Please let us know one thing, which were the versions you > tried ? > We started at 3.6 4 years ago. Now we are at 3.12.15, working towards moving to 4.1.latest. > We hope in coming months, our recent focus on Stability and Technical > debt reduction will help you to re-look at Gluster after sometime. That's great to hear. > > Cheers, > Hans Henrik > > On 19/07/2018 08.56, Amar Tumballi wrote: >> * >> >> Hi all, >> >> Over last 12 years of Gluster, we have developed many features, >> and continue to support most of it till now. But along the way, >> we have figured out better methods of doing things. Also we are >> not actively maintaining some of these features. >> >> We are now thinking of cleaning up some of these ?unsupported? >> features, and mark them as ?SunSet? (i.e., would be totally taken >> out of codebase in following releases) in next upcoming release, >> v5.0. The release notes will provide options for smoothly >> migrating to the supported configurations. >> >> If you are using any of these features, do let us know, so that >> we can help you with ?migration?.. Also, we are happy to guide >> new developers to work on those components which are not actively >> being maintained by current set of developers. >> >> >> List of features hitting sunset: >> >> >> ?cluster/stripe? translator: >> >> This translator was developed very early in the evolution of >> GlusterFS, and addressed one of the very common question of >> Distributed FS, which is ?What happens if one of my file is >> bigger than the available brick. Say, I have 2 TB hard drive, >> exported in glusterfs, my file is 3 TB?. While it solved the >> purpose, it was very hard to handle failure scenarios, and give a >> real good experience to our users with this feature. 
Over >> time, Gluster solved the problem with its "Shard" feature, which >> solves the problem in a much better way, and provides a much better >> solution on the existing, well-supported stack. Hence the proposal >> for deprecation. >> >> If you are using this feature, then do write to us, as it needs a >> proper migration from the existing volume to a new fully supported >> volume type before you upgrade. >> >> >> "storage/bd" translator: >> >> This feature got into the code base 5 years back with this patch >> [1]. The plan was to use a block >> device directly as a brick, which would help to handle disk-image >> storage much more easily in glusterfs. >> >> As the feature is not getting more contributions, and we are not >> seeing any user traction on this, we would like to propose it for >> deprecation. >> >> If you are using the feature, plan to move to a supported gluster >> volume configuration, and have your setup "supported" before >> upgrading to your new gluster version. >> >> >> "RDMA" transport support: >> >> Gluster started supporting RDMA while ib-verbs was still new, and >> very high-end infra around that time was using Infiniband. >> Engineers did work with Mellanox, and got the technology into >> GlusterFS for better data migration and data copy. While current day >> kernels support very good speed with the IPoIB module itself, and >> there is no more bandwidth for experts in this area to maintain >> the feature, we recommend migrating over to a TCP (IP based) >> network for your volume. >> >> If you are successfully using RDMA transport, do get in touch >> with us to prioritize the migration plan for your volume. The plan is >> to work on this after the release, so by version 6.0, we will >> have cleaner transport code, which just needs to support one type. >> >> >> "Tiering" feature >> >> Gluster's tiering feature was planned to provide an >> option to keep your "hot" data in a different location than your >> cold data, so one can get better performance. 
While we saw some >> users for the feature, it needs much more attention to be >> completely bug free. At this time, we do not have any active >> maintainers for the feature, and hence suggest taking it out >> of the "supported" tag. >> >> If you are willing to take it up and maintain it, do let us >> know, and we are happy to assist you. >> >> If you are already using the tiering feature, before upgrading, make >> sure to do a gluster volume tier detach on all the bricks before >> upgrading to the next release. Also, we recommend you use features >> like dm-cache on your LVM setup to get the best performance from bricks. >> >> >> "Quota" >> >> This is a call out for the "Quota" feature, to let you all know that >> it will be in a "no new development" state. While this feature is >> "actively" in use by many people, the challenges we have in the >> accounting mechanisms involved have made it hard to achieve good >> performance with the feature. Also, the amount of extended >> attribute get/set operations while using the feature is not very >> ideal. Hence we recommend our users to move towards setting quota >> on backend bricks directly (i.e., XFS project quota), or to use >> different volumes for different directories etc. >> >> As the feature wouldn't be deprecated immediately, the feature >> doesn't need a migration plan when you upgrade to a newer version, >> but if you are a new user, we wouldn't recommend setting the quota >> feature. By the release dates, we will be publishing our best >> alternatives guide for gluster's current quota feature. >> >> Note that if you want to contribute to the feature, we have a >> project quota based issue open >> [2]. Happy to get >> contributions, and help in getting a newer approach to Quota. >> >> >> ------------------------------------------------------------------------ >> >> These are our set of initial features which we propose to take >> out of "fully" supported features. 
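[Editor's note: for readers weighing the XFS project quota alternative recommended above, here is a minimal sketch of setting a per-directory limit directly on a brick filesystem. The device, brick path, project id, project name, and 1 TB limit are all hypothetical placeholders — adjust them to your own layout.]

```shell
# Mount the brick filesystem with project quota accounting enabled.
mount -o prjquota /dev/sdb1 /gluster/brick1

# Map a project id (42) to the directory to be limited, and give it a name.
echo "42:/gluster/brick1/data" >> /etc/projects
echo "archive:42" >> /etc/projid

# Initialize the project and set a hard block limit of 1 TB on it.
xfs_quota -x -c 'project -s archive' /gluster/brick1
xfs_quota -x -c 'limit -p bhard=1t archive' /gluster/brick1

# Report per-project usage to verify.
xfs_quota -x -c 'report -p' /gluster/brick1
```

Since this accounting happens in the kernel on the brick, it avoids the extended-attribute get/set traffic mentioned above, but note it is per-brick: on a distributed volume the limit applies to each brick's share of the data, not the volume-wide total.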
While we are in the process of >> making the user/developer experience of the project much better >> by providing a well-maintained codebase, we may come up with a few >> more sets of features which we may possibly consider moving out >> of support, and hence keep watching this space. >> >> [1] - http://review.gluster.org/4809 >> >> [2] - https://github.com/gluster/glusterfs/issues/184 >> >> Regards, >> >> Vijay, Shyam, Amar >> >> * >> >> >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From archon810 at gmail.com Tue Mar 19 14:53:09 2019 From: archon810 at gmail.com (Artem Russakovskii) Date: Tue, 19 Mar 2019 07:53:09 -0700 Subject: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP In-Reply-To: References: <2ed171d2-df68-ada3-e0de-53f19cb79520@redhat.com> Message-ID: The flood is indeed fixed for us on 5.5. However, the crashes are not. Sincerely, Artem -- Founder, Android Police , APK Mirror , Illogical Robot LLC beerpla.net | +ArtemRussakovskii | @ArtemR On Mon, Mar 18, 2019 at 5:41 AM Hu Bert wrote: > Hi Amar, > > if you refer to this bug: > https://bugzilla.redhat.com/show_bug.cgi?id=1674225 : in the test > setup i haven't seen those entries, while copying & deleting a few GBs > of data. For a final statement we have to wait until i updated our > live gluster servers - could take place on tuesday or wednesday. > > Maybe other users can do an update to 5.4 as well and report back here. > > > Hubert > > > > Am Mo., 18. März 2019 um 11:36 Uhr schrieb Amar Tumballi Suryanarayan > : > > > > Hi Hu Bert, > > > > Appreciate the feedback. Also are the other boiling issues related to > logs fixed now? 
> > > > -Amar > > > > On Mon, Mar 18, 2019 at 3:54 PM Hu Bert wrote: > >> > >> update: upgrade from 5.3 -> 5.5 in a replicate 3 test setup with 2 > >> volumes done. In 'gluster peer status' the peers stay connected during > >> the upgrade, no 'peer rejected' messages. No cksum mismatches in the > >> logs. Looks good :-) > >> > >> Am Mo., 18. M?rz 2019 um 09:54 Uhr schrieb Hu Bert < > revirii at googlemail.com>: > >> > > >> > Good morning :-) > >> > > >> > for debian the packages are there: > >> > > https://download.gluster.org/pub/gluster/glusterfs/5/5.5/Debian/stretch/amd64/apt/pool/main/g/glusterfs/ > >> > > >> > I'll do an upgrade of a test installation 5.3 -> 5.5 and see if there > >> > are some errors etc. and report back. > >> > > >> > btw: no release notes for 5.4 and 5.5 so far? > >> > https://docs.gluster.org/en/latest/release-notes/ ? > >> > > >> > Am Fr., 15. M?rz 2019 um 14:28 Uhr schrieb Shyam Ranganathan > >> > : > >> > > > >> > > We created a 5.5 release tag, and it is under packaging now. It > should > >> > > be packaged and ready for testing early next week and should be > released > >> > > close to mid-week next week. > >> > > > >> > > Thanks, > >> > > Shyam > >> > > On 3/13/19 12:34 PM, Artem Russakovskii wrote: > >> > > > Wednesday now with no update :-/ > >> > > > > >> > > > Sincerely, > >> > > > Artem > >> > > > > >> > > > -- > >> > > > Founder, Android Police , APK > Mirror > >> > > > , Illogical Robot LLC > >> > > > beerpla.net | +ArtemRussakovskii > >> > > > | @ArtemR > >> > > > > >> > > > > >> > > > > >> > > > On Tue, Mar 12, 2019 at 10:28 AM Artem Russakovskii < > archon810 at gmail.com > >> > > > > wrote: > >> > > > > >> > > > Hi Amar, > >> > > > > >> > > > Any updates on this? I'm still not seeing it in OpenSUSE build > >> > > > repos. Maybe later today? > >> > > > > >> > > > Thanks. 
> >> > > > > >> > > > Sincerely, > >> > > > Artem > >> > > > > >> > > > -- > >> > > > Founder, Android Police , APK > Mirror > >> > > > , Illogical Robot LLC > >> > > > beerpla.net | +ArtemRussakovskii > >> > > > | @ArtemR > >> > > > > >> > > > > >> > > > > >> > > > On Wed, Mar 6, 2019 at 10:30 PM Amar Tumballi Suryanarayan > >> > > > > wrote: > >> > > > > >> > > > We are talking days. Not weeks. Considering already it is > >> > > > Thursday here. 1 more day for tagging, and packaging. May > be ok > >> > > > to expect it on Monday. > >> > > > > >> > > > -Amar > >> > > > > >> > > > On Thu, Mar 7, 2019 at 11:54 AM Artem Russakovskii > >> > > > > wrote: > >> > > > > >> > > > Is the next release going to be an imminent hotfix, > i.e. > >> > > > something like today/tomorrow, or are we talking > weeks? > >> > > > > >> > > > Sincerely, > >> > > > Artem > >> > > > > >> > > > -- > >> > > > Founder, Android Police , > APK > >> > > > Mirror , Illogical Robot > LLC > >> > > > beerpla.net | > +ArtemRussakovskii > >> > > > | > @ArtemR > >> > > > > >> > > > > >> > > > > >> > > > On Tue, Mar 5, 2019 at 11:09 AM Artem Russakovskii > >> > > > > > wrote: > >> > > > > >> > > > Ended up downgrading to 5.3 just in case. Peer > status > >> > > > and volume status are OK now. > >> > > > > >> > > > zypper install --oldpackage > glusterfs-5.3-lp150.100.1 > >> > > > Loading repository data... > >> > > > Reading installed packages... > >> > > > Resolving package dependencies... 
> >> > > > > >> > > > Problem: glusterfs-5.3-lp150.100.1.x86_64 requires > >> > > > libgfapi0 = 5.3, but this requirement cannot be > provided > >> > > > not installable providers: > >> > > > libgfapi0-5.3-lp150.100.1.x86_64[glusterfs] > >> > > > Solution 1: Following actions will be done: > >> > > > downgrade of libgfapi0-5.4-lp150.100.1.x86_64 to > >> > > > libgfapi0-5.3-lp150.100.1.x86_64 > >> > > > downgrade of > libgfchangelog0-5.4-lp150.100.1.x86_64 to > >> > > > libgfchangelog0-5.3-lp150.100.1.x86_64 > >> > > > downgrade of libgfrpc0-5.4-lp150.100.1.x86_64 to > >> > > > libgfrpc0-5.3-lp150.100.1.x86_64 > >> > > > downgrade of libgfxdr0-5.4-lp150.100.1.x86_64 to > >> > > > libgfxdr0-5.3-lp150.100.1.x86_64 > >> > > > downgrade of > libglusterfs0-5.4-lp150.100.1.x86_64 to > >> > > > libglusterfs0-5.3-lp150.100.1.x86_64 > >> > > > Solution 2: do not install > glusterfs-5.3-lp150.100.1.x86_64 > >> > > > Solution 3: break > glusterfs-5.3-lp150.100.1.x86_64 by > >> > > > ignoring some of its dependencies > >> > > > > >> > > > Choose from above solutions by number or cancel > >> > > > [1/2/3/c] (c): 1 > >> > > > Resolving dependencies... > >> > > > Resolving package dependencies... > >> > > > > >> > > > The following 6 packages are going to be > downgraded: > >> > > > glusterfs libgfapi0 libgfchangelog0 libgfrpc0 > >> > > > libgfxdr0 libglusterfs0 > >> > > > > >> > > > 6 packages to downgrade. > >> > > > > >> > > > Sincerely, > >> > > > Artem > >> > > > > >> > > > -- > >> > > > Founder, Android Police > >> > > > , APK Mirror > >> > > > , Illogical Robot LLC > >> > > > beerpla.net | > +ArtemRussakovskii > >> > > > | > @ArtemR > >> > > > > >> > > > > >> > > > > >> > > > On Tue, Mar 5, 2019 at 10:57 AM Artem Russakovskii > >> > > > > > wrote: > >> > > > > >> > > > Noticed the same when upgrading from 5.3 to > 5.4, as > >> > > > mentioned. > >> > > > > >> > > > I'm confused though. 
Is actual replication > affected, > >> > > > because the 5.4 server and the 3x 5.3 servers > still > >> > > > show heal info as all 4 connected, and the > files > >> > > > seem to be replicating correctly as well. > >> > > > > >> > > > So what's actually affected - just the status > >> > > > command, or leaving 5.4 on one of the nodes > is doing > >> > > > some damage to the underlying fs? Is it > fixable by > >> > > > tweaking transport.socket.ssl-enabled? Does > >> > > > upgrading all servers to 5.4 resolve it, or > should > >> > > > we revert back to 5.3? > >> > > > > >> > > > Sincerely, > >> > > > Artem > >> > > > > >> > > > -- > >> > > > Founder, Android Police > >> > > > , APK Mirror > >> > > > , Illogical Robot > LLC > >> > > > beerpla.net | > >> > > > +ArtemRussakovskii > >> > > > > >> > > > | @ArtemR > >> > > > > >> > > > > >> > > > On Tue, Mar 5, 2019 at 2:02 AM Hu Bert > >> > > > >> > > > > wrote: > >> > > > > >> > > > fyi: did a downgrade 5.4 -> 5.3 and it > worked. > >> > > > all replicas are up and > >> > > > running. Awaiting updated v5.4. > >> > > > > >> > > > thx :-) > >> > > > > >> > > > Am Di., 5. M?rz 2019 um 09:26 Uhr schrieb > Hari > >> > > > Gowtham >> > > > >: > >> > > > > > >> > > > > There are plans to revert the patch > causing > >> > > > this error and rebuilt 5.4. > >> > > > > This should happen faster. the rebuilt > 5.4 > >> > > > should be void of this upgrade issue. > >> > > > > > >> > > > > In the meantime, you can use 5.3 for > this cluster. > >> > > > > Downgrading to 5.3 will work if it was > just > >> > > > one node that was upgrade to 5.4 > >> > > > > and the other nodes are still in 5.3. > >> > > > > > >> > > > > On Tue, Mar 5, 2019 at 1:07 PM Hu Bert > >> > > > >> > > > > wrote: > >> > > > > > > >> > > > > > Hi Hari, > >> > > > > > > >> > > > > > thx for the hint. Do you know when > this will > >> > > > be fixed? Is a downgrade > >> > > > > > 5.4 -> 5.3 a possibility to fix this? 
> >> > > > > > > >> > > > > > Hubert > >> > > > > > > >> > > > > > Am Di., 5. M?rz 2019 um 08:32 Uhr > schrieb > >> > > > Hari Gowtham >> > > > >: > >> > > > > > > > >> > > > > > > Hi, > >> > > > > > > > >> > > > > > > This is a known issue we are > working on. > >> > > > > > > As the checksum differs between the > >> > > > updated and non updated node, the > >> > > > > > > peers are getting rejected. > >> > > > > > > The bricks aren't coming because of > the > >> > > > same issue. > >> > > > > > > > >> > > > > > > More about the issue: > >> > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1685120 > >> > > > > > > > >> > > > > > > On Tue, Mar 5, 2019 at 12:56 PM Hu > Bert > >> > > > >> > > > > wrote: > >> > > > > > > > > >> > > > > > > > Interestingly: gluster volume > status > >> > > > misses gluster1, while heal > >> > > > > > > > statistics show gluster1: > >> > > > > > > > > >> > > > > > > > gluster volume status workdata > >> > > > > > > > Status of volume: workdata > >> > > > > > > > Gluster process > >> > > > TCP Port RDMA Port Online Pid > >> > > > > > > > > >> > > > > ------------------------------------------------------------------------------ > >> > > > > > > > Brick > gluster2:/gluster/md4/workdata > >> > > > 49153 0 Y 1723 > >> > > > > > > > Brick > gluster3:/gluster/md4/workdata > >> > > > 49153 0 Y 2068 > >> > > > > > > > Self-heal Daemon on localhost > >> > > > N/A N/A Y 1732 > >> > > > > > > > Self-heal Daemon on gluster3 > >> > > > N/A N/A Y 2077 > >> > > > > > > > > >> > > > > > > > vs. 
> >> > > > > > > > > >> > > > > > > > gluster volume heal workdata > statistics > >> > > > heal-count > >> > > > > > > > Gathering count of entries to be > healed > >> > > > on volume workdata has been successful > >> > > > > > > > > >> > > > > > > > Brick > gluster1:/gluster/md4/workdata > >> > > > > > > > Number of entries: 0 > >> > > > > > > > > >> > > > > > > > Brick > gluster2:/gluster/md4/workdata > >> > > > > > > > Number of entries: 10745 > >> > > > > > > > > >> > > > > > > > Brick > gluster3:/gluster/md4/workdata > >> > > > > > > > Number of entries: 10744 > >> > > > > > > > > >> > > > > > > > Am Di., 5. M?rz 2019 um 08:18 Uhr > >> > > > schrieb Hu Bert >> > > > >: > >> > > > > > > > > > >> > > > > > > > > Hi Miling, > >> > > > > > > > > > >> > > > > > > > > well, there are such entries, > but > >> > > > those haven't been a problem during > >> > > > > > > > > install and the last kernel > >> > > > update+reboot. The entries look like: > >> > > > > > > > > > >> > > > > > > > > PUBLIC_IP > gluster2.alpserver.de > >> > > > gluster2 > >> > > > > > > > > > >> > > > > > > > > 192.168.0.50 gluster1 > >> > > > > > > > > 192.168.0.51 gluster2 > >> > > > > > > > > 192.168.0.52 gluster3 > >> > > > > > > > > > >> > > > > > > > > 'ping gluster2' resolves to LAN > IP; I > >> > > > removed the last entry in the > >> > > > > > > > > 1st line, did a reboot ... no, > didn't > >> > > > help. From > >> > > > > > > > > /var/log/glusterfs/glusterd.log > >> > > > > > > > > on gluster 2: > >> > > > > > > > > > >> > > > > > > > > [2019-03-05 07:04:36.188128] E > [MSGID: > >> > > > 106010] > >> > > > > > > > > > >> > > > > [glusterd-utils.c:3483:glusterd_compare_friend_volume] > >> > > > 0-management: > >> > > > > > > > > Version of Cksums persistent > differ. 
> >> > > > local cksum = 3950307018, remote > >> > > > > > > > > cksum = 455409345 on peer > gluster1 > >> > > > > > > > > [2019-03-05 07:04:36.188314] I > [MSGID: > >> > > > 106493] > >> > > > > > > > > > >> > > > > [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] > >> > > > 0-glusterd: > >> > > > > > > > > Responded to gluster1 (0), ret: > 0, > >> > > > op_ret: -1 > >> > > > > > > > > > >> > > > > > > > > Interestingly there are no > entries in > >> > > > the brick logs of the rejected > >> > > > > > > > > server. Well, not surprising as > no > >> > > > brick process is running. The > >> > > > > > > > > server gluster1 is still in > rejected > >> > > > state. > >> > > > > > > > > > >> > > > > > > > > 'gluster volume start workdata > force' > >> > > > starts the brick process on > >> > > > > > > > > gluster1, and some heals are > happening > >> > > > on gluster2+3, but via 'gluster > >> > > > > > > > > volume status workdata' the > volumes > >> > > > still aren't complete. > >> > > > > > > > > > >> > > > > > > > > gluster1: > >> > > > > > > > > > >> > > > > ------------------------------------------------------------------------------ > >> > > > > > > > > Brick > gluster1:/gluster/md4/workdata > >> > > > 49152 0 Y 2523 > >> > > > > > > > > Self-heal Daemon on localhost > >> > > > N/A N/A Y 2549 > >> > > > > > > > > > >> > > > > > > > > gluster2: > >> > > > > > > > > Gluster process > >> > > > TCP Port RDMA Port Online Pid > >> > > > > > > > > > >> > > > > ------------------------------------------------------------------------------ > >> > > > > > > > > Brick > gluster2:/gluster/md4/workdata > >> > > > 49153 0 Y 1723 > >> > > > > > > > > Brick > gluster3:/gluster/md4/workdata > >> > > > 49153 0 Y 2068 > >> > > > > > > > > Self-heal Daemon on localhost > >> > > > N/A N/A Y 1732 > >> > > > > > > > > Self-heal Daemon on gluster3 > >> > > > N/A N/A Y 2077 > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > Hubert > >> > > > > > > > > > >> > > > > > > 
> > Am Di., 5. M?rz 2019 um 07:58 > Uhr > >> > > > schrieb Milind Changire < > mchangir at redhat.com > >> > > > >: > >> > > > > > > > > > > >> > > > > > > > > > There are probably DNS > entries or > >> > > > /etc/hosts entries with the public IP > Addresses > >> > > > that the host names (gluster1, gluster2, > >> > > > gluster3) are getting resolved to. > >> > > > > > > > > > /etc/resolv.conf would tell > which is > >> > > > the default domain searched for the node > names > >> > > > and the DNS servers which respond to the > queries. > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > On Tue, Mar 5, 2019 at 12:14 > PM Hu > >> > > > Bert >> > > > > wrote: > >> > > > > > > > > >> > >> > > > > > > > > >> Good morning, > >> > > > > > > > > >> > >> > > > > > > > > >> i have a replicate 3 setup > with 2 > >> > > > volumes, running on version 5.3 on > >> > > > > > > > > >> debian stretch. This morning > i > >> > > > upgraded one server to version 5.4 and > >> > > > > > > > > >> rebooted the machine; after > the > >> > > > restart i noticed that: > >> > > > > > > > > >> > >> > > > > > > > > >> - no brick process is running > >> > > > > > > > > >> - gluster volume status only > shows > >> > > > the server itself: > >> > > > > > > > > >> gluster volume status > workdata > >> > > > > > > > > >> Status of volume: workdata > >> > > > > > > > > >> Gluster process > >> > > > TCP Port RDMA Port Online Pid > >> > > > > > > > > >> > >> > > > > ------------------------------------------------------------------------------ > >> > > > > > > > > >> Brick > >> > > > gluster1:/gluster/md4/workdata N/A > >> > > > N/A N N/A > >> > > > > > > > > >> NFS Server on localhost > >> > > > N/A N/A N N/A > >> > > > > > > > > >> > >> > > > > > > > > >> - gluster peer status on the > server > >> > > > > > > > > >> gluster peer status > >> > > > > > > > > >> Number of Peers: 2 > >> > > > > > > > > >> > >> > > > > > > > > >> Hostname: gluster3 > >> > > > > > > > > >> Uuid: > >> > > > 
c7b4a448-ca6a-4051-877f-788f9ee9bc4a > >> > > > > > > > > >> State: Peer Rejected > (Connected) > >> > > > > > > > > >> > >> > > > > > > > > >> Hostname: gluster2 > >> > > > > > > > > >> Uuid: > >> > > > 162fea82-406a-4f51-81a3-e90235d8da27 > >> > > > > > > > > >> State: Peer Rejected > (Connected) > >> > > > > > > > > >> > >> > > > > > > > > >> - gluster peer status on the > other > >> > > > 2 servers: > >> > > > > > > > > >> gluster peer status > >> > > > > > > > > >> Number of Peers: 2 > >> > > > > > > > > >> > >> > > > > > > > > >> Hostname: gluster1 > >> > > > > > > > > >> Uuid: > >> > > > 9a360776-7b58-49ae-831e-a0ce4e4afbef > >> > > > > > > > > >> State: Peer Rejected > (Connected) > >> > > > > > > > > >> > >> > > > > > > > > >> Hostname: gluster3 > >> > > > > > > > > >> Uuid: > >> > > > c7b4a448-ca6a-4051-877f-788f9ee9bc4a > >> > > > > > > > > >> State: Peer in Cluster > (Connected) > >> > > > > > > > > >> > >> > > > > > > > > >> I noticed that, in the brick > logs, > >> > > > i see that the public IP is used > >> > > > > > > > > >> instead of the LAN IP. brick > logs > >> > > > from one of the volumes: > >> > > > > > > > > >> > >> > > > > > > > > >> rejected node: > >> > > > https://pastebin.com/qkpj10Sd > >> > > > > > > > > >> connected nodes: > >> > > > https://pastebin.com/8SxVVYFV > >> > > > > > > > > >> > >> > > > > > > > > >> Why is the public IP > suddenly used > >> > > > instead of the LAN IP? Killing all > >> > > > > > > > > >> gluster processes and > rebooting > >> > > > (again) didn't help. 
> >> > > > > > > > > >> > >> > > > > > > > > >> > >> > > > > > > > > >> Thx, > >> > > > > > > > > >> Hubert > >> > > > > > > > > >> > >> > > > > _______________________________________________ > >> > > > > > > > > >> Gluster-users mailing list > >> > > > > > > > > >> Gluster-users at gluster.org > >> > > > > >> > > > > > > > > >> > >> > > > > https://lists.gluster.org/mailman/listinfo/gluster-users > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > -- > >> > > > > > > > > > Milind > >> > > > > > > > > > > >> > > > > > > > > >> > > > > _______________________________________________ > >> > > > > > > > Gluster-users mailing list > >> > > > > > > > Gluster-users at gluster.org > >> > > > > >> > > > > > > > > >> > > > > https://lists.gluster.org/mailman/listinfo/gluster-users > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > -- > >> > > > > > > Regards, > >> > > > > > > Hari Gowtham. > >> > > > > > >> > > > > > >> > > > > > >> > > > > -- > >> > > > > Regards, > >> > > > > Hari Gowtham. 
> >> > > > > _______________________________________________ > >> > > > Gluster-users mailing list > >> > > > Gluster-users at gluster.org > >> > > > > >> > > > > https://lists.gluster.org/mailman/listinfo/gluster-users > >> > > > > >> > > > _______________________________________________ > >> > > > Gluster-users mailing list > >> > > > Gluster-users at gluster.org Gluster-users at gluster.org> > >> > > > > https://lists.gluster.org/mailman/listinfo/gluster-users > >> > > > > >> > > > > >> > > > > >> > > > -- > >> > > > Amar Tumballi (amarts) > >> > > > > >> > > > > >> > > > _______________________________________________ > >> > > > Gluster-users mailing list > >> > > > Gluster-users at gluster.org > >> > > > https://lists.gluster.org/mailman/listinfo/gluster-users > >> > > > > >> > > _______________________________________________ > >> > > Gluster-users mailing list > >> > > Gluster-users at gluster.org > >> > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > -- > > Amar Tumballi (amarts) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From archon810 at gmail.com Tue Mar 19 14:54:39 2019 From: archon810 at gmail.com (Artem Russakovskii) Date: Tue, 19 Mar 2019 07:54:39 -0700 Subject: [Gluster-users] Constant fuse client crashes "fixed" by setting performance.write-behind: off. Any hope for a 4.1.8 release? In-Reply-To: References: <153001d4ddc1$80a1eff0$81e5cfd0$@thinkhuge.net> Message-ID: I upgraded the node that was crashing to 5.5 yesterday. Today, it got another crash. This is a 1x4 replicate cluster, you can find the config mentioned in my previous reports, and Amar should have it as well. 
Here's the log: ==> mnt-_data1.log <== The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 0-_data1-replicate-0: selecting local read_child _data1-client-3" repeated 4 times between [2019-03-19 14:40:50.741147] and [2019-03-19 14:40:56.874832] pending frames: frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(READ) frame : type(1) op(READ) frame : type(1) op(READ) frame : type(1) op(READ) frame : type(0) op(0) patchset: git://git.gluster.org/glusterfs.git signal received: 6 time of crash: 2019-03-19 14:40:57 configuration details: argp 1 backtrace 1 dlfcn 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 5.5 /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7ff841f8364c] /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7ff841f8dd26] /lib64/libc.so.6(+0x36160)[0x7ff84114a160] /lib64/libc.so.6(gsignal+0x110)[0x7ff84114a0e0] /lib64/libc.so.6(abort+0x151)[0x7ff84114b6c1] /lib64/libc.so.6(+0x2e6fa)[0x7ff8411426fa] /lib64/libc.so.6(+0x2e772)[0x7ff841142772] /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7ff8414d80b8] /usr/lib64/glusterfs/5.5/xlator/cluster/replicate.so(+0x5de3d)[0x7ff839fbae3d] /usr/lib64/glusterfs/5.5/xlator/cluster/replicate.so(+0x70d51)[0x7ff839fcdd51] /usr/lib64/glusterfs/5.5/xlator/protocol/client.so(+0x58e1f)[0x7ff83a252e1f] /usr/lib64/libgfrpc.so.0(+0xe820)[0x7ff841d4e820] /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7ff841d4eb6f] /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7ff841d4b063] /usr/lib64/glusterfs/5.5/rpc-transport/socket.so(+0xa0ce)[0x7ff83b9690ce] /usr/lib64/libglusterfs.so.0(+0x85519)[0x7ff841fe1519] /lib64/libpthread.so.0(+0x7559)[0x7ff8414d5559] /lib64/libc.so.6(clone+0x3f)[0x7ff84120c81f] --------- Sincerely, Artem -- Founder, Android Police , APK Mirror , Illogical Robot LLC beerpla.net | +ArtemRussakovskii | @ArtemR On Mon, Mar 18, 2019 at 9:46 PM Amar Tumballi Suryanarayan < atumball at redhat.com> wrote: > Due to this 
issue, along with few other logging issues, we did make a > glusterfs-5.5 release, which has the fix for particular crash. > > Regards, > Amar > > On Tue, 19 Mar, 2019, 1:04 AM , wrote: > >> Hello Ville-Pekka and list, >> >> >> >> I believe we are experiencing similar gluster fuse client crashes on 5.3 >> as mentioned here. This morning I made a post in regards. >> >> >> >> https://lists.gluster.org/pipermail/gluster-users/2019-March/036036.html >> >> >> >> Has this "performance.write-behind: off" setting continued to be all you >> needed to workaround the issue? >> >> >> >> Thanks, >> >> >> >> Brandon >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From srangana at redhat.com Tue Mar 19 15:11:03 2019 From: srangana at redhat.com (Shyam Ranganathan) Date: Tue, 19 Mar 2019 11:11:03 -0400 Subject: [Gluster-users] Release 6: Tagged and ready for packaging Message-ID: <924a16b2-8574-0915-b24b-86f88fbbbfa3@redhat.com> Hi, RC1 testing is complete and blockers have been addressed. The release is now tagged for a final round of packaging and package testing before release. Thanks for testing out the RC builds and reporting issues that needed to be addressed. As packaging and final package testing is finishing up, we would be writing the upgrade guide for the release as well, before announcing the release for general consumption. Shyam From brandon at thinkhuge.net Tue Mar 19 16:01:10 2019 From: brandon at thinkhuge.net (brandon at thinkhuge.net) Date: Tue, 19 Mar 2019 09:01:10 -0700 Subject: [Gluster-users] Constant fuse client crashes "fixed" by setting performance.write-behind: off. 
Any hope for a 4.1.8 release? Message-ID: <104b01d4de6c$f92542f0$eb6fc8d0$@thinkhuge.net> Hey Artem, Wondering have you tried this "performance.write-behind: off" setting? I've added this to my multiple separate gluster clusters but, I won't know until weekend ftp backups run again if it helps with our situation as a workaround. We need this fixed highest priority I know that though. Can anyone please advise what steps can I take to get similar crash log information from CentOS 7 yum repo built gluster? Would that help if I shared that? -------------- next part -------------- An HTML attachment was scrubbed... URL: From jim.kinney at gmail.com Tue Mar 19 16:06:17 2019 From: jim.kinney at gmail.com (Jim Kinney) Date: Tue, 19 Mar 2019 12:06:17 -0400 Subject: [Gluster-users] Proposal to mark few features as Deprecated / SunSet from Version 5.0 In-Reply-To: References: Message-ID: Native mount issue with multiple clients (centos7 glusterfs 3.12). Seems to hit python 2.7 and 3+. User tries to open file(s) for write on long process and system eventually times out. Switching to NFS stops the error. No bug notice yet. Too many pans on the fire :-( On Tue, 2019-03-19 at 18:42 +0530, Amar Tumballi Suryanarayan wrote: > Hi Jim, > > On Tue, Mar 19, 2019 at 6:21 PM Jim Kinney > wrote: > > > > > > > > Issues with glusterfs fuse mounts cause issues with python file > > open for write. We have to use nfs to avoid this. > > > > Really want to see better back-end tools to facilitate cleaning up > > of glusterfs failures. If system is going to use hard linked ID, > > need a mapping of id to file to fix things. That option is now on > > for all exports. It should be the default If a host is down and > > users delete files by the thousands, gluster _never_ catches up. > > Finding path names for ids across even a 40TB mount, much less the > > 200+TB one, is a slow process. 
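[Editor's note: for anyone wanting to try the "performance.write-behind: off" workaround discussed in the preceding messages, it is applied with a standard volume-set command. The volume name "myvol" below is a placeholder; mounted clients may need a remount for the graph change to take effect everywhere.]

```shell
# Disable the write-behind performance translator on the volume.
gluster volume set myvol performance.write-behind off

# Confirm the current value of the option.
gluster volume get myvol performance.write-behind

# Once a fixed release is installed, revert to the default.
gluster volume reset myvol performance.write-behind
```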
A network outage of 2 minutes and > > one system didn't get the call to recursively delete several dozen > > directories each with several thousand files. > > > > > > Are you talking about some issues in geo-replication module or some > other application using native mount? Happy to take the discussion > forward about these issues. > Are there any bugs open on this? > Thanks,Amar > > nfsOn March 19, 2019 8:09:01 AM EDT, Hans Henrik Happe < > > happe at nbi.dk> wrote: > > > Hi, > > > Looking into something else I fell over this proposal. Being > > > a > > > shop that are going into "Leaving GlusterFS" mode, I > > > thought I > > > would give my two cents. > > > > > > > > > While being partially an HPC shop with a few Lustre > > > filesystems, > > > we chose GlusterFS for an archiving solution (2-3 PB), > > > because we > > > could find files in the underlying ZFS filesystems if > > > GlusterFS > > > went sour. > > > We have used the access to the underlying files plenty, > > > because > > > of the continuous instability of GlusterFS'. Meanwhile, > > > Lustre > > > have been almost effortless to run and mainly for that > > > reason we > > > are planning to move away from GlusterFS. > > > Reading this proposal kind of underlined that "Leaving > > > GluserFS" > > > is the right thing to do. While I never understood why > > > GlusterFS > > > has been in feature crazy mode instead of stabilizing mode, > > > taking > > > away crucial features I don't get. With RoCE, RDMA is > > > getting > > > mainstream. Quotas are very useful, even though the current > > > implementation are not perfect. Tiering also makes so much > > > sense, > > > but, for large files, not on a per-file level. > > > To be honest we only use quotas. We got scared of trying out > > > new > > > performance features that potentially would open up a new > > > back of > > > issues. > > > Sorry for being such a buzzkill. I really wanted it to be > > > different. 
> > > Cheers,
> > > Hans Henrik
> > > On 19/07/2018 08.56, Amar Tumballi wrote:
> > > > Hi all,
> > > > Over last 12 years of Gluster, we have developed many features, and continue to support most of it till now. But along the way, we have figured out better methods of doing things. Also we are not actively maintaining some of these features.
> > > > We are now thinking of cleaning up some of these "unsupported" features, and mark them as "SunSet" (i.e., would be totally taken out of codebase in following releases) in the next upcoming release, v5.0. The release notes will provide options for smoothly migrating to the supported configurations.
> > > > If you are using any of these features, do let us know, so that we can help you with "migration". Also, we are happy to guide new developers to work on those components which are not actively being maintained by the current set of developers.
> > > > List of features hitting sunset:
> > > > "cluster/stripe" translator:
> > > > This translator was developed very early in the evolution of GlusterFS, and addressed one of the very common questions of Distributed FS, which is "What happens if one of my files is bigger than the available brick? Say, I have a 2 TB hard drive, exported in glusterfs, and my file is 3 TB." While it solved the purpose, it was very hard to handle failure scenarios, and give a real good experience to our users with this feature. Over the time, Gluster solved the problem with its "Shard" feature, which solves the problem in a much better way, and provides a much better solution with the existing well supported stack. Hence the proposal for Deprecation.
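The stripe-to-shard migration the proposal describes has no in-place conversion path; a hedged sketch of the usual route (volume, host, and brick names are placeholders) is to create a fresh volume with sharding enabled and copy data across client mounts:

```shell
# Sketch only: replace a striped volume with a sharded one.
# Hosts, bricks, and volume names below are placeholders.
gluster volume create newvol replica 3 \
    host1:/bricks/newvol host2:/bricks/newvol host3:/bricks/newvol
gluster volume set newvol features.shard on
# Large files are split into chunks of this size, spread across bricks.
gluster volume set newvol features.shard-block-size 64MB
gluster volume start newvol

# Copy data through client mounts, never brick-to-brick.
mount -t glusterfs host1:/oldvol /mnt/old
mount -t glusterfs host1:/newvol /mnt/new
rsync -aHAX /mnt/old/ /mnt/new/
```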
> > > > If you are using this feature, then do write to us, as it needs a proper migration from the existing volume to a new fully supported volume type before you upgrade.
> > > > "storage/bd" translator:
> > > > This feature got into the code base 5 years back with this patch[1]. The plan was to use a block device directly as a brick, which would help to handle disk-image storage much more easily in glusterfs.
> > > > As the feature is not getting more contributions, and we are not seeing any user traction on this, we would like to propose it for Deprecation.
> > > > If you are using the feature, plan to move to a supported gluster volume configuration, and have your setup "supported" before upgrading to your new gluster version.
> > > > "RDMA" transport support:
> > > > Gluster started supporting RDMA while ib-verbs was still new, and very high-end infra around that time were using Infiniband. Engineers did work with Mellanox, and got the technology into GlusterFS for better data migration, data copy. While current day kernels support very good speed with the IPoIB module itself, and there is no more bandwidth for experts in this area to maintain the feature, we recommend migrating over to a TCP (IP based) network for your volume.
> > > > If you are successfully using RDMA transport, do get in touch with us to prioritize the migration plan for your volume. The plan is to work on this after the release, so by version 6.0, we will have a cleaner transport code, which just needs to support one type.
> > > > "Tiering" feature
> > > > Gluster's tiering feature was planned to provide an option to keep your "hot" data in a different location than your cold data, so one can get better performance. While we saw some users for the feature, it needs much more attention to be completely bug free.
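For the tiering feature above, the pre-upgrade detach the proposal goes on to ask for can be sketched as follows (volume name is a placeholder, and exact syntax may vary between releases):

```shell
# Sketch: drain and remove the hot tier before upgrading.
gluster volume tier tiervol detach start
# Poll until migration of tiered files back to the cold tier finishes.
gluster volume tier tiervol detach status
# Once complete, finalize the detach.
gluster volume tier tiervol detach commit
```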
> > > > At the time, we are not having any active maintainers for the feature, and hence suggest taking it out of the "supported" tag.
> > > > If you are willing to take it up, and maintain it, do let us know, and we are happy to assist you.
> > > > If you are already using the tiering feature, before upgrading, make sure to do gluster volume tier detach on all the bricks before upgrading to the next release. Also, we recommend you to use features like dmcache on your LVM setup to get the best performance from bricks.
> > > > "Quota"
> > > > This is a call out for the "Quota" feature, to let you all know that it will be in "no new development" state. While this feature is "actively" in use by many people, the challenges we have in the accounting mechanisms involved have made it hard to achieve good performance with the feature. Also, the amount of extended attribute get/set operations while using the feature is not very ideal. Hence we recommend our users to move towards setting quota on backend bricks directly (ie, XFS project quota), or to use different volumes for different directories etc.
> > > > As the feature wouldn't be deprecated immediately, the feature doesn't need a migration plan when you upgrade to a newer version, but if you are a new user, we wouldn't recommend setting the quota feature. By the release dates, we will be publishing our best-alternatives guide for gluster's current quota feature.
> > > > Note that if you want to contribute to the feature, we have a project-quota based issue open[2]. Happy to get contributions, and help in getting a newer approach to Quota.
> > > > These are our set of initial features which we propose to take out of "fully" supported features.
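The XFS project-quota alternative suggested for Quota might look like this on a brick filesystem (a sketch — paths, project id, and limits are placeholders, and the filesystem must be mounted with the prjquota option):

```shell
# Sketch: per-directory quota on an XFS brick via project quotas.
# Requires the brick filesystem to be mounted with -o prjquota.
echo "42:/bricks/vol1/dir1" >> /etc/projects   # project id -> directory
echo "proj1:42" >> /etc/projid                 # project name -> id
xfs_quota -x -c 'project -s proj1' /bricks/vol1
xfs_quota -x -c 'limit -p bhard=10g proj1' /bricks/vol1
xfs_quota -x -c 'report -p' /bricks/vol1       # verify usage and limits
```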
> > > > While we are in the process of making the user/developer experience of the project much better by providing a well maintained codebase, we may come up with a few more features which we may possibly consider to move out of support, and hence keep watching this space.
> > > > [1] - http://review.gluster.org/4809
> > > > [2] - https://github.com/gluster/glusterfs/issues/184
> > > > Regards,
> > > > Vijay, Shyam, Amar
> > > _______________________________________________
> > > Gluster-users mailing list
> > > Gluster-users at gluster.org
> > > https://lists.gluster.org/mailman/listinfo/gluster-users
> > > --
> > > Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity.
-- James P. Kinney III Every time you stop a school, you will have to build a jail. What you gain at one end you lose at the other. It's like feeding a dog on his own tail. It won't fatten the dog. - Speech 11/23/1900 Mark Twain http://heretothereideas.blogspot.com/
From pedro at pmc.digital Mon Mar 18 09:17:43 2019 From: pedro at pmc.digital (Pedro Costa) Date: Mon, 18 Mar 2019 09:17:43 +0000 Subject: [Gluster-users] Help analise statedumps In-Reply-To: References: Message-ID: Hi, Sorry to revive an old thread, but just to let you know that with the latest 5.4 version this has virtually stopped happening. I can't ascertain for sure yet, but since the update the memory footprint of Gluster has been massively reduced. Thanks to everyone, great job. Cheers, P.
From: Pedro Costa Sent: 04 February 2019 11:28 To: 'Sanju Rakonde' Cc: 'gluster-users' Subject: RE: [Gluster-users] Help analise statedumps
Hi Sanju, If it helps, here's also a statedump (taken just now) since the reboots: https://pmcdigital.sharepoint.com/:u:/g/EbsT2RZsuc5BsRrf7F-fw-4BocyeogW-WvEike_sg8CpZg?e=a7nTqS Many thanks, P.
From: Pedro Costa Sent: 04 February 2019 10:12 To: 'Sanju Rakonde' > Cc: gluster-users > Subject: RE: [Gluster-users] Help analise statedumps
Hi Sanju, The process was `glusterfs`, yes I took the statedump for the same process (different PID since it was rebooted). Cheers, P.
From: Sanju Rakonde > Sent: 04 February 2019 06:10 To: Pedro Costa > Cc: gluster-users > Subject: Re: [Gluster-users] Help analise statedumps
Hi, Can you please specify which process has the leak? Did you take the statedump of the same process which has the leak? Thanks, Sanju
On Sat, Feb 2, 2019 at 3:15 PM Pedro Costa > wrote: Hi, I have a 3x replicated cluster running 4.1.7 on ubuntu 16.04.5, all 3 replicas are also clients hosting a Node.js/Nginx web server.
The current configuration is as such:

Volume Name: gvol1
Type: Replicate
Volume ID: XXXXXX
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: vm000000:/srv/brick1/gvol1
Brick2: vm000001:/srv/brick1/gvol1
Brick3: vm000002:/srv/brick1/gvol1
Options Reconfigured:
cluster.self-heal-readdir-size: 2KB
cluster.self-heal-window-size: 2
cluster.background-self-heal-count: 20
network.ping-timeout: 5
disperse.eager-lock: off
performance.parallel-readdir: on
performance.readdir-ahead: on
performance.rda-cache-limit: 128MB
performance.cache-refresh-timeout: 10
performance.nl-cache-timeout: 600
performance.nl-cache: on
cluster.nufa: on
performance.enable-least-priority: off
server.outstanding-rpc-limit: 128
performance.strict-o-direct: on
cluster.shd-max-threads: 12
client.event-threads: 4
cluster.lookup-optimize: on
network.inode-lru-limit: 90000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.cache-samba-metadata: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on
features.utime: on
storage.ctime: on
server.event-threads: 4
performance.cache-size: 256MB
performance.read-ahead: on
cluster.readdir-optimize: on
cluster.strict-readdir: on
performance.io-thread-count: 8
server.allow-insecure: on
cluster.read-hash-mode: 0
cluster.lookup-unhashed: auto
cluster.choose-local: on

I believe there's a memory leak somewhere; it just keeps going up until it hangs one or more nodes, taking the whole cluster down sometimes. I have taken 2 statedumps on one of the nodes, one where the memory is too high and another just after a reboot with the app running and the volume fully healed.
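For reference, statedumps like the ones discussed here are typically generated with the gluster CLI for brick processes, or a SIGUSR1 signal to a fuse client process; this is a sketch assuming default settings (volume name is a placeholder):

```shell
# Sketch: generate statedumps for leak analysis.
gluster volume statedump gvol1          # dumps the volume's brick processes
# For a fuse client, send SIGUSR1 to the glusterfs mount process:
kill -USR1 "$(pgrep -f 'glusterfs.*gvol1' | head -n1)"
ls /var/run/gluster/*dump*              # default statedump location
```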
https://pmcdigital.sharepoint.com/:u:/g/EYDsNqTf1UdEuE6B0ZNVPfIBf_I-AbaqHotB1lJOnxLlTg?e=boYP09 (high memory) https://pmcdigital.sharepoint.com/:u:/g/EWZBsnET2xBHl6OxO52RCfIBvQ0uIDQ1GKJZ1GrnviyMhg?e=wI3yaY (after reboot) Any help would be greatly appreciated, Kindest Regards, Pedro Maia Costa Senior Developer, pmc.digital
_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
-- Thanks, Sanju
From jim.kinney at gmail.com Tue Mar 19 19:52:39 2019 From: jim.kinney at gmail.com (Jim Kinney) Date: Tue, 19 Mar 2019 15:52:39 -0400 Subject: [Gluster-users] Proposal to mark few features as Deprecated / SunSet from Version 5.0 In-Reply-To: References: Message-ID: <2aa722b474de38085772f5513facefa878ff70f3.camel@gmail.com>
This python will fail when writing to a file in a glusterfs fuse mounted directory.

import mmap

# write a simple example file
with open("hello.txt", "wb") as f:
    f.write("Hello Python!\n")

with open("hello.txt", "r+b") as f:
    # memory-map the file, size 0 means whole file
    mm = mmap.mmap(f.fileno(), 0)
    # read content via standard file methods
    print mm.readline()  # prints "Hello Python!"
    # read content via slice notation
    print mm[:5]  # prints "Hello"
    # update content using slice notation;
    # note that new content must have same size
    mm[6:] = " world!\n"
    # ... and read again using standard file methods
    mm.seek(0)
    print mm.readline()  # prints "Hello world!"
    # close the map
    mm.close()

On Tue, 2019-03-19 at 12:06 -0400, Jim Kinney wrote: > Native mount issue with multiple clients (centos7 glusterfs 3.12). > Seems to hit python 2.7 and 3+. User tries to open file(s) for write > on long process and system eventually times out. > Switching to NFS stops the error. > No bug notice yet.
Too many pans on the fire :-(
> On Tue, 2019-03-19 at 18:42 +0530, Amar Tumballi Suryanarayan wrote:
> > Hi Jim,
> > On Tue, Mar 19, 2019 at 6:21 PM Jim Kinney wrote:
> > > Issues with glusterfs fuse mounts cause issues with python file open for write. We have to use nfs to avoid this.
> > > Really want to see better back-end tools to facilitate cleaning up of glusterfs failures. If the system is going to use hard-linked IDs, we need a mapping of id to file to fix things. That option is now on for all exports. It should be the default. If a host is down and users delete files by the thousands, gluster _never_ catches up. Finding path names for ids across even a 40TB mount, much less the 200+TB one, is a slow process. A network outage of 2 minutes and one system didn't get the call to recursively delete several dozen directories each with several thousand files.
> > Are you talking about some issues in the geo-replication module or some other application using the native mount? Happy to take the discussion forward about these issues.
> > Are there any bugs open on this?
> > Thanks, Amar
> > > nfs
> > > On March 19, 2019 8:09:01 AM EDT, Hans Henrik Happe < happe at nbi.dk> wrote:
> > > > Hi,
> > > > Looking into something else I fell over this proposal. Being a shop that is going into "Leaving GlusterFS" mode, I thought I would give my two cents.
> > > > While being partially an HPC shop with a few Lustre filesystems, we chose GlusterFS for an archiving solution (2-3 PB), because we could find files in the underlying ZFS filesystems if GlusterFS went sour.
> > > > We have used the access to the underlying files plenty, because of the continuous instability of GlusterFS'.
Meanwhile, Lustre have been almost effortless to run and mainly for that reason we are planning to move away from GlusterFS.
> > > > Reading this proposal kind of underlined that "Leaving GluserFS" is the right thing to do. While I never understood why GlusterFS has been in feature crazy mode instead of stabilizing mode, taking away crucial features I don't get. With RoCE, RDMA is getting mainstream. Quotas are very useful, even though the current implementation are not perfect. Tiering also makes so much sense, but, for large files, not on a per-file level.
> > > > To be honest we only use quotas. We got scared of trying out new performance features that potentially would open up a new back of issues.
> > > > Sorry for being such a buzzkill. I really wanted it to be different.
> > > > Cheers,
> > > > Hans Henrik
> > > > On 19/07/2018 08.56, Amar Tumballi wrote:
> > > > > [... full text of the deprecation proposal, quoted in full earlier in this thread, trimmed ...]
> > > > > [1] - http://review.gluster.org/4809
> > > > > [2] - https://github.com/gluster/glusterfs/issues/184
> > > > > Regards,
> > > > > Vijay, Shyam, Amar
> > > > > _______________________________________________
> > > > > Gluster-users mailing list
> > > > > Gluster-users at gluster.org
> > > > > https://lists.gluster.org/mailman/listinfo/gluster-users
> > > > > --
> > > > > Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity.
> --
> James P. Kinney III
> Every time you stop a school, you will have to build a jail. What you gain at one end you lose at the other. It's like feeding a dog on his own tail. It won't fatten the dog. - Speech 11/23/1900 Mark Twain
> http://heretothereideas.blogspot.com/
From schandinp at gmail.com Tue Mar 19 20:27:48 2019 From: schandinp at gmail.com (Pablo Schandin) Date: Tue, 19 Mar 2019 17:27:48 -0300 Subject: [Gluster-users] / - is in split-brain Message-ID: Hello all! I had a volume with only a local brick running vms and recently added a second (remote) brick to the volume.
After adding the brick, the heal command reported the following:

root at gluster-gu1:~# gluster volume heal gv1 info
> Brick gluster-gu1:/mnt/gv_gu1/brick
> / - Is in split-brain
> Status: Connected
> Number of entries: 1
> Brick gluster-gu2:/mnt/gv_gu1/brick
> Status: Connected
> Number of entries: 0

All other files healed correctly. I noticed that in the xfs of the brick I see a directory named localadmin, but when I ls the gluster volume mountpoint I get an error and a lot of ???

root at gluster-gu1:/var/lib/vmImages_gu1# ll
> ls: cannot access 'localadmin': No data available
> d????????? ? ? ? ? ? localadmin/

This goes for both servers that have that volume gv1 mounted. Both see that directory like that. Meanwhile, in the xfs brick /mnt/gv_gu1/brick/localadmin is an accessible directory.

root at gluster-gu1:/mnt/gv_gu1/brick/localadmin# ll
> total 4
> drwxr-xr-x 2 localadmin root 6 Mar 7 09:40 ./
> drwxr-xr-x 6 root root 4096 Mar 7 09:40 ../

When I added the second brick to the volume, this localadmin folder was not replicated there, I imagine because of this strange behavior. Can someone help me with this? Thanks!
From alvin at netvel.net Tue Mar 19 20:34:39 2019 From: alvin at netvel.net (Alvin Starr) Date: Tue, 19 Mar 2019 16:34:39 -0400 Subject: [Gluster-users] recovery from reboot time? Message-ID: We have a simple replicated volume with 1 brick on each node of 17TB. There is something like 35M files and directories on the volume. One of the servers rebooted and is now "doing something". It kind of looks like it's doing some kind of sanity check with the node that did not reboot, but it's hard to say, and it looks like it may run for hours/days/months.... Will Gluster take a long time with lots of little files to resync? -- Alvin Starr || land: (905)513-7688 Netvel Inc.
|| Cell: (416)806-0133 alvin at netvel.net || From vbellur at redhat.com Tue Mar 19 20:59:42 2019 From: vbellur at redhat.com (Vijay Bellur) Date: Tue, 19 Mar 2019 13:59:42 -0700 Subject: [Gluster-users] Proposal to mark few features as Deprecated / SunSet from Version 5.0 In-Reply-To: <2aa722b474de38085772f5513facefa878ff70f3.camel@gmail.com> References: <2aa722b474de38085772f5513facefa878ff70f3.camel@gmail.com> Message-ID: Thank you for the reproducer! Can you please let us know the output of `gluster volume info`? Regards, Vijay On Tue, Mar 19, 2019 at 12:53 PM Jim Kinney wrote: > This python will fail when writing to a file in a glusterfs fuse mounted > directory. > > import mmap > > # write a simple example file > with open("hello.txt", "wb") as f: > f.write("Hello Python!\n") > > with open("hello.txt", "r+b") as f: > # memory-map the file, size 0 means whole file > mm = mmap.mmap(f.fileno(), 0) > # read content via standard file methods > print mm.readline() # prints "Hello Python!" > # read content via slice notation > print mm[:5] # prints "Hello" > # update content using slice notation; > # note that new content must have same size > mm[6:] = " world!\n" > # ... and read again using standard file methods > mm.seek(0) > print mm.readline() # prints "Hello world!" > # close the map > mm.close() > > > > > > > > On Tue, 2019-03-19 at 12:06 -0400, Jim Kinney wrote: > > Native mount issue with multiple clients (centos7 glusterfs 3.12). > > Seems to hit python 2.7 and 3+. User tries to open file(s) for write on > long process and system eventually times out. > > Switching to NFS stops the error. > > No bug notice yet. Too many pans on the fire :-( > > On Tue, 2019-03-19 at 18:42 +0530, Amar Tumballi Suryanarayan wrote: > > Hi Jim, > > On Tue, Mar 19, 2019 at 6:21 PM Jim Kinney wrote: > > > Issues with glusterfs fuse mounts cause issues with python file open for > write. We have to use nfs to avoid this. 
> > Really want to see better back-end tools to facilitate cleaning up of > glusterfs failures. If system is going to use hard linked ID, need a > mapping of id to file to fix things. That option is now on for all exports. > It should be the default If a host is down and users delete files by the > thousands, gluster _never_ catches up. Finding path names for ids across > even a 40TB mount, much less the 200+TB one, is a slow process. A network > outage of 2 minutes and one system didn't get the call to recursively > delete several dozen directories each with several thousand files. > > > > Are you talking about some issues in geo-replication module or some other > application using native mount? Happy to take the discussion forward about > these issues. > > Are there any bugs open on this? > > Thanks, > Amar > > > > > nfs > On March 19, 2019 8:09:01 AM EDT, Hans Henrik Happe wrote: > > Hi, > > Looking into something else I fell over this proposal. Being a shop that > are going into "Leaving GlusterFS" mode, I thought I would give my two > cents. > > While being partially an HPC shop with a few Lustre filesystems, we chose > GlusterFS for an archiving solution (2-3 PB), because we could find files > in the underlying ZFS filesystems if GlusterFS went sour. > > We have used the access to the underlying files plenty, because of the > continuous instability of GlusterFS'. Meanwhile, Lustre have been almost > effortless to run and mainly for that reason we are planning to move away > from GlusterFS. > > Reading this proposal kind of underlined that "Leaving GluserFS" is the > right thing to do. While I never understood why GlusterFS has been in > feature crazy mode instead of stabilizing mode, taking away crucial > features I don't get. With RoCE, RDMA is getting mainstream. Quotas are > very useful, even though the current implementation are not perfect. > Tiering also makes so much sense, but, for large files, not on a per-file > level. 
> > To be honest we only use quotas. We got scared of trying out new > performance features that potentially would open up a new back of issues. > > Sorry for being such a buzzkill. I really wanted it to be different. > > Cheers, > Hans Henrik > On 19/07/2018 08.56, Amar Tumballi wrote: > > > * Hi all, Over last 12 years of Gluster, we have developed many features, > and continue to support most of it till now. But along the way, we have > figured out better methods of doing things. Also we are not actively > maintaining some of these features. We are now thinking of cleaning up some > of these ?unsupported? features, and mark them as ?SunSet? (i.e., would be > totally taken out of codebase in following releases) in next upcoming > release, v5.0. The release notes will provide options for smoothly > migrating to the supported configurations. If you are using any of these > features, do let us know, so that we can help you with ?migration?.. Also, > we are happy to guide new developers to work on those components which are > not actively being maintained by current set of developers. List of > features hitting sunset: ?cluster/stripe? translator: This translator was > developed very early in the evolution of GlusterFS, and addressed one of > the very common question of Distributed FS, which is ?What happens if one > of my file is bigger than the available brick. Say, I have 2 TB hard drive, > exported in glusterfs, my file is 3 TB?. While it solved the purpose, it > was very hard to handle failure scenarios, and give a real good experience > to our users with this feature. Over the time, Gluster solved the problem > with it?s ?Shard? feature, which solves the problem in much better way, and > provides much better solution with existing well supported stack. Hence the > proposal for Deprecation. If you are using this feature, then do write to > us, as it needs a proper migration from existing volume to a new full > supported volume type before you upgrade. 
?storage/bd? translator: This > feature got into the code base 5 years back with this patch > [1]. Plan was to use a block device > directly as a brick, which would help to handle disk-image storage much > easily in glusterfs. As the feature is not getting more contribution, and > we are not seeing any user traction on this, would like to propose for > Deprecation. If you are using the feature, plan to move to a supported > gluster volume configuration, and have your setup ?supported? before > upgrading to your new gluster version. ?RDMA? transport support: Gluster > started supporting RDMA while ib-verbs was still new, and very high-end > infra around that time were using Infiniband. Engineers did work with > Mellanox, and got the technology into GlusterFS for better data migration, > data copy. While current day kernels support very good speed with IPoIB > module itself, and there are no more bandwidth for experts in these area to > maintain the feature, we recommend migrating over to TCP (IP based) network > for your volume. If you are successfully using RDMA transport, do get in > touch with us to prioritize the migration plan for your volume. Plan is to > work on this after the release, so by version 6.0, we will have a cleaner > transport code, which just needs to support one type. ?Tiering? feature > Gluster?s tiering feature which was planned to be providing an option to > keep your ?hot? data in different location than your cold data, so one can > get better performance. While we saw some users for the feature, it needs > much more attention to be completely bug free. At the time, we are not > having any active maintainers for the feature, and hence suggesting to take > it out of the ?supported? tag. If you are willing to take it up, and > maintain it, do let us know, and we are happy to assist you. If you are > already using tiering feature, before upgrading, make sure to do gluster > volume tier detach all the bricks before upgrading to next release. 
Also,
> we recommend you to use features like dm-cache on your LVM setup to get
> the best performance from the bricks. "Quota": This is a call-out for the
> "Quota" feature, to let you all know that it will be in a "no new
> development" state. While this feature is "actively" in use by many
> people, the challenges we have in the accounting mechanisms involved have
> made it hard to achieve good performance with the feature. Also, the
> amount of extended-attribute get/set operations while using the feature
> is not very ideal. Hence we recommend our users to move towards setting
> quota on the backend bricks directly (i.e., XFS project quota), or to use
> different volumes for different directories, etc. As the feature wouldn't
> be deprecated immediately, it doesn't need a migration plan when you
> upgrade to a newer version, but if you are a new user, we wouldn't
> recommend setting up the quota feature. By the release dates, we will be
> publishing our best-alternatives guide for gluster's current quota
> feature. Note that if you want to contribute to the feature, we have a
> project-quota based issue open [2]. Happy to get contributions, and help
> in getting a newer approach to Quota.
> ------------------------------ These are our initial set of features
> which we propose to take out of "fully" supported features. While we are
> in the process of making the user/developer experience of the project
> much better by providing a well maintained codebase, we may come up with
> a few more features which we may possibly consider moving out of support,
> and hence keep watching this space.
> [1] - http://review.gluster.org/4809
> [2] - https://github.com/gluster/glusterfs/issues/184
> Regards, Vijay, Shyam, Amar
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > > https://lists.gluster.org/mailman/listinfo/gluster-users
> > > --
> > > Sent from my Android device with K-9 Mail.
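The "XFS project quota" alternative recommended for Quota users above can be sketched as follows — a hedged example; the device, mount point, project id/name, and the 10 GiB limit are all assumptions for illustration, not details from the thread:

```shell
# The brick filesystem must be mounted with project quota accounting on.
mount -o prjquota /dev/sdb1 /bricks/brick1

# Map project id 42 to the directory tree to be limited.
echo "42:/bricks/brick1/data" >> /etc/projects
echo "gvol_data:42" >> /etc/projid

# Tag the tree with the project id, then set a 10 GiB hard block limit.
xfs_quota -x -c 'project -s gvol_data' /bricks/brick1
xfs_quota -x -c 'limit -p bhard=10g gvol_data' /bricks/brick1

# Verify per-project usage accounting.
xfs_quota -x -c 'report -p' /bricks/brick1
```

Unlike Gluster's quota translator, this enforces the limit per brick, so on a distributed volume the per-brick limits have to be sized accordingly.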
All tyopes are thumb related and reflect authenticity.
> >
> > --
> > James P. Kinney III
> > Every time you stop a school, you will have to build a jail. What you
> > gain at one end you lose at the other. It's like feeding a dog on his
> > own tail. It won't fatten the dog.
> > - Speech 11/23/1900 Mark Twain
> > http://heretothereideas.blogspot.com/
>
> --
> James P. Kinney III
> Every time you stop a school, you will have to build a jail. What you
> gain at one end you lose at the other. It's like feeding a dog on his
> own tail. It won't fatten the dog.
> - Speech 11/23/1900 Mark Twain
> http://heretothereideas.blogspot.com/
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jim.kinney at gmail.com Tue Mar 19 21:20:57 2019
From: jim.kinney at gmail.com (Jim Kinney)
Date: Tue, 19 Mar 2019 17:20:57 -0400
Subject: [Gluster-users] Proposal to mark few features as Deprecated / SunSet from Version 5.0
In-Reply-To:
References: <2aa722b474de38085772f5513facefa878ff70f3.camel@gmail.com>
Message-ID:

Volume Name: home
Type: Replicate
Volume ID: 5367adb1-99fc-44c3-98c4-71f7a41e628a
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp,rdma
Bricks:
Brick1: bmidata1:/data/glusterfs/home/brick/brick
Brick2: bmidata2:/data/glusterfs/home/brick/brick
Options Reconfigured:
performance.client-io-threads: off
storage.build-pgfid: on
cluster.self-heal-daemon: enable
performance.readdir-ahead: off
nfs.disable: off

There are 11 other volumes and all are similar.

On Tue, 2019-03-19 at 13:59 -0700, Vijay Bellur wrote:
> Thank you for the reproducer! Can you please let us know the output
> of `gluster volume info`?
> Regards, > Vijay > > On Tue, Mar 19, 2019 at 12:53 PM Jim Kinney > wrote: > > This python will fail when writing to a file in a glusterfs fuse > > mounted directory. > > > > import mmap > > > > # write a simple example file > > with open("hello.txt", "wb") as f: > > f.write("Hello Python!\n") > > > > with open("hello.txt", "r+b") as f: > > # memory-map the file, size 0 means whole file > > mm = mmap.mmap(f.fileno(), 0) > > # read content via standard file methods > > print mm.readline() # prints "Hello Python!" > > # read content via slice notation > > print mm[:5] # prints "Hello" > > # update content using slice notation; > > # note that new content must have same size > > mm[6:] = " world!\n" > > # ... and read again using standard file methods > > mm.seek(0) > > print mm.readline() # prints "Hello world!" > > # close the map > > mm.close() > > > > > > > > > > > > > > > > > > On Tue, 2019-03-19 at 12:06 -0400, Jim Kinney wrote: > > > Native mount issue with multiple clients (centos7 glusterfs > > > 3.12). > > > Seems to hit python 2.7 and 3+. User tries to open file(s) for > > > write on long process and system eventually times out. > > > Switching to NFS stops the error. > > > No bug notice yet. Too many pans on the fire :-( > > > On Tue, 2019-03-19 at 18:42 +0530, Amar Tumballi Suryanarayan > > > wrote: > > > > Hi Jim, > > > > > > > > On Tue, Mar 19, 2019 at 6:21 PM Jim Kinney < > > > > jim.kinney at gmail.com> wrote: > > > > > > > > > > > > > > > > > > > > Issues with glusterfs fuse mounts cause issues with python > > > > > file open for write. We have to use nfs to avoid this. > > > > > > > > > > Really want to see better back-end tools to facilitate > > > > > cleaning up of glusterfs failures. If system is going to use > > > > > hard linked ID, need a mapping of id to file to fix things. > > > > > That option is now on for all exports. 
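The mmap reproducer quoted above is Python 2 (bare `print` statements, `str` writes). A hedged Python 3 port of the same test is sketched below — run here against a local file for illustration; on the affected GlusterFS FUSE mounts it is the `mmap.mmap()` call or the in-place slice write that reportedly fails:

```python
import mmap

# Write a small file to memory-map (bytes mode in Python 3).
with open("hello.txt", "wb") as f:
    f.write(b"Hello Python!\n")

with open("hello.txt", "r+b") as f:
    # Length 0 means "map the whole file".
    mm = mmap.mmap(f.fileno(), 0)
    print(mm.readline())  # b'Hello Python!\n'
    print(mm[:5])         # b'Hello'
    # Replacement content must keep the mapped size unchanged:
    # bytes 6..13 ("Python!\n") are overwritten by " world!\n".
    mm[6:] = b" world!\n"
    mm.seek(0)
    print(mm.readline())  # b'Hello  world!\n' (note the double space)
    mm.close()
```

If this script raises on the slice assignment or on `mmap.mmap()` only when `hello.txt` lives on the FUSE mount, that isolates the mount as the trigger.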
It should be the
> > > > > default. If a host is down and users delete files by the
> > > > > thousands, gluster _never_ catches up. Finding path names for
> > > > > ids across even a 40TB mount, much less the 200+TB one, is a
> > > > > slow process. A network outage of 2 minutes and one system
> > > > > didn't get the call to recursively delete several dozen
> > > > > directories each with several thousand files.
> > > >
> > > > Are you talking about some issues in the geo-replication module or
> > > > some other application using the native mount? Happy to take the
> > > > discussion forward about these issues.
> > > > Are there any bugs open on this?
> > > > Thanks, Amar
> > > > > nfs
> > > > > On March 19, 2019 8:09:01 AM EDT, Hans Henrik Happe <happe at nbi.dk> wrote:
> > > > > > Hi,
> > > > > > Looking into something else I fell over this proposal. Being a
> > > > > > shop that is going into "Leaving GlusterFS" mode, I thought I
> > > > > > would give my two cents.
> > > > > >
> > > > > > While being partially an HPC shop with a few Lustre filesystems,
> > > > > > we chose GlusterFS for an archiving solution (2-3 PB), because
> > > > > > we could find files in the underlying ZFS filesystems if
> > > > > > GlusterFS went sour.
> > > > > > We have used the access to the underlying files plenty, because
> > > > > > of the continuous instability of GlusterFS. Meanwhile, Lustre
> > > > > > has been almost effortless to run, and mainly for that reason we
> > > > > > are planning to move away from GlusterFS.
> > > > > > Reading this proposal kind of underlined that "Leaving
> > > > > > GlusterFS" is the right thing to do. While I never understood
> > > > > > why GlusterFS has been in feature-crazy mode instead of
> > > > > > stabilizing mode, taking away crucial features I don't get.
With RoCE, RDMA is
> > > > > > getting mainstream. Quotas are very useful, even though the
> > > > > > current implementation are not perfect. Tiering also makes so
> > > > > > much sense, but, for large files, not on a per-file level.
> > > > > > To be honest we only use quotas. We got scared of trying out new
> > > > > > performance features that potentially would open up a new back
> > > > > > of issues.
> > > > > > Sorry for being such a buzzkill. I really wanted it to be
> > > > > > different.
> > > > > > Cheers,
> > > > > > Hans Henrik
> > > > > > On 19/07/2018 08.56, Amar Tumballi wrote:
> > > > > > > [...]
> > > > > > > [1] - http://review.gluster.org/4809
> > > > > > > [2] - https://github.com/gluster/glusterfs/issues/184
> > > > > > > Regards,
> > > > > > > Vijay, Shyam, Amar
> > > > > > > _______________________________________________
> > > > > > > Gluster-users mailing list
> > > > > > > Gluster-users at gluster.org
> > > > > > > https://lists.gluster.org/mailman/listinfo/gluster-users
> > > > > > > --
> > > > > > > Sent from my Android device with K-9 Mail. All tyopes are
> > > > > > > thumb related and reflect authenticity.
> > >
> > > --
> > > James P. Kinney III
> > > Every time you stop a school, you will have to build a jail. What you
> > > gain at one end you lose at the other. It's like feeding a dog on his
> > > own tail. It won't fatten the dog.
> > > - Speech 11/23/1900 Mark Twain
> > > http://heretothereideas.blogspot.com/
> > --
> > James P. Kinney III
> > Every time you stop a school, you will have to build a jail. What you
> > gain at one end you lose at the other. It's like feeding a dog on his
> > own tail. It won't fatten the dog.
> > - Speech 11/23/1900 Mark Twain
> > http://heretothereideas.blogspot.com/
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users

--
James P. Kinney III
Every time you stop a school, you will have to build a jail. What you
gain at one end you lose at the other. It's like feeding a dog on his
own tail. It won't fatten the dog.
- Speech 11/23/1900 Mark Twain
http://heretothereideas.blogspot.com/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sankarshan.mukhopadhyay at gmail.com Wed Mar 20 01:49:54 2019
From: sankarshan.mukhopadhyay at gmail.com (Sankarshan Mukhopadhyay)
Date: Wed, 20 Mar 2019 07:19:54 +0530
Subject: [Gluster-users] [Gluster-Maintainers] Proposal to mark few features as Deprecated / SunSet from Version 5.0
In-Reply-To:
References: <2aa722b474de38085772f5513facefa878ff70f3.camel@gmail.com>
Message-ID:

Now that sufficient detail is in place, could a Gluster team member file a
RHBZ and post it back to this thread?

On Wed, Mar 20, 2019 at 2:51 AM Jim Kinney wrote:
>
> [...]
>
> On March 19, 2019 8:09:01 AM EDT, Hans Henrik Happe wrote:
> > Hi,
> >
> > Looking into something else I fell over this proposal. Being a shop
> > that is going into "Leaving GlusterFS" mode, I thought I would give my
> > two cents.
> > While being partially an HPC shop with a few Lustre filesystems, we
> > chose GlusterFS for an archiving solution (2-3 PB), because we could
> > find files in the underlying ZFS filesystems if GlusterFS went sour.
> >
> > [...]
> >
> > Regards,
> > Vijay, Shyam, Amar
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users
> > --
> > Sent from my Android device with K-9 Mail. All tyopes are thumb related
> > and reflect authenticity.
>
> --
> James P. Kinney III
> Every time you stop a school, you will have to build a jail. What you
> gain at one end you lose at the other. It's like feeding a dog on his
> own tail.
It won't fatten the dog.
> - Speech 11/23/1900 Mark Twain
> http://heretothereideas.blogspot.com/

From sankarshan.mukhopadhyay at gmail.com Wed Mar 20 01:53:04 2019
From: sankarshan.mukhopadhyay at gmail.com (Sankarshan Mukhopadhyay)
Date: Wed, 20 Mar 2019 07:23:04 +0530
Subject: [Gluster-users] Help analise statedumps
In-Reply-To:
References:
Message-ID:

On Tue, Mar 19, 2019 at 11:09 PM Pedro Costa wrote:
> Sorry to revive an old thread, but just to let you know that with the
> latest 5.4 version this has virtually stopped happening.
>
> I can't ascertain for sure yet, but since the update the memory footprint
> of Gluster has been massively reduced.
>
> Thanks to everyone, great job.

Good news is always fantastic to hear! Thank you for reviving the thread
and providing feedback.

From nbalacha at redhat.com Wed Mar 20 03:16:13 2019
From: nbalacha at redhat.com (Nithya Balachandran)
Date: Wed, 20 Mar 2019 08:46:13 +0530
Subject: [Gluster-users] / - is in split-brain
In-Reply-To:
References:
Message-ID:

Hi,

What is the output of `gluster volume info`?

Thanks,
Nithya

On Wed, 20 Mar 2019 at 01:58, Pablo Schandin wrote:
> Hello all!
>
> I had a volume with only a local brick running VMs and recently added a
> second (remote) brick to the volume. After adding the brick, the heal
> command reported the following:
>
> root at gluster-gu1:~# gluster volume heal gv1 info
>> Brick gluster-gu1:/mnt/gv_gu1/brick
>> / - Is in split-brain
>> Status: Connected
>> Number of entries: 1
>> Brick gluster-gu2:/mnt/gv_gu1/brick
>> Status: Connected
>> Number of entries: 0
>
> All other files healed correctly. I noticed that in the brick's XFS
> filesystem I see a directory named localadmin, but when I ls the gluster
> volume mountpoint I get an error and a lot of ???
>
> root at gluster-gu1:/var/lib/vmImages_gu1# ll
>> ls: cannot access 'localadmin': No data available
>> d????????? ? ? ? ? ? localadmin/
>
> This goes for both servers that have that volume gv1 mounted.
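A hedged sketch of how an entry split-brain like the "/ - Is in split-brain" report above is typically inspected and resolved. The brick paths come from the report; the choice of source brick is only an example, and a directory/GFID mismatch can need manual cleanup on the bad brick instead of the CLI resolution shown last:

```shell
# On each server, compare the AFR changelog xattrs for the problem entry
# (run against that server's own brick, not the FUSE mount).
getfattr -d -m . -e hex /mnt/gv_gu1/brick
getfattr -d -m . -e hex /mnt/gv_gu1/brick/localadmin

# List the entries the self-heal daemon considers split-brained.
gluster volume heal gv1 info split-brain

# Resolve by declaring one brick authoritative for the entry.
gluster volume heal gv1 split-brain source-brick gluster-gu1:/mnt/gv_gu1/brick /
```

The `d?????????` listing on the mount usually means the two bricks disagree about the entry's GFID or metadata, so the client refuses to serve it until one copy is chosen.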
Both see > that directory like that. While in the xfs brick > /mnt/gv_gu1/brick/localadmin is an accessible directory. > > root at gluster-gu1:/mnt/gv_gu1/brick/localadmin# ll >> total 4 >> drwxr-xr-x 2 localadmin root 6 Mar 7 09:40 ./ >> drwxr-xr-x 6 root root 4096 Mar 7 09:40 ../ > > > When I added the second brick to the volume, this localadmin folder was > not replicated there I imagine because of this strange behavior. > > Can someone help me with this? > Thanks! > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From nbalacha at redhat.com Wed Mar 20 03:26:55 2019 From: nbalacha at redhat.com (Nithya Balachandran) Date: Wed, 20 Mar 2019 08:56:55 +0530 Subject: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP In-Reply-To: References: <2ed171d2-df68-ada3-e0de-53f19cb79520@redhat.com> Message-ID: Hi Artem, I think you are running into a different crash. The ones reported which were prevented by turning off write-behind are now fixed. We will need to look into the one you are seeing to see why it is happening. Regards, Nithya On Tue, 19 Mar 2019 at 20:25, Artem Russakovskii wrote: > The flood is indeed fixed for us on 5.5. However, the crashes are not. > > Sincerely, > Artem > > -- > Founder, Android Police , APK Mirror > , Illogical Robot LLC > beerpla.net | +ArtemRussakovskii > | @ArtemR > > > > On Mon, Mar 18, 2019 at 5:41 AM Hu Bert wrote: > >> Hi Amar, >> >> if you refer to this bug: >> https://bugzilla.redhat.com/show_bug.cgi?id=1674225 : in the test >> setup i haven't seen those entries, while copying & deleting a few GBs >> of data. For a final statement we have to wait until i updated our >> live gluster servers - could take place on tuesday or wednesday. 
>> >> Maybe other users can do an update to 5.4 as well and report back here. >> >> >> Hubert >> >> >> >> Am Mo., 18. M?rz 2019 um 11:36 Uhr schrieb Amar Tumballi Suryanarayan >> : >> > >> > Hi Hu Bert, >> > >> > Appreciate the feedback. Also are the other boiling issues related to >> logs fixed now? >> > >> > -Amar >> > >> > On Mon, Mar 18, 2019 at 3:54 PM Hu Bert wrote: >> >> >> >> update: upgrade from 5.3 -> 5.5 in a replicate 3 test setup with 2 >> >> volumes done. In 'gluster peer status' the peers stay connected during >> >> the upgrade, no 'peer rejected' messages. No cksum mismatches in the >> >> logs. Looks good :-) >> >> >> >> Am Mo., 18. M?rz 2019 um 09:54 Uhr schrieb Hu Bert < >> revirii at googlemail.com>: >> >> > >> >> > Good morning :-) >> >> > >> >> > for debian the packages are there: >> >> > >> https://download.gluster.org/pub/gluster/glusterfs/5/5.5/Debian/stretch/amd64/apt/pool/main/g/glusterfs/ >> >> > >> >> > I'll do an upgrade of a test installation 5.3 -> 5.5 and see if there >> >> > are some errors etc. and report back. >> >> > >> >> > btw: no release notes for 5.4 and 5.5 so far? >> >> > https://docs.gluster.org/en/latest/release-notes/ ? >> >> > >> >> > Am Fr., 15. M?rz 2019 um 14:28 Uhr schrieb Shyam Ranganathan >> >> > : >> >> > > >> >> > > We created a 5.5 release tag, and it is under packaging now. It >> should >> >> > > be packaged and ready for testing early next week and should be >> released >> >> > > close to mid-week next week. 
>> >> > > >> >> > > Thanks, >> >> > > Shyam >> >> > > On 3/13/19 12:34 PM, Artem Russakovskii wrote: >> >> > > > Wednesday now with no update :-/ >> >> > > > >> >> > > > Sincerely, >> >> > > > Artem >> >> > > > >> >> > > > -- >> >> > > > Founder, Android Police , APK >> Mirror >> >> > > > , Illogical Robot LLC >> >> > > > beerpla.net | +ArtemRussakovskii >> >> > > > | @ArtemR >> >> > > > >> >> > > > >> >> > > > >> >> > > > On Tue, Mar 12, 2019 at 10:28 AM Artem Russakovskii < >> archon810 at gmail.com >> >> > > > > wrote: >> >> > > > >> >> > > > Hi Amar, >> >> > > > >> >> > > > Any updates on this? I'm still not seeing it in OpenSUSE >> build >> >> > > > repos. Maybe later today? >> >> > > > >> >> > > > Thanks. >> >> > > > >> >> > > > Sincerely, >> >> > > > Artem >> >> > > > >> >> > > > -- >> >> > > > Founder, Android Police , APK >> Mirror >> >> > > > , Illogical Robot LLC >> >> > > > beerpla.net | +ArtemRussakovskii >> >> > > > | @ArtemR >> >> > > > >> >> > > > >> >> > > > >> >> > > > On Wed, Mar 6, 2019 at 10:30 PM Amar Tumballi Suryanarayan >> >> > > > > wrote: >> >> > > > >> >> > > > We are talking days. Not weeks. Considering already it is >> >> > > > Thursday here. 1 more day for tagging, and packaging. >> May be ok >> >> > > > to expect it on Monday. >> >> > > > >> >> > > > -Amar >> >> > > > >> >> > > > On Thu, Mar 7, 2019 at 11:54 AM Artem Russakovskii >> >> > > > > >> wrote: >> >> > > > >> >> > > > Is the next release going to be an imminent hotfix, >> i.e. >> >> > > > something like today/tomorrow, or are we talking >> weeks? 
>> >> > > > >> >> > > > Sincerely, >> >> > > > Artem >> >> > > > >> >> > > > -- >> >> > > > Founder, Android Police < >> http://www.androidpolice.com>, APK >> >> > > > Mirror , Illogical Robot >> LLC >> >> > > > beerpla.net | >> +ArtemRussakovskii >> >> > > > | >> @ArtemR >> >> > > > >> >> > > > >> >> > > > >> >> > > > On Tue, Mar 5, 2019 at 11:09 AM Artem Russakovskii >> >> > > > > >> wrote: >> >> > > > >> >> > > > Ended up downgrading to 5.3 just in case. Peer >> status >> >> > > > and volume status are OK now. >> >> > > > >> >> > > > zypper install --oldpackage >> glusterfs-5.3-lp150.100.1 >> >> > > > Loading repository data... >> >> > > > Reading installed packages... >> >> > > > Resolving package dependencies... >> >> > > > >> >> > > > Problem: glusterfs-5.3-lp150.100.1.x86_64 >> requires >> >> > > > libgfapi0 = 5.3, but this requirement cannot be >> provided >> >> > > > not installable providers: >> >> > > > libgfapi0-5.3-lp150.100.1.x86_64[glusterfs] >> >> > > > Solution 1: Following actions will be done: >> >> > > > downgrade of libgfapi0-5.4-lp150.100.1.x86_64 >> to >> >> > > > libgfapi0-5.3-lp150.100.1.x86_64 >> >> > > > downgrade of >> libgfchangelog0-5.4-lp150.100.1.x86_64 to >> >> > > > libgfchangelog0-5.3-lp150.100.1.x86_64 >> >> > > > downgrade of libgfrpc0-5.4-lp150.100.1.x86_64 >> to >> >> > > > libgfrpc0-5.3-lp150.100.1.x86_64 >> >> > > > downgrade of libgfxdr0-5.4-lp150.100.1.x86_64 >> to >> >> > > > libgfxdr0-5.3-lp150.100.1.x86_64 >> >> > > > downgrade of >> libglusterfs0-5.4-lp150.100.1.x86_64 to >> >> > > > libglusterfs0-5.3-lp150.100.1.x86_64 >> >> > > > Solution 2: do not install >> glusterfs-5.3-lp150.100.1.x86_64 >> >> > > > Solution 3: break >> glusterfs-5.3-lp150.100.1.x86_64 by >> >> > > > ignoring some of its dependencies >> >> > > > >> >> > > > Choose from above solutions by number or cancel >> >> > > > [1/2/3/c] (c): 1 >> >> > > > Resolving dependencies... >> >> > > > Resolving package dependencies... 
>> >> > > > >> >> > > > The following 6 packages are going to be >> downgraded: >> >> > > > glusterfs libgfapi0 libgfchangelog0 libgfrpc0 >> >> > > > libgfxdr0 libglusterfs0 >> >> > > > >> >> > > > 6 packages to downgrade. >> >> > > > >> >> > > > Sincerely, >> >> > > > Artem >> >> > > > >> >> > > > -- >> >> > > > Founder, Android Police >> >> > > > , APK Mirror >> >> > > > , Illogical Robot LLC >> >> > > > beerpla.net | >> +ArtemRussakovskii >> >> > > > | >> @ArtemR >> >> > > > >> >> > > > >> >> > > > >> >> > > > On Tue, Mar 5, 2019 at 10:57 AM Artem >> Russakovskii >> >> > > > > >> wrote: >> >> > > > >> >> > > > Noticed the same when upgrading from 5.3 to >> 5.4, as >> >> > > > mentioned. >> >> > > > >> >> > > > I'm confused though. Is actual replication >> affected, >> >> > > > because the 5.4 server and the 3x 5.3 >> servers still >> >> > > > show heal info as all 4 connected, and the >> files >> >> > > > seem to be replicating correctly as well. >> >> > > > >> >> > > > So what's actually affected - just the status >> >> > > > command, or leaving 5.4 on one of the nodes >> is doing >> >> > > > some damage to the underlying fs? Is it >> fixable by >> >> > > > tweaking transport.socket.ssl-enabled? Does >> >> > > > upgrading all servers to 5.4 resolve it, or >> should >> >> > > > we revert back to 5.3? >> >> > > > >> >> > > > Sincerely, >> >> > > > Artem >> >> > > > >> >> > > > -- >> >> > > > Founder, Android Police >> >> > > > , APK Mirror >> >> > > > , Illogical >> Robot LLC >> >> > > > beerpla.net | >> >> > > > +ArtemRussakovskii >> >> > > > >> >> > > > | @ArtemR >> >> > > > >> >> > > > >> >> > > > On Tue, Mar 5, 2019 at 2:02 AM Hu Bert >> >> > > > > >> > > > > wrote: >> >> > > > >> >> > > > fyi: did a downgrade 5.4 -> 5.3 and it >> worked. >> >> > > > all replicas are up and >> >> > > > running. Awaiting updated v5.4. >> >> > > > >> >> > > > thx :-) >> >> > > > >> >> > > > Am Di., 5. 
M?rz 2019 um 09:26 Uhr >> schrieb Hari >> >> > > > Gowtham > >> > > > >: >> >> > > > > >> >> > > > > There are plans to revert the patch >> causing >> >> > > > this error and rebuilt 5.4. >> >> > > > > This should happen faster. the rebuilt >> 5.4 >> >> > > > should be void of this upgrade issue. >> >> > > > > >> >> > > > > In the meantime, you can use 5.3 for >> this cluster. >> >> > > > > Downgrading to 5.3 will work if it was >> just >> >> > > > one node that was upgrade to 5.4 >> >> > > > > and the other nodes are still in 5.3. >> >> > > > > >> >> > > > > On Tue, Mar 5, 2019 at 1:07 PM Hu Bert >> >> > > > > >> > > > > wrote: >> >> > > > > > >> >> > > > > > Hi Hari, >> >> > > > > > >> >> > > > > > thx for the hint. Do you know when >> this will >> >> > > > be fixed? Is a downgrade >> >> > > > > > 5.4 -> 5.3 a possibility to fix this? >> >> > > > > > >> >> > > > > > Hubert >> >> > > > > > >> >> > > > > > Am Di., 5. M?rz 2019 um 08:32 Uhr >> schrieb >> >> > > > Hari Gowtham > >> > > > >: >> >> > > > > > > >> >> > > > > > > Hi, >> >> > > > > > > >> >> > > > > > > This is a known issue we are >> working on. >> >> > > > > > > As the checksum differs between the >> >> > > > updated and non updated node, the >> >> > > > > > > peers are getting rejected. >> >> > > > > > > The bricks aren't coming because >> of the >> >> > > > same issue. 
>> >> > > > > > > >> >> > > > > > > More about the issue: >> >> > > > >> https://bugzilla.redhat.com/show_bug.cgi?id=1685120 >> >> > > > > > > >> >> > > > > > > On Tue, Mar 5, 2019 at 12:56 PM Hu >> Bert >> >> > > > > >> > > > > wrote: >> >> > > > > > > > >> >> > > > > > > > Interestingly: gluster volume >> status >> >> > > > misses gluster1, while heal >> >> > > > > > > > statistics show gluster1: >> >> > > > > > > > >> >> > > > > > > > gluster volume status workdata >> >> > > > > > > > Status of volume: workdata >> >> > > > > > > > Gluster process >> >> > > > TCP Port RDMA Port Online Pid >> >> > > > > > > > >> >> > > > >> ------------------------------------------------------------------------------ >> >> > > > > > > > Brick >> gluster2:/gluster/md4/workdata >> >> > > > 49153 0 Y 1723 >> >> > > > > > > > Brick >> gluster3:/gluster/md4/workdata >> >> > > > 49153 0 Y 2068 >> >> > > > > > > > Self-heal Daemon on localhost >> >> > > > N/A N/A Y 1732 >> >> > > > > > > > Self-heal Daemon on gluster3 >> >> > > > N/A N/A Y 2077 >> >> > > > > > > > >> >> > > > > > > > vs. >> >> > > > > > > > >> >> > > > > > > > gluster volume heal workdata >> statistics >> >> > > > heal-count >> >> > > > > > > > Gathering count of entries to be >> healed >> >> > > > on volume workdata has been successful >> >> > > > > > > > >> >> > > > > > > > Brick >> gluster1:/gluster/md4/workdata >> >> > > > > > > > Number of entries: 0 >> >> > > > > > > > >> >> > > > > > > > Brick >> gluster2:/gluster/md4/workdata >> >> > > > > > > > Number of entries: 10745 >> >> > > > > > > > >> >> > > > > > > > Brick >> gluster3:/gluster/md4/workdata >> >> > > > > > > > Number of entries: 10744 >> >> > > > > > > > >> >> > > > > > > > Am Di., 5. 
M?rz 2019 um 08:18 Uhr >> >> > > > schrieb Hu Bert > >> > > > >: >> >> > > > > > > > > >> >> > > > > > > > > Hi Miling, >> >> > > > > > > > > >> >> > > > > > > > > well, there are such entries, >> but >> >> > > > those haven't been a problem during >> >> > > > > > > > > install and the last kernel >> >> > > > update+reboot. The entries look like: >> >> > > > > > > > > >> >> > > > > > > > > PUBLIC_IP >> gluster2.alpserver.de >> >> > > > gluster2 >> >> > > > > > > > > >> >> > > > > > > > > 192.168.0.50 gluster1 >> >> > > > > > > > > 192.168.0.51 gluster2 >> >> > > > > > > > > 192.168.0.52 gluster3 >> >> > > > > > > > > >> >> > > > > > > > > 'ping gluster2' resolves to >> LAN IP; I >> >> > > > removed the last entry in the >> >> > > > > > > > > 1st line, did a reboot ... no, >> didn't >> >> > > > help. From >> >> > > > > > > > > /var/log/glusterfs/glusterd.log >> >> > > > > > > > > on gluster 2: >> >> > > > > > > > > >> >> > > > > > > > > [2019-03-05 07:04:36.188128] E >> [MSGID: >> >> > > > 106010] >> >> > > > > > > > > >> >> > > > >> [glusterd-utils.c:3483:glusterd_compare_friend_volume] >> >> > > > 0-management: >> >> > > > > > > > > Version of Cksums persistent >> differ. >> >> > > > local cksum = 3950307018, remote >> >> > > > > > > > > cksum = 455409345 on peer >> gluster1 >> >> > > > > > > > > [2019-03-05 07:04:36.188314] I >> [MSGID: >> >> > > > 106493] >> >> > > > > > > > > >> >> > > > >> [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] >> >> > > > 0-glusterd: >> >> > > > > > > > > Responded to gluster1 (0), >> ret: 0, >> >> > > > op_ret: -1 >> >> > > > > > > > > >> >> > > > > > > > > Interestingly there are no >> entries in >> >> > > > the brick logs of the rejected >> >> > > > > > > > > server. Well, not surprising >> as no >> >> > > > brick process is running. The >> >> > > > > > > > > server gluster1 is still in >> rejected >> >> > > > state. 
>> >> > > > > > > > > >> >> > > > > > > > > 'gluster volume start workdata >> force' >> >> > > > starts the brick process on >> >> > > > > > > > > gluster1, and some heals are >> happening >> >> > > > on gluster2+3, but via 'gluster >> >> > > > > > > > > volume status workdata' the >> volumes >> >> > > > still aren't complete. >> >> > > > > > > > > >> >> > > > > > > > > gluster1: >> >> > > > > > > > > >> >> > > > >> ------------------------------------------------------------------------------ >> >> > > > > > > > > Brick >> gluster1:/gluster/md4/workdata >> >> > > > 49152 0 Y 2523 >> >> > > > > > > > > Self-heal Daemon on localhost >> >> > > > N/A N/A Y 2549 >> >> > > > > > > > > >> >> > > > > > > > > gluster2: >> >> > > > > > > > > Gluster process >> >> > > > TCP Port RDMA Port Online Pid >> >> > > > > > > > > >> >> > > > >> ------------------------------------------------------------------------------ >> >> > > > > > > > > Brick >> gluster2:/gluster/md4/workdata >> >> > > > 49153 0 Y 1723 >> >> > > > > > > > > Brick >> gluster3:/gluster/md4/workdata >> >> > > > 49153 0 Y 2068 >> >> > > > > > > > > Self-heal Daemon on localhost >> >> > > > N/A N/A Y 1732 >> >> > > > > > > > > Self-heal Daemon on gluster3 >> >> > > > N/A N/A Y 2077 >> >> > > > > > > > > >> >> > > > > > > > > >> >> > > > > > > > > Hubert >> >> > > > > > > > > >> >> > > > > > > > > Am Di., 5. M?rz 2019 um 07:58 >> Uhr >> >> > > > schrieb Milind Changire < >> mchangir at redhat.com >> >> > > > >: >> >> > > > > > > > > > >> >> > > > > > > > > > There are probably DNS >> entries or >> >> > > > /etc/hosts entries with the public IP >> Addresses >> >> > > > that the host names (gluster1, gluster2, >> >> > > > gluster3) are getting resolved to. >> >> > > > > > > > > > /etc/resolv.conf would tell >> which is >> >> > > > the default domain searched for the node >> names >> >> > > > and the DNS servers which respond to the >> queries. 
>> >> > > > > > > > > > >> >> > > > > > > > > > >> >> > > > > > > > > > On Tue, Mar 5, 2019 at 12:14 >> PM Hu >> >> > > > Bert > >> > > > > wrote: >> >> > > > > > > > > >> >> >> > > > > > > > > >> Good morning, >> >> > > > > > > > > >> >> >> > > > > > > > > >> i have a replicate 3 setup >> with 2 >> >> > > > volumes, running on version 5.3 on >> >> > > > > > > > > >> debian stretch. This >> morning i >> >> > > > upgraded one server to version 5.4 and >> >> > > > > > > > > >> rebooted the machine; after >> the >> >> > > > restart i noticed that: >> >> > > > > > > > > >> >> >> > > > > > > > > >> - no brick process is >> running >> >> > > > > > > > > >> - gluster volume status >> only shows >> >> > > > the server itself: >> >> > > > > > > > > >> gluster volume status >> workdata >> >> > > > > > > > > >> Status of volume: workdata >> >> > > > > > > > > >> Gluster process >> >> > > > TCP Port RDMA Port Online Pid >> >> > > > > > > > > >> >> >> > > > >> ------------------------------------------------------------------------------ >> >> > > > > > > > > >> Brick >> >> > > > gluster1:/gluster/md4/workdata N/A >> >> > > > N/A N N/A >> >> > > > > > > > > >> NFS Server on localhost >> >> > > > N/A N/A N N/A >> >> > > > > > > > > >> >> >> > > > > > > > > >> - gluster peer status on >> the server >> >> > > > > > > > > >> gluster peer status >> >> > > > > > > > > >> Number of Peers: 2 >> >> > > > > > > > > >> >> >> > > > > > > > > >> Hostname: gluster3 >> >> > > > > > > > > >> Uuid: >> >> > > > c7b4a448-ca6a-4051-877f-788f9ee9bc4a >> >> > > > > > > > > >> State: Peer Rejected >> (Connected) >> >> > > > > > > > > >> >> >> > > > > > > > > >> Hostname: gluster2 >> >> > > > > > > > > >> Uuid: >> >> > > > 162fea82-406a-4f51-81a3-e90235d8da27 >> >> > > > > > > > > >> State: Peer Rejected >> (Connected) >> >> > > > > > > > > >> >> >> > > > > > > > > >> - gluster peer status on >> the other >> >> > > > 2 servers: >> >> > > > > > > > > >> gluster peer status >> >> > > > > > > > > >> 
Number of Peers: 2 >> >> > > > > > > > > >> >> >> > > > > > > > > >> Hostname: gluster1 >> >> > > > > > > > > >> Uuid: >> >> > > > 9a360776-7b58-49ae-831e-a0ce4e4afbef >> >> > > > > > > > > >> State: Peer Rejected >> (Connected) >> >> > > > > > > > > >> >> >> > > > > > > > > >> Hostname: gluster3 >> >> > > > > > > > > >> Uuid: >> >> > > > c7b4a448-ca6a-4051-877f-788f9ee9bc4a >> >> > > > > > > > > >> State: Peer in Cluster >> (Connected) >> >> > > > > > > > > >> >> >> > > > > > > > > >> I noticed that, in the >> brick logs, >> >> > > > i see that the public IP is used >> >> > > > > > > > > >> instead of the LAN IP. >> brick logs >> >> > > > from one of the volumes: >> >> > > > > > > > > >> >> >> > > > > > > > > >> rejected node: >> >> > > > https://pastebin.com/qkpj10Sd >> >> > > > > > > > > >> connected nodes: >> >> > > > https://pastebin.com/8SxVVYFV >> >> > > > > > > > > >> >> >> > > > > > > > > >> Why is the public IP >> suddenly used >> >> > > > instead of the LAN IP? Killing all >> >> > > > > > > > > >> gluster processes and >> rebooting >> >> > > > (again) didn't help. 
>> >> > > > > > > > > >> >> >> > > > > > > > > >> >> >> > > > > > > > > >> Thx, >> >> > > > > > > > > >> Hubert >> >> > > > > > > > > >> >> >> > > > >> _______________________________________________ >> >> > > > > > > > > >> Gluster-users mailing list >> >> > > > > > > > > >> Gluster-users at gluster.org >> >> > > > >> >> > > > > > > > > >> >> >> > > > >> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> > > > > > > > > > >> >> > > > > > > > > > >> >> > > > > > > > > > >> >> > > > > > > > > > -- >> >> > > > > > > > > > Milind >> >> > > > > > > > > > >> >> > > > > > > > >> >> > > > >> _______________________________________________ >> >> > > > > > > > Gluster-users mailing list >> >> > > > > > > > Gluster-users at gluster.org >> >> > > > >> >> > > > > > > > >> >> > > > >> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> > > > > > > >> >> > > > > > > >> >> > > > > > > >> >> > > > > > > -- >> >> > > > > > > Regards, >> >> > > > > > > Hari Gowtham. >> >> > > > > >> >> > > > > >> >> > > > > >> >> > > > > -- >> >> > > > > Regards, >> >> > > > > Hari Gowtham. 
>> >> > > > >> _______________________________________________ >> >> > > > Gluster-users mailing list >> >> > > > Gluster-users at gluster.org >> >> > > > >> >> > > > >> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> > > > >> >> > > > _______________________________________________ >> >> > > > Gluster-users mailing list >> >> > > > Gluster-users at gluster.org > Gluster-users at gluster.org> >> >> > > > >> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> > > > >> >> > > > >> >> > > > >> >> > > > -- >> >> > > > Amar Tumballi (amarts) >> >> > > > >> >> > > > >> >> > > > _______________________________________________ >> >> > > > Gluster-users mailing list >> >> > > > Gluster-users at gluster.org >> >> > > > https://lists.gluster.org/mailman/listinfo/gluster-users >> >> > > > >> >> > > _______________________________________________ >> >> > > Gluster-users mailing list >> >> > > Gluster-users at gluster.org >> >> > > https://lists.gluster.org/mailman/listinfo/gluster-users >> > >> > >> > >> > -- >> > Amar Tumballi (amarts) >> > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jthottan at redhat.com Wed Mar 20 04:04:05 2019 From: jthottan at redhat.com (Jiffin Thottan) Date: Wed, 20 Mar 2019 00:04:05 -0400 (EDT) Subject: [Gluster-users] NFS export of gluster - solution In-Reply-To: References: <4db533c1-3710-31b0-64ca-486c19fb4a63@nyu.edu> <864377152.13828951.1552967718384.JavaMail.zimbra@redhat.com> Message-ID: <1730596461.14095487.1553054645670.JavaMail.zimbra@redhat.com> ----- Original Message ----- From: "Sankarshan Mukhopadhyay" Cc: "gluster-users" Sent: Tuesday, March 19, 2019 10:07:36 AM Subject: Re: [Gluster-users] NFS export of gluster - solution On Tue, Mar 19, 2019 at 9:25 AM Jiffin Thottan wrote: > > Thanks Valerio for sharing the information > > ----- Original Message ----- > From: "Valerio Luccio" > To: "gluster-users" > Sent: Monday, March 18, 2019 8:37:46 PM > Subject: [Gluster-users] NFS export of gluster - solution > > So, I recently started NFS exporting of my gluster so that I could mount > it from a legacy Mac OS X server. Every 24/36 hours the export seemed to > freeze causing the server to seize up. The ganesha log was filled with > errors related to RQUOTA. Frank Filz of the nfs-ganesha project suggested that > I'd try setting "Enable_RQUOTA = false;" in the NFS_CORE_PARAM config > block of the ganesha.conf file and that seems to have done the trick, 5 > days and counting without a problem. > Does this configuration change need to be updated in any existing documentation (for Gluster, nfs-ganesha)? 
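[Editor's note: the fix described above amounts to a one-line change in ganesha.conf. The NFS_CORE_PARAM block name and the Enable_RQUOTA setting come straight from the report; the file path and comments are illustrative:]

```ini
# /etc/ganesha/ganesha.conf (path may vary by distribution)
NFS_CORE_PARAM {
    # Disable the RQUOTA service whose errors filled the ganesha log
    # and froze the NFS export every 24-36 hours
    Enable_RQUOTA = false;
}
```

[The nfs-ganesha daemon typically needs a restart after editing the file for the setting to take effect.]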
Created a pull for the same https://github.com/gluster/glusterdocs/pull/461 -- Jiffin _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users From vbellur at redhat.com Wed Mar 20 04:07:46 2019 From: vbellur at redhat.com (Vijay Bellur) Date: Tue, 19 Mar 2019 21:07:46 -0700 Subject: [Gluster-users] Proposal to mark few features as Deprecated / SunSet from Version 5.0 In-Reply-To: References: <2aa722b474de38085772f5513facefa878ff70f3.camel@gmail.com> Message-ID: I tried this configuration on my local setup and the test passed fine. Adding the fuse and write-behind maintainers in Gluster to check if they are aware of any oddities with using mmap & fuse. Thanks, Vijay On Tue, Mar 19, 2019 at 2:21 PM Jim Kinney wrote: > Volume Name: home > Type: Replicate > Volume ID: 5367adb1-99fc-44c3-98c4-71f7a41e628a > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x 2 = 2 > Transport-type: tcp,rdma > Bricks: > Brick1: bmidata1:/data/glusterfs/home/brick/brick > Brick2: bmidata2:/data/glusterfs/home/brick/brick > Options Reconfigured: > performance.client-io-threads: off > storage.build-pgfid: on > cluster.self-heal-daemon: enable > performance.readdir-ahead: off > nfs.disable: off > > > There are 11 other volumes and all are similar. > > > On Tue, 2019-03-19 at 13:59 -0700, Vijay Bellur wrote: > > Thank you for the reproducer! Can you please let us know the output of > `gluster volume info`? > > Regards, > Vijay > > On Tue, Mar 19, 2019 at 12:53 PM Jim Kinney wrote: > > This python will fail when writing to a file in a glusterfs fuse mounted > directory. 
> > import mmap > > # write a simple example file > with open("hello.txt", "wb") as f: > f.write("Hello Python!\n") > > with open("hello.txt", "r+b") as f: > # memory-map the file, size 0 means whole file > mm = mmap.mmap(f.fileno(), 0) > # read content via standard file methods > print mm.readline() # prints "Hello Python!" > # read content via slice notation > print mm[:5] # prints "Hello" > # update content using slice notation; > # note that new content must have same size > mm[6:] = " world!\n" > # ... and read again using standard file methods > mm.seek(0) > print mm.readline() # prints "Hello world!" > # close the map > mm.close() > > > > > > > > On Tue, 2019-03-19 at 12:06 -0400, Jim Kinney wrote: > > Native mount issue with multiple clients (centos7 glusterfs 3.12). > > Seems to hit python 2.7 and 3+. User tries to open file(s) for write on > long process and system eventually times out. > > Switching to NFS stops the error. > > No bug notice yet. Too many pans on the fire :-( > > On Tue, 2019-03-19 at 18:42 +0530, Amar Tumballi Suryanarayan wrote: > > Hi Jim, > > On Tue, Mar 19, 2019 at 6:21 PM Jim Kinney wrote: > > > Issues with glusterfs fuse mounts cause issues with python file open for > write. We have to use nfs to avoid this. > > Really want to see better back-end tools to facilitate cleaning up of > glusterfs failures. If system is going to use hard linked ID, need a > mapping of id to file to fix things. That option is now on for all exports. > It should be the default If a host is down and users delete files by the > thousands, gluster _never_ catches up. Finding path names for ids across > even a 40TB mount, much less the 200+TB one, is a slow process. A network > outage of 2 minutes and one system didn't get the call to recursively > delete several dozen directories each with several thousand files. > > > > Are you talking about some issues in geo-replication module or some other > application using native mount? 
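[Editor's note: the mmap reproducer quoted earlier in this thread is written for Python 2 (print statements, str written to a binary-mode file). A minimal Python 3 equivalent of the same pattern is sketched below; the filename is just an example. On a healthy local filesystem it runs cleanly, so a failure when run from a fuse-mounted directory points at the mount rather than the script:]

```python
import mmap

# write a simple example file (bytes, since the file is opened in binary mode)
with open("hello.txt", "wb") as f:
    f.write(b"Hello Python!\n")

with open("hello.txt", "r+b") as f:
    # memory-map the file; length 0 means "map the whole file"
    mm = mmap.mmap(f.fileno(), 0)
    # read content via standard file methods
    print(mm.readline())   # b'Hello Python!\n'
    # read content via slice notation
    print(mm[:5])          # b'Hello'
    # update content via slice assignment;
    # the new content must have exactly the same size (8 bytes here)
    mm[6:] = b" world!\n"
    mm.seek(0)
    print(mm.readline())   # b'Hello  world!\n' (replacement begins with a space)
    # close the map, flushing the change back to the file
    mm.close()
```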
Happy to take the discussion forward about > these issues. > > Are there any bugs open on this? > > Thanks, > Amar > > > > > nfs > On March 19, 2019 8:09:01 AM EDT, Hans Henrik Happe wrote: > > Hi, > > Looking into something else I fell over this proposal. Being a shop that > are going into "Leaving GlusterFS" mode, I thought I would give my two > cents. > > While being partially an HPC shop with a few Lustre filesystems, we chose > GlusterFS for an archiving solution (2-3 PB), because we could find files > in the underlying ZFS filesystems if GlusterFS went sour. > > We have used the access to the underlying files plenty, because of the > continuous instability of GlusterFS'. Meanwhile, Lustre have been almost > effortless to run and mainly for that reason we are planning to move away > from GlusterFS. > > Reading this proposal kind of underlined that "Leaving GlusterFS" is the > right thing to do. While I never understood why GlusterFS has been in > feature crazy mode instead of stabilizing mode, taking away crucial > features I don't get. With RoCE, RDMA is getting mainstream. Quotas are > very useful, even though the current implementation are not perfect. > Tiering also makes so much sense, but, for large files, not on a per-file > level. > > To be honest we only use quotas. We got scared of trying out new > performance features that potentially would open up a new bag of issues. > > Sorry for being such a buzzkill. I really wanted it to be different. > > Cheers, > Hans Henrik > On 19/07/2018 08.56, Amar Tumballi wrote: > > > * Hi all, Over last 12 years of Gluster, we have developed many features, > and continue to support most of it till now. But along the way, we have > figured out better methods of doing things. Also we are not actively > maintaining some of these features. We are now thinking of cleaning up some > of these "unsupported" features, and mark them as "SunSet"
(i.e., would be > totally taken out of codebase in following releases) in next upcoming > release, v5.0. The release notes will provide options for smoothly > migrating to the supported configurations. If you are using any of these > features, do let us know, so that we can help you with "migration". Also, > we are happy to guide new developers to work on those components which are > not actively being maintained by current set of developers. List of > features hitting sunset: "cluster/stripe" translator: This translator was > developed very early in the evolution of GlusterFS, and addressed one of > the very common question of Distributed FS, which is "What happens if one > of my file is bigger than the available brick. Say, I have 2 TB hard drive, > exported in glusterfs, my file is 3 TB". While it solved the purpose, it > was very hard to handle failure scenarios, and give a real good experience > to our users with this feature. Over the time, Gluster solved the problem > with it's "Shard" feature, which solves the problem in much better way, and > provides much better solution with existing well supported stack. Hence the > proposal for Deprecation. If you are using this feature, then do write to > us, as it needs a proper migration from existing volume to a new full > supported volume type before you upgrade. "storage/bd" translator: This > feature got into the code base 5 years back with this patch > [1]. Plan was to use a block device > directly as a brick, which would help to handle disk-image storage much > easily in glusterfs. As the feature is not getting more contribution, and > we are not seeing any user traction on this, would like to propose for > Deprecation. If you are using the feature, plan to move to a supported > gluster volume configuration, and have your setup "supported" before > upgrading to your new gluster version. "RDMA"
transport support: Gluster > started supporting RDMA while ib-verbs was still new, and very high-end > infra around that time were using Infiniband. Engineers did work with > Mellanox, and got the technology into GlusterFS for better data migration, > data copy. While current day kernels support very good speed with IPoIB > module itself, and there are no more bandwidth for experts in these area to > maintain the feature, we recommend migrating over to TCP (IP based) network > for your volume. If you are successfully using RDMA transport, do get in > touch with us to prioritize the migration plan for your volume. Plan is to > work on this after the release, so by version 6.0, we will have a cleaner > transport code, which just needs to support one type. "Tiering" feature > Gluster's tiering feature which was planned to be providing an option to > keep your "hot" data in different location than your cold data, so one can > get better performance. While we saw some users for the feature, it needs > much more attention to be completely bug free. At the time, we are not > having any active maintainers for the feature, and hence suggesting to take > it out of the "supported" tag. If you are willing to take it up, and > maintain it, do let us know, and we are happy to assist you. If you are > already using tiering feature, before upgrading, make sure to do gluster > volume tier detach all the bricks before upgrading to next release. Also, > we recommend you to use features like dmcache on your LVM setup to get best > performance from bricks. "Quota" This is a call out for "Quota" feature, to > let you all know that it will be "no new development" state. While this > feature is "actively" in use by many people, the challenges we have in > accounting mechanisms involved, has made it hard to achieve good > performance with the feature. Also, the amount of extended attribute > get/set operations while using the feature is not very ideal.
Hence we > recommend our users to move towards setting quota on backend bricks > directly (ie, XFS project quota), or to use different volumes for different > directories etc. As the feature wouldn't be deprecated immediately, the > feature doesn't need a migration plan when you upgrade to newer version, > but if you are a new user, we wouldn't recommend setting quota feature. By > the release dates, we will be publishing our best alternatives guide for > gluster's current quota feature. Note that if you want to contribute to the > feature, we have project quota based issue open > [2] Happy to get > contributions, and help in getting a newer approach to Quota. > ------------------------------ These are our set of initial features which > we propose to take out of "fully" supported features. While we are in the > process of making the user/developer experience of the project much better > with providing well maintained codebase, we may come up with few more set > of features which we may possibly consider to move out of support, and > hence keep watching this space. [1] - http://review.gluster.org/4809 > [2] - > https://github.com/gluster/glusterfs/issues/184 > Regards, Vijay, Shyam, > Amar * > > > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > -- > > > Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity. > > > > > > -- > > > James P. Kinney III > > > Every time you stop a school, you will have to build a jail. What you > > gain at one end you lose at the other. It's like feeding a dog on his > > own tail. It won't fatten the dog. > > - Speech 11/23/1900 Mark Twain > > > http://heretothereideas.blogspot.com/ > > -- > > > James P. Kinney III > > > Every time you stop a school, you will have to build a jail. What you > > gain at one end you lose at the other.
It's like feeding a dog on his
> > own tail. It won't fatten the dog.
> > - Speech 11/23/1900 Mark Twain
> > http://heretothereideas.blogspot.com/
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
> --
> James P. Kinney III Every time you stop a school, you will have to build a
> jail. What you gain at one end you lose at the other. It's like feeding a
> dog on his own tail. It won't fatten the dog. - Speech 11/23/1900 Mark
> Twain http://heretothereideas.blogspot.com/
>
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From archon810 at gmail.com Wed Mar 20 04:18:43 2019
From: archon810 at gmail.com (Artem Russakovskii)
Date: Tue, 19 Mar 2019 21:18:43 -0700
Subject: [Gluster-users] Constant fuse client crashes "fixed" by setting performance.write-behind: off. Any hope for a 4.1.8 release?
In-Reply-To: <104b01d4de6c$f92542f0$eb6fc8d0$@thinkhuge.net>
References: <104b01d4de6c$f92542f0$eb6fc8d0$@thinkhuge.net>
Message-ID:

Brandon, I've had performance.write-behind: off for weeks, ever since it was suggested as a fix, but the crashes kept coming.

Sincerely,
Artem

--
Founder, Android Police, APK Mirror, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii | @ArtemR

On Tue, Mar 19, 2019 at 9:01 AM wrote:
> Hey Artem,
>
> Wondering, have you tried this "performance.write-behind: off" setting?
> I've added this to my multiple separate gluster clusters but I won't know
> until the weekend ftp backups run again whether it helps with our situation
> as a workaround.
>
> We need this fixed with the highest priority, I know that much.
>
> Can anyone please advise what steps I can take to get similar crash log
> information from CentOS 7 yum-repo-built gluster? Would that help if I
> shared that?
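On CentOS 7, that kind of crash information usually comes from a core dump read against the matching debuginfo packages. A rough sketch, assuming the debuginfo repositories are enabled and with a hypothetical core file path (check your kernel.core_pattern, or abrt/coredumpctl, for the real location):

```shell
# Needs yum-utils for debuginfo-install, plus the debuginfo repo enabled.
debuginfo-install -y glusterfs glusterfs-fuse

# Extract a full backtrace from the crashed mount process' core dump.
gdb /usr/sbin/glusterfs /var/core/core.12345 \
    -ex 'thread apply all bt full' -ex 'quit' > gluster-backtrace.txt
```

The resulting backtrace, together with the client log from /var/log/glusterfs/, is typically what makes such a report actionable.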
> > > _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From archon810 at gmail.com Wed Mar 20 04:21:22 2019
From: archon810 at gmail.com (Artem Russakovskii)
Date: Tue, 19 Mar 2019 21:21:22 -0700
Subject: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP
In-Reply-To: References: <2ed171d2-df68-ada3-e0de-53f19cb79520@redhat.com>
Message-ID:

Can I roll back performance.write-behind: off and lru-limit=0 then? I'm waiting for the debug packages to be available for OpenSUSE, then I can help Amar with another debug session.

In the meantime, have you had time to set up 1x4 replicate testing? I was told you were only testing 1x3, and it's the 4th brick that may be causing the crash, which is consistent with the fact that only 1 of our 4 bricks has been constantly crashing this whole time. The other 3 have been rock solid. I'm hoping you could find the issue without a debug session this way.

Sincerely,
Artem

--
Founder, Android Police, APK Mirror, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii | @ArtemR

On Tue, Mar 19, 2019 at 8:27 PM Nithya Balachandran wrote:
> Hi Artem,
>
> I think you are running into a different crash. The ones reported, which
> were prevented by turning off write-behind, are now fixed.
> We will need to look into the one you are seeing to find out why it is
> happening.
>
> Regards,
> Nithya
>
> On Tue, 19 Mar 2019 at 20:25, Artem Russakovskii
> wrote:
>
>> The flood is indeed fixed for us on 5.5. However, the crashes are not.
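The rollback asked about above would be a volume-option change plus a remount without the lru-limit mount option; a sketch with placeholder volume, server and mount-point names:

```shell
# Re-enable write-behind on the volume ("myvol" is a placeholder name).
gluster volume set myvol performance.write-behind on

# lru-limit is a client-side fuse mount option: remount without "-o lru-limit=0"
# (or drop it from the fstab entry for this mount).
umount /mnt/myvol
mount -t glusterfs server1:/myvol /mnt/myvol
```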
>> >> Sincerely, >> Artem >> >> -- >> Founder, Android Police , APK Mirror >> , Illogical Robot LLC >> beerpla.net | +ArtemRussakovskii >> | @ArtemR >> >> >> >> On Mon, Mar 18, 2019 at 5:41 AM Hu Bert wrote: >> >>> Hi Amar, >>> >>> if you refer to this bug: >>> https://bugzilla.redhat.com/show_bug.cgi?id=1674225 : in the test >>> setup i haven't seen those entries, while copying & deleting a few GBs >>> of data. For a final statement we have to wait until i updated our >>> live gluster servers - could take place on tuesday or wednesday. >>> >>> Maybe other users can do an update to 5.4 as well and report back here. >>> >>> >>> Hubert >>> >>> >>> >>> Am Mo., 18. M?rz 2019 um 11:36 Uhr schrieb Amar Tumballi Suryanarayan >>> : >>> > >>> > Hi Hu Bert, >>> > >>> > Appreciate the feedback. Also are the other boiling issues related to >>> logs fixed now? >>> > >>> > -Amar >>> > >>> > On Mon, Mar 18, 2019 at 3:54 PM Hu Bert >>> wrote: >>> >> >>> >> update: upgrade from 5.3 -> 5.5 in a replicate 3 test setup with 2 >>> >> volumes done. In 'gluster peer status' the peers stay connected during >>> >> the upgrade, no 'peer rejected' messages. No cksum mismatches in the >>> >> logs. Looks good :-) >>> >> >>> >> Am Mo., 18. M?rz 2019 um 09:54 Uhr schrieb Hu Bert < >>> revirii at googlemail.com>: >>> >> > >>> >> > Good morning :-) >>> >> > >>> >> > for debian the packages are there: >>> >> > >>> https://download.gluster.org/pub/gluster/glusterfs/5/5.5/Debian/stretch/amd64/apt/pool/main/g/glusterfs/ >>> >> > >>> >> > I'll do an upgrade of a test installation 5.3 -> 5.5 and see if >>> there >>> >> > are some errors etc. and report back. >>> >> > >>> >> > btw: no release notes for 5.4 and 5.5 so far? >>> >> > https://docs.gluster.org/en/latest/release-notes/ ? >>> >> > >>> >> > Am Fr., 15. M?rz 2019 um 14:28 Uhr schrieb Shyam Ranganathan >>> >> > : >>> >> > > >>> >> > > We created a 5.5 release tag, and it is under packaging now. 
It >>> should >>> >> > > be packaged and ready for testing early next week and should be >>> released >>> >> > > close to mid-week next week. >>> >> > > >>> >> > > Thanks, >>> >> > > Shyam >>> >> > > On 3/13/19 12:34 PM, Artem Russakovskii wrote: >>> >> > > > Wednesday now with no update :-/ >>> >> > > > >>> >> > > > Sincerely, >>> >> > > > Artem >>> >> > > > >>> >> > > > -- >>> >> > > > Founder, Android Police , APK >>> Mirror >>> >> > > > , Illogical Robot LLC >>> >> > > > beerpla.net | +ArtemRussakovskii >>> >> > > > | @ArtemR >>> >> > > > >>> >> > > > >>> >> > > > >>> >> > > > On Tue, Mar 12, 2019 at 10:28 AM Artem Russakovskii < >>> archon810 at gmail.com >>> >> > > > > wrote: >>> >> > > > >>> >> > > > Hi Amar, >>> >> > > > >>> >> > > > Any updates on this? I'm still not seeing it in OpenSUSE >>> build >>> >> > > > repos. Maybe later today? >>> >> > > > >>> >> > > > Thanks. >>> >> > > > >>> >> > > > Sincerely, >>> >> > > > Artem >>> >> > > > >>> >> > > > -- >>> >> > > > Founder, Android Police , >>> APK Mirror >>> >> > > > , Illogical Robot LLC >>> >> > > > beerpla.net | +ArtemRussakovskii >>> >> > > > | @ArtemR >>> >> > > > >>> >> > > > >>> >> > > > >>> >> > > > On Wed, Mar 6, 2019 at 10:30 PM Amar Tumballi Suryanarayan >>> >> > > > > wrote: >>> >> > > > >>> >> > > > We are talking days. Not weeks. Considering already it >>> is >>> >> > > > Thursday here. 1 more day for tagging, and packaging. >>> May be ok >>> >> > > > to expect it on Monday. >>> >> > > > >>> >> > > > -Amar >>> >> > > > >>> >> > > > On Thu, Mar 7, 2019 at 11:54 AM Artem Russakovskii >>> >> > > > > >>> wrote: >>> >> > > > >>> >> > > > Is the next release going to be an imminent hotfix, >>> i.e. >>> >> > > > something like today/tomorrow, or are we talking >>> weeks? 
>>> >> > > > >>> >> > > > Sincerely, >>> >> > > > Artem >>> >> > > > >>> >> > > > -- >>> >> > > > Founder, Android Police < >>> http://www.androidpolice.com>, APK >>> >> > > > Mirror , Illogical >>> Robot LLC >>> >> > > > beerpla.net | >>> +ArtemRussakovskii >>> >> > > > | >>> @ArtemR >>> >> > > > >>> >> > > > >>> >> > > > >>> >> > > > On Tue, Mar 5, 2019 at 11:09 AM Artem Russakovskii >>> >> > > > > >>> wrote: >>> >> > > > >>> >> > > > Ended up downgrading to 5.3 just in case. Peer >>> status >>> >> > > > and volume status are OK now. >>> >> > > > >>> >> > > > zypper install --oldpackage >>> glusterfs-5.3-lp150.100.1 >>> >> > > > Loading repository data... >>> >> > > > Reading installed packages... >>> >> > > > Resolving package dependencies... >>> >> > > > >>> >> > > > Problem: glusterfs-5.3-lp150.100.1.x86_64 >>> requires >>> >> > > > libgfapi0 = 5.3, but this requirement cannot be >>> provided >>> >> > > > not installable providers: >>> >> > > > libgfapi0-5.3-lp150.100.1.x86_64[glusterfs] >>> >> > > > Solution 1: Following actions will be done: >>> >> > > > downgrade of libgfapi0-5.4-lp150.100.1.x86_64 >>> to >>> >> > > > libgfapi0-5.3-lp150.100.1.x86_64 >>> >> > > > downgrade of >>> libgfchangelog0-5.4-lp150.100.1.x86_64 to >>> >> > > > libgfchangelog0-5.3-lp150.100.1.x86_64 >>> >> > > > downgrade of libgfrpc0-5.4-lp150.100.1.x86_64 >>> to >>> >> > > > libgfrpc0-5.3-lp150.100.1.x86_64 >>> >> > > > downgrade of libgfxdr0-5.4-lp150.100.1.x86_64 >>> to >>> >> > > > libgfxdr0-5.3-lp150.100.1.x86_64 >>> >> > > > downgrade of >>> libglusterfs0-5.4-lp150.100.1.x86_64 to >>> >> > > > libglusterfs0-5.3-lp150.100.1.x86_64 >>> >> > > > Solution 2: do not install >>> glusterfs-5.3-lp150.100.1.x86_64 >>> >> > > > Solution 3: break >>> glusterfs-5.3-lp150.100.1.x86_64 by >>> >> > > > ignoring some of its dependencies >>> >> > > > >>> >> > > > Choose from above solutions by number or cancel >>> >> > > > [1/2/3/c] (c): 1 >>> >> > > > Resolving dependencies... 
>>> >> > > > Resolving package dependencies... >>> >> > > > >>> >> > > > The following 6 packages are going to be >>> downgraded: >>> >> > > > glusterfs libgfapi0 libgfchangelog0 libgfrpc0 >>> >> > > > libgfxdr0 libglusterfs0 >>> >> > > > >>> >> > > > 6 packages to downgrade. >>> >> > > > >>> >> > > > Sincerely, >>> >> > > > Artem >>> >> > > > >>> >> > > > -- >>> >> > > > Founder, Android Police >>> >> > > > , APK Mirror >>> >> > > > , Illogical Robot >>> LLC >>> >> > > > beerpla.net | >>> +ArtemRussakovskii >>> >> > > > | >>> @ArtemR >>> >> > > > >>> >> > > > >>> >> > > > >>> >> > > > On Tue, Mar 5, 2019 at 10:57 AM Artem >>> Russakovskii >>> >> > > > >> archon810 at gmail.com>> wrote: >>> >> > > > >>> >> > > > Noticed the same when upgrading from 5.3 to >>> 5.4, as >>> >> > > > mentioned. >>> >> > > > >>> >> > > > I'm confused though. Is actual replication >>> affected, >>> >> > > > because the 5.4 server and the 3x 5.3 >>> servers still >>> >> > > > show heal info as all 4 connected, and the >>> files >>> >> > > > seem to be replicating correctly as well. >>> >> > > > >>> >> > > > So what's actually affected - just the >>> status >>> >> > > > command, or leaving 5.4 on one of the nodes >>> is doing >>> >> > > > some damage to the underlying fs? Is it >>> fixable by >>> >> > > > tweaking transport.socket.ssl-enabled? Does >>> >> > > > upgrading all servers to 5.4 resolve it, or >>> should >>> >> > > > we revert back to 5.3? >>> >> > > > >>> >> > > > Sincerely, >>> >> > > > Artem >>> >> > > > >>> >> > > > -- >>> >> > > > Founder, Android Police >>> >> > > > , APK Mirror >>> >> > > > , Illogical >>> Robot LLC >>> >> > > > beerpla.net | >>> >> > > > +ArtemRussakovskii >>> >> > > > >> > >>> >> > > > | @ArtemR >>> >> > > > >>> >> > > > >>> >> > > > On Tue, Mar 5, 2019 at 2:02 AM Hu Bert >>> >> > > > >> >> > > > > wrote: >>> >> > > > >>> >> > > > fyi: did a downgrade 5.4 -> 5.3 and it >>> worked. >>> >> > > > all replicas are up and >>> >> > > > running. 
Awaiting updated v5.4. >>> >> > > > >>> >> > > > thx :-) >>> >> > > > >>> >> > > > Am Di., 5. M?rz 2019 um 09:26 Uhr >>> schrieb Hari >>> >> > > > Gowtham >> >> > > > >: >>> >> > > > > >>> >> > > > > There are plans to revert the patch >>> causing >>> >> > > > this error and rebuilt 5.4. >>> >> > > > > This should happen faster. the >>> rebuilt 5.4 >>> >> > > > should be void of this upgrade issue. >>> >> > > > > >>> >> > > > > In the meantime, you can use 5.3 for >>> this cluster. >>> >> > > > > Downgrading to 5.3 will work if it >>> was just >>> >> > > > one node that was upgrade to 5.4 >>> >> > > > > and the other nodes are still in 5.3. >>> >> > > > > >>> >> > > > > On Tue, Mar 5, 2019 at 1:07 PM Hu Bert >>> >> > > > >> >> > > > > wrote: >>> >> > > > > > >>> >> > > > > > Hi Hari, >>> >> > > > > > >>> >> > > > > > thx for the hint. Do you know when >>> this will >>> >> > > > be fixed? Is a downgrade >>> >> > > > > > 5.4 -> 5.3 a possibility to fix >>> this? >>> >> > > > > > >>> >> > > > > > Hubert >>> >> > > > > > >>> >> > > > > > Am Di., 5. M?rz 2019 um 08:32 Uhr >>> schrieb >>> >> > > > Hari Gowtham >> >> > > > >: >>> >> > > > > > > >>> >> > > > > > > Hi, >>> >> > > > > > > >>> >> > > > > > > This is a known issue we are >>> working on. >>> >> > > > > > > As the checksum differs between >>> the >>> >> > > > updated and non updated node, the >>> >> > > > > > > peers are getting rejected. >>> >> > > > > > > The bricks aren't coming because >>> of the >>> >> > > > same issue. 
>>> >> > > > > > > >>> >> > > > > > > More about the issue: >>> >> > > > >>> https://bugzilla.redhat.com/show_bug.cgi?id=1685120 >>> >> > > > > > > >>> >> > > > > > > On Tue, Mar 5, 2019 at 12:56 PM >>> Hu Bert >>> >> > > > >> >> > > > > wrote: >>> >> > > > > > > > >>> >> > > > > > > > Interestingly: gluster volume >>> status >>> >> > > > misses gluster1, while heal >>> >> > > > > > > > statistics show gluster1: >>> >> > > > > > > > >>> >> > > > > > > > gluster volume status workdata >>> >> > > > > > > > Status of volume: workdata >>> >> > > > > > > > Gluster process >>> >> > > > TCP Port RDMA Port Online Pid >>> >> > > > > > > > >>> >> > > > >>> ------------------------------------------------------------------------------ >>> >> > > > > > > > Brick >>> gluster2:/gluster/md4/workdata >>> >> > > > 49153 0 Y 1723 >>> >> > > > > > > > Brick >>> gluster3:/gluster/md4/workdata >>> >> > > > 49153 0 Y 2068 >>> >> > > > > > > > Self-heal Daemon on localhost >>> >> > > > N/A N/A Y 1732 >>> >> > > > > > > > Self-heal Daemon on gluster3 >>> >> > > > N/A N/A Y 2077 >>> >> > > > > > > > >>> >> > > > > > > > vs. >>> >> > > > > > > > >>> >> > > > > > > > gluster volume heal workdata >>> statistics >>> >> > > > heal-count >>> >> > > > > > > > Gathering count of entries to >>> be healed >>> >> > > > on volume workdata has been successful >>> >> > > > > > > > >>> >> > > > > > > > Brick >>> gluster1:/gluster/md4/workdata >>> >> > > > > > > > Number of entries: 0 >>> >> > > > > > > > >>> >> > > > > > > > Brick >>> gluster2:/gluster/md4/workdata >>> >> > > > > > > > Number of entries: 10745 >>> >> > > > > > > > >>> >> > > > > > > > Brick >>> gluster3:/gluster/md4/workdata >>> >> > > > > > > > Number of entries: 10744 >>> >> > > > > > > > >>> >> > > > > > > > Am Di., 5. 
M?rz 2019 um 08:18 >>> Uhr >>> >> > > > schrieb Hu Bert >> >> > > > >: >>> >> > > > > > > > > >>> >> > > > > > > > > Hi Miling, >>> >> > > > > > > > > >>> >> > > > > > > > > well, there are such entries, >>> but >>> >> > > > those haven't been a problem during >>> >> > > > > > > > > install and the last kernel >>> >> > > > update+reboot. The entries look like: >>> >> > > > > > > > > >>> >> > > > > > > > > PUBLIC_IP >>> gluster2.alpserver.de >>> >> > > > gluster2 >>> >> > > > > > > > > >>> >> > > > > > > > > 192.168.0.50 gluster1 >>> >> > > > > > > > > 192.168.0.51 gluster2 >>> >> > > > > > > > > 192.168.0.52 gluster3 >>> >> > > > > > > > > >>> >> > > > > > > > > 'ping gluster2' resolves to >>> LAN IP; I >>> >> > > > removed the last entry in the >>> >> > > > > > > > > 1st line, did a reboot ... >>> no, didn't >>> >> > > > help. From >>> >> > > > > > > > > >>> /var/log/glusterfs/glusterd.log >>> >> > > > > > > > > on gluster 2: >>> >> > > > > > > > > >>> >> > > > > > > > > [2019-03-05 07:04:36.188128] >>> E [MSGID: >>> >> > > > 106010] >>> >> > > > > > > > > >>> >> > > > >>> [glusterd-utils.c:3483:glusterd_compare_friend_volume] >>> >> > > > 0-management: >>> >> > > > > > > > > Version of Cksums persistent >>> differ. >>> >> > > > local cksum = 3950307018, remote >>> >> > > > > > > > > cksum = 455409345 on peer >>> gluster1 >>> >> > > > > > > > > [2019-03-05 07:04:36.188314] >>> I [MSGID: >>> >> > > > 106493] >>> >> > > > > > > > > >>> >> > > > >>> [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] >>> >> > > > 0-glusterd: >>> >> > > > > > > > > Responded to gluster1 (0), >>> ret: 0, >>> >> > > > op_ret: -1 >>> >> > > > > > > > > >>> >> > > > > > > > > Interestingly there are no >>> entries in >>> >> > > > the brick logs of the rejected >>> >> > > > > > > > > server. Well, not surprising >>> as no >>> >> > > > brick process is running. The >>> >> > > > > > > > > server gluster1 is still in >>> rejected >>> >> > > > state. 
>>> >> > > > > > > > > >>> >> > > > > > > > > 'gluster volume start >>> workdata force' >>> >> > > > starts the brick process on >>> >> > > > > > > > > gluster1, and some heals are >>> happening >>> >> > > > on gluster2+3, but via 'gluster >>> >> > > > > > > > > volume status workdata' the >>> volumes >>> >> > > > still aren't complete. >>> >> > > > > > > > > >>> >> > > > > > > > > gluster1: >>> >> > > > > > > > > >>> >> > > > >>> ------------------------------------------------------------------------------ >>> >> > > > > > > > > Brick >>> gluster1:/gluster/md4/workdata >>> >> > > > 49152 0 Y 2523 >>> >> > > > > > > > > Self-heal Daemon on localhost >>> >> > > > N/A N/A Y 2549 >>> >> > > > > > > > > >>> >> > > > > > > > > gluster2: >>> >> > > > > > > > > Gluster process >>> >> > > > TCP Port RDMA Port Online Pid >>> >> > > > > > > > > >>> >> > > > >>> ------------------------------------------------------------------------------ >>> >> > > > > > > > > Brick >>> gluster2:/gluster/md4/workdata >>> >> > > > 49153 0 Y 1723 >>> >> > > > > > > > > Brick >>> gluster3:/gluster/md4/workdata >>> >> > > > 49153 0 Y 2068 >>> >> > > > > > > > > Self-heal Daemon on localhost >>> >> > > > N/A N/A Y 1732 >>> >> > > > > > > > > Self-heal Daemon on gluster3 >>> >> > > > N/A N/A Y 2077 >>> >> > > > > > > > > >>> >> > > > > > > > > >>> >> > > > > > > > > Hubert >>> >> > > > > > > > > >>> >> > > > > > > > > Am Di., 5. M?rz 2019 um 07:58 >>> Uhr >>> >> > > > schrieb Milind Changire < >>> mchangir at redhat.com >>> >> > > > >: >>> >> > > > > > > > > > >>> >> > > > > > > > > > There are probably DNS >>> entries or >>> >> > > > /etc/hosts entries with the public IP >>> Addresses >>> >> > > > that the host names (gluster1, gluster2, >>> >> > > > gluster3) are getting resolved to. >>> >> > > > > > > > > > /etc/resolv.conf would tell >>> which is >>> >> > > > the default domain searched for the >>> node names >>> >> > > > and the DNS servers which respond to >>> the queries. 
>>> >> > > > > > > > > > >>> >> > > > > > > > > > >>> >> > > > > > > > > > On Tue, Mar 5, 2019 at >>> 12:14 PM Hu >>> >> > > > Bert >> >> > > > > wrote: >>> >> > > > > > > > > >> >>> >> > > > > > > > > >> Good morning, >>> >> > > > > > > > > >> >>> >> > > > > > > > > >> i have a replicate 3 setup >>> with 2 >>> >> > > > volumes, running on version 5.3 on >>> >> > > > > > > > > >> debian stretch. This >>> morning i >>> >> > > > upgraded one server to version 5.4 and >>> >> > > > > > > > > >> rebooted the machine; >>> after the >>> >> > > > restart i noticed that: >>> >> > > > > > > > > >> >>> >> > > > > > > > > >> - no brick process is >>> running >>> >> > > > > > > > > >> - gluster volume status >>> only shows >>> >> > > > the server itself: >>> >> > > > > > > > > >> gluster volume status >>> workdata >>> >> > > > > > > > > >> Status of volume: workdata >>> >> > > > > > > > > >> Gluster process >>> >> > > > TCP Port RDMA Port Online >>> Pid >>> >> > > > > > > > > >> >>> >> > > > >>> ------------------------------------------------------------------------------ >>> >> > > > > > > > > >> Brick >>> >> > > > gluster1:/gluster/md4/workdata >>> N/A >>> >> > > > N/A N N/A >>> >> > > > > > > > > >> NFS Server on localhost >>> >> > > > N/A N/A N >>> N/A >>> >> > > > > > > > > >> >>> >> > > > > > > > > >> - gluster peer status on >>> the server >>> >> > > > > > > > > >> gluster peer status >>> >> > > > > > > > > >> Number of Peers: 2 >>> >> > > > > > > > > >> >>> >> > > > > > > > > >> Hostname: gluster3 >>> >> > > > > > > > > >> Uuid: >>> >> > > > c7b4a448-ca6a-4051-877f-788f9ee9bc4a >>> >> > > > > > > > > >> State: Peer Rejected >>> (Connected) >>> >> > > > > > > > > >> >>> >> > > > > > > > > >> Hostname: gluster2 >>> >> > > > > > > > > >> Uuid: >>> >> > > > 162fea82-406a-4f51-81a3-e90235d8da27 >>> >> > > > > > > > > >> State: Peer Rejected >>> (Connected) >>> >> > > > > > > > > >> >>> >> > > > > > > > > >> - gluster peer status on >>> the other >>> >> > > > 2 servers: >>> 
>> > > > > > > > > >> gluster peer status >>> >> > > > > > > > > >> Number of Peers: 2 >>> >> > > > > > > > > >> >>> >> > > > > > > > > >> Hostname: gluster1 >>> >> > > > > > > > > >> Uuid: >>> >> > > > 9a360776-7b58-49ae-831e-a0ce4e4afbef >>> >> > > > > > > > > >> State: Peer Rejected >>> (Connected) >>> >> > > > > > > > > >> >>> >> > > > > > > > > >> Hostname: gluster3 >>> >> > > > > > > > > >> Uuid: >>> >> > > > c7b4a448-ca6a-4051-877f-788f9ee9bc4a >>> >> > > > > > > > > >> State: Peer in Cluster >>> (Connected) >>> >> > > > > > > > > >> >>> >> > > > > > > > > >> I noticed that, in the >>> brick logs, >>> >> > > > i see that the public IP is used >>> >> > > > > > > > > >> instead of the LAN IP. >>> brick logs >>> >> > > > from one of the volumes: >>> >> > > > > > > > > >> >>> >> > > > > > > > > >> rejected node: >>> >> > > > https://pastebin.com/qkpj10Sd >>> >> > > > > > > > > >> connected nodes: >>> >> > > > https://pastebin.com/8SxVVYFV >>> >> > > > > > > > > >> >>> >> > > > > > > > > >> Why is the public IP >>> suddenly used >>> >> > > > instead of the LAN IP? Killing all >>> >> > > > > > > > > >> gluster processes and >>> rebooting >>> >> > > > (again) didn't help. 
>>> >> > > > > > > > > >> >>> >> > > > > > > > > >> >>> >> > > > > > > > > >> Thx, >>> >> > > > > > > > > >> Hubert >>> >> > > > > > > > > >> >>> >> > > > >>> _______________________________________________ >>> >> > > > > > > > > >> Gluster-users mailing list >>> >> > > > > > > > > >> Gluster-users at gluster.org >>> >> > > > >>> >> > > > > > > > > >> >>> >> > > > >>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >> > > > > > > > > > >>> >> > > > > > > > > > >>> >> > > > > > > > > > >>> >> > > > > > > > > > -- >>> >> > > > > > > > > > Milind >>> >> > > > > > > > > > >>> >> > > > > > > > >>> >> > > > >>> _______________________________________________ >>> >> > > > > > > > Gluster-users mailing list >>> >> > > > > > > > Gluster-users at gluster.org >>> >> > > > >>> >> > > > > > > > >>> >> > > > >>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >> > > > > > > >>> >> > > > > > > >>> >> > > > > > > >>> >> > > > > > > -- >>> >> > > > > > > Regards, >>> >> > > > > > > Hari Gowtham. >>> >> > > > > >>> >> > > > > >>> >> > > > > >>> >> > > > > -- >>> >> > > > > Regards, >>> >> > > > > Hari Gowtham. 
>>> >> > > > _______________________________________________
>>> >> > > > Gluster-users mailing list
>>> >> > > > Gluster-users at gluster.org
>>> >> > > > https://lists.gluster.org/mailman/listinfo/gluster-users
>>> >> > > >
>>> >> > > > --
>>> >> > > > Amar Tumballi (amarts)
>>> >
>>> > --
>>> > Amar Tumballi (amarts)
>>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From hunter86_bg at yahoo.com Wed Mar 20 04:59:00 2019
From: hunter86_bg at yahoo.com (Strahil)
Date: Wed, 20 Mar 2019 06:59:00 +0200
Subject: [Gluster-users] Docu - how to debug issues
Message-ID:

Hello Community,

Is there a docu page clarifying what information needs to be gathered in advance in order to help the devs resolve issues? So far I couldn't find one - but I may have missed it. If not, it would be nice to have that info posted somewhere.

For example - FUSE issues - do 1,2,3... Same for other client-side issues, and then for the cluster side also.
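A first-pass version of such a checklist might look like this (the volume name and paths are placeholders):

```shell
# Basic state of the cluster and of the volume in question.
gluster --version
gluster volume info myvol
gluster volume status myvol clients

# Capture brick statedumps; the dump files land under /var/run/gluster by default.
gluster volume statedump myvol

# For FUSE issues, also collect the client log for the affected mount point,
# e.g. /var/log/glusterfs/mnt-myvol.log
```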
I guess this will save a lot of 'what is your output of gluster volume info vol' questions.

Best Regards,
Strahil Nikolov
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From atumball at redhat.com Wed Mar 20 05:27:16 2019
From: atumball at redhat.com (Amar Tumballi Suryanarayan)
Date: Wed, 20 Mar 2019 10:57:16 +0530
Subject: [Gluster-users] Transport endpoint is not connected failures in 5.3 under high I/O load
In-Reply-To: <122f01d4ddaa$177772f0$466658d0$@thinkhuge.net>
References: <122f01d4ddaa$177772f0$466658d0$@thinkhuge.net>
Message-ID:

Hi Brandon,

There were a few concerns raised about 5.3 issues recently, and we fixed some of them and made 5.5 (in 5.4 we faced an upgrade issue, so 5.5 is the recommended upgrade version). Can you please upgrade to the 5.5 version?

-Amar

On Mon, Mar 18, 2019 at 10:16 PM wrote:
> Hello list,
>
> We are having critical failures under load on CentOS 7 glusterfs 5.3, with
> our servers losing their local mount point with the error "Transport
> endpoint is not connected".
>
> Not sure if it is related, but the logs are full of the following message:
>
> [2019-03-18 14:00:02.656876] E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler
>
> We operate multiple separate glusterfs distributed clusters of about 6-8
> nodes. Our 2 biggest, separate, and most I/O-active glusterfs clusters are
> both having the issues.
>
> We are trying to use glusterfs as a unified file system for pureftpd
> backup services for a VPS service. We have a relatively small backup
> window over the weekend where all our servers back up at the same time. When
> backups start early on Saturday, they cause a sustained massive amount of FTP
> file upload I/O for around 48 hours while all the compressed backup files
> are uploaded. For our London 8-node cluster, for example, there are about
> 45 TB of uploads in ~48 hours currently.
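As a rough sanity check on those numbers, 45 TB spread over 48 hours works out to roughly 260 MB/s of sustained aggregate write traffic, or about 33 MB/s per brick node across the 8 nodes:

```shell
# 45 TB (decimal) uploaded in 48 hours, spread across 8 brick nodes
awk 'BEGIN {
  tb = 45; hours = 48; bricks = 8
  mbps = tb * 1000 * 1000 / (hours * 3600)   # aggregate MB per second
  printf "aggregate: %.0f MB/s, per brick: %.0f MB/s\n", mbps, mbps / bricks
}'
```

These are averages over the whole window, so the peak load on any single node will be higher.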
> We do have some other smaller issues with directory listing under this
> load too, but it had been working for a couple of years since 3.x. Since we
> updated recently, all servers now randomly lose their glusterfs
> mount with the "Transport endpoint is not connected" issue.
>
> Our glusterfs servers are all mostly the same, with small variations.
> Mostly they are Supermicro, E3 CPU, 16 GB RAM, LSI RAID10 HDD (with and
> without BBU). Drive arrays vary between 4-16 SATA3 HDD drives per node,
> depending on whether the servers are older or newer. Firmware is kept
> up-to-date, and we run the latest LSI compiled driver. The newer 16-drive
> backup servers also have 2 x 1Gbit LACP teamed interfaces.
>
> [root at lonbaknode3 ~]# uname -r
> 3.10.0-957.5.1.el7.x86_64
>
> [root at lonbaknode3 ~]# rpm -qa |grep gluster
> centos-release-gluster5-1.0-1.el7.centos.noarch
> glusterfs-libs-5.3-2.el7.x86_64
> glusterfs-api-5.3-2.el7.x86_64
> glusterfs-5.3-2.el7.x86_64
> glusterfs-cli-5.3-2.el7.x86_64
> glusterfs-client-xlators-5.3-2.el7.x86_64
> glusterfs-server-5.3-2.el7.x86_64
> glusterfs-fuse-5.3-2.el7.x86_64
> [root at lonbaknode3 ~]#
>
> [root at lonbaknode3 ~]# gluster volume info all
>
> Volume Name: volbackups
> Type: Distribute
> Volume ID: 32bf4fe9-5450-49f8-b6aa-05471d3bdffa
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 8
> Transport-type: tcp
> Bricks:
> Brick1: lonbaknode3.domain.net:/lvbackups/brick
> Brick2: lonbaknode4.domain.net:/lvbackups/brick
> Brick3: lonbaknode5.domain.net:/lvbackups/brick
> Brick4: lonbaknode6.domain.net:/lvbackups/brick
> Brick5: lonbaknode7.domain.net:/lvbackups/brick
> Brick6: lonbaknode8.domain.net:/lvbackups/brick
> Brick7: lonbaknode9.domain.net:/lvbackups/brick
> Brick8: lonbaknode10.domain.net:/lvbackups/brick
> Options Reconfigured:
> transport.address-family: inet
> nfs.disable: on
> cluster.min-free-disk: 1%
performance.cache-size: 8GB
> performance.cache-max-file-size: 128MB
> diagnostics.brick-log-level: WARNING
> diagnostics.brick-sys-log-level: WARNING
> client.event-threads: 3
> performance.client-io-threads: on
> performance.io-thread-count: 24
> network.inode-lru-limit: 1048576
> performance.parallel-readdir: on
> performance.cache-invalidation: on
> performance.md-cache-timeout: 600
> features.cache-invalidation: on
> features.cache-invalidation-timeout: 600
> [root at lonbaknode3 ~]#
>
> Mount output shows the following:
>
> lonbaknode3.domain.net:/volbackups on /home/volbackups type fuse.glusterfs
> (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
>
> If you notice anything missing or otherwise bad in our volume or mount
> settings above, feel free to let us know. We are still learning glusterfs. I
> tried searching for recommended performance settings, but it's not always
> clear which setting is most applicable or beneficial to our workload.
>
> I have just found this post that looks like the same issue:
>
> https://lists.gluster.org/pipermail/gluster-users/2019-March/035958.html
>
> We have not yet tried the suggestion of "performance.write-behind: off"
> but we will do so if that is recommended.
>
> Could someone knowledgeable advise anything for these issues?
>
> If any more information is needed, do let us know.
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

--
Amar Tumballi (amarts)
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From atumball at redhat.com Wed Mar 20 05:30:19 2019
From: atumball at redhat.com (Amar Tumballi Suryanarayan)
Date: Wed, 20 Mar 2019 11:00:19 +0530
Subject: [Gluster-users] Help analise statedumps
In-Reply-To: References: Message-ID:

It is really good to hear the good news.
The one thing we did in 5.4 (and which is present in 6.0 too) is implement garbage-collection logic in the fuse module, which keeps memory usage in check. Looks like the feature is working as expected.

Regards,
Amar

On Wed, Mar 20, 2019 at 7:24 AM Sankarshan Mukhopadhyay <
sankarshan.mukhopadhyay at gmail.com> wrote:
> On Tue, Mar 19, 2019 at 11:09 PM Pedro Costa wrote:
> > Sorry to revive an old thread, but just to let you know that with the
> > latest 5.4 version this has virtually stopped happening.
> >
> > I can't ascertain it for sure yet, but since the update the memory
> > footprint of Gluster has been massively reduced.
> >
> > Thanks to everyone, great job.
>
> Good news is always fantastic to hear! Thank you for reviving the
> thread and providing feedback.
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

--
Amar Tumballi (amarts)
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From atumball at redhat.com Wed Mar 20 05:39:25 2019
From: atumball at redhat.com (Amar Tumballi Suryanarayan)
Date: Wed, 20 Mar 2019 11:09:25 +0530
Subject: [Gluster-users] recovery from reboot time?
In-Reply-To: References: Message-ID:

Two things happen after a reboot:

1. glusterd (the management layer) does a sanity check of its volumes, sees if anything changed while it was down, and tries to correct its state.
- This is fine as long as the number of volumes is small, or the number of nodes is small (small meaning < 100).

2. If it is a replicate or disperse volume, the self-heal daemon checks whether any self-heals are pending.
- This does an 'index' crawl to find which files actually changed while one of the bricks/nodes was down.
- If this list is big, it can sometimes take some time. But days/weeks/months is not the expected/observed behavior.

Are there any logs in the log file?
If not, can you run 'strace -f' on the PID that is consuming the most CPU? (a 1-minute strace sample is good enough). -Amar On Wed, Mar 20, 2019 at 2:05 AM Alvin Starr wrote: > We have a simple replicated volume with 1 brick on each node of 17TB. > > There is something like 35M files and directories on the volume. > > One of the servers rebooted and is now "doing something". > > It kind of looks like it's doing some kind of sanity check with the node > that did not reboot, but it's hard to say, and it looks like it may run for > hours/days/months.... > > Will Gluster take a long time to resync lots of little files? > > > -- > Alvin Starr || land: (905)513-7688 > Netvel Inc. || Cell: (416)806-0133 > alvin at netvel.net || > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From atumball at redhat.com Wed Mar 20 05:45:16 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Wed, 20 Mar 2019 11:15:16 +0530 Subject: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP In-Reply-To: References: <2ed171d2-df68-ada3-e0de-53f19cb79520@redhat.com> Message-ID: On Wed, Mar 20, 2019 at 9:52 AM Artem Russakovskii wrote: > Can I roll back performance.write-behind: off and lru-limit=0 then? I'm > waiting for the debug packages to be available for OpenSUSE, then I can > help Amar with another debug session. > > Yes, the write-behind issue is now fixed. You can enable write-behind. Also remove lru-limit=0, so you can also utilize the benefit of the garbage collection introduced in 5.4. Let's get to fixing the problem once the debuginfo packages are available. > In the meantime, have you had time to set up 1x4 replicate testing?
I was > told you were only testing 1x3, and it's the 4th brick that may be causing > the crash, which is consistent with only 1 of our 4 bricks > constantly crashing this whole time. The other 3 have been rock solid. I'm hoping you could > find the issue without a debug session this way. > > That is my gut feeling still. I added a basic test case with 4 bricks, https://review.gluster.org/#/c/glusterfs/+/22328/. But I think this particular issue happens only with a certain access pattern on a 1x4 setup. Let's get to the root of it once we have debuginfo packages for the SUSE builds. -Amar Sincerely, > Artem > > -- > Founder, Android Police , APK Mirror > , Illogical Robot LLC > beerpla.net | +ArtemRussakovskii > | @ArtemR > > > > On Tue, Mar 19, 2019 at 8:27 PM Nithya Balachandran > wrote: > > > Hi Artem, > > > > I think you are running into a different crash. The ones reported which > > were prevented by turning off write-behind are now fixed. > > We will need to look into the one you are seeing to see why it is > > happening. > > > > Regards, > > Nithya > > > > > > On Tue, 19 Mar 2019 at 20:25, Artem Russakovskii > > wrote: > > > >> The flood is indeed fixed for us on 5.5. However, the crashes are not. > >> > >> Sincerely, > >> Artem > >> > >> -- > >> Founder, Android Police , APK Mirror > >> , Illogical Robot LLC > >> beerpla.net | +ArtemRussakovskii > >> | @ArtemR > >> > >> > >> > >> On Mon, Mar 18, 2019 at 5:41 AM Hu Bert wrote: > >> > >>> Hi Amar, > >>> > >>> if you refer to this bug: > >>> https://bugzilla.redhat.com/show_bug.cgi?id=1674225 : in the test > >>> setup I haven't seen those entries while copying & deleting a few GBs > >>> of data. For a final statement we have to wait until I have updated our > >>> live gluster servers - that could take place on Tuesday or Wednesday. > >>> > >>> Maybe other users can do an update to 5.4 as well and report back here. > >>> > >>> > >>> Hubert > >>> > >>> > >>> > >>> Am Mo., 18.
M?rz 2019 um 11:36 Uhr schrieb Amar Tumballi Suryanarayan > >>> : > >>> > > >>> > Hi Hu Bert, > >>> > > >>> > Appreciate the feedback. Also are the other boiling issues related to > >>> logs fixed now? > >>> > > >>> > -Amar > >>> > > >>> > On Mon, Mar 18, 2019 at 3:54 PM Hu Bert > >>> wrote: > >>> >> > >>> >> update: upgrade from 5.3 -> 5.5 in a replicate 3 test setup with 2 > >>> >> volumes done. In 'gluster peer status' the peers stay connected > during > >>> >> the upgrade, no 'peer rejected' messages. No cksum mismatches in the > >>> >> logs. Looks good :-) > >>> >> > >>> >> Am Mo., 18. M?rz 2019 um 09:54 Uhr schrieb Hu Bert < > >>> revirii at googlemail.com>: > >>> >> > > >>> >> > Good morning :-) > >>> >> > > >>> >> > for debian the packages are there: > >>> >> > > >>> > https://download.gluster.org/pub/gluster/glusterfs/5/5.5/Debian/stretch/amd64/apt/pool/main/g/glusterfs/ > >>> >> > > >>> >> > I'll do an upgrade of a test installation 5.3 -> 5.5 and see if > >>> there > >>> >> > are some errors etc. and report back. > >>> >> > > >>> >> > btw: no release notes for 5.4 and 5.5 so far? > >>> >> > https://docs.gluster.org/en/latest/release-notes/ ? > >>> >> > > >>> >> > Am Fr., 15. M?rz 2019 um 14:28 Uhr schrieb Shyam Ranganathan > >>> >> > : > >>> >> > > > >>> >> > > We created a 5.5 release tag, and it is under packaging now. It > >>> should > >>> >> > > be packaged and ready for testing early next week and should be > >>> released > >>> >> > > close to mid-week next week. 
> >>> >> > > > >>> >> > > Thanks, > >>> >> > > Shyam > >>> >> > > On 3/13/19 12:34 PM, Artem Russakovskii wrote: > >>> >> > > > Wednesday now with no update :-/ > >>> >> > > > > >>> >> > > > Sincerely, > >>> >> > > > Artem > >>> >> > > > > >>> >> > > > -- > >>> >> > > > Founder, Android Police , APK > >>> Mirror > >>> >> > > > , Illogical Robot LLC > >>> >> > > > beerpla.net | +ArtemRussakovskii > >>> >> > > > | @ArtemR > >>> >> > > > > >>> >> > > > > >>> >> > > > > >>> >> > > > On Tue, Mar 12, 2019 at 10:28 AM Artem Russakovskii < > >>> archon810 at gmail.com > >>> >> > > > > wrote: > >>> >> > > > > >>> >> > > > Hi Amar, > >>> >> > > > > >>> >> > > > Any updates on this? I'm still not seeing it in OpenSUSE > >>> build > >>> >> > > > repos. Maybe later today? > >>> >> > > > > >>> >> > > > Thanks. > >>> >> > > > > >>> >> > > > Sincerely, > >>> >> > > > Artem > >>> >> > > > > >>> >> > > > -- > >>> >> > > > Founder, Android Police , > >>> APK Mirror > >>> >> > > > , Illogical Robot LLC > >>> >> > > > beerpla.net | +ArtemRussakovskii > >>> >> > > > | @ArtemR > >>> >> > > > > >>> >> > > > > >>> >> > > > > >>> >> > > > On Wed, Mar 6, 2019 at 10:30 PM Amar Tumballi Suryanarayan > >>> >> > > > > wrote: > >>> >> > > > > >>> >> > > > We are talking days. Not weeks. Considering already it > >>> is > >>> >> > > > Thursday here. 1 more day for tagging, and packaging. > >>> May be ok > >>> >> > > > to expect it on Monday. > >>> >> > > > > >>> >> > > > -Amar > >>> >> > > > > >>> >> > > > On Thu, Mar 7, 2019 at 11:54 AM Artem Russakovskii > >>> >> > > > > > >>> wrote: > >>> >> > > > > >>> >> > > > Is the next release going to be an imminent > hotfix, > >>> i.e. > >>> >> > > > something like today/tomorrow, or are we talking > >>> weeks? 
> >>> >> > > > > >>> >> > > > Sincerely, > >>> >> > > > Artem > >>> >> > > > > >>> >> > > > -- > >>> >> > > > Founder, Android Police < > >>> http://www.androidpolice.com>, APK > >>> >> > > > Mirror , Illogical > >>> Robot LLC > >>> >> > > > beerpla.net | > >>> +ArtemRussakovskii > >>> >> > > > | > >>> @ArtemR > >>> >> > > > > >>> >> > > > > >>> >> > > > > >>> >> > > > On Tue, Mar 5, 2019 at 11:09 AM Artem Russakovskii > >>> >> > > > >> > >>> wrote: > >>> >> > > > > >>> >> > > > Ended up downgrading to 5.3 just in case. Peer > >>> status > >>> >> > > > and volume status are OK now. > >>> >> > > > > >>> >> > > > zypper install --oldpackage > >>> glusterfs-5.3-lp150.100.1 > >>> >> > > > Loading repository data... > >>> >> > > > Reading installed packages... > >>> >> > > > Resolving package dependencies... > >>> >> > > > > >>> >> > > > Problem: glusterfs-5.3-lp150.100.1.x86_64 > >>> requires > >>> >> > > > libgfapi0 = 5.3, but this requirement cannot > be > >>> provided > >>> >> > > > not installable providers: > >>> >> > > > libgfapi0-5.3-lp150.100.1.x86_64[glusterfs] > >>> >> > > > Solution 1: Following actions will be done: > >>> >> > > > downgrade of > libgfapi0-5.4-lp150.100.1.x86_64 > >>> to > >>> >> > > > libgfapi0-5.3-lp150.100.1.x86_64 > >>> >> > > > downgrade of > >>> libgfchangelog0-5.4-lp150.100.1.x86_64 to > >>> >> > > > libgfchangelog0-5.3-lp150.100.1.x86_64 > >>> >> > > > downgrade of > libgfrpc0-5.4-lp150.100.1.x86_64 > >>> to > >>> >> > > > libgfrpc0-5.3-lp150.100.1.x86_64 > >>> >> > > > downgrade of > libgfxdr0-5.4-lp150.100.1.x86_64 > >>> to > >>> >> > > > libgfxdr0-5.3-lp150.100.1.x86_64 > >>> >> > > > downgrade of > >>> libglusterfs0-5.4-lp150.100.1.x86_64 to > >>> >> > > > libglusterfs0-5.3-lp150.100.1.x86_64 > >>> >> > > > Solution 2: do not install > >>> glusterfs-5.3-lp150.100.1.x86_64 > >>> >> > > > Solution 3: break > >>> glusterfs-5.3-lp150.100.1.x86_64 by > >>> >> > > > ignoring some of its dependencies > >>> >> > > > > >>> >> > > > Choose 
from above solutions by number or > cancel > >>> >> > > > [1/2/3/c] (c): 1 > >>> >> > > > Resolving dependencies... > >>> >> > > > Resolving package dependencies... > >>> >> > > > > >>> >> > > > The following 6 packages are going to be > >>> downgraded: > >>> >> > > > glusterfs libgfapi0 libgfchangelog0 > libgfrpc0 > >>> >> > > > libgfxdr0 libglusterfs0 > >>> >> > > > > >>> >> > > > 6 packages to downgrade. > >>> >> > > > > >>> >> > > > Sincerely, > >>> >> > > > Artem > >>> >> > > > > >>> >> > > > -- > >>> >> > > > Founder, Android Police > >>> >> > > > , APK Mirror > >>> >> > > > , Illogical Robot > >>> LLC > >>> >> > > > beerpla.net | > >>> +ArtemRussakovskii > >>> >> > > > > | > >>> @ArtemR > >>> >> > > > > >>> >> > > > > >>> >> > > > > >>> >> > > > On Tue, Mar 5, 2019 at 10:57 AM Artem > >>> Russakovskii > >>> >> > > > >>> archon810 at gmail.com>> wrote: > >>> >> > > > > >>> >> > > > Noticed the same when upgrading from 5.3 > to > >>> 5.4, as > >>> >> > > > mentioned. > >>> >> > > > > >>> >> > > > I'm confused though. Is actual replication > >>> affected, > >>> >> > > > because the 5.4 server and the 3x 5.3 > >>> servers still > >>> >> > > > show heal info as all 4 connected, and the > >>> files > >>> >> > > > seem to be replicating correctly as well. > >>> >> > > > > >>> >> > > > So what's actually affected - just the > >>> status > >>> >> > > > command, or leaving 5.4 on one of the > nodes > >>> is doing > >>> >> > > > some damage to the underlying fs? Is it > >>> fixable by > >>> >> > > > tweaking transport.socket.ssl-enabled? > Does > >>> >> > > > upgrading all servers to 5.4 resolve it, > or > >>> should > >>> >> > > > we revert back to 5.3? 
> >>> >> > > > > >>> >> > > > Sincerely, > >>> >> > > > Artem > >>> >> > > > > >>> >> > > > -- > >>> >> > > > Founder, Android Police > >>> >> > > > , APK > Mirror > >>> >> > > > , Illogical > >>> Robot LLC > >>> >> > > > beerpla.net | > >>> >> > > > +ArtemRussakovskii > >>> >> > > > < > https://plus.google.com/+ArtemRussakovskii > >>> > > >>> >> > > > | @ArtemR > >>> >> > > > > >>> >> > > > > >>> >> > > > On Tue, Mar 5, 2019 at 2:02 AM Hu Bert > >>> >> > > > >>> >> > > > > wrote: > >>> >> > > > > >>> >> > > > fyi: did a downgrade 5.4 -> 5.3 and it > >>> worked. > >>> >> > > > all replicas are up and > >>> >> > > > running. Awaiting updated v5.4. > >>> >> > > > > >>> >> > > > thx :-) > >>> >> > > > > >>> >> > > > Am Di., 5. M?rz 2019 um 09:26 Uhr > >>> schrieb Hari > >>> >> > > > Gowtham >>> >> > > > >: > >>> >> > > > > > >>> >> > > > > There are plans to revert the patch > >>> causing > >>> >> > > > this error and rebuilt 5.4. > >>> >> > > > > This should happen faster. the > >>> rebuilt 5.4 > >>> >> > > > should be void of this upgrade issue. > >>> >> > > > > > >>> >> > > > > In the meantime, you can use 5.3 for > >>> this cluster. > >>> >> > > > > Downgrading to 5.3 will work if it > >>> was just > >>> >> > > > one node that was upgrade to 5.4 > >>> >> > > > > and the other nodes are still in 5.3 -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From andy.coates at gmail.com Wed Mar 20 06:21:47 2019 From: andy.coates at gmail.com (Andy Coates) Date: Wed, 20 Mar 2019 17:21:47 +1100 Subject: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied In-Reply-To: <7D2EB4E6-D0EE-4FB5-BB1F-E2587D82A37F@novartis.com> References: <38A2AC2F-64AF-4323-B079-1EF20D94CFFE@novartis.com> <2A722EAB-68B0-4DFB-97D2-FE69C8065CA5@novartis.com> <9D92A563-55B9-48FB-990D-566737AF09E3@novartis.com> <7D2EB4E6-D0EE-4FB5-BB1F-E2587D82A37F@novartis.com> Message-ID: We're seeing the same permission denied errors when running as a non-root geosync user. Does anyone know what the underlying issue is? On Wed, 26 Sep 2018 at 00:40, Kotte, Christian (Ext) < christian.kotte at novartis.com> wrote: > I changed the replication to use the root user and re-created the > replication with 'create force'. Now the files and folders were replicated, > and the 'Permission denied' and 'New folder' errors disappeared, but old files > are not deleted.
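The "old files are not deleted" part is expected with this recovery path: changelog-based replication only replays operations that were recorded while a session existed, and the hybrid/xsync crawl that runs after a "create force" (the "Hybrid Crawl" status visible in this thread) syncs what currently exists on the master but does not delete anything on the slave. A toy sketch of that one-way property, with invented file names and plain sets standing in for file trees:

```python
def replay(slave_files, entries):
    """Apply recorded (op, path) entries to a slave's file set.
    Toy model of changelog replay: real Gluster changelogs carry gfids
    and more operation types, but the asymmetry is the same - only
    operations that were actually recorded can ever reach the slave."""
    for op, path in entries:
        if op == "CREATE":
            slave_files.add(path)
        elif op == "DELETE":
            slave_files.discard(path)
    return slave_files

# A file removed on the master before the session was re-created has
# no DELETE entry to replay, so it survives on the slave:
slave = {"a.txt", "removed-before-session.txt"}
print(sorted(replay(slave, [("CREATE", "b.txt")])))
# ['a.txt', 'b.txt', 'removed-before-session.txt']
```

Anything deleted on the master before the new session started therefore lingers on the slave until it is cleaned up out of band.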
> > > > Looks like the history crawl is in some kind of a loop: > > > > [root at master ~]# gluster volume geo-replication status > > > > MASTER NODE MASTER VOL MASTER BRICK > SLAVE USER SLAVE SLAVE > NODE STATUS CRAWL STATUS LAST_SYNCED > > > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > > master glustervol1 /bricks/brick1/brick > root ssh://slave_1::glustervol1 > slave_1 Active Hybrid Crawl N/A > > master glustervol1 /bricks/brick1/brick > root ssh://slave_2::glustervol1 > slave_2 Active Hybrid Crawl N/A > > master glustervol1 /bricks/brick1/brick > root ssh://slave_3::glustervol1 > slave_3 Active Hybrid Crawl N/A > > > > tail -f > /var/log/glusterfs/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.log > > File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line > 104, in cl_history_changelog > > raise ChangelogHistoryNotAvailable() > > ChangelogHistoryNotAvailable > > [2018-09-25 14:10:44.196011] E [repce(worker > /bricks/brick1/brick):197:__call__] RepceClient: call failed > call=29945:139700517484352:1537884644.19 method=history > error=ChangelogHistoryNotAvailable > > [2018-09-25 14:10:44.196405] I [resource(worker > /bricks/brick1/brick):1295:service_loop] GLUSTER: Changelog history not > available, using xsync > > [2018-09-25 14:10:44.221385] I [master(worker > /bricks/brick1/brick):1623:crawl] _GMaster: starting hybrid crawl > stime=(0, 0) > > [2018-09-25 14:10:44.223382] I [gsyncdstatus(worker > /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl > Status Change status=Hybrid Crawl > > [2018-09-25 14:10:46.225296] I [master(worker > /bricks/brick1/brick):1634:crawl] _GMaster: processing xsync changelog > path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick/xsync/XSYNC-CHANGELOG.1537884644 > > [2018-09-25 
14:13:36.157408] I [gsyncd(config-get):297:main] : Using > session config file > path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf > > [2018-09-25 14:13:36.377880] I [gsyncd(status):297:main] : Using > session config file > path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf > > [2018-09-25 14:31:10.145035] I [master(worker > /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken > duration=1212.5316 num_files=1474 job=2 return_code=11 > > [2018-09-25 14:31:10.152637] E [syncdutils(worker > /bricks/brick1/brick):801:errlog] Popen: command returned error cmd=rsync > -aR0 --inplace --files-from=- --super --stats --numeric-ids > --no-implied-dirs --xattrs --acls . -e ssh -oPasswordAuthentication=no > -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem > -p 22 -oControlMaster=auto -S > /tmp/gsyncd-aux-ssh-gg758Z/caec4d1d03cc28bc1853f692e291164f.sock > slave_3:/proc/15919/cwd error=11 > > [2018-09-25 14:31:10.237371] I [repce(agent > /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching > EOF. > > [2018-09-25 14:31:10.430820] I > [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status > Change status=Faulty > > [2018-09-25 14:31:20.541475] I [monitor(monitor):158:monitor] Monitor: > starting gsyncd worker brick=/bricks/brick1/brick slave_node=slave_3 > > [2018-09-25 14:31:20.806518] I [gsyncd(agent > /bricks/brick1/brick):297:main] : Using session config file > path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf > > [2018-09-25 14:31:20.816536] I [changelogagent(agent > /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining... 
> > [2018-09-25 14:31:20.821574] I [gsyncd(worker > /bricks/brick1/brick):297:main] : Using session config file > path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf > > [2018-09-25 14:31:20.882128] I [resource(worker > /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection > between master and slave... > > [2018-09-25 14:31:24.169857] I [resource(worker > /bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between > master and slave established. duration=3.2873 > > [2018-09-25 14:31:24.170401] I [resource(worker > /bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume > locally... > > [2018-09-25 14:31:25.354633] I [resource(worker > /bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume > duration=1.1839 > > [2018-09-25 14:31:25.355073] I [subcmds(worker > /bricks/brick1/brick):70:subcmd_worker] : Worker spawn successful. > Acknowledging back to monitor > > [2018-09-25 14:31:27.439034] I [master(worker > /bricks/brick1/brick):1593:register] _GMaster: Working dir > path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick > > [2018-09-25 14:31:27.441847] I [resource(worker > /bricks/brick1/brick):1282:service_loop] GLUSTER: Register time > time=1537885887 > > [2018-09-25 14:31:27.465053] I [gsyncdstatus(worker > /bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status > Change status=Active > > [2018-09-25 14:31:27.471021] I [gsyncdstatus(worker > /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl > Status Change status=History Crawl > > [2018-09-25 14:31:27.471484] I [master(worker > /bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl > turns=1 stime=(0, 0) entry_stime=None etime=1537885887 > > [2018-09-25 14:31:27.472564] E [repce(agent > /bricks/brick1/brick):105:worker] : call failed: > > Traceback (most recent call last): > > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in > 
worker > > res = getattr(self.obj, rmeth)(*in_data[2:]) > > File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line > 53, in history > > num_parallel) > > File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line > 104, in cl_history_changelog > > raise ChangelogHistoryNotAvailable() > > ChangelogHistoryNotAvailable > > [2018-09-25 14:31:27.480632] E [repce(worker > /bricks/brick1/brick):197:__call__] RepceClient: call failed > call=31250:140272364406592:1537885887.47 method=history > error=ChangelogHistoryNotAvailable > > [2018-09-25 14:31:27.480958] I [resource(worker > /bricks/brick1/brick):1295:service_loop] GLUSTER: Changelog history not > available, using xsync > > [2018-09-25 14:31:27.495117] I [master(worker > /bricks/brick1/brick):1623:crawl] _GMaster: starting hybrid crawl > stime=(0, 0) > > [2018-09-25 14:31:27.502083] I [gsyncdstatus(worker > /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl > Status Change status=Hybrid Crawl > > [2018-09-25 14:31:29.505284] I [master(worker > /bricks/brick1/brick):1634:crawl] _GMaster: processing xsync changelog > path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick/xsync/XSYNC-CHANGELOG.1537885887 > > > > tail -f > /var/log/glusterfs/geo-replication-slaves/glustervol1_slave_3_glustervol1/gsyncd.log > > [2018-09-25 13:49:24.141303] I [repce(slave > master/bricks/brick1/brick):80:service_loop] RepceServer: terminating on > reaching EOF. > > [2018-09-25 13:49:36.602051] W [gsyncd(slave > master/bricks/brick1/brick):293:main] : Session config file not > exists, using the default config > path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf > > [2018-09-25 13:49:36.629415] I [resource(slave > master/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume > locally... 
> > [2018-09-25 13:49:37.701642] I [resource(slave > master/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster > volume duration=1.0718 > > [2018-09-25 13:49:37.704282] I [resource(slave > master/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening > > [2018-09-25 14:10:27.70952] I [repce(slave > master/bricks/brick1/brick):80:service_loop] RepceServer: terminating on > reaching EOF. > > [2018-09-25 14:10:39.632124] W [gsyncd(slave > master/bricks/brick1/brick):293:main] : Session config file not > exists, using the default config > path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf > > [2018-09-25 14:10:39.650958] I [resource(slave > master/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume > locally... > > [2018-09-25 14:10:40.729355] I [resource(slave > master/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster > volume duration=1.0781 > > [2018-09-25 14:10:40.730650] I [resource(slave > master/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening > > [2018-09-25 14:31:10.291064] I [repce(slave > master/bricks/brick1/brick):80:service_loop] RepceServer: terminating on > reaching EOF. > > [2018-09-25 14:31:22.802237] W [gsyncd(slave > master/bricks/brick1/brick):293:main] : Session config file not > exists, using the default config > path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf > > [2018-09-25 14:31:22.828418] I [resource(slave > master/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume > locally... > > [2018-09-25 14:31:23.910206] I [resource(slave > master/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster > volume duration=1.0813 > > [2018-09-25 14:31:23.913369] I [resource(slave > master/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening > > > > Any ideas how to resolve this without re-creating everything again? Can I > reset the changelog history? 
> > > Regards, > > Christian > > > > *From: * on behalf of "Kotte, > Christian (Ext)" > *Date: *Monday, 24. September 2018 at 17:20 > *To: *Kotresh Hiremath Ravishankar > *Cc: *Gluster Users > *Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log > OSError: [Errno 13] Permission denied > > > > I don't configure the permissions of /bricks/brick1/brick/.glusterfs. I > don't even see it on the local GlusterFS mount. > > > > I'm not sure why the permissions are configured with S and the AD group. > > > > Regards, > > Christian > > > > *From: * on behalf of "Kotte, > Christian (Ext)" > *Date: *Monday, 24. September 2018 at 17:10 > *To: *Kotresh Hiremath Ravishankar > *Cc: *Gluster Users > *Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log > OSError: [Errno 13] Permission denied > > > > Yeah right. I get permission denied. > > > > [geoaccount at slave ~]$ ll > /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e > > ls: cannot access > /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e: > Permission denied > > [geoaccount at slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/d1/ > > ls: cannot access /bricks/brick1/brick/.glusterfs/29/d1/: Permission denied > > [geoaccount at slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/ > > ls: cannot access /bricks/brick1/brick/.glusterfs/29/: Permission denied > > [geoaccount at slave ~]$ ll /bricks/brick1/brick/.glusterfs/ > > ls: cannot open directory /bricks/brick1/brick/.glusterfs/: Permission > denied > > > > [root at slave ~]# ll /bricks/brick1/brick/.glusterfs/29 > > total 0 > > drwx--S---+ 2 root AD+group 50 Sep 10 07:29 16 > > drwx--S---+ 2 root AD+group 50 Sep 10 07:29 33 > > drwx--S---+ 2 root AD+group 50 Sep 10 07:29 5e > > drwx--S---+ 2 root AD+group 50 Sep 10 07:29 73 > > drwx--S---+ 2 root AD+group 50 Sep 10 07:29 b2 > > drwx--S---+ 2 root AD+group 50 Sep 21 09:39 d1 > > drwx--S---+ 2 root AD+group 50 Sep 10 07:29 d7 > > drwx--S---+ 2 root AD+group 50 Sep 10 07:29 e6 > > drwx--S---+ 2 root AD+group 50 Sep 10 07:29 eb > > [root at slave ~]# > > > > However, the strange thing is that I could replicate new files and folders > before. The replication has been broken since the 'New folder' was created. > > > > These are the permissions on a dev/test system: > > [root at slave-dev ~]# ll /bricks/brick1/brick/.glusterfs/ > > total 3136 > > drwx------. 44 root root 4096 Aug 22 18:19 00 > > drwx------. 50 root root 4096 Sep 12 13:14 01 > > drwx------. 54 root root 4096 Sep 13 11:33 02 > > drwx------. 59 root root 4096 Aug 22 18:21 03 > > drwx------. 60 root root 4096 Sep 12 13:14 04 > > drwx------. 68 root root 4096 Aug 24 12:36 05 > > drwx------. 56 root root 4096 Aug 22 18:21 06 > > drwx------. 46 root root 4096 Aug 22 18:21 07 > > drwx------. 51 root root 4096 Aug 22 18:21 08 > > drwx------. 42 root root 4096 Aug 22 18:21 09 > > drwx------. 44 root root 4096 Sep 13 11:16 0a > > > > I've configured an AD group, SGID bit, and ACLs via Ansible on the local > mount point. Could this be an issue? Should I avoid configuring the > permissions on .glusterfs and below? > > > > # ll /mnt/glustervol1/ > > total 12 > > drwxrwsr-x+ 4 AD+user AD+group 4096 Jul 13 07:46 Scripts > > drwxrwxr-x+ 10 AD+user AD+group 4096 Jun 12 12:03 Software > > -rw-rw-r--+ 1 root AD+group 0 Aug 8 08:44 test > > drwxr-xr-x+ 6 AD+user AD+group 4096 Apr 18 10:58 tftp > > > > glusterfs_volumes: > > [...] > > permissions: > > mode: "02775" > > owner: root > > group: "AD+group" > > acl_permissions: rw > > [...] > > > > # root directory is owned by root.
> > # set permissions to 'g+s' to automatically set the group to "AD+group" > > # permissions of individual files will be set by Samba during creation > > - name: Configure volume directory permission 1/2 > > tags: glusterfs > > file: > > path: /mnt/{{ item.volume }} > > state: directory > > mode: "{{ item.permissions.mode }}" > > owner: "{{ item.permissions.owner }}" > > group: "{{ item.permissions.group }}" > > with_items: "{{ glusterfs_volumes }}" > > loop_control: > > label: "{{ item.volume }}" > > when: item.permissions is defined > > > > # ACL needs to be set to override default umask and grant "AD+group" write > permissions > > - name: Configure volume directory permission 2/2 (ACL) > > tags: glusterfs > > acl: > > path: /mnt/{{ item.volume }} > > default: yes > > entity: "{{ item.permissions.group }}" > > etype: group > > permissions: "{{ item.permissions.acl_permissions }}" > > state: present > > with_items: "{{ glusterfs_volumes }}" > > loop_control: > > label: "{{ item.volume }}" > > when: item.permissions is defined > > > > Regards, > > Christian > > > > *From: *Kotresh Hiremath Ravishankar > *Date: *Monday, 24. September 2018 at 16:20 > *To: *"Kotte, Christian (Ext)" > *Cc: *Gluster Users > *Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log > OSError: [Errno 13] Permission denied > > > > I think I get what's happening. The geo-rep session is non-root. > > Could you do a readlink on the brick path mentioned above > /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e > > as the geoaccount user and see if you are getting "Permission Denied" > errors? > > > > Thanks, > > Kotresh HR > > > > On Mon, Sep 24, 2018 at 7:35 PM Kotte, Christian (Ext) < > christian.kotte at novartis.com> wrote: > > Ok. It happens on all slave nodes (and on the interimmaster as well). > > > > It's like I assumed.
These are the logs of one of the slaves: > > > > gsyncd.log > > [2018-09-24 13:52:25.418382] I [repce(slave > slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on > reaching EOF. > > [2018-09-24 13:52:37.95297] W [gsyncd(slave > slave/bricks/brick1/brick):293:main] : Session config file not exists, > using the default config > path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf > > [2018-09-24 13:52:37.109643] I [resource(slave > slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume > locally... > > [2018-09-24 13:52:38.303920] I [resource(slave > slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster > volume duration=1.1941 > > [2018-09-24 13:52:38.304771] I [resource(slave > slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening > > [2018-09-24 13:52:41.981554] I [resource(slave > slave/bricks/brick1/brick):598:entry_ops] : Special case: rename on > mkdir gfid=29d1d60d-1ad6-45fc-87e0-93d478f7331e > entry='.gfid/6b97b987-8aef-46c3-af27-20d3aa883016/New folder' > > [2018-09-24 13:52:42.45641] E [repce(slave > slave/bricks/brick1/brick):105:worker] : call failed: > > Traceback (most recent call last): > > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in > worker > > res = getattr(self.obj, rmeth)(*in_data[2:]) > > File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 599, > in entry_ops > > src_entry = get_slv_dir_path(slv_host, slv_volume, gfid) > > File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 682, > in get_slv_dir_path > > [ENOENT], [ESTALE]) > > File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 540, > in errno_wrap > > return call(*arg) > > OSError: [Errno 13] Permission denied: > '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e' > > [2018-09-24 13:52:42.81794] I [repce(slave > slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on > reaching EOF. 
> > [2018-09-24 13:52:53.459676] W [gsyncd(slave > slave/bricks/brick1/brick):293:main] : Session config file not exists, > using the default config > path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf > > [2018-09-24 13:52:53.473500] I [resource(slave > slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume > locally... > > [2018-09-24 13:52:54.659044] I [resource(slave > slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume > duration=1.1854 > > [2018-09-24 13:52:54.659837] I [resource(slave > slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening > > > > The folder 'New folder' was created via Samba and renamed by my > colleague right away after creation. > > [root at slave glustervol1_slave_glustervol1]# ls > /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/ > > [root at slave glustervol1_slave_glustervol1]# ls > /bricks/brick1/brick/.glusterfs/29/d1/ -al > > total 0 > > drwx--S---+ 2 root AD+group 50 Sep 21 09:39 . > > drwx--S---+ 11 root AD+group 96 Sep 21 09:39 .. > > lrwxrwxrwx. 1 root AD+group 75 Sep 21 09:39 > 29d1d60d-1ad6-45fc-87e0-93d478f7331e -> > ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager > > > > I tried creating the folder in > /bricks/brick1/brick/.glusterfs/6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/, > but it didn't change anything.
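The "drwx--S---" entries above are the crux for a non-root session: the SGID bit is set on those .glusterfs subdirectories, but the group execute bit is not, so members of AD+group cannot traverse into them at all. A minimal sketch of the POSIX class-selection arithmetic behind that (the uid/gid numbers are invented for illustration, and the ACLs indicated by the "+" suffix are ignored here):

```python
def allows(mode, owner_uid, owner_gid, uid, gids, perm):
    """Classic POSIX mode check: use the owner bits if the uid matches,
    else the group bits if any of the user's gids match, else the other
    bits. perm is 'r', 'w' or 'x'. No ACL or capability handling."""
    if uid == 0:
        return True  # root bypasses mode bits
    shift = {"r": 2, "w": 1, "x": 0}[perm]
    if uid == owner_uid:
        cls = (mode >> 6) & 0o7
    elif owner_gid in gids:
        cls = (mode >> 3) & 0o7
    else:
        cls = mode & 0o7
    return bool(cls & (1 << shift))

def first_denied(components, uid, gids):
    """components: (path, mode, owner_uid, owner_gid) tuples ordered from
    the root down. Traversal needs 'x' on every ancestor; return the
    first component the user cannot enter, or None."""
    for path, mode, owner_uid, owner_gid in components:
        if not allows(mode, owner_uid, owner_gid, uid, gids, "x"):
            return path
    return None

# Invented uid/gid numbers; the modes mirror the "ll" output in this
# thread (drwx--S--- is setgid without group execute, i.e. 0o2700).
AD_GROUP = 5000
components = [
    ("/bricks",                         0o755,  0, 0),
    ("/bricks/brick1",                  0o755,  0, 0),
    ("/bricks/brick1/brick",            0o2775, 0, AD_GROUP),
    ("/bricks/brick1/brick/.glusterfs", 0o2700, 0, AD_GROUP),
]
print(first_denied(components, uid=1300, gids={1300, AD_GROUP}))
# -> /bricks/brick1/brick/.glusterfs
```

Even though geoaccount is in AD+group, the group class of 0o2700 grants nothing, which matches the "ls: cannot access ... Permission denied" output; as uid 0 every component passes, which is why switching the session to the root user made the errors disappear.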
> > > > mnt-slave-bricks-brick1-brick.log > > [2018-09-24 13:51:10.625723] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-glustervol1-client-0: error returned while attempting to connect to > host:(null), port:0 > > [2018-09-24 13:51:10.626092] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-glustervol1-client-0: error returned while attempting to connect to > host:(null), port:0 > > [2018-09-24 13:51:10.626181] I [rpc-clnt.c:2105:rpc_clnt_reconfig] > 0-glustervol1-client-0: changing port to 49152 (from 0) > > [2018-09-24 13:51:10.643111] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-glustervol1-client-0: error returned while attempting to connect to > host:(null), port:0 > > [2018-09-24 13:51:10.643489] W [dict.c:923:str_to_data] > (-->/usr/lib64/glusterfs/4.1.3/xlator/protocol/client.so(+0x4131a) > [0x7fafb023831a] -->/lib64/libglusterfs.so.0(dict_set_str+0x16) > [0x7fafbdb83266] -->/lib64/libglusterfs.so.0(str_to_data+0x91) > [0x7fafbdb7fea1] ) 0-dict: value is NULL [Invalid argument] > > [2018-09-24 13:51:10.643507] I [MSGID: 114006] > [client-handshake.c:1308:client_setvolume] 0-glustervol1-client-0: failed > to set process-name in handshake msg > > [2018-09-24 13:51:10.643541] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-glustervol1-client-0: error returned while attempting to connect to > host:(null), port:0 > > [2018-09-24 13:51:10.671460] I [MSGID: 114046] > [client-handshake.c:1176:client_setvolume_cbk] 0-glustervol1-client-0: > Connected to glustervol1-client-0, attached to remote volume > '/bricks/brick1/brick'. 
> > [2018-09-24 13:51:10.672694] I [fuse-bridge.c:4294:fuse_init] > 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel > 7.22 > > [2018-09-24 13:51:10.672715] I [fuse-bridge.c:4927:fuse_graph_sync] > 0-fuse: switched to graph 0 > > [2018-09-24 13:51:10.673329] I [MSGID: 109005] > [dht-selfheal.c:2342:dht_selfheal_directory] 0-glustervol1-dht: Directory > selfheal failed: Unable to form layout for directory / > > [2018-09-24 13:51:16.116458] I [fuse-bridge.c:5199:fuse_thread_proc] > 0-fuse: initating unmount of > /var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E > > [2018-09-24 13:51:16.116595] W [glusterfsd.c:1514:cleanup_and_exit] > (-->/lib64/libpthread.so.0(+0x7e25) [0x7fafbc9eee25] > -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55d5dac5dd65] > -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55d5dac5db8b] ) 0-: > received signum (15), shutting down > > [2018-09-24 13:51:16.116616] I [fuse-bridge.c:5981:fini] 0-fuse: > Unmounting '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'. > > [2018-09-24 13:51:16.116625] I [fuse-bridge.c:5986:fini] 0-fuse: Closing > fuse connection to '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'. > > > > Regards, > > Christian > > > > *From: *Kotresh Hiremath Ravishankar > *Date: *Saturday, 22. September 2018 at 06:52 > *To: *"Kotte, Christian (Ext)" > *Cc: *Gluster Users > *Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log > OSError: [Errno 13] Permission denied > > > > The problem occurred on the slave side, whose error is propagated to the master. > Mostly any traceback with repce involved is related to a problem on the slave. > Just check a few lines above in the log to find the slave node the crashed > worker is connected to, and get the geo-replication logs to debug further. > > > > > > > > > > > > On Fri, 21 Sep 2018, 20:10 Kotte, Christian (Ext), < > christian.kotte at novartis.com> wrote: > > Hi, > > > > Any idea how to troubleshoot this? 
> > > > New folders and files were created on the master and the replication went > faulty. They were created via Samba. > > > > Version: GlusterFS 4.1.3 > > > > [root at master]# gluster volume geo-replication status > > > > MASTER NODE MASTER VOL MASTER BRICK > SLAVE USER > SLAVE SLAVE > NODE STATUS CRAWL STATUS LAST_SYNCED > > > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > master glustervol1 /bricks/brick1/brick geoaccount > ssh://geoaccount at slave_1::glustervol1 N/A Faulty > N/A N/A > > master glustervol1 /bricks/brick1/brick geoaccount > ssh://geoaccount at slave_2::glustervol1 N/A Faulty > N/A N/A > > master glustervol1 /bricks/brick1/brick geoaccount > ssh://geoaccount at interimmaster::glustervol1 N/A Faulty > N/A N/A > > > > The following error is repeatedly logged in the gsyncd.logs: > > [2018-09-21 14:26:38.611479] I [repce(agent > /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching > EOF. > > [2018-09-21 14:26:39.211527] I [monitor(monitor):279:monitor] Monitor: > worker died in startup phase brick=/bricks/brick1/brick > > [2018-09-21 14:26:39.214322] I > [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status > Change status=Faulty > > [2018-09-21 14:26:49.318953] I [monitor(monitor):158:monitor] Monitor: > starting gsyncd worker brick=/bricks/brick1/brick slave_node= > slave_3 > > [2018-09-21 14:26:49.471532] I [gsyncd(agent > /bricks/brick1/brick):297:main] : Using session config file > path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf > > [2018-09-21 14:26:49.473917] I [changelogagent(agent > /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining... 
> > [2018-09-21 14:26:49.491359] I [gsyncd(worker > /bricks/brick1/brick):297:main] : Using session config file > path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf > > [2018-09-21 14:26:49.538049] I [resource(worker > /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection > between master and slave... > > [2018-09-21 14:26:53.5017] I [resource(worker > /bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between > master and slave established. duration=3.4665 > > [2018-09-21 14:26:53.5419] I [resource(worker > /bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume > locally... > > [2018-09-21 14:26:54.120374] I [resource(worker > /bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume > duration=1.1146 > > [2018-09-21 14:26:54.121012] I [subcmds(worker > /bricks/brick1/brick):70:subcmd_worker] : Worker spawn successful. > Acknowledging back to monitor > > [2018-09-21 14:26:56.144460] I [master(worker > /bricks/brick1/brick):1593:register] _GMaster: Working dir > path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick > > [2018-09-21 14:26:56.145145] I [resource(worker > /bricks/brick1/brick):1282:service_loop] GLUSTER: Register time > time=1537540016 > > [2018-09-21 14:26:56.160064] I [gsyncdstatus(worker > /bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status Change > status=Active > > [2018-09-21 14:26:56.161175] I [gsyncdstatus(worker > /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl > Status Change status=History Crawl > > [2018-09-21 14:26:56.161536] I [master(worker > /bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl > turns=1 stime=(1537522637, 0) entry_stime=(1537537141, 0) > etime=1537540016 > > [2018-09-21 14:26:56.164277] I [master(worker > /bricks/brick1/brick):1536:crawl] _GMaster: slave's time > stime=(1537522637, 0) > > [2018-09-21 14:26:56.197065] I [master(worker > 
/bricks/brick1/brick):1360:process] _GMaster: Skipping already processed > entry ops to_changelog=1537522638 num_changelogs=1 > from_changelog=1537522638 > > [2018-09-21 14:26:56.197402] I [master(worker > /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken MKD=0 > MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=1 > > [2018-09-21 14:26:56.197623] I [master(worker > /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken > SETA=0 SETX=0 meta_duration=0.0000 data_duration=0.0284 DATA=0 > XATT=0 > > [2018-09-21 14:26:56.198230] I [master(worker > /bricks/brick1/brick):1394:process] _GMaster: Batch Completed > changelog_end=1537522638 entry_stime=(1537537141, 0) > changelog_start=1537522638 stime=(1537522637, 0) duration=0.0333 > num_changelogs=1 mode=history_changelog > > [2018-09-21 14:26:57.200436] I [master(worker > /bricks/brick1/brick):1536:crawl] _GMaster: slave's time > stime=(1537522637, 0) > > [2018-09-21 14:26:57.528625] E [repce(worker > /bricks/brick1/brick):197:__call__] RepceClient: call failed > call=17209:140650361157440:1537540017.21 method=entry_ops > error=OSError > > [2018-09-21 14:26:57.529371] E [syncdutils(worker > /bricks/brick1/brick):332:log_raise_exception] : FAIL: > > Traceback (most recent call last): > > File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in > main > > func(args) > > File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in > subcmd_worker > > local.service_loop(remote) > > File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1288, > in service_loop > > g3.crawlwrap(oneshot=True) > > File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, in > crawlwrap > > self.crawl() > > File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545, in > crawl > > self.changelogs_batch_process(changes) > > File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445, in > changelogs_batch_process > > self.process(batch) > > File 
"/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280, in > process > > self.process_change(change, done, retry) > > File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179, in > process_change > > failures = self.slave.server.entry_ops(entries) > > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in > __call__ > > return self.ins(self.meth, *a) > > File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in > __call__ > > raise res > > OSError: [Errno 13] Permission denied: > '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e' > > > > The permissions look fine. The replication is done via geo user instead of > root. It should be able to read, but I?m not sure if the syncdaemon runs > under geoaccount!? > > > > [root at master vRealize Operation Manager]# ll > /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e > > lrwxrwxrwx. 1 root root 75 Sep 21 09:39 > /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e > -> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation > Manager > > > > [root at master vRealize Operation Manager]# ll > /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/ > > total 4 > > drwxrwxr-x. 2 AD+user AD+group 131 Sep 21 10:14 6.7 > > drwxrwxr-x. 2 AD+user AD+group 4096 Sep 21 09:43 7.0 > > drwxrwxr-x. 2 AD+user AD+group 57 Sep 21 10:28 7.5 > > [root at master vRealize Operation Manager]# > > > > It could be possible that the folder was renamed. I had 3 similar issues > since I migrated to GlusterFS 4.x but couldn?t investigate much. I needed > to completely wipe GlusterFS and geo-repliction to get rid of this error? > > > > Any help is appreciated. 
> > > > Regards, > > *Christian Kotte* > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > -- > > Thanks and Regards, > > Kotresh H R > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From ravishankar at redhat.com Wed Mar 20 06:33:25 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Wed, 20 Mar 2019 12:03:25 +0530 Subject: [Gluster-users] Docu - how to debug issues In-Reply-To: References: Message-ID: <3696c4f5-0edc-47f1-a949-3ca065a45d16@redhat.com> On 20/03/19 10:29 AM, Strahil wrote: > > Hello Community, > > Is there a docu page clarifying what information needs to be > gathered in advance in order to help the devs resolve issues? > So far I couldn't find one - but I may have missed it. > volume info, gluster version of the clients/servers and all gluster logs under /var/log/glusterfs/ are the first things that you would need to provide if you were to raise a bugzilla or a github issue. After that, it is mostly issue specific. Some pointers are there in https://docs.gluster.org/en/latest/Troubleshooting/. A consistent reproducer is also something that is good to have as it helps speed up the RCA. HTH, Ravi > > If not, it will be nice to have that info posted somewhere. > For example - FUSE issues - do 1,2,3... > Same for other client-side issues and then for cluster-side also. > I guess this will save a lot of 'what is your output of gluster > volume info vol' questions. 
> > Best Regards, > Strahil Nikolov > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From khiremat at redhat.com Wed Mar 20 06:42:39 2019 From: khiremat at redhat.com (Kotresh Hiremath Ravishankar) Date: Wed, 20 Mar 2019 12:12:39 +0530 Subject: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied In-Reply-To: References: <38A2AC2F-64AF-4323-B079-1EF20D94CFFE@novartis.com> <2A722EAB-68B0-4DFB-97D2-FE69C8065CA5@novartis.com> <9D92A563-55B9-48FB-990D-566737AF09E3@novartis.com> <7D2EB4E6-D0EE-4FB5-BB1F-E2587D82A37F@novartis.com> Message-ID: Hi Andy, This is an issue with non-root geo-rep sessions and is not fixed yet. Could you please raise a bug for this issue? Thanks, Kotresh HR On Wed, Mar 20, 2019 at 11:53 AM Andy Coates wrote: > We're seeing the same permission denied errors when running as a non-root > geosync user. > > Does anyone know what the underlying issue is? > > On Wed, 26 Sep 2018 at 00:40, Kotte, Christian (Ext) < > christian.kotte at novartis.com> wrote: > >> I changed the replication to use the root user and re-created the >> replication with "create force". Now the files and folders were replicated, >> and the permission denied and "New folder" errors disappeared, but old files >> are not deleted. 
>> >> >> >> Looks like the history crawl is in some kind of a loop: >> >> >> >> [root at master ~]# gluster volume geo-replication status >> >> >> >> MASTER NODE MASTER VOL MASTER >> BRICK SLAVE USER SLAVE >> SLAVE >> NODE STATUS CRAWL STATUS LAST_SYNCED >> >> >> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ >> >> master glustervol1 >> /bricks/brick1/brick root ssh://slave_1::glustervol1 >> slave_1 Active >> Hybrid Crawl N/A >> >> master glustervol1 >> /bricks/brick1/brick root ssh://slave_2::glustervol1 >> slave_2 Active >> Hybrid Crawl N/A >> >> master glustervol1 >> /bricks/brick1/brick root ssh://slave_3::glustervol1 >> slave_3 Active >> Hybrid Crawl N/A >> >> >> >> tail -f >> /var/log/glusterfs/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.log >> >> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line >> 104, in cl_history_changelog >> >> raise ChangelogHistoryNotAvailable() >> >> ChangelogHistoryNotAvailable >> >> [2018-09-25 14:10:44.196011] E [repce(worker >> /bricks/brick1/brick):197:__call__] RepceClient: call failed >> call=29945:139700517484352:1537884644.19 method=history >> error=ChangelogHistoryNotAvailable >> >> [2018-09-25 14:10:44.196405] I [resource(worker >> /bricks/brick1/brick):1295:service_loop] GLUSTER: Changelog history not >> available, using xsync >> >> [2018-09-25 14:10:44.221385] I [master(worker >> /bricks/brick1/brick):1623:crawl] _GMaster: starting hybrid crawl >> stime=(0, 0) >> >> [2018-09-25 14:10:44.223382] I [gsyncdstatus(worker >> /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl >> Status Change status=Hybrid Crawl >> >> [2018-09-25 14:10:46.225296] I [master(worker >> /bricks/brick1/brick):1634:crawl] _GMaster: processing xsync changelog >> 
path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick/xsync/XSYNC-CHANGELOG.1537884644 >> >> [2018-09-25 14:13:36.157408] I [gsyncd(config-get):297:main] : Using >> session config file >> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf >> >> [2018-09-25 14:13:36.377880] I [gsyncd(status):297:main] : Using >> session config file >> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf >> >> [2018-09-25 14:31:10.145035] I [master(worker >> /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken >> duration=1212.5316 num_files=1474 job=2 return_code=11 >> >> [2018-09-25 14:31:10.152637] E [syncdutils(worker >> /bricks/brick1/brick):801:errlog] Popen: command returned error cmd=rsync >> -aR0 --inplace --files-from=- --super --stats --numeric-ids >> --no-implied-dirs --xattrs --acls . -e ssh -oPasswordAuthentication=no >> -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem >> -p 22 -oControlMaster=auto -S >> /tmp/gsyncd-aux-ssh-gg758Z/caec4d1d03cc28bc1853f692e291164f.sock >> slave_3:/proc/15919/cwd error=11 >> >> [2018-09-25 14:31:10.237371] I [repce(agent >> /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching >> EOF. >> >> [2018-09-25 14:31:10.430820] I >> [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status >> Change status=Faulty >> >> [2018-09-25 14:31:20.541475] I [monitor(monitor):158:monitor] Monitor: >> starting gsyncd worker brick=/bricks/brick1/brick slave_node=slave_3 >> >> [2018-09-25 14:31:20.806518] I [gsyncd(agent >> /bricks/brick1/brick):297:main] : Using session config file >> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf >> >> [2018-09-25 14:31:20.816536] I [changelogagent(agent >> /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining... 
>> >> [2018-09-25 14:31:20.821574] I [gsyncd(worker >> /bricks/brick1/brick):297:main] : Using session config file >> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf >> >> [2018-09-25 14:31:20.882128] I [resource(worker >> /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection >> between master and slave... >> >> [2018-09-25 14:31:24.169857] I [resource(worker >> /bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between >> master and slave established. duration=3.2873 >> >> [2018-09-25 14:31:24.170401] I [resource(worker >> /bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume >> locally... >> >> [2018-09-25 14:31:25.354633] I [resource(worker >> /bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume >> duration=1.1839 >> >> [2018-09-25 14:31:25.355073] I [subcmds(worker >> /bricks/brick1/brick):70:subcmd_worker] : Worker spawn successful. >> Acknowledging back to monitor >> >> [2018-09-25 14:31:27.439034] I [master(worker >> /bricks/brick1/brick):1593:register] _GMaster: Working dir >> path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick >> >> [2018-09-25 14:31:27.441847] I [resource(worker >> /bricks/brick1/brick):1282:service_loop] GLUSTER: Register time >> time=1537885887 >> >> [2018-09-25 14:31:27.465053] I [gsyncdstatus(worker >> /bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status >> Change status=Active >> >> [2018-09-25 14:31:27.471021] I [gsyncdstatus(worker >> /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl >> Status Change status=History Crawl >> >> [2018-09-25 14:31:27.471484] I [master(worker >> /bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl >> turns=1 stime=(0, 0) entry_stime=None etime=1537885887 >> >> [2018-09-25 14:31:27.472564] E [repce(agent >> /bricks/brick1/brick):105:worker] : call failed: >> >> Traceback (most recent call last): >> >> File 
"/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in >> worker >> >> res = getattr(self.obj, rmeth)(*in_data[2:]) >> >> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line >> 53, in history >> >> num_parallel) >> >> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line >> 104, in cl_history_changelog >> >> raise ChangelogHistoryNotAvailable() >> >> ChangelogHistoryNotAvailable >> >> [2018-09-25 14:31:27.480632] E [repce(worker >> /bricks/brick1/brick):197:__call__] RepceClient: call failed >> call=31250:140272364406592:1537885887.47 method=history >> error=ChangelogHistoryNotAvailable >> >> [2018-09-25 14:31:27.480958] I [resource(worker >> /bricks/brick1/brick):1295:service_loop] GLUSTER: Changelog history not >> available, using xsync >> >> [2018-09-25 14:31:27.495117] I [master(worker >> /bricks/brick1/brick):1623:crawl] _GMaster: starting hybrid crawl >> stime=(0, 0) >> >> [2018-09-25 14:31:27.502083] I [gsyncdstatus(worker >> /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl >> Status Change status=Hybrid Crawl >> >> [2018-09-25 14:31:29.505284] I [master(worker >> /bricks/brick1/brick):1634:crawl] _GMaster: processing xsync changelog >> path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick/xsync/XSYNC-CHANGELOG.1537885887 >> >> >> >> tail -f >> /var/log/glusterfs/geo-replication-slaves/glustervol1_slave_3_glustervol1/gsyncd.log >> >> [2018-09-25 13:49:24.141303] I [repce(slave >> master/bricks/brick1/brick):80:service_loop] RepceServer: terminating on >> reaching EOF. >> >> [2018-09-25 13:49:36.602051] W [gsyncd(slave >> master/bricks/brick1/brick):293:main] : Session config file not >> exists, using the default config >> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf >> >> [2018-09-25 13:49:36.629415] I [resource(slave >> master/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume >> locally... 
>> >> [2018-09-25 13:49:37.701642] I [resource(slave >> master/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster >> volume duration=1.0718 >> >> [2018-09-25 13:49:37.704282] I [resource(slave >> master/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening >> >> [2018-09-25 14:10:27.70952] I [repce(slave >> master/bricks/brick1/brick):80:service_loop] RepceServer: terminating on >> reaching EOF. >> >> [2018-09-25 14:10:39.632124] W [gsyncd(slave >> master/bricks/brick1/brick):293:main] : Session config file not >> exists, using the default config >> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf >> >> [2018-09-25 14:10:39.650958] I [resource(slave >> master/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume >> locally... >> >> [2018-09-25 14:10:40.729355] I [resource(slave >> master/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster >> volume duration=1.0781 >> >> [2018-09-25 14:10:40.730650] I [resource(slave >> master/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening >> >> [2018-09-25 14:31:10.291064] I [repce(slave >> master/bricks/brick1/brick):80:service_loop] RepceServer: terminating on >> reaching EOF. >> >> [2018-09-25 14:31:22.802237] W [gsyncd(slave >> master/bricks/brick1/brick):293:main] : Session config file not >> exists, using the default config >> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf >> >> [2018-09-25 14:31:22.828418] I [resource(slave >> master/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume >> locally... >> >> [2018-09-25 14:31:23.910206] I [resource(slave >> master/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster >> volume duration=1.0813 >> >> [2018-09-25 14:31:23.913369] I [resource(slave >> master/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening >> >> >> >> Any ideas how to resolve this without re-creating everything again? 
Can I >> reset the changelog history? >> >> >> >> Regards, >> >> Christian >> >> >> >> *From: * on behalf of "Kotte, >> Christian (Ext)" >> *Date: *Monday, 24. September 2018 at 17:20 >> *To: *Kotresh Hiremath Ravishankar >> *Cc: *Gluster Users >> *Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log >> OSError: [Errno 13] Permission denied >> >> >> >> I don't configure the permissions of /bricks/brick1/brick/.glusterfs. I >> don't even see it on the local GlusterFS mount. >> >> >> >> Not sure why the permissions are configured with S and the AD group? >> >> >> >> Regards, >> >> Christian >> >> >> >> *From: * on behalf of "Kotte, >> Christian (Ext)" >> *Date: *Monday, 24. September 2018 at 17:10 >> *To: *Kotresh Hiremath Ravishankar >> *Cc: *Gluster Users >> *Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log >> OSError: [Errno 13] Permission denied >> >> >> >> Yeah right. I get permission denied. >> >> >> >> [geoaccount at slave ~]$ ll >> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e >> >> ls: cannot access >> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e: >> Permission denied >> >> [geoaccount at slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/d1/ >> >> ls: cannot access /bricks/brick1/brick/.glusterfs/29/d1/: Permission >> denied >> >> [geoaccount at slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/ >> >> ls: cannot access /bricks/brick1/brick/.glusterfs/29/: Permission denied >> >> [geoaccount at slave ~]$ ll /bricks/brick1/brick/.glusterfs/ >> >> ls: cannot open directory /bricks/brick1/brick/.glusterfs/: Permission >> denied >> >> >> >> [root at slave ~]# ll /bricks/brick1/brick/.glusterfs/29 >> >> total 0 >> >> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 16 >> >> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 33 >> >> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 5e >> >> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 73 >> >> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 b2 >> >> drwx--S---+ 2 root AD+group 50 Sep 21 09:39 d1 >> >> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 d7 >> >> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 e6 >> >> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 eb >> >> [root at slave ~]# >> >> >> >> However, the strange thing is that I could replicate new files and >> folders before. The replication is broken since the "New folder" was >> created. >> >> >> >> These are the permissions on a dev/test system: >> >> [root at slave-dev ~]# ll /bricks/brick1/brick/.glusterfs/ >> >> total 3136 >> >> drwx------. 44 root root 4096 Aug 22 18:19 00 >> >> drwx------. 50 root root 4096 Sep 12 13:14 01 >> >> drwx------. 54 root root 4096 Sep 13 11:33 02 >> >> drwx------. 59 root root 4096 Aug 22 18:21 03 >> >> drwx------. 60 root root 4096 Sep 12 13:14 04 >> >> drwx------. 68 root root 4096 Aug 24 12:36 05 >> >> drwx------. 56 root root 4096 Aug 22 18:21 06 >> >> drwx------. 46 root root 4096 Aug 22 18:21 07 >> >> drwx------. 51 root root 4096 Aug 22 18:21 08 >> >> drwx------. 42 root root 4096 Aug 22 18:21 09 >> >> drwx------. 44 root root 4096 Sep 13 11:16 0a >> >> >> >> I've configured an AD group, SGID bit, and ACLs via Ansible on the local >> mount point. Could this be an issue? Should I avoid configuring the >> permissions on .glusterfs and below? >> >> >> >> # ll /mnt/glustervol1/ >> >> total 12 >> >> drwxrwsr-x+ 4 AD+user AD+group 4096 Jul 13 07:46 Scripts >> >> drwxrwxr-x+ 10 AD+user AD+group 4096 Jun 12 12:03 Software >> >> -rw-rw-r--+ 1 root AD+group 0 Aug 8 08:44 test >> >> drwxr-xr-x+ 6 AD+user AD+group 4096 Apr 18 10:58 tftp >> >> >> >> glusterfs_volumes: >> >> [...] >> >> permissions: >> >> mode: "02775" >> >> owner: root >> >> group: "AD+group" >> >> acl_permissions: rw >> >> [...] >> >> >> >> # root directory is owned by root. 
>> >> # set permissions to 'g+s' to automatically set the group to "AD+group" >> >> # permissions of individual files will be set by Samba during creation >> >> - name: Configure volume directory permission 1/2 >> >> tags: glusterfs >> >> file: >> >> path: /mnt/{{ item.volume }} >> >> state: directory >> >> mode: "{{ item.permissions.mode }}" >> >> owner: "{{ item.permissions.owner }}" >> >> group: "{{ item.permissions.group }}" >> >> with_items: "{{ glusterfs_volumes }}" >> >> loop_control: >> >> label: "{{ item.volume }}" >> >> when: item.permissions is defined >> >> >> >> # ACL needs to be set to override default umask and grant "AD+group" >> write permissions >> >> - name: Configure volume directory permission 2/2 (ACL) >> >> tags: glusterfs >> >> acl: >> >> path: /mnt/{{ item.volume }} >> >> default: yes >> >> entity: "{{ item.permissions.group }}" >> >> etype: group >> >> permissions: "{{ item.permissions.acl_permissions }}" >> >> state: present >> >> with_items: "{{ glusterfs_volumes }}" >> >> loop_control: >> >> label: "{{ item.volume }}" >> >> when: item.permissions is defined >> >> >> >> Regards, >> >> Christian >> >> >> >> *From: *Kotresh Hiremath Ravishankar >> *Date: *Monday, 24. September 2018 at 16:20 >> *To: *"Kotte, Christian (Ext)" >> *Cc: *Gluster Users >> *Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log >> OSError: [Errno 13] Permission denied >> >> >> >> I think I get what's happening. The geo-rep session is non-root. >> >> Could you do a readlink on the brick path mentioned above >> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e >> >> as the geoaccount user and see if you are getting "Permission Denied" >> errors? >> >> >> >> Thanks, >> >> Kotresh HR >> >> >> >> On Mon, Sep 24, 2018 at 7:35 PM Kotte, Christian (Ext) < >> christian.kotte at novartis.com> wrote: >> >> Ok. It happens on all slave nodes (and on the interimmaster as well). >> >> >> >> It's like I assumed. 
These are the logs of one of the slaves: >> >> >> >> gsyncd.log >> >> [2018-09-24 13:52:25.418382] I [repce(slave >> slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on >> reaching EOF. >> >> [2018-09-24 13:52:37.95297] W [gsyncd(slave >> slave/bricks/brick1/brick):293:main] : Session config file not exists, >> using the default config >> path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf >> >> [2018-09-24 13:52:37.109643] I [resource(slave >> slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume >> locally... >> >> [2018-09-24 13:52:38.303920] I [resource(slave >> slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster >> volume duration=1.1941 >> >> [2018-09-24 13:52:38.304771] I [resource(slave >> slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening >> >> [2018-09-24 13:52:41.981554] I [resource(slave >> slave/bricks/brick1/brick):598:entry_ops] : Special case: rename on >> mkdir gfid=29d1d60d-1ad6-45fc-87e0-93d478f7331e >> entry='.gfid/6b97b987-8aef-46c3-af27-20d3aa883016/New folder' >> >> [2018-09-24 13:52:42.45641] E [repce(slave >> slave/bricks/brick1/brick):105:worker] : call failed: >> >> Traceback (most recent call last): >> >> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in >> worker >> >> res = getattr(self.obj, rmeth)(*in_data[2:]) >> >> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 599, >> in entry_ops >> >> src_entry = get_slv_dir_path(slv_host, slv_volume, gfid) >> >> File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line >> 682, in get_slv_dir_path >> >> [ENOENT], [ESTALE]) >> >> File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line >> 540, in errno_wrap >> >> return call(*arg) >> >> OSError: [Errno 13] Permission denied: >> '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e' >> >> [2018-09-24 13:52:42.81794] I [repce(slave >> 
slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on >> reaching EOF. >> >> [2018-09-24 13:52:53.459676] W [gsyncd(slave >> slave/bricks/brick1/brick):293:main] : Session config file not exists, >> using the default config >> path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf >> >> [2018-09-24 13:52:53.473500] I [resource(slave >> slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume >> locally... >> >> [2018-09-24 13:52:54.659044] I [resource(slave >> slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume >> duration=1.1854 >> >> [2018-09-24 13:52:54.659837] I [resource(slave >> slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening >> >> >> >> The folder "New folder" was created via Samba and renamed by >> my colleague right away after creation. >> >> [root at slave glustervol1_slave_glustervol1]# ls >> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/ >> >> [root at slave glustervol1_slave_glustervol1]# ls >> /bricks/brick1/brick/.glusterfs/29/d1/ -al >> >> total 0 >> >> drwx--S---+ 2 root AD+group 50 Sep 21 09:39 . >> >> drwx--S---+ 11 root AD+group 96 Sep 21 09:39 .. >> >> lrwxrwxrwx. 1 root AD+group 75 Sep 21 09:39 >> 29d1d60d-1ad6-45fc-87e0-93d478f7331e -> >> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager >> >> >> >> I tried creating the folder in >> /bricks/brick1/brick/.glusterfs/6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/, >> but it didn't change anything. 
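The drwx--S--- mode in the listings above is the likely culprit: it carries no group or other execute (search) bit, so only root can traverse those .glusterfs subdirectories (the trailing + marks ACL entries, which could in principle grant more, but the geoaccount tests above show they do not here). A simplified sketch of the POSIX traversal check, ignoring ACLs and capabilities, with an illustrative function name and numeric IDs:

```python
import stat

def may_traverse(mode, dir_uid, dir_gid, uid, gids):
    # Simplified POSIX directory-search check: root bypasses permission
    # bits; otherwise exactly one class (owner, group, other) applies.
    if uid == 0:
        return True
    if uid == dir_uid:
        return bool(mode & stat.S_IXUSR)
    if dir_gid in gids:
        return bool(mode & stat.S_IXGRP)
    return bool(mode & stat.S_IXOTH)

# drwx--S--- => owner rwx, setgid set but no group execute, no other bits.
MODE = 0o2700
assert may_traverse(MODE, 0, 1000, 0, [])             # root: allowed
assert not may_traverse(MODE, 0, 1000, 1300, [1000])  # AD+group member: denied (no g+x)
assert not may_traverse(MODE, 0, 1000, 1300, [1300])  # anyone else: denied
```

Even membership in AD+group does not help, because the group class shows S (setgid without execute); that matches the non-root geo user failing while the root session succeeds.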
>> >> >> >> mnt-slave-bricks-brick1-brick.log >> >> [2018-09-24 13:51:10.625723] W [rpc-clnt.c:1753:rpc_clnt_submit] >> 0-glustervol1-client-0: error returned while attempting to connect to >> host:(null), port:0 >> >> [2018-09-24 13:51:10.626092] W [rpc-clnt.c:1753:rpc_clnt_submit] >> 0-glustervol1-client-0: error returned while attempting to connect to >> host:(null), port:0 >> >> [2018-09-24 13:51:10.626181] I [rpc-clnt.c:2105:rpc_clnt_reconfig] >> 0-glustervol1-client-0: changing port to 49152 (from 0) >> >> [2018-09-24 13:51:10.643111] W [rpc-clnt.c:1753:rpc_clnt_submit] >> 0-glustervol1-client-0: error returned while attempting to connect to >> host:(null), port:0 >> >> [2018-09-24 13:51:10.643489] W [dict.c:923:str_to_data] >> (-->/usr/lib64/glusterfs/4.1.3/xlator/protocol/client.so(+0x4131a) >> [0x7fafb023831a] -->/lib64/libglusterfs.so.0(dict_set_str+0x16) >> [0x7fafbdb83266] -->/lib64/libglusterfs.so.0(str_to_data+0x91) >> [0x7fafbdb7fea1] ) 0-dict: value is NULL [Invalid argument] >> >> [2018-09-24 13:51:10.643507] I [MSGID: 114006] >> [client-handshake.c:1308:client_setvolume] 0-glustervol1-client-0: failed >> to set process-name in handshake msg >> >> [2018-09-24 13:51:10.643541] W [rpc-clnt.c:1753:rpc_clnt_submit] >> 0-glustervol1-client-0: error returned while attempting to connect to >> host:(null), port:0 >> >> [2018-09-24 13:51:10.671460] I [MSGID: 114046] >> [client-handshake.c:1176:client_setvolume_cbk] 0-glustervol1-client-0: >> Connected to glustervol1-client-0, attached to remote volume >> '/bricks/brick1/brick'. 
>> >> [2018-09-24 13:51:10.672694] I [fuse-bridge.c:4294:fuse_init] >> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel >> 7.22 >> >> [2018-09-24 13:51:10.672715] I [fuse-bridge.c:4927:fuse_graph_sync] >> 0-fuse: switched to graph 0 >> >> [2018-09-24 13:51:10.673329] I [MSGID: 109005] >> [dht-selfheal.c:2342:dht_selfheal_directory] 0-glustervol1-dht: Directory >> selfheal failed: Unable to form layout for directory / >> >> [2018-09-24 13:51:16.116458] I [fuse-bridge.c:5199:fuse_thread_proc] >> 0-fuse: initating unmount of >> /var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E >> >> [2018-09-24 13:51:16.116595] W [glusterfsd.c:1514:cleanup_and_exit] >> (-->/lib64/libpthread.so.0(+0x7e25) [0x7fafbc9eee25] >> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55d5dac5dd65] >> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55d5dac5db8b] ) 0-: >> received signum (15), shutting down >> >> [2018-09-24 13:51:16.116616] I [fuse-bridge.c:5981:fini] 0-fuse: >> Unmounting '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'. >> >> [2018-09-24 13:51:16.116625] I [fuse-bridge.c:5986:fini] 0-fuse: Closing >> fuse connection to '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'. >> >> >> >> Regards, >> >> Christian >> >> >> >> *From: *Kotresh Hiremath Ravishankar >> *Date: *Saturday, 22. September 2018 at 06:52 >> *To: *"Kotte, Christian (Ext)" >> *Cc: *Gluster Users >> *Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log >> OSError: [Errno 13] Permission denied >> >> >> >> The problem occurred on the slave side, and its error is propagated to the master. >> Almost any traceback with repce involved points to a problem on the slave. >> Just check a few lines above in the log to find the slave node the crashed >> worker is connected to, and get that node's geo-replication logs to debug further.
>> >> >> >> >> >> >> >> >> >> >> >> On Fri, 21 Sep 2018, 20:10 Kotte, Christian (Ext), < >> christian.kotte at novartis.com> wrote: >> >> Hi, >> >> >> >> Any idea how to troubleshoot this? >> >> >> >> New folders and files were created on the master and the replication went >> faulty. They were created via Samba. >> >> >> >> Version: GlusterFS 4.1.3 >> >> >> >> [root at master]# gluster volume geo-replication status >> >> >> >> MASTER NODE MASTER VOL MASTER >> BRICK SLAVE USER >> SLAVE SLAVE >> NODE STATUS CRAWL STATUS LAST_SYNCED >> >> >> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >> >> master glustervol1 /bricks/brick1/brick geoaccount >> ssh://geoaccount at slave_1::glustervol1 N/A Faulty >> N/A N/A >> >> master glustervol1 /bricks/brick1/brick geoaccount >> ssh://geoaccount at slave_2::glustervol1 N/A Faulty >> N/A N/A >> >> master glustervol1 /bricks/brick1/brick geoaccount >> ssh://geoaccount at interimmaster::glustervol1 N/A Faulty >> N/A N/A >> >> >> >> The following error is repeatedly logged in the gsyncd.logs: >> >> [2018-09-21 14:26:38.611479] I [repce(agent >> /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching >> EOF. 
>> >> [2018-09-21 14:26:39.211527] I [monitor(monitor):279:monitor] Monitor: >> worker died in startup phase brick=/bricks/brick1/brick >> >> [2018-09-21 14:26:39.214322] I >> [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status >> Change status=Faulty >> >> [2018-09-21 14:26:49.318953] I [monitor(monitor):158:monitor] Monitor: >> starting gsyncd worker brick=/bricks/brick1/brick slave_node= >> slave_3 >> >> [2018-09-21 14:26:49.471532] I [gsyncd(agent >> /bricks/brick1/brick):297:main] : Using session config file >> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf >> >> [2018-09-21 14:26:49.473917] I [changelogagent(agent >> /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining... >> >> [2018-09-21 14:26:49.491359] I [gsyncd(worker >> /bricks/brick1/brick):297:main] : Using session config file >> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf >> >> [2018-09-21 14:26:49.538049] I [resource(worker >> /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection >> between master and slave... >> >> [2018-09-21 14:26:53.5017] I [resource(worker >> /bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between >> master and slave established. duration=3.4665 >> >> [2018-09-21 14:26:53.5419] I [resource(worker >> /bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume >> locally... >> >> [2018-09-21 14:26:54.120374] I [resource(worker >> /bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume >> duration=1.1146 >> >> [2018-09-21 14:26:54.121012] I [subcmds(worker >> /bricks/brick1/brick):70:subcmd_worker] : Worker spawn successful. 
>> Acknowledging back to monitor >> >> [2018-09-21 14:26:56.144460] I [master(worker >> /bricks/brick1/brick):1593:register] _GMaster: Working dir >> path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick >> >> [2018-09-21 14:26:56.145145] I [resource(worker >> /bricks/brick1/brick):1282:service_loop] GLUSTER: Register time >> time=1537540016 >> >> [2018-09-21 14:26:56.160064] I [gsyncdstatus(worker >> /bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status Change >> status=Active >> >> [2018-09-21 14:26:56.161175] I [gsyncdstatus(worker >> /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl >> Status Change status=History Crawl >> >> [2018-09-21 14:26:56.161536] I [master(worker >> /bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl >> turns=1 stime=(1537522637, 0) entry_stime=(1537537141, 0) >> etime=1537540016 >> >> [2018-09-21 14:26:56.164277] I [master(worker >> /bricks/brick1/brick):1536:crawl] _GMaster: slave's time >> stime=(1537522637, 0) >> >> [2018-09-21 14:26:56.197065] I [master(worker >> /bricks/brick1/brick):1360:process] _GMaster: Skipping already processed >> entry ops to_changelog=1537522638 num_changelogs=1 >> from_changelog=1537522638 >> >> [2018-09-21 14:26:56.197402] I [master(worker >> /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken MKD=0 >> MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=1 >> >> [2018-09-21 14:26:56.197623] I [master(worker >> /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken >> SETA=0 SETX=0 meta_duration=0.0000 data_duration=0.0284 DATA=0 >> XATT=0 >> >> [2018-09-21 14:26:56.198230] I [master(worker >> /bricks/brick1/brick):1394:process] _GMaster: Batch Completed >> changelog_end=1537522638 entry_stime=(1537537141, 0) >> changelog_start=1537522638 stime=(1537522637, 0) duration=0.0333 >> num_changelogs=1 mode=history_changelog >> >> [2018-09-21 14:26:57.200436] I [master(worker >> 
/bricks/brick1/brick):1536:crawl] _GMaster: slave's time >> stime=(1537522637, 0) >> >> [2018-09-21 14:26:57.528625] E [repce(worker >> /bricks/brick1/brick):197:__call__] RepceClient: call failed >> call=17209:140650361157440:1537540017.21 method=entry_ops >> error=OSError >> >> [2018-09-21 14:26:57.529371] E [syncdutils(worker >> /bricks/brick1/brick):332:log_raise_exception] : FAIL: >> >> Traceback (most recent call last): >> >> File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in >> main >> >> func(args) >> >> File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in >> subcmd_worker >> >> local.service_loop(remote) >> >> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1288, >> in service_loop >> >> g3.crawlwrap(oneshot=True) >> >> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, in >> crawlwrap >> >> self.crawl() >> >> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545, >> in crawl >> >> self.changelogs_batch_process(changes) >> >> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445, >> in changelogs_batch_process >> >> self.process(batch) >> >> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280, >> in process >> >> self.process_change(change, done, retry) >> >> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179, >> in process_change >> >> failures = self.slave.server.entry_ops(entries) >> >> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in >> __call__ >> >> return self.ins(self.meth, *a) >> >> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in >> __call__ >> >> raise res >> >> OSError: [Errno 13] Permission denied: >> '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e' >> >> >> >> The permissions look fine. The replication is done via the geo user instead >> of root. It should be able to read, but I'm not sure if the syncdaemon runs >> under geoaccount!?
>> >> >> [root at master vRealize Operation Manager]# ll >> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e >> >> lrwxrwxrwx. 1 root root 75 Sep 21 09:39 >> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e >> -> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation >> Manager >> >> >> >> [root at master vRealize Operation Manager]# ll >> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/ >> >> total 4 >> >> drwxrwxr-x. 2 AD+user AD+group 131 Sep 21 10:14 6.7 >> >> drwxrwxr-x. 2 AD+user AD+group 4096 Sep 21 09:43 7.0 >> >> drwxrwxr-x. 2 AD+user AD+group 57 Sep 21 10:28 7.5 >> >> [root at master vRealize Operation Manager]# >> >> >> >> It could be possible that the folder was renamed. I have had 3 similar issues >> since I migrated to GlusterFS 4.x but couldn't investigate much. I needed >> to completely wipe GlusterFS and geo-replication to get rid of this error. >> >> >> >> Any help is appreciated. >> >> >> >> Regards, >> >> >> >> *Christian Kotte* >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> >> >> -- >> >> Thanks and Regards, >> >> Kotresh H R >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Thanks and Regards, Kotresh H R -------------- next part -------------- An HTML attachment was scrubbed...
URL: From andy.coates at gmail.com Wed Mar 20 07:04:51 2019 From: andy.coates at gmail.com (Andy Coates) Date: Wed, 20 Mar 2019 18:04:51 +1100 Subject: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied In-Reply-To: References: <38A2AC2F-64AF-4323-B079-1EF20D94CFFE@novartis.com> <2A722EAB-68B0-4DFB-97D2-FE69C8065CA5@novartis.com> <9D92A563-55B9-48FB-990D-566737AF09E3@novartis.com> <7D2EB4E6-D0EE-4FB5-BB1F-E2587D82A37F@novartis.com> Message-ID: I'll raise a bug as soon as I get a chance. I'm trying to think back to what has changed, though. We started on 4.1.3, if I recall, and geo-rep was working. We had ended up on 4.1.6 when we noticed the issue, after blowing away a slave environment, deleting the existing sessions, and then starting geo-rep from scratch. It's very peculiar: the mountbroker seems to be mounting happily, the geosync user can access that mount point and write to it, and the source directory structure of the mount point appears to be set up, but then something triggers the exception when it tries to read the .glusterfs folder. We can't swap to root-user geo-rep due to security policies, so we're stuck with broken replication. On Wed, 20 Mar 2019 at 17:42, Kotresh Hiremath Ravishankar < khiremat at redhat.com> wrote: > Hi Andy, > > This is an issue with non-root geo-rep sessions and is not fixed yet. Could > you please raise a bug for this issue? > > Thanks, > Kotresh HR > > On Wed, Mar 20, 2019 at 11:53 AM Andy Coates > wrote: > >> We're seeing the same permission denied errors when running as a non-root >> geosync user. >> >> Does anyone know what the underlying issue is? >> >> On Wed, 26 Sep 2018 at 00:40, Kotte, Christian (Ext) < >> christian.kotte at novartis.com> wrote: >> >>> I changed the replication to use the root user and re-created the >>> replication with "create force".
Now the files and folders were replicated, >>> and the permission denied, and New folder error disappeared, but old files >>> are not deleted. >>> >>> >>> >>> Looks like the history crawl is in some kind of a loop: >>> >>> >>> >>> [root at master ~]# gluster volume geo-replication status >>> >>> >>> >>> MASTER NODE MASTER VOL MASTER >>> BRICK SLAVE USER SLAVE >>> SLAVE >>> NODE STATUS CRAWL STATUS LAST_SYNCED >>> >>> >>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ >>> >>> master glustervol1 >>> /bricks/brick1/brick root ssh://slave_1::glustervol1 >>> slave_1 Active >>> Hybrid Crawl N/A >>> >>> master glustervol1 >>> /bricks/brick1/brick root ssh://slave_2::glustervol1 >>> slave_2 Active >>> Hybrid Crawl N/A >>> >>> master glustervol1 >>> /bricks/brick1/brick root ssh://slave_3::glustervol1 >>> slave_3 Active >>> Hybrid Crawl N/A >>> >>> >>> >>> tail -f >>> /var/log/glusterfs/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.log >>> >>> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", >>> line 104, in cl_history_changelog >>> >>> raise ChangelogHistoryNotAvailable() >>> >>> ChangelogHistoryNotAvailable >>> >>> [2018-09-25 14:10:44.196011] E [repce(worker >>> /bricks/brick1/brick):197:__call__] RepceClient: call failed >>> call=29945:139700517484352:1537884644.19 method=history >>> error=ChangelogHistoryNotAvailable >>> >>> [2018-09-25 14:10:44.196405] I [resource(worker >>> /bricks/brick1/brick):1295:service_loop] GLUSTER: Changelog history not >>> available, using xsync >>> >>> [2018-09-25 14:10:44.221385] I [master(worker >>> /bricks/brick1/brick):1623:crawl] _GMaster: starting hybrid crawl >>> stime=(0, 0) >>> >>> [2018-09-25 14:10:44.223382] I [gsyncdstatus(worker >>> /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl >>> Status Change status=Hybrid 
Crawl >>> >>> [2018-09-25 14:10:46.225296] I [master(worker >>> /bricks/brick1/brick):1634:crawl] _GMaster: processing xsync changelog >>> path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick/xsync/XSYNC-CHANGELOG.1537884644 >>> >>> [2018-09-25 14:13:36.157408] I [gsyncd(config-get):297:main] : >>> Using session config file >>> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf >>> >>> [2018-09-25 14:13:36.377880] I [gsyncd(status):297:main] : Using >>> session config file >>> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf >>> >>> [2018-09-25 14:31:10.145035] I [master(worker >>> /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken >>> duration=1212.5316 num_files=1474 job=2 return_code=11 >>> >>> [2018-09-25 14:31:10.152637] E [syncdutils(worker >>> /bricks/brick1/brick):801:errlog] Popen: command returned error cmd=rsync >>> -aR0 --inplace --files-from=- --super --stats --numeric-ids >>> --no-implied-dirs --xattrs --acls . -e ssh -oPasswordAuthentication=no >>> -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem >>> -p 22 -oControlMaster=auto -S >>> /tmp/gsyncd-aux-ssh-gg758Z/caec4d1d03cc28bc1853f692e291164f.sock >>> slave_3:/proc/15919/cwd error=11 >>> >>> [2018-09-25 14:31:10.237371] I [repce(agent >>> /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching >>> EOF. 
>>> >>> [2018-09-25 14:31:10.430820] I >>> [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status >>> Change status=Faulty >>> >>> [2018-09-25 14:31:20.541475] I [monitor(monitor):158:monitor] Monitor: >>> starting gsyncd worker brick=/bricks/brick1/brick slave_node=slave_3 >>> >>> [2018-09-25 14:31:20.806518] I [gsyncd(agent >>> /bricks/brick1/brick):297:main] : Using session config file >>> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf >>> >>> [2018-09-25 14:31:20.816536] I [changelogagent(agent >>> /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining... >>> >>> [2018-09-25 14:31:20.821574] I [gsyncd(worker >>> /bricks/brick1/brick):297:main] : Using session config file >>> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf >>> >>> [2018-09-25 14:31:20.882128] I [resource(worker >>> /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection >>> between master and slave... >>> >>> [2018-09-25 14:31:24.169857] I [resource(worker >>> /bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between >>> master and slave established. duration=3.2873 >>> >>> [2018-09-25 14:31:24.170401] I [resource(worker >>> /bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume >>> locally... >>> >>> [2018-09-25 14:31:25.354633] I [resource(worker >>> /bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume >>> duration=1.1839 >>> >>> [2018-09-25 14:31:25.355073] I [subcmds(worker >>> /bricks/brick1/brick):70:subcmd_worker] : Worker spawn successful. 
>>> Acknowledging back to monitor >>> >>> [2018-09-25 14:31:27.439034] I [master(worker >>> /bricks/brick1/brick):1593:register] _GMaster: Working dir >>> path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick >>> >>> [2018-09-25 14:31:27.441847] I [resource(worker >>> /bricks/brick1/brick):1282:service_loop] GLUSTER: Register time >>> time=1537885887 >>> >>> [2018-09-25 14:31:27.465053] I [gsyncdstatus(worker >>> /bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status >>> Change status=Active >>> >>> [2018-09-25 14:31:27.471021] I [gsyncdstatus(worker >>> /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl >>> Status Change status=History Crawl >>> >>> [2018-09-25 14:31:27.471484] I [master(worker >>> /bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl >>> turns=1 stime=(0, 0) entry_stime=None etime=1537885887 >>> >>> [2018-09-25 14:31:27.472564] E [repce(agent >>> /bricks/brick1/brick):105:worker] : call failed: >>> >>> Traceback (most recent call last): >>> >>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in >>> worker >>> >>> res = getattr(self.obj, rmeth)(*in_data[2:]) >>> >>> File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", >>> line 53, in history >>> >>> num_parallel) >>> >>> File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", >>> line 104, in cl_history_changelog >>> >>> raise ChangelogHistoryNotAvailable() >>> >>> ChangelogHistoryNotAvailable >>> >>> [2018-09-25 14:31:27.480632] E [repce(worker >>> /bricks/brick1/brick):197:__call__] RepceClient: call failed >>> call=31250:140272364406592:1537885887.47 method=history >>> error=ChangelogHistoryNotAvailable >>> >>> [2018-09-25 14:31:27.480958] I [resource(worker >>> /bricks/brick1/brick):1295:service_loop] GLUSTER: Changelog history not >>> available, using xsync >>> >>> [2018-09-25 14:31:27.495117] I [master(worker >>> /bricks/brick1/brick):1623:crawl] _GMaster: starting hybrid 
crawl >>> stime=(0, 0) >>> >>> [2018-09-25 14:31:27.502083] I [gsyncdstatus(worker >>> /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl >>> Status Change status=Hybrid Crawl >>> >>> [2018-09-25 14:31:29.505284] I [master(worker >>> /bricks/brick1/brick):1634:crawl] _GMaster: processing xsync changelog >>> path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick/xsync/XSYNC-CHANGELOG.1537885887 >>> >>> >>> >>> tail -f >>> /var/log/glusterfs/geo-replication-slaves/glustervol1_slave_3_glustervol1/gsyncd.log >>> >>> [2018-09-25 13:49:24.141303] I [repce(slave >>> master/bricks/brick1/brick):80:service_loop] RepceServer: terminating on >>> reaching EOF. >>> >>> [2018-09-25 13:49:36.602051] W [gsyncd(slave >>> master/bricks/brick1/brick):293:main] : Session config file not >>> exists, using the default config >>> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf >>> >>> [2018-09-25 13:49:36.629415] I [resource(slave >>> master/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume >>> locally... >>> >>> [2018-09-25 13:49:37.701642] I [resource(slave >>> master/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster >>> volume duration=1.0718 >>> >>> [2018-09-25 13:49:37.704282] I [resource(slave >>> master/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening >>> >>> [2018-09-25 14:10:27.70952] I [repce(slave >>> master/bricks/brick1/brick):80:service_loop] RepceServer: terminating on >>> reaching EOF. >>> >>> [2018-09-25 14:10:39.632124] W [gsyncd(slave >>> master/bricks/brick1/brick):293:main] : Session config file not >>> exists, using the default config >>> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf >>> >>> [2018-09-25 14:10:39.650958] I [resource(slave >>> master/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume >>> locally... 
>>> >>> [2018-09-25 14:10:40.729355] I [resource(slave >>> master/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster >>> volume duration=1.0781 >>> >>> [2018-09-25 14:10:40.730650] I [resource(slave >>> master/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening >>> >>> [2018-09-25 14:31:10.291064] I [repce(slave >>> master/bricks/brick1/brick):80:service_loop] RepceServer: terminating on >>> reaching EOF. >>> >>> [2018-09-25 14:31:22.802237] W [gsyncd(slave >>> master/bricks/brick1/brick):293:main] : Session config file not >>> exists, using the default config >>> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf >>> >>> [2018-09-25 14:31:22.828418] I [resource(slave >>> master/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume >>> locally... >>> >>> [2018-09-25 14:31:23.910206] I [resource(slave >>> master/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster >>> volume duration=1.0813 >>> >>> [2018-09-25 14:31:23.913369] I [resource(slave >>> master/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening >>> >>> >>> >>> Any ideas how to resolve this without re-creating everything again? Can >>> I reset the changelog history? >>> >>> >>> >>> Regards, >>> >>> Christian >>> >>> >>> >>> *From: * on behalf of "Kotte, >>> Christian (Ext)" >>> *Date: *Monday, 24. September 2018 at 17:20 >>> *To: *Kotresh Hiremath Ravishankar >>> *Cc: *Gluster Users >>> *Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - >>> gsyncd.log OSError: [Errno 13] Permission denied >>> >>> >>> >>> I don't configure the permissions of /bricks/brick1/brick/.glusterfs. I >>> don't even see it on the local GlusterFS mount. >>> >>> >>> >>> Not sure why the permissions are configured with S and the AD group? >>> >>> >>> >>> Regards, >>> >>> Christian >>> >>> >>> >>> *From: * on behalf of "Kotte, >>> Christian (Ext)" >>> *Date: *Monday, 24.
September 2018 at 17:10 >>> *To: *Kotresh Hiremath Ravishankar >>> *Cc: *Gluster Users >>> *Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - >>> gsyncd.log OSError: [Errno 13] Permission denied >>> >>> >>> >>> Yeah right. I get permission denied. >>> >>> >>> >>> [geoaccount at slave ~]$ ll >>> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e >>> >>> ls: cannot access >>> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e: >>> Permission denied >>> >>> [geoaccount at slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/d1/ >>> >>> ls: cannot access /bricks/brick1/brick/.glusterfs/29/d1/: Permission >>> denied >>> >>> [geoaccount at slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/ >>> >>> ls: cannot access /bricks/brick1/brick/.glusterfs/29/: Permission denied >>> >>> [geoaccount at slave ~]$ ll /bricks/brick1/brick/.glusterfs/ >>> >>> ls: cannot open directory /bricks/brick1/brick/.glusterfs/: Permission >>> denied >>> >>> >>> >>> [root at slave ~]# ll /bricks/brick1/brick/.glusterfs/29 >>> >>> total 0 >>> >>> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 16 >>> >>> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 33 >>> >>> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 5e >>> >>> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 73 >>> >>> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 b2 >>> >>> drwx--S---+ 2 root AD+group 50 Sep 21 09:39 d1 >>> >>> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 d7 >>> >>> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 e6 >>> >>> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 eb >>> >>> [root at slave ~]# >>> >>> >>> >>> However, the strange thing is that I could replicate new files and >>> folders before. The replication has been broken since the "New folder" was >>> created. >>> >>> >>> >>> These are the permissions on a dev/test system: >>> >>> [root at slave-dev ~]# ll /bricks/brick1/brick/.glusterfs/ >>> >>> total 3136 >>> >>> drwx------. 44 root root 4096 Aug 22 18:19 00 >>> >>> drwx------.
50 root root 4096 Sep 12 13:14 01 >>> >>> drwx------. 54 root root 4096 Sep 13 11:33 02 >>> >>> drwx------. 59 root root 4096 Aug 22 18:21 03 >>> >>> drwx------. 60 root root 4096 Sep 12 13:14 04 >>> >>> drwx------. 68 root root 4096 Aug 24 12:36 05 >>> >>> drwx------. 56 root root 4096 Aug 22 18:21 06 >>> >>> drwx------. 46 root root 4096 Aug 22 18:21 07 >>> >>> drwx------. 51 root root 4096 Aug 22 18:21 08 >>> >>> drwx------. 42 root root 4096 Aug 22 18:21 09 >>> >>> drwx------. 44 root root 4096 Sep 13 11:16 0a >>> >>> >>> >>> I've configured an AD group, SGID bit, and ACLs via Ansible on the local >>> mount point. Could this be an issue? Should I avoid configuring the >>> permissions on .glusterfs and below? >>> >>> >>> >>> # ll /mnt/glustervol1/ >>> >>> total 12 >>> >>> drwxrwsr-x+ 4 AD+user AD+group 4096 Jul 13 07:46 Scripts >>> >>> drwxrwxr-x+ 10 AD+user AD+group 4096 Jun 12 12:03 Software >>> >>> -rw-rw-r--+ 1 root AD+group 0 Aug 8 08:44 test >>> >>> drwxr-xr-x+ 6 AD+user AD+group 4096 Apr 18 10:58 tftp >>> >>> >>> >>> glusterfs_volumes: >>> >>> [...] >>> >>> permissions: >>> >>> mode: "02775" >>> >>> owner: root >>> >>> group: "AD+group" >>> >>> acl_permissions: rw >>> >>> [...] >>> >>> >>> >>> # root directory is owned by root.
>>> >>> # set permissions to 'g+s' to automatically set the group to "AD+group" >>> >>> # permissions of individual files will be set by Samba during creation >>> >>> - name: Configure volume directory permission 1/2 >>> >>> tags: glusterfs >>> >>> file: >>> >>> path: /mnt/{{ item.volume }} >>> >>> state: directory >>> >>> mode: "{{ item.permissions.mode }}" >>> >>> owner: "{{ item.permissions.owner }}" >>> >>> group: "{{ item.permissions.group }}" >>> >>> with_items: "{{ glusterfs_volumes }}" >>> >>> loop_control: >>> >>> label: "{{ item.volume }}" >>> >>> when: item.permissions is defined >>> >>> >>> >>> # ACL needs to be set to override default umask and grant "AD+group" >>> write permissions >>> >>> - name: Configure volume directory permission 2/2 (ACL) >>> >>> tags: glusterfs >>> >>> acl: >>> >>> path: /mnt/{{ item.volume }} >>> >>> default: yes >>> >>> entity: "{{ item.permissions.group }}" >>> >>> etype: group >>> >>> permissions: "{{ item.permissions.acl_permissions }}" >>> >>> state: present >>> >>> with_items: "{{ glusterfs_volumes }}" >>> >>> loop_control: >>> >>> label: "{{ item.volume }}" >>> >>> when: item.permissions is defined >>> >>> >>> >>> Regards, >>> >>> Christian >>> >>> >>> >>> *From: *Kotresh Hiremath Ravishankar >>> *Date: *Monday, 24. September 2018 at 16:20 >>> *To: *"Kotte, Christian (Ext)" >>> *Cc: *Gluster Users >>> *Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - >>> gsyncd.log OSError: [Errno 13] Permission denied >>> >>> >>> >>> I think I am get what's happening. The geo-rep session is non-root. >>> >>> Could you do readlink on brick path mentioned above >>> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e >>> >>> from a geaccount user and see if you are getting "Permission Denied" >>> errors? >>> >>> >>> >>> Thanks, >>> >>> Kotresh HR >>> >>> >>> >>> On Mon, Sep 24, 2018 at 7:35 PM Kotte, Christian (Ext) < >>> christian.kotte at novartis.com> wrote: >>> >>> Ok. 
It happens on all slave nodes (and on the interimmaster as well). >>> >>> It's like I assumed. These are the logs of one of the slaves: >>> >>> >>> >>> gsyncd.log >>> >>> [2018-09-24 13:52:25.418382] I [repce(slave >>> slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on >>> reaching EOF. >>> >>> [2018-09-24 13:52:37.95297] W [gsyncd(slave >>> slave/bricks/brick1/brick):293:main] : Session config file not exists, >>> using the default config >>> path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf >>> >>> [2018-09-24 13:52:37.109643] I [resource(slave >>> slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume >>> locally... >>> >>> [2018-09-24 13:52:38.303920] I [resource(slave >>> slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster >>> volume duration=1.1941 >>> >>> [2018-09-24 13:52:38.304771] I [resource(slave >>> slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening >>> >>> [2018-09-24 13:52:41.981554] I [resource(slave >>> slave/bricks/brick1/brick):598:entry_ops] : Special case: rename on >>> mkdir gfid=29d1d60d-1ad6-45fc-87e0-93d478f7331e >>> entry='.gfid/6b97b987-8aef-46c3-af27-20d3aa883016/New folder' >>> >>> [2018-09-24 13:52:42.45641] E [repce(slave >>> slave/bricks/brick1/brick):105:worker] : call failed: >>> >>> Traceback (most recent call last): >>> >>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in >>> worker >>> >>> res = getattr(self.obj, rmeth)(*in_data[2:]) >>> >>> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 599, >>> in entry_ops >>> >>> src_entry = get_slv_dir_path(slv_host, slv_volume, gfid) >>> >>> File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line >>> 682, in get_slv_dir_path >>> >>> [ENOENT], [ESTALE]) >>> >>> File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line >>> 540, in errno_wrap >>> >>> return call(*arg) >>> >>> OSError: [Errno 13] Permission denied: >>>
'/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e' >>> >>> [2018-09-24 13:52:42.81794] I [repce(slave >>> slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on >>> reaching EOF. >>> >>> [2018-09-24 13:52:53.459676] W [gsyncd(slave >>> slave/bricks/brick1/brick):293:main] : Session config file not exists, >>> using the default config >>> path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf >>> >>> [2018-09-24 13:52:53.473500] I [resource(slave >>> slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume >>> locally... >>> >>> [2018-09-24 13:52:54.659044] I [resource(slave >>> slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume >>> duration=1.1854 >>> >>> [2018-09-24 13:52:54.659837] I [resource(slave >>> slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening >>> >>> >>> >>> The folder "New folder" was created via Samba and renamed by >>> my colleague right away after creation. >>> >>> [root at slave glustervol1_slave_glustervol1]# ls >>> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/ >>> >>> [root at slave glustervol1_slave_glustervol1]# ls >>> /bricks/brick1/brick/.glusterfs/29/d1/ -al >>> >>> total 0 >>> >>> drwx--S---+ 2 root AD+group 50 Sep 21 09:39 . >>> >>> drwx--S---+ 11 root AD+group 96 Sep 21 09:39 .. >>> >>> lrwxrwxrwx. 1 root AD+group 75 Sep 21 09:39 >>> 29d1d60d-1ad6-45fc-87e0-93d478f7331e -> >>> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager >>> >>> >>> >>> I created the folder in >>> /bricks/brick1/brick/.glusterfs/6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/, >>> but it didn't change anything.
>>> >>> >>> >>> mnt-slave-bricks-brick1-brick.log >>> >>> [2018-09-24 13:51:10.625723] W [rpc-clnt.c:1753:rpc_clnt_submit] >>> 0-glustervol1-client-0: error returned while attempting to connect to >>> host:(null), port:0 >>> >>> [2018-09-24 13:51:10.626092] W [rpc-clnt.c:1753:rpc_clnt_submit] >>> 0-glustervol1-client-0: error returned while attempting to connect to >>> host:(null), port:0 >>> >>> [2018-09-24 13:51:10.626181] I [rpc-clnt.c:2105:rpc_clnt_reconfig] >>> 0-glustervol1-client-0: changing port to 49152 (from 0) >>> >>> [2018-09-24 13:51:10.643111] W [rpc-clnt.c:1753:rpc_clnt_submit] >>> 0-glustervol1-client-0: error returned while attempting to connect to >>> host:(null), port:0 >>> >>> [2018-09-24 13:51:10.643489] W [dict.c:923:str_to_data] >>> (-->/usr/lib64/glusterfs/4.1.3/xlator/protocol/client.so(+0x4131a) >>> [0x7fafb023831a] -->/lib64/libglusterfs.so.0(dict_set_str+0x16) >>> [0x7fafbdb83266] -->/lib64/libglusterfs.so.0(str_to_data+0x91) >>> [0x7fafbdb7fea1] ) 0-dict: value is NULL [Invalid argument] >>> >>> [2018-09-24 13:51:10.643507] I [MSGID: 114006] >>> [client-handshake.c:1308:client_setvolume] 0-glustervol1-client-0: failed >>> to set process-name in handshake msg >>> >>> [2018-09-24 13:51:10.643541] W [rpc-clnt.c:1753:rpc_clnt_submit] >>> 0-glustervol1-client-0: error returned while attempting to connect to >>> host:(null), port:0 >>> >>> [2018-09-24 13:51:10.671460] I [MSGID: 114046] >>> [client-handshake.c:1176:client_setvolume_cbk] 0-glustervol1-client-0: >>> Connected to glustervol1-client-0, attached to remote volume >>> '/bricks/brick1/brick'. 
>>> >>> [2018-09-24 13:51:10.672694] I [fuse-bridge.c:4294:fuse_init] >>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel >>> 7.22 >>> >>> [2018-09-24 13:51:10.672715] I [fuse-bridge.c:4927:fuse_graph_sync] >>> 0-fuse: switched to graph 0 >>> >>> [2018-09-24 13:51:10.673329] I [MSGID: 109005] >>> [dht-selfheal.c:2342:dht_selfheal_directory] 0-glustervol1-dht: Directory >>> selfheal failed: Unable to form layout for directory / >>> >>> [2018-09-24 13:51:16.116458] I [fuse-bridge.c:5199:fuse_thread_proc] >>> 0-fuse: initating unmount of >>> /var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E >>> >>> [2018-09-24 13:51:16.116595] W [glusterfsd.c:1514:cleanup_and_exit] >>> (-->/lib64/libpthread.so.0(+0x7e25) [0x7fafbc9eee25] >>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55d5dac5dd65] >>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55d5dac5db8b] ) 0-: >>> received signum (15), shutting down >>> >>> [2018-09-24 13:51:16.116616] I [fuse-bridge.c:5981:fini] 0-fuse: >>> Unmounting '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'. >>> >>> [2018-09-24 13:51:16.116625] I [fuse-bridge.c:5986:fini] 0-fuse: Closing >>> fuse connection to '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'. >>> >>> >>> >>> Regards, >>> >>> Christian >>> >>> >>> >>> *From: *Kotresh Hiremath Ravishankar >>> *Date: *Saturday, 22. September 2018 at 06:52 >>> *To: *"Kotte, Christian (Ext)" >>> *Cc: *Gluster Users >>> *Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - >>> gsyncd.log OSError: [Errno 13] Permission denied >>> >>> >>> >>> The problem occurred on the slave side and its error is propagated to the master. >>> Almost any traceback with repce involved is related to a problem on the slave. >>> Just check a few lines above in the log to find the slave node the crashed >>> worker is connected to, and get that node's geo-replication logs to debug further. 
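The slave node a crashed worker was connected to shows up in the monitor lines of the master's gsyncd.log (e.g. "Monitor: starting gsyncd worker brick=... slave_node=..."). A rough sketch of pulling those values out of a log (hypothetical helper; log format taken from the excerpts in this thread):

```python
import re

def find_slave_nodes(log_text):
    """Extract slave_node= values from gsyncd.log monitor lines.
    Tolerates whitespace after '=' as seen in wrapped log excerpts."""
    return re.findall(r"slave_node=\s*(\S+)", log_text)

sample = ("[2018-09-21 14:26:49.318953] I [monitor(monitor):158:monitor] "
          "Monitor: starting gsyncd worker brick=/bricks/brick1/brick "
          "slave_node=slave_3")
print(find_slave_nodes(sample))
# -> ['slave_3']
```

With the slave node identified, the corresponding slave-side gsyncd.log (and the mountbroker mount log) is where the actual failure is recorded, as the slave logs above demonstrate.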
>>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> On Fri, 21 Sep 2018, 20:10 Kotte, Christian (Ext), < >>> christian.kotte at novartis.com> wrote: >>> >>> Hi, >>> >>> >>> >>> Any idea how to troubleshoot this? >>> >>> >>> >>> New folders and files were created on the master and the replication >>> went faulty. They were created via Samba. >>> >>> >>> >>> Version: GlusterFS 4.1.3 >>> >>> >>> >>> [root at master]# gluster volume geo-replication status >>> >>> >>> >>> MASTER NODE MASTER VOL MASTER >>> BRICK SLAVE USER >>> SLAVE SLAVE >>> NODE STATUS CRAWL STATUS LAST_SYNCED >>> >>> >>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >>> >>> master glustervol1 /bricks/brick1/brick geoaccount >>> ssh://geoaccount at slave_1::glustervol1 N/A Faulty >>> N/A N/A >>> >>> master glustervol1 /bricks/brick1/brick geoaccount >>> ssh://geoaccount at slave_2::glustervol1 N/A Faulty >>> N/A N/A >>> >>> master glustervol1 /bricks/brick1/brick geoaccount >>> ssh://geoaccount at interimmaster::glustervol1 N/A Faulty >>> N/A N/A >>> >>> >>> >>> The following error is repeatedly logged in the gsyncd.logs: >>> >>> [2018-09-21 14:26:38.611479] I [repce(agent >>> /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching >>> EOF. 
>>> >>> [2018-09-21 14:26:39.211527] I [monitor(monitor):279:monitor] Monitor: >>> worker died in startup phase brick=/bricks/brick1/brick >>> >>> [2018-09-21 14:26:39.214322] I >>> [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status >>> Change status=Faulty >>> >>> [2018-09-21 14:26:49.318953] I [monitor(monitor):158:monitor] Monitor: >>> starting gsyncd worker brick=/bricks/brick1/brick slave_node= >>> slave_3 >>> >>> [2018-09-21 14:26:49.471532] I [gsyncd(agent >>> /bricks/brick1/brick):297:main] : Using session config file >>> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf >>> >>> [2018-09-21 14:26:49.473917] I [changelogagent(agent >>> /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining... >>> >>> [2018-09-21 14:26:49.491359] I [gsyncd(worker >>> /bricks/brick1/brick):297:main] : Using session config file >>> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf >>> >>> [2018-09-21 14:26:49.538049] I [resource(worker >>> /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection >>> between master and slave... >>> >>> [2018-09-21 14:26:53.5017] I [resource(worker >>> /bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between >>> master and slave established. duration=3.4665 >>> >>> [2018-09-21 14:26:53.5419] I [resource(worker >>> /bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume >>> locally... >>> >>> [2018-09-21 14:26:54.120374] I [resource(worker >>> /bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume >>> duration=1.1146 >>> >>> [2018-09-21 14:26:54.121012] I [subcmds(worker >>> /bricks/brick1/brick):70:subcmd_worker] : Worker spawn successful. 
>>> Acknowledging back to monitor >>> >>> [2018-09-21 14:26:56.144460] I [master(worker >>> /bricks/brick1/brick):1593:register] _GMaster: Working dir >>> path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick >>> >>> [2018-09-21 14:26:56.145145] I [resource(worker >>> /bricks/brick1/brick):1282:service_loop] GLUSTER: Register time >>> time=1537540016 >>> >>> [2018-09-21 14:26:56.160064] I [gsyncdstatus(worker >>> /bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status Change >>> status=Active >>> >>> [2018-09-21 14:26:56.161175] I [gsyncdstatus(worker >>> /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl >>> Status Change status=History Crawl >>> >>> [2018-09-21 14:26:56.161536] I [master(worker >>> /bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl >>> turns=1 stime=(1537522637, 0) entry_stime=(1537537141, 0) >>> etime=1537540016 >>> >>> [2018-09-21 14:26:56.164277] I [master(worker >>> /bricks/brick1/brick):1536:crawl] _GMaster: slave's time >>> stime=(1537522637, 0) >>> >>> [2018-09-21 14:26:56.197065] I [master(worker >>> /bricks/brick1/brick):1360:process] _GMaster: Skipping already processed >>> entry ops to_changelog=1537522638 num_changelogs=1 >>> from_changelog=1537522638 >>> >>> [2018-09-21 14:26:56.197402] I [master(worker >>> /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken MKD=0 >>> MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=1 >>> >>> [2018-09-21 14:26:56.197623] I [master(worker >>> /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken >>> SETA=0 SETX=0 meta_duration=0.0000 data_duration=0.0284 DATA=0 >>> XATT=0 >>> >>> [2018-09-21 14:26:56.198230] I [master(worker >>> /bricks/brick1/brick):1394:process] _GMaster: Batch Completed >>> changelog_end=1537522638 entry_stime=(1537537141, 0) >>> changelog_start=1537522638 stime=(1537522637, 0) duration=0.0333 >>> num_changelogs=1 mode=history_changelog >>> >>> [2018-09-21 
14:26:57.200436] I [master(worker >>> /bricks/brick1/brick):1536:crawl] _GMaster: slave's time >>> stime=(1537522637, 0) >>> >>> [2018-09-21 14:26:57.528625] E [repce(worker >>> /bricks/brick1/brick):197:__call__] RepceClient: call failed >>> call=17209:140650361157440:1537540017.21 method=entry_ops >>> error=OSError >>> >>> [2018-09-21 14:26:57.529371] E [syncdutils(worker >>> /bricks/brick1/brick):332:log_raise_exception] : FAIL: >>> >>> Traceback (most recent call last): >>> >>> File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, >>> in main >>> >>> func(args) >>> >>> File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, >>> in subcmd_worker >>> >>> local.service_loop(remote) >>> >>> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line >>> 1288, in service_loop >>> >>> g3.crawlwrap(oneshot=True) >>> >>> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, >>> in crawlwrap >>> >>> self.crawl() >>> >>> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545, >>> in crawl >>> >>> self.changelogs_batch_process(changes) >>> >>> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445, >>> in changelogs_batch_process >>> >>> self.process(batch) >>> >>> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280, >>> in process >>> >>> self.process_change(change, done, retry) >>> >>> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179, >>> in process_change >>> >>> failures = self.slave.server.entry_ops(entries) >>> >>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in >>> __call__ >>> >>> return self.ins(self.meth, *a) >>> >>> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in >>> __call__ >>> >>> raise res >>> >>> OSError: [Errno 13] Permission denied: >>> '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e' >>> >>> >>> >>> The permissions look fine. 
The replication is done via the geo user instead >>> of root. It should be able to read, but I'm not sure if the syncdaemon runs >>> under geoaccount!? >>> >>> >>> >>> [root at master vRealize Operation Manager]# ll >>> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e >>> >>> lrwxrwxrwx. 1 root root 75 Sep 21 09:39 >>> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e >>> -> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation >>> Manager >>> >>> >>> >>> [root at master vRealize Operation Manager]# ll >>> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/ >>> >>> total 4 >>> >>> drwxrwxr-x. 2 AD+user AD+group 131 Sep 21 10:14 6.7 >>> >>> drwxrwxr-x. 2 AD+user AD+group 4096 Sep 21 09:43 7.0 >>> >>> drwxrwxr-x. 2 AD+user AD+group 57 Sep 21 10:28 7.5 >>> >>> [root at master vRealize Operation Manager]# >>> >>> >>> >>> It could be possible that the folder was renamed. I had 3 similar issues >>> since I migrated to GlusterFS 4.x but couldn't investigate much. I needed >>> to completely wipe GlusterFS and geo-replication to get rid of this error... >>> >>> >>> >>> Any help is appreciated. >>> >>> >>> >>> Regards, >>> >>> >>> >>> *Christian Kotte* >>> >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >>> >>> >>> -- >>> >>> Thanks and Regards, >>> >>> Kotresh H R >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Thanks and Regards, > Kotresh H R > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jahernan at redhat.com Wed Mar 20 07:53:33 2019 From: jahernan at redhat.com (Xavi Hernandez) Date: Wed, 20 Mar 2019 08:53:33 +0100 Subject: [Gluster-users] Constant fuse client crashes "fixed" by setting performance.write-behind: off. Any hope for a 4.1.8 release? In-Reply-To: References: <153001d4ddc1$80a1eff0$81e5cfd0$@thinkhuge.net> Message-ID: Can you provide the core dump, as well as details of the system where it crashed (distribution, version, list of packages, ...), so that I can analyze it ? Thanks, Xavi On Tue, Mar 19, 2019 at 3:56 PM Artem Russakovskii wrote: > I upgraded the node that was crashing to 5.5 yesterday. Today, it got > another crash. This is a 1x4 replicate cluster, you can find the config > mentioned in my previous reports, and Amar should have it as well. Here's > the log: > > ==> mnt-_data1.log <== > The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] > 0-_data1-replicate-0: selecting local read_child > _data1-client-3" repeated 4 times between [2019-03-19 > 14:40:50.741147] and [2019-03-19 14:40:56.874832] > pending frames: > frame : type(1) op(LOOKUP) > frame : type(1) op(LOOKUP) > frame : type(1) op(READ) > frame : type(1) op(READ) > frame : type(1) op(READ) > frame : type(1) op(READ) > frame : type(0) op(0) > patchset: git://git.gluster.org/glusterfs.git > signal received: 6 > time of crash: > 2019-03-19 14:40:57 > configuration details: > argp 1 > backtrace 1 > dlfcn 1 > libpthread 1 > llistxattr 1 > setfsid 1 > spinlock 1 > epoll.h 1 > xattr.h 1 > st_atim.tv_nsec 1 > package-string: glusterfs 5.5 > /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7ff841f8364c] > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7ff841f8dd26] > /lib64/libc.so.6(+0x36160)[0x7ff84114a160] > /lib64/libc.so.6(gsignal+0x110)[0x7ff84114a0e0] > /lib64/libc.so.6(abort+0x151)[0x7ff84114b6c1] > /lib64/libc.so.6(+0x2e6fa)[0x7ff8411426fa] > /lib64/libc.so.6(+0x2e772)[0x7ff841142772] > 
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7ff8414d80b8] > > /usr/lib64/glusterfs/5.5/xlator/cluster/replicate.so(+0x5de3d)[0x7ff839fbae3d] > > /usr/lib64/glusterfs/5.5/xlator/cluster/replicate.so(+0x70d51)[0x7ff839fcdd51] > > /usr/lib64/glusterfs/5.5/xlator/protocol/client.so(+0x58e1f)[0x7ff83a252e1f] > /usr/lib64/libgfrpc.so.0(+0xe820)[0x7ff841d4e820] > /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7ff841d4eb6f] > /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7ff841d4b063] > /usr/lib64/glusterfs/5.5/rpc-transport/socket.so(+0xa0ce)[0x7ff83b9690ce] > /usr/lib64/libglusterfs.so.0(+0x85519)[0x7ff841fe1519] > /lib64/libpthread.so.0(+0x7559)[0x7ff8414d5559] > /lib64/libc.so.6(clone+0x3f)[0x7ff84120c81f] > --------- > > Sincerely, > Artem > > -- > Founder, Android Police , APK Mirror > , Illogical Robot LLC > beerpla.net | +ArtemRussakovskii > | @ArtemR > > > > On Mon, Mar 18, 2019 at 9:46 PM Amar Tumballi Suryanarayan < > atumball at redhat.com> wrote: > >> Due to this issue, along with few other logging issues, we did make a >> glusterfs-5.5 release, which has the fix for particular crash. >> >> Regards, >> Amar >> >> On Tue, 19 Mar, 2019, 1:04 AM , wrote: >> >>> Hello Ville-Pekka and list, >>> >>> >>> >>> I believe we are experiencing similar gluster fuse client crashes on 5.3 >>> as mentioned here. This morning I made a post in regards. >>> >>> >>> >>> https://lists.gluster.org/pipermail/gluster-users/2019-March/036036.html >>> >>> >>> >>> Has this "performance.write-behind: off" setting continued to be all you >>> needed to workaround the issue? 
>>> >>> >>> >>> Thanks, >>> >>> >>> >>> Brandon >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From revirii at googlemail.com Wed Mar 20 08:39:42 2019 From: revirii at googlemail.com (Hu Bert) Date: Wed, 20 Mar 2019 09:39:42 +0100 Subject: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP In-Reply-To: References: <2ed171d2-df68-ada3-e0de-53f19cb79520@redhat.com> Message-ID: Hi, i updated our live systems (debian stretch) from 5.3 -> 5.5 this morning; update went fine so far :-) However, on 3 (of 9) clients, the log entries still appear. The upgrade steps for all clients were identical: - install 5.5 (via apt upgrade) - umount volumes - mount volumes Interestingly the log entries still refer to version 5.3: [2019-03-20 08:38:31.880132] W [dict.c:761:dict_ref] (-->/usr/lib/x86_64-linux-gnu/glusterfs/5.3/xlator/performance/quick-read.so(+0x6df4) [0x7f35f214ddf4] -->/usr/lib/x86_64-linux-gnu/glusterfs/5.3/xlator/performance/io-cache.so(+0xa39d) [0x7f35f235f39d] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_ref+0x58) [0x7f35f9403a38] ) 11-dict: dict is NULL [Invalid argument] First i thought there could be old processes running/hanging on these 3 clients, but I see that there are 4 processes (for 2 volumes) running on all clients: root 11234 0.0 0.2 1858720 580964 ? 
Ssl Mar11 7:23 /usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0 --lru-limit=0 --process-name fuse --volfile-server=gluster1 --volfile-id=/persistent /data/repository/shared/private root 11323 0.6 2.5 10061536 6788940 ? Ssl Mar11 77:42 /usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0 --lru-limit=0 --process-name fuse --volfile-server=gluster1 --volfile-id=/workdata /data/repository/shared/public root 11789 0.0 0.0 874116 11076 ? Ssl 07:32 0:00 /usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0 --process-name fuse --volfile-server=gluster1 --volfile-id=/persistent /data/repository/shared/private root 11881 0.0 0.0 874116 10992 ? Ssl 07:32 0:00 /usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0 --process-name fuse --volfile-server=gluster1 --volfile-id=/workdata /data/repository/shared/public The first 2 processes are for the "old" mount (with lru-limit=0), the last 2 processes are for the "new" mount. But only 3 clients still have these entries. Systems are running fine, no problems so far. Maybe wrong order of the update? If i look at https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_4.1/ - then it would be better to: unmount - upgrade - mount? Best regards, Hubert Am Di., 19. März 2019 um 15:53 Uhr schrieb Artem Russakovskii : > > The flood is indeed fixed for us on 5.5. However, the crashes are not. > > Sincerely, > Artem > > -- > Founder, Android Police, APK Mirror, Illogical Robot LLC > beerpla.net | +ArtemRussakovskii | @ArtemR > > > On Mon, Mar 18, 2019 at 5:41 AM Hu Bert wrote: >> >> Hi Amar, >> >> if you refer to this bug: >> https://bugzilla.redhat.com/show_bug.cgi?id=1674225 : in the test >> setup i haven't seen those entries, while copying & deleting a few GBs >> of data. For a final statement we have to wait until i updated our >> live gluster servers - could take place on tuesday or wednesday. >> >> Maybe other users can do an update to 5.4 as well and report back here. 
>> >> >> Hubert >> >> >> >> Am Mo., 18. M?rz 2019 um 11:36 Uhr schrieb Amar Tumballi Suryanarayan >> : >> > >> > Hi Hu Bert, >> > >> > Appreciate the feedback. Also are the other boiling issues related to logs fixed now? >> > >> > -Amar >> > >> > On Mon, Mar 18, 2019 at 3:54 PM Hu Bert wrote: >> >> >> >> update: upgrade from 5.3 -> 5.5 in a replicate 3 test setup with 2 >> >> volumes done. In 'gluster peer status' the peers stay connected during >> >> the upgrade, no 'peer rejected' messages. No cksum mismatches in the >> >> logs. Looks good :-) >> >> >> >> Am Mo., 18. M?rz 2019 um 09:54 Uhr schrieb Hu Bert : >> >> > >> >> > Good morning :-) >> >> > >> >> > for debian the packages are there: >> >> > https://download.gluster.org/pub/gluster/glusterfs/5/5.5/Debian/stretch/amd64/apt/pool/main/g/glusterfs/ >> >> > >> >> > I'll do an upgrade of a test installation 5.3 -> 5.5 and see if there >> >> > are some errors etc. and report back. >> >> > >> >> > btw: no release notes for 5.4 and 5.5 so far? >> >> > https://docs.gluster.org/en/latest/release-notes/ ? >> >> > >> >> > Am Fr., 15. M?rz 2019 um 14:28 Uhr schrieb Shyam Ranganathan >> >> > : >> >> > > >> >> > > We created a 5.5 release tag, and it is under packaging now. It should >> >> > > be packaged and ready for testing early next week and should be released >> >> > > close to mid-week next week. >> >> > > >> >> > > Thanks, >> >> > > Shyam >> >> > > On 3/13/19 12:34 PM, Artem Russakovskii wrote: >> >> > > > Wednesday now with no update :-/ >> >> > > > >> >> > > > Sincerely, >> >> > > > Artem >> >> > > > >> >> > > > -- >> >> > > > Founder, Android Police , APK Mirror >> >> > > > , Illogical Robot LLC >> >> > > > beerpla.net | +ArtemRussakovskii >> >> > > > | @ArtemR >> >> > > > >> >> > > > >> >> > > > >> >> > > > On Tue, Mar 12, 2019 at 10:28 AM Artem Russakovskii > >> > > > > wrote: >> >> > > > >> >> > > > Hi Amar, >> >> > > > >> >> > > > Any updates on this? 
I'm still not seeing it in OpenSUSE build >> >> > > > repos. Maybe later today? >> >> > > > >> >> > > > Thanks. >> >> > > > >> >> > > > Sincerely, >> >> > > > Artem >> >> > > > >> >> > > > -- >> >> > > > Founder, Android Police , APK Mirror >> >> > > > , Illogical Robot LLC >> >> > > > beerpla.net | +ArtemRussakovskii >> >> > > > | @ArtemR >> >> > > > >> >> > > > >> >> > > > >> >> > > > On Wed, Mar 6, 2019 at 10:30 PM Amar Tumballi Suryanarayan >> >> > > > > wrote: >> >> > > > >> >> > > > We are talking days. Not weeks. Considering already it is >> >> > > > Thursday here. 1 more day for tagging, and packaging. May be ok >> >> > > > to expect it on Monday. >> >> > > > >> >> > > > -Amar >> >> > > > >> >> > > > On Thu, Mar 7, 2019 at 11:54 AM Artem Russakovskii >> >> > > > > wrote: >> >> > > > >> >> > > > Is the next release going to be an imminent hotfix, i.e. >> >> > > > something like today/tomorrow, or are we talking weeks? >> >> > > > >> >> > > > Sincerely, >> >> > > > Artem >> >> > > > >> >> > > > -- >> >> > > > Founder, Android Police , APK >> >> > > > Mirror , Illogical Robot LLC >> >> > > > beerpla.net | +ArtemRussakovskii >> >> > > > | @ArtemR >> >> > > > >> >> > > > >> >> > > > >> >> > > > On Tue, Mar 5, 2019 at 11:09 AM Artem Russakovskii >> >> > > > > wrote: >> >> > > > >> >> > > > Ended up downgrading to 5.3 just in case. Peer status >> >> > > > and volume status are OK now. >> >> > > > >> >> > > > zypper install --oldpackage glusterfs-5.3-lp150.100.1 >> >> > > > Loading repository data... >> >> > > > Reading installed packages... >> >> > > > Resolving package dependencies... 
>> >> > > > >> >> > > > Problem: glusterfs-5.3-lp150.100.1.x86_64 requires >> >> > > > libgfapi0 = 5.3, but this requirement cannot be provided >> >> > > > not installable providers: >> >> > > > libgfapi0-5.3-lp150.100.1.x86_64[glusterfs] >> >> > > > Solution 1: Following actions will be done: >> >> > > > downgrade of libgfapi0-5.4-lp150.100.1.x86_64 to >> >> > > > libgfapi0-5.3-lp150.100.1.x86_64 >> >> > > > downgrade of libgfchangelog0-5.4-lp150.100.1.x86_64 to >> >> > > > libgfchangelog0-5.3-lp150.100.1.x86_64 >> >> > > > downgrade of libgfrpc0-5.4-lp150.100.1.x86_64 to >> >> > > > libgfrpc0-5.3-lp150.100.1.x86_64 >> >> > > > downgrade of libgfxdr0-5.4-lp150.100.1.x86_64 to >> >> > > > libgfxdr0-5.3-lp150.100.1.x86_64 >> >> > > > downgrade of libglusterfs0-5.4-lp150.100.1.x86_64 to >> >> > > > libglusterfs0-5.3-lp150.100.1.x86_64 >> >> > > > Solution 2: do not install glusterfs-5.3-lp150.100.1.x86_64 >> >> > > > Solution 3: break glusterfs-5.3-lp150.100.1.x86_64 by >> >> > > > ignoring some of its dependencies >> >> > > > >> >> > > > Choose from above solutions by number or cancel >> >> > > > [1/2/3/c] (c): 1 >> >> > > > Resolving dependencies... >> >> > > > Resolving package dependencies... >> >> > > > >> >> > > > The following 6 packages are going to be downgraded: >> >> > > > glusterfs libgfapi0 libgfchangelog0 libgfrpc0 >> >> > > > libgfxdr0 libglusterfs0 >> >> > > > >> >> > > > 6 packages to downgrade. >> >> > > > >> >> > > > Sincerely, >> >> > > > Artem >> >> > > > >> >> > > > -- >> >> > > > Founder, Android Police >> >> > > > , APK Mirror >> >> > > > , Illogical Robot LLC >> >> > > > beerpla.net | +ArtemRussakovskii >> >> > > > | @ArtemR >> >> > > > >> >> > > > >> >> > > > >> >> > > > On Tue, Mar 5, 2019 at 10:57 AM Artem Russakovskii >> >> > > > > wrote: >> >> > > > >> >> > > > Noticed the same when upgrading from 5.3 to 5.4, as >> >> > > > mentioned. >> >> > > > >> >> > > > I'm confused though. 
Is actual replication affected, >> >> > > > because the 5.4 server and the 3x 5.3 servers still >> >> > > > show heal info as all 4 connected, and the files >> >> > > > seem to be replicating correctly as well. >> >> > > > >> >> > > > So what's actually affected - just the status >> >> > > > command, or leaving 5.4 on one of the nodes is doing >> >> > > > some damage to the underlying fs? Is it fixable by >> >> > > > tweaking transport.socket.ssl-enabled? Does >> >> > > > upgrading all servers to 5.4 resolve it, or should >> >> > > > we revert back to 5.3? >> >> > > > >> >> > > > Sincerely, >> >> > > > Artem >> >> > > > >> >> > > > -- >> >> > > > Founder, Android Police >> >> > > > , APK Mirror >> >> > > > , Illogical Robot LLC >> >> > > > beerpla.net | >> >> > > > +ArtemRussakovskii >> >> > > > >> >> > > > | @ArtemR >> >> > > > >> >> > > > >> >> > > > On Tue, Mar 5, 2019 at 2:02 AM Hu Bert >> >> > > > > >> > > > > wrote: >> >> > > > >> >> > > > fyi: did a downgrade 5.4 -> 5.3 and it worked. >> >> > > > all replicas are up and >> >> > > > running. Awaiting updated v5.4. >> >> > > > >> >> > > > thx :-) >> >> > > > >> >> > > > Am Di., 5. M?rz 2019 um 09:26 Uhr schrieb Hari >> >> > > > Gowtham > >> > > > >: >> >> > > > > >> >> > > > > There are plans to revert the patch causing >> >> > > > this error and rebuilt 5.4. >> >> > > > > This should happen faster. the rebuilt 5.4 >> >> > > > should be void of this upgrade issue. >> >> > > > > >> >> > > > > In the meantime, you can use 5.3 for this cluster. >> >> > > > > Downgrading to 5.3 will work if it was just >> >> > > > one node that was upgrade to 5.4 >> >> > > > > and the other nodes are still in 5.3. >> >> > > > > >> >> > > > > On Tue, Mar 5, 2019 at 1:07 PM Hu Bert >> >> > > > > >> > > > > wrote: >> >> > > > > > >> >> > > > > > Hi Hari, >> >> > > > > > >> >> > > > > > thx for the hint. Do you know when this will >> >> > > > be fixed? Is a downgrade >> >> > > > > > 5.4 -> 5.3 a possibility to fix this? 
>> >> > > > > > >> >> > > > > > Hubert >> >> > > > > > >> >> > > > > > Am Di., 5. M?rz 2019 um 08:32 Uhr schrieb >> >> > > > Hari Gowtham > >> > > > >: >> >> > > > > > > >> >> > > > > > > Hi, >> >> > > > > > > >> >> > > > > > > This is a known issue we are working on. >> >> > > > > > > As the checksum differs between the >> >> > > > updated and non updated node, the >> >> > > > > > > peers are getting rejected. >> >> > > > > > > The bricks aren't coming because of the >> >> > > > same issue. >> >> > > > > > > >> >> > > > > > > More about the issue: >> >> > > > https://bugzilla.redhat.com/show_bug.cgi?id=1685120 >> >> > > > > > > >> >> > > > > > > On Tue, Mar 5, 2019 at 12:56 PM Hu Bert >> >> > > > > >> > > > > wrote: >> >> > > > > > > > >> >> > > > > > > > Interestingly: gluster volume status >> >> > > > misses gluster1, while heal >> >> > > > > > > > statistics show gluster1: >> >> > > > > > > > >> >> > > > > > > > gluster volume status workdata >> >> > > > > > > > Status of volume: workdata >> >> > > > > > > > Gluster process >> >> > > > TCP Port RDMA Port Online Pid >> >> > > > > > > > >> >> > > > ------------------------------------------------------------------------------ >> >> > > > > > > > Brick gluster2:/gluster/md4/workdata >> >> > > > 49153 0 Y 1723 >> >> > > > > > > > Brick gluster3:/gluster/md4/workdata >> >> > > > 49153 0 Y 2068 >> >> > > > > > > > Self-heal Daemon on localhost >> >> > > > N/A N/A Y 1732 >> >> > > > > > > > Self-heal Daemon on gluster3 >> >> > > > N/A N/A Y 2077 >> >> > > > > > > > >> >> > > > > > > > vs. 
>> >> > > > > > > > >> >> > > > > > > > gluster volume heal workdata statistics >> >> > > > heal-count >> >> > > > > > > > Gathering count of entries to be healed >> >> > > > on volume workdata has been successful >> >> > > > > > > > >> >> > > > > > > > Brick gluster1:/gluster/md4/workdata >> >> > > > > > > > Number of entries: 0 >> >> > > > > > > > >> >> > > > > > > > Brick gluster2:/gluster/md4/workdata >> >> > > > > > > > Number of entries: 10745 >> >> > > > > > > > >> >> > > > > > > > Brick gluster3:/gluster/md4/workdata >> >> > > > > > > > Number of entries: 10744 >> >> > > > > > > > >> >> > > > > > > > Am Di., 5. M?rz 2019 um 08:18 Uhr >> >> > > > schrieb Hu Bert > >> > > > >: >> >> > > > > > > > > >> >> > > > > > > > > Hi Miling, >> >> > > > > > > > > >> >> > > > > > > > > well, there are such entries, but >> >> > > > those haven't been a problem during >> >> > > > > > > > > install and the last kernel >> >> > > > update+reboot. The entries look like: >> >> > > > > > > > > >> >> > > > > > > > > PUBLIC_IP gluster2.alpserver.de >> >> > > > gluster2 >> >> > > > > > > > > >> >> > > > > > > > > 192.168.0.50 gluster1 >> >> > > > > > > > > 192.168.0.51 gluster2 >> >> > > > > > > > > 192.168.0.52 gluster3 >> >> > > > > > > > > >> >> > > > > > > > > 'ping gluster2' resolves to LAN IP; I >> >> > > > removed the last entry in the >> >> > > > > > > > > 1st line, did a reboot ... no, didn't >> >> > > > help. From >> >> > > > > > > > > /var/log/glusterfs/glusterd.log >> >> > > > > > > > > on gluster 2: >> >> > > > > > > > > >> >> > > > > > > > > [2019-03-05 07:04:36.188128] E [MSGID: >> >> > > > 106010] >> >> > > > > > > > > >> >> > > > [glusterd-utils.c:3483:glusterd_compare_friend_volume] >> >> > > > 0-management: >> >> > > > > > > > > Version of Cksums persistent differ. 
>> >> > > > local cksum = 3950307018, remote >> >> > > > > > > > > cksum = 455409345 on peer gluster1 >> >> > > > > > > > > [2019-03-05 07:04:36.188314] I [MSGID: >> >> > > > 106493] >> >> > > > > > > > > >> >> > > > [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] >> >> > > > 0-glusterd: >> >> > > > > > > > > Responded to gluster1 (0), ret: 0, >> >> > > > op_ret: -1 >> >> > > > > > > > > >> >> > > > > > > > > Interestingly there are no entries in >> >> > > > the brick logs of the rejected >> >> > > > > > > > > server. Well, not surprising as no >> >> > > > brick process is running. The >> >> > > > > > > > > server gluster1 is still in rejected >> >> > > > state. >> >> > > > > > > > > >> >> > > > > > > > > 'gluster volume start workdata force' >> >> > > > starts the brick process on >> >> > > > > > > > > gluster1, and some heals are happening >> >> > > > on gluster2+3, but via 'gluster >> >> > > > > > > > > volume status workdata' the volumes >> >> > > > still aren't complete. >> >> > > > > > > > > >> >> > > > > > > > > gluster1: >> >> > > > > > > > > >> >> > > > ------------------------------------------------------------------------------ >> >> > > > > > > > > Brick gluster1:/gluster/md4/workdata >> >> > > > 49152 0 Y 2523 >> >> > > > > > > > > Self-heal Daemon on localhost >> >> > > > N/A N/A Y 2549 >> >> > > > > > > > > >> >> > > > > > > > > gluster2: >> >> > > > > > > > > Gluster process >> >> > > > TCP Port RDMA Port Online Pid >> >> > > > > > > > > >> >> > > > ------------------------------------------------------------------------------ >> >> > > > > > > > > Brick gluster2:/gluster/md4/workdata >> >> > > > 49153 0 Y 1723 >> >> > > > > > > > > Brick gluster3:/gluster/md4/workdata >> >> > > > 49153 0 Y 2068 >> >> > > > > > > > > Self-heal Daemon on localhost >> >> > > > N/A N/A Y 1732 >> >> > > > > > > > > Self-heal Daemon on gluster3 >> >> > > > N/A N/A Y 2077 >> >> > > > > > > > > >> >> > > > > > > > > >> >> > > > > > > > > Hubert >> >> > > > > > > > 
> >> >> > > > > > > > > Am Di., 5. M?rz 2019 um 07:58 Uhr >> >> > > > schrieb Milind Changire > >> > > > >: >> >> > > > > > > > > > >> >> > > > > > > > > > There are probably DNS entries or >> >> > > > /etc/hosts entries with the public IP Addresses >> >> > > > that the host names (gluster1, gluster2, >> >> > > > gluster3) are getting resolved to. >> >> > > > > > > > > > /etc/resolv.conf would tell which is >> >> > > > the default domain searched for the node names >> >> > > > and the DNS servers which respond to the queries. >> >> > > > > > > > > > >> >> > > > > > > > > > >> >> > > > > > > > > > On Tue, Mar 5, 2019 at 12:14 PM Hu >> >> > > > Bert > >> > > > > wrote: >> >> > > > > > > > > >> >> >> > > > > > > > > >> Good morning, >> >> > > > > > > > > >> >> >> > > > > > > > > >> i have a replicate 3 setup with 2 >> >> > > > volumes, running on version 5.3 on >> >> > > > > > > > > >> debian stretch. This morning i >> >> > > > upgraded one server to version 5.4 and >> >> > > > > > > > > >> rebooted the machine; after the >> >> > > > restart i noticed that: >> >> > > > > > > > > >> >> >> > > > > > > > > >> - no brick process is running >> >> > > > > > > > > >> - gluster volume status only shows >> >> > > > the server itself: >> >> > > > > > > > > >> gluster volume status workdata >> >> > > > > > > > > >> Status of volume: workdata >> >> > > > > > > > > >> Gluster process >> >> > > > TCP Port RDMA Port Online Pid >> >> > > > > > > > > >> >> >> > > > ------------------------------------------------------------------------------ >> >> > > > > > > > > >> Brick >> >> > > > gluster1:/gluster/md4/workdata N/A >> >> > > > N/A N N/A >> >> > > > > > > > > >> NFS Server on localhost >> >> > > > N/A N/A N N/A >> >> > > > > > > > > >> >> >> > > > > > > > > >> - gluster peer status on the server >> >> > > > > > > > > >> gluster peer status >> >> > > > > > > > > >> Number of Peers: 2 >> >> > > > > > > > > >> >> >> > > > > > > > > >> Hostname: gluster3 >> >> > > > > > > > > >> Uuid: 
>> >> > > > c7b4a448-ca6a-4051-877f-788f9ee9bc4a >> >> > > > > > > > > >> State: Peer Rejected (Connected) >> >> > > > > > > > > >> >> >> > > > > > > > > >> Hostname: gluster2 >> >> > > > > > > > > >> Uuid: >> >> > > > 162fea82-406a-4f51-81a3-e90235d8da27 >> >> > > > > > > > > >> State: Peer Rejected (Connected) >> >> > > > > > > > > >> >> >> > > > > > > > > >> - gluster peer status on the other >> >> > > > 2 servers: >> >> > > > > > > > > >> gluster peer status >> >> > > > > > > > > >> Number of Peers: 2 >> >> > > > > > > > > >> >> >> > > > > > > > > >> Hostname: gluster1 >> >> > > > > > > > > >> Uuid: >> >> > > > 9a360776-7b58-49ae-831e-a0ce4e4afbef >> >> > > > > > > > > >> State: Peer Rejected (Connected) >> >> > > > > > > > > >> >> >> > > > > > > > > >> Hostname: gluster3 >> >> > > > > > > > > >> Uuid: >> >> > > > c7b4a448-ca6a-4051-877f-788f9ee9bc4a >> >> > > > > > > > > >> State: Peer in Cluster (Connected) >> >> > > > > > > > > >> >> >> > > > > > > > > >> I noticed that, in the brick logs, >> >> > > > i see that the public IP is used >> >> > > > > > > > > >> instead of the LAN IP. brick logs >> >> > > > from one of the volumes: >> >> > > > > > > > > >> >> >> > > > > > > > > >> rejected node: >> >> > > > https://pastebin.com/qkpj10Sd >> >> > > > > > > > > >> connected nodes: >> >> > > > https://pastebin.com/8SxVVYFV >> >> > > > > > > > > >> >> >> > > > > > > > > >> Why is the public IP suddenly used >> >> > > > instead of the LAN IP? Killing all >> >> > > > > > > > > >> gluster processes and rebooting >> >> > > > (again) didn't help. 
>> >> > > > > > > > > >> >> >> > > > > > > > > >> >> >> > > > > > > > > >> Thx, >> >> > > > > > > > > >> Hubert >> >> > > > > > > > > >> >> >> > > > _______________________________________________ >> >> > > > > > > > > >> Gluster-users mailing list >> >> > > > > > > > > >> Gluster-users at gluster.org >> >> > > > >> >> > > > > > > > > >> >> >> > > > https://lists.gluster.org/mailman/listinfo/gluster-users >> >> > > > > > > > > > >> >> > > > > > > > > > >> >> > > > > > > > > > >> >> > > > > > > > > > -- >> >> > > > > > > > > > Milind >> >> > > > > > > > > > >> >> > > > > > > > >> >> > > > _______________________________________________ >> >> > > > > > > > Gluster-users mailing list >> >> > > > > > > > Gluster-users at gluster.org >> >> > > > >> >> > > > > > > > >> >> > > > https://lists.gluster.org/mailman/listinfo/gluster-users >> >> > > > > > > >> >> > > > > > > >> >> > > > > > > >> >> > > > > > > -- >> >> > > > > > > Regards, >> >> > > > > > > Hari Gowtham. >> >> > > > > >> >> > > > > >> >> > > > > >> >> > > > > -- >> >> > > > > Regards, >> >> > > > > Hari Gowtham. 
>> >> > > > _______________________________________________ >> >> > > > Gluster-users mailing list >> >> > > > Gluster-users at gluster.org >> >> > > > >> >> > > > https://lists.gluster.org/mailman/listinfo/gluster-users >> >> > > > >> >> > > > _______________________________________________ >> >> > > > Gluster-users mailing list >> >> > > > Gluster-users at gluster.org >> >> > > > https://lists.gluster.org/mailman/listinfo/gluster-users >> >> > > > >> >> > > > >> >> > > > >> >> > > > -- >> >> > > > Amar Tumballi (amarts) >> >> > > > >> >> > > > >> >> > > > _______________________________________________ >> >> > > > Gluster-users mailing list >> >> > > > Gluster-users at gluster.org >> >> > > > https://lists.gluster.org/mailman/listinfo/gluster-users >> >> > > > >> >> > > _______________________________________________ >> >> > > Gluster-users mailing list >> >> > > Gluster-users at gluster.org >> >> > > https://lists.gluster.org/mailman/listinfo/gluster-users >> > >> > >> > >> > -- >> > Amar Tumballi (amarts) From ville-pekka.vainio at csc.fi Wed Mar 20 11:07:27 2019 From: ville-pekka.vainio at csc.fi (Ville-Pekka Vainio) Date: Wed, 20 Mar 2019 13:07:27 +0200 Subject: [Gluster-users] Constant fuse client crashes "fixed" by setting performance.write-behind: off. Any hope for a 4.1.8 release? In-Reply-To: <153001d4ddc1$80a1eff0$81e5cfd0$@thinkhuge.net> References: <153001d4ddc1$80a1eff0$81e5cfd0$@thinkhuge.net> Message-ID: <62B2AF20-811A-445E-93D8-E3810820600E@csc.fi> > On 18 Mar 2019, at 21.33, brandon at thinkhuge.net wrote: > > Has this "performance.write-behind: off" setting continued to be all you needed to workaround the issue? For us, yes. Before using that setting we mainly saw the crashes when trying to do directory listings of huge directories (with 7 000 to 10 000 sub-directories). A question to the gluster developers: According to https://www.gluster.org/release-schedule/ the 4.1 branch should still be maintained. 
However, a release with these fixes has not been made. Will there be a release at some point or should we start preparing for an upgrade to gluster 5.x? Regards, Ville-Pekka Vainio

From schandinp at gmail.com Wed Mar 20 14:38:15 2019
From: schandinp at gmail.com (Pablo Schandin)
Date: Wed, 20 Mar 2019 11:38:15 -0300
Subject: [Gluster-users] / - is in split-brain
In-Reply-To: References: Message-ID:

Here is the output

root at gluster-gu1:~# gluster volume info gv1
> Volume Name: gv1
> Type: Replicate
> Volume ID: 3bb5023c-93bb-433e-8b95-56cfca82b68a
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: gluster-gu1.xcade.net:/mnt/gv_gu1/brick
> Brick2: gluster-gu2.xcade.net:/mnt/gv_gu1/brick
> Options Reconfigured:
> performance.client-io-threads: off
> transport.address-family: inet
> nfs.disable: on
> performance.readdir-ahead: on
> diagnostics.brick-log-level: WARNING
> diagnostics.client-log-level: WARNING

El mié., 20 mar. 2019 a las 0:16, Nithya Balachandran () escribió:
> Hi,
>
> What is the output of the gluster volume info?
>
> Thanks,
> Nithya
>
> On Wed, 20 Mar 2019 at 01:58, Pablo Schandin wrote:
>
>> Hello all!
>>
>> I had a volume with only a local brick running vms and recently added a second (remote) brick to the volume. After adding the brick, the heal command reported the following:
>>
>> root at gluster-gu1:~# gluster volume heal gv1 info
>>> Brick gluster-gu1:/mnt/gv_gu1/brick
>>> / - Is in split-brain
>>> Status: Connected
>>> Number of entries: 1
>>> Brick gluster-gu2:/mnt/gv_gu1/brick
>>> Status: Connected
>>> Number of entries: 0
>>
>> All other files healed correctly. I noticed that in the xfs of the brick I see a directory named localadmin, but when I ls the gluster volume mountpoint I got an error and a lot of ???
>>
>> root at gluster-gu1:/var/lib/vmImages_gu1# ll
>>> ls: cannot access 'localadmin': No data available
>>> d????????? ? ? ? ? ?
localadmin/
>>
>> This goes for both servers that have that volume gv1 mounted. Both see that directory like that. While in the xfs brick /mnt/gv_gu1/brick/localadmin is an accessible directory.
>>
>> root at gluster-gu1:/mnt/gv_gu1/brick/localadmin# ll
>>> total 4
>>> drwxr-xr-x 2 localadmin root 6 Mar 7 09:40 ./
>>> drwxr-xr-x 6 root root 4096 Mar 7 09:40 ../
>>
>> When I added the second brick to the volume, this localadmin folder was not replicated there, I imagine because of this strange behavior.
>>
>> Can someone help me with this?
>> Thanks!
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mauryam at gmail.com Wed Mar 20 17:17:13 2019
From: mauryam at gmail.com (Maurya M)
Date: Wed, 20 Mar 2019 22:47:13 +0530
Subject: [Gluster-users] Geo-replication status always on 'Created'
Message-ID:

Hi all,
I have set up 3 master nodes - 3 slave nodes (gluster 4.1) for geo-replication, but once the geo-replication is configured the status is always on 'Created', even after a force start of the session.

On close inspection of the logs on the master node I am seeing this error:

"E [syncdutils(monitor):801:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 azureuser at xxxxx.xxxx..xxx. gluster --xml --remote-host=localhost volume info vol_a5ae34341a873c043c99a938adcb5b5781 error=255"

Any ideas what the issue is?

thanks,
Maurya

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jim.kinney at gmail.com Wed Mar 20 17:27:50 2019
From: jim.kinney at gmail.com (Jim Kinney)
Date: Wed, 20 Mar 2019 13:27:50 -0400
Subject: [Gluster-users] .glusterfs GFID links
Message-ID:

I have half a zillion broken symlinks in the .glusterfs folder on 3 of 11 volumes. It doesn't make sense to me that a GFID should link like some of the ones below:

/data/glusterfs/home/brick/brick/.glusterfs/9e/75/9e75a16f-fe4f-411e-937d-1a6c4758fd0e -> ../../c7/6f/c76ff719-dde6-41f5-a327-7e13fdf6ec4b/bundle2
/data/glusterfs/home/brick/brick/.glusterfs/9e/75/9e7594de-c68a-44d0-a959-46ceb628c28b -> SSL_CTX_set0_CA_list.html
/data/glusterfs/home/brick/brick/.glusterfs/9e/75/9e75a40b-990f-4dd5-8aaa-9996da0fbdf4 -> ../../46/93/4693face-affb-4647-bcd0-919bccc82c42/labeled_tensor
/data/glusterfs/home/brick/brick/.glusterfs/9e/75/9e75eb33-2941-461f-aa50-394a8e9cbba1 -> libtiff.so.5.2.6

The links are pointing to file names in the .glusterfs directories. Shouldn't all of these be symlinks to other GFID entries and not contain text like "bundle2" and "labeled_tensor"?

--
James P. Kinney III
Every time you stop a school, you will have to build a jail. What you gain at one end you lose at the other. It's like feeding a dog on his own tail. It won't fatten the dog. - Speech 11/23/1900 Mark Twain
http://heretothereideas.blogspot.com/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sunkumar at redhat.com Wed Mar 20 17:37:00 2019
From: sunkumar at redhat.com (Sunny Kumar)
Date: Wed, 20 Mar 2019 23:07:00 +0530
Subject: [Gluster-users] Geo-replication status always on 'Created'
In-Reply-To: References: Message-ID:

Hi Maurya,

I guess you missed the last trick to distribute keys on the slave node. I see this is a non-root geo-rep setup, so please try this:

Run the following command as root on any one of the slave nodes.
/usr/local/libexec/glusterfs/set_geo_rep_pem_keys.sh - Sunny On Wed, Mar 20, 2019 at 10:47 PM Maurya M wrote: > > Hi all, > Have setup a 3 master nodes - 3 slave nodes (gluster 4.1) for geo-replication, but once have the geo-replication configure the status is always on "Created', > even after have force start the session. > > On close inspect of the logs on the master node seeing this error: > > "E [syncdutils(monitor):801:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 azureuser at xxxxx.xxxx..xxx. gluster --xml --remote-host=localhost volume info vol_a5ae34341a873c043c99a938adcb5b5781 error=255" > > Any ideas what is issue? > > thanks, > Maurya > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From archon810 at gmail.com Wed Mar 20 19:57:59 2019 From: archon810 at gmail.com (Artem Russakovskii) Date: Wed, 20 Mar 2019 12:57:59 -0700 Subject: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP In-Reply-To: References: <2ed171d2-df68-ada3-e0de-53f19cb79520@redhat.com> Message-ID: Amar, I see debuginfo packages now and have installed them. I'm available via Skype as before, just ping me there. Sincerely, Artem -- Founder, Android Police , APK Mirror , Illogical Robot LLC beerpla.net | +ArtemRussakovskii | @ArtemR On Tue, Mar 19, 2019 at 10:46 PM Amar Tumballi Suryanarayan < atumball at redhat.com> wrote: > > > On Wed, Mar 20, 2019 at 9:52 AM Artem Russakovskii > wrote: > >> Can I roll back performance.write-behind: off and lru-limit=0 then? I'm >> waiting for the debug packages to be available for OpenSUSE, then I can >> help Amar with another debug session. >> >> > Yes, the write-behind issue is now fixed. You can enable write-behind. 
> Also remove lru-limit=0, so you can also utilize the benefit of garbage > collection introduced in 5.4 > > Lets get to fixing the problem once the debuginfo packages are available. > > > >> In the meantime, have you had time to set up 1x4 replicate testing? I was >> told you were only testing 1x3, and it's the 4th brick that may be causing >> the crash, which is consistent with this whole time only 1 of 4 bricks >> constantly crashing. The other 3 have been rock solid. I'm hoping you >> could >> find the issue without a debug session this way. >> >> > That is my gut feeling still. Added a basic test case with 4 bricks, > https://review.gluster.org/#/c/glusterfs/+/22328/. But I think this > particular issue is happening only on certain pattern of access for 1x4 > setup. Lets get to the root of it once we have debuginfo packages for Suse > builds. > > -Amar > > Sincerely, >> Artem >> >> -- >> Founder, Android Police , APK Mirror >> , Illogical Robot LLC >> beerpla.net | +ArtemRussakovskii >> | @ArtemR >> >> >> >> On Tue, Mar 19, 2019 at 8:27 PM Nithya Balachandran >> wrote: >> >> > Hi Artem, >> > >> > I think you are running into a different crash. The ones reported which >> > were prevented by turning off write-behind are now fixed. >> > We will need to look into the one you are seeing to see why it is >> > happening. >> > >> > Regards, >> > Nithya >> > >> > >> > On Tue, 19 Mar 2019 at 20:25, Artem Russakovskii >> > wrote: >> > >> >> The flood is indeed fixed for us on 5.5. However, the crashes are not. 
>> >> >> >> Sincerely, >> >> Artem >> >> >> >> -- >> >> Founder, Android Police , APK Mirror >> >> , Illogical Robot LLC >> >> beerpla.net | +ArtemRussakovskii >> >> | @ArtemR >> >> >> >> >> >> >> >> On Mon, Mar 18, 2019 at 5:41 AM Hu Bert >> wrote: >> >> >> >>> Hi Amar, >> >>> >> >>> if you refer to this bug: >> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1674225 : in the test >> >>> setup i haven't seen those entries, while copying & deleting a few GBs >> >>> of data. For a final statement we have to wait until i updated our >> >>> live gluster servers - could take place on tuesday or wednesday. >> >>> >> >>> Maybe other users can do an update to 5.4 as well and report back >> here. >> >>> >> >>> >> >>> Hubert >> >>> >> >>> >> >>> >> >>> Am Mo., 18. M?rz 2019 um 11:36 Uhr schrieb Amar Tumballi Suryanarayan >> >>> : >> >>> > >> >>> > Hi Hu Bert, >> >>> > >> >>> > Appreciate the feedback. Also are the other boiling issues related >> to >> >>> logs fixed now? >> >>> > >> >>> > -Amar >> >>> > >> >>> > On Mon, Mar 18, 2019 at 3:54 PM Hu Bert >> >>> wrote: >> >>> >> >> >>> >> update: upgrade from 5.3 -> 5.5 in a replicate 3 test setup with 2 >> >>> >> volumes done. In 'gluster peer status' the peers stay connected >> during >> >>> >> the upgrade, no 'peer rejected' messages. No cksum mismatches in >> the >> >>> >> logs. Looks good :-) >> >>> >> >> >>> >> Am Mo., 18. M?rz 2019 um 09:54 Uhr schrieb Hu Bert < >> >>> revirii at googlemail.com>: >> >>> >> > >> >>> >> > Good morning :-) >> >>> >> > >> >>> >> > for debian the packages are there: >> >>> >> > >> >>> >> https://download.gluster.org/pub/gluster/glusterfs/5/5.5/Debian/stretch/amd64/apt/pool/main/g/glusterfs/ >> >>> >> > >> >>> >> > I'll do an upgrade of a test installation 5.3 -> 5.5 and see if >> >>> there >> >>> >> > are some errors etc. and report back. >> >>> >> > >> >>> >> > btw: no release notes for 5.4 and 5.5 so far? >> >>> >> > https://docs.gluster.org/en/latest/release-notes/ ? 
>> >>> >> > >> >>> >> > Am Fr., 15. M?rz 2019 um 14:28 Uhr schrieb Shyam Ranganathan >> >>> >> > : >> >>> >> > > >> >>> >> > > We created a 5.5 release tag, and it is under packaging now. It >> >>> should >> >>> >> > > be packaged and ready for testing early next week and should be >> >>> released >> >>> >> > > close to mid-week next week. >> >>> >> > > >> >>> >> > > Thanks, >> >>> >> > > Shyam >> >>> >> > > On 3/13/19 12:34 PM, Artem Russakovskii wrote: >> >>> >> > > > Wednesday now with no update :-/ >> >>> >> > > > >> >>> >> > > > Sincerely, >> >>> >> > > > Artem >> >>> >> > > > >> >>> >> > > > -- >> >>> >> > > > Founder, Android Police , APK >> >>> Mirror >> >>> >> > > > , Illogical Robot LLC >> >>> >> > > > beerpla.net | +ArtemRussakovskii >> >>> >> > > > | @ArtemR >> >>> >> > > > >> >>> >> > > > >> >>> >> > > > >> >>> >> > > > On Tue, Mar 12, 2019 at 10:28 AM Artem Russakovskii < >> >>> archon810 at gmail.com >> >>> >> > > > > wrote: >> >>> >> > > > >> >>> >> > > > Hi Amar, >> >>> >> > > > >> >>> >> > > > Any updates on this? I'm still not seeing it in OpenSUSE >> >>> build >> >>> >> > > > repos. Maybe later today? >> >>> >> > > > >> >>> >> > > > Thanks. >> >>> >> > > > >> >>> >> > > > Sincerely, >> >>> >> > > > Artem >> >>> >> > > > >> >>> >> > > > -- >> >>> >> > > > Founder, Android Police , >> >>> APK Mirror >> >>> >> > > > , Illogical Robot LLC >> >>> >> > > > beerpla.net | +ArtemRussakovskii >> >>> >> > > > | @ArtemR >> >>> >> > > > >> >>> >> > > > >> >>> >> > > > >> >>> >> > > > On Wed, Mar 6, 2019 at 10:30 PM Amar Tumballi >> Suryanarayan >> >>> >> > > > > >> wrote: >> >>> >> > > > >> >>> >> > > > We are talking days. Not weeks. Considering already >> it >> >>> is >> >>> >> > > > Thursday here. 1 more day for tagging, and packaging. >> >>> May be ok >> >>> >> > > > to expect it on Monday. 
>> >>> >> > > > >> >>> >> > > > -Amar >> >>> >> > > > >> >>> >> > > > On Thu, Mar 7, 2019 at 11:54 AM Artem Russakovskii >> >>> >> > > > > >> >>> wrote: >> >>> >> > > > >> >>> >> > > > Is the next release going to be an imminent >> hotfix, >> >>> i.e. >> >>> >> > > > something like today/tomorrow, or are we talking >> >>> weeks? >> >>> >> > > > >> >>> >> > > > Sincerely, >> >>> >> > > > Artem >> >>> >> > > > >> >>> >> > > > -- >> >>> >> > > > Founder, Android Police < >> >>> http://www.androidpolice.com>, APK >> >>> >> > > > Mirror , Illogical >> >>> Robot LLC >> >>> >> > > > beerpla.net | >> >>> +ArtemRussakovskii >> >>> >> > > > | >> >>> @ArtemR >> >>> >> > > > >> >>> >> > > > >> >>> >> > > > >> >>> >> > > > On Tue, Mar 5, 2019 at 11:09 AM Artem >> Russakovskii >> >>> >> > > > > >> >> >>> wrote: >> >>> >> > > > >> >>> >> > > > Ended up downgrading to 5.3 just in case. >> Peer >> >>> status >> >>> >> > > > and volume status are OK now. >> >>> >> > > > >> >>> >> > > > zypper install --oldpackage >> >>> glusterfs-5.3-lp150.100.1 >> >>> >> > > > Loading repository data... >> >>> >> > > > Reading installed packages... >> >>> >> > > > Resolving package dependencies... 
>> >>> >> > > > >> >>> >> > > > Problem: glusterfs-5.3-lp150.100.1.x86_64 >> >>> requires >> >>> >> > > > libgfapi0 = 5.3, but this requirement cannot >> be >> >>> provided >> >>> >> > > > not installable providers: >> >>> >> > > > libgfapi0-5.3-lp150.100.1.x86_64[glusterfs] >> >>> >> > > > Solution 1: Following actions will be done: >> >>> >> > > > downgrade of >> libgfapi0-5.4-lp150.100.1.x86_64 >> >>> to >> >>> >> > > > libgfapi0-5.3-lp150.100.1.x86_64 >> >>> >> > > > downgrade of >> >>> libgfchangelog0-5.4-lp150.100.1.x86_64 to >> >>> >> > > > libgfchangelog0-5.3-lp150.100.1.x86_64 >> >>> >> > > > downgrade of >> libgfrpc0-5.4-lp150.100.1.x86_64 >> >>> to >> >>> >> > > > libgfrpc0-5.3-lp150.100.1.x86_64 >> >>> >> > > > downgrade of >> libgfxdr0-5.4-lp150.100.1.x86_64 >> >>> to >> >>> >> > > > libgfxdr0-5.3-lp150.100.1.x86_64 >> >>> >> > > > downgrade of >> >>> libglusterfs0-5.4-lp150.100.1.x86_64 to >> >>> >> > > > libglusterfs0-5.3-lp150.100.1.x86_64 >> >>> >> > > > Solution 2: do not install >> >>> glusterfs-5.3-lp150.100.1.x86_64 >> >>> >> > > > Solution 3: break >> >>> glusterfs-5.3-lp150.100.1.x86_64 by >> >>> >> > > > ignoring some of its dependencies >> >>> >> > > > >> >>> >> > > > Choose from above solutions by number or >> cancel >> >>> >> > > > [1/2/3/c] (c): 1 >> >>> >> > > > Resolving dependencies... >> >>> >> > > > Resolving package dependencies... >> >>> >> > > > >> >>> >> > > > The following 6 packages are going to be >> >>> downgraded: >> >>> >> > > > glusterfs libgfapi0 libgfchangelog0 >> libgfrpc0 >> >>> >> > > > libgfxdr0 libglusterfs0 >> >>> >> > > > >> >>> >> > > > 6 packages to downgrade. 
>> >>> >> > > > >> >>> >> > > > Sincerely, >> >>> >> > > > Artem >> >>> >> > > > >> >>> >> > > > -- >> >>> >> > > > Founder, Android Police >> >>> >> > > > , APK Mirror >> >>> >> > > > , Illogical Robot >> >>> LLC >> >>> >> > > > beerpla.net | >> >>> +ArtemRussakovskii >> >>> >> > > > >> | >> >>> @ArtemR >> >>> >> > > > >> >>> >> > > > >> >>> >> > > > >> >>> >> > > > On Tue, Mar 5, 2019 at 10:57 AM Artem >> >>> Russakovskii >> >>> >> > > > > >>> archon810 at gmail.com>> wrote: >> >>> >> > > > >> >>> >> > > > Noticed the same when upgrading from 5.3 >> to >> >>> 5.4, as >> >>> >> > > > mentioned. >> >>> >> > > > >> >>> >> > > > I'm confused though. Is actual >> replication >> >>> affected, >> >>> >> > > > because the 5.4 server and the 3x 5.3 >> >>> servers still >> >>> >> > > > show heal info as all 4 connected, and >> the >> >>> files >> >>> >> > > > seem to be replicating correctly as well. >> >>> >> > > > >> >>> >> > > > So what's actually affected - just the >> >>> status >> >>> >> > > > command, or leaving 5.4 on one of the >> nodes >> >>> is doing >> >>> >> > > > some damage to the underlying fs? Is it >> >>> fixable by >> >>> >> > > > tweaking transport.socket.ssl-enabled? >> Does >> >>> >> > > > upgrading all servers to 5.4 resolve it, >> or >> >>> should >> >>> >> > > > we revert back to 5.3? >> >>> >> > > > >> >>> >> > > > Sincerely, >> >>> >> > > > Artem >> >>> >> > > > >> >>> >> > > > -- >> >>> >> > > > Founder, Android Police >> >>> >> > > > , APK >> Mirror >> >>> >> > > > , Illogical >> >>> Robot LLC >> >>> >> > > > beerpla.net | >> >>> >> > > > +ArtemRussakovskii >> >>> >> > > > < >> https://plus.google.com/+ArtemRussakovskii >> >>> > >> >>> >> > > > | @ArtemR >> >>> >> > > > >> >>> >> > > > >> >>> >> > > > On Tue, Mar 5, 2019 at 2:02 AM Hu Bert >> >>> >> > > > > >>> >> > > > > wrote: >> >>> >> > > > >> >>> >> > > > fyi: did a downgrade 5.4 -> 5.3 and >> it >> >>> worked. >> >>> >> > > > all replicas are up and >> >>> >> > > > running. 
Awaiting updated v5.4. >> >>> >> > > > >> >>> >> > > > thx :-) >> >>> >> > > > >> >>> >> > > > Am Di., 5. März 2019 um 09:26 Uhr >> >>> schrieb Hari >> >>> >> > > > Gowtham > >>> >> > > > >: >> >>> >> > > > > >> >>> >> > > > > There are plans to revert the patch >> >>> causing >> >>> >> > > > this error and rebuilt 5.4. >> >>> >> > > > > This should happen faster. the >> >>> rebuilt 5.4 >> >>> >> > > > should be void of this upgrade issue. >> >>> >> > > > > >> >>> >> > > > > In the meantime, you can use 5.3 >> for >> >>> this cluster. >> >>> >> > > > > Downgrading to 5.3 will work if it >> >>> was just >> >>> >> > > > one node that was upgrade to 5.4 >> >>> >> > > > > and the other nodes are still in >> 5.3 > > > > -- > Amar Tumballi (amarts) > -------------- next part -------------- An HTML attachment was scrubbed... URL:

From xiubli at redhat.com Thu Mar 21 03:29:30 2019
From: xiubli at redhat.com (Xiubo Li)
Date: Thu, 21 Mar 2019 11:29:30 +0800
Subject: [Gluster-users] Network Block device (NBD) on top of glusterfs
Message-ID:

All,

I am one of the contributors to the gluster-block [1] project, and I also contribute to the linux kernel and the open-iscsi project [2]. NBD has been around for some time, but recently the linux kernel's Network Block Device (NBD) was enhanced to work with more devices, and the option to integrate with netlink was added. So, I recently tried to provide a glusterfs client based NBD driver. Please refer to github issue #633 [3]; the good news is that I have working code, with the most basic things in place, at the nbd-runner project [4].

While this email is about announcing the project and asking for more collaboration, I would also like to discuss the placement of the project itself. Currently the nbd-runner project is expected to be shared by our friends at the Ceph project too, to provide an NBD driver for Ceph. I have personally worked closely with some of them while contributing to the open-iSCSI project, and we would like to take this project to great success.
Now, a few questions:

1. Can I continue to use http://github.com/gluster/nbd-runner as home for this project, even if it's shared by other filesystem projects?
   * I personally am fine with this.
2. Should there be a separate organization for this repo?
   * While it may make sense in the future, for now I am not planning to start any new thing.

It would be great if we have some consensus on this soon, as nbd-runner is a new repository. If there are no concerns, I will continue to contribute to the existing repository.

Regards,
Xiubo Li (@lxbsz)

[1] - https://github.com/gluster/gluster-block
[2] - https://github.com/open-iscsi
[3] - https://github.com/gluster/glusterfs/issues/633
[4] - https://github.com/gluster/nbd-runner

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From nbalacha at redhat.com Thu Mar 21 03:43:09 2019
From: nbalacha at redhat.com (Nithya Balachandran)
Date: Thu, 21 Mar 2019 09:13:09 +0530
Subject: [Gluster-users] .glusterfs GFID links
In-Reply-To: References: Message-ID:

On Wed, 20 Mar 2019 at 22:59, Jim Kinney wrote:

> I have half a zillion broken symlinks in the .glusterfs folder on 3 of 11
> volumes. It doesn't make sense to me that a GFID should link like some of
> the ones below:
>
> /data/glusterfs/home/brick/brick/.glusterfs/9e/75/9e75a16f-fe4f-411e-937d-1a6c4758fd0e
> -> ../../c7/6f/c76ff719-dde6-41f5-a327-7e13fdf6ec4b/bundle2
> /data/glusterfs/home/brick/brick/.glusterfs/9e/75/9e7594de-c68a-44d0-a959-46ceb628c28b
> -> SSL_CTX_set0_CA_list.html
> /data/glusterfs/home/brick/brick/.glusterfs/9e/75/9e75a40b-990f-4dd5-8aaa-9996da0fbdf4
> -> ../../46/93/4693face-affb-4647-bcd0-919bccc82c42/labeled_tensor
> /data/glusterfs/home/brick/brick/.glusterfs/9e/75/9e75eb33-2941-461f-aa50-394a8e9cbba1
> -> libtiff.so.5.2.6
>
> The links are pointing to file names in the .glusterfs directories.
> Shouldn't all of these be symlinks to other GFID entries and not contain
> text like "bundle2" and "labeled_tensor"?
>
Hi,

These bundle2 and labeled_tensor links are fine - they are gfids of directories, not files, so the symlinks created will point to the parent handle+dirname. The ones pointing to SSL_CTX_set0_CA_list.html and libtiff.so.5.2.6 look odd. They should not be symlinks. File handles should be hardlinks.

Regards,
Nithya

--
> James P. Kinney III Every time you stop a school, you will have to build a
> jail. What you gain at one end you lose at the other. It's like feeding a
> dog on his own tail. It won't fatten the dog. - Speech 11/23/1900 Mark
> Twain http://heretothereideas.blogspot.com/
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mauryam at gmail.com Thu Mar 21 05:58:13 2019
From: mauryam at gmail.com (Maurya M)
Date: Thu, 21 Mar 2019 11:28:13 +0530
Subject: [Gluster-users] Geo-replication status always on 'Created'
In-Reply-To: References: Message-ID:

Hi Sunil,

I did run this on the slave node:
/usr/libexec/glusterfs/set_geo_rep_pem_keys.sh azureuser vol_041afbc53746053368a1840607636e97 vol_a5aee81a873c043c99a938adcb5b5781

and I am getting this message: "/home/azureuser/common_secret.pem.pub not present. Please run geo-replication command on master with push-pem option to generate the file"

So I went back and created the session again, no change, so I manually copied the common_secret.pem.pub to /home/azureuser/, but set_geo_rep_pem_keys.sh is still looking for the pem file under a different name: COMMON_SECRET_PEM_PUB=${master_vol}_${slave_vol}_common_secret.pem.pub. I changed the name of the pem and ran the command again:

/usr/libexec/glusterfs/set_geo_rep_pem_keys.sh azureuser vol_041afbc53746053368a1840607636e97 vol_a5aee81a873c043c99a938adcb5b5781
Successfully copied file.
Command executed successfully.
- went back and created the session, started the geo-replication, and am still seeing the same error in the logs.

Any ideas?

thanks,
Maurya

On Wed, Mar 20, 2019 at 11:07 PM Sunny Kumar wrote:
> Hi Maurya,
>
> I guess you missed last trick to distribute keys in slave node. I see
> this is non-root geo-rep setup so please try this:
>
> Run the following command as root in any one of Slave node.
>
> /usr/local/libexec/glusterfs/set_geo_rep_pem_keys.sh
>
> - Sunny
>
> On Wed, Mar 20, 2019 at 10:47 PM Maurya M wrote:
> >
> > Hi all,
> > Have setup a 3 master nodes - 3 slave nodes (gluster 4.1) for geo-replication, but once have the geo-replication configure the status is always on "Created',
> > even after have force start the session.
> >
> > On close inspect of the logs on the master node seeing this error:
> >
> > "E [syncdutils(monitor):801:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 azureuser at xxxxx.xxxx..xxx. gluster --xml --remote-host=localhost volume info vol_a5ae34341a873c043c99a938adcb5b5781 error=255"
> >
> > Any ideas what is issue?
> >
> > thanks,
> > Maurya
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From revirii at googlemail.com Thu Mar 21 06:09:46 2019 From: revirii at googlemail.com (Hu Bert) Date: Thu, 21 Mar 2019 07:09:46 +0100 Subject: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP In-Reply-To: References: <2ed171d2-df68-ada3-e0de-53f19cb79520@redhat.com> Message-ID: Good morning, looks like on 2 clients there was an automatic cleanup: [2019-03-21 05:04:52.857127] I [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: initating unmount of /data/repository/shared/public [2019-03-21 05:04:52.857507] W [glusterfsd.c:1500:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4) [0x7fa062cf64a4] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+ 0xfd) [0x56223e5b291d] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x56223e5b2774] ) 0-: received signum (15), shutting down [2019-03-21 05:04:52.857532] I [fuse-bridge.c:5914:fini] 0-fuse: Unmounting '/data/repository/shared/public'. [2019-03-21 05:04:52.857547] I [fuse-bridge.c:5919:fini] 0-fuse: Closing fuse connection to '/data/repository/shared/public'. On the 3rd client i unmounted both volumes, killed the 4 processes and mounted the volumes again. Now no more "dict is NULL" messages. Fine :-) Best regards, Hubert Am Mi., 20. M?rz 2019 um 09:39 Uhr schrieb Hu Bert : > > Hi, > > i updated our live systems (debian stretch) from 5.3 -> 5.5 this > morning; update went fine so far :-) > > However, on 3 (of 9) clients, the log entries still appear. 
The > upgrade steps for all clients were identical: > > - install 5.5 (via apt upgrade) > - umount volumes > - mount volumes > > Interestingly the log entries still refer to version 5.3: > > [2019-03-20 08:38:31.880132] W [dict.c:761:dict_ref] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/5.3/xlator/performance/quick-read.so(+0x6df4) > [0x7f35f214ddf4] > -->/usr/lib/x86_64-linux-gnu/glusterfs/5.3/xlator/performance/io-cache.so(+0xa39d) > [0x7f35f235f39d] > -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_ref+0x58) > [0x7f35f9403a38] ) 11-dict: dict is NULL [Invalid argument] > > First i thought there could be old processes running/hanging on these > 3 clients, but I see that there are 4 processes (for 2 volumes) > running on all clients: > > root 11234 0.0 0.2 1858720 580964 ? Ssl Mar11 7:23 > /usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0 > --lru-limit=0 --process-name fuse --volfile-server=gluster1 > --volfile-id=/persistent /data/repository/shared/private > root 11323 0.6 2.5 10061536 6788940 ? Ssl Mar11 77:42 > /usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0 > --lru-limit=0 --process-name fuse --volfile-server=gluster1 > --volfile-id=/workdata /data/repository/shared/public > root 11789 0.0 0.0 874116 11076 ? Ssl 07:32 0:00 > /usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0 > --process-name fuse --volfile-server=gluster1 --volfile-id=/persistent > /data/repository/shared/private > root 11881 0.0 0.0 874116 10992 ? Ssl 07:32 0:00 > /usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0 > --process-name fuse --volfile-server=gluster1 --volfile-id=/workdata > /data/repository/shared/public > > The first 2 processes are for the "old" mount (with lru-limit=0), the > last 2 processes are for the "new" mount. But only 3 clients still > have these entries. Systems are running fine, no problems so far. > Maybe wrong order of the update? 
If i look at > https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_4.1/ - > then it would be better to: unmount - upgrade - mount? > > > Best regards, > Hubert > > Am Di., 19. M?rz 2019 um 15:53 Uhr schrieb Artem Russakovskii > : > > > > The flood is indeed fixed for us on 5.5. However, the crashes are not. > > > > Sincerely, > > Artem > > > > -- > > Founder, Android Police, APK Mirror, Illogical Robot LLC > > beerpla.net | +ArtemRussakovskii | @ArtemR > > > > > > On Mon, Mar 18, 2019 at 5:41 AM Hu Bert wrote: > >> > >> Hi Amar, > >> > >> if you refer to this bug: > >> https://bugzilla.redhat.com/show_bug.cgi?id=1674225 : in the test > >> setup i haven't seen those entries, while copying & deleting a few GBs > >> of data. For a final statement we have to wait until i updated our > >> live gluster servers - could take place on tuesday or wednesday. > >> > >> Maybe other users can do an update to 5.4 as well and report back here. > >> > >> > >> Hubert > >> > >> > >> > >> Am Mo., 18. M?rz 2019 um 11:36 Uhr schrieb Amar Tumballi Suryanarayan > >> : > >> > > >> > Hi Hu Bert, > >> > > >> > Appreciate the feedback. Also are the other boiling issues related to logs fixed now? > >> > > >> > -Amar > >> > > >> > On Mon, Mar 18, 2019 at 3:54 PM Hu Bert wrote: > >> >> > >> >> update: upgrade from 5.3 -> 5.5 in a replicate 3 test setup with 2 > >> >> volumes done. In 'gluster peer status' the peers stay connected during > >> >> the upgrade, no 'peer rejected' messages. No cksum mismatches in the > >> >> logs. Looks good :-) > >> >> > >> >> Am Mo., 18. M?rz 2019 um 09:54 Uhr schrieb Hu Bert : > >> >> > > >> >> > Good morning :-) > >> >> > > >> >> > for debian the packages are there: > >> >> > https://download.gluster.org/pub/gluster/glusterfs/5/5.5/Debian/stretch/amd64/apt/pool/main/g/glusterfs/ > >> >> > > >> >> > I'll do an upgrade of a test installation 5.3 -> 5.5 and see if there > >> >> > are some errors etc. and report back. 
> >> >> > > >> >> > btw: no release notes for 5.4 and 5.5 so far? > >> >> > https://docs.gluster.org/en/latest/release-notes/ ? > >> >> > > >> >> > Am Fr., 15. M?rz 2019 um 14:28 Uhr schrieb Shyam Ranganathan > >> >> > : > >> >> > > > >> >> > > We created a 5.5 release tag, and it is under packaging now. It should > >> >> > > be packaged and ready for testing early next week and should be released > >> >> > > close to mid-week next week. > >> >> > > > >> >> > > Thanks, > >> >> > > Shyam > >> >> > > On 3/13/19 12:34 PM, Artem Russakovskii wrote: > >> >> > > > Wednesday now with no update :-/ > >> >> > > > > >> >> > > > Sincerely, > >> >> > > > Artem > >> >> > > > > >> >> > > > -- > >> >> > > > Founder, Android Police , APK Mirror > >> >> > > > , Illogical Robot LLC > >> >> > > > beerpla.net | +ArtemRussakovskii > >> >> > > > | @ArtemR > >> >> > > > > >> >> > > > > >> >> > > > > >> >> > > > On Tue, Mar 12, 2019 at 10:28 AM Artem Russakovskii >> >> > > > > wrote: > >> >> > > > > >> >> > > > Hi Amar, > >> >> > > > > >> >> > > > Any updates on this? I'm still not seeing it in OpenSUSE build > >> >> > > > repos. Maybe later today? > >> >> > > > > >> >> > > > Thanks. > >> >> > > > > >> >> > > > Sincerely, > >> >> > > > Artem > >> >> > > > > >> >> > > > -- > >> >> > > > Founder, Android Police , APK Mirror > >> >> > > > , Illogical Robot LLC > >> >> > > > beerpla.net | +ArtemRussakovskii > >> >> > > > | @ArtemR > >> >> > > > > >> >> > > > > >> >> > > > > >> >> > > > On Wed, Mar 6, 2019 at 10:30 PM Amar Tumballi Suryanarayan > >> >> > > > > wrote: > >> >> > > > > >> >> > > > We are talking days. Not weeks. Considering already it is > >> >> > > > Thursday here. 1 more day for tagging, and packaging. May be ok > >> >> > > > to expect it on Monday. > >> >> > > > > >> >> > > > -Amar > >> >> > > > > >> >> > > > On Thu, Mar 7, 2019 at 11:54 AM Artem Russakovskii > >> >> > > > > wrote: > >> >> > > > > >> >> > > > Is the next release going to be an imminent hotfix, i.e. 
> >> >> > > > something like today/tomorrow, or are we talking weeks? > >> >> > > > > >> >> > > > Sincerely, > >> >> > > > Artem > >> >> > > > > >> >> > > > -- > >> >> > > > Founder, Android Police , APK > >> >> > > > Mirror , Illogical Robot LLC > >> >> > > > beerpla.net | +ArtemRussakovskii > >> >> > > > | @ArtemR > >> >> > > > > >> >> > > > > >> >> > > > > >> >> > > > On Tue, Mar 5, 2019 at 11:09 AM Artem Russakovskii > >> >> > > > > wrote: > >> >> > > > > >> >> > > > Ended up downgrading to 5.3 just in case. Peer status > >> >> > > > and volume status are OK now. > >> >> > > > > >> >> > > > zypper install --oldpackage glusterfs-5.3-lp150.100.1 > >> >> > > > Loading repository data... > >> >> > > > Reading installed packages... > >> >> > > > Resolving package dependencies... > >> >> > > > > >> >> > > > Problem: glusterfs-5.3-lp150.100.1.x86_64 requires > >> >> > > > libgfapi0 = 5.3, but this requirement cannot be provided > >> >> > > > not installable providers: > >> >> > > > libgfapi0-5.3-lp150.100.1.x86_64[glusterfs] > >> >> > > > Solution 1: Following actions will be done: > >> >> > > > downgrade of libgfapi0-5.4-lp150.100.1.x86_64 to > >> >> > > > libgfapi0-5.3-lp150.100.1.x86_64 > >> >> > > > downgrade of libgfchangelog0-5.4-lp150.100.1.x86_64 to > >> >> > > > libgfchangelog0-5.3-lp150.100.1.x86_64 > >> >> > > > downgrade of libgfrpc0-5.4-lp150.100.1.x86_64 to > >> >> > > > libgfrpc0-5.3-lp150.100.1.x86_64 > >> >> > > > downgrade of libgfxdr0-5.4-lp150.100.1.x86_64 to > >> >> > > > libgfxdr0-5.3-lp150.100.1.x86_64 > >> >> > > > downgrade of libglusterfs0-5.4-lp150.100.1.x86_64 to > >> >> > > > libglusterfs0-5.3-lp150.100.1.x86_64 > >> >> > > > Solution 2: do not install glusterfs-5.3-lp150.100.1.x86_64 > >> >> > > > Solution 3: break glusterfs-5.3-lp150.100.1.x86_64 by > >> >> > > > ignoring some of its dependencies > >> >> > > > > >> >> > > > Choose from above solutions by number or cancel > >> >> > > > [1/2/3/c] (c): 1 > >> >> > > > Resolving 
dependencies... > >> >> > > > Resolving package dependencies... > >> >> > > > > >> >> > > > The following 6 packages are going to be downgraded: > >> >> > > > glusterfs libgfapi0 libgfchangelog0 libgfrpc0 > >> >> > > > libgfxdr0 libglusterfs0 > >> >> > > > > >> >> > > > 6 packages to downgrade. > >> >> > > > > >> >> > > > Sincerely, > >> >> > > > Artem > >> >> > > > > >> >> > > > -- > >> >> > > > Founder, Android Police > >> >> > > > , APK Mirror > >> >> > > > , Illogical Robot LLC > >> >> > > > beerpla.net | +ArtemRussakovskii > >> >> > > > | @ArtemR > >> >> > > > > >> >> > > > > >> >> > > > > >> >> > > > On Tue, Mar 5, 2019 at 10:57 AM Artem Russakovskii > >> >> > > > > wrote: > >> >> > > > > >> >> > > > Noticed the same when upgrading from 5.3 to 5.4, as > >> >> > > > mentioned. > >> >> > > > > >> >> > > > I'm confused though. Is actual replication affected, > >> >> > > > because the 5.4 server and the 3x 5.3 servers still > >> >> > > > show heal info as all 4 connected, and the files > >> >> > > > seem to be replicating correctly as well. > >> >> > > > > >> >> > > > So what's actually affected - just the status > >> >> > > > command, or leaving 5.4 on one of the nodes is doing > >> >> > > > some damage to the underlying fs? Is it fixable by > >> >> > > > tweaking transport.socket.ssl-enabled? Does > >> >> > > > upgrading all servers to 5.4 resolve it, or should > >> >> > > > we revert back to 5.3? > >> >> > > > > >> >> > > > Sincerely, > >> >> > > > Artem > >> >> > > > > >> >> > > > -- > >> >> > > > Founder, Android Police > >> >> > > > , APK Mirror > >> >> > > > , Illogical Robot LLC > >> >> > > > beerpla.net | > >> >> > > > +ArtemRussakovskii > >> >> > > > > >> >> > > > | @ArtemR > >> >> > > > > >> >> > > > > >> >> > > > On Tue, Mar 5, 2019 at 2:02 AM Hu Bert > >> >> > > > >> >> > > > > wrote: > >> >> > > > > >> >> > > > fyi: did a downgrade 5.4 -> 5.3 and it worked. > >> >> > > > all replicas are up and > >> >> > > > running. Awaiting updated v5.4. 
> >> >> > > > > >> >> > > > thx :-) > >> >> > > > > >> >> > > > Am Di., 5. M?rz 2019 um 09:26 Uhr schrieb Hari > >> >> > > > Gowtham >> >> > > > >: > >> >> > > > > > >> >> > > > > There are plans to revert the patch causing > >> >> > > > this error and rebuilt 5.4. > >> >> > > > > This should happen faster. the rebuilt 5.4 > >> >> > > > should be void of this upgrade issue. > >> >> > > > > > >> >> > > > > In the meantime, you can use 5.3 for this cluster. > >> >> > > > > Downgrading to 5.3 will work if it was just > >> >> > > > one node that was upgrade to 5.4 > >> >> > > > > and the other nodes are still in 5.3. > >> >> > > > > > >> >> > > > > On Tue, Mar 5, 2019 at 1:07 PM Hu Bert > >> >> > > > >> >> > > > > wrote: > >> >> > > > > > > >> >> > > > > > Hi Hari, > >> >> > > > > > > >> >> > > > > > thx for the hint. Do you know when this will > >> >> > > > be fixed? Is a downgrade > >> >> > > > > > 5.4 -> 5.3 a possibility to fix this? > >> >> > > > > > > >> >> > > > > > Hubert > >> >> > > > > > > >> >> > > > > > Am Di., 5. M?rz 2019 um 08:32 Uhr schrieb > >> >> > > > Hari Gowtham >> >> > > > >: > >> >> > > > > > > > >> >> > > > > > > Hi, > >> >> > > > > > > > >> >> > > > > > > This is a known issue we are working on. > >> >> > > > > > > As the checksum differs between the > >> >> > > > updated and non updated node, the > >> >> > > > > > > peers are getting rejected. > >> >> > > > > > > The bricks aren't coming because of the > >> >> > > > same issue. 
> >> >> > > > > > > > >> >> > > > > > > More about the issue: > >> >> > > > https://bugzilla.redhat.com/show_bug.cgi?id=1685120 > >> >> > > > > > > > >> >> > > > > > > On Tue, Mar 5, 2019 at 12:56 PM Hu Bert > >> >> > > > >> >> > > > > wrote: > >> >> > > > > > > > > >> >> > > > > > > > Interestingly: gluster volume status > >> >> > > > misses gluster1, while heal > >> >> > > > > > > > statistics show gluster1: > >> >> > > > > > > > > >> >> > > > > > > > gluster volume status workdata > >> >> > > > > > > > Status of volume: workdata > >> >> > > > > > > > Gluster process > >> >> > > > TCP Port RDMA Port Online Pid > >> >> > > > > > > > > >> >> > > > ------------------------------------------------------------------------------ > >> >> > > > > > > > Brick gluster2:/gluster/md4/workdata > >> >> > > > 49153 0 Y 1723 > >> >> > > > > > > > Brick gluster3:/gluster/md4/workdata > >> >> > > > 49153 0 Y 2068 > >> >> > > > > > > > Self-heal Daemon on localhost > >> >> > > > N/A N/A Y 1732 > >> >> > > > > > > > Self-heal Daemon on gluster3 > >> >> > > > N/A N/A Y 2077 > >> >> > > > > > > > > >> >> > > > > > > > vs. > >> >> > > > > > > > > >> >> > > > > > > > gluster volume heal workdata statistics > >> >> > > > heal-count > >> >> > > > > > > > Gathering count of entries to be healed > >> >> > > > on volume workdata has been successful > >> >> > > > > > > > > >> >> > > > > > > > Brick gluster1:/gluster/md4/workdata > >> >> > > > > > > > Number of entries: 0 > >> >> > > > > > > > > >> >> > > > > > > > Brick gluster2:/gluster/md4/workdata > >> >> > > > > > > > Number of entries: 10745 > >> >> > > > > > > > > >> >> > > > > > > > Brick gluster3:/gluster/md4/workdata > >> >> > > > > > > > Number of entries: 10744 > >> >> > > > > > > > > >> >> > > > > > > > Am Di., 5. 
M?rz 2019 um 08:18 Uhr > >> >> > > > schrieb Hu Bert >> >> > > > >: > >> >> > > > > > > > > > >> >> > > > > > > > > Hi Miling, > >> >> > > > > > > > > > >> >> > > > > > > > > well, there are such entries, but > >> >> > > > those haven't been a problem during > >> >> > > > > > > > > install and the last kernel > >> >> > > > update+reboot. The entries look like: > >> >> > > > > > > > > > >> >> > > > > > > > > PUBLIC_IP gluster2.alpserver.de > >> >> > > > gluster2 > >> >> > > > > > > > > > >> >> > > > > > > > > 192.168.0.50 gluster1 > >> >> > > > > > > > > 192.168.0.51 gluster2 > >> >> > > > > > > > > 192.168.0.52 gluster3 > >> >> > > > > > > > > > >> >> > > > > > > > > 'ping gluster2' resolves to LAN IP; I > >> >> > > > removed the last entry in the > >> >> > > > > > > > > 1st line, did a reboot ... no, didn't > >> >> > > > help. From > >> >> > > > > > > > > /var/log/glusterfs/glusterd.log > >> >> > > > > > > > > on gluster 2: > >> >> > > > > > > > > > >> >> > > > > > > > > [2019-03-05 07:04:36.188128] E [MSGID: > >> >> > > > 106010] > >> >> > > > > > > > > > >> >> > > > [glusterd-utils.c:3483:glusterd_compare_friend_volume] > >> >> > > > 0-management: > >> >> > > > > > > > > Version of Cksums persistent differ. > >> >> > > > local cksum = 3950307018, remote > >> >> > > > > > > > > cksum = 455409345 on peer gluster1 > >> >> > > > > > > > > [2019-03-05 07:04:36.188314] I [MSGID: > >> >> > > > 106493] > >> >> > > > > > > > > > >> >> > > > [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] > >> >> > > > 0-glusterd: > >> >> > > > > > > > > Responded to gluster1 (0), ret: 0, > >> >> > > > op_ret: -1 > >> >> > > > > > > > > > >> >> > > > > > > > > Interestingly there are no entries in > >> >> > > > the brick logs of the rejected > >> >> > > > > > > > > server. Well, not surprising as no > >> >> > > > brick process is running. The > >> >> > > > > > > > > server gluster1 is still in rejected > >> >> > > > state. 
> >> >> > > > > > > > > > >> >> > > > > > > > > 'gluster volume start workdata force' > >> >> > > > starts the brick process on > >> >> > > > > > > > > gluster1, and some heals are happening > >> >> > > > on gluster2+3, but via 'gluster > >> >> > > > > > > > > volume status workdata' the volumes > >> >> > > > still aren't complete. > >> >> > > > > > > > > > >> >> > > > > > > > > gluster1: > >> >> > > > > > > > > > >> >> > > > ------------------------------------------------------------------------------ > >> >> > > > > > > > > Brick gluster1:/gluster/md4/workdata > >> >> > > > 49152 0 Y 2523 > >> >> > > > > > > > > Self-heal Daemon on localhost > >> >> > > > N/A N/A Y 2549 > >> >> > > > > > > > > > >> >> > > > > > > > > gluster2: > >> >> > > > > > > > > Gluster process > >> >> > > > TCP Port RDMA Port Online Pid > >> >> > > > > > > > > > >> >> > > > ------------------------------------------------------------------------------ > >> >> > > > > > > > > Brick gluster2:/gluster/md4/workdata > >> >> > > > 49153 0 Y 1723 > >> >> > > > > > > > > Brick gluster3:/gluster/md4/workdata > >> >> > > > 49153 0 Y 2068 > >> >> > > > > > > > > Self-heal Daemon on localhost > >> >> > > > N/A N/A Y 1732 > >> >> > > > > > > > > Self-heal Daemon on gluster3 > >> >> > > > N/A N/A Y 2077 > >> >> > > > > > > > > > >> >> > > > > > > > > > >> >> > > > > > > > > Hubert > >> >> > > > > > > > > > >> >> > > > > > > > > Am Di., 5. M?rz 2019 um 07:58 Uhr > >> >> > > > schrieb Milind Changire >> >> > > > >: > >> >> > > > > > > > > > > >> >> > > > > > > > > > There are probably DNS entries or > >> >> > > > /etc/hosts entries with the public IP Addresses > >> >> > > > that the host names (gluster1, gluster2, > >> >> > > > gluster3) are getting resolved to. > >> >> > > > > > > > > > /etc/resolv.conf would tell which is > >> >> > > > the default domain searched for the node names > >> >> > > > and the DNS servers which respond to the queries. 
> >> >> > > > > > > > > > > >> >> > > > > > > > > > > >> >> > > > > > > > > > On Tue, Mar 5, 2019 at 12:14 PM Hu > >> >> > > > Bert >> >> > > > > wrote: > >> >> > > > > > > > > >> > >> >> > > > > > > > > >> Good morning, > >> >> > > > > > > > > >> > >> >> > > > > > > > > >> i have a replicate 3 setup with 2 > >> >> > > > volumes, running on version 5.3 on > >> >> > > > > > > > > >> debian stretch. This morning i > >> >> > > > upgraded one server to version 5.4 and > >> >> > > > > > > > > >> rebooted the machine; after the > >> >> > > > restart i noticed that: > >> >> > > > > > > > > >> > >> >> > > > > > > > > >> - no brick process is running > >> >> > > > > > > > > >> - gluster volume status only shows > >> >> > > > the server itself: > >> >> > > > > > > > > >> gluster volume status workdata > >> >> > > > > > > > > >> Status of volume: workdata > >> >> > > > > > > > > >> Gluster process > >> >> > > > TCP Port RDMA Port Online Pid > >> >> > > > > > > > > >> > >> >> > > > ------------------------------------------------------------------------------ > >> >> > > > > > > > > >> Brick > >> >> > > > gluster1:/gluster/md4/workdata N/A > >> >> > > > N/A N N/A > >> >> > > > > > > > > >> NFS Server on localhost > >> >> > > > N/A N/A N N/A > >> >> > > > > > > > > >> > >> >> > > > > > > > > >> - gluster peer status on the server > >> >> > > > > > > > > >> gluster peer status > >> >> > > > > > > > > >> Number of Peers: 2 > >> >> > > > > > > > > >> > >> >> > > > > > > > > >> Hostname: gluster3 > >> >> > > > > > > > > >> Uuid: > >> >> > > > c7b4a448-ca6a-4051-877f-788f9ee9bc4a > >> >> > > > > > > > > >> State: Peer Rejected (Connected) > >> >> > > > > > > > > >> > >> >> > > > > > > > > >> Hostname: gluster2 > >> >> > > > > > > > > >> Uuid: > >> >> > > > 162fea82-406a-4f51-81a3-e90235d8da27 > >> >> > > > > > > > > >> State: Peer Rejected (Connected) > >> >> > > > > > > > > >> > >> >> > > > > > > > > >> - gluster peer status on the other > >> >> > > > 2 servers: > >> >> > > > > > > 
> > >> gluster peer status > >> >> > > > > > > > > >> Number of Peers: 2 > >> >> > > > > > > > > >> > >> >> > > > > > > > > >> Hostname: gluster1 > >> >> > > > > > > > > >> Uuid: > >> >> > > > 9a360776-7b58-49ae-831e-a0ce4e4afbef > >> >> > > > > > > > > >> State: Peer Rejected (Connected) > >> >> > > > > > > > > >> > >> >> > > > > > > > > >> Hostname: gluster3 > >> >> > > > > > > > > >> Uuid: > >> >> > > > c7b4a448-ca6a-4051-877f-788f9ee9bc4a > >> >> > > > > > > > > >> State: Peer in Cluster (Connected) > >> >> > > > > > > > > >> > >> >> > > > > > > > > >> I noticed that, in the brick logs, > >> >> > > > i see that the public IP is used > >> >> > > > > > > > > >> instead of the LAN IP. brick logs > >> >> > > > from one of the volumes: > >> >> > > > > > > > > >> > >> >> > > > > > > > > >> rejected node: > >> >> > > > https://pastebin.com/qkpj10Sd > >> >> > > > > > > > > >> connected nodes: > >> >> > > > https://pastebin.com/8SxVVYFV > >> >> > > > > > > > > >> > >> >> > > > > > > > > >> Why is the public IP suddenly used > >> >> > > > instead of the LAN IP? Killing all > >> >> > > > > > > > > >> gluster processes and rebooting > >> >> > > > (again) didn't help. 
> >> >> > > > > > > > > >> > >> >> > > > > > > > > >> > >> >> > > > > > > > > >> Thx, > >> >> > > > > > > > > >> Hubert > >> >> > > > > > > > > >> > >> >> > > > _______________________________________________ > >> >> > > > > > > > > >> Gluster-users mailing list > >> >> > > > > > > > > >> Gluster-users at gluster.org > >> >> > > > > >> >> > > > > > > > > >> > >> >> > > > https://lists.gluster.org/mailman/listinfo/gluster-users > >> >> > > > > > > > > > > >> >> > > > > > > > > > > >> >> > > > > > > > > > > >> >> > > > > > > > > > -- > >> >> > > > > > > > > > Milind > >> >> > > > > > > > > > > >> >> > > > > > > > > >> >> > > > _______________________________________________ > >> >> > > > > > > > Gluster-users mailing list > >> >> > > > > > > > Gluster-users at gluster.org > >> >> > > > > >> >> > > > > > > > > >> >> > > > https://lists.gluster.org/mailman/listinfo/gluster-users > >> >> > > > > > > > >> >> > > > > > > > >> >> > > > > > > > >> >> > > > > > > -- > >> >> > > > > > > Regards, > >> >> > > > > > > Hari Gowtham. > >> >> > > > > > >> >> > > > > > >> >> > > > > > >> >> > > > > -- > >> >> > > > > Regards, > >> >> > > > > Hari Gowtham. 
> >> >> > > > _______________________________________________ > >> >> > > > Gluster-users mailing list > >> >> > > > Gluster-users at gluster.org > >> >> > > > > >> >> > > > https://lists.gluster.org/mailman/listinfo/gluster-users > >> >> > > > > >> >> > > > _______________________________________________ > >> >> > > > Gluster-users mailing list > >> >> > > > Gluster-users at gluster.org > >> >> > > > https://lists.gluster.org/mailman/listinfo/gluster-users > >> >> > > > > >> >> > > > > >> >> > > > > >> >> > > > -- > >> >> > > > Amar Tumballi (amarts) > >> >> > > > > >> >> > > > > >> >> > > > _______________________________________________ > >> >> > > > Gluster-users mailing list > >> >> > > > Gluster-users at gluster.org > >> >> > > > https://lists.gluster.org/mailman/listinfo/gluster-users > >> >> > > > > >> >> > > _______________________________________________ > >> >> > > Gluster-users mailing list > >> >> > > Gluster-users at gluster.org > >> >> > > https://lists.gluster.org/mailman/listinfo/gluster-users > >> > > >> > > >> > > >> > -- > >> > Amar Tumballi (amarts) From cuculovic at mdpi.com Thu Mar 21 08:09:27 2019 From: cuculovic at mdpi.com (Milos Cuculovic) Date: Thu, 21 Mar 2019 09:09:27 +0100 Subject: [Gluster-users] Heal flapping between Possibly undergoing heal and In split brain In-Reply-To: <132679F1-A96E-402F-BDF0-79FE200AD9F7@mdpi.com> References: <132679F1-A96E-402F-BDF0-79FE200AD9F7@mdpi.com> Message-ID: I was now able to catch the split brain log: sudo gluster volume heal storage2 info Brick storage3:/data/data-cluster /dms/final_archive - Is in split-brain Status: Connected Number of entries: 3 Brick storage4:/data/data-cluster /dms/final_archive - Is in split-brain Status: Connected Number of entries: 2 Milos > On 21 Mar 2019, at 09:07, Milos Cuculovic wrote: > > Since 24h, after upgrading from 4.0 to 4.1.7 one of the servers, the heal shows this: > > sudo gluster volume heal storage2 info > Brick storage3:/data/data-cluster > > > 
/dms/final_archive - Possibly undergoing heal > > Status: Connected > Number of entries: 3 > > Brick storage4:/data/data-cluster > > /dms/final_archive - Possibly undergoing heal > > Status: Connected > Number of entries: 2 > > The same files stay there. From time to time the status of the /dms/final_archive is in split brain at the following command shows: > > sudo gluster volume heal storage2 info split-brain > Brick storage3:/data/data-cluster > /dms/final_archive > Status: Connected > Number of entries in split-brain: 1 > > Brick storage4:/data/data-cluster > /dms/final_archive > Status: Connected > Number of entries in split-brain: 1 > > How to know the file who is in split brain? The files in /dms/final_archive are not very important, fine to remove (ideally resolve the split brain) for the ones that differ. > > I can only see the directory and GFID. Any idea on how to resolve this situation as I would like to continue with the upgrade on the 2nd server, and for this the heal needs to be done with 0 entries in sudo gluster volume heal storage2 info > > Thank you in advance, Milos. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cuculovic at mdpi.com Thu Mar 21 08:07:38 2019 From: cuculovic at mdpi.com (Milos Cuculovic) Date: Thu, 21 Mar 2019 09:07:38 +0100 Subject: [Gluster-users] Heal flapping between Possibly undergoing heal and In split brain Message-ID: <132679F1-A96E-402F-BDF0-79FE200AD9F7@mdpi.com> Since 24h, after upgrading from 4.0 to 4.1.7 one of the servers, the heal shows this: sudo gluster volume heal storage2 info Brick storage3:/data/data-cluster /dms/final_archive - Possibly undergoing heal Status: Connected Number of entries: 3 Brick storage4:/data/data-cluster /dms/final_archive - Possibly undergoing heal Status: Connected Number of entries: 2 The same files stay there. 
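[Editor's note: on the question "how to know the file" when heal info shows only a GFID — each brick keeps a hard link (a symlink, for directories) to every entry under its `.glusterfs` tree, keyed by the first two byte-pairs of the GFID, so the GFID can be mapped back to a path directly on the brick. A sketch, with a placeholder GFID and the brick path from the output above:]

```shell
GFID=11118443-0849-4d0e-ba2c-d55fe7b0e028   # placeholder, from heal info
BRICK=/data/data-cluster

# Directories: the .glusterfs entry is a symlink to the real directory
ls -l "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"

# Regular files: locate the other hard link(s) of the same inode
find "$BRICK" -samefile "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"
```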
From time to time the status of the /dms/final_archive is in split brain at the following command shows: sudo gluster volume heal storage2 info split-brain Brick storage3:/data/data-cluster /dms/final_archive Status: Connected Number of entries in split-brain: 1 Brick storage4:/data/data-cluster /dms/final_archive Status: Connected Number of entries in split-brain: 1 How to know the file who is in split brain? The files in /dms/final_archive are not very important, fine to remove (ideally resolve the split brain) for the ones that differ. I can only see the directory and GFID. Any idea on how to resolve this situation as I would like to continue with the upgrade on the 2nd server, and for this the heal needs to be done with 0 entries in sudo gluster volume heal storage2 info Thank you in advance, Milos. From sunkumar at redhat.com Thu Mar 21 08:42:01 2019 From: sunkumar at redhat.com (Sunny Kumar) Date: Thu, 21 Mar 2019 14:12:01 +0530 Subject: [Gluster-users] Geo-replication status always on 'Created' In-Reply-To: References: Message-ID: Hey you can start a fresh I think you are not following proper setup steps. Please follow these steps [1] to create geo-rep session, you can delete the old one and do a fresh start. Or alternative you can use this tool[2] to setup geo-rep. [1]. https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/ [2]. http://aravindavk.in/blog/gluster-georep-tools/ /Sunny On Thu, Mar 21, 2019 at 11:28 AM Maurya M wrote: > > Hi Sunil, > I did run the on the slave node : > /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh azureuser vol_041afbc53746053368a1840607636e97 vol_a5aee81a873c043c99a938adcb5b5781 > getting this message "/home/azureuser/common_secret.pem.pub not present. 
Please run geo-replication command on master with push-pem option to generate the file" > > So went back and created the session again, no change, so manually copied the common_secret.pem.pub to /home/azureuser/ but still the set_geo_rep_pem_keys.sh is looking the pem file in different name : COMMON_SECRET_PEM_PUB=${master_vol}_${slave_vol}_common_secret.pem.pub , change the name of pem , ran the command again : > > /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh azureuser vol_041afbc53746053368a1840607636e97 vol_a5aee81a873c043c99a938adcb5b5781 > Successfully copied file. > Command executed successfully. > > > - went back and created the session , start the geo-replication , still seeing the same error in logs. Any ideas ? > > thanks, > Maurya > > > > On Wed, Mar 20, 2019 at 11:07 PM Sunny Kumar wrote: >> >> Hi Maurya, >> >> I guess you missed last trick to distribute keys in slave node. I see >> this is non-root geo-rep setup so please try this: >> >> >> Run the following command as root in any one of Slave node. >> >> /usr/local/libexec/glusterfs/set_geo_rep_pem_keys.sh >> >> >> - Sunny >> >> On Wed, Mar 20, 2019 at 10:47 PM Maurya M wrote: >> > >> > Hi all, >> > Have setup a 3 master nodes - 3 slave nodes (gluster 4.1) for geo-replication, but once have the geo-replication configure the status is always on "Created', >> > even after have force start the session. >> > >> > On close inspect of the logs on the master node seeing this error: >> > >> > "E [syncdutils(monitor):801:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 azureuser at xxxxx.xxxx..xxx. gluster --xml --remote-host=localhost volume info vol_a5ae34341a873c043c99a938adcb5b5781 error=255" >> > >> > Any ideas what is issue? 
>> > >> > thanks, >> > Maurya >> > >> > _______________________________________________ >> > Gluster-users mailing list >> > Gluster-users at gluster.org >> > https://lists.gluster.org/mailman/listinfo/gluster-users From mauryam at gmail.com Thu Mar 21 09:13:07 2019 From: mauryam at gmail.com (Maurya M) Date: Thu, 21 Mar 2019 14:43:07 +0530 Subject: [Gluster-users] Geo-replication status always on 'Created' In-Reply-To: References: Message-ID: hi Sunny, i did use the [1] link for the setup, when i encountered this error during ssh-copy-id : (so setup the passwordless ssh, by manually copied the private/ public keys to all the nodes , both master & slave) [root at k8s-agentpool1-24779565-1 ~]# ssh-copy-id geouser at xxx.xx.xxx.x /usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub" The authenticity of host ' xxx.xx.xxx.x ( xxx.xx.xxx.x )' can't be established. ECDSA key fingerprint is SHA256:B2rNaocIcPjRga13oTnopbJ5KjI/7l5fMANXc+KhA9s. ECDSA key fingerprint is MD5:1b:70:f9:7a:bf:35:33:47:0c:f2:c1:cd:21:e2:d3:75. Are you sure you want to continue connecting (yes/no)? yes /usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed /usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys Permission denied (publickey). To start afresh what all needs to teardown / delete, do we have any script for it ? where all the pem keys do i need to delete? thanks, Maurya On Thu, Mar 21, 2019 at 2:12 PM Sunny Kumar wrote: > Hey you can start a fresh I think you are not following proper setup steps. > > Please follow these steps [1] to create geo-rep session, you can > delete the old one and do a fresh start. Or alternative you can use > this tool[2] to setup geo-rep. > > > [1]. > https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/ > [2]. 
http://aravindavk.in/blog/gluster-georep-tools/ > > > /Sunny > > On Thu, Mar 21, 2019 at 11:28 AM Maurya M wrote: > > > > Hi Sunil, > > I did run the on the slave node : > > /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh azureuser > vol_041afbc53746053368a1840607636e97 vol_a5aee81a873c043c99a938adcb5b5781 > > getting this message "/home/azureuser/common_secret.pem.pub not present. > Please run geo-replication command on master with push-pem option to > generate the file" > > > > So went back and created the session again, no change, so manually > copied the common_secret.pem.pub to /home/azureuser/ but still the > set_geo_rep_pem_keys.sh is looking the pem file in different name : > COMMON_SECRET_PEM_PUB=${master_vol}_${slave_vol}_common_secret.pem.pub , > change the name of pem , ran the command again : > > > > /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh azureuser > vol_041afbc53746053368a1840607636e97 vol_a5aee81a873c043c99a938adcb5b5781 > > Successfully copied file. > > Command executed successfully. > > > > > > - went back and created the session , start the geo-replication , still > seeing the same error in logs. Any ideas ? > > > > thanks, > > Maurya > > > > > > > > On Wed, Mar 20, 2019 at 11:07 PM Sunny Kumar > wrote: > >> > >> Hi Maurya, > >> > >> I guess you missed last trick to distribute keys in slave node. I see > >> this is non-root geo-rep setup so please try this: > >> > >> > >> Run the following command as root in any one of Slave node. > >> > >> /usr/local/libexec/glusterfs/set_geo_rep_pem_keys.sh > >> > >> > >> - Sunny > >> > >> On Wed, Mar 20, 2019 at 10:47 PM Maurya M wrote: > >> > > >> > Hi all, > >> > Have setup a 3 master nodes - 3 slave nodes (gluster 4.1) for > geo-replication, but once have the geo-replication configure the status is > always on "Created', > >> > even after have force start the session. 
> >> > > >> > On closer inspection of the logs on the master node, I am seeing this error: > >> > > >> > "E [syncdutils(monitor):801:errlog] Popen: command returned error > cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i > /var/lib/glusterd/geo-replication/secret.pem -p 22 azureuser at xxxxx.xxxx..xxx. > gluster --xml --remote-host=localhost volume info > vol_a5ae34341a873c043c99a938adcb5b5781 error=255" > >> > > >> > Any ideas what the issue is? > >> > > >> > thanks, > >> > Maurya > >> > > >> > _______________________________________________ > >> > Gluster-users mailing list > >> > Gluster-users at gluster.org > >> > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksubrahm at redhat.com Thu Mar 21 09:27:09 2019 From: ksubrahm at redhat.com (Karthik Subrahmanya) Date: Thu, 21 Mar 2019 14:57:09 +0530 Subject: [Gluster-users] Heal flapping between Possibly undergoing heal and In split brain In-Reply-To: References: <132679F1-A96E-402F-BDF0-79FE200AD9F7@mdpi.com> Message-ID: Hi, Note: I guess the volume you are talking about is of type replica-2 (1x2). Usually replica 2 volumes are prone to split-brain. If you can consider converting them to arbiter or replica-3, they will handle most of the cases which can lead to split-brains. For more information see [1]. Resolving the split-brain: [2] talks about how to interpret the heal info output and the different ways to resolve it using the CLI/manually/using the favorite-child-policy. If you are having an entry split-brain which is a gfid split-brain (file/dir having different gfids on the replica bricks), then you can use the CLI option to resolve it. If a directory is in gfid split-brain in a distributed-replicate volume and you are using the source-brick option, please make sure you use the brick of this subvolume which has the same gfid as that of the other distribute subvolume(s) where you have the correct gfid, as the source.
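The inspection and CLI-based resolution described above can be sketched as follows, using the volume, bricks, and path that appear later in this thread; the helper function is purely illustrative and not part of gluster:

```shell
# Sketch: detecting and resolving a directory gfid split-brain on a
# replica-2 volume (names taken from this thread).
VOL=storage2
BRICK_PATH=/data/data-cluster
DIR=/dms/final_archive

# Illustrative helper: an entry is in gfid split-brain when the gfid xattr
# read on each replica brick differs.
gfid_split_brain() {
    [ "$1" != "$2" ]
}

if command -v gluster >/dev/null; then
    # On each brick host, read the gfid of the suspect directory directly
    # from the brick backend (as root); compare the two values.
    getfattr -n trusted.gfid -e hex "$BRICK_PATH$DIR"

    # If the gfids differ, heal using the brick that holds the correct
    # gfid as the source.
    gluster volume heal "$VOL" split-brain source-brick \
        "storage3:$BRICK_PATH" "$DIR"
fi
```

The `source-brick` form is the CLI option referred to in the message above; the choice of storage3 here is only an example of picking one replica as the source.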
If you are having a type mismatch, then follow the steps in [3] to resolve the split-brain. [1] https://docs.gluster.org/en/v3/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/ [2] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/ [3] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#dir-split-brain HTH, Karthik On Thu, Mar 21, 2019 at 1:45 PM Milos Cuculovic wrote: > I was now able to catch the split brain log: > > sudo gluster volume heal storage2 info > Brick storage3:/data/data-cluster > > > /dms/final_archive - Is in split-brain > > Status: Connected > Number of entries: 3 > > Brick storage4:/data/data-cluster > > /dms/final_archive - Is in split-brain > > Status: Connected > Number of entries: 2 > > Milos > > On 21 Mar 2019, at 09:07, Milos Cuculovic wrote: > > For the last 24h, after upgrading one of the servers from 4.0 to 4.1.7, the heal > shows this: > > sudo gluster volume heal storage2 info > Brick storage3:/data/data-cluster > > > /dms/final_archive - Possibly undergoing heal > > Status: Connected > Number of entries: 3 > > Brick storage4:/data/data-cluster > > /dms/final_archive - Possibly undergoing heal > > Status: Connected > Number of entries: 2 > > The same files stay there. From time to time the status of > /dms/final_archive is in split brain, as the following command shows: > > sudo gluster volume heal storage2 info split-brain > Brick storage3:/data/data-cluster > /dms/final_archive > Status: Connected > Number of entries in split-brain: 1 > > Brick storage4:/data/data-cluster > /dms/final_archive > Status: Connected > Number of entries in split-brain: 1 > > How can I know which file is in split brain? The files in > /dms/final_archive are not very important, fine to remove (ideally resolve > the split brain) for the ones that differ. > > I can only see the directory and GFID.
Any idea on how to resolve this > situation, as I would like to continue with the upgrade on the 2nd server, > and for this the heal needs to be done with 0 entries in sudo gluster > volume heal storage2 info > > Thank you in advance, Milos. > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From pkalever at redhat.com Thu Mar 21 10:09:06 2019 From: pkalever at redhat.com (Prasanna Kalever) Date: Thu, 21 Mar 2019 15:39:06 +0530 Subject: [Gluster-users] [Gluster-devel] Network Block device (NBD) on top of glusterfs In-Reply-To: References: Message-ID: On Thu, Mar 21, 2019 at 9:00 AM Xiubo Li wrote: > All, > > I am one of the contributors to the gluster-block > [1] project, and I also > contribute to the Linux kernel and the open-iscsi > project [2]. > > NBD has been around for some time, but recently the Linux kernel's Network > Block Device (NBD) has been enhanced to work with more devices, and the > option to integrate with netlink has been added. So, I recently tried to provide a > glusterfs client based NBD driver. Please refer to github issue #633 > [3]; the good news is I > have working code, with the most basic things, at the nbd-runner project > [4]. > > While this email is about announcing the project and asking for more > collaboration, I would also like to discuss the placement of the > project itself. Currently the nbd-runner project is expected to be shared by > our friends at the Ceph project too, to provide an NBD driver for Ceph. I have > personally worked closely with some of them while contributing to the > open-iSCSI project, and we would like to take this project to great success. > > Now a few questions: > > 1. Can I continue to use http://github.com/gluster/nbd-runner as the home > for this project, even if it's shared by other filesystem projects?
> > > - I personally am fine with this. > > > 1. Should there be a separate organization for this repo? > > > - While it may make sense in the future, for now I am not planning to > start anything new. > > It would be great if we reach some consensus on this soon, as nbd-runner is > a new repository. If there are no concerns, I will continue to contribute > to the existing repository. > Thanks Xiubo Li, for finally sending this email out. Since this email is out on the gluster mailing list, I would like to take a stand from the gluster community point of view *only* and share my views. My honest answer is "If we want to maintain this within the gluster org, then 80% of the effort is common/duplicate of what we have already done with gluster-block", like: * rpc/socket code * cli/daemon parser/helper logics * gfapi util functions * logger framework * inotify & dyn-config threads * configure/Makefile/specfiles * docs about gluster, etc. The repository gluster-block is actually a home for all the block-related stuff within gluster, and it's designed to accommodate similar functionality; if I were you, I would have simply copied nbd-runner.c into https://github.com/gluster/gluster-block/tree/master/daemon/ just like Ceph does here https://github.com/ceph/ceph/blob/master/src/tools/rbd_nbd/rbd-nbd.cc and be done. Advantages of keeping the nbd client within gluster-block: -> No worry about the code maintenance burden -> No worry about monitoring a new component -> Shipping packages to Fedora/CentOS/RHEL is handled -> This helps improve and stabilize the current gluster-block framework -> We can build a common CI -> We can reuse the common test framework, etc. If you have the impression that gluster-block is for management only, then I would really want to correct you at this point.
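As an aside for readers who want a kernel NBD device backed by a gluster volume today: this is not nbd-runner itself, but QEMU's gluster:// block driver combined with qemu-nbd already provides the same kind of data path, going through QEMU instead of a native glusterfs NBD driver. Host, volume, and image names below are placeholders, and the helper function is purely illustrative:

```shell
# Illustration only: kernel NBD on top of glusterfs via qemu-nbd
# (assumes qemu built with gluster support; names are placeholders).

# Illustrative helper building the gluster URI qemu understands.
gluster_uri() {
    printf 'gluster://%s/%s/%s' "$1" "$2" "$3"
}

IMG=$(gluster_uri gluster-host testvol disk0.raw)   # placeholder names

if command -v qemu-nbd >/dev/null; then
    modprobe nbd max_part=8                 # load the kernel NBD driver
    qemu-img create -f raw "$IMG" 1G        # raw image stored on the volume
    qemu-nbd --format=raw --connect=/dev/nbd0 "$IMG"
    # ... /dev/nbd0 can now be partitioned, formatted, and mounted ...
    qemu-nbd --disconnect /dev/nbd0
fi
```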
Some of my near-future plans for gluster-block: * Allow exporting blocks with FUSE access via a fileIO backstore to improve large-file workloads, draft: https://github.com/gluster/gluster-block/pull/58 * Accommodate kernel loopback handling for local-only applications * In the same way we can accommodate the nbd app/client, and IMHO this effort shouldn't take more than 1 or 2 days to get merged within gluster-block and be ready for a release. Hope that clarifies it. Best Regards, -- Prasanna > Regards, > Xiubo Li (@lxbsz) > > [1] - https://github.com/gluster/gluster-block > [2] - https://github.com/open-iscsi > [3] - https://github.com/gluster/glusterfs/issues/633 > [4] - https://github.com/gluster/nbd-runner > _______________________________________________ > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel -------------- next part -------------- An HTML attachment was scrubbed... URL: From cuculovic at mdpi.com Thu Mar 21 10:24:33 2019 From: cuculovic at mdpi.com (Milos Cuculovic) Date: Thu, 21 Mar 2019 11:24:33 +0100 Subject: [Gluster-users] Heal flapping between Possibly undergoing heal and In split brain In-Reply-To: References: <132679F1-A96E-402F-BDF0-79FE200AD9F7@mdpi.com> Message-ID: <548844D2-E06E-4A60-972E-72213C61D264@mdpi.com> Thanks Karthik! I was trying to find some resolution methods from [2], but unfortunately none worked (I can explain what I tried if needed). > I guess the volume you are talking about is of type replica-2 (1x2). That's correct; I am aware of the arbiter solution but still haven't taken the time to implement it. From the info results I posted, how can I tell which situation I am in? No files are mentioned in split brain, only directories. One brick has 3 entries and the other has 2.
sudo gluster volume heal storage2 info [sudo] password for sshadmin: Brick storage3:/data/data-cluster /dms/final_archive - Possibly undergoing heal Status: Connected Number of entries: 3 Brick storage4:/data/data-cluster /dms/final_archive - Possibly undergoing heal Status: Connected Number of entries: 2 - Kindest regards, Milos Cuculovic IT Manager --- MDPI AG Postfach, CH-4020 Basel, Switzerland Office: St. Alban-Anlage 66, 4052 Basel, Switzerland Tel. +41 61 683 77 35 Fax +41 61 302 89 18 Email: cuculovic at mdpi.com Skype: milos.cuculovic.mdpi Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. > On 21 Mar 2019, at 10:27, Karthik Subrahmanya wrote: > > Hi, > > Note: I guess the volume you are talking about is of type replica-2 (1x2). Usually replica 2 volumes are prone to split-brain. If you can consider converting them to arbiter or replica-3, they will handle most of the cases which can lead to slit-brains. For more information see [1]. > > Resolving the split-brain: [2] talks about how to interpret the heal info output and different ways to resolve them using the CLI/manually/using the favorite-child-policy. > If you are having entry split brain, and is a gfid split-brain (file/dir having different gfids on the replica bricks) then you can use the CLI option to resolve them. If a directory is in gfid split-brain in a distributed-replicate volume and you are using the source-brick option please make sure you use the brick of this subvolume, which has the same gfid as that of the other distribute subvolume(s) where you have the correct gfid, as the source. > If you are having a type mismatch then follow the steps in [3] to resolve the split-brain. 
> > [1] https://docs.gluster.org/en/v3/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/ > [2] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/ > [3] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#dir-split-brain > > HTH, > Karthik > > On Thu, Mar 21, 2019 at 1:45 PM Milos Cuculovic > wrote: > I was now able to catch the split brain log: > > sudo gluster volume heal storage2 info > Brick storage3:/data/data-cluster > > > /dms/final_archive - Is in split-brain > > Status: Connected > Number of entries: 3 > > Brick storage4:/data/data-cluster > > /dms/final_archive - Is in split-brain > > Status: Connected > Number of entries: 2 > > Milos > >> On 21 Mar 2019, at 09:07, Milos Cuculovic > wrote: >> >> Since 24h, after upgrading from 4.0 to 4.1.7 one of the servers, the heal shows this: >> >> sudo gluster volume heal storage2 info >> Brick storage3:/data/data-cluster >> >> >> /dms/final_archive - Possibly undergoing heal >> >> Status: Connected >> Number of entries: 3 >> >> Brick storage4:/data/data-cluster >> >> /dms/final_archive - Possibly undergoing heal >> >> Status: Connected >> Number of entries: 2 >> >> The same files stay there. From time to time the status of the /dms/final_archive is in split brain at the following command shows: >> >> sudo gluster volume heal storage2 info split-brain >> Brick storage3:/data/data-cluster >> /dms/final_archive >> Status: Connected >> Number of entries in split-brain: 1 >> >> Brick storage4:/data/data-cluster >> /dms/final_archive >> Status: Connected >> Number of entries in split-brain: 1 >> >> How to know the file who is in split brain? The files in /dms/final_archive are not very important, fine to remove (ideally resolve the split brain) for the ones that differ. >> >> I can only see the directory and GFID. 
Any idea on how to resolve this situation as I would like to continue with the upgrade on the 2nd server, and for this the heal needs to be done with 0 entries in sudo gluster volume heal storage2 info >> >> Thank you in advance, Milos. > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From mauro.tridici at cmcc.it Thu Mar 21 10:40:02 2019 From: mauro.tridici at cmcc.it (Mauro Tridici) Date: Thu, 21 Mar 2019 11:40:02 +0100 Subject: [Gluster-users] "rpc_clnt_ping_timer_expired" errors In-Reply-To: References: <96B07283-D8AB-4F06-909D-E00424625528@cmcc.it> <42758A0E-8BE9-497D-BDE3-55D7DC633BC7@cmcc.it> <6A8CF4A4-98EA-48C3-A059-D60D1B2721C7@cmcc.it> <2CF49168-9C1B-4931-8C34-8157262A137A@cmcc.it> <7A151CC9-A0AE-4A45-B450-A4063D216D9E@cmcc.it> <32D53ECE-3F49-4415-A6EE-241B351AC2BA@cmcc.it> <4685A75B-5978-4338-9C9F-4A02FB40B9BC@cmcc.it> <4D2E6B04-C2E8-4FD5-B72D-E7C05931C6F9@cmcc.it> <4E332A56-B318-4BC1-9F44-84AB4392A5DE@cmcc.it> <832FD362-3B14-40D8-8530-604419300476@cmcc.it> <8D926643-1093-48ED-823F-D8F117F702CF@cmcc.it> <9D0BE438-DF11-4D0A-AB85-B44357032F29@cmcc.it> <5F0AC378-8170-4342-8473-9C17159CAC1D@cmcc.it> <7A50E86D-9E27-4EA7-883B-45E9F973991A@cmcc.it> <58B5DB7F-DCB4-4FBF-B1F7-681315B1613A@cmcc.it> <6327B44F-4E7E-46BB-A74C-70F4457DD1EB@cmcc.it> <0167DF4A-8329-4A1A-B439-857DFA6F78BB@cmcc.it> <763F334E-35B8-4729-B8E1-D30866754EEE@cmcc.it> <91DFC9AC-4805-41EB-AC6F-5722E018DD6E@cmcc.it> <8A9752B8-B231-4570-8FF4-8D3D781E7D42@cmcc.it> <47A24804-F975-4EE6-9FA5-67FCDA18D707@cmcc.it> <637FF5D2-D1F4-4686-9D48-646A96F67B96@cmcc.it> <4A87495F-3755-48F7-8507-085869069C64@cmcc.it> <3854BBF6-5B98-4AB3-A67E-E7DE59E69A63@cmcc.it> <313DA021-9173-4899-96B0-831B10B00B61@cmcc.it> <17996AFD-DFC8-40F3-9D09-DB6DDAD5B7D6@cmcc.it> <7074B5D8-955A-4802-A9F3-606C99734417@cmcc.it> 
<83B84BF9-8334-4230-B6F8-0BC4BFBEFF15@cmcc.it> <133B9AE4-9792-4F72-AD91-D36E7B9EC711@cmcc.it> <6611C4B0-57FD-4390-88B5-BD373555D4C5@cmcc.it> Message-ID: <93130243-E356-4425-8F15-69BE61562E2F@cmcc.it> Hi Raghavendra, the number of errors reduced, but during last days I received some error notifications from Nagios server similar to the following one: ***** Nagios ***** Notification Type: PROBLEM Service: Brick - /gluster/mnt5/brick Host: s04 Address: s04-stg State: CRITICAL Date/Time: Mon Mar 18 19:56:36 CET 2019 Additional Info: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. The error was related only to s04 gluster server. So, following your suggestions, I executed, on s04 node, the top command. In attachment, you can find the related output. Thank you very much for your help. Regards, Mauro > On 14 Mar 2019, at 13:31, Raghavendra Gowdappa wrote: > > Thanks Mauro. > > On Thu, Mar 14, 2019 at 3:38 PM Mauro Tridici > wrote: > Hi Raghavendra, > > I just changed the client option value to 8. > I will check the volume behaviour during the next hours. > > The GlusterFS version is 3.12.14. > > I will provide you the logs as soon as the activity load will be high. > Thank you, > Mauro > >> On 14 Mar 2019, at 04:57, Raghavendra Gowdappa > wrote: >> >> >> >> On Wed, Mar 13, 2019 at 3:55 PM Mauro Tridici > wrote: >> Hi Raghavendra, >> >> Yes, server.event-thread has been changed from 4 to 8. >> >> Was client.event-thread value too changed to 8? If not, I would like to know the results of including this tuning too. Also, if possible, can you get the output of following command from problematic clients and bricks (during the duration when load tends to be high and ping-timer-expiry is seen)? >> >> # top -bHd 3 >> >> This will help us to know CPU utilization of event-threads. >> >> And I forgot to ask, what version of Glusterfs are you using? >> >> During last days, I noticed that the error events are still here although they have been considerably reduced. 
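The event-thread tuning and the per-thread CPU sampling discussed above can be sketched as follows, assuming the volume is named "tier2" as in these logs; the 1-32 range used by the illustrative helper reflects my understanding of the option's allowed values and is an assumption:

```shell
# Sketch of the tuning suggested in this thread (assumption: volume "tier2").
VOL=tier2
THREADS=8

# Illustrative sanity check on the value being tuned (assumed 1-32 range).
valid_event_threads() {
    [ "$1" -ge 1 ] && [ "$1" -le 32 ]
}

if command -v gluster >/dev/null && valid_event_threads "$THREADS"; then
    # Raise both server- and client-side epoll/event threads.
    gluster volume set "$VOL" server.event-threads "$THREADS"
    gluster volume set "$VOL" client.event-threads "$THREADS"

    # While the load is high, sample per-thread CPU usage in batch mode
    # (3-second interval) to see whether the event threads are saturated.
    top -bHd 3 -n 2 > /tmp/top-threads.out
fi
```

Collecting the `top -bHd 3` output on both the problematic clients and the bricks during a ping-timer-expiry window is what allows correlating busy event threads with the disconnects.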
>> >> So, I used grep command against the log files in order to provide you a global vision about the warning, error and critical events appeared today at 06:xx (may be useful I hope). >> I collected the info from s06 gluster server, but the behaviour is the the almost the same on the other gluster servers. >> >> ERRORS: >> CWD: /var/log/glusterfs >> COMMAND: grep " E " *.log |grep "2019-03-13 06:" >> >> (I can see a lot of this kind of message in the same period but I'm notifying you only one record for each type of error) >> >> glusterd.log:[2019-03-13 06:12:35.982863] E [MSGID: 101042] [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of /var/run/gluster/tier2_quota_list/ >> >> glustershd.log:[2019-03-13 06:14:28.666562] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f4a71ddcebb] (--> /lib64/libgfr >> pc.so.0(saved_frames_unwind+0x1de)[0x7f4a71ba1d9e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f4a71ba1ebe] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup >> +0x90)[0x7f4a71ba3640] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f4a71ba4130] ))))) 0-tier2-client-55: forced unwinding frame type(GlusterFS 3.3) op(INODELK(29)) >> called at 2019-03-13 06:14:14.858441 (xid=0x17fddb50) >> >> glustershd.log:[2019-03-13 06:17:48.883825] E [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to 192.168.0.55:49158 failed (Connection timed out); disco >> nnecting socket >> glustershd.log:[2019-03-13 06:19:58.931798] E [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to 192.168.0.55:49158 failed (Connection timed out); disco >> nnecting socket >> glustershd.log:[2019-03-13 06:22:08.979829] E [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to 192.168.0.55:49158 failed (Connection timed out); disco >> nnecting socket >> glustershd.log:[2019-03-13 06:22:36.226847] E [MSGID: 114031] [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: 
remote operation failed [Transport endpoint >> is not connected] >> glustershd.log:[2019-03-13 06:22:36.306669] E [MSGID: 114031] [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote operation failed [Transport endpoint >> is not connected] >> glustershd.log:[2019-03-13 06:22:36.385257] E [MSGID: 114031] [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote operation failed [Transport endpoint >> is not connected] >> >> WARNINGS: >> CWD: /var/log/glusterfs >> COMMAND: grep " W " *.log |grep "2019-03-13 06:" >> >> (I can see a lot of this kind of message in the same period but I'm notifying you only one record for each type of warnings) >> >> glustershd.log:[2019-03-13 06:14:28.666772] W [MSGID: 114031] [client-rpc-fops.c:1080:client3_3_getxattr_cbk] 0-tier2-client-55: remote operation failed. Path: > 0f-f34d-4c25-bbe8-74bde0248d7e> (b6b35d0f-f34d-4c25-bbe8-74bde0248d7e). Key: (null) [Transport endpoint is not connected] >> >> glustershd.log:[2019-03-13 06:14:31.421576] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (2) >> >> glustershd.log:[2019-03-13 06:15:31.547417] W [MSGID: 122032] [ec-heald.c:266:ec_shd_index_sweep] 0-tier2-disperse-9: unable to get index-dir on tier2-client-55 [Operation >> now in progress] >> >> quota-mount-tier2.log:[2019-03-13 06:12:36.116277] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-glusterfs: option 'address-family' is deprecated, preferred is 'trans >> port.address-family', continuing with correction >> quota-mount-tier2.log:[2019-03-13 06:12:36.198430] W [MSGID: 101174] [graph.c:363:_log_if_unknown_option] 0-tier2-readdir-ahead: option 'parallel-readdir' is not recognized >> quota-mount-tier2.log:[2019-03-13 06:12:37.945007] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f340892be25] -->/usr/sbin/glusterfs(gluste >> rfs_sigwaiter+0xe5) [0x55ef010164b5] 
-->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55ef0101632b] ) 0-: received signum (15), shutting down >> >> CRITICALS: >> CWD: /var/log/glusterfs >> COMMAND: grep " C " *.log |grep "2019-03-13 06:" >> >> no critical errors at 06:xx >> only one critical error during the day >> >> [root at s06 glusterfs]# grep " C " *.log |grep "2019-03-13" >> glustershd.log:[2019-03-13 02:21:29.126279] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-55: server 192.168.0.55:49158 has not responded in the last 42 seconds, disconnecting. >> >> >> Thank you very much for your help. >> Regards, >> Mauro >> >>> On 12 Mar 2019, at 05:17, Raghavendra Gowdappa > wrote: >>> >>> Was the suggestion to increase server.event-thread values tried? If yes, what were the results? >>> >>> On Mon, Mar 11, 2019 at 2:40 PM Mauro Tridici > wrote: >>> Dear All, >>> >>> do you have any suggestions about the right way to "debug" this issue? >>> In attachment, the updated logs of ?s06" gluster server. >>> >>> I noticed a lot of intermittent warning and error messages. >>> >>> Thank you in advance, >>> Mauro >>> >>> >>> >>>> On 4 Mar 2019, at 18:45, Raghavendra Gowdappa > wrote: >>>> >>>> >>>> +Gluster Devel , +Gluster-users >>>> >>>> I would like to point out another issue. Even if what I suggested prevents disconnects, part of the solution would be only symptomatic treatment and doesn't address the root cause of the problem. In most of the ping-timer-expiry issues, the root cause is the increased load on bricks and the inability of bricks to be responsive under high load. So, the actual solution would be doing any or both of the following: >>>> * identify the source of increased load and if possible throttle it. Internal heal processes like self-heal, rebalance, quota heal are known to pump traffic into bricks without much throttling (io-threads _might_ do some throttling, but my understanding is its not sufficient). 
>>>> * identify the reason for bricks to become unresponsive during load. This may be fixable issues like not enough event-threads to read from network or difficult to fix issues like fsync on backend fs freezing the process or semi fixable issues (in code) like lock contention. >>>> >>>> So any genuine effort to fix ping-timer-issues (to be honest most of the times they are not issues related to rpc/network) would involve performance characterization of various subsystems on bricks and clients. Various subsystems can include (but not necessarily limited to), underlying OS/filesystem, glusterfs processes, CPU consumption etc >>>> >>>> regards, >>>> Raghavendra >>>> >>>> On Mon, Mar 4, 2019 at 9:31 PM Mauro Tridici > wrote: >>>> Thank you, let?s try! >>>> I will inform you about the effects of the change. >>>> >>>> Regards, >>>> Mauro >>>> >>>>> On 4 Mar 2019, at 16:55, Raghavendra Gowdappa > wrote: >>>>> >>>>> >>>>> >>>>> On Mon, Mar 4, 2019 at 8:54 PM Mauro Tridici > wrote: >>>>> Hi Raghavendra, >>>>> >>>>> thank you for your reply. >>>>> Yes, you are right. It is a problem that seems to happen randomly. >>>>> At this moment, server.event-threads value is 4. I will try to increase this value to 8. Do you think that it could be a valid value ? >>>>> >>>>> Yes. We can try with that. You should see at least frequency of ping-timer related disconnects reduce with this value (even if it doesn't eliminate the problem completely). >>>>> >>>>> >>>>> Regards, >>>>> Mauro >>>>> >>>>> >>>>>> On 4 Mar 2019, at 15:36, Raghavendra Gowdappa > wrote: >>>>>> >>>>>> >>>>>> >>>>>> On Mon, Mar 4, 2019 at 8:01 PM Nithya Balachandran > wrote: >>>>>> Hi Mauro, >>>>>> >>>>>> It looks like some problem on s06. Are all your other nodes ok? Can you send us the gluster logs from this node? >>>>>> >>>>>> @Raghavendra G , do you have any idea as to how this can be debugged? Maybe running top ? Or debug brick logs? 
>>>>>> >>>>>> If we can reproduce the problem, collecting tcpdump on both ends of connection will help. But, one common problem is these bugs are inconsistently reproducible and hence we may not be able to capture tcpdump at correct intervals. Other than that, we can try to collect some evidence that poller threads were busy (waiting on locks). But, not sure what debug data provides that information. >>>>>> >>>>>> From what I know, its difficult to collect evidence for this issue and we could only reason about it. >>>>>> >>>>>> We can try a workaround though - try increasing server.event-threads and see whether ping-timer expiry issues go away with an optimal value. If that's the case, it kind of provides proof for our hypothesis. >>>>>> >>>>>> >>>>>> >>>>>> Regards, >>>>>> Nithya >>>>>> >>>>>> On Mon, 4 Mar 2019 at 15:25, Mauro Tridici > wrote: >>>>>> Hi All, >>>>>> >>>>>> some minutes ago I received this message from NAGIOS server >>>>>> >>>>>> ***** Nagios ***** >>>>>> >>>>>> Notification Type: PROBLEM >>>>>> >>>>>> Service: Brick - /gluster/mnt2/brick >>>>>> Host: s06 >>>>>> Address: s06-stg >>>>>> State: CRITICAL >>>>>> >>>>>> Date/Time: Mon Mar 4 10:25:33 CET 2019 >>>>>> >>>>>> Additional Info: >>>>>> CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. >>>>>> >>>>>> I checked the network, RAM and CPUs usage on s06 node and everything seems to be ok. >>>>>> No bricks are in error state. In /var/log/messages, I detected again a crash of ?check_vol_utili? that I think it is a module used by NRPE executable (that is the NAGIOS client). 
>>>>>> >>>>>> Mar 4 10:15:29 s06 kernel: traps: check_vol_utili[161224] general protection ip:7facffa0a66d sp:7ffe9f4e6fc0 error:0 in libglusterfs.so.0.0.1[7facff9b7000+f7000] >>>>>> Mar 4 10:15:29 s06 abrt-hook-ccpp: Process 161224 (python2.7) of user 0 killed by SIGSEGV - dumping core >>>>>> Mar 4 10:15:29 s06 abrt-server: Generating core_backtrace >>>>>> Mar 4 10:15:29 s06 abrt-server: Error: Unable to open './coredump': No such file or directory >>>>>> Mar 4 10:16:01 s06 systemd: Created slice User Slice of root. >>>>>> Mar 4 10:16:01 s06 systemd: Starting User Slice of root. >>>>>> Mar 4 10:16:01 s06 systemd: Started Session 201010 of user root. >>>>>> Mar 4 10:16:01 s06 systemd: Starting Session 201010 of user root. >>>>>> Mar 4 10:16:01 s06 systemd: Removed slice User Slice of root. >>>>>> Mar 4 10:16:01 s06 systemd: Stopping User Slice of root. >>>>>> Mar 4 10:16:24 s06 abrt-server: Duplicate: UUID >>>>>> Mar 4 10:16:24 s06 abrt-server: DUP_OF_DIR: /var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041 >>>>>> Mar 4 10:16:24 s06 abrt-server: Deleting problem directory ccpp-2019-03-04-10:15:29-161224 (dup of ccpp-2018-09-25-12:27:42-13041) >>>>>> Mar 4 10:16:24 s06 abrt-server: Generating core_backtrace >>>>>> Mar 4 10:16:24 s06 abrt-server: Error: Unable to open './coredump': No such file or directory >>>>>> Mar 4 10:16:24 s06 abrt-server: Cannot notify '/var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041' via uReport: Event 'report_uReport' exited with 1 >>>>>> Mar 4 10:16:24 s06 abrt-hook-ccpp: Process 161391 (python2.7) of user 0 killed by SIGABRT - dumping core >>>>>> Mar 4 10:16:25 s06 abrt-server: Generating core_backtrace >>>>>> Mar 4 10:16:25 s06 abrt-server: Error: Unable to open './coredump': No such file or directory >>>>>> Mar 4 10:17:01 s06 systemd: Created slice User Slice of root. 
>>>>>> >>>>>> Also, I noticed the following errors that I think are very critical: >>>>>> >>>>>> Mar 4 10:21:12 s06 glustershd[20355]: [2019-03-04 09:21:12.954798] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-55: server 192.168.0.55:49158 has not responded in the last 42 seconds, disconnecting. >>>>>> Mar 4 10:22:01 s06 systemd: Created slice User Slice of root. >>>>>> Mar 4 10:22:01 s06 systemd: Starting User Slice of root. >>>>>> Mar 4 10:22:01 s06 systemd: Started Session 201017 of user root. >>>>>> Mar 4 10:22:01 s06 systemd: Starting Session 201017 of user root. >>>>>> Mar 4 10:22:01 s06 systemd: Removed slice User Slice of root. >>>>>> Mar 4 10:22:01 s06 systemd: Stopping User Slice of root. >>>>>> Mar 4 10:22:03 s06 glustershd[20355]: [2019-03-04 09:22:03.964120] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-60: server 192.168.0.54:49165 has not responded in the last 42 seconds, disconnecting. >>>>>> Mar 4 10:23:01 s06 systemd: Created slice User Slice of root. >>>>>> Mar 4 10:23:01 s06 systemd: Starting User Slice of root. >>>>>> Mar 4 10:23:01 s06 systemd: Started Session 201018 of user root. >>>>>> Mar 4 10:23:01 s06 systemd: Starting Session 201018 of user root. >>>>>> Mar 4 10:23:02 s06 systemd: Removed slice User Slice of root. >>>>>> Mar 4 10:23:02 s06 systemd: Stopping User Slice of root. >>>>>> Mar 4 10:24:01 s06 systemd: Created slice User Slice of root. >>>>>> Mar 4 10:24:01 s06 systemd: Starting User Slice of root. >>>>>> Mar 4 10:24:01 s06 systemd: Started Session 201019 of user root. >>>>>> Mar 4 10:24:01 s06 systemd: Starting Session 201019 of user root. >>>>>> Mar 4 10:24:01 s06 systemd: Removed slice User Slice of root. >>>>>> Mar 4 10:24:01 s06 systemd: Stopping User Slice of root. 
>>>>>> Mar 4 10:24:03 s06 glustershd[20355]: [2019-03-04 09:24:03.982502] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-16: server 192.168.0.52:49158 has not responded in the last 42 seconds, disconnecting. >>>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746109] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-3: server 192.168.0.51:49153 has not responded in the last 42 seconds, disconnecting. >>>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746215] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-10: server 192.168.0.52:49156 has not responded in the last 42 seconds, disconnecting. >>>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746260] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-21: server 192.168.0.51:49159 has not responded in the last 42 seconds, disconnecting. >>>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746296] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-25: server 192.168.0.52:49161 has not responded in the last 42 seconds, disconnecting. >>>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746413] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-60: server 192.168.0.54:49165 has not responded in the last 42 seconds, disconnecting. >>>>>> Mar 4 10:24:07 s06 glustershd[20355]: [2019-03-04 09:24:07.982952] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-45: server 192.168.0.54:49155 has not responded in the last 42 seconds, disconnecting. >>>>>> Mar 4 10:24:18 s06 glustershd[20355]: [2019-03-04 09:24:18.990929] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-25: server 192.168.0.52:49161 has not responded in the last 42 seconds, disconnecting. 
>>>>>> Mar 4 10:24:31 s06 glustershd[20355]: [2019-03-04 09:24:31.995781] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-20: server 192.168.0.53:49159 has not responded in the last 42 seconds, disconnecting. >>>>>> Mar 4 10:25:01 s06 systemd: Created slice User Slice of root. >>>>>> Mar 4 10:25:01 s06 systemd: Starting User Slice of root. >>>>>> Mar 4 10:25:01 s06 systemd: Started Session 201020 of user root. >>>>>> Mar 4 10:25:01 s06 systemd: Starting Session 201020 of user root. >>>>>> Mar 4 10:25:01 s06 systemd: Removed slice User Slice of root. >>>>>> Mar 4 10:25:01 s06 systemd: Stopping User Slice of root. >>>>>> Mar 4 10:25:57 s06 systemd: Created slice User Slice of root. >>>>>> Mar 4 10:25:57 s06 systemd: Starting User Slice of root. >>>>>> Mar 4 10:25:57 s06 systemd-logind: New session 201021 of user root. >>>>>> Mar 4 10:25:57 s06 systemd: Started Session 201021 of user root. >>>>>> Mar 4 10:25:57 s06 systemd: Starting Session 201021 of user root. >>>>>> Mar 4 10:26:01 s06 systemd: Started Session 201022 of user root. >>>>>> Mar 4 10:26:01 s06 systemd: Starting Session 201022 of user root. >>>>>> Mar 4 10:26:21 s06 nrpe[162388]: Error: Could not complete SSL handshake with 192.168.1.56 : 5 >>>>>> Mar 4 10:27:01 s06 systemd: Started Session 201023 of user root. >>>>>> Mar 4 10:27:01 s06 systemd: Starting Session 201023 of user root. >>>>>> Mar 4 10:28:01 s06 systemd: Started Session 201024 of user root. >>>>>> Mar 4 10:28:01 s06 systemd: Starting Session 201024 of user root. >>>>>> Mar 4 10:29:01 s06 systemd: Started Session 201025 of user root. >>>>>> Mar 4 10:29:01 s06 systemd: Starting Session 201025 of user root. >>>>>> >>>>>> But, unfortunately, I don't understand why it is happening.
>>>>>> Now, NAGIOS server shows that s06 status is ok: >>>>>> >>>>>> ***** Nagios ***** >>>>>> >>>>>> Notification Type: RECOVERY >>>>>> >>>>>> Service: Brick - /gluster/mnt2/brick >>>>>> Host: s06 >>>>>> Address: s06-stg >>>>>> State: OK >>>>>> >>>>>> Date/Time: Mon Mar 4 10:35:23 CET 2019 >>>>>> >>>>>> Additional Info: >>>>>> OK: Brick /gluster/mnt2/brick is up >>>>>> >>>>>> Nothing is changed from RAM, CPUs, and NETWORK point of view. >>>>>> /var/log/message file has been updated: >>>>>> >>>>>> Mar 4 10:32:01 s06 systemd: Starting Session 201029 of user root. >>>>>> Mar 4 10:32:30 s06 glustershd[20355]: [2019-03-04 09:32:30.069082] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-10: server 192.168.0.52:49156 has not responded in the last 42 seconds, disconnecting. >>>>>> Mar 4 10:32:55 s06 glustershd[20355]: [2019-03-04 09:32:55.074689] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-66: server 192.168.0.54:49167 has not responded in the last 42 seconds, disconnecting. >>>>>> Mar 4 10:33:01 s06 systemd: Started Session 201030 of user root. >>>>>> Mar 4 10:33:01 s06 systemd: Starting Session 201030 of user root. >>>>>> Mar 4 10:34:01 s06 systemd: Started Session 201031 of user root. >>>>>> Mar 4 10:34:01 s06 systemd: Starting Session 201031 of user root. >>>>>> Mar 4 10:35:01 s06 nrpe[162562]: Could not read request from client 192.168.1.56, bailing out... >>>>>> Mar 4 10:35:01 s06 nrpe[162562]: INFO: SSL Socket Shutdown. >>>>>> Mar 4 10:35:01 s06 systemd: Started Session 201032 of user root. >>>>>> Mar 4 10:35:01 s06 systemd: Starting Session 201032 of user root. >>>>>> >>>>>> Could you please help me to understand what is happening? >>>>>> Thank you in advance. >>>>>> >>>>>> Regards, >>>>>> Mauro >>>>>> >>>>>> >>>>>>> On 1 Mar 2019, at 12:17, Mauro Tridici > wrote: >>>>>>> >>>>>>> >>>>>>> Thank you, Milind. >>>>>>> I executed the instructions you suggested: >>>>>>> >>>>>>> - grep "blocked for"
/var/log/messages on s06 returns no output (no "blocked" word is detected in messages file); >>>>>>> - in /var/log/messages file I can see this kind of error repeated many times: >>>>>>> >>>>>>> Mar 1 08:43:01 s06 systemd: Starting Session 196071 of user root. >>>>>>> Mar 1 08:43:01 s06 systemd: Removed slice User Slice of root. >>>>>>> Mar 1 08:43:01 s06 systemd: Stopping User Slice of root. >>>>>>> Mar 1 08:43:02 s06 kernel: traps: check_vol_utili[57091] general protection ip:7f88e76ee66d sp:7ffe5a5bcc30 error:0 in libglusterfs.so.0.0.1[7f88e769b000+f7000] >>>>>>> Mar 1 08:43:02 s06 abrt-hook-ccpp: Process 57091 (python2.7) of user 0 killed by SIGSEGV - dumping core >>>>>>> Mar 1 08:43:02 s06 abrt-server: Generating core_backtrace >>>>>>> Mar 1 08:43:02 s06 abrt-server: Error: Unable to open './coredump': No such file or directory >>>>>>> Mar 1 08:43:58 s06 abrt-server: Duplicate: UUID >>>>>>> Mar 1 08:43:58 s06 abrt-server: DUP_OF_DIR: /var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041 >>>>>>> Mar 1 08:43:58 s06 abrt-server: Deleting problem directory ccpp-2019-03-01-08:43:02-57091 (dup of ccpp-2018-09-25-12:27:42-13041) >>>>>>> Mar 1 08:43:58 s06 dbus[1872]: [system] Activating service name='org.freedesktop.problems' (using servicehelper) >>>>>>> Mar 1 08:43:58 s06 dbus[1872]: [system] Successfully activated service 'org.freedesktop.problems' >>>>>>> Mar 1 08:43:58 s06 abrt-server: Generating core_backtrace >>>>>>> Mar 1 08:43:58 s06 abrt-server: Error: Unable to open './coredump': No such file or directory >>>>>>> Mar 1 08:43:58 s06 abrt-server: Cannot notify '/var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041' via uReport: Event 'report_uReport' exited with 1 >>>>>>> Mar 1 08:44:01 s06 systemd: Created slice User Slice of root. >>>>>>> Mar 1 08:44:01 s06 systemd: Starting User Slice of root. >>>>>>> Mar 1 08:44:01 s06 systemd: Started Session 196072 of user root. >>>>>>> Mar 1 08:44:01 s06 systemd: Starting Session 196072 of user root.
>>>>>>> Mar 1 08:44:01 s06 systemd: Removed slice User Slice of root. >>>>>>> >>>>>>> - in /var/log/messages file I can see also 4 errors related to other cluster servers: >>>>>>> >>>>>>> Mar 1 11:05:01 s06 systemd: Starting User Slice of root. >>>>>>> Mar 1 11:05:01 s06 systemd: Started Session 196230 of user root. >>>>>>> Mar 1 11:05:01 s06 systemd: Starting Session 196230 of user root. >>>>>>> Mar 1 11:05:01 s06 systemd: Removed slice User Slice of root. >>>>>>> Mar 1 11:05:01 s06 systemd: Stopping User Slice of root. >>>>>>> Mar 1 11:05:59 s06 glustershd[70117]: [2019-03-01 10:05:59.347094] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-33: server 192.168.0.51:49163 has not responded in the last 42 seconds, disconnecting. >>>>>>> Mar 1 11:06:01 s06 systemd: Created slice User Slice of root. >>>>>>> Mar 1 11:06:01 s06 systemd: Starting User Slice of root. >>>>>>> Mar 1 11:06:01 s06 systemd: Started Session 196231 of user root. >>>>>>> Mar 1 11:06:01 s06 systemd: Starting Session 196231 of user root. >>>>>>> Mar 1 11:06:01 s06 systemd: Removed slice User Slice of root. >>>>>>> Mar 1 11:06:01 s06 systemd: Stopping User Slice of root. >>>>>>> Mar 1 11:06:12 s06 glustershd[70117]: [2019-03-01 10:06:12.351319] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-1: server 192.168.0.52:49153 has not responded in the last 42 seconds, disconnecting. >>>>>>> Mar 1 11:06:38 s06 glustershd[70117]: [2019-03-01 10:06:38.356920] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-7: server 192.168.0.52:49155 has not responded in the last 42 seconds, disconnecting. >>>>>>> Mar 1 11:07:01 s06 systemd: Created slice User Slice of root. >>>>>>> Mar 1 11:07:01 s06 systemd: Starting User Slice of root. >>>>>>> Mar 1 11:07:01 s06 systemd: Started Session 196232 of user root. >>>>>>> Mar 1 11:07:01 s06 systemd: Starting Session 196232 of user root. >>>>>>> Mar 1 11:07:01 s06 systemd: Removed slice User Slice of root. 
>>>>>>> Mar 1 11:07:01 s06 systemd: Stopping User Slice of root. >>>>>>> Mar 1 11:07:36 s06 glustershd[70117]: [2019-03-01 10:07:36.366259] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-0: server 192.168.0.51:49152 has not responded in the last 42 seconds, disconnecting. >>>>>>> Mar 1 11:08:01 s06 systemd: Created slice User Slice of root. >>>>>>> >>>>>>> No "blocked" word is in /var/log/messages files on other cluster servers. >>>>>>> In attachment, the /var/log/messages file from s06 server. >>>>>>> >>>>>>> Thank you in advance, >>>>>>> Mauro >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> On 1 Mar 2019, at 11:47, Milind Changire > wrote: >>>>>>>> >>>>>>>> The traces of very high disk activity on the servers are often found in /var/log/messages. >>>>>>>> You might want to grep for "blocked for" in /var/log/messages on s06 and correlate the timestamps to confirm the unresponsiveness as reported in gluster client logs. >>>>>>>> In cases of high disk activity, although the operating system continues to respond to ICMP pings, the processes writing to disks often get blocked by a large flush to the disk, which could span beyond 42 seconds and hence result in ping-timer-expiry logs. >>>>>>>> >>>>>>>> As a side note: >>>>>>>> If you indeed find gluster processes being blocked in /var/log/messages, you might want to tweak sysctl tunables called vm.dirty_background_ratio or vm.dirty_background_bytes to a smaller value than the existing one. Please read up more on those tunables before touching the settings. >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Mar 1, 2019 at 4:06 PM Mauro Tridici > wrote: >>>>>>>> >>>>>>>> Hi all, >>>>>>>> >>>>>>>> in attachment the client log captured after changing network.ping-timeout option.
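Milind's suggestion above, grepping /var/log/messages for "blocked for" and correlating the timestamps with the ping-timer expiries in the gluster logs, can be scripted. A minimal sketch (the sample lines below are illustrative, not taken from the attached logs):

```python
# Patterns worth correlating: kernel hung-task warnings in /var/log/messages
# and ping-timer expiries in the gluster client/shd logs.
PATTERNS = ("blocked for", "rpc_clnt_ping_timer_expired")

def matching_lines(lines):
    """Return the log lines mentioning either pattern, preserving input order."""
    return [ln for ln in lines if any(p in ln for p in PATTERNS)]

sample = [
    "Mar  1 11:05:59 s06 kernel: INFO: task glusterfsd:1234 blocked for more than 120 seconds.",
    "Mar  1 11:06:01 s06 systemd: Started Session 196231 of user root.",
    "Mar  1 11:06:12 s06 glustershd[70117]: ... rpc_clnt_ping_timer_expired ... disconnecting.",
]

for line in matching_lines(sample):
    # Print only the syslog timestamp so the two event streams can be lined up.
    print(line.split(" s06 ")[0])
```

If the hung-task timestamps sit inside the 42-second windows before the expiries, that supports the disk-flush explanation.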
>>>>>>>> I noticed this error involving server 192.168.0.56 (s06) >>>>>>>> >>>>>>>> [2019-03-01 09:23:36.077287] I [rpc-clnt.c:1962:rpc_clnt_reconfig] 0-tier2-client-71: changing ping timeout to 42 (from 0) >>>>>>>> [2019-03-01 09:23:36.078213] I [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing >>>>>>>> [2019-03-01 09:23:36.078432] I [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing >>>>>>>> [2019-03-01 09:23:36.092357] I [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing >>>>>>>> [2019-03-01 09:23:36.094146] I [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing >>>>>>>> [2019-03-01 10:06:24.708082] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-50: server 192.168.0.56:49156 has not responded in the last 42 seconds, disconnecting. >>>>>>>> >>>>>>>> I don't know why it happens, s06 server seems to be reachable. >>>>>>>> >>>>>>>> [athena_login2][/users/home/sysm02/]> telnet 192.168.0.56 49156 >>>>>>>> Trying 192.168.0.56... >>>>>>>> Connected to 192.168.0.56. >>>>>>>> Escape character is '^]'. >>>>>>>> ^CConnection closed by foreign host. >>>>>>>> [athena_login2][/users/home/sysm02/]> ping 192.168.0.56 >>>>>>>> PING 192.168.0.56 (192.168.0.56) 56(84) bytes of data. >>>>>>>> 64 bytes from 192.168.0.56 : icmp_seq=1 ttl=64 time=0.116 ms >>>>>>>> 64 bytes from 192.168.0.56 : icmp_seq=2 ttl=64 time=0.101 ms >>>>>>>> >>>>>>>> --- 192.168.0.56 ping statistics --- >>>>>>>> 2 packets transmitted, 2 received, 0% packet loss, time 1528ms >>>>>>>> rtt min/avg/max/mdev = 0.101/0.108/0.116/0.012 ms >>>>>>>> >>>>>>>> [athena_login2][/users/home/sysm02/]> telnet 192.168.0.56 49156 >>>>>>>> Trying 192.168.0.56... >>>>>>>> Connected to 192.168.0.56. >>>>>>>> Escape character is '^]'.
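Manual probes like the telnet/ping session above can be automated. A self-contained sketch in generic Python (not a Gluster tool; the brick host/port would be values such as 192.168.0.56:49156, but the demo below uses a local listener so it runs anywhere):

```python
import socket

def brick_port_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Open a throwaway local listener to stand in for a brick port.
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    srv.listen(1)
    host, port = srv.getsockname()
    print(brick_port_reachable(host, port))   # True: the port accepts connections
    srv.close()
```

Run periodically against every brick endpoint, this distinguishes "host answers ICMP but the brick port is unresponsive" from a genuine network cut.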
>>>>>>>> >>>>>>>> Thank you for your help, >>>>>>>> Mauro >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> On 1 Mar 2019, at 10:29, Mauro Tridici > wrote: >>>>>>>>> >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> thank you for the explanation. >>>>>>>>> I just changed network.ping-timeout option to default value (network.ping-timeout=42). >>>>>>>>> >>>>>>>>> I will check the logs to see if the errors will appear again. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Mauro >>>>>>>>> >>>>>>>>>> On 1 Mar 2019, at 04:43, Milind Changire > wrote: >>>>>>>>>> >>>>>>>>>> network.ping-timeout should not be set to zero for non-glusterd clients. >>>>>>>>>> glusterd is a special case for which ping-timeout is set to zero via /etc/glusterfs/glusterd.vol >>>>>>>>>> >>>>>>>>>> Setting network.ping-timeout to zero disables arming of the ping timer for connections. This disables testing the connection for responsiveness and hence avoids proactive fail-over. >>>>>>>>>> >>>>>>>>>> Please reset network.ping-timeout to a non-zero positive value, eg. 42 >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, Feb 28, 2019 at 5:07 PM Nithya Balachandran > wrote: >>>>>>>>>> Adding Raghavendra and Milind to comment on this. >>>>>>>>>> >>>>>>>>>> What is the effect of setting network.ping-timeout to 0 and should it be set back to 42? >>>>>>>>>> Regards, >>>>>>>>>> Nithya >>>>>>>>>> >>>>>>>>>> On Thu, 28 Feb 2019 at 16:01, Mauro Tridici > wrote: >>>>>>>>>> Hi Nithya, >>>>>>>>>> >>>>>>>>>> sorry for the late reply. >>>>>>>>>> network.ping-timeout has been set to 0 in order to try to solve some timeout problems, but it didn't help. >>>>>>>>>> I can set it to the default value. >>>>>>>>>> >>>>>>>>>> Can I proceed with the change? >>>>>>>>>> >>>>>>>>>> Thank you, >>>>>>>>>> Mauro >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On 28 Feb 2019, at 04:41, Nithya Balachandran > wrote: >>>>>>>>>>> >>>>>>>>>>> Hi Mauro, >>>>>>>>>>> >>>>>>>>>>> Is network.ping-timeout still set to 0? The default value is 42. Is there a particular reason why this was changed?
>>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Nithya >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, 27 Feb 2019 at 21:32, Mauro Tridici > wrote: >>>>>>>>>>> >>>>>>>>>>> Hi Xavi, >>>>>>>>>>> >>>>>>>>>>> thank you for the detailed explanation and suggestions. >>>>>>>>>>> Yes, transport.listen-backlog option is still set to 1024. >>>>>>>>>>> >>>>>>>>>>> I will check the network and connectivity status using "ping" and "telnet" as soon as the errors come back again. >>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Mauro >>>>>>>>>>>> On 27 Feb 2019, at 16:42, Xavi Hernandez > wrote: >>>>>>>>>>>> >>>>>>>>>>>> Hi Mauro, >>>>>>>>>>>> >>>>>>>>>>>> those errors say that the mount point is not connected to some of the bricks while executing operations. I see references to 3rd and 6th bricks of several disperse sets, which seem to map to server s06. For some reason, gluster is having troubles connecting from the client machine to that particular server. At the end of the log I see that after long time a reconnect is done to both of them. However little after, other bricks from the s05 get disconnected and a reconnect times out. >>>>>>>>>>>> >>>>>>>>>>>> That's really odd. It seems like if server/communication is cut to s06 for some time, then restored, and then the same happens to the next server. >>>>>>>>>>>> >>>>>>>>>>>> If the servers are really online and it's only a communication issue, it explains why server memory and network has increased: if the problem only exists between the client and servers, any write made by the client will automatically mark the file as damaged, since some of the servers have not been updated. Since self-heal runs from the server nodes, they will probably be correctly connected to all bricks, which allows them to heal the just damaged file, which increases memory and network usage. >>>>>>>>>>>> >>>>>>>>>>>> I guess you still have transport.listen-backlog set to 1024, right?
>>>>>>>>>>>> >>>>>>>>>>>> Just to try to identify if the problem really comes from network, can you check if you lose some pings from the client to all of the servers while you are seeing those errors in the log file? >>>>>>>>>>>> >>>>>>>>>>>> You can also check if during those errors, you can telnet to the port of the brick from the client. >>>>>>>>>>>> >>>>>>>>>>>> Xavi >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Feb 26, 2019 at 10:17 AM Mauro Tridici > wrote: >>>>>>>>>>>> Hi Nithya, >>>>>>>>>>>> >>>>>>>>>>>> "df -h" operation is no longer slow, but no users are using the volume, RAM and NETWORK usage is ok on the client node. >>>>>>>>>>>> >>>>>>>>>>>> I was worried about this kind of warnings/errors: >>>>>>>>>>>> >>>>>>>>>>>> [2019-02-25 10:59:00.664323] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-6: Executing operation with some subvolumes unavailable (20) >>>>>>>>>>>> >>>>>>>>>>>> [2019-02-26 03:11:35.212603] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) 0-tier2-client-50: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2019-02-26 03:10:56.549903 (xid=0x106f1c5) >>>>>>>>>>>> >>>>>>>>>>>> [2019-02-26 03:13:03.313831] E [socket.c:2376:socket_connect_finish] 0-tier2-client-50: connection to 192.168.0.56:49156 failed (Timeout della connessione); disconnecting socket >>>>>>>>>>>> >>>>>>>>>>>> It seems that some subvolumes are not available and 192.168.0.56 server (s06) is not reachable. >>>>>>>>>>>> But gluster servers are up&running and bricks are ok. >>>>>>>>>>>> >>>>>>>>>>>> In attachment the updated tier2.log file.
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thank you. >>>>>>>>>>>> Regards, >>>>>>>>>>>> Mauro >>>>>>>>>>>> >>>>>>>>>>>>> On 26 Feb 2019, at 04:03, Nithya Balachandran > wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I see a lot of EC messages in the log but they don't seem very serious. Xavi, can you take a look? >>>>>>>>>>>>> >>>>>>>>>>>>> The only errors I see are: >>>>>>>>>>>>> [2019-02-25 10:58:45.519871] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) 0-tier2-client-50: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2019-02-25 10:57:47.429969 (xid=0xd26fe7) >>>>>>>>>>>>> [2019-02-25 10:58:51.461493] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) 0-tier2-client-41: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2019-02-25 10:57:47.499174 (xid=0xf47d6a) >>>>>>>>>>>>> [2019-02-25 11:07:57.152874] E [socket.c:2376:socket_connect_finish] 0-tier2-client-70: connection to 192.168.0.55:49163 failed (Timeout della connessione); disconnecting socket >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Is the df -h operation still slow? If yes, can you take a tcpdump of the client while running df -h and send that across?
>>>>>>>>>>>>> >>>>>>>>>>>>> Regards, >>>>>>>>>>>>> Nithya >>>>>>>>>>>>> >>>>>>>>>>>>> On Mon, 25 Feb 2019 at 17:27, Mauro Tridici > wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Sorry, some minutes after my last mail message, I noticed that "df -h" command hung for a while before returning the prompt. >>>>>>>>>>>>> Yesterday, everything was ok in the gluster client log, but, today, I see a lot of errors (please, take a look at the attached file). >>>>>>>>>>>>> >>>>>>>>>>>>> On the client node, I detected an important RAM and NETWORK usage. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Do you think that the errors have been caused by the client resources usage? >>>>>>>>>>>>> >>>>>>>>>>>>> Thank you in advance, >>>>>>>>>>>>> Mauro >>>>>>>>>>>>> >>>>>> >>>> >>>> >>>> >>> >>> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: top_bHd3.txt.gz Type: application/x-gzip Size: 136112 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksubrahm at redhat.com Thu Mar 21 10:43:26 2019 From: ksubrahm at redhat.com (Karthik Subrahmanya) Date: Thu, 21 Mar 2019 16:13:26 +0530 Subject: [Gluster-users] Heal flapping between Possibly undergoing heal and In split brain In-Reply-To: <548844D2-E06E-4A60-972E-72213C61D264@mdpi.com> References: <132679F1-A96E-402F-BDF0-79FE200AD9F7@mdpi.com> <548844D2-E06E-4A60-972E-72213C61D264@mdpi.com> Message-ID: Can you attach the "glustershd.log" file which will be present under "/var/log/glusterfs/" from both the nodes and the "stat" & "getfattr -d -m . -e hex " output of all the entries listed in the heal info output from both the bricks? On Thu, Mar 21, 2019 at 3:54 PM Milos Cuculovic wrote: > Thanks Karthik! > > I was trying to find some resolution methods from [2] but unfortunately > none worked (I can explain what I tried if needed).
> > I guess the volume you are talking about is of type replica-2 (1x2). > > That's correct, aware of the arbiter solution but still didn't take time > to implement. > > From the info results I posted, how to know in which situation I am. No > files are mentioned in split brain, only directories. One brick has 3 > entries and one two entries. > > sudo gluster volume heal storage2 info > [sudo] password for sshadmin: > Brick storage3:/data/data-cluster > > > /dms/final_archive - Possibly undergoing heal > > Status: Connected > Number of entries: 3 > > Brick storage4:/data/data-cluster > > /dms/final_archive - Possibly undergoing heal > > Status: Connected > Number of entries: 2 > > - Kindest regards, > > Milos Cuculovic > IT Manager > > --- > MDPI AG > Postfach, CH-4020 Basel, Switzerland > Office: St. Alban-Anlage 66, 4052 Basel, Switzerland > Tel. +41 61 683 77 35 > Fax +41 61 302 89 18 > Email: cuculovic at mdpi.com > Skype: milos.cuculovic.mdpi > > Disclaimer: The information and files contained in this message > are confidential and intended solely for the use of the individual or > entity to whom they are addressed. If you have received this message in > error, please notify me and delete this message from your system. You may > not copy this message in its entirety or in part, or disclose its contents > to anyone. > > On 21 Mar 2019, at 10:27, Karthik Subrahmanya wrote: > > Hi, > > Note: I guess the volume you are talking about is of type replica-2 (1x2). > Usually replica 2 volumes are prone to split-brain. If you can consider > converting them to arbiter or replica-3, they will handle most of the cases > which can lead to split-brains. For more information see [1]. > > Resolving the split-brain: [2] talks about how to interpret the heal info > output and different ways to resolve them using the CLI/manually/using the > favorite-child-policy.
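As a model of the favorite-child-policy idea mentioned above (an illustration of the policy's decision rule only, not Gluster's implementation): with a latest-mtime style policy, the replica whose copy carries the newest modification time is taken as the heal source, and the other copies are overwritten from it. The brick names below mirror the ones in this thread; the mtimes are made up.

```python
def pick_source_latest_mtime(copies):
    """copies: dict mapping brick name -> mtime (epoch seconds) of its copy.
    Returns the brick whose copy wins under a latest-mtime policy."""
    return max(copies, key=copies.get)

replicas = {
    "storage3:/data/data-cluster": 1553100000,  # older copy
    "storage4:/data/data-cluster": 1553100120,  # newer copy wins
}
print(pick_source_latest_mtime(replicas))
```

The trade-off of automating the choice this way is that the losing copy's data is discarded without inspection, which is why the docs in [2] also describe manual, per-file resolution.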
> If you are having entry split brain, and is a gfid split-brain (file/dir > having different gfids on the replica bricks) then you can use the CLI > option to resolve them. If a directory is in gfid split-brain in a > distributed-replicate volume and you are using the source-brick option > please make sure you use the brick of this subvolume, which has the same > gfid as that of the other distribute subvolume(s) where you have the > correct gfid, as the source. > If you are having a type mismatch then follow the steps in [3] to resolve > the split-brain. > > [1] > https://docs.gluster.org/en/v3/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/ > [2] > https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/ > [3] > https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#dir-split-brain > > HTH, > Karthik > > On Thu, Mar 21, 2019 at 1:45 PM Milos Cuculovic > wrote: > >> I was now able to catch the split brain log: >> >> sudo gluster volume heal storage2 info >> Brick storage3:/data/data-cluster >> >> >> /dms/final_archive - Is in split-brain >> >> Status: Connected >> Number of entries: 3 >> >> Brick storage4:/data/data-cluster >> >> /dms/final_archive - Is in split-brain >> >> Status: Connected >> Number of entries: 2 >> >> Milos >> >> On 21 Mar 2019, at 09:07, Milos Cuculovic wrote: >> >> Since 24h, after upgrading from 4.0 to 4.1.7 one of the servers, the heal >> shows this: >> >> sudo gluster volume heal storage2 info >> Brick storage3:/data/data-cluster >> >> >> /dms/final_archive - Possibly undergoing heal >> >> Status: Connected >> Number of entries: 3 >> >> Brick storage4:/data/data-cluster >> >> /dms/final_archive - Possibly undergoing heal >> >> Status: Connected >> Number of entries: 2 >> >> The same files stay there. 
From time to time the status of the >> /dms/final_archive is in split brain, as the following command shows: >> >> sudo gluster volume heal storage2 info split-brain >> Brick storage3:/data/data-cluster >> /dms/final_archive >> Status: Connected >> Number of entries in split-brain: 1 >> >> Brick storage4:/data/data-cluster >> /dms/final_archive >> Status: Connected >> Number of entries in split-brain: 1 >> >> How to know which file is in split brain? The files in >> /dms/final_archive are not very important, fine to remove (ideally resolve >> the split brain) for the ones that differ. >> >> I can only see the directory and GFID. Any idea on how to resolve this >> situation as I would like to continue with the upgrade on the 2nd server, >> and for this the heal needs to be done with 0 entries in sudo gluster >> volume heal storage2 info >> >> Thank you in advance, Milos. >> >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From rgowdapp at redhat.com Thu Mar 21 10:48:28 2019 From: rgowdapp at redhat.com (Raghavendra Gowdappa) Date: Thu, 21 Mar 2019 16:18:28 +0530 Subject: [Gluster-users] "rpc_clnt_ping_timer_expired" errors In-Reply-To: <93130243-E356-4425-8F15-69BE61562E2F@cmcc.it> References: <96B07283-D8AB-4F06-909D-E00424625528@cmcc.it> <42758A0E-8BE9-497D-BDE3-55D7DC633BC7@cmcc.it> <6A8CF4A4-98EA-48C3-A059-D60D1B2721C7@cmcc.it> <2CF49168-9C1B-4931-8C34-8157262A137A@cmcc.it> <7A151CC9-A0AE-4A45-B450-A4063D216D9E@cmcc.it> <32D53ECE-3F49-4415-A6EE-241B351AC2BA@cmcc.it> <4685A75B-5978-4338-9C9F-4A02FB40B9BC@cmcc.it> <4D2E6B04-C2E8-4FD5-B72D-E7C05931C6F9@cmcc.it> <4E332A56-B318-4BC1-9F44-84AB4392A5DE@cmcc.it> <832FD362-3B14-40D8-8530-604419300476@cmcc.it> <8D926643-1093-48ED-823F-D8F117F702CF@cmcc.it> <9D0BE438-DF11-4D0A-AB85-B44357032F29@cmcc.it> <5F0AC378-8170-4342-8473-9C17159CAC1D@cmcc.it> <7A50E86D-9E27-4EA7-883B-45E9F973991A@cmcc.it> <58B5DB7F-DCB4-4FBF-B1F7-681315B1613A@cmcc.it> <6327B44F-4E7E-46BB-A74C-70F4457DD1EB@cmcc.it> <0167DF4A-8329-4A1A-B439-857DFA6F78BB@cmcc.it> <763F334E-35B8-4729-B8E1-D30866754EEE@cmcc.it> <91DFC9AC-4805-41EB-AC6F-5722E018DD6E@cmcc.it> <8A9752B8-B231-4570-8FF4-8D3D781E7D42@cmcc.it> <47A24804-F975-4EE6-9FA5-67FCDA18D707@cmcc.it> <637FF5D2-D1F4-4686-9D48-646A96F67B96@cmcc.it> <4A87495F-3755-48F7-8507-085869069C64@cmcc.it> <3854BBF6-5B98-4AB3-A67E-E7DE59E69A63@cmcc.it> <313DA021-9173-4899-96B0-831B10B00B61@cmcc.it> <17996AFD-DFC8-40F3-9D09-DB6DDAD5B7D6@cmcc.it> <7074B5D8-955A-4802-A9F3-606C99734417@cmcc.it> <83B84BF9-8334-4230-B6F8-0BC4BFBEFF15@cmcc.it> <133B9AE4-9792-4F72-AD91-D36E7B9EC711@cmcc.it> <6611C4B0-57FD-4390-88B5-BD373555D4C5@cmcc.it> <93130243-E356-4425-8F15-69BE61562E2F@cmcc.it> Message-ID: On Thu, Mar 21, 2019 at 4:10 PM Mauro Tridici wrote: > Hi Raghavendra, > > the number of errors reduced, but during last days I received some error > notifications from Nagios server similar to the following one: > > > > > > > > > > > > > > > 
> ***** Nagios *****
>
> Notification Type: PROBLEM
>
> Service: Brick - /gluster/mnt5/brick
> Host: s04
> Address: s04-stg
> State: CRITICAL
>
> Date/Time: Mon Mar 18 19:56:36 CET 2019
>
> Additional Info:
> CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
> > The error was related only to s04 gluster server. > > So, following your suggestions, I executed, on s04 node, the top command. > In attachment, you can find the related output. > top output doesn't contain cmd/thread names. Was there anything wrong? > Thank you very much for your help. > Regards, > Mauro > > > > On 14 Mar 2019, at 13:31, Raghavendra Gowdappa > wrote: > > Thanks Mauro. > > On Thu, Mar 14, 2019 at 3:38 PM Mauro Tridici > wrote: > >> Hi Raghavendra, >> >> I just changed the client option value to 8. >> I will check the volume behaviour during the next hours. >> >> The GlusterFS version is 3.12.14. >> >> I will provide you the logs as soon as the activity load is high. >> Thank you, >> Mauro >> >> On 14 Mar 2019, at 04:57, Raghavendra Gowdappa >> wrote: >> >> >> >> On Wed, Mar 13, 2019 at 3:55 PM Mauro Tridici >> wrote: >> >>> Hi Raghavendra, >>> >>> Yes, server.event-thread has been changed from 4 to 8. >>> >> >> Was client.event-thread value too changed to 8? If not, I would like to >> know the results of including this tuning too. Also, if possible, can you >> get the output of following command from problematic clients and bricks >> (during the duration when load tends to be high and ping-timer-expiry is >> seen)? >> >> # top -bHd 3 >> >> This will help us to know CPU utilization of event-threads. >> >> And I forgot to ask, what version of Glusterfs are you using? >> >> During last days, I noticed that the error events are still here although >>> they have been considerably reduced. >>> >>> So, I used grep command against the log files in order to provide you a >>> global vision about the warning, error and critical events appeared today >>> at 06:xx (may be useful I hope).
>>> I collected the info from s06 gluster server, but the behaviour is >>> almost the same on the other gluster servers. >>> >>> *ERRORS: * >>> *CWD: /var/log/glusterfs * >>> *COMMAND: grep " E " *.log |grep "2019-03-13 06:"* >>> >>> (I can see a lot of this kind of message in the same period but I'm >>> notifying you only one record for each type of error) >>> >>> glusterd.log:[2019-03-13 06:12:35.982863] E [MSGID: 101042] >>> [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of >>> /var/run/gluster/tier2_quota_list/ >>> >>> glustershd.log:[2019-03-13 06:14:28.666562] E >>> [rpc-clnt.c:350:saved_frames_unwind] (--> >>> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f4a71ddcebb] (--> >>> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f4a71ba1d9e] (--> >>> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f4a71ba1ebe] (--> >>> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f4a71ba3640] (--> >>> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f4a71ba4130] ))))) >>> 0-tier2-client-55: forced unwinding frame type(GlusterFS 3.3) >>> op(INODELK(29)) >>> called at 2019-03-13 06:14:14.858441 (xid=0x17fddb50) >>> >>> glustershd.log:[2019-03-13 06:17:48.883825] E >>> [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to >>> 192.168.0.55:49158 failed (Connection timed out); disconnecting socket >>> glustershd.log:[2019-03-13 06:19:58.931798] E >>> [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to >>> 192.168.0.55:49158 failed (Connection timed out); disconnecting socket >>> glustershd.log:[2019-03-13 06:22:08.979829] E >>> [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to >>> 192.168.0.55:49158 failed (Connection timed out); disconnecting socket >>> glustershd.log:[2019-03-13 06:22:36.226847] E [MSGID: 114031] >>> [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote >>> operation failed [Transport endpoint >>> is not connected]
glustershd.log:[2019-03-13 06:22:36.306669] E [MSGID: 114031] >>> [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote >>> operation failed [Transport endpoint >>> is not connected] >>> glustershd.log:[2019-03-13 06:22:36.385257] E [MSGID: 114031] >>> [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote >>> operation failed [Transport endpoint >>> is not connected] >>> >>> *WARNINGS:* >>> *CWD: /var/log/glusterfs * >>> *COMMAND: grep " W " *.log |grep "2019-03-13 06:"* >>> >>> (I can see a lot of this kind of message in the same period but I'm >>> notifying you only one record for each type of warnings) >>> >>> glustershd.log:[2019-03-13 06:14:28.666772] W [MSGID: 114031] >>> [client-rpc-fops.c:1080:client3_3_getxattr_cbk] 0-tier2-client-55: remote >>> operation failed. Path: <gfid:b6b35d0f-f34d-4c25-bbe8-74bde0248d7e> (b6b35d0f-f34d-4c25-bbe8-74bde0248d7e). >>> Key: (null) [Transport endpoint is not connected] >>> >>> glustershd.log:[2019-03-13 06:14:31.421576] W [MSGID: 122035] >>> [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation >>> with some subvolumes unavailable (2) >>> >>> glustershd.log:[2019-03-13 06:15:31.547417] W [MSGID: 122032] >>> [ec-heald.c:266:ec_shd_index_sweep] 0-tier2-disperse-9: unable to get >>> index-dir on tier2-client-55 [Operation >>> now in progress] >>> >>> quota-mount-tier2.log:[2019-03-13 06:12:36.116277] W [MSGID: 101002] >>> [options.c:995:xl_opt_validate] 0-glusterfs: option 'address-family' is >>> deprecated, preferred is 'transport.address-family', continuing with correction >>> quota-mount-tier2.log:[2019-03-13 06:12:36.198430] W [MSGID: 101174] >>> [graph.c:363:_log_if_unknown_option] 0-tier2-readdir-ahead: option >>> 'parallel-readdir' is not recognized >>> quota-mount-tier2.log:[2019-03-13 06:12:37.945007] W >>> [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) >>> [0x7f340892be25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5)
[0x55ef010164b5] >>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55ef0101632b] ) 0-: >>> received signum (15), shutting down >>> >>> *CRITICALS:* >>> *CWD: /var/log/glusterfs * >>> *COMMAND: grep " C " *.log |grep "2019-03-13 06:"* >>> >>> no critical errors at 06:xx >>> only one critical error during the day >>> >>> *[root at s06 glusterfs]# grep " C " *.log |grep "2019-03-13"* >>> glustershd.log:[2019-03-13 02:21:29.126279] C >>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-55: server >>> 192.168.0.55:49158 has not responded in the last 42 seconds, >>> disconnecting. >>> >>> >>> Thank you very much for your help. >>> Regards, >>> Mauro >>> >>> On 12 Mar 2019, at 05:17, Raghavendra Gowdappa >>> wrote: >>> >>> Was the suggestion to increase server.event-thread values tried? If yes, >>> what were the results? >>> >>> On Mon, Mar 11, 2019 at 2:40 PM Mauro Tridici >>> wrote: >>> >>>> Dear All, >>>> >>>> do you have any suggestions about the right way to "debug" this issue? >>>> In attachment, the updated logs of "s06" gluster server. >>>> >>>> I noticed a lot of intermittent warning and error messages. >>>> >>>> Thank you in advance, >>>> Mauro >>>> >>>> >>>> >>>> On 4 Mar 2019, at 18:45, Raghavendra Gowdappa >>>> wrote: >>>> >>>> >>>> +Gluster Devel , +Gluster-users >>>> >>>> >>>> I would like to point out another issue. Even if what I suggested >>>> prevents disconnects, part of the solution would be only symptomatic >>>> treatment and doesn't address the root cause of the problem. In most of the >>>> ping-timer-expiry issues, the root cause is the increased load on bricks >>>> and the inability of bricks to be responsive under high load. So, the >>>> actual solution would be doing any or both of the following: >>>> * identify the source of increased load and if possible throttle it. 
>>>> Internal heal processes like self-heal, rebalance, quota heal are known to >>>> pump traffic into bricks without much throttling (io-threads _might_ do >>>> some throttling, but my understanding is it's not sufficient). >>>> * identify the reason for bricks to become unresponsive during load. >>>> This may be fixable issues like not enough event-threads to read from >>>> network or difficult to fix issues like fsync on backend fs freezing the >>>> process or semi fixable issues (in code) like lock contention. >>>> >>>> So any genuine effort to fix ping-timer-issues (to be honest most of >>>> the times they are not issues related to rpc/network) would involve >>>> performance characterization of various subsystems on bricks and clients. >>>> Various subsystems can include (but are not necessarily limited to) the underlying >>>> OS/filesystem, glusterfs processes, CPU consumption, etc. >>>> >>>> regards, >>>> Raghavendra >>>> >>>> On Mon, Mar 4, 2019 at 9:31 PM Mauro Tridici >>>> wrote: >>>> >>>>> Thank you, let's try! >>>>> I will inform you about the effects of the change. >>>>> >>>>> Regards, >>>>> Mauro >>>>> >>>>> On 4 Mar 2019, at 16:55, Raghavendra Gowdappa >>>>> wrote: >>>>> >>>>> >>>>> >>>>> On Mon, Mar 4, 2019 at 8:54 PM Mauro Tridici >>>>> wrote: >>>>> >>>>>> Hi Raghavendra, >>>>>> >>>>>> thank you for your reply. >>>>>> Yes, you are right. It is a problem that seems to happen randomly. >>>>>> At this moment, server.event-threads value is 4. I will try to >>>>>> increase this value to 8. Do you think that it could be a valid value? >>>>>> >>>>> >>>>> Yes. We can try with that. You should at least see the frequency of >>>>> ping-timer-related disconnects reduce with this value (even if it doesn't >>>>> eliminate the problem completely). 
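For readers following this workaround in the archive: the change agreed on above corresponds to two volume options in the standard gluster CLI. A minimal sketch, assuming the volume name "tier2" used throughout this thread (the value 8 is the one proposed here, not a general recommendation):

```shell
#!/bin/sh
# Sketch of the event-thread bump discussed above. Requires a live Gluster
# deployment; exit quietly where the CLI is not installed.
command -v gluster >/dev/null 2>&1 || exit 0

# Raise the epoll event threads on both sides of the connection.
# "tier2" is the volume name from this thread; 8 is the value proposed above.
gluster volume set tier2 server.event-threads 8
gluster volume set tier2 client.event-threads 8

# Verify that the options took effect:
gluster volume get tier2 server.event-threads
gluster volume get tier2 client.event-threads
```

server.event-threads applies to the brick processes; client.event-threads applies to mounts and should also affect internal clients such as the self-heal daemon, whose ping-timer expiries appear in the logs above.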
>>>>> >>>>> >>>>>> Regards, >>>>>> Mauro >>>>>> >>>>>> On 4 Mar 2019, at 15:36, Raghavendra Gowdappa >>>>>> wrote: >>>>>> >>>>>> >>>>>> >>>>>> On Mon, Mar 4, 2019 at 8:01 PM Nithya Balachandran < >>>>>> nbalacha at redhat.com> wrote: >>>>>> >>>>>>> Hi Mauro, >>>>>>> >>>>>>> It looks like some problem on s06. Are all your other nodes ok? Can >>>>>>> you send us the gluster logs from this node? >>>>>>> >>>>>>> @Raghavendra G , do you have any idea as >>>>>>> to how this can be debugged? Maybe running top? Or debug brick logs? >>>>>>> >>>>>> >>>>>> If we can reproduce the problem, collecting tcpdump on both ends of >>>>>> the connection will help. But, one common problem is these bugs are >>>>>> inconsistently reproducible and hence we may not be able to capture tcpdump >>>>>> at correct intervals. Other than that, we can try to collect some evidence >>>>>> that poller threads were busy (waiting on locks). But, not sure what debug >>>>>> data provides that information. >>>>>> >>>>>> From what I know, it's difficult to collect evidence for this issue >>>>>> and we could only reason about it. >>>>>> >>>>>> We can try a workaround though - try increasing server.event-threads >>>>>> and see whether ping-timer expiry issues go away with an optimal value. If >>>>>> that's the case, it kind of provides proof for our hypothesis. 
>>>>>> >>>>>> >>>>>>> >>>>>>> Regards, >>>>>>> Nithya >>>>>>> >>>>>>> On Mon, 4 Mar 2019 at 15:25, Mauro Tridici >>>>>>> wrote: >>>>>>> >>>>>>>> Hi All, >>>>>>>> >>>>>>>> some minutes ago I received this message from NAGIOS server >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> ****** Nagios *****Notification Type: PROBLEMService: Brick - >>>>>>>> /gluster/mnt2/brickHost: s06Address: s06-stgState: CRITICALDate/Time: Mon >>>>>>>> Mar 4 10:25:33 CET 2019Additional Info:CHECK_NRPE STATE CRITICAL: Socket >>>>>>>> timeout after 10 seconds.* >>>>>>>> >>>>>>>> I checked the network, RAM and CPUs usage on s06 node and >>>>>>>> everything seems to be ok. >>>>>>>> No bricks are in error state. In /var/log/messages, I detected >>>>>>>> again a crash of ?check_vol_utili? that I think it is a module used by NRPE >>>>>>>> executable (that is the NAGIOS client). >>>>>>>> >>>>>>>> Mar 4 10:15:29 s06 kernel: traps: check_vol_utili[161224] general >>>>>>>> protection ip:7facffa0a66d sp:7ffe9f4e6fc0 error:0 in >>>>>>>> libglusterfs.so.0.0.1[7facff9b7000+f7000] >>>>>>>> Mar 4 10:15:29 s06 abrt-hook-ccpp: Process 161224 (python2.7) of >>>>>>>> user 0 killed by SIGSEGV - dumping core >>>>>>>> Mar 4 10:15:29 s06 abrt-server: Generating core_backtrace >>>>>>>> Mar 4 10:15:29 s06 abrt-server: Error: Unable to open >>>>>>>> './coredump': No such file or directory >>>>>>>> Mar 4 10:16:01 s06 systemd: Created slice User Slice of root. >>>>>>>> Mar 4 10:16:01 s06 systemd: Starting User Slice of root. >>>>>>>> Mar 4 10:16:01 s06 systemd: Started Session 201010 of user root. >>>>>>>> Mar 4 10:16:01 s06 systemd: Starting Session 201010 of user root. >>>>>>>> Mar 4 10:16:01 s06 systemd: Removed slice User Slice of root. >>>>>>>> Mar 4 10:16:01 s06 systemd: Stopping User Slice of root. 
>>>>>>>> Mar 4 10:16:24 s06 abrt-server: Duplicate: UUID >>>>>>>> Mar 4 10:16:24 s06 abrt-server: DUP_OF_DIR: >>>>>>>> /var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041 >>>>>>>> Mar 4 10:16:24 s06 abrt-server: Deleting problem directory >>>>>>>> ccpp-2019-03-04-10:15:29-161224 (dup of ccpp-2018-09-25-12:27:42-13041) >>>>>>>> Mar 4 10:16:24 s06 abrt-server: Generating core_backtrace >>>>>>>> Mar 4 10:16:24 s06 abrt-server: Error: Unable to open >>>>>>>> './coredump': No such file or directory >>>>>>>> Mar 4 10:16:24 s06 abrt-server: Cannot notify >>>>>>>> '/var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041' via uReport: Event >>>>>>>> 'report_uReport' exited with 1 >>>>>>>> Mar 4 10:16:24 s06 abrt-hook-ccpp: Process 161391 (python2.7) of >>>>>>>> user 0 killed by SIGABRT - dumping core >>>>>>>> Mar 4 10:16:25 s06 abrt-server: Generating core_backtrace >>>>>>>> Mar 4 10:16:25 s06 abrt-server: Error: Unable to open >>>>>>>> './coredump': No such file or directory >>>>>>>> Mar 4 10:17:01 s06 systemd: Created slice User Slice of root. >>>>>>>> >>>>>>>> Also, I noticed the following errors that I think are very critical: >>>>>>>> >>>>>>>> Mar 4 10:21:12 s06 glustershd[20355]: [2019-03-04 09:21:12.954798] >>>>>>>> C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-55: >>>>>>>> server 192.168.0.55:49158 has not responded in the last 42 >>>>>>>> seconds, disconnecting. >>>>>>>> Mar 4 10:22:01 s06 systemd: Created slice User Slice of root. >>>>>>>> Mar 4 10:22:01 s06 systemd: Starting User Slice of root. >>>>>>>> Mar 4 10:22:01 s06 systemd: Started Session 201017 of user root. >>>>>>>> Mar 4 10:22:01 s06 systemd: Starting Session 201017 of user root. >>>>>>>> Mar 4 10:22:01 s06 systemd: Removed slice User Slice of root. >>>>>>>> Mar 4 10:22:01 s06 systemd: Stopping User Slice of root. 
>>>>>>>> Mar 4 10:22:03 s06 glustershd[20355]: [2019-03-04 09:22:03.964120] >>>>>>>> C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-60: >>>>>>>> server 192.168.0.54:49165 has not responded in the last 42 >>>>>>>> seconds, disconnecting. >>>>>>>> Mar 4 10:23:01 s06 systemd: Created slice User Slice of root. >>>>>>>> Mar 4 10:23:01 s06 systemd: Starting User Slice of root. >>>>>>>> Mar 4 10:23:01 s06 systemd: Started Session 201018 of user root. >>>>>>>> Mar 4 10:23:01 s06 systemd: Starting Session 201018 of user root. >>>>>>>> Mar 4 10:23:02 s06 systemd: Removed slice User Slice of root. >>>>>>>> Mar 4 10:23:02 s06 systemd: Stopping User Slice of root. >>>>>>>> Mar 4 10:24:01 s06 systemd: Created slice User Slice of root. >>>>>>>> Mar 4 10:24:01 s06 systemd: Starting User Slice of root. >>>>>>>> Mar 4 10:24:01 s06 systemd: Started Session 201019 of user root. >>>>>>>> Mar 4 10:24:01 s06 systemd: Starting Session 201019 of user root. >>>>>>>> Mar 4 10:24:01 s06 systemd: Removed slice User Slice of root. >>>>>>>> Mar 4 10:24:01 s06 systemd: Stopping User Slice of root. >>>>>>>> Mar 4 10:24:03 s06 glustershd[20355]: [2019-03-04 09:24:03.982502] >>>>>>>> C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-16: >>>>>>>> server 192.168.0.52:49158 has not responded in the last 42 >>>>>>>> seconds, disconnecting. >>>>>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746109] C >>>>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-3: server >>>>>>>> 192.168.0.51:49153 has not responded in the last 42 seconds, >>>>>>>> disconnecting. >>>>>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746215] C >>>>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-10: server >>>>>>>> 192.168.0.52:49156 has not responded in the last 42 seconds, >>>>>>>> disconnecting. 
>>>>>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746260] C >>>>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-21: server >>>>>>>> 192.168.0.51:49159 has not responded in the last 42 seconds, >>>>>>>> disconnecting. >>>>>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746296] C >>>>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-25: server >>>>>>>> 192.168.0.52:49161 has not responded in the last 42 seconds, >>>>>>>> disconnecting. >>>>>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746413] C >>>>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-60: server >>>>>>>> 192.168.0.54:49165 has not responded in the last 42 seconds, >>>>>>>> disconnecting. >>>>>>>> Mar 4 10:24:07 s06 glustershd[20355]: [2019-03-04 09:24:07.982952] >>>>>>>> C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-45: >>>>>>>> server 192.168.0.54:49155 has not responded in the last 42 >>>>>>>> seconds, disconnecting. >>>>>>>> Mar 4 10:24:18 s06 glustershd[20355]: [2019-03-04 09:24:18.990929] >>>>>>>> C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-25: >>>>>>>> server 192.168.0.52:49161 has not responded in the last 42 >>>>>>>> seconds, disconnecting. >>>>>>>> Mar 4 10:24:31 s06 glustershd[20355]: [2019-03-04 09:24:31.995781] >>>>>>>> C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-20: >>>>>>>> server 192.168.0.53:49159 has not responded in the last 42 >>>>>>>> seconds, disconnecting. >>>>>>>> Mar 4 10:25:01 s06 systemd: Created slice User Slice of root. >>>>>>>> Mar 4 10:25:01 s06 systemd: Starting User Slice of root. >>>>>>>> Mar 4 10:25:01 s06 systemd: Started Session 201020 of user root. >>>>>>>> Mar 4 10:25:01 s06 systemd: Starting Session 201020 of user root. >>>>>>>> Mar 4 10:25:01 s06 systemd: Removed slice User Slice of root. >>>>>>>> Mar 4 10:25:01 s06 systemd: Stopping User Slice of root. 
>>>>>>>> Mar 4 10:25:57 s06 systemd: Created slice User Slice of root. >>>>>>>> Mar 4 10:25:57 s06 systemd: Starting User Slice of root. >>>>>>>> Mar 4 10:25:57 s06 systemd-logind: New session 201021 of user root. >>>>>>>> Mar 4 10:25:57 s06 systemd: Started Session 201021 of user root. >>>>>>>> Mar 4 10:25:57 s06 systemd: Starting Session 201021 of user root. >>>>>>>> Mar 4 10:26:01 s06 systemd: Started Session 201022 of user root. >>>>>>>> Mar 4 10:26:01 s06 systemd: Starting Session 201022 of user root. >>>>>>>> Mar 4 10:26:21 s06 nrpe[162388]: Error: Could not complete SSL >>>>>>>> handshake with 192.168.1.56: 5 >>>>>>>> Mar 4 10:27:01 s06 systemd: Started Session 201023 of user root. >>>>>>>> Mar 4 10:27:01 s06 systemd: Starting Session 201023 of user root. >>>>>>>> Mar 4 10:28:01 s06 systemd: Started Session 201024 of user root. >>>>>>>> Mar 4 10:28:01 s06 systemd: Starting Session 201024 of user root. >>>>>>>> Mar 4 10:29:01 s06 systemd: Started Session 201025 of user root. >>>>>>>> Mar 4 10:29:01 s06 systemd: Starting Session 201025 of user root. >>>>>>>> >>>>>>>> But, unfortunately, I don't understand why it is happening. >>>>>>>> Now, the NAGIOS server shows that s06 status is ok: >>>>>>>> >>>>>>>> ****** Nagios ***** >>>>>>>> Notification Type: RECOVERY >>>>>>>> Service: Brick - /gluster/mnt2/brick >>>>>>>> Host: s06 >>>>>>>> Address: s06-stg >>>>>>>> State: OK >>>>>>>> Date/Time: Mon Mar 4 10:35:23 CET 2019 >>>>>>>> Additional Info: OK: Brick /gluster/mnt2/brick is up >>>>>>>> >>>>>>>> Nothing has changed from the RAM, CPU, and network point of view. >>>>>>>> The /var/log/messages file has been updated: >>>>>>>> >>>>>>>> Mar 4 10:32:01 s06 systemd: Starting Session 201029 of user root. 
>>>>>>>> Mar 4 10:32:30 s06 glustershd[20355]: [2019-03-04 09:32:30.069082] >>>>>>>> C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-10: >>>>>>>> server 192.168.0.52:49156 has not responded in the last 42 >>>>>>>> seconds, disconnecting. >>>>>>>> Mar 4 10:32:55 s06 glustershd[20355]: [2019-03-04 09:32:55.074689] >>>>>>>> C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-66: >>>>>>>> server 192.168.0.54:49167 has not responded in the last 42 >>>>>>>> seconds, disconnecting. >>>>>>>> Mar 4 10:33:01 s06 systemd: Started Session 201030 of user root. >>>>>>>> Mar 4 10:33:01 s06 systemd: Starting Session 201030 of user root. >>>>>>>> Mar 4 10:34:01 s06 systemd: Started Session 201031 of user root. >>>>>>>> Mar 4 10:34:01 s06 systemd: Starting Session 201031 of user root. >>>>>>>> Mar 4 10:35:01 s06 nrpe[162562]: Could not read request from >>>>>>>> client 192.168.1.56, bailing out... >>>>>>>> Mar 4 10:35:01 s06 nrpe[162562]: INFO: SSL Socket Shutdown. >>>>>>>> Mar 4 10:35:01 s06 systemd: Started Session 201032 of user root. >>>>>>>> Mar 4 10:35:01 s06 systemd: Starting Session 201032 of user root. >>>>>>>> >>>>>>>> Could you please help me to understand what is happening? >>>>>>>> Thank you in advance. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Mauro >>>>>>>> >>>>>>>> >>>>>>>> On 1 Mar 2019, at 12:17, Mauro Tridici >>>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> Thank you, Milind. >>>>>>>> I executed the instructions you suggested: >>>>>>>> >>>>>>>> - grep "blocked for" /var/log/messages on s06 returns no output (no >>>>>>>> "blocked" word is detected in the messages file); >>>>>>>> - in the /var/log/messages file I can see this kind of error repeated >>>>>>>> many times: >>>>>>>> >>>>>>>> Mar 1 08:43:01 s06 systemd: Starting Session 196071 of user root. >>>>>>>> Mar 1 08:43:01 s06 systemd: Removed slice User Slice of root. >>>>>>>> Mar 1 08:43:01 s06 systemd: Stopping User Slice of root. 
>>>>>>>> Mar 1 08:43:02 s06 kernel: traps: check_vol_utili[57091] general >>>>>>>> protection ip:7f88e76ee66d sp:7ffe5a5bcc30 error:0 in >>>>>>>> libglusterfs.so.0.0.1[7f88e769b000+f7000] >>>>>>>> Mar 1 08:43:02 s06 abrt-hook-ccpp: Process 57091 (python2.7) of >>>>>>>> user 0 killed by SIGSEGV - dumping core >>>>>>>> Mar 1 08:43:02 s06 abrt-server: Generating core_backtrace >>>>>>>> Mar 1 08:43:02 s06 abrt-server: Error: Unable to open >>>>>>>> './coredump': No such file or directory >>>>>>>> Mar 1 08:43:58 s06 abrt-server: Duplicate: UUID >>>>>>>> Mar 1 08:43:58 s06 abrt-server: DUP_OF_DIR: >>>>>>>> /var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041 >>>>>>>> Mar 1 08:43:58 s06 abrt-server: Deleting problem directory >>>>>>>> ccpp-2019-03-01-08:43:02-57091 (dup of ccpp-2018-09-25-12:27:42-13041) >>>>>>>> Mar 1 08:43:58 s06 dbus[1872]: [system] Activating service >>>>>>>> name='org.freedesktop.problems' (using servicehelper) >>>>>>>> Mar 1 08:43:58 s06 dbus[1872]: [system] Successfully activated >>>>>>>> service 'org.freedesktop.problems' >>>>>>>> Mar 1 08:43:58 s06 abrt-server: Generating core_backtrace >>>>>>>> Mar 1 08:43:58 s06 abrt-server: Error: Unable to open >>>>>>>> './coredump': No such file or directory >>>>>>>> Mar 1 08:43:58 s06 abrt-server: Cannot notify >>>>>>>> '/var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041' via uReport: Event >>>>>>>> 'report_uReport' exited with 1 >>>>>>>> Mar 1 08:44:01 s06 systemd: Created slice User Slice of root. >>>>>>>> Mar 1 08:44:01 s06 systemd: Starting User Slice of root. >>>>>>>> Mar 1 08:44:01 s06 systemd: Started Session 196072 of user root. >>>>>>>> Mar 1 08:44:01 s06 systemd: Starting Session 196072 of user root. >>>>>>>> Mar 1 08:44:01 s06 systemd: Removed slice User Slice of root. >>>>>>>> >>>>>>>> - in /var/log/messages file I can see also 4 errors related to >>>>>>>> other cluster servers: >>>>>>>> >>>>>>>> Mar 1 11:05:01 s06 systemd: Starting User Slice of root. 
>>>>>>>> Mar 1 11:05:01 s06 systemd: Started Session 196230 of user root. >>>>>>>> Mar 1 11:05:01 s06 systemd: Starting Session 196230 of user root. >>>>>>>> Mar 1 11:05:01 s06 systemd: Removed slice User Slice of root. >>>>>>>> Mar 1 11:05:01 s06 systemd: Stopping User Slice of root. >>>>>>>> Mar 1 11:05:59 s06 glustershd[70117]: [2019-03-01 10:05:59.347094] >>>>>>>> C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-33: >>>>>>>> server 192.168.0.51:49163 has not responded in the last 42 >>>>>>>> seconds, disconnecting. >>>>>>>> Mar 1 11:06:01 s06 systemd: Created slice User Slice of root. >>>>>>>> Mar 1 11:06:01 s06 systemd: Starting User Slice of root. >>>>>>>> Mar 1 11:06:01 s06 systemd: Started Session 196231 of user root. >>>>>>>> Mar 1 11:06:01 s06 systemd: Starting Session 196231 of user root. >>>>>>>> Mar 1 11:06:01 s06 systemd: Removed slice User Slice of root. >>>>>>>> Mar 1 11:06:01 s06 systemd: Stopping User Slice of root. >>>>>>>> Mar 1 11:06:12 s06 glustershd[70117]: [2019-03-01 10:06:12.351319] >>>>>>>> C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-1: >>>>>>>> server 192.168.0.52:49153 has not responded in the last 42 >>>>>>>> seconds, disconnecting. >>>>>>>> Mar 1 11:06:38 s06 glustershd[70117]: [2019-03-01 10:06:38.356920] >>>>>>>> C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-7: >>>>>>>> server 192.168.0.52:49155 has not responded in the last 42 >>>>>>>> seconds, disconnecting. >>>>>>>> Mar 1 11:07:01 s06 systemd: Created slice User Slice of root. >>>>>>>> Mar 1 11:07:01 s06 systemd: Starting User Slice of root. >>>>>>>> Mar 1 11:07:01 s06 systemd: Started Session 196232 of user root. >>>>>>>> Mar 1 11:07:01 s06 systemd: Starting Session 196232 of user root. >>>>>>>> Mar 1 11:07:01 s06 systemd: Removed slice User Slice of root. >>>>>>>> Mar 1 11:07:01 s06 systemd: Stopping User Slice of root. 
>>>>>>>> Mar 1 11:07:36 s06 glustershd[70117]: [2019-03-01 10:07:36.366259] >>>>>>>> C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-0: >>>>>>>> server 192.168.0.51:49152 has not responded in the last 42 >>>>>>>> seconds, disconnecting. >>>>>>>> Mar 1 11:08:01 s06 systemd: Created slice User Slice of root. >>>>>>>> >>>>>>>> No "blocked" word appears in the /var/log/messages files on the other cluster >>>>>>>> servers. >>>>>>>> In attachment, the /var/log/messages file from the s06 server. >>>>>>>> >>>>>>>> Thank you in advance, >>>>>>>> Mauro >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 1 Mar 2019, at 11:47, Milind Changire >>>>>>>> wrote: >>>>>>>> >>>>>>>> The traces of very high disk activity on the servers are often >>>>>>>> found in /var/log/messages >>>>>>>> You might want to grep for "blocked for" in /var/log/messages on >>>>>>>> s06 and correlate the timestamps to confirm the unresponsiveness as >>>>>>>> reported in gluster client logs. >>>>>>>> In cases of high disk activity, although the operating system >>>>>>>> continues to respond to ICMP pings, the processes writing to disks often >>>>>>>> get blocked due to a large flush to the disk, which could span beyond 42 seconds >>>>>>>> and hence result in ping-timer-expiry logs. >>>>>>>> >>>>>>>> As a side note: >>>>>>>> If you indeed find gluster processes being blocked in >>>>>>>> /var/log/messages, you might want to tweak the sysctl tunables called >>>>>>>> vm.dirty_background_ratio or vm.dirty_background_bytes to a smaller value >>>>>>>> than the existing one. Please read up more on those tunables before touching >>>>>>>> the settings. >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Mar 1, 2019 at 4:06 PM Mauro Tridici >>>>>>>> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> in attachment the client log captured after changing the >>>>>>>>> network.ping-timeout option. 
>>>>>>>>> I noticed this error involving server 192.168.0.56 (s06) >>>>>>>>> >>>>>>>>> [2019-03-01 09:23:36.077287] I [rpc-clnt.c:1962:rpc_clnt_reconfig] >>>>>>>>> 0-tier2-client-71: changing ping timeout to 42 (from 0) >>>>>>>>> [2019-03-01 09:23:36.078213] I >>>>>>>>> [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in >>>>>>>>> volfile,continuing >>>>>>>>> [2019-03-01 09:23:36.078432] I >>>>>>>>> [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in >>>>>>>>> volfile,continuing >>>>>>>>> [2019-03-01 09:23:36.092357] I >>>>>>>>> [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in >>>>>>>>> volfile,continuing >>>>>>>>> [2019-03-01 09:23:36.094146] I >>>>>>>>> [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in >>>>>>>>> volfile,continuing >>>>>>>>> [2019-03-01 10:06:24.708082] C >>>>>>>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-50: server >>>>>>>>> 192.168.0.56:49156 has not responded in the last 42 seconds, >>>>>>>>> disconnecting. >>>>>>>>> >>>>>>>>> I don't know why this happens; the s06 server seems to be reachable. >>>>>>>>> >>>>>>>>> [athena_login2][/users/home/sysm02/]> telnet 192.168.0.56 49156 >>>>>>>>> Trying 192.168.0.56... >>>>>>>>> Connected to 192.168.0.56. >>>>>>>>> Escape character is '^]'. >>>>>>>>> ^CConnection closed by foreign host. >>>>>>>>> [athena_login2][/users/home/sysm02/]> ping 192.168.0.56 >>>>>>>>> PING 192.168.0.56 (192.168.0.56) 56(84) bytes of data. >>>>>>>>> 64 bytes from 192.168.0.56: icmp_seq=1 ttl=64 time=0.116 ms >>>>>>>>> 64 bytes from 192.168.0.56: icmp_seq=2 ttl=64 time=0.101 ms >>>>>>>>> >>>>>>>>> --- 192.168.0.56 ping statistics --- >>>>>>>>> 2 packets transmitted, 2 received, 0% packet loss, time 1528ms >>>>>>>>> rtt min/avg/max/mdev = 0.101/0.108/0.116/0.012 ms >>>>>>>>> >>>>>>>>> [athena_login2][/users/home/sysm02/]> telnet 192.168.0.56 49156 >>>>>>>>> Trying 192.168.0.56... >>>>>>>>> Connected to 192.168.0.56. 
>>>>>>>>> Escape character is '^]'. >>>>>>>>> >>>>>>>>> Thank you for your help, >>>>>>>>> Mauro >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 1 Mar 2019, at 10:29, Mauro Tridici >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> thank you for the explanation. >>>>>>>>> I just changed the network.ping-timeout option to the default value >>>>>>>>> (network.ping-timeout=42). >>>>>>>>> >>>>>>>>> I will check the logs to see if the errors appear again. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Mauro >>>>>>>>> >>>>>>>>> On 1 Mar 2019, at 04:43, Milind Changire >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> network.ping-timeout should not be set to zero for non-glusterd >>>>>>>>> clients. >>>>>>>>> glusterd is a special case for which ping-timeout is set to zero >>>>>>>>> via /etc/glusterfs/glusterd.vol >>>>>>>>> >>>>>>>>> Setting network.ping-timeout to zero disables arming of the ping >>>>>>>>> timer for connections. This disables testing the connection for >>>>>>>>> responsiveness and hence avoids proactive fail-over. >>>>>>>>> >>>>>>>>> Please reset network.ping-timeout to a non-zero positive value, >>>>>>>>> e.g. 42 >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Feb 28, 2019 at 5:07 PM Nithya Balachandran < >>>>>>>>> nbalacha at redhat.com> wrote: >>>>>>>>> >>>>>>>>>> Adding Raghavendra and Milind to comment on this. >>>>>>>>>> >>>>>>>>>> What is the effect of setting network.ping-timeout to 0 and >>>>>>>>>> should it be set back to 42? >>>>>>>>>> Regards, >>>>>>>>>> Nithya >>>>>>>>>> >>>>>>>>>> On Thu, 28 Feb 2019 at 16:01, Mauro Tridici < >>>>>>>>>> mauro.tridici at cmcc.it> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Nithya, >>>>>>>>>>> >>>>>>>>>>> sorry for the late reply. >>>>>>>>>>> network.ping-timeout has been set to 0 in order to try to solve >>>>>>>>>>> some timeout problems, but it didn't help. >>>>>>>>>>> I can set it to the default value. >>>>>>>>>>> >>>>>>>>>>> Can I proceed with the change? 
>>>>>>>>>>> >>>>>>>>>>> Thank you, >>>>>>>>>>> Mauro >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 28 Feb 2019, at 04:41, Nithya Balachandran < >>>>>>>>>>> nbalacha at redhat.com> wrote: >>>>>>>>>>> >>>>>>>>>>> Hi Mauro, >>>>>>>>>>> >>>>>>>>>>> Is network.ping-timeout still set to 0? The default value is 42. >>>>>>>>>>> Is there a particular reason why this was changed? >>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Nithya >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, 27 Feb 2019 at 21:32, Mauro Tridici < >>>>>>>>>>> mauro.tridici at cmcc.it> wrote: >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Hi Xavi, >>>>>>>>>>>> >>>>>>>>>>>> thank you for the detailed explanation and suggestions. >>>>>>>>>>>> Yes, the transport.listen-backlog option is still set to 1024. >>>>>>>>>>>> >>>>>>>>>>>> I will check the network and connectivity status using "ping" >>>>>>>>>>>> and "telnet" as soon as the errors come back again. >>>>>>>>>>>> >>>>>>>>>>>> Regards, >>>>>>>>>>>> Mauro >>>>>>>>>>>> >>>>>>>>>>>> On 27 Feb 2019, at 16:42, Xavi Hernandez < >>>>>>>>>>>> jahernan at redhat.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Hi Mauro, >>>>>>>>>>>> >>>>>>>>>>>> those errors say that the mount point is not connected to some >>>>>>>>>>>> of the bricks while executing operations. I see references to the 3rd and 6th >>>>>>>>>>>> bricks of several disperse sets, which seem to map to server s06. For some >>>>>>>>>>>> reason, gluster is having trouble connecting from the client machine to >>>>>>>>>>>> that particular server. At the end of the log I see that after a long time a >>>>>>>>>>>> reconnect is done to both of them. However, a little after, other bricks from >>>>>>>>>>>> s05 get disconnected and a reconnect times out. >>>>>>>>>>>> >>>>>>>>>>>> That's really odd. It seems as if server/communication is cut >>>>>>>>>>>> to s06 for some time, then restored, and then the same happens to the next >>>>>>>>>>>> server. 
>>>>>>>>>>>> >>>>>>>>>>>> If the servers are really online and it's only a communication >>>>>>>>>>>> issue, it explains why server memory and network usage have increased: if the >>>>>>>>>>>> problem only exists between the client and servers, any write made by the >>>>>>>>>>>> client will automatically mark the file as damaged, since some of the >>>>>>>>>>>> servers have not been updated. Since self-heal runs from the server nodes, >>>>>>>>>>>> they will probably be correctly connected to all bricks, which allows them >>>>>>>>>>>> to heal the just damaged file, which increases memory and network usage. >>>>>>>>>>>> >>>>>>>>>>>> I guess you still have transport.listen-backlog set to 1024, >>>>>>>>>>>> right? >>>>>>>>>>>> >>>>>>>>>>>> Just to try to identify if the problem really comes from the >>>>>>>>>>>> network, can you check if you lose some pings from the client to all of the >>>>>>>>>>>> servers while you are seeing those errors in the log file? >>>>>>>>>>>> >>>>>>>>>>>> You can also check if, during those errors, you can telnet to >>>>>>>>>>>> the port of the brick from the client. >>>>>>>>>>>> >>>>>>>>>>>> Xavi >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Feb 26, 2019 at 10:17 AM Mauro Tridici < >>>>>>>>>>>> mauro.tridici at cmcc.it> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Nithya, >>>>>>>>>>>>> >>>>>>>>>>>>> The "df -h" operation is no longer slow, but no users are using >>>>>>>>>>>>> the volume; RAM and network usage is ok on the client node. 
>>>>>>>>>>>>> >>>>>>>>>>>>> I was worried about these kinds of warnings/errors: >>>>>>>>>>>>> >>>>>>>>>>>>> [2019-02-25 10:59:00.664323] W [MSGID: 122035] >>>>>>>>>>>>> [ec-common.c:571:ec_child_select] 0-tier2-disperse-6: Executing operation >>>>>>>>>>>>> with some subvolumes unavailable (20) >>>>>>>>>>>>> >>>>>>>>>>>>> [2019-02-26 03:11:35.212603] E >>>>>>>>>>>>> [rpc-clnt.c:350:saved_frames_unwind] (--> >>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> >>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> >>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> >>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] >>>>>>>>>>>>> (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) >>>>>>>>>>>>> 0-tier2-client-50: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) >>>>>>>>>>>>> called at 2019-02-26 03:10:56.549903 (xid=0x106f1c5) >>>>>>>>>>>>> >>>>>>>>>>>>> [2019-02-26 03:13:03.313831] E >>>>>>>>>>>>> [socket.c:2376:socket_connect_finish] 0-tier2-client-50: connection to >>>>>>>>>>>>> 192.168.0.56:49156 failed (Timeout della connessione); >>>>>>>>>>>>> disconnecting socket >>>>>>>>>>>>> >>>>>>>>>>>>> It seems that some subvolumes are not available and the >>>>>>>>>>>>> 192.168.0.56 server (s06) is not reachable. >>>>>>>>>>>>> But the gluster servers are up and running and the bricks are ok. >>>>>>>>>>>>> >>>>>>>>>>>>> In attachment, the updated tier2.log file. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Thank you. >>>>>>>>>>>>> Regards, >>>>>>>>>>>>> Mauro >>>>>>>>>>>>> >>>>>>>>>>>>> On 26 Feb 2019, at 04:03, Nithya Balachandran < >>>>>>>>>>>>> nbalacha at redhat.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I see a lot of EC messages in the log but they don't seem very >>>>>>>>>>>>> serious. Xavi, can you take a look? 
>>>>>>>>>>>>> >>>>>>>>>>>>> The only errors I see are: >>>>>>>>>>>>> [2019-02-25 10:58:45.519871] E >>>>>>>>>>>>> [rpc-clnt.c:350:saved_frames_unwind] (--> >>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> >>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> >>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> >>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] >>>>>>>>>>>>> (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) >>>>>>>>>>>>> 0-tier2-client-50: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) >>>>>>>>>>>>> called at 2019-02-25 10:57:47.429969 (xid=0xd26fe7) >>>>>>>>>>>>> [2019-02-25 10:58:51.461493] E >>>>>>>>>>>>> [rpc-clnt.c:350:saved_frames_unwind] (--> >>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> >>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> >>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> >>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] >>>>>>>>>>>>> (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) >>>>>>>>>>>>> 0-tier2-client-41: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) >>>>>>>>>>>>> called at 2019-02-25 10:57:47.499174 (xid=0xf47d6a) >>>>>>>>>>>>> [2019-02-25 11:07:57.152874] E >>>>>>>>>>>>> [socket.c:2376:socket_connect_finish] 0-tier2-client-70: connection to >>>>>>>>>>>>> 192.168.0.55:49163 failed (Timeout della connessione); >>>>>>>>>>>>> disconnecting socket >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Is the df -h operation still slow? If yes, can you take a >>>>>>>>>>>>> tcpdump of the client while running df -h and send that across? 
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Nithya
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, 25 Feb 2019 at 17:27, Mauro Tridici <mauro.tridici at cmcc.it> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sorry, a few minutes after my last mail message, I noticed
>>>>>>>>>>>>>> that the "df -h" command hung for a while before returning the prompt.
>>>>>>>>>>>>>> Yesterday everything was OK in the gluster client log, but
>>>>>>>>>>>>>> today I see a lot of errors (please take a look at the attached file).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On the client node, I detected significant RAM and network usage.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do you think that the errors have been caused by the client's
>>>>>>>>>>>>>> resource usage?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you in advance,
>>>>>>>>>>>>>> Mauro

From cuculovic at mdpi.com  Thu Mar 21 10:56:48 2019
From: cuculovic at mdpi.com (Milos Cuculovic)
Date: Thu, 21 Mar 2019 11:56:48 +0100
Subject: [Gluster-users] Heal flapping between Possibly undergoing heal and In split brain
In-Reply-To: References: <132679F1-A96E-402F-BDF0-79FE200AD9F7@mdpi.com> <548844D2-E06E-4A60-972E-72213C61D264@mdpi.com>
Message-ID: 

Sure, thank you for following up.

About the commands, here is what I see:

brick1:
-------------------------------------
sudo gluster volume heal storage2 info
Brick storage3:/data/data-cluster

/dms/final_archive - Possibly undergoing heal

Status: Connected
Number of entries: 3

Brick storage4:/data/data-cluster

/dms/final_archive - Possibly undergoing heal

Status: Connected
Number of entries: 2
-------------------------------------
sudo getfattr -d -m .
-e hex /data/data-cluster/dms/final_archive getfattr: Removing leading '/' from absolute path names # file: data/data-cluster/dms/final_archive trusted.afr.dirty=0x000000000000000000000000 trusted.afr.storage2-client-1=0x000000000000000000000010 trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87 trusted.glusterfs.dht=0x000000010000000000000000ffffffff trusted.glusterfs.dht.mds=0x00000000 ????????????????????????????????????? stat /data/data-cluster/dms/final_archive File: '/data/data-cluster/dms/final_archive' Size: 3497984 Blocks: 8768 IO Block: 4096 directory Device: 807h/2055d Inode: 26427748396 Links: 72123 Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) Access: 2018-10-09 04:22:40.514629044 +0200 Modify: 2019-03-21 11:55:37.382278863 +0100 Change: 2019-03-21 11:55:37.382278863 +0100 Birth: - ????????????????????????????????????? ????????????????????????????????????? brick2: ????????????????????????????????????? sudo gluster volume heal storage2 info Brick storage3:/data/data-cluster /dms/final_archive - Possibly undergoing heal Status: Connected Number of entries: 3 Brick storage4:/data/data-cluster /dms/final_archive - Possibly undergoing heal Status: Connected Number of entries: 2 ????????????????????????????????????? sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive getfattr: Removing leading '/' from absolute path names # file: data/data-cluster/dms/final_archive trusted.afr.dirty=0x000000000000000000000000 trusted.afr.storage2-client-0=0x000000000000000000000001 trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87 trusted.glusterfs.dht=0x000000010000000000000000ffffffff trusted.glusterfs.dht.mds=0x00000000 ????????????????????????????????????? 
stat /data/data-cluster/dms/final_archive File: '/data/data-cluster/dms/final_archive' Size: 3497984 Blocks: 8760 IO Block: 4096 directory Device: 807h/2055d Inode: 13563551265 Links: 72124 Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) Access: 2018-10-09 04:22:40.514629044 +0200 Modify: 2019-03-21 11:55:46.382565124 +0100 Change: 2019-03-21 11:55:46.382565124 +0100 Birth: - ????????????????????????????????????? Hope this helps. - Kindest regards, Milos Cuculovic IT Manager --- MDPI AG Postfach, CH-4020 Basel, Switzerland Office: St. Alban-Anlage 66, 4052 Basel, Switzerland Tel. +41 61 683 77 35 Fax +41 61 302 89 18 Email: cuculovic at mdpi.com Skype: milos.cuculovic.mdpi Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. > On 21 Mar 2019, at 11:43, Karthik Subrahmanya wrote: > > Can you attach the "glustershd.log" file which will be present under "/var/log/glusterfs/" from both the nodes and the "stat" & "getfattr -d -m . -e hex " output of all the entries listed in the heal info output from both the bricks? > > On Thu, Mar 21, 2019 at 3:54 PM Milos Cuculovic > wrote: > Thanks Karthik! > > I was trying to find some resolution methods from [2] but unfortunately none worked (I can explain what I tried if needed). > >> I guess the volume you are talking about is of type replica-2 (1x2). > That?s correct, aware of the arbiter solution but still didn?t took time to implement. > > From the info results I posted, how to know in which situation I am. No files are mentioned in spit brain, only directories. One brick has 3 entries and one two entries. 
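For what it's worth, the trusted.afr.* values shown above pack three 32-bit counters: pending data, metadata and entry operations, in that order (treating that standard AFR changelog layout as an assumption to double-check against your version's docs). A small shell sketch that decodes the two values from the outputs above:

```shell
# Decode a trusted.afr.<vol>-client-N hex value into its three pending
# counters (data, metadata, entry), assuming the standard 12-byte layout.
decode_afr() {
    v=${1#0x}
    d=$(printf '%d' "0x$(echo "$v" | cut -c1-8)")
    m=$(printf '%d' "0x$(echo "$v" | cut -c9-16)")
    e=$(printf '%d' "0x$(echo "$v" | cut -c17-24)")
    echo "data=$d metadata=$m entry=$e"
}
decode_afr 0x000000000000000000000010   # value brick1 holds against client-1
decode_afr 0x000000000000000000000001   # value brick2 holds against client-0
```

Decoded this way, each brick holds a non-zero entry counter against the other, which fits a directory flapping between "Possibly undergoing heal" and entry split-brain.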
> > sudo gluster volume heal storage2 info > [sudo] password for sshadmin: > Brick storage3:/data/data-cluster > > > /dms/final_archive - Possibly undergoing heal > > Status: Connected > Number of entries: 3 > > Brick storage4:/data/data-cluster > > /dms/final_archive - Possibly undergoing heal > > Status: Connected > Number of entries: 2 > > - Kindest regards, > > Milos Cuculovic > IT Manager > > --- > MDPI AG > Postfach, CH-4020 Basel, Switzerland > Office: St. Alban-Anlage 66, 4052 Basel, Switzerland > Tel. +41 61 683 77 35 > Fax +41 61 302 89 18 > Email:?cuculovic at mdpi.com > Skype: milos.cuculovic.mdpi > > Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. > >> On 21 Mar 2019, at 10:27, Karthik Subrahmanya > wrote: >> >> Hi, >> >> Note: I guess the volume you are talking about is of type replica-2 (1x2). Usually replica 2 volumes are prone to split-brain. If you can consider converting them to arbiter or replica-3, they will handle most of the cases which can lead to slit-brains. For more information see [1]. >> >> Resolving the split-brain: [2] talks about how to interpret the heal info output and different ways to resolve them using the CLI/manually/using the favorite-child-policy. >> If you are having entry split brain, and is a gfid split-brain (file/dir having different gfids on the replica bricks) then you can use the CLI option to resolve them. If a directory is in gfid split-brain in a distributed-replicate volume and you are using the source-brick option please make sure you use the brick of this subvolume, which has the same gfid as that of the other distribute subvolume(s) where you have the correct gfid, as the source. 
>> If you are having a type mismatch then follow the steps in [3] to resolve the split-brain. >> >> [1] https://docs.gluster.org/en/v3/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/ >> [2] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/ >> [3] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#dir-split-brain >> >> HTH, >> Karthik >> >> On Thu, Mar 21, 2019 at 1:45 PM Milos Cuculovic > wrote: >> I was now able to catch the split brain log: >> >> sudo gluster volume heal storage2 info >> Brick storage3:/data/data-cluster >> >> >> /dms/final_archive - Is in split-brain >> >> Status: Connected >> Number of entries: 3 >> >> Brick storage4:/data/data-cluster >> >> /dms/final_archive - Is in split-brain >> >> Status: Connected >> Number of entries: 2 >> >> Milos >> >>> On 21 Mar 2019, at 09:07, Milos Cuculovic > wrote: >>> >>> Since 24h, after upgrading from 4.0 to 4.1.7 one of the servers, the heal shows this: >>> >>> sudo gluster volume heal storage2 info >>> Brick storage3:/data/data-cluster >>> >>> >>> /dms/final_archive - Possibly undergoing heal >>> >>> Status: Connected >>> Number of entries: 3 >>> >>> Brick storage4:/data/data-cluster >>> >>> /dms/final_archive - Possibly undergoing heal >>> >>> Status: Connected >>> Number of entries: 2 >>> >>> The same files stay there. From time to time the status of the /dms/final_archive is in split brain at the following command shows: >>> >>> sudo gluster volume heal storage2 info split-brain >>> Brick storage3:/data/data-cluster >>> /dms/final_archive >>> Status: Connected >>> Number of entries in split-brain: 1 >>> >>> Brick storage4:/data/data-cluster >>> /dms/final_archive >>> Status: Connected >>> Number of entries in split-brain: 1 >>> >>> How to know the file who is in split brain? The files in /dms/final_archive are not very important, fine to remove (ideally resolve the split brain) for the ones that differ. 
>>> >>> I can only see the directory and GFID. Any idea on how to resolve this situation as I would like to continue with the upgrade on the 2nd server, and for this the heal needs to be done with 0 entries in sudo gluster volume heal storage2 info >>> >>> Thank you in advance, Milos. >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: glustershd_brick2.log Type: application/octet-stream Size: 734197 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: glustershd_brick1.log Type: application/octet-stream Size: 1193622 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From srangana at redhat.com Thu Mar 21 11:06:33 2019 From: srangana at redhat.com (Shyam Ranganathan) Date: Thu, 21 Mar 2019 07:06:33 -0400 Subject: [Gluster-users] Announcing Gluster release 5.5 Message-ID: <71be9d39-2794-bfab-ba58-6b904d22e1a1@redhat.com> The Gluster community is pleased to announce the release of Gluster 5.5 (packages available at [1]). Release notes for the release can be found at [3]. Major changes, features and limitations addressed in this release: - Release 5.4 introduced an incompatible change that prevented rolling upgrades, and hence was never announced to the lists. As a result we are jumping a release version and going to 5.5 from 5.3, that does not have the problem. 
Thanks,
Gluster community

[1] Packages for 5.5: https://download.gluster.org/pub/gluster/glusterfs/5/5.5/
[2] Release notes for 5.5: https://docs.gluster.org/en/latest/release-notes/5.5/

From ksubrahm at redhat.com  Thu Mar 21 11:36:00 2019
From: ksubrahm at redhat.com (Karthik Subrahmanya)
Date: Thu, 21 Mar 2019 17:06:00 +0530
Subject: [Gluster-users] Heal flapping between Possibly undergoing heal and In split brain
In-Reply-To: References: <132679F1-A96E-402F-BDF0-79FE200AD9F7@mdpi.com> <548844D2-E06E-4A60-972E-72213C61D264@mdpi.com>
Message-ID: 

Hi Milos,

Thanks for the logs and the getfattr output.
From the logs I can see that there are 6 entries under the directory
"/data/data-cluster/dms/final_archive", named

41be9ff5ec05c4b1c989c6053e709e59
5543982fab4b56060aa09f667a8ae617
a8b7f31775eebc8d1867e7f9de7b6eaf
c1d3f3c2d7ae90e891e671e2f20d5d4b
e5934699809a3b6dcfc5945f408b978b
e7cdc94f60d390812a5f9754885e119e

which have mismatching gfids, so the heal is failing on this directory.

You can use the CLI to resolve these gfid mismatches, with any of the 3 methods available:
1. bigger-file
gluster volume heal <VOLNAME> split-brain bigger-file <FILE>

2. latest-mtime
gluster volume heal <VOLNAME> split-brain latest-mtime <FILE>

3. source-brick
gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME> <FILE>

where <FILE> must be an absolute path w.r.t. the volume, starting with '/'.
If all those entries are directories, then go for either the latest-mtime or the source-brick option.
After you resolve all these gfid mismatches, run the "gluster volume heal <VOLNAME>" command.
Then check the heal info and let me know the result.

Regards,
Karthik

On Thu, Mar 21, 2019 at 4:27 PM Milos Cuculovic wrote:

> Sure, thank you for following up.
>
> About the commands, here is what I see:
>
> brick1:
> -------------------------------------
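Given the six entries and the methods described above, the per-entry resolution commands can be generated once and reviewed before anything is run. A sketch that only prints the commands (latest-mtime is chosen here purely for illustration; nothing is executed):

```shell
vol=storage2
dir=/dms/final_archive
# The six gfid-mismatched entries reported for this directory.
entries="41be9ff5ec05c4b1c989c6053e709e59
5543982fab4b56060aa09f667a8ae617
a8b7f31775eebc8d1867e7f9de7b6eaf
c1d3f3c2d7ae90e891e671e2f20d5d4b
e5934699809a3b6dcfc5945f408b978b
e7cdc94f60d390812a5f9754885e119e"
# Print (do not run) one heal command per entry; review the output
# first, then run the commands individually if they look right.
cmds=$(for e in $entries; do
    echo "gluster volume heal $vol split-brain latest-mtime $dir/$e"
done)
echo "$cmds"
```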
> sudo gluster volume heal storage2 info > Brick storage3:/data/data-cluster > > > /dms/final_archive - Possibly undergoing heal > > Status: Connected > Number of entries: 3 > > Brick storage4:/data/data-cluster > > /dms/final_archive - Possibly undergoing heal > > Status: Connected > Number of entries: 2 > ????????????????????????????????????? > sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive > getfattr: Removing leading '/' from absolute path names > # file: data/data-cluster/dms/final_archive > trusted.afr.dirty=0x000000000000000000000000 > trusted.afr.storage2-client-1=0x000000000000000000000010 > trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.dht.mds=0x00000000 > ????????????????????????????????????? > stat /data/data-cluster/dms/final_archive > File: '/data/data-cluster/dms/final_archive' > Size: 3497984 Blocks: 8768 IO Block: 4096 directory > Device: 807h/2055d Inode: 26427748396 Links: 72123 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2018-10-09 04:22:40.514629044 +0200 > Modify: 2019-03-21 11:55:37.382278863 +0100 > Change: 2019-03-21 11:55:37.382278863 +0100 > Birth: - > ????????????????????????????????????? > ????????????????????????????????????? > > brick2: > ????????????????????????????????????? > sudo gluster volume heal storage2 info > Brick storage3:/data/data-cluster > > > /dms/final_archive - Possibly undergoing heal > > Status: Connected > Number of entries: 3 > > Brick storage4:/data/data-cluster > > /dms/final_archive - Possibly undergoing heal > > Status: Connected > Number of entries: 2 > ????????????????????????????????????? > sudo getfattr -d -m . 
-e hex /data/data-cluster/dms/final_archive > getfattr: Removing leading '/' from absolute path names > # file: data/data-cluster/dms/final_archive > trusted.afr.dirty=0x000000000000000000000000 > trusted.afr.storage2-client-0=0x000000000000000000000001 > trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.dht.mds=0x00000000 > ????????????????????????????????????? > stat /data/data-cluster/dms/final_archive > File: '/data/data-cluster/dms/final_archive' > Size: 3497984 Blocks: 8760 IO Block: 4096 directory > Device: 807h/2055d Inode: 13563551265 Links: 72124 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2018-10-09 04:22:40.514629044 +0200 > Modify: 2019-03-21 11:55:46.382565124 +0100 > Change: 2019-03-21 11:55:46.382565124 +0100 > Birth: - > ????????????????????????????????????? > > Hope this helps. > > - Kindest regards, > > Milos Cuculovic > IT Manager > > --- > MDPI AG > Postfach, CH-4020 Basel, Switzerland > Office: St. Alban-Anlage 66, 4052 Basel, Switzerland > Tel. +41 61 683 77 35 > Fax +41 61 302 89 18 > Email: cuculovic at mdpi.com > Skype: milos.cuculovic.mdpi > > Disclaimer: The information and files contained in this message > are confidential and intended solely for the use of the individual or > entity to whom they are addressed. If you have received this message in > error, please notify me and delete this message from your system. You may > not copy this message in its entirety or in part, or disclose its contents > to anyone. > > On 21 Mar 2019, at 11:43, Karthik Subrahmanya wrote: > > Can you attach the "glustershd.log" file which will be present under > "/var/log/glusterfs/" from both the nodes and the "stat" & "getfattr -d -m > . -e hex " output of all the entries listed in the heal > info output from both the bricks? > > On Thu, Mar 21, 2019 at 3:54 PM Milos Cuculovic > wrote: > >> Thanks Karthik! 
>> >> I was trying to find some resolution methods from [2] but unfortunately >> none worked (I can explain what I tried if needed). >> >> I guess the volume you are talking about is of type replica-2 (1x2). >> >> That?s correct, aware of the arbiter solution but still didn?t took time >> to implement. >> >> From the info results I posted, how to know in which situation I am. No >> files are mentioned in spit brain, only directories. One brick has 3 >> entries and one two entries. >> >> sudo gluster volume heal storage2 info >> [sudo] password for sshadmin: >> Brick storage3:/data/data-cluster >> >> >> /dms/final_archive - Possibly undergoing heal >> >> Status: Connected >> Number of entries: 3 >> >> Brick storage4:/data/data-cluster >> >> /dms/final_archive - Possibly undergoing heal >> >> Status: Connected >> Number of entries: 2 >> >> - Kindest regards, >> >> Milos Cuculovic >> IT Manager >> >> --- >> MDPI AG >> Postfach, CH-4020 Basel, Switzerland >> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >> Tel. +41 61 683 77 35 >> Fax +41 61 302 89 18 >> Email: cuculovic at mdpi.com >> Skype: milos.cuculovic.mdpi >> >> Disclaimer: The information and files contained in this message >> are confidential and intended solely for the use of the individual or >> entity to whom they are addressed. If you have received this message in >> error, please notify me and delete this message from your system. You may >> not copy this message in its entirety or in part, or disclose its contents >> to anyone. >> >> On 21 Mar 2019, at 10:27, Karthik Subrahmanya >> wrote: >> >> Hi, >> >> Note: I guess the volume you are talking about is of type replica-2 >> (1x2). Usually replica 2 volumes are prone to split-brain. If you can >> consider converting them to arbiter or replica-3, they will handle most of >> the cases which can lead to slit-brains. For more information see [1]. 
>> >> Resolving the split-brain: [2] talks about how to interpret the heal info >> output and different ways to resolve them using the CLI/manually/using the >> favorite-child-policy. >> If you are having entry split brain, and is a gfid split-brain (file/dir >> having different gfids on the replica bricks) then you can use the CLI >> option to resolve them. If a directory is in gfid split-brain in a >> distributed-replicate volume and you are using the source-brick option >> please make sure you use the brick of this subvolume, which has the same >> gfid as that of the other distribute subvolume(s) where you have the >> correct gfid, as the source. >> If you are having a type mismatch then follow the steps in [3] to resolve >> the split-brain. >> >> [1] >> https://docs.gluster.org/en/v3/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/ >> [2] >> https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/ >> [3] >> https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#dir-split-brain >> >> HTH, >> Karthik >> >> On Thu, Mar 21, 2019 at 1:45 PM Milos Cuculovic >> wrote: >> >>> I was now able to catch the split brain log: >>> >>> sudo gluster volume heal storage2 info >>> Brick storage3:/data/data-cluster >>> >>> >>> /dms/final_archive - Is in split-brain >>> >>> Status: Connected >>> Number of entries: 3 >>> >>> Brick storage4:/data/data-cluster >>> >>> /dms/final_archive - Is in split-brain >>> >>> Status: Connected >>> Number of entries: 2 >>> >>> Milos >>> >>> On 21 Mar 2019, at 09:07, Milos Cuculovic wrote: >>> >>> Since 24h, after upgrading from 4.0 to 4.1.7 one of the servers, the >>> heal shows this: >>> >>> sudo gluster volume heal storage2 info >>> Brick storage3:/data/data-cluster >>> >>> >>> /dms/final_archive - Possibly undergoing heal >>> >>> Status: Connected >>> Number of entries: 3 >>> >>> Brick storage4:/data/data-cluster >>> >>> /dms/final_archive - Possibly undergoing heal >>> >>> Status: 
Connected >>> Number of entries: 2 >>> >>> The same files stay there. From time to time the status of the >>> /dms/final_archive is in split brain at the following command shows: >>> >>> sudo gluster volume heal storage2 info split-brain >>> Brick storage3:/data/data-cluster >>> /dms/final_archive >>> Status: Connected >>> Number of entries in split-brain: 1 >>> >>> Brick storage4:/data/data-cluster >>> /dms/final_archive >>> Status: Connected >>> Number of entries in split-brain: 1 >>> >>> How to know the file who is in split brain? The files in >>> /dms/final_archive are not very important, fine to remove (ideally resolve >>> the split brain) for the ones that differ. >>> >>> I can only see the directory and GFID. Any idea on how to resolve this >>> situation as I would like to continue with the upgrade on the 2nd server, >>> and for this the heal needs to be done with 0 entries in sudo gluster >>> volume heal storage2 info >>> >>> Thank you in advance, Milos. >>> >>> >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cuculovic at mdpi.com Thu Mar 21 11:52:36 2019 From: cuculovic at mdpi.com (Milos Cuculovic) Date: Thu, 21 Mar 2019 12:52:36 +0100 Subject: [Gluster-users] Heal flapping between Possibly undergoing heal and In split brain In-Reply-To: References: <132679F1-A96E-402F-BDF0-79FE200AD9F7@mdpi.com> <548844D2-E06E-4A60-972E-72213C61D264@mdpi.com> Message-ID: Thank you Karthik, I have run this for all files (see example below) and it says the file is not in split-brain: sudo gluster volume heal storage2 split-brain latest-mtime /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 Healing /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 failed: File not in split-brain. Volume heal failed. 
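When the CLI answers "File not in split-brain", one sanity check is whether the two bricks actually report different trusted.gfid values for the entry (via getfattr -n trusted.gfid -e hex on each brick's copy). A sketch of that comparison; both gfid values below are hypothetical placeholders, not taken from the thread:

```shell
# Hypothetical gfid values for one entry; in practice they come from
#   getfattr -n trusted.gfid -e hex /data/data-cluster/dms/final_archive/<entry>
# run on storage3 and storage4 respectively.
gfid_storage3=0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa   # placeholder
gfid_storage4=0xbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb   # placeholder
if [ "$gfid_storage3" = "$gfid_storage4" ]; then
    verdict="gfids match: not a gfid split-brain"
else
    verdict="gfid mismatch: needs split-brain resolution"
fi
echo "$verdict"
```

If the real values match, the entry is not gfid-mismatched, and the pending-heal state likely comes from the directory's entry changelogs instead.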
- Kindest regards, Milos Cuculovic IT Manager --- MDPI AG Postfach, CH-4020 Basel, Switzerland Office: St. Alban-Anlage 66, 4052 Basel, Switzerland Tel. +41 61 683 77 35 Fax +41 61 302 89 18 Email: cuculovic at mdpi.com Skype: milos.cuculovic.mdpi Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. > On 21 Mar 2019, at 12:36, Karthik Subrahmanya wrote: > > Hi Milos, > > Thanks for the logs and the getfattr output. > From the logs I can see that there are 6 entries under the directory "/data/data-cluster/dms/final_archive" named > 41be9ff5ec05c4b1c989c6053e709e59 > 5543982fab4b56060aa09f667a8ae617 > a8b7f31775eebc8d1867e7f9de7b6eaf > c1d3f3c2d7ae90e891e671e2f20d5d4b > e5934699809a3b6dcfc5945f408b978b > e7cdc94f60d390812a5f9754885e119e > which are having gfid mismatch, so the heal is failing on this directory. > > You can use the CLI option to resolve these files from gfid mismatch. You can use any of the 3 methods available: > 1. bigger-file > gluster volume heal split-brain bigger-file > > 2. latest-mtime > gluster volume heal split-brain latest-mtime > > 3. source-brick > gluster volume heal split-brain source-brick > > where must be absolute path w.r.t. the volume, starting with '/'. > If all those entries are directories then go for either latest-mtime/source-brick option. > After you resolve all these gfid-mismatches, run the "gluster volume heal " command. Then check the heal info and let me know the result. > > Regards, > Karthik > > On Thu, Mar 21, 2019 at 4:27 PM Milos Cuculovic > wrote: > Sure, thank you for following up. > > About the commands, here is what I see: > > brick1: > ????????????????????????????????????? 
> sudo gluster volume heal storage2 info > Brick storage3:/data/data-cluster > > > /dms/final_archive - Possibly undergoing heal > > Status: Connected > Number of entries: 3 > > Brick storage4:/data/data-cluster > > /dms/final_archive - Possibly undergoing heal > > Status: Connected > Number of entries: 2 > ????????????????????????????????????? > sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive > getfattr: Removing leading '/' from absolute path names > # file: data/data-cluster/dms/final_archive > trusted.afr.dirty=0x000000000000000000000000 > trusted.afr.storage2-client-1=0x000000000000000000000010 > trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.dht.mds=0x00000000 > ????????????????????????????????????? > stat /data/data-cluster/dms/final_archive > File: '/data/data-cluster/dms/final_archive' > Size: 3497984 Blocks: 8768 IO Block: 4096 directory > Device: 807h/2055d Inode: 26427748396 Links: 72123 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2018-10-09 04:22:40.514629044 +0200 > Modify: 2019-03-21 11:55:37.382278863 +0100 > Change: 2019-03-21 11:55:37.382278863 +0100 > Birth: - > ????????????????????????????????????? > ????????????????????????????????????? > > brick2: > ????????????????????????????????????? > sudo gluster volume heal storage2 info > Brick storage3:/data/data-cluster > > > /dms/final_archive - Possibly undergoing heal > > Status: Connected > Number of entries: 3 > > Brick storage4:/data/data-cluster > > /dms/final_archive - Possibly undergoing heal > > Status: Connected > Number of entries: 2 > ????????????????????????????????????? > sudo getfattr -d -m . 
-e hex /data/data-cluster/dms/final_archive > getfattr: Removing leading '/' from absolute path names > # file: data/data-cluster/dms/final_archive > trusted.afr.dirty=0x000000000000000000000000 > trusted.afr.storage2-client-0=0x000000000000000000000001 > trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.dht.mds=0x00000000 > ????????????????????????????????????? > stat /data/data-cluster/dms/final_archive > File: '/data/data-cluster/dms/final_archive' > Size: 3497984 Blocks: 8760 IO Block: 4096 directory > Device: 807h/2055d Inode: 13563551265 Links: 72124 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2018-10-09 04:22:40.514629044 +0200 > Modify: 2019-03-21 11:55:46.382565124 +0100 > Change: 2019-03-21 11:55:46.382565124 +0100 > Birth: - > ????????????????????????????????????? > > Hope this helps. > > - Kindest regards, > > Milos Cuculovic > IT Manager > > --- > MDPI AG > Postfach, CH-4020 Basel, Switzerland > Office: St. Alban-Anlage 66, 4052 Basel, Switzerland > Tel. +41 61 683 77 35 > Fax +41 61 302 89 18 > Email:?cuculovic at mdpi.com > Skype: milos.cuculovic.mdpi > > Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. > >> On 21 Mar 2019, at 11:43, Karthik Subrahmanya > wrote: >> >> Can you attach the "glustershd.log" file which will be present under "/var/log/glusterfs/" from both the nodes and the "stat" & "getfattr -d -m . -e hex " output of all the entries listed in the heal info output from both the bricks? >> >> On Thu, Mar 21, 2019 at 3:54 PM Milos Cuculovic > wrote: >> Thanks Karthik! 
>> >> I was trying to find some resolution methods from [2] but unfortunately none worked (I can explain what I tried if needed). >> >>> I guess the volume you are talking about is of type replica-2 (1x2). >> That?s correct, aware of the arbiter solution but still didn?t took time to implement. >> >> From the info results I posted, how to know in which situation I am. No files are mentioned in spit brain, only directories. One brick has 3 entries and one two entries. >> >> sudo gluster volume heal storage2 info >> [sudo] password for sshadmin: >> Brick storage3:/data/data-cluster >> >> >> /dms/final_archive - Possibly undergoing heal >> >> Status: Connected >> Number of entries: 3 >> >> Brick storage4:/data/data-cluster >> >> /dms/final_archive - Possibly undergoing heal >> >> Status: Connected >> Number of entries: 2 >> >> - Kindest regards, >> >> Milos Cuculovic >> IT Manager >> >> --- >> MDPI AG >> Postfach, CH-4020 Basel, Switzerland >> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >> Tel. +41 61 683 77 35 >> Fax +41 61 302 89 18 >> Email:?cuculovic at mdpi.com >> Skype: milos.cuculovic.mdpi >> >> Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. >> >>> On 21 Mar 2019, at 10:27, Karthik Subrahmanya > wrote: >>> >>> Hi, >>> >>> Note: I guess the volume you are talking about is of type replica-2 (1x2). Usually replica 2 volumes are prone to split-brain. If you can consider converting them to arbiter or replica-3, they will handle most of the cases which can lead to slit-brains. For more information see [1]. 
>>> >>> Resolving the split-brain: [2] talks about how to interpret the heal info output and different ways to resolve them using the CLI/manually/using the favorite-child-policy. >>> If you are having entry split brain, and is a gfid split-brain (file/dir having different gfids on the replica bricks) then you can use the CLI option to resolve them. If a directory is in gfid split-brain in a distributed-replicate volume and you are using the source-brick option please make sure you use the brick of this subvolume, which has the same gfid as that of the other distribute subvolume(s) where you have the correct gfid, as the source. >>> If you are having a type mismatch then follow the steps in [3] to resolve the split-brain. >>> >>> [1] https://docs.gluster.org/en/v3/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/ >>> [2] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/ >>> [3] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#dir-split-brain >>> >>> HTH, >>> Karthik >>> >>> On Thu, Mar 21, 2019 at 1:45 PM Milos Cuculovic > wrote: >>> I was now able to catch the split brain log: >>> >>> sudo gluster volume heal storage2 info >>> Brick storage3:/data/data-cluster >>> >>> >>> /dms/final_archive - Is in split-brain >>> >>> Status: Connected >>> Number of entries: 3 >>> >>> Brick storage4:/data/data-cluster >>> >>> /dms/final_archive - Is in split-brain >>> >>> Status: Connected >>> Number of entries: 2 >>> >>> Milos >>> >>>> On 21 Mar 2019, at 09:07, Milos Cuculovic > wrote: >>>> >>>> Since 24h, after upgrading from 4.0 to 4.1.7 one of the servers, the heal shows this: >>>> >>>> sudo gluster volume heal storage2 info >>>> Brick storage3:/data/data-cluster >>>> >>>> >>>> /dms/final_archive - Possibly undergoing heal >>>> >>>> Status: Connected >>>> Number of entries: 3 >>>> >>>> Brick storage4:/data/data-cluster >>>> >>>> /dms/final_archive - Possibly undergoing heal >>>> >>>> Status: Connected >>>> 
Number of entries: 2
>>>>
>>>> The same files stay there. From time to time the status of the
>>>> /dms/final_archive is in split brain, as the following command shows:
>>>>
>>>> sudo gluster volume heal storage2 info split-brain
>>>> Brick storage3:/data/data-cluster
>>>> /dms/final_archive
>>>> Status: Connected
>>>> Number of entries in split-brain: 1
>>>>
>>>> Brick storage4:/data/data-cluster
>>>> /dms/final_archive
>>>> Status: Connected
>>>> Number of entries in split-brain: 1
>>>>
>>>> How to know which file is in split brain? The files in
>>>> /dms/final_archive are not very important, fine to remove (ideally resolve
>>>> the split brain) for the ones that differ.
>>>>
>>>> I can only see the directory and GFID. Any idea on how to resolve this
>>>> situation, as I would like to continue with the upgrade on the 2nd server,
>>>> and for this the heal needs to be done with 0 entries in "sudo gluster
>>>> volume heal storage2 info".
>>>>
>>>> Thank you in advance, Milos.
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users

From ksubrahm at redhat.com  Thu Mar 21 12:05:34 2019
From: ksubrahm at redhat.com (Karthik Subrahmanya)
Date: Thu, 21 Mar 2019 17:35:34 +0530
Subject: [Gluster-users] Heal flapping between Possibly undergoing heal and In split brain
In-Reply-To: References: <132679F1-A96E-402F-BDF0-79FE200AD9F7@mdpi.com> <548844D2-E06E-4A60-972E-72213C61D264@mdpi.com>
Message-ID: 

Can you give me the stat & getfattr output of all those 6 entries from both
the bricks, and the glfsheal-<VOLNAME>.log file from the node where you run
this command? Meanwhile, can you also try running this with the source-brick
option?
On Thu, Mar 21, 2019 at 5:22 PM Milos Cuculovic wrote: > Thank you Karthik, > > I have run this for all files (see example below) and it says the file is > not in split-brain: > > sudo gluster volume heal storage2 split-brain latest-mtime > /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 > Healing /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 failed: File > not in split-brain. > Volume heal failed. > > > - Kindest regards, > > Milos Cuculovic > IT Manager > > --- > MDPI AG > Postfach, CH-4020 Basel, Switzerland > Office: St. Alban-Anlage 66, 4052 Basel, Switzerland > Tel. +41 61 683 77 35 > Fax +41 61 302 89 18 > Email: cuculovic at mdpi.com > Skype: milos.cuculovic.mdpi > > Disclaimer: The information and files contained in this message > are confidential and intended solely for the use of the individual or > entity to whom they are addressed. If you have received this message in > error, please notify me and delete this message from your system. You may > not copy this message in its entirety or in part, or disclose its contents > to anyone. > > On 21 Mar 2019, at 12:36, Karthik Subrahmanya wrote: > > Hi Milos, > > Thanks for the logs and the getfattr output. > From the logs I can see that there are 6 entries under the > directory "/data/data-cluster/dms/final_archive" named > 41be9ff5ec05c4b1c989c6053e709e59 > 5543982fab4b56060aa09f667a8ae617 > a8b7f31775eebc8d1867e7f9de7b6eaf > c1d3f3c2d7ae90e891e671e2f20d5d4b > e5934699809a3b6dcfc5945f408b978b > e7cdc94f60d390812a5f9754885e119e > which are having a gfid mismatch, so the heal is failing on this directory. > > You can use the CLI option to resolve these files from gfid mismatch. You > can use any of the 3 methods available: > 1. bigger-file > gluster volume heal <VOLNAME> split-brain bigger-file <FILE> > > 2. latest-mtime > gluster volume heal <VOLNAME> split-brain latest-mtime <FILE> > > 3. source-brick > gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME> <FILE> > > where <FILE> must be an absolute path w.r.t. the volume, starting with '/'. 
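Filled in with this thread's volume name, the three invocations would take shapes like the ones below. This is only a sketch: the commands are printed rather than executed, the entry path and source brick are illustrative values taken from this thread, and for directories only latest-mtime and source-brick apply (bigger-file is meant for files):

```shell
# Sketch only: print the three split-brain resolution command shapes for
# this thread's volume. FILE and SRC_BRICK must be replaced with real
# values taken from "gluster volume heal <VOLNAME> info".
VOL=storage2
FILE=/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59   # path relative to the volume root
SRC_BRICK=storage4:/data/data-cluster                      # brick whose copy should win

echo "gluster volume heal $VOL split-brain bigger-file $FILE"
echo "gluster volume heal $VOL split-brain latest-mtime $FILE"
echo "gluster volume heal $VOL split-brain source-brick $SRC_BRICK $FILE"
```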
> If all those entries are directories then go for either > latest-mtime/source-brick option. > After you resolve all these gfid-mismatches, run the "gluster volume heal > " command. Then check the heal info and let me know the result. > > Regards, > Karthik > > On Thu, Mar 21, 2019 at 4:27 PM Milos Cuculovic > wrote: > >> Sure, thank you for following up. >> >> About the commands, here is what I see: >> >> brick1: >> ????????????????????????????????????? >> sudo gluster volume heal storage2 info >> Brick storage3:/data/data-cluster >> >> >> /dms/final_archive - Possibly undergoing heal >> >> Status: Connected >> Number of entries: 3 >> >> Brick storage4:/data/data-cluster >> >> /dms/final_archive - Possibly undergoing heal >> >> Status: Connected >> Number of entries: 2 >> ????????????????????????????????????? >> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive >> getfattr: Removing leading '/' from absolute path names >> # file: data/data-cluster/dms/final_archive >> trusted.afr.dirty=0x000000000000000000000000 >> trusted.afr.storage2-client-1=0x000000000000000000000010 >> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.dht.mds=0x00000000 >> ????????????????????????????????????? >> stat /data/data-cluster/dms/final_archive >> File: '/data/data-cluster/dms/final_archive' >> Size: 3497984 Blocks: 8768 IO Block: 4096 directory >> Device: 807h/2055d Inode: 26427748396 Links: 72123 >> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >> Access: 2018-10-09 04:22:40.514629044 +0200 >> Modify: 2019-03-21 11:55:37.382278863 +0100 >> Change: 2019-03-21 11:55:37.382278863 +0100 >> Birth: - >> ????????????????????????????????????? >> ????????????????????????????????????? >> >> brick2: >> ????????????????????????????????????? 
>> sudo gluster volume heal storage2 info >> Brick storage3:/data/data-cluster >> >> >> /dms/final_archive - Possibly undergoing heal >> >> Status: Connected >> Number of entries: 3 >> >> Brick storage4:/data/data-cluster >> >> /dms/final_archive - Possibly undergoing heal >> >> Status: Connected >> Number of entries: 2 >> ????????????????????????????????????? >> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive >> getfattr: Removing leading '/' from absolute path names >> # file: data/data-cluster/dms/final_archive >> trusted.afr.dirty=0x000000000000000000000000 >> trusted.afr.storage2-client-0=0x000000000000000000000001 >> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.dht.mds=0x00000000 >> ????????????????????????????????????? >> stat /data/data-cluster/dms/final_archive >> File: '/data/data-cluster/dms/final_archive' >> Size: 3497984 Blocks: 8760 IO Block: 4096 directory >> Device: 807h/2055d Inode: 13563551265 Links: 72124 >> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >> Access: 2018-10-09 04:22:40.514629044 +0200 >> Modify: 2019-03-21 11:55:46.382565124 +0100 >> Change: 2019-03-21 11:55:46.382565124 +0100 >> Birth: - >> ????????????????????????????????????? >> >> Hope this helps. >> >> - Kindest regards, >> >> Milos Cuculovic >> IT Manager >> >> --- >> MDPI AG >> Postfach, CH-4020 Basel, Switzerland >> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >> Tel. +41 61 683 77 35 >> Fax +41 61 302 89 18 >> Email: cuculovic at mdpi.com >> Skype: milos.cuculovic.mdpi >> >> Disclaimer: The information and files contained in this message >> are confidential and intended solely for the use of the individual or >> entity to whom they are addressed. If you have received this message in >> error, please notify me and delete this message from your system. 
You may >> not copy this message in its entirety or in part, or disclose its contents >> to anyone. >> >> On 21 Mar 2019, at 11:43, Karthik Subrahmanya >> wrote: >> >> Can you attach the "glustershd.log" file which will be present under >> "/var/log/glusterfs/" from both the nodes, and the "stat" & "getfattr -d -m >> . -e hex <file-path>" output of all the entries listed in the heal >> info output from both the bricks? >> >> On Thu, Mar 21, 2019 at 3:54 PM Milos Cuculovic >> wrote: >> >>> Thanks Karthik! >>> >>> I was trying to find some resolution methods from [2] but unfortunately >>> none worked (I can explain what I tried if needed). >>> >>> I guess the volume you are talking about is of type replica-2 (1x2). >>> >>> That's correct; I'm aware of the arbiter solution but still didn't take the time >>> to implement it. >>> >>> From the info results I posted, how can I tell which situation I am in? No >>> files are mentioned in split brain, only directories. One brick has 3 >>> entries and the other has 2. >>> >>> sudo gluster volume heal storage2 info >>> [sudo] password for sshadmin: >>> Brick storage3:/data/data-cluster >>> >>> >>> /dms/final_archive - Possibly undergoing heal >>> >>> Status: Connected >>> Number of entries: 3 >>> >>> Brick storage4:/data/data-cluster >>> >>> /dms/final_archive - Possibly undergoing heal >>> >>> Status: Connected >>> Number of entries: 2 >>> >>> - Kindest regards, >>> >>> Milos Cuculovic >>> IT Manager >>> >>> --- >>> MDPI AG >>> Postfach, CH-4020 Basel, Switzerland >>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>> Tel. +41 61 683 77 35 >>> Fax +41 61 302 89 18 >>> Email: cuculovic at mdpi.com >>> Skype: milos.cuculovic.mdpi >>> >>> Disclaimer: The information and files contained in this message >>> are confidential and intended solely for the use of the individual or >>> entity to whom they are addressed. If you have received this message in >>> error, please notify me and delete this message from your system. 
You may >>> not copy this message in its entirety or in part, or disclose its contents >>> to anyone. >>> >>> On 21 Mar 2019, at 10:27, Karthik Subrahmanya >>> wrote: >>> >>> Hi, >>> >>> Note: I guess the volume you are talking about is of type replica-2 >>> (1x2). Usually replica 2 volumes are prone to split-brain. If you can >>> consider converting them to arbiter or replica-3, they will handle most of >>> the cases which can lead to split-brains. For more information see [1]. >>> >>> Resolving the split-brain: [2] talks about how to interpret the heal >>> info output and the different ways to resolve them using the CLI/manually/using >>> the favorite-child-policy. >>> If you are having an entry split brain, and it is a gfid split-brain (file/dir >>> having different gfids on the replica bricks), then you can use the CLI >>> option to resolve them. If a directory is in gfid split-brain in a >>> distributed-replicate volume and you are using the source-brick option, >>> please make sure you use the brick of this subvolume which has the same >>> gfid as that of the other distribute subvolume(s) where you have the >>> correct gfid, as the source. >>> If you are having a type mismatch, then follow the steps in [3] to >>> resolve the split-brain. 
>>> >>> [1] >>> https://docs.gluster.org/en/v3/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/ >>> [2] >>> https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/ >>> [3] >>> https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#dir-split-brain >>> >>> HTH, >>> Karthik >>> >>> On Thu, Mar 21, 2019 at 1:45 PM Milos Cuculovic >>> wrote: >>> >>>> I was now able to catch the split brain log: >>>> >>>> sudo gluster volume heal storage2 info >>>> Brick storage3:/data/data-cluster >>>> >>>> >>>> /dms/final_archive - Is in split-brain >>>> >>>> Status: Connected >>>> Number of entries: 3 >>>> >>>> Brick storage4:/data/data-cluster >>>> >>>> /dms/final_archive - Is in split-brain >>>> >>>> Status: Connected >>>> Number of entries: 2 >>>> >>>> Milos >>>> >>>> On 21 Mar 2019, at 09:07, Milos Cuculovic wrote: >>>> >>>> Since 24h, after upgrading from 4.0 to 4.1.7 one of the servers, the >>>> heal shows this: >>>> >>>> sudo gluster volume heal storage2 info >>>> Brick storage3:/data/data-cluster >>>> >>>> >>>> /dms/final_archive - Possibly undergoing heal >>>> >>>> Status: Connected >>>> Number of entries: 3 >>>> >>>> Brick storage4:/data/data-cluster >>>> >>>> /dms/final_archive - Possibly undergoing heal >>>> >>>> Status: Connected >>>> Number of entries: 2 >>>> >>>> The same files stay there. From time to time the status of the >>>> /dms/final_archive is in split brain at the following command shows: >>>> >>>> sudo gluster volume heal storage2 info split-brain >>>> Brick storage3:/data/data-cluster >>>> /dms/final_archive >>>> Status: Connected >>>> Number of entries in split-brain: 1 >>>> >>>> Brick storage4:/data/data-cluster >>>> /dms/final_archive >>>> Status: Connected >>>> Number of entries in split-brain: 1 >>>> >>>> How to know the file who is in split brain? The files in >>>> /dms/final_archive are not very important, fine to remove (ideally resolve >>>> the split brain) for the ones that differ. 
>>>> >>>> I can only see the directory and GFID. Any idea on how to resolve this >>>> situation as I would like to continue with the upgrade on the 2nd server, >>>> and for this the heal needs to be done with 0 entries in sudo gluster >>>> volume heal storage2 info >>>> >>>> Thank you in advance, Milos. >>>> >>>> >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cuculovic at mdpi.com Thu Mar 21 12:33:11 2019 From: cuculovic at mdpi.com (Milos Cuculovic) Date: Thu, 21 Mar 2019 13:33:11 +0100 Subject: [Gluster-users] Heal flapping between Possibly undergoing heal and In split brain In-Reply-To: References: <132679F1-A96E-402F-BDF0-79FE200AD9F7@mdpi.com> <548844D2-E06E-4A60-972E-72213C61D264@mdpi.com> Message-ID: <14B1CACB-4049-42DD-AB69-3B75FBD6BE30@mdpi.com> Sure: brick1: ???????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????? sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 getfattr: Removing leading '/' from absolute path names # file: data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 trusted.gfid=0xe358ff34504241d387efe1e76eb28bb0 trusted.glusterfs.dht=0x000000010000000000000000ffffffff trusted.glusterfs.dht.mds=0x00000000 sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 getfattr: Removing leading '/' from absolute path names # file: data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 trusted.gfid=0x33585a577c4a4c55b39b9abc07eacff1 trusted.glusterfs.dht=0x000000010000000000000000ffffffff trusted.glusterfs.dht.mds=0x00000000 sudo getfattr -d -m . 
-e hex /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf getfattr: Removing leading '/' from absolute path names # file: data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf trusted.gfid=0x99cdfe0773d74d7ab29592ce40cf47f3 trusted.glusterfs.dht=0x000000010000000000000000ffffffff trusted.glusterfs.dht.mds=0x00000000 sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b getfattr: Removing leading '/' from absolute path names # file: data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b trusted.gfid=0x8a96a80ddf8a4163ac00bb12369389df trusted.glusterfs.dht=0x000000010000000000000000ffffffff trusted.glusterfs.dht.mds=0x00000000 sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b getfattr: Removing leading '/' from absolute path names # file: data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b trusted.gfid=0x04d087a96cc6421ca7bc356fc20d7753 trusted.glusterfs.dht=0x000000010000000000000000ffffffff trusted.glusterfs.dht.mds=0x00000000 sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e getfattr: Removing leading '/' from absolute path names # file: data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 trusted.glusterfs.dht=0x000000010000000000000000ffffffff trusted.glusterfs.dht.mds=0x00000000 ???????????????????????????????????????????????????????????? 
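With getfattr output like the above captured from each brick, the per-entry gfids can be extracted and compared mechanically rather than by eye. A small sketch, where the two sample strings are shortened, hypothetical stand-ins for real brick captures:

```shell
# Extract trusted.gfid from getfattr output taken on each brick and flag a
# mismatch. The sample strings below are hypothetical stand-ins for real
# per-brick captures of the same directory entry.
brick1_out='# file: data/data-cluster/dms/final_archive/example
trusted.gfid=0xe358ff34504241d387efe1e76eb28bb0'
brick2_out='# file: data/data-cluster/dms/final_archive/example
trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5'
g1=$(printf '%s\n' "$brick1_out" | sed -n 's/^trusted\.gfid=//p')
g2=$(printf '%s\n' "$brick2_out" | sed -n 's/^trusted\.gfid=//p')
if [ "$g1" != "$g2" ]; then
    echo "gfid mismatch: $g1 vs $g2"
fi
```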
sudo stat /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 File: '/data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59' Size: 33 Blocks: 0 IO Block: 4096 directory Device: 807h/2055d Inode: 40809094709 Links: 3 Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) Access: 2019-03-20 11:06:26.994047597 +0100 Modify: 2019-03-20 11:28:28.294689870 +0100 Change: 2019-03-21 13:01:03.077654239 +0100 Birth: - sudo stat /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 File: '/data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617' Size: 33 Blocks: 0 IO Block: 4096 directory Device: 807h/2055d Inode: 49399908865 Links: 3 Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) Access: 2019-03-20 11:07:20.342140927 +0100 Modify: 2019-03-20 11:28:28.318690015 +0100 Change: 2019-03-21 13:01:03.133654344 +0100 Birth: - sudo stat /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf File: '/data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf' Size: 33 Blocks: 0 IO Block: 4096 directory Device: 807h/2055d Inode: 53706303549 Links: 3 Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) Access: 2019-03-20 11:06:55.414097315 +0100 Modify: 2019-03-20 11:28:28.362690281 +0100 Change: 2019-03-21 13:01:03.141654359 +0100 Birth: - sudo stat /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b File: '/data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b' Size: 33 Blocks: 0 IO Block: 4096 directory Device: 807h/2055d Inode: 57990935591 Links: 3 Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) Access: 2019-03-20 11:07:08.558120309 +0100 Modify: 2019-03-20 11:28:14.226604869 +0100 Change: 2019-03-21 13:01:03.189654448 +0100 Birth: - sudo stat /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b File: '/data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b' Size: 33 Blocks: 
0 IO Block: 4096 directory Device: 807h/2055d Inode: 62291339781 Links: 3 Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) Access: 2019-03-20 11:06:02.070003998 +0100 Modify: 2019-03-20 11:28:28.458690861 +0100 Change: 2019-03-21 13:01:03.281654621 +0100 Birth: - sudo stat /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e File: '/data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e' Size: 33 Blocks: 0 IO Block: 4096 directory Device: 807h/2055d Inode: 66574223479 Links: 3 Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) Access: 2019-03-20 11:28:10.826584325 +0100 Modify: 2019-03-20 11:28:10.834584374 +0100 Change: 2019-03-20 14:06:07.937449353 +0100 Birth: - root at storage3:/var/log/glusterfs# ???????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????? brick2: ???????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????? sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e getfattr: Removing leading '/' from absolute path names # file: data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e trusted.afr.dirty=0x000000000000000000000000 trusted.afr.storage2-client-0=0x000000000000000000000000 trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 trusted.glusterfs.dht=0x000000010000000000000000ffffffff trusted.glusterfs.dht.mds=0x00000000 sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 getfattr: Removing leading '/' from absolute path names # file: data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 trusted.afr.storage2-client-0=0x000000000000000000000000 trusted.gfid=0x33585a577c4a4c55b39b9abc07eacff1 trusted.glusterfs.dht=0x000000010000000000000000ffffffff trusted.glusterfs.dht.mds=0x00000000 sudo getfattr -d -m . 
-e hex /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf getfattr: Removing leading '/' from absolute path names # file: data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf trusted.afr.storage2-client-0=0x000000000000000000000000 trusted.gfid=0x99cdfe0773d74d7ab29592ce40cf47f3 trusted.glusterfs.dht=0x000000010000000000000000ffffffff trusted.glusterfs.dht.mds=0x00000000 sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b getfattr: Removing leading '/' from absolute path names # file: data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b trusted.afr.storage2-client-0=0x000000000000000000000000 trusted.gfid=0x8a96a80ddf8a4163ac00bb12369389df trusted.glusterfs.dht=0x000000010000000000000000ffffffff trusted.glusterfs.dht.mds=0x00000000 sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b getfattr: Removing leading '/' from absolute path names # file: data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b trusted.afr.storage2-client-0=0x000000000000000000000000 trusted.gfid=0x04d087a96cc6421ca7bc356fc20d7753 trusted.glusterfs.dht=0x000000010000000000000000ffffffff trusted.glusterfs.dht.mds=0x00000000 sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e getfattr: Removing leading '/' from absolute path names # file: data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e trusted.afr.dirty=0x000000000000000000000000 trusted.afr.storage2-client-0=0x000000000000000000000000 trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 trusted.glusterfs.dht=0x000000010000000000000000ffffffff trusted.glusterfs.dht.mds=0x00000000 ???????????????????????????????????????????????????????????? 
sudo stat /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 File: '/data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59' Size: 33 Blocks: 0 IO Block: 4096 directory Device: 807h/2055d Inode: 42232631305 Links: 3 Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) Access: 2019-03-20 11:06:26.994047597 +0100 Modify: 2019-03-20 11:28:28.294689870 +0100 Change: 2019-03-21 13:01:03.078748131 +0100 Birth: - sudo stat /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 File: '/data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617' Size: 33 Blocks: 0 IO Block: 4096 directory Device: 807h/2055d Inode: 78589109305 Links: 3 Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) Access: 2019-03-20 11:07:20.342140927 +0100 Modify: 2019-03-20 11:28:28.318690015 +0100 Change: 2019-03-21 13:01:03.134748477 +0100 Birth: - sudo stat /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf File: '/data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf' Size: 33 Blocks: 0 IO Block: 4096 directory Device: 807h/2055d Inode: 54972096517 Links: 3 Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) Access: 2019-03-20 11:06:55.414097315 +0100 Modify: 2019-03-20 11:28:28.362690281 +0100 Change: 2019-03-21 13:01:03.162748650 +0100 Birth: - sudo stat /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b File: '/data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b' Size: 33 Blocks: 0 IO Block: 4096 directory Device: 807h/2055d Inode: 40821259275 Links: 3 Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) Access: 2019-03-20 11:07:08.558120309 +0100 Modify: 2019-03-20 11:28:14.226604869 +0100 Change: 2019-03-21 13:01:03.194748848 +0100 Birth: - sudo stat /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b File: '/data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b' Size: 33 Blocks: 
0 IO Block: 4096 directory Device: 807h/2055d Inode: 15876654 Links: 3 Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) Access: 2019-03-20 11:06:02.070003998 +0100 Modify: 2019-03-20 11:28:28.458690861 +0100 Change: 2019-03-21 13:01:03.282749392 +0100 Birth: - sudo stat /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e File: '/data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e' Size: 33 Blocks: 0 IO Block: 4096 directory Device: 807h/2055d Inode: 49408944650 Links: 3 Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) Access: 2019-03-20 11:28:10.826584325 +0100 Modify: 2019-03-20 11:28:10.834584374 +0100 Change: 2019-03-20 14:06:07.940849268 +0100 Birth: - ???????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????? The file is from brick 2 that I upgraded and started the heal on. - Kindest regards, Milos Cuculovic IT Manager --- MDPI AG Postfach, CH-4020 Basel, Switzerland Office: St. Alban-Anlage 66, 4052 Basel, Switzerland Tel. +41 61 683 77 35 Fax +41 61 302 89 18 Email: cuculovic at mdpi.com Skype: milos.cuculovic.mdpi Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. > On 21 Mar 2019, at 13:05, Karthik Subrahmanya wrote: > > Can you give me the stat & getfattr output of all those 6 entries from both the bricks and the glfsheal-.log file from the node where you run this command? > Meanwhile can you also try running this with the source-brick option? 
> > On Thu, Mar 21, 2019 at 5:22 PM Milos Cuculovic > wrote: > Thank you Karthik, > > I have run this for all files (see example below) and it says the file is not in split-brain: > > sudo gluster volume heal storage2 split-brain latest-mtime /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 > Healing /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 failed: File not in split-brain. > Volume heal failed. > > > - Kindest regards, > > Milos Cuculovic > IT Manager > > --- > MDPI AG > Postfach, CH-4020 Basel, Switzerland > Office: St. Alban-Anlage 66, 4052 Basel, Switzerland > Tel. +41 61 683 77 35 > Fax +41 61 302 89 18 > Email:?cuculovic at mdpi.com > Skype: milos.cuculovic.mdpi > > Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. > >> On 21 Mar 2019, at 12:36, Karthik Subrahmanya > wrote: >> >> Hi Milos, >> >> Thanks for the logs and the getfattr output. >> From the logs I can see that there are 6 entries under the directory "/data/data-cluster/dms/final_archive" named >> 41be9ff5ec05c4b1c989c6053e709e59 >> 5543982fab4b56060aa09f667a8ae617 >> a8b7f31775eebc8d1867e7f9de7b6eaf >> c1d3f3c2d7ae90e891e671e2f20d5d4b >> e5934699809a3b6dcfc5945f408b978b >> e7cdc94f60d390812a5f9754885e119e >> which are having gfid mismatch, so the heal is failing on this directory. >> >> You can use the CLI option to resolve these files from gfid mismatch. You can use any of the 3 methods available: >> 1. bigger-file >> gluster volume heal split-brain bigger-file >> >> 2. latest-mtime >> gluster volume heal split-brain latest-mtime >> >> 3. source-brick >> gluster volume heal split-brain source-brick >> >> where must be absolute path w.r.t. the volume, starting with '/'. 
>> If all those entries are directories then go for either latest-mtime/source-brick option. >> After you resolve all these gfid-mismatches, run the "gluster volume heal " command. Then check the heal info and let me know the result. >> >> Regards, >> Karthik >> >> On Thu, Mar 21, 2019 at 4:27 PM Milos Cuculovic > wrote: >> Sure, thank you for following up. >> >> About the commands, here is what I see: >> >> brick1: >> ????????????????????????????????????? >> sudo gluster volume heal storage2 info >> Brick storage3:/data/data-cluster >> >> >> /dms/final_archive - Possibly undergoing heal >> >> Status: Connected >> Number of entries: 3 >> >> Brick storage4:/data/data-cluster >> >> /dms/final_archive - Possibly undergoing heal >> >> Status: Connected >> Number of entries: 2 >> ????????????????????????????????????? >> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive >> getfattr: Removing leading '/' from absolute path names >> # file: data/data-cluster/dms/final_archive >> trusted.afr.dirty=0x000000000000000000000000 >> trusted.afr.storage2-client-1=0x000000000000000000000010 >> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.dht.mds=0x00000000 >> ????????????????????????????????????? >> stat /data/data-cluster/dms/final_archive >> File: '/data/data-cluster/dms/final_archive' >> Size: 3497984 Blocks: 8768 IO Block: 4096 directory >> Device: 807h/2055d Inode: 26427748396 Links: 72123 >> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >> Access: 2018-10-09 04:22:40.514629044 +0200 >> Modify: 2019-03-21 11:55:37.382278863 +0100 >> Change: 2019-03-21 11:55:37.382278863 +0100 >> Birth: - >> ????????????????????????????????????? >> ????????????????????????????????????? >> >> brick2: >> ????????????????????????????????????? 
>> sudo gluster volume heal storage2 info >> Brick storage3:/data/data-cluster >> >> >> /dms/final_archive - Possibly undergoing heal >> >> Status: Connected >> Number of entries: 3 >> >> Brick storage4:/data/data-cluster >> >> /dms/final_archive - Possibly undergoing heal >> >> Status: Connected >> Number of entries: 2 >> ????????????????????????????????????? >> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive >> getfattr: Removing leading '/' from absolute path names >> # file: data/data-cluster/dms/final_archive >> trusted.afr.dirty=0x000000000000000000000000 >> trusted.afr.storage2-client-0=0x000000000000000000000001 >> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.dht.mds=0x00000000 >> ????????????????????????????????????? >> stat /data/data-cluster/dms/final_archive >> File: '/data/data-cluster/dms/final_archive' >> Size: 3497984 Blocks: 8760 IO Block: 4096 directory >> Device: 807h/2055d Inode: 13563551265 Links: 72124 >> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >> Access: 2018-10-09 04:22:40.514629044 +0200 >> Modify: 2019-03-21 11:55:46.382565124 +0100 >> Change: 2019-03-21 11:55:46.382565124 +0100 >> Birth: - >> ????????????????????????????????????? >> >> Hope this helps. >> >> - Kindest regards, >> >> Milos Cuculovic >> IT Manager >> >> --- >> MDPI AG >> Postfach, CH-4020 Basel, Switzerland >> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >> Tel. +41 61 683 77 35 >> Fax +41 61 302 89 18 >> Email:?cuculovic at mdpi.com >> Skype: milos.cuculovic.mdpi >> >> Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. 
>> >>> On 21 Mar 2019, at 11:43, Karthik Subrahmanya > wrote: >>> >>> Can you attach the "glustershd.log" file which will be present under "/var/log/glusterfs/" from both the nodes and the "stat" & "getfattr -d -m . -e hex " output of all the entries listed in the heal info output from both the bricks? >>> >>> On Thu, Mar 21, 2019 at 3:54 PM Milos Cuculovic > wrote: >>> Thanks Karthik! >>> >>> I was trying to find some resolution methods from [2] but unfortunately none worked (I can explain what I tried if needed). >>> >>>> I guess the volume you are talking about is of type replica-2 (1x2). >>> That?s correct, aware of the arbiter solution but still didn?t took time to implement. >>> >>> From the info results I posted, how to know in which situation I am. No files are mentioned in spit brain, only directories. One brick has 3 entries and one two entries. >>> >>> sudo gluster volume heal storage2 info >>> [sudo] password for sshadmin: >>> Brick storage3:/data/data-cluster >>> >>> >>> /dms/final_archive - Possibly undergoing heal >>> >>> Status: Connected >>> Number of entries: 3 >>> >>> Brick storage4:/data/data-cluster >>> >>> /dms/final_archive - Possibly undergoing heal >>> >>> Status: Connected >>> Number of entries: 2 >>> >>> - Kindest regards, >>> >>> Milos Cuculovic >>> IT Manager >>> >>> --- >>> MDPI AG >>> Postfach, CH-4020 Basel, Switzerland >>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>> Tel. +41 61 683 77 35 >>> Fax +41 61 302 89 18 >>> Email:?cuculovic at mdpi.com >>> Skype: milos.cuculovic.mdpi >>> >>> Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. 
>>> >>>> On 21 Mar 2019, at 10:27, Karthik Subrahmanya > wrote: >>>> >>>> Hi, >>>> >>>> Note: I guess the volume you are talking about is of type replica-2 (1x2). Usually replica 2 volumes are prone to split-brain. If you can consider converting them to arbiter or replica-3, they will handle most of the cases which can lead to slit-brains. For more information see [1]. >>>> >>>> Resolving the split-brain: [2] talks about how to interpret the heal info output and different ways to resolve them using the CLI/manually/using the favorite-child-policy. >>>> If you are having entry split brain, and is a gfid split-brain (file/dir having different gfids on the replica bricks) then you can use the CLI option to resolve them. If a directory is in gfid split-brain in a distributed-replicate volume and you are using the source-brick option please make sure you use the brick of this subvolume, which has the same gfid as that of the other distribute subvolume(s) where you have the correct gfid, as the source. >>>> If you are having a type mismatch then follow the steps in [3] to resolve the split-brain. 
>>>> >>>> [1] https://docs.gluster.org/en/v3/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/ >>>> [2] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/ >>>> [3] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#dir-split-brain >>>> >>>> HTH, >>>> Karthik >>>> >>>> On Thu, Mar 21, 2019 at 1:45 PM Milos Cuculovic > wrote: >>>> I was now able to catch the split brain log: >>>> >>>> sudo gluster volume heal storage2 info >>>> Brick storage3:/data/data-cluster >>>> >>>> >>>> /dms/final_archive - Is in split-brain >>>> >>>> Status: Connected >>>> Number of entries: 3 >>>> >>>> Brick storage4:/data/data-cluster >>>> >>>> /dms/final_archive - Is in split-brain >>>> >>>> Status: Connected >>>> Number of entries: 2 >>>> >>>> Milos >>>> >>>>> On 21 Mar 2019, at 09:07, Milos Cuculovic > wrote: >>>>> >>>>> For the last 24h, after upgrading one of the servers from 4.0 to 4.1.7, the heal shows this: >>>>> >>>>> sudo gluster volume heal storage2 info >>>>> Brick storage3:/data/data-cluster >>>>> >>>>> >>>>> /dms/final_archive - Possibly undergoing heal >>>>> >>>>> Status: Connected >>>>> Number of entries: 3 >>>>> >>>>> Brick storage4:/data/data-cluster >>>>> >>>>> /dms/final_archive - Possibly undergoing heal >>>>> >>>>> Status: Connected >>>>> Number of entries: 2 >>>>> >>>>> The same files stay there. From time to time the status of /dms/final_archive is in split brain, as the following command shows: >>>>> >>>>> sudo gluster volume heal storage2 info split-brain >>>>> Brick storage3:/data/data-cluster >>>>> /dms/final_archive >>>>> Status: Connected >>>>> Number of entries in split-brain: 1 >>>>> >>>>> Brick storage4:/data/data-cluster >>>>> /dms/final_archive >>>>> Status: Connected >>>>> Number of entries in split-brain: 1 >>>>> >>>>> How to know which file is in split brain? The files in /dms/final_archive are not very important, fine to remove (ideally resolve the split brain) for the ones that differ.
>>>>> >>>>> I can only see the directory and GFID. Any idea on how to resolve this situation, as I would like to continue with the upgrade on the 2nd server, and for this the heal needs to be done with 0 entries in sudo gluster volume heal storage2 info >>>>> >>>>> Thank you in advance, Milos. >>>> >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: glfsheal-storage2.log Type: application/octet-stream Size: 513885 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From xiubli at redhat.com Thu Mar 21 13:01:02 2019 From: xiubli at redhat.com (Xiubo Li) Date: Thu, 21 Mar 2019 21:01:02 +0800 Subject: [Gluster-users] [Gluster-devel] Network Block device (NBD) on top of glusterfs In-Reply-To: References: Message-ID: <072c6bb2-eee1-8374-9b53-b9561deebfc7@redhat.com> On 2019/3/21 18:09, Prasanna Kalever wrote: > > > On Thu, Mar 21, 2019 at 9:00 AM Xiubo Li > wrote: > > All, > > I am one of the contributors to the gluster-block > [1] project, and I also > contribute to the linux kernel and the open-iscsi > project [2]. > > NBD has been around for some time, but recently the linux kernel's > Network Block Device (NBD) was enhanced and made to work with more > devices, and the option to integrate with netlink was added. > So, I tried to provide a glusterfs client based NBD driver > recently. Please refer to github issue #633 > [3], and the good > news is I have working code, with the most basic things done in the nbd-runner > project [4]. > > While this email is about announcing the project, and asking for > more collaboration, I would also like to discuss more about the > placement of the project itself.
Currently the nbd-runner project is > expected to be shared by our friends at the Ceph project too, to > provide an NBD driver for Ceph. I have personally worked closely with some of > them while contributing to the open-iSCSI project, and we > would like to take this project to great success. > > Now a few questions: > > 1. Can I continue to use http://github.com/gluster/nbd-runner as > home for this project, even if it's shared by other filesystem > projects? > > * I personally am fine with this. > > 2. Should there be a separate organization for this repo? > > * While it may make sense in future, for now, I am not planning > to start any new thing. > > It would be great if we have some consensus on this soon, as > nbd-runner is a new repository. If there are no concerns, I will > continue to contribute to the existing repository. > > > Thanks Xiubo Li, for finally sending this email out. Since this email > is out on the gluster mailing list, I would like to take a stand from > the gluster community point of view *only* and share my views. > > My honest answer is "If we want to maintain this within the gluster org, > then 80% of the effort is common/duplicate of what we did all these > days with gluster-block", > The great idea came from Mike Christie days ago, and the nbd-runner project's framework was initially emulated from tcmu-runner. This is why I named this project nbd-runner, which will work for all the other Distributed Storages, such as Gluster/Ceph/Azure, as discussed with Mike before. nbd-runner (NBD proto) and tcmu-runner (iSCSI proto) are almost the same and both work as the lower IO (READ/WRITE/...) layer, not the management layer like ceph-iscsi-gateway and gluster-block currently do.
Currently, since I have only implemented the Gluster handler and am also using the RPC like glusterfs and gluster-block, most of the other code (about 70%) in nbd-runner is for the NBD proto, and this is very different from the tcmu-runner/glusterfs/gluster-block projects; there are many new features in the NBD module that are not yet supported, so there will be more differences in future. The framework coding has been done and the nbd-runner project is already stable and works well for me now. > like: > * rpc/socket code > * cli/daemon parser/helper logics > * gfapi util functions > * logger framework > * inotify & dyn-config threads Yeah, these features were initially from the tcmu-runner project, which Mike and I coded two years ago. Currently nbd-runner has also copied them from tcmu-runner. Much appreciated for your great ideas here, Prasanna, and I hope nbd-runner can be used more generically and successfully in future. BRs Xiubo Li > * configure/Makefile/specfiles > * docs, AboutGluster and etc .. > > The repository gluster-block is actually a home for all the block > related stuff within gluster and it's designed to accommodate alike > functionalities; if I were you I would have simply copied nbd-runner.c > into https://github.com/gluster/gluster-block/tree/master/daemon/ just > like ceph plays it here > https://github.com/ceph/ceph/blob/master/src/tools/rbd_nbd/rbd-nbd.cc > and be done. > > Advantages of keeping the nbd client within gluster-block: > -> No worry about the code maintenance burden > -> No worry about monitoring a new component > -> shipping packages to fedora/centos/rhel is handled > -> This helps improve and stabilize the current gluster-block framework > -> We can build a common CI > -> We can reuse the common test framework and etc .. > > If you have an impression that gluster-block is for management, then I > would really want to correct you at this point.
> > Some of my near future plans for gluster-block: > * Allow exporting blocks with FUSE access via fileIO backstore to > improve large-file workloads, draft: > https://github.com/gluster/gluster-block/pull/58 > * Accommodate kernel loopback handling for local only applications > * The same way we can accommodate the nbd app/client, and IMHO this effort > shouldn't take 1 or 2 days to get it merged within gluster-block and > ready for a go release. > > > Hope that clarifies it. > > > Best Regards, > -- > Prasanna > > Regards, > Xiubo Li (@lxbsz) > > [1] - https://github.com/gluster/gluster-block > [2] - https://github.com/open-iscsi > [3] - https://github.com/gluster/glusterfs/issues/633 > [4] - https://github.com/gluster/nbd-runner > > _______________________________________________ > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksubrahm at redhat.com Thu Mar 21 13:07:42 2019 From: ksubrahm at redhat.com (Karthik Subrahmanya) Date: Thu, 21 Mar 2019 18:37:42 +0530 Subject: [Gluster-users] Heal flapping between Possibly undergoing heal and In split brain In-Reply-To: <14B1CACB-4049-42DD-AB69-3B75FBD6BE30@mdpi.com> References: <132679F1-A96E-402F-BDF0-79FE200AD9F7@mdpi.com> <548844D2-E06E-4A60-972E-72213C61D264@mdpi.com> <14B1CACB-4049-42DD-AB69-3B75FBD6BE30@mdpi.com> Message-ID: Hey Milos, I see that the gfid got healed for those directories from the getfattr output, and the glfsheal log also has messages corresponding to deleting the entries on one brick as part of healing, which then got recreated on the brick with the correct gfid. Can you run the "gluster volume heal " & "gluster volume heal info" command and paste the output here? If you still see entries pending heal, give the latest glustershd.log files from both the nodes along with the getfattr output of the files which are listed in the heal info output.
Regards, Karthik On Thu, Mar 21, 2019 at 6:03 PM Milos Cuculovic wrote: > Sure: > > brick1: > ???????????????????????????????????????????????????????????? > ???????????????????????????????????????????????????????????? > sudo getfattr -d -m . -e hex > /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 > getfattr: Removing leading '/' from absolute path names > # file: > data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 > trusted.gfid=0xe358ff34504241d387efe1e76eb28bb0 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.dht.mds=0x00000000 > > sudo getfattr -d -m . -e hex > /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 > getfattr: Removing leading '/' from absolute path names > # file: > data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 > trusted.gfid=0x33585a577c4a4c55b39b9abc07eacff1 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.dht.mds=0x00000000 > > sudo getfattr -d -m . -e hex > /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf > getfattr: Removing leading '/' from absolute path names > # file: > data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf > trusted.gfid=0x99cdfe0773d74d7ab29592ce40cf47f3 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.dht.mds=0x00000000 > > sudo getfattr -d -m . -e hex > /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b > getfattr: Removing leading '/' from absolute path names > # file: > data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b > trusted.gfid=0x8a96a80ddf8a4163ac00bb12369389df > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.dht.mds=0x00000000 > > sudo getfattr -d -m . 
-e hex > /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b > getfattr: Removing leading '/' from absolute path names > # file: > data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b > trusted.gfid=0x04d087a96cc6421ca7bc356fc20d7753 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.dht.mds=0x00000000 > > sudo getfattr -d -m . -e hex > /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e > getfattr: Removing leading '/' from absolute path names > # file: > data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e > trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.dht.mds=0x00000000 > ???????????????????????????????????????????????????????????? > sudo stat > /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 > File: > '/data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59' > Size: 33 Blocks: 0 IO Block: 4096 directory > Device: 807h/2055d Inode: 40809094709 Links: 3 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2019-03-20 11:06:26.994047597 +0100 > Modify: 2019-03-20 11:28:28.294689870 +0100 > Change: 2019-03-21 13:01:03.077654239 +0100 > Birth: - > > sudo stat > /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 > File: > '/data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617' > Size: 33 Blocks: 0 IO Block: 4096 directory > Device: 807h/2055d Inode: 49399908865 Links: 3 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2019-03-20 11:07:20.342140927 +0100 > Modify: 2019-03-20 11:28:28.318690015 +0100 > Change: 2019-03-21 13:01:03.133654344 +0100 > Birth: - > > sudo stat > /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf > File: > '/data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf' > Size: 33 Blocks: 0 IO Block: 4096 directory > 
Device: 807h/2055d Inode: 53706303549 Links: 3 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2019-03-20 11:06:55.414097315 +0100 > Modify: 2019-03-20 11:28:28.362690281 +0100 > Change: 2019-03-21 13:01:03.141654359 +0100 > Birth: - > > sudo stat > /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b > File: > '/data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b' > Size: 33 Blocks: 0 IO Block: 4096 directory > Device: 807h/2055d Inode: 57990935591 Links: 3 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2019-03-20 11:07:08.558120309 +0100 > Modify: 2019-03-20 11:28:14.226604869 +0100 > Change: 2019-03-21 13:01:03.189654448 +0100 > Birth: - > > sudo stat > /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b > File: > '/data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b' > Size: 33 Blocks: 0 IO Block: 4096 directory > Device: 807h/2055d Inode: 62291339781 Links: 3 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2019-03-20 11:06:02.070003998 +0100 > Modify: 2019-03-20 11:28:28.458690861 +0100 > Change: 2019-03-21 13:01:03.281654621 +0100 > Birth: - > > sudo stat > /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e > File: > '/data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e' > Size: 33 Blocks: 0 IO Block: 4096 directory > Device: 807h/2055d Inode: 66574223479 Links: 3 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2019-03-20 11:28:10.826584325 +0100 > Modify: 2019-03-20 11:28:10.834584374 +0100 > Change: 2019-03-20 14:06:07.937449353 +0100 > Birth: - > root at storage3:/var/log/glusterfs# > ???????????????????????????????????????????????????????????? > ???????????????????????????????????????????????????????????? > > brick2: > ???????????????????????????????????????????????????????????? 
> ???????????????????????????????????????????????????????????? > sudo getfattr -d -m . -e hex > /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e > getfattr: Removing leading '/' from absolute path names > # file: > data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e > trusted.afr.dirty=0x000000000000000000000000 > trusted.afr.storage2-client-0=0x000000000000000000000000 > trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.dht.mds=0x00000000 > > sudo getfattr -d -m . -e hex > /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 > getfattr: Removing leading '/' from absolute path names > # file: > data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 > trusted.afr.storage2-client-0=0x000000000000000000000000 > trusted.gfid=0x33585a577c4a4c55b39b9abc07eacff1 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.dht.mds=0x00000000 > > sudo getfattr -d -m . -e hex > /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf > getfattr: Removing leading '/' from absolute path names > # file: > data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf > trusted.afr.storage2-client-0=0x000000000000000000000000 > trusted.gfid=0x99cdfe0773d74d7ab29592ce40cf47f3 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.dht.mds=0x00000000 > > sudo getfattr -d -m . -e hex > /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b > getfattr: Removing leading '/' from absolute path names > # file: > data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b > trusted.afr.storage2-client-0=0x000000000000000000000000 > trusted.gfid=0x8a96a80ddf8a4163ac00bb12369389df > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.dht.mds=0x00000000 > > sudo getfattr -d -m . 
-e hex > /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b > getfattr: Removing leading '/' from absolute path names > # file: > data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b > trusted.afr.storage2-client-0=0x000000000000000000000000 > trusted.gfid=0x04d087a96cc6421ca7bc356fc20d7753 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.dht.mds=0x00000000 > > sudo getfattr -d -m . -e hex > /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e > getfattr: Removing leading '/' from absolute path names > # file: > data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e > trusted.afr.dirty=0x000000000000000000000000 > trusted.afr.storage2-client-0=0x000000000000000000000000 > trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.dht.mds=0x00000000 > > ???????????????????????????????????????????????????????????? > > sudo stat > /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 > File: > '/data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59' > Size: 33 Blocks: 0 IO Block: 4096 directory > Device: 807h/2055d Inode: 42232631305 Links: 3 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2019-03-20 11:06:26.994047597 +0100 > Modify: 2019-03-20 11:28:28.294689870 +0100 > Change: 2019-03-21 13:01:03.078748131 +0100 > Birth: - > > sudo stat > /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 > File: > '/data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617' > Size: 33 Blocks: 0 IO Block: 4096 directory > Device: 807h/2055d Inode: 78589109305 Links: 3 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2019-03-20 11:07:20.342140927 +0100 > Modify: 2019-03-20 11:28:28.318690015 +0100 > Change: 2019-03-21 13:01:03.134748477 +0100 > Birth: - > > sudo stat > 
/data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf > File: > '/data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf' > Size: 33 Blocks: 0 IO Block: 4096 directory > Device: 807h/2055d Inode: 54972096517 Links: 3 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2019-03-20 11:06:55.414097315 +0100 > Modify: 2019-03-20 11:28:28.362690281 +0100 > Change: 2019-03-21 13:01:03.162748650 +0100 > Birth: - > > sudo stat > /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b > File: > '/data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b' > Size: 33 Blocks: 0 IO Block: 4096 directory > Device: 807h/2055d Inode: 40821259275 Links: 3 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2019-03-20 11:07:08.558120309 +0100 > Modify: 2019-03-20 11:28:14.226604869 +0100 > Change: 2019-03-21 13:01:03.194748848 +0100 > Birth: - > > sudo stat > /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b > File: > '/data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b' > Size: 33 Blocks: 0 IO Block: 4096 directory > Device: 807h/2055d Inode: 15876654 Links: 3 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2019-03-20 11:06:02.070003998 +0100 > Modify: 2019-03-20 11:28:28.458690861 +0100 > Change: 2019-03-21 13:01:03.282749392 +0100 > Birth: - > > sudo stat > /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e > File: > '/data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e' > Size: 33 Blocks: 0 IO Block: 4096 directory > Device: 807h/2055d Inode: 49408944650 Links: 3 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2019-03-20 11:28:10.826584325 +0100 > Modify: 2019-03-20 11:28:10.834584374 +0100 > Change: 2019-03-20 14:06:07.940849268 +0100 > Birth: - > ???????????????????????????????????????????????????????????? 
> ???????????????????????????????????????????????????????????? > > The file is from brick 2 that I upgraded and started the heal on. > > > - Kindest regards, > > Milos Cuculovic > IT Manager > > --- > MDPI AG > Postfach, CH-4020 Basel, Switzerland > Office: St. Alban-Anlage 66, 4052 Basel, Switzerland > Tel. +41 61 683 77 35 > Fax +41 61 302 89 18 > Email: cuculovic at mdpi.com > Skype: milos.cuculovic.mdpi > > Disclaimer: The information and files contained in this message > are confidential and intended solely for the use of the individual or > entity to whom they are addressed. If you have received this message in > error, please notify me and delete this message from your system. You may > not copy this message in its entirety or in part, or disclose its contents > to anyone. > > On 21 Mar 2019, at 13:05, Karthik Subrahmanya wrote: > > Can you give me the stat & getfattr output of all those 6 entries from > both the bricks and the glfsheal-.log file from the node where you > run this command? > Meanwhile can you also try running this with the source-brick option? > > On Thu, Mar 21, 2019 at 5:22 PM Milos Cuculovic > wrote: > >> Thank you Karthik, >> >> I have run this for all files (see example below) and it says the file is >> not in split-brain: >> >> sudo gluster volume heal storage2 split-brain latest-mtime >> /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >> Healing /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 failed: File >> not in split-brain. >> Volume heal failed. >> >> >> - Kindest regards, >> >> Milos Cuculovic >> IT Manager >> >> --- >> MDPI AG >> Postfach, CH-4020 Basel, Switzerland >> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >> Tel. +41 61 683 77 35 >> Fax +41 61 302 89 18 >> Email: cuculovic at mdpi.com >> Skype: milos.cuculovic.mdpi >> >> Disclaimer: The information and files contained in this message >> are confidential and intended solely for the use of the individual or >> entity to whom they are addressed. 
If you have received this message in >> error, please notify me and delete this message from your system. You may >> not copy this message in its entirety or in part, or disclose its contents >> to anyone. >> >> On 21 Mar 2019, at 12:36, Karthik Subrahmanya >> wrote: >> >> Hi Milos, >> >> Thanks for the logs and the getfattr output. >> From the logs I can see that there are 6 entries under the >> directory "/data/data-cluster/dms/final_archive" named >> 41be9ff5ec05c4b1c989c6053e709e59 >> 5543982fab4b56060aa09f667a8ae617 >> a8b7f31775eebc8d1867e7f9de7b6eaf >> c1d3f3c2d7ae90e891e671e2f20d5d4b >> e5934699809a3b6dcfc5945f408b978b >> e7cdc94f60d390812a5f9754885e119e >> which are having gfid mismatch, so the heal is failing on this directory. >> >> You can use the CLI option to resolve these files from gfid mismatch. You >> can use any of the 3 methods available: >> 1. bigger-file >> gluster volume heal split-brain bigger-file >> >> 2. latest-mtime >> gluster volume heal split-brain latest-mtime >> >> 3. source-brick >> gluster volume heal split-brain source-brick >> >> >> where must be absolute path w.r.t. the volume, starting with '/'. >> If all those entries are directories then go for either >> latest-mtime/source-brick option. >> After you resolve all these gfid-mismatches, run the "gluster volume heal >> " command. Then check the heal info and let me know the result. >> >> Regards, >> Karthik >> >> On Thu, Mar 21, 2019 at 4:27 PM Milos Cuculovic >> wrote: >> >>> Sure, thank you for following up. >>> >>> About the commands, here is what I see: >>> >>> brick1: >>> ????????????????????????????????????? 
>>> sudo gluster volume heal storage2 info >>> Brick storage3:/data/data-cluster >>> >>> >>> /dms/final_archive - Possibly undergoing heal >>> >>> Status: Connected >>> Number of entries: 3 >>> >>> Brick storage4:/data/data-cluster >>> >>> /dms/final_archive - Possibly undergoing heal >>> >>> Status: Connected >>> Number of entries: 2 >>> ????????????????????????????????????? >>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive >>> getfattr: Removing leading '/' from absolute path names >>> # file: data/data-cluster/dms/final_archive >>> trusted.afr.dirty=0x000000000000000000000000 >>> trusted.afr.storage2-client-1=0x000000000000000000000010 >>> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.dht.mds=0x00000000 >>> ????????????????????????????????????? >>> stat /data/data-cluster/dms/final_archive >>> File: '/data/data-cluster/dms/final_archive' >>> Size: 3497984 Blocks: 8768 IO Block: 4096 directory >>> Device: 807h/2055d Inode: 26427748396 Links: 72123 >>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>> Access: 2018-10-09 04:22:40.514629044 +0200 >>> Modify: 2019-03-21 11:55:37.382278863 +0100 >>> Change: 2019-03-21 11:55:37.382278863 +0100 >>> Birth: - >>> ????????????????????????????????????? >>> ????????????????????????????????????? >>> >>> brick2: >>> ????????????????????????????????????? >>> sudo gluster volume heal storage2 info >>> Brick storage3:/data/data-cluster >>> >>> >>> /dms/final_archive - Possibly undergoing heal >>> >>> Status: Connected >>> Number of entries: 3 >>> >>> Brick storage4:/data/data-cluster >>> >>> /dms/final_archive - Possibly undergoing heal >>> >>> Status: Connected >>> Number of entries: 2 >>> ????????????????????????????????????? >>> sudo getfattr -d -m . 
-e hex /data/data-cluster/dms/final_archive >>> getfattr: Removing leading '/' from absolute path names >>> # file: data/data-cluster/dms/final_archive >>> trusted.afr.dirty=0x000000000000000000000000 >>> trusted.afr.storage2-client-0=0x000000000000000000000001 >>> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.dht.mds=0x00000000 >>> ????????????????????????????????????? >>> stat /data/data-cluster/dms/final_archive >>> File: '/data/data-cluster/dms/final_archive' >>> Size: 3497984 Blocks: 8760 IO Block: 4096 directory >>> Device: 807h/2055d Inode: 13563551265 Links: 72124 >>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>> Access: 2018-10-09 04:22:40.514629044 +0200 >>> Modify: 2019-03-21 11:55:46.382565124 +0100 >>> Change: 2019-03-21 11:55:46.382565124 +0100 >>> Birth: - >>> ????????????????????????????????????? >>> >>> Hope this helps. >>> >>> - Kindest regards, >>> >>> Milos Cuculovic >>> IT Manager >>> >>> --- >>> MDPI AG >>> Postfach, CH-4020 Basel, Switzerland >>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>> Tel. +41 61 683 77 35 >>> Fax +41 61 302 89 18 >>> Email: cuculovic at mdpi.com >>> Skype: milos.cuculovic.mdpi >>> >>> Disclaimer: The information and files contained in this message >>> are confidential and intended solely for the use of the individual or >>> entity to whom they are addressed. If you have received this message in >>> error, please notify me and delete this message from your system. You may >>> not copy this message in its entirety or in part, or disclose its contents >>> to anyone. >>> >>> On 21 Mar 2019, at 11:43, Karthik Subrahmanya >>> wrote: >>> >>> Can you attach the "glustershd.log" file which will be present under >>> "/var/log/glusterfs/" from both the nodes and the "stat" & "getfattr -d -m >>> . -e hex " output of all the entries listed in the heal >>> info output from both the bricks? 
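The trusted.afr.<volname>-client-N values in the getfattr dumps above are AFR changelog counters: 12 bytes holding three big-endian 32-bit counts of pending data, metadata, and entry operations that the brick indexed as client-N is believed to be missing. A small decoding sketch under that layout assumption; `decode_afr` is a hypothetical helper, not part of Gluster:

```python
import struct

def decode_afr(hex_value):
    """Decode a trusted.afr.* changelog xattr (as printed by
    `getfattr -e hex`) into its three pending-operation counters."""
    raw = bytes.fromhex(hex_value[2:] if hex_value.startswith("0x") else hex_value)
    data, metadata, entry = struct.unpack(">III", raw[:12])
    return {"data": data, "metadata": metadata, "entry": entry}

# Values from the getfattr output quoted in this thread:
print(decode_afr("0x000000000000000000000010"))  # storage3, client-1
print(decode_afr("0x000000000000000000000001"))  # storage4, client-0
```

Under this reading, the 0x...10 on storage3 decodes to 16 pending entry operations blamed on the other brick, and the 0x...01 on storage4 to 1, which is consistent with each brick accusing the other and the directory flapping between "Possibly undergoing heal" and "Is in split-brain".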
>>> >>> On Thu, Mar 21, 2019 at 3:54 PM Milos Cuculovic >>> wrote: >>> >>>> Thanks Karthik! >>>> >>>> I was trying to find some resolution methods from [2] but unfortunately >>>> none worked (I can explain what I tried if needed). >>>> >>>> I guess the volume you are talking about is of type replica-2 (1x2). >>>> >>>> That's correct; I am aware of the arbiter solution but still didn't take >>>> the time to implement it. >>>> >>>> From the info results I posted, how can I know which situation I am in? No >>>> files are mentioned in split brain, only directories. One brick has 3 >>>> entries and the other has two. >>>> >>>> sudo gluster volume heal storage2 info >>>> [sudo] password for sshadmin: >>>> Brick storage3:/data/data-cluster >>>> >>>> >>>> /dms/final_archive - Possibly undergoing heal >>>> >>>> Status: Connected >>>> Number of entries: 3 >>>> >>>> Brick storage4:/data/data-cluster >>>> >>>> /dms/final_archive - Possibly undergoing heal >>>> >>>> Status: Connected >>>> Number of entries: 2 >>>> >>>> - Kindest regards, >>>> >>>> Milos Cuculovic >>>> IT Manager >>>> >>>> --- >>>> MDPI AG >>>> Postfach, CH-4020 Basel, Switzerland >>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>>> Tel. +41 61 683 77 35 >>>> Fax +41 61 302 89 18 >>>> Email: cuculovic at mdpi.com >>>> Skype: milos.cuculovic.mdpi >>>> >>>> Disclaimer: The information and files contained in this message >>>> are confidential and intended solely for the use of the individual or >>>> entity to whom they are addressed. If you have received this message in >>>> error, please notify me and delete this message from your system. You may >>>> not copy this message in its entirety or in part, or disclose its contents >>>> to anyone. >>>> >>>> On 21 Mar 2019, at 10:27, Karthik Subrahmanya >>>> wrote: >>>> >>>> Hi, >>>> >>>> Note: I guess the volume you are talking about is of type replica-2 >>>> (1x2). Usually replica 2 volumes are prone to split-brain.
If you can >>>> consider converting them to arbiter or replica-3, they will handle most of >>>> the cases which can lead to split-brains. For more information see [1]. >>>> >>>> Resolving the split-brain: [2] talks about how to interpret the heal >>>> info output and different ways to resolve them using the CLI/manually/using >>>> the favorite-child-policy. >>>> If you are having an entry split brain, and it is a gfid split-brain >>>> (file/dir having different gfids on the replica bricks), then you can use >>>> the CLI option to resolve them. If a directory is in gfid split-brain in a >>>> distributed-replicate volume and you are using the source-brick option, >>>> please make sure you use the brick of this subvolume, which has the same >>>> gfid as that of the other distribute subvolume(s) where you have the >>>> correct gfid, as the source. >>>> If you are having a type mismatch then follow the steps in [3] to >>>> resolve the split-brain. >>>> >>>> [1] >>>> https://docs.gluster.org/en/v3/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/ >>>> [2] >>>> https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/ >>>> [3] >>>> https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#dir-split-brain >>>> >>>> HTH, >>>> Karthik >>>> >>>> On Thu, Mar 21, 2019 at 1:45 PM Milos Cuculovic >>>> wrote: >>>> >>>>> I was now able to catch the split brain log: >>>>> >>>>> sudo gluster volume heal storage2 info >>>>> Brick storage3:/data/data-cluster >>>>> >>>>> >>>>> /dms/final_archive - Is in split-brain >>>>> >>>>> Status: Connected >>>>> Number of entries: 3 >>>>> >>>>> Brick storage4:/data/data-cluster >>>>> >>>>> /dms/final_archive - Is in split-brain >>>>> >>>>> Status: Connected >>>>> Number of entries: 2 >>>>> >>>>> Milos >>>>> >>>>> On 21 Mar 2019, at 09:07, Milos Cuculovic wrote: >>>>> >>>>> For the last 24h, after upgrading one of the servers from 4.0 to 4.1.7, the >>>>> heal shows this: >>>>> >>>>> sudo gluster volume heal
storage2 info >>>>> Brick storage3:/data/data-cluster >>>>> >>>>> >>>>> /dms/final_archive - Possibly undergoing heal >>>>> >>>>> Status: Connected >>>>> Number of entries: 3 >>>>> >>>>> Brick storage4:/data/data-cluster >>>>> >>>>> /dms/final_archive - Possibly undergoing heal >>>>> >>>>> Status: Connected >>>>> Number of entries: 2 >>>>> >>>>> The same files stay there. From time to time /dms/final_archive is shown as >>>>> in split brain, as the following command shows: >>>>> >>>>> sudo gluster volume heal storage2 info split-brain >>>>> Brick storage3:/data/data-cluster >>>>> /dms/final_archive >>>>> Status: Connected >>>>> Number of entries in split-brain: 1 >>>>> >>>>> Brick storage4:/data/data-cluster >>>>> /dms/final_archive >>>>> Status: Connected >>>>> Number of entries in split-brain: 1 >>>>> >>>>> How can I find out which file is in split brain? The files in >>>>> /dms/final_archive are not very important, so it is fine to remove (or ideally >>>>> resolve the split brain on) the ones that differ. >>>>> >>>>> I can only see the directory and GFID. Any idea on how to resolve this >>>>> situation? I would like to continue with the upgrade on the 2nd server, >>>>> and for this the heal needs to finish with 0 entries in sudo gluster >>>>> volume heal storage2 info >>>>> >>>>> Thank you in advance, Milos. >>>>> >>>>> >>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> Gluster-users at gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> >>> >> >
From mauro.tridici at cmcc.it Thu Mar 21 13:08:38 2019 From: mauro.tridici at cmcc.it (Mauro Tridici) Date: Thu, 21 Mar 2019 14:08:38 +0100 Subject: [Gluster-users] "rpc_clnt_ping_timer_expired" errors In-Reply-To: References: <96B07283-D8AB-4F06-909D-E00424625528@cmcc.it> <42758A0E-8BE9-497D-BDE3-55D7DC633BC7@cmcc.it> <6A8CF4A4-98EA-48C3-A059-D60D1B2721C7@cmcc.it> <2CF49168-9C1B-4931-8C34-8157262A137A@cmcc.it> <7A151CC9-A0AE-4A45-B450-A4063D216D9E@cmcc.it> <32D53ECE-3F49-4415-A6EE-241B351AC2BA@cmcc.it> <4685A75B-5978-4338-9C9F-4A02FB40B9BC@cmcc.it> <4D2E6B04-C2E8-4FD5-B72D-E7C05931C6F9@cmcc.it> <4E332A56-B318-4BC1-9F44-84AB4392A5DE@cmcc.it> <832FD362-3B14-40D8-8530-604419300476@cmcc.it> <8D926643-1093-48ED-823F-D8F117F702CF@cmcc.it> <9D0BE438-DF11-4D0A-AB85-B44357032F29@cmcc.it> <5F0AC378-8170-4342-8473-9C17159CAC1D@cmcc.it> <7A50E86D-9E27-4EA7-883B-45E9F973991A@cmcc.it> <58B5DB7F-DCB4-4FBF-B1F7-681315B1613A@cmcc.it> <6327B44F-4E7E-46BB-A74C-70F4457DD1EB@cmcc.it> <0167DF4A-8329-4A1A-B439-857DFA6F78BB@cmcc.it> <763F334E-35B8-4729-B8E1-D30866754EEE@cmcc.it> <91DFC9AC-4805-41EB-AC6F-5722E018DD6E@cmcc.it> <8A9752B8-B231-4570-8FF4-8D3D781E7D42@cmcc.it> <47A24804-F975-4EE6-9FA5-67FCDA18D707@cmcc.it> <637FF5D2-D1F4-4686-9D48-646A96F67B96@cmcc.it> <4A87495F-3755-48F7-8507-085869069C64@cmcc.it> <3854BBF6-5B98-4AB3-A67E-E7DE59E69A63@cmcc.it> <313DA021-9173-4899-96B0-831B10B00B61@cmcc.it> <17996AFD-DFC8-40F3-9D09-DB6DDAD5B7D6@cmcc.it> <7074B5D8-955A-4802-A9F3-606C99734417@cmcc.it> <83B84BF9-8334-4230-B6F8-0BC4BFBEFF15@cmcc.it> <133B9AE4-9792-4F72-AD91-D36E7B9EC711@cmcc.it> <6611C4B0-57FD-4390-88B5-BD373555D4C5@cmcc.it> <93130243-E356-4425-8F15-69BE61562E2F@cmcc.it> Message-ID: Do you think that I made some mistake during the "top -bHd 3 > top_bHd3.txt" command execution? (I executed the top command and interrupted it after some seconds, maybe 10 seconds.) Or do you mean that there is something wrong with the gluster services?
Thank you, Mauro > On 21 Mar 2019, at 11:48, Raghavendra Gowdappa wrote: > > > > On Thu, Mar 21, 2019 at 4:10 PM Mauro Tridici > wrote: > Hi Raghavendra, > > the number of errors has been reduced, but during the last few days I received some error notifications from the Nagios server similar to the following one: > > ***** Nagios ***** > > Notification Type: PROBLEM > > Service: Brick - /gluster/mnt5/brick > Host: s04 > Address: s04-stg > State: CRITICAL > > Date/Time: Mon Mar 18 19:56:36 CET 2019 > > Additional Info: > > CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. > > The error was related only to the s04 gluster server. > > So, following your suggestions, I executed the top command on the s04 node. > In attachment, you can find the related output. > > The top output doesn't contain cmd/thread names. Was there anything wrong? > > > Thank you very much for your help. > Regards, > Mauro > > > >> On 14 Mar 2019, at 13:31, Raghavendra Gowdappa > wrote: >> >> Thanks Mauro. >> >> On Thu, Mar 14, 2019 at 3:38 PM Mauro Tridici > wrote: >> Hi Raghavendra, >> >> I just changed the client option value to 8. >> I will check the volume behaviour during the next hours. >> >> The GlusterFS version is 3.12.14. >> >> I will provide you the logs as soon as the activity load is high. >> Thank you, >> Mauro >> >>> On 14 Mar 2019, at 04:57, Raghavendra Gowdappa > wrote: >>> >>> >>> >>> On Wed, Mar 13, 2019 at 3:55 PM Mauro Tridici > wrote: >>> Hi Raghavendra, >>> >>> Yes, server.event-thread has been changed from 4 to 8. >>> >>> Was the client.event-thread value also changed to 8? If not, I would like to know the results of including this tuning too. Also, if possible, can you get the output of the following command from the problematic clients and bricks (during periods when load tends to be high and ping-timer-expiry is seen)? >>> >>> # top -bHd 3 >>> >>> This will help us to know the CPU utilization of the event-threads. >>> >>> And I forgot to ask, what version of GlusterFS are you using?
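As a side note on the missing cmd/thread names: on most Linux systems `ps` can produce an equivalent per-thread CPU snapshot with the thread names included. This is a generic sketch, not tied to any particular GlusterFS version:

```shell
# One row per thread (LWP), sorted by CPU usage; the COMMAND column shows the
# thread name, so busy glusterfs event threads should be visible near the top.
ps -eLo pid,tid,pcpu,comm --sort=-pcpu | head -n 10
```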
>>> >>> During the last few days, I noticed that the error events are still there, although they have been considerably reduced. >>> >>> So, I ran grep against the log files in order to give you a global view of the warning, error and critical events that appeared today at 06:xx (it may be useful, I hope). >>> I collected the info from the s06 gluster server, but the behaviour is almost the same on the other gluster servers. >>> >>> ERRORS: >>> CWD: /var/log/glusterfs >>> COMMAND: grep " E " *.log |grep "2019-03-13 06:" >>> >>> (I can see a lot of this kind of message in the same period, but I'm showing you only one record for each type of error) >>> >>> glusterd.log:[2019-03-13 06:12:35.982863] E [MSGID: 101042] [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of /var/run/gluster/tier2_quota_list/ >>> >>> glustershd.log:[2019-03-13 06:14:28.666562] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f4a71ddcebb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f4a71ba1d9e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f4a71ba1ebe] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f4a71ba3640] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f4a71ba4130] ))))) 0-tier2-client-55: forced unwinding frame type(GlusterFS 3.3) op(INODELK(29)) called at 2019-03-13 06:14:14.858441 (xid=0x17fddb50) >>> >>> glustershd.log:[2019-03-13 06:17:48.883825] E [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to 192.168.0.55:49158 failed (Connection timed out); disconnecting socket >>> glustershd.log:[2019-03-13 06:19:58.931798] E [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to 192.168.0.55:49158 failed (Connection timed out); disconnecting socket >>> glustershd.log:[2019-03-13 06:22:08.979829] E [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to 192.168.0.55:49158 failed (Connection timed out); disconnecting socket >>> glustershd.log:[2019-03-13 06:22:36.226847] E [MSGID: 114031] [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote operation failed [Transport endpoint is not connected] >>> glustershd.log:[2019-03-13 06:22:36.306669] E [MSGID: 114031] [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote operation failed [Transport endpoint is not connected] >>> glustershd.log:[2019-03-13 06:22:36.385257] E [MSGID: 114031] [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote operation failed [Transport endpoint is not connected] >>> >>> WARNINGS: >>> CWD: /var/log/glusterfs >>> COMMAND: grep " W " *.log |grep "2019-03-13 06:" >>> >>> (I can see a lot of this kind of message in the same period, but I'm showing you only one record for each type of warning) >>> >>> glustershd.log:[2019-03-13 06:14:28.666772] W [MSGID: 114031] [client-rpc-fops.c:1080:client3_3_getxattr_cbk] 0-tier2-client-55: remote operation failed. Path: >> 0f-f34d-4c25-bbe8-74bde0248d7e> (b6b35d0f-f34d-4c25-bbe8-74bde0248d7e).
Key: (null) [Transport endpoint is not connected] >>> >>> glustershd.log:[2019-03-13 06:14:31.421576] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation with some subvolumes unavailable (2) >>> >>> glustershd.log:[2019-03-13 06:15:31.547417] W [MSGID: 122032] [ec-heald.c:266:ec_shd_index_sweep] 0-tier2-disperse-9: unable to get index-dir on tier2-client-55 [Operation >>> now in progress] >>> >>> quota-mount-tier2.log:[2019-03-13 06:12:36.116277] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-glusterfs: option 'address-family' is deprecated, preferred is 'trans >>> port.address-family', continuing with correction >>> quota-mount-tier2.log:[2019-03-13 06:12:36.198430] W [MSGID: 101174] [graph.c:363:_log_if_unknown_option] 0-tier2-readdir-ahead: option 'parallel-readdir' is not recognized >>> quota-mount-tier2.log:[2019-03-13 06:12:37.945007] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f340892be25] -->/usr/sbin/glusterfs(gluste >>> rfs_sigwaiter+0xe5) [0x55ef010164b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55ef0101632b] ) 0-: received signum (15), shutting down >>> >>> CRITICALS: >>> CWD: /var/log/glusterfs >>> COMMAND: grep " C " *.log |grep "2019-03-13 06:" >>> >>> no critical errors at 06:xx >>> only one critical error during the day >>> >>> [root at s06 glusterfs]# grep " C " *.log |grep "2019-03-13" >>> glustershd.log:[2019-03-13 02:21:29.126279] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-55: server 192.168.0.55:49158 has not responded in the last 42 seconds, disconnecting. >>> >>> >>> Thank you very much for your help. >>> Regards, >>> Mauro >>> >>>> On 12 Mar 2019, at 05:17, Raghavendra Gowdappa > wrote: >>>> >>>> Was the suggestion to increase server.event-thread values tried? If yes, what were the results? 
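To complement the grep triage above, the per-brick impact of the ping-timer expiries can be counted with a small pipeline. A self-contained sketch (the sample log lines are copied from this thread; in practice you would point the grep at the real glustershd.log under /var/log/glusterfs):

```shell
# Count ping-timer expiries per brick endpoint from a shd log.
log=glustershd.log.sample
cat > "$log" <<'EOF'
[2019-03-04 09:21:12.954798] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-55: server 192.168.0.55:49158 has not responded in the last 42 seconds, disconnecting.
[2019-03-04 09:24:03.982502] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-16: server 192.168.0.52:49158 has not responded in the last 42 seconds, disconnecting.
[2019-03-04 09:32:30.069082] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-10: server 192.168.0.52:49156 has not responded in the last 42 seconds, disconnecting.
EOF
# Extract the "server IP:port" tokens and count how often each one expired.
grep 'rpc_clnt_ping_timer_expired' "$log" \
  | grep -o 'server [0-9.]*:[0-9]*' \
  | sort | uniq -c | sort -rn
```

If one endpoint dominates the count, that points at a specific brick (or its host) rather than a general network problem.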
>>>> >>>> On Mon, Mar 11, 2019 at 2:40 PM Mauro Tridici > wrote: >>>> Dear All, >>>> >>>> do you have any suggestions about the right way to "debug" this issue? >>>> In attachment, the updated logs of the "s06" gluster server. >>>> >>>> I noticed a lot of intermittent warning and error messages. >>>> >>>> Thank you in advance, >>>> Mauro >>>> >>>> >>>> >>>>> On 4 Mar 2019, at 18:45, Raghavendra Gowdappa > wrote: >>>>> >>>>> >>>>> +Gluster Devel , +Gluster-users >>>>> >>>>> I would like to point out another issue. Even if what I suggested prevents disconnects, part of the solution would be only symptomatic treatment and doesn't address the root cause of the problem. In most of the ping-timer-expiry issues, the root cause is the increased load on bricks and the inability of bricks to be responsive under high load. So, the actual solution would be doing either or both of the following: >>>>> * identify the source of increased load and, if possible, throttle it. Internal heal processes like self-heal, rebalance and quota heal are known to pump traffic into bricks without much throttling (io-threads _might_ do some throttling, but my understanding is it's not sufficient). >>>>> * identify the reason for bricks becoming unresponsive under load. These may be fixable issues like not enough event-threads to read from the network, difficult-to-fix issues like fsync on the backend fs freezing the process, or semi-fixable issues (in code) like lock contention. >>>>> >>>>> So any genuine effort to fix ping-timer issues (to be honest, most of the time they are not issues related to rpc/network) would involve performance characterization of various subsystems on bricks and clients. Various subsystems can include (but are not necessarily limited to) the underlying OS/filesystem, glusterfs processes, CPU consumption, etc. >>>>> >>>>> regards, >>>>> Raghavendra >>>>> >>>>> On Mon, Mar 4, 2019 at 9:31 PM Mauro Tridici > wrote: >>>>> Thank you, let's try! >>>>> I will inform you about the effects of the change.
>>>>> >>>>> Regards, >>>>> Mauro >>>>> >>>>>> On 4 Mar 2019, at 16:55, Raghavendra Gowdappa > wrote: >>>>>> >>>>>> >>>>>> >>>>>> On Mon, Mar 4, 2019 at 8:54 PM Mauro Tridici > wrote: >>>>>> Hi Raghavendra, >>>>>> >>>>>> thank you for your reply. >>>>>> Yes, you are right. It is a problem that seems to happen randomly. >>>>>> At this moment, server.event-threads value is 4. I will try to increase this value to 8. Do you think that it could be a valid value ? >>>>>> >>>>>> Yes. We can try with that. You should see at least frequency of ping-timer related disconnects reduce with this value (even if it doesn't eliminate the problem completely). >>>>>> >>>>>> >>>>>> Regards, >>>>>> Mauro >>>>>> >>>>>> >>>>>>> On 4 Mar 2019, at 15:36, Raghavendra Gowdappa > wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Mon, Mar 4, 2019 at 8:01 PM Nithya Balachandran > wrote: >>>>>>> Hi Mauro, >>>>>>> >>>>>>> It looks like some problem on s06. Are all your other nodes ok? Can you send us the gluster logs from this node? >>>>>>> >>>>>>> @Raghavendra G , do you have any idea as to how this can be debugged? Maybe running top ? Or debug brick logs? >>>>>>> >>>>>>> If we can reproduce the problem, collecting tcpdump on both ends of connection will help. But, one common problem is these bugs are inconsistently reproducible and hence we may not be able to capture tcpdump at correct intervals. Other than that, we can try to collect some evidence that poller threads were busy (waiting on locks). But, not sure what debug data provides that information. >>>>>>> >>>>>>> From what I know, its difficult to collect evidence for this issue and we could only reason about it. >>>>>>> >>>>>>> We can try a workaround though - try increasing server.event-threads and see whether ping-timer expiry issues go away with an optimal value. If that's the case, it kind of provides proof for our hypothesis. 
>>>>>>> >>>>>>> >>>>>>> Regards, >>>>>>> Nithya >>>>>>> >>>>>>> On Mon, 4 Mar 2019 at 15:25, Mauro Tridici > wrote: >>>>>>> Hi All, >>>>>>> >>>>>>> a few minutes ago I received this message from the NAGIOS server: >>>>>>> >>>>>>> ***** Nagios ***** >>>>>>> >>>>>>> Notification Type: PROBLEM >>>>>>> >>>>>>> Service: Brick - /gluster/mnt2/brick >>>>>>> Host: s06 >>>>>>> Address: s06-stg >>>>>>> State: CRITICAL >>>>>>> >>>>>>> Date/Time: Mon Mar 4 10:25:33 CET 2019 >>>>>>> >>>>>>> Additional Info: >>>>>>> CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. >>>>>>> >>>>>>> I checked the network, RAM and CPU usage on the s06 node and everything seems to be ok. >>>>>>> No bricks are in error state. In /var/log/messages, I detected again a crash of "check_vol_utili", which I think is a module used by the NRPE executable (that is, the NAGIOS client). >>>>>>> >>>>>>> Mar 4 10:15:29 s06 kernel: traps: check_vol_utili[161224] general protection ip:7facffa0a66d sp:7ffe9f4e6fc0 error:0 in libglusterfs.so.0.0.1[7facff9b7000+f7000] >>>>>>> Mar 4 10:15:29 s06 abrt-hook-ccpp: Process 161224 (python2.7) of user 0 killed by SIGSEGV - dumping core >>>>>>> Mar 4 10:15:29 s06 abrt-server: Generating core_backtrace >>>>>>> Mar 4 10:15:29 s06 abrt-server: Error: Unable to open './coredump': No such file or directory >>>>>>> Mar 4 10:16:01 s06 systemd: Created slice User Slice of root. >>>>>>> Mar 4 10:16:01 s06 systemd: Starting User Slice of root. >>>>>>> Mar 4 10:16:01 s06 systemd: Started Session 201010 of user root. >>>>>>> Mar 4 10:16:01 s06 systemd: Starting Session 201010 of user root. >>>>>>> Mar 4 10:16:01 s06 systemd: Removed slice User Slice of root. >>>>>>> Mar 4 10:16:01 s06 systemd: Stopping User Slice of root.
>>>>>>> Mar 4 10:16:24 s06 abrt-server: Duplicate: UUID >>>>>>> Mar 4 10:16:24 s06 abrt-server: DUP_OF_DIR: /var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041 >>>>>>> Mar 4 10:16:24 s06 abrt-server: Deleting problem directory ccpp-2019-03-04-10:15:29-161224 (dup of ccpp-2018-09-25-12:27:42-13041) >>>>>>> Mar 4 10:16:24 s06 abrt-server: Generating core_backtrace >>>>>>> Mar 4 10:16:24 s06 abrt-server: Error: Unable to open './coredump': No such file or directory >>>>>>> Mar 4 10:16:24 s06 abrt-server: Cannot notify '/var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041' via uReport: Event 'report_uReport' exited with 1 >>>>>>> Mar 4 10:16:24 s06 abrt-hook-ccpp: Process 161391 (python2.7) of user 0 killed by SIGABRT - dumping core >>>>>>> Mar 4 10:16:25 s06 abrt-server: Generating core_backtrace >>>>>>> Mar 4 10:16:25 s06 abrt-server: Error: Unable to open './coredump': No such file or directory >>>>>>> Mar 4 10:17:01 s06 systemd: Created slice User Slice of root. >>>>>>> >>>>>>> Also, I noticed the following errors that I think are very critical: >>>>>>> >>>>>>> Mar 4 10:21:12 s06 glustershd[20355]: [2019-03-04 09:21:12.954798] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-55: server 192.168.0.55:49158 has not responded in the last 42 seconds, disconnecting. >>>>>>> Mar 4 10:22:01 s06 systemd: Created slice User Slice of root. >>>>>>> Mar 4 10:22:01 s06 systemd: Starting User Slice of root. >>>>>>> Mar 4 10:22:01 s06 systemd: Started Session 201017 of user root. >>>>>>> Mar 4 10:22:01 s06 systemd: Starting Session 201017 of user root. >>>>>>> Mar 4 10:22:01 s06 systemd: Removed slice User Slice of root. >>>>>>> Mar 4 10:22:01 s06 systemd: Stopping User Slice of root. >>>>>>> Mar 4 10:22:03 s06 glustershd[20355]: [2019-03-04 09:22:03.964120] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-60: server 192.168.0.54:49165 has not responded in the last 42 seconds, disconnecting. >>>>>>> Mar 4 10:23:01 s06 systemd: Created slice User Slice of root. 
>>>>>>> Mar 4 10:23:01 s06 systemd: Starting User Slice of root. >>>>>>> Mar 4 10:23:01 s06 systemd: Started Session 201018 of user root. >>>>>>> Mar 4 10:23:01 s06 systemd: Starting Session 201018 of user root. >>>>>>> Mar 4 10:23:02 s06 systemd: Removed slice User Slice of root. >>>>>>> Mar 4 10:23:02 s06 systemd: Stopping User Slice of root. >>>>>>> Mar 4 10:24:01 s06 systemd: Created slice User Slice of root. >>>>>>> Mar 4 10:24:01 s06 systemd: Starting User Slice of root. >>>>>>> Mar 4 10:24:01 s06 systemd: Started Session 201019 of user root. >>>>>>> Mar 4 10:24:01 s06 systemd: Starting Session 201019 of user root. >>>>>>> Mar 4 10:24:01 s06 systemd: Removed slice User Slice of root. >>>>>>> Mar 4 10:24:01 s06 systemd: Stopping User Slice of root. >>>>>>> Mar 4 10:24:03 s06 glustershd[20355]: [2019-03-04 09:24:03.982502] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-16: server 192.168.0.52:49158 has not responded in the last 42 seconds, disconnecting. >>>>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746109] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-3: server 192.168.0.51:49153 has not responded in the last 42 seconds, disconnecting. >>>>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746215] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-10: server 192.168.0.52:49156 has not responded in the last 42 seconds, disconnecting. >>>>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746260] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-21: server 192.168.0.51:49159 has not responded in the last 42 seconds, disconnecting. >>>>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746296] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-25: server 192.168.0.52:49161 has not responded in the last 42 seconds, disconnecting. 
>>>>>>> Mar 4 10:24:05 s06 quotad[20374]: [2019-03-04 09:24:05.746413] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-60: server 192.168.0.54:49165 has not responded in the last 42 seconds, disconnecting. >>>>>>> Mar 4 10:24:07 s06 glustershd[20355]: [2019-03-04 09:24:07.982952] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-45: server 192.168.0.54:49155 has not responded in the last 42 seconds, disconnecting. >>>>>>> Mar 4 10:24:18 s06 glustershd[20355]: [2019-03-04 09:24:18.990929] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-25: server 192.168.0.52:49161 has not responded in the last 42 seconds, disconnecting. >>>>>>> Mar 4 10:24:31 s06 glustershd[20355]: [2019-03-04 09:24:31.995781] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-20: server 192.168.0.53:49159 has not responded in the last 42 seconds, disconnecting. >>>>>>> Mar 4 10:25:01 s06 systemd: Created slice User Slice of root. >>>>>>> Mar 4 10:25:01 s06 systemd: Starting User Slice of root. >>>>>>> Mar 4 10:25:01 s06 systemd: Started Session 201020 of user root. >>>>>>> Mar 4 10:25:01 s06 systemd: Starting Session 201020 of user root. >>>>>>> Mar 4 10:25:01 s06 systemd: Removed slice User Slice of root. >>>>>>> Mar 4 10:25:01 s06 systemd: Stopping User Slice of root. >>>>>>> Mar 4 10:25:57 s06 systemd: Created slice User Slice of root. >>>>>>> Mar 4 10:25:57 s06 systemd: Starting User Slice of root. >>>>>>> Mar 4 10:25:57 s06 systemd-logind: New session 201021 of user root. >>>>>>> Mar 4 10:25:57 s06 systemd: Started Session 201021 of user root. >>>>>>> Mar 4 10:25:57 s06 systemd: Starting Session 201021 of user root. >>>>>>> Mar 4 10:26:01 s06 systemd: Started Session 201022 of user root. >>>>>>> Mar 4 10:26:01 s06 systemd: Starting Session 201022 of user root. 
>>>>>>> Mar 4 10:26:21 s06 nrpe[162388]: Error: Could not complete SSL handshake with 192.168.1.56 : 5 >>>>>>> Mar 4 10:27:01 s06 systemd: Started Session 201023 of user root. >>>>>>> Mar 4 10:27:01 s06 systemd: Starting Session 201023 of user root. >>>>>>> Mar 4 10:28:01 s06 systemd: Started Session 201024 of user root. >>>>>>> Mar 4 10:28:01 s06 systemd: Starting Session 201024 of user root. >>>>>>> Mar 4 10:29:01 s06 systemd: Started Session 201025 of user root. >>>>>>> Mar 4 10:29:01 s06 systemd: Starting Session 201025 of user root. >>>>>>> >>>>>>> But, unfortunately, I don't understand why it is happening. >>>>>>> Now, the NAGIOS server shows that the s06 status is ok: >>>>>>> >>>>>>> ***** Nagios ***** >>>>>>> >>>>>>> Notification Type: RECOVERY >>>>>>> >>>>>>> Service: Brick - /gluster/mnt2/brick >>>>>>> Host: s06 >>>>>>> Address: s06-stg >>>>>>> State: OK >>>>>>> >>>>>>> Date/Time: Mon Mar 4 10:35:23 CET 2019 >>>>>>> >>>>>>> Additional Info: >>>>>>> OK: Brick /gluster/mnt2/brick is up >>>>>>> >>>>>>> Nothing has changed from the RAM, CPU, and network point of view. >>>>>>> The /var/log/messages file has been updated: >>>>>>> >>>>>>> Mar 4 10:32:01 s06 systemd: Starting Session 201029 of user root. >>>>>>> Mar 4 10:32:30 s06 glustershd[20355]: [2019-03-04 09:32:30.069082] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-10: server 192.168.0.52:49156 has not responded in the last 42 seconds, disconnecting. >>>>>>> Mar 4 10:32:55 s06 glustershd[20355]: [2019-03-04 09:32:55.074689] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-66: server 192.168.0.54:49167 has not responded in the last 42 seconds, disconnecting. >>>>>>> Mar 4 10:33:01 s06 systemd: Started Session 201030 of user root. >>>>>>> Mar 4 10:33:01 s06 systemd: Starting Session 201030 of user root. >>>>>>> Mar 4 10:34:01 s06 systemd: Started Session 201031 of user root. >>>>>>> Mar 4 10:34:01 s06 systemd: Starting Session 201031 of user root.
>>>>>>> Mar 4 10:35:01 s06 nrpe[162562]: Could not read request from client 192.168.1.56, bailing out... >>>>>>> Mar 4 10:35:01 s06 nrpe[162562]: INFO: SSL Socket Shutdown. >>>>>>> Mar 4 10:35:01 s06 systemd: Started Session 201032 of user root. >>>>>>> Mar 4 10:35:01 s06 systemd: Starting Session 201032 of user root. >>>>>>> >>>>>>> Could you please help me to understand what is happening? >>>>>>> Thank you in advance. >>>>>>> >>>>>>> Regards, >>>>>>> Mauro >>>>>>> >>>>>>> >>>>>>>> On 1 Mar 2019, at 12:17, Mauro Tridici > wrote: >>>>>>>> >>>>>>>> >>>>>>>> Thank you, Milind. >>>>>>>> I executed the instructions you suggested: >>>>>>>> >>>>>>>> - grep "blocked for" /var/log/messages on s06 returns no output (the word "blocked" is not detected in the messages file); >>>>>>>> - in the /var/log/messages file I can see this kind of error repeated many times: >>>>>>>> >>>>>>>> Mar 1 08:43:01 s06 systemd: Starting Session 196071 of user root. >>>>>>>> Mar 1 08:43:01 s06 systemd: Removed slice User Slice of root. >>>>>>>> Mar 1 08:43:01 s06 systemd: Stopping User Slice of root.
>>>>>>>> Mar 1 08:43:02 s06 kernel: traps: check_vol_utili[57091] general protection ip:7f88e76ee66d sp:7ffe5a5bcc30 error:0 in libglusterfs.so.0.0.1[7f88e769b000+f7000] >>>>>>>> Mar 1 08:43:02 s06 abrt-hook-ccpp: Process 57091 (python2.7) of user 0 killed by SIGSEGV - dumping core >>>>>>>> Mar 1 08:43:02 s06 abrt-server: Generating core_backtrace >>>>>>>> Mar 1 08:43:02 s06 abrt-server: Error: Unable to open './coredump': No such file or directory >>>>>>>> Mar 1 08:43:58 s06 abrt-server: Duplicate: UUID >>>>>>>> Mar 1 08:43:58 s06 abrt-server: DUP_OF_DIR: /var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041 >>>>>>>> Mar 1 08:43:58 s06 abrt-server: Deleting problem directory ccpp-2019-03-01-08:43:02-57091 (dup of ccpp-2018-09-25-12:27:42-13041) >>>>>>>> Mar 1 08:43:58 s06 dbus[1872]: [system] Activating service name='org.freedesktop.problems' (using servicehelper) >>>>>>>> Mar 1 08:43:58 s06 dbus[1872]: [system] Successfully activated service 'org.freedesktop.problems' >>>>>>>> Mar 1 08:43:58 s06 abrt-server: Generating core_backtrace >>>>>>>> Mar 1 08:43:58 s06 abrt-server: Error: Unable to open './coredump': No such file or directory >>>>>>>> Mar 1 08:43:58 s06 abrt-server: Cannot notify '/var/tmp/abrt/ccpp-2018-09-25-12:27:42-13041' via uReport: Event 'report_uReport' exited with 1 >>>>>>>> Mar 1 08:44:01 s06 systemd: Created slice User Slice of root. >>>>>>>> Mar 1 08:44:01 s06 systemd: Starting User Slice of root. >>>>>>>> Mar 1 08:44:01 s06 systemd: Started Session 196072 of user root. >>>>>>>> Mar 1 08:44:01 s06 systemd: Starting Session 196072 of user root. >>>>>>>> Mar 1 08:44:01 s06 systemd: Removed slice User Slice of root. >>>>>>>> >>>>>>>> - in /var/log/messages file I can see also 4 errors related to other cluster servers: >>>>>>>> >>>>>>>> Mar 1 11:05:01 s06 systemd: Starting User Slice of root. >>>>>>>> Mar 1 11:05:01 s06 systemd: Started Session 196230 of user root. >>>>>>>> Mar 1 11:05:01 s06 systemd: Starting Session 196230 of user root. 
>>>>>>>> Mar 1 11:05:01 s06 systemd: Removed slice User Slice of root. >>>>>>>> Mar 1 11:05:01 s06 systemd: Stopping User Slice of root. >>>>>>>> Mar 1 11:05:59 s06 glustershd[70117]: [2019-03-01 10:05:59.347094] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-33: server 192.168.0.51:49163 has not responded in the last 42 seconds, disconnecting. >>>>>>>> Mar 1 11:06:01 s06 systemd: Created slice User Slice of root. >>>>>>>> Mar 1 11:06:01 s06 systemd: Starting User Slice of root. >>>>>>>> Mar 1 11:06:01 s06 systemd: Started Session 196231 of user root. >>>>>>>> Mar 1 11:06:01 s06 systemd: Starting Session 196231 of user root. >>>>>>>> Mar 1 11:06:01 s06 systemd: Removed slice User Slice of root. >>>>>>>> Mar 1 11:06:01 s06 systemd: Stopping User Slice of root. >>>>>>>> Mar 1 11:06:12 s06 glustershd[70117]: [2019-03-01 10:06:12.351319] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-1: server 192.168.0.52:49153 has not responded in the last 42 seconds, disconnecting. >>>>>>>> Mar 1 11:06:38 s06 glustershd[70117]: [2019-03-01 10:06:38.356920] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-7: server 192.168.0.52:49155 has not responded in the last 42 seconds, disconnecting. >>>>>>>> Mar 1 11:07:01 s06 systemd: Created slice User Slice of root. >>>>>>>> Mar 1 11:07:01 s06 systemd: Starting User Slice of root. >>>>>>>> Mar 1 11:07:01 s06 systemd: Started Session 196232 of user root. >>>>>>>> Mar 1 11:07:01 s06 systemd: Starting Session 196232 of user root. >>>>>>>> Mar 1 11:07:01 s06 systemd: Removed slice User Slice of root. >>>>>>>> Mar 1 11:07:01 s06 systemd: Stopping User Slice of root. >>>>>>>> Mar 1 11:07:36 s06 glustershd[70117]: [2019-03-01 10:07:36.366259] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-0: server 192.168.0.51:49152 has not responded in the last 42 seconds, disconnecting. >>>>>>>> Mar 1 11:08:01 s06 systemd: Created slice User Slice of root. 
>>>>>>>> >>>>>>>> The word "blocked" does not appear in the /var/log/messages files on the other cluster servers. >>>>>>>> In attachment, the /var/log/messages file from the s06 server. >>>>>>>> >>>>>>>> Thank you in advance, >>>>>>>> Mauro >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> On 1 Mar 2019, at 11:47, Milind Changire > wrote: >>>>>>>>> >>>>>>>>> The traces of very high disk activity on the servers are often found in /var/log/messages >>>>>>>>> You might want to grep for "blocked for" in /var/log/messages on s06 and correlate the timestamps to confirm the unresponsiveness as reported in the gluster client logs. >>>>>>>>> In cases of high disk activity, although the operating system continues to respond to ICMP pings, the processes writing to disks often get blocked by a large flush to the disk, which could span beyond 42 seconds and hence result in ping-timer-expiry logs. >>>>>>>>> >>>>>>>>> As a side note: >>>>>>>>> If you indeed find gluster processes being blocked in /var/log/messages, you might want to tweak the sysctl tunables called vm.dirty_background_ratio or vm.dirty_background_bytes to a smaller value than the existing one. Please read up more on those tunables before touching the settings. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Mar 1, 2019 at 4:06 PM Mauro Tridici > wrote: >>>>>>>>> >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> in attachment is the client log captured after changing the network.ping-timeout option.
>>>>>>>>> I noticed this error involving server 192.168.0.56 (s06) >>>>>>>>> >>>>>>>>> [2019-03-01 09:23:36.077287] I [rpc-clnt.c:1962:rpc_clnt_reconfig] 0-tier2-client-71: changing ping timeout to 42 (from 0) >>>>>>>>> [2019-03-01 09:23:36.078213] I [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing >>>>>>>>> [2019-03-01 09:23:36.078432] I [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing >>>>>>>>> [2019-03-01 09:23:36.092357] I [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing >>>>>>>>> [2019-03-01 09:23:36.094146] I [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing >>>>>>>>> [2019-03-01 10:06:24.708082] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-50: server 192.168.0.56:49156 has not responded in the last 42 seconds, disconnecting. >>>>>>>>> >>>>>>>>> I don?t know why it happens, s06 server seems to be reachable. >>>>>>>>> >>>>>>>>> [athena_login2][/users/home/sysm02/]> telnet 192.168.0.56 49156 >>>>>>>>> Trying 192.168.0.56... >>>>>>>>> Connected to 192.168.0.56. >>>>>>>>> Escape character is '^]'. >>>>>>>>> ^CConnection closed by foreign host. >>>>>>>>> [athena_login2][/users/home/sysm02/]> ping 192.168.0.56 >>>>>>>>> PING 192.168.0.56 (192.168.0.56) 56(84) bytes of data. >>>>>>>>> 64 bytes from 192.168.0.56 : icmp_seq=1 ttl=64 time=0.116 ms >>>>>>>>> 64 bytes from 192.168.0.56 : icmp_seq=2 ttl=64 time=0.101 ms >>>>>>>>> >>>>>>>>> --- 192.168.0.56 ping statistics --- >>>>>>>>> 2 packets transmitted, 2 received, 0% packet loss, time 1528ms >>>>>>>>> rtt min/avg/max/mdev = 0.101/0.108/0.116/0.012 ms >>>>>>>>> >>>>>>>>> [athena_login2][/users/home/sysm02/]> telnet 192.168.0.56 49156 >>>>>>>>> Trying 192.168.0.56... >>>>>>>>> Connected to 192.168.0.56. >>>>>>>>> Escape character is '^]'. 
>>>>>>>>> >>>>>>>>> Thank you for your help, >>>>>>>>> Mauro >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> On 1 Mar 2019, at 10:29, Mauro Tridici > wrote: >>>>>>>>>> >>>>>>>>>> Hi all, >>>>>>>>>> >>>>>>>>>> thank you for the explanation. >>>>>>>>>> I just changed the network.ping-timeout option back to the default value (network.ping-timeout=42). >>>>>>>>>> >>>>>>>>>> I will check the logs to see if the errors appear again. >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Mauro >>>>>>>>>> >>>>>>>>>>> On 1 Mar 2019, at 04:43, Milind Changire > wrote: >>>>>>>>>>> >>>>>>>>>>> network.ping-timeout should not be set to zero for non-glusterd clients. >>>>>>>>>>> glusterd is a special case for which ping-timeout is set to zero via /etc/glusterfs/glusterd.vol >>>>>>>>>>> >>>>>>>>>>> Setting network.ping-timeout to zero disables arming of the ping timer for connections. This disables testing the connection for responsiveness and hence avoids proactive fail-over. >>>>>>>>>>> >>>>>>>>>>> Please reset network.ping-timeout to a non-zero positive value, e.g. 42. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Feb 28, 2019 at 5:07 PM Nithya Balachandran > wrote: >>>>>>>>>>> Adding Raghavendra and Milind to comment on this. >>>>>>>>>>> >>>>>>>>>>> What is the effect of setting network.ping-timeout to 0, and should it be set back to 42? >>>>>>>>>>> Regards, >>>>>>>>>>> Nithya >>>>>>>>>>> >>>>>>>>>>> On Thu, 28 Feb 2019 at 16:01, Mauro Tridici > wrote: >>>>>>>>>>> Hi Nithya, >>>>>>>>>>> >>>>>>>>>>> sorry for the late reply. >>>>>>>>>>> network.ping-timeout has been set to 0 in order to try to solve some timeout problems, but it didn't help. >>>>>>>>>>> I can set it to the default value. >>>>>>>>>>> >>>>>>>>>>> Can I proceed with the change? >>>>>>>>>>> >>>>>>>>>>> Thank you, >>>>>>>>>>> Mauro >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> On 28 Feb 2019, at 04:41, Nithya Balachandran > wrote: >>>>>>>>>>>> >>>>>>>>>>>> Hi Mauro, >>>>>>>>>>>> >>>>>>>>>>>> Is network.ping-timeout still set to 0? The default value is 42.
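The reset Milind asks for (quoted above) is a single `gluster volume set` on the volume from this thread. A dry-run sketch, echoing the commands instead of executing them since no gluster node is assumed here; drop the `echo` on a real node:

```shell
# Dry-run: print the commands that would reset and then verify the option.
VOL=tier2   # volume name taken from this thread
echo gluster volume set "$VOL" network.ping-timeout 42
echo gluster volume get "$VOL" network.ping-timeout
```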
Is there a particular reason why this was changed? >>>>>>>>>>>> >>>>>>>>>>>> Regards, >>>>>>>>>>>> Nithya >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Wed, 27 Feb 2019 at 21:32, Mauro Tridici > wrote: >>>>>>>>>>>> >>>>>>>>>>>> Hi Xavi, >>>>>>>>>>>> >>>>>>>>>>>> thank you for the detailed explanation and suggestions. >>>>>>>>>>>> Yes, the transport.listen-backlog option is still set to 1024. >>>>>>>>>>>> >>>>>>>>>>>> I will check the network and connectivity status using "ping" and "telnet" as soon as the errors come back again. >>>>>>>>>>>> >>>>>>>>>>>> Regards, >>>>>>>>>>>> Mauro >>>>>>>>>>>> >>>>>>>>>>>>> On 27 Feb 2019, at 16:42, Xavi Hernandez > wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi Mauro, >>>>>>>>>>>>> >>>>>>>>>>>>> those errors say that the mount point is not connected to some of the bricks while executing operations. I see references to the 3rd and 6th bricks of several disperse sets, which seem to map to server s06. For some reason, gluster is having trouble connecting from the client machine to that particular server. At the end of the log I see that after a long time a reconnect is done to both of them. However, a little after, other bricks from s05 get disconnected and a reconnect times out. >>>>>>>>>>>>> >>>>>>>>>>>>> That's really odd. It seems as if server communication to s06 is cut for some time, then restored, and then the same happens to the next server. >>>>>>>>>>>>> >>>>>>>>>>>>> If the servers are really online and it's only a communication issue, it explains why server memory and network usage have increased: if the problem only exists between the client and the servers, any write made by the client will automatically mark the file as damaged, since some of the servers have not been updated. Since self-heal runs from the server nodes, they will probably be correctly connected to all bricks, which allows them to heal the just-damaged file, which increases memory and network usage.
>>>>>>>>>>>>> >>>>>>>>>>>>> I guess you still have transport.listen-backlog set to 1024, right? >>>>>>>>>>>>> >>>>>>>>>>>>> Just to try to identify if the problem really comes from the network, can you check if you lose some pings from the client to all of the servers while you are seeing those errors in the log file? >>>>>>>>>>>>> >>>>>>>>>>>>> You can also check if, during those errors, you can telnet to the port of the brick from the client. >>>>>>>>>>>>> >>>>>>>>>>>>> Xavi >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Feb 26, 2019 at 10:17 AM Mauro Tridici > wrote: >>>>>>>>>>>>> Hi Nithya, >>>>>>>>>>>>> >>>>>>>>>>>>> the "df -h" operation is not slow anymore, but no users are using the volume; RAM and NETWORK usage is ok on the client node. >>>>>>>>>>>>> >>>>>>>>>>>>> I was worried about this kind of warnings/errors: >>>>>>>>>>>>> >>>>>>>>>>>>> [2019-02-25 10:59:00.664323] W [MSGID: 122035] [ec-common.c:571:ec_child_select] 0-tier2-disperse-6: Executing operation with some subvolumes unavailable (20) >>>>>>>>>>>>> >>>>>>>>>>>>> [2019-02-26 03:11:35.212603] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) 0-tier2-client-50: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2019-02-26 03:10:56.549903 (xid=0x106f1c5) >>>>>>>>>>>>> >>>>>>>>>>>>> [2019-02-26 03:13:03.313831] E [socket.c:2376:socket_connect_finish] 0-tier2-client-50: connection to 192.168.0.56:49156 failed (Timeout della connessione); disconnecting socket >>>>>>>>>>>>> >>>>>>>>>>>>> It seems that some subvolumes are not available and the 192.168.0.56 server (s06) is not reachable.
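Xavi's telnet check (quoted above) can be scripted so that all brick ports are probed in one pass. A minimal sketch using bash's /dev/tcp instead of telnet; the server and port below are the ones reported in the client log, and the 2-second timeout is an arbitrary choice:

```shell
# Probe a TCP port the way "telnet <host> <port>" would, but scriptable.
check_port() {
  timeout 2 bash -c "cat < /dev/null > /dev/tcp/$1/$2" 2>/dev/null \
    && echo "$1:$2 reachable" || echo "$1:$2 UNREACHABLE"
}

# Brick endpoint from the ping-timer-expiry message in the client log.
check_port 192.168.0.56 49156
```

Running this in a loop from the client while the errors appear would show whether the brick port actually becomes unreachable.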
>>>>>>>>>>>>> But the gluster servers are up & running and the bricks are ok. >>>>>>>>>>>>> >>>>>>>>>>>>> In attachment the updated tier2.log file. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Thank you. >>>>>>>>>>>>> Regards, >>>>>>>>>>>>> Mauro >>>>>>>>>>>>> >>>>>>>>>>>>>> On 26 Feb 2019, at 04:03, Nithya Balachandran > wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I see a lot of EC messages in the log but they don't seem very serious. Xavi, can you take a look? >>>>>>>>>>>>>> >>>>>>>>>>>>>> The only errors I see are: >>>>>>>>>>>>>> [2019-02-25 10:58:45.519871] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) 0-tier2-client-50: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2019-02-25 10:57:47.429969 (xid=0xd26fe7) >>>>>>>>>>>>>> [2019-02-25 10:58:51.461493] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x153)[0x3d0cc2f2e3] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e5)[0x3d0d410935] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x3d0d410a7e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xa5)[0x3d0d410b45] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x278)[0x3d0d410e68] ))))) 0-tier2-client-41: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2019-02-25 10:57:47.499174 (xid=0xf47d6a) >>>>>>>>>>>>>> [2019-02-25 11:07:57.152874] E [socket.c:2376:socket_connect_finish] 0-tier2-client-70: connection to 192.168.0.55:49163 failed (Timeout della connessione); disconnecting socket >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Is the df -h operation still slow?
If yes, can you take a tcpdump of the client while running df -h and send that across? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>> Nithya >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Mon, 25 Feb 2019 at 17:27, Mauro Tridici > wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sorry, some minutes after my last mail message, I noticed that the "df -h" command hung for a while before returning the prompt. >>>>>>>>>>>>>> Yesterday, everything was ok in the gluster client log, but today I see a lot of errors (please, take a look at the attached file). >>>>>>>>>>>>>> >>>>>>>>>>>>>> On the client node, I detected significant RAM and NETWORK usage. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Do you think that the errors have been caused by the client resource usage? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thank you in advance, >>>>>>>>>>>>>> Mauro >>>>>>>>>>>>>> >>>>>>> >>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> > > From cuculovic at mdpi.com Thu Mar 21 13:18:46 2019 From: cuculovic at mdpi.com (Milos Cuculovic) Date: Thu, 21 Mar 2019 14:18:46 +0100 Subject: [Gluster-users] Heal flapping between Possibly undergoing heal and In split brain In-Reply-To: References: <132679F1-A96E-402F-BDF0-79FE200AD9F7@mdpi.com> <548844D2-E06E-4A60-972E-72213C61D264@mdpi.com> <14B1CACB-4049-42DD-AB69-3B75FBD6BE30@mdpi.com> Message-ID: <990A42AA-1F60-441E-BD5A-97B8333E2083@mdpi.com> Hey Karthik, > Can you run the "gluster volume heal <volname>"? sudo gluster volume heal storage2 Launching heal operation to perform index self heal on volume storage2 has been successful Use heal info commands to check status. > "gluster volume heal <volname> info"? sudo gluster volume heal storage2 info Brick storage3:/data/data-cluster Status: Connected Number of entries: 2 Brick storage4:/data/data-cluster Status: Connected Number of entries: 6 - Kindest regards, Milos Cuculovic IT Manager --- MDPI AG Postfach, CH-4020 Basel, Switzerland Office: St.
Alban-Anlage 66, 4052 Basel, Switzerland Tel. +41 61 683 77 35 Fax +41 61 302 89 18 Email: cuculovic at mdpi.com Skype: milos.cuculovic.mdpi Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. > On 21 Mar 2019, at 14:07, Karthik Subrahmanya wrote: > > Hey Milos, > > I see that the gfid got healed for those directories from the getfattr output, and the glfsheal log also has messages corresponding to deleting the entries on one brick as part of healing, which then got recreated on the brick with the correct gfid. Can you run the "gluster volume heal <volname>" & "gluster volume heal <volname> info" commands and paste the output here? > If you still see entries pending heal, give the latest glustershd.log files from both the nodes along with the getfattr output of the files which are listed in the heal info output. > > Regards, > Karthik > > On Thu, Mar 21, 2019 at 6:03 PM Milos Cuculovic > wrote: > Sure: > > brick1: > ------------------------------------------------------------ > ------------------------------------------------------------ > sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 > getfattr: Removing leading '/' from absolute path names > # file: data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 > trusted.gfid=0xe358ff34504241d387efe1e76eb28bb0 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.dht.mds=0x00000000 > > sudo getfattr -d -m .
-e hex /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 > getfattr: Removing leading '/' from absolute path names > # file: data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 > trusted.gfid=0x33585a577c4a4c55b39b9abc07eacff1 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.dht.mds=0x00000000 > > sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf > getfattr: Removing leading '/' from absolute path names > # file: data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf > trusted.gfid=0x99cdfe0773d74d7ab29592ce40cf47f3 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.dht.mds=0x00000000 > > sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b > getfattr: Removing leading '/' from absolute path names > # file: data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b > trusted.gfid=0x8a96a80ddf8a4163ac00bb12369389df > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.dht.mds=0x00000000 > > sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b > getfattr: Removing leading '/' from absolute path names > # file: data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b > trusted.gfid=0x04d087a96cc6421ca7bc356fc20d7753 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.dht.mds=0x00000000 > > sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e > getfattr: Removing leading '/' from absolute path names > # file: data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e > trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.dht.mds=0x00000000 > ------------------------------------------------------------
> sudo stat /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 > File: '/data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59' > Size: 33 Blocks: 0 IO Block: 4096 directory > Device: 807h/2055d Inode: 40809094709 Links: 3 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2019-03-20 11:06:26.994047597 +0100 > Modify: 2019-03-20 11:28:28.294689870 +0100 > Change: 2019-03-21 13:01:03.077654239 +0100 > Birth: - > > sudo stat /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 > File: '/data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617' > Size: 33 Blocks: 0 IO Block: 4096 directory > Device: 807h/2055d Inode: 49399908865 Links: 3 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2019-03-20 11:07:20.342140927 +0100 > Modify: 2019-03-20 11:28:28.318690015 +0100 > Change: 2019-03-21 13:01:03.133654344 +0100 > Birth: - > > sudo stat /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf > File: '/data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf' > Size: 33 Blocks: 0 IO Block: 4096 directory > Device: 807h/2055d Inode: 53706303549 Links: 3 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2019-03-20 11:06:55.414097315 +0100 > Modify: 2019-03-20 11:28:28.362690281 +0100 > Change: 2019-03-21 13:01:03.141654359 +0100 > Birth: - > > sudo stat /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b > File: '/data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b' > Size: 33 Blocks: 0 IO Block: 4096 directory > Device: 807h/2055d Inode: 57990935591 Links: 3 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2019-03-20 11:07:08.558120309 +0100 > Modify: 2019-03-20 11:28:14.226604869 +0100 > Change: 2019-03-21 13:01:03.189654448 +0100 > Birth: - > > sudo stat /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b > File: 
'/data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b' > Size: 33 Blocks: 0 IO Block: 4096 directory > Device: 807h/2055d Inode: 62291339781 Links: 3 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2019-03-20 11:06:02.070003998 +0100 > Modify: 2019-03-20 11:28:28.458690861 +0100 > Change: 2019-03-21 13:01:03.281654621 +0100 > Birth: - > > sudo stat /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e > File: '/data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e' > Size: 33 Blocks: 0 IO Block: 4096 directory > Device: 807h/2055d Inode: 66574223479 Links: 3 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2019-03-20 11:28:10.826584325 +0100 > Modify: 2019-03-20 11:28:10.834584374 +0100 > Change: 2019-03-20 14:06:07.937449353 +0100 > Birth: - > root at storage3:/var/log/glusterfs# > ------------------------------------------------------------ > ------------------------------------------------------------ > > brick2: > ------------------------------------------------------------ > ------------------------------------------------------------ > sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e > getfattr: Removing leading '/' from absolute path names > # file: data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e > trusted.afr.dirty=0x000000000000000000000000 > trusted.afr.storage2-client-0=0x000000000000000000000000 > trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.dht.mds=0x00000000 > > sudo getfattr -d -m .
-e hex /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 > getfattr: Removing leading '/' from absolute path names > # file: data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 > trusted.afr.storage2-client-0=0x000000000000000000000000 > trusted.gfid=0x33585a577c4a4c55b39b9abc07eacff1 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.dht.mds=0x00000000 > > sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf > getfattr: Removing leading '/' from absolute path names > # file: data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf > trusted.afr.storage2-client-0=0x000000000000000000000000 > trusted.gfid=0x99cdfe0773d74d7ab29592ce40cf47f3 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.dht.mds=0x00000000 > > sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b > getfattr: Removing leading '/' from absolute path names > # file: data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b > trusted.afr.storage2-client-0=0x000000000000000000000000 > trusted.gfid=0x8a96a80ddf8a4163ac00bb12369389df > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.dht.mds=0x00000000 > > sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b > getfattr: Removing leading '/' from absolute path names > # file: data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b > trusted.afr.storage2-client-0=0x000000000000000000000000 > trusted.gfid=0x04d087a96cc6421ca7bc356fc20d7753 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.dht.mds=0x00000000 > > sudo getfattr -d -m . 
-e hex /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e > getfattr: Removing leading '/' from absolute path names > # file: data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e > trusted.afr.dirty=0x000000000000000000000000 > trusted.afr.storage2-client-0=0x000000000000000000000000 > trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.dht.mds=0x00000000 > > ------------------------------------------------------------ > > sudo stat /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 > File: '/data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59' > Size: 33 Blocks: 0 IO Block: 4096 directory > Device: 807h/2055d Inode: 42232631305 Links: 3 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2019-03-20 11:06:26.994047597 +0100 > Modify: 2019-03-20 11:28:28.294689870 +0100 > Change: 2019-03-21 13:01:03.078748131 +0100 > Birth: - > > sudo stat /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 > File: '/data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617' > Size: 33 Blocks: 0 IO Block: 4096 directory > Device: 807h/2055d Inode: 78589109305 Links: 3 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2019-03-20 11:07:20.342140927 +0100 > Modify: 2019-03-20 11:28:28.318690015 +0100 > Change: 2019-03-21 13:01:03.134748477 +0100 > Birth: - > > sudo stat /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf > File: '/data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf' > Size: 33 Blocks: 0 IO Block: 4096 directory > Device: 807h/2055d Inode: 54972096517 Links: 3 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2019-03-20 11:06:55.414097315 +0100 > Modify: 2019-03-20 11:28:28.362690281 +0100 > Change: 2019-03-21 13:01:03.162748650 +0100 > Birth: - > > sudo stat
/data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b > File: '/data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b' > Size: 33 Blocks: 0 IO Block: 4096 directory > Device: 807h/2055d Inode: 40821259275 Links: 3 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2019-03-20 11:07:08.558120309 +0100 > Modify: 2019-03-20 11:28:14.226604869 +0100 > Change: 2019-03-21 13:01:03.194748848 +0100 > Birth: - > > sudo stat /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b > File: '/data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b' > Size: 33 Blocks: 0 IO Block: 4096 directory > Device: 807h/2055d Inode: 15876654 Links: 3 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2019-03-20 11:06:02.070003998 +0100 > Modify: 2019-03-20 11:28:28.458690861 +0100 > Change: 2019-03-21 13:01:03.282749392 +0100 > Birth: - > > sudo stat /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e > File: '/data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e' > Size: 33 Blocks: 0 IO Block: 4096 directory > Device: 807h/2055d Inode: 49408944650 Links: 3 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2019-03-20 11:28:10.826584325 +0100 > Modify: 2019-03-20 11:28:10.834584374 +0100 > Change: 2019-03-20 14:06:07.940849268 +0100 > Birth: - > ------------------------------------------------------------ > ------------------------------------------------------------ > > The file is from brick 2 that I upgraded and started the heal on. > > > - Kindest regards, > > Milos Cuculovic > IT Manager > > --- > MDPI AG > Postfach, CH-4020 Basel, Switzerland > Office: St. Alban-Anlage 66, 4052 Basel, Switzerland > Tel.
+41 61 683 77 35 > Fax +41 61 302 89 18 > Email: cuculovic at mdpi.com > Skype: milos.cuculovic.mdpi > > Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. > >> On 21 Mar 2019, at 13:05, Karthik Subrahmanya > wrote: >> >> Can you give me the stat & getfattr output of all those 6 entries from both the bricks and the glfsheal-<volname>.log file from the node where you run this command? >> Meanwhile can you also try running this with the source-brick option? >> >> On Thu, Mar 21, 2019 at 5:22 PM Milos Cuculovic > wrote: >> Thank you Karthik, >> >> I have run this for all files (see example below) and it says the file is not in split-brain: >> >> sudo gluster volume heal storage2 split-brain latest-mtime /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >> Healing /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 failed: File not in split-brain. >> Volume heal failed. >> >> >> - Kindest regards, >> >> Milos Cuculovic >> IT Manager >> >> --- >> MDPI AG >> Postfach, CH-4020 Basel, Switzerland >> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >> Tel. +41 61 683 77 35 >> Fax +41 61 302 89 18 >> Email: cuculovic at mdpi.com >> Skype: milos.cuculovic.mdpi >> >> Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. >> >>> On 21 Mar 2019, at 12:36, Karthik Subrahmanya > wrote: >>> >>> Hi Milos, >>> >>> Thanks for the logs and the getfattr output.
>>> From the logs I can see that there are 6 entries under the directory "/data/data-cluster/dms/final_archive" named >>> 41be9ff5ec05c4b1c989c6053e709e59 >>> 5543982fab4b56060aa09f667a8ae617 >>> a8b7f31775eebc8d1867e7f9de7b6eaf >>> c1d3f3c2d7ae90e891e671e2f20d5d4b >>> e5934699809a3b6dcfc5945f408b978b >>> e7cdc94f60d390812a5f9754885e119e >>> which are having a gfid mismatch, so the heal is failing on this directory. >>> >>> You can use the CLI option to resolve these files from the gfid mismatch. You can use any of the 3 methods available: >>> 1. bigger-file >>> gluster volume heal <VOLNAME> split-brain bigger-file <FILE> >>> >>> 2. latest-mtime >>> gluster volume heal <VOLNAME> split-brain latest-mtime <FILE> >>> >>> 3. source-brick >>> gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME> <FILE> >>> >>> where <FILE> must be an absolute path w.r.t. the volume, starting with '/'. >>> If all those entries are directories then go for either the latest-mtime or the source-brick option. >>> After you resolve all these gfid-mismatches, run the "gluster volume heal <VOLNAME>" command. Then check the heal info and let me know the result. >>> >>> Regards, >>> Karthik >>> >>> On Thu, Mar 21, 2019 at 4:27 PM Milos Cuculovic > wrote: >>> Sure, thank you for following up. >>> >>> About the commands, here is what I see: >>> >>> brick1: >>> ------------------------------------- >>> sudo gluster volume heal storage2 info >>> Brick storage3:/data/data-cluster >>> >>> >>> /dms/final_archive - Possibly undergoing heal >>> >>> Status: Connected >>> Number of entries: 3 >>> >>> Brick storage4:/data/data-cluster >>> >>> /dms/final_archive - Possibly undergoing heal >>> >>> Status: Connected >>> Number of entries: 2 >>> ------------------------------------- >>> sudo getfattr -d -m .
-e hex /data/data-cluster/dms/final_archive >>> getfattr: Removing leading '/' from absolute path names >>> # file: data/data-cluster/dms/final_archive >>> trusted.afr.dirty=0x000000000000000000000000 >>> trusted.afr.storage2-client-1=0x000000000000000000000010 >>> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.dht.mds=0x00000000 >>> ------------------------------------- >>> stat /data/data-cluster/dms/final_archive >>> File: '/data/data-cluster/dms/final_archive' >>> Size: 3497984 Blocks: 8768 IO Block: 4096 directory >>> Device: 807h/2055d Inode: 26427748396 Links: 72123 >>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>> Access: 2018-10-09 04:22:40.514629044 +0200 >>> Modify: 2019-03-21 11:55:37.382278863 +0100 >>> Change: 2019-03-21 11:55:37.382278863 +0100 >>> Birth: - >>> ------------------------------------- >>> ------------------------------------- >>> >>> brick2: >>> ------------------------------------- >>> sudo gluster volume heal storage2 info >>> Brick storage3:/data/data-cluster >>> >>> >>> /dms/final_archive - Possibly undergoing heal >>> >>> Status: Connected >>> Number of entries: 3 >>> >>> Brick storage4:/data/data-cluster >>> >>> /dms/final_archive - Possibly undergoing heal >>> >>> Status: Connected >>> Number of entries: 2 >>> ------------------------------------- >>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive >>> getfattr: Removing leading '/' from absolute path names >>> # file: data/data-cluster/dms/final_archive >>> trusted.afr.dirty=0x000000000000000000000000 >>> trusted.afr.storage2-client-0=0x000000000000000000000001 >>> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.dht.mds=0x00000000 >>> -------------------------------------
>>> stat /data/data-cluster/dms/final_archive >>> File: '/data/data-cluster/dms/final_archive' >>> Size: 3497984 Blocks: 8760 IO Block: 4096 directory >>> Device: 807h/2055d Inode: 13563551265 Links: 72124 >>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>> Access: 2018-10-09 04:22:40.514629044 +0200 >>> Modify: 2019-03-21 11:55:46.382565124 +0100 >>> Change: 2019-03-21 11:55:46.382565124 +0100 >>> Birth: - >>> ------------------------------------- >>> >>> Hope this helps. >>> >>> - Kindest regards, >>> >>> Milos Cuculovic >>> IT Manager >>> >>> --- >>> MDPI AG >>> Postfach, CH-4020 Basel, Switzerland >>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>> Tel. +41 61 683 77 35 >>> Fax +41 61 302 89 18 >>> Email: cuculovic at mdpi.com >>> Skype: milos.cuculovic.mdpi >>> >>> Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. >>> >>>> On 21 Mar 2019, at 11:43, Karthik Subrahmanya > wrote: >>>> >>>> Can you attach the "glustershd.log" file which will be present under "/var/log/glusterfs/" from both the nodes and the "stat" & "getfattr -d -m . -e hex <file-path-on-brick>" output of all the entries listed in the heal info output from both the bricks? >>>> >>>> On Thu, Mar 21, 2019 at 3:54 PM Milos Cuculovic > wrote: >>>> Thanks Karthik! >>>> >>>> I was trying to find some resolution methods from [2] but unfortunately none worked (I can explain what I tried if needed). >>>> >>>>> I guess the volume you are talking about is of type replica-2 (1x2). >>>> That's correct; I am aware of the arbiter solution but still didn't take the time to implement it. >>>> >>>> From the info results I posted, how do I know which situation I am in?
No files are mentioned in split brain, only directories. One brick has 3 entries and one has two entries. >>>> >>>> sudo gluster volume heal storage2 info >>>> [sudo] password for sshadmin: >>>> Brick storage3:/data/data-cluster >>>> >>>> >>>> /dms/final_archive - Possibly undergoing heal >>>> >>>> Status: Connected >>>> Number of entries: 3 >>>> >>>> Brick storage4:/data/data-cluster >>>> >>>> /dms/final_archive - Possibly undergoing heal >>>> >>>> Status: Connected >>>> Number of entries: 2 >>>> >>>> - Kindest regards, >>>> >>>> Milos Cuculovic >>>> IT Manager >>>> >>>> --- >>>> MDPI AG >>>> Postfach, CH-4020 Basel, Switzerland >>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>>> Tel. +41 61 683 77 35 >>>> Fax +41 61 302 89 18 >>>> Email: cuculovic at mdpi.com >>>> Skype: milos.cuculovic.mdpi >>>> >>>> Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. >>>> >>>>> On 21 Mar 2019, at 10:27, Karthik Subrahmanya > wrote: >>>>> >>>>> Hi, >>>>> >>>>> Note: I guess the volume you are talking about is of type replica-2 (1x2). Usually replica 2 volumes are prone to split-brain. If you can consider converting them to arbiter or replica-3, they will handle most of the cases which can lead to split-brains. For more information see [1]. >>>>> >>>>> Resolving the split-brain: [2] talks about how to interpret the heal info output and the different ways to resolve them using the CLI/manually/using the favorite-child-policy. >>>>> If you are having an entry split-brain, and it is a gfid split-brain (file/dir having different gfids on the replica bricks), then you can use the CLI option to resolve them.
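The CLI resolution Karthik describes can be sketched as a dry-run, using the volume name and one of the mismatched directory names from this thread. The commands are echoed so they can be reviewed first; remove the `echo` (and run as root or via sudo) on one of the gluster nodes:

```shell
# Dry-run of the gfid split-brain resolution commands from this thread.
VOL=storage2
DIR=/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59   # path w.r.t. the volume root

# Resolve using the copy with the latest modification time:
echo gluster volume heal "$VOL" split-brain latest-mtime "$DIR"

# Or pick one brick explicitly as the source of truth:
echo gluster volume heal "$VOL" split-brain source-brick storage4:/data/data-cluster "$DIR"
```

After resolving each mismatch, a plain `gluster volume heal storage2` followed by `gluster volume heal storage2 info` shows whether the pending entries are gone, as Karthik suggests.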
If a directory is in gfid split-brain in a distributed-replicate volume and you are using the source-brick option please make sure you use the brick of this subvolume, which has the same gfid as that of the other distribute subvolume(s) where you have the correct gfid, as the source. >>>>> If you are having a type mismatch then follow the steps in [3] to resolve the split-brain. >>>>> >>>>> [1] https://docs.gluster.org/en/v3/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/ >>>>> [2] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/ >>>>> [3] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#dir-split-brain >>>>> >>>>> HTH, >>>>> Karthik >>>>> >>>>> On Thu, Mar 21, 2019 at 1:45 PM Milos Cuculovic > wrote: >>>>> I was now able to catch the split brain log: >>>>> >>>>> sudo gluster volume heal storage2 info >>>>> Brick storage3:/data/data-cluster >>>>> >>>>> >>>>> /dms/final_archive - Is in split-brain >>>>> >>>>> Status: Connected >>>>> Number of entries: 3 >>>>> >>>>> Brick storage4:/data/data-cluster >>>>> >>>>> /dms/final_archive - Is in split-brain >>>>> >>>>> Status: Connected >>>>> Number of entries: 2 >>>>> >>>>> Milos >>>>> >>>>>> On 21 Mar 2019, at 09:07, Milos Cuculovic > wrote: >>>>>> >>>>>> Since 24h, after upgrading from 4.0 to 4.1.7 one of the servers, the heal shows this: >>>>>> >>>>>> sudo gluster volume heal storage2 info >>>>>> Brick storage3:/data/data-cluster >>>>>> >>>>>> >>>>>> /dms/final_archive - Possibly undergoing heal >>>>>> >>>>>> Status: Connected >>>>>> Number of entries: 3 >>>>>> >>>>>> Brick storage4:/data/data-cluster >>>>>> >>>>>> /dms/final_archive - Possibly undergoing heal >>>>>> >>>>>> Status: Connected >>>>>> Number of entries: 2 >>>>>> >>>>>> The same files stay there. 
From time to time the status of the /dms/final_archive is in split brain, as the following command shows: >>>>>> >>>>>> sudo gluster volume heal storage2 info split-brain >>>>>> Brick storage3:/data/data-cluster >>>>>> /dms/final_archive >>>>>> Status: Connected >>>>>> Number of entries in split-brain: 1 >>>>>> >>>>>> Brick storage4:/data/data-cluster >>>>>> /dms/final_archive >>>>>> Status: Connected >>>>>> Number of entries in split-brain: 1 >>>>>> >>>>>> How to know which file is in split brain? The files in /dms/final_archive are not very important, fine to remove (ideally resolve the split brain) for the ones that differ. >>>>>> >>>>>> I can only see the directory and GFID. Any idea on how to resolve this situation as I would like to continue with the upgrade on the 2nd server, and for this the heal needs to be done with 0 entries in sudo gluster volume heal storage2 info >>>>>> >>>>>> Thank you in advance, Milos. >>>>> >>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> Gluster-users at gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: glustershd_brick1.log Type: application/octet-stream Size: 1198992 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: glustershd_brick2.log Type: application/octet-stream Size: 739194 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... 
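When heal info shows only a GFID, it can be mapped back to the entry GlusterFS keeps under the brick's `.glusterfs` directory: the first two hex byte-pairs of the GFID name two nested subdirectories, and the rest is the canonical UUID form. A minimal bash sketch of that mapping (the function name is made up for illustration; the path assumes the standard brick backend layout):

```shell
#!/usr/bin/env bash
# Map a GFID, as printed by getfattr (e.g. 0x16c6...), to the
# path of its entry under the brick's .glusterfs directory.
gfid_backend_path() {
    local g=${1#0x}                            # drop the 0x prefix
    # canonical UUID form: 8-4-4-4-12 hex digits
    local uuid="${g:0:8}-${g:8:4}-${g:12:4}-${g:16:4}-${g:20:12}"
    # .glusterfs/<first 2 hex digits>/<next 2 hex digits>/<uuid>
    echo ".glusterfs/${g:0:2}/${g:2:2}/${uuid}"
}

# trusted.gfid of /dms/final_archive from the getfattr output in this thread:
gfid_backend_path 0x16c6a1e2b3fe4851972b998980097a87
# -> .glusterfs/16/c6/16c6a1e2-b3fe-4851-972b-998980097a87
```

For regular files that .glusterfs entry is a hard link, so `find <brick-root> -samefile <path>` on the brick recovers the real name; for directories it is a symlink into the parent's GFID directory, so `readlink` shows the parent GFID and basename.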
URL: From ksubrahm at redhat.com Thu Mar 21 13:34:39 2019 From: ksubrahm at redhat.com (Karthik Subrahmanya) Date: Thu, 21 Mar 2019 19:04:39 +0530 Subject: [Gluster-users] Heal flapping between Possibly undergoing heal and In split brain In-Reply-To: <990A42AA-1F60-441E-BD5A-97B8333E2083@mdpi.com> References: <132679F1-A96E-402F-BDF0-79FE200AD9F7@mdpi.com> <548844D2-E06E-4A60-972E-72213C61D264@mdpi.com> <14B1CACB-4049-42DD-AB69-3B75FBD6BE30@mdpi.com> <990A42AA-1F60-441E-BD5A-97B8333E2083@mdpi.com> Message-ID: Now the split-brain on the directory is resolved. Are these entries which are there in the latest heal info output not getting healed? Are they still present in the heal info output? If they are still there can you try doing a lookup on those entries from client and see whether they are getting healed? On Thu, Mar 21, 2019 at 6:49 PM Milos Cuculovic wrote: > Hey Karthik, > > Can you run the "gluster volume heal"? > > sudo gluster volume heal storage2 > Launching heal operation to perform index self heal on volume storage2 has > been successful > Use heal info commands to check status. > > "gluster volume heal info"? > > sudo gluster volume heal storage2 info > Brick storage3:/data/data-cluster > > > Status: Connected > Number of entries: 2 > > Brick storage4:/data/data-cluster > > > > > > > Status: Connected > Number of entries: 6 > > > > - Kindest regards, > > Milos Cuculovic > IT Manager > > --- > MDPI AG > Postfach, CH-4020 Basel, Switzerland > Office: St. Alban-Anlage 66, 4052 Basel, Switzerland > Tel. +41 61 683 77 35 > Fax +41 61 302 89 18 > Email: cuculovic at mdpi.com > Skype: milos.cuculovic.mdpi > > Disclaimer: The information and files contained in this message > are confidential and intended solely for the use of the individual or > entity to whom they are addressed. If you have received this message in > error, please notify me and delete this message from your system. 
You may > not copy this message in its entirety or in part, or disclose its contents > to anyone. > > On 21 Mar 2019, at 14:07, Karthik Subrahmanya wrote: > > Hey Milos, > > I see that gfid got healed for those directories from the getfattr output > and the glfsheal log also has messages corresponding to deleting the > entries on one brick as part of healing which then got recreated on the > brick with the correct gfid. Can you run the "gluster volume heal " > & "gluster volume heal info" command and paste the output here? > If you still see entries pending heal, give the latest glustershd.log > files from both the nodes along with the getfattr output of the files which > are listed in the heal info output. > > Regards, > Karthik > > On Thu, Mar 21, 2019 at 6:03 PM Milos Cuculovic > wrote: > >> Sure: >> >> brick1: >> ------------------------------------------------------------ >> ------------------------------------------------------------ >> sudo getfattr -d -m . -e hex >> /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >> getfattr: Removing leading '/' from absolute path names >> # file: >> data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >> trusted.gfid=0xe358ff34504241d387efe1e76eb28bb0 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.dht.mds=0x00000000 >> >> sudo getfattr -d -m . -e hex >> /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >> getfattr: Removing leading '/' from absolute path names >> # file: >> data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >> trusted.gfid=0x33585a577c4a4c55b39b9abc07eacff1 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.dht.mds=0x00000000 >> >> sudo getfattr -d -m . 
-e hex >> /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >> getfattr: Removing leading '/' from absolute path names >> # file: >> data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >> trusted.gfid=0x99cdfe0773d74d7ab29592ce40cf47f3 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.dht.mds=0x00000000 >> >> sudo getfattr -d -m . -e hex >> /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >> getfattr: Removing leading '/' from absolute path names >> # file: >> data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >> trusted.gfid=0x8a96a80ddf8a4163ac00bb12369389df >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.dht.mds=0x00000000 >> >> sudo getfattr -d -m . -e hex >> /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >> getfattr: Removing leading '/' from absolute path names >> # file: >> data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >> trusted.gfid=0x04d087a96cc6421ca7bc356fc20d7753 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.dht.mds=0x00000000 >> >> sudo getfattr -d -m . -e hex >> /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >> getfattr: Removing leading '/' from absolute path names >> # file: >> data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >> trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.dht.mds=0x00000000 >> ???????????????????????????????????????????????????????????? 
>> sudo stat >> /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >> File: >> '/data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59' >> Size: 33 Blocks: 0 IO Block: 4096 directory >> Device: 807h/2055d Inode: 40809094709 Links: 3 >> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >> Access: 2019-03-20 11:06:26.994047597 +0100 >> Modify: 2019-03-20 11:28:28.294689870 +0100 >> Change: 2019-03-21 13:01:03.077654239 +0100 >> Birth: - >> >> sudo stat >> /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >> File: >> '/data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617' >> Size: 33 Blocks: 0 IO Block: 4096 directory >> Device: 807h/2055d Inode: 49399908865 Links: 3 >> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >> Access: 2019-03-20 11:07:20.342140927 +0100 >> Modify: 2019-03-20 11:28:28.318690015 +0100 >> Change: 2019-03-21 13:01:03.133654344 +0100 >> Birth: - >> >> sudo stat >> /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >> File: >> '/data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf' >> Size: 33 Blocks: 0 IO Block: 4096 directory >> Device: 807h/2055d Inode: 53706303549 Links: 3 >> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >> Access: 2019-03-20 11:06:55.414097315 +0100 >> Modify: 2019-03-20 11:28:28.362690281 +0100 >> Change: 2019-03-21 13:01:03.141654359 +0100 >> Birth: - >> >> sudo stat >> /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >> File: >> '/data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b' >> Size: 33 Blocks: 0 IO Block: 4096 directory >> Device: 807h/2055d Inode: 57990935591 Links: 3 >> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >> Access: 2019-03-20 11:07:08.558120309 +0100 >> Modify: 2019-03-20 11:28:14.226604869 +0100 >> Change: 2019-03-21 13:01:03.189654448 +0100 >> Birth: - >> >> sudo stat >> 
/data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >> File: >> '/data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b' >> Size: 33 Blocks: 0 IO Block: 4096 directory >> Device: 807h/2055d Inode: 62291339781 Links: 3 >> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >> Access: 2019-03-20 11:06:02.070003998 +0100 >> Modify: 2019-03-20 11:28:28.458690861 +0100 >> Change: 2019-03-21 13:01:03.281654621 +0100 >> Birth: - >> >> sudo stat >> /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >> File: >> '/data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e' >> Size: 33 Blocks: 0 IO Block: 4096 directory >> Device: 807h/2055d Inode: 66574223479 Links: 3 >> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >> Access: 2019-03-20 11:28:10.826584325 +0100 >> Modify: 2019-03-20 11:28:10.834584374 +0100 >> Change: 2019-03-20 14:06:07.937449353 +0100 >> Birth: - >> root at storage3:/var/log/glusterfs# >> ???????????????????????????????????????????????????????????? >> ???????????????????????????????????????????????????????????? >> >> brick2: >> ???????????????????????????????????????????????????????????? >> ???????????????????????????????????????????????????????????? >> sudo getfattr -d -m . -e hex >> /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >> getfattr: Removing leading '/' from absolute path names >> # file: >> data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >> trusted.afr.dirty=0x000000000000000000000000 >> trusted.afr.storage2-client-0=0x000000000000000000000000 >> trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.dht.mds=0x00000000 >> >> sudo getfattr -d -m . 
-e hex >> /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >> getfattr: Removing leading '/' from absolute path names >> # file: >> data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >> trusted.afr.storage2-client-0=0x000000000000000000000000 >> trusted.gfid=0x33585a577c4a4c55b39b9abc07eacff1 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.dht.mds=0x00000000 >> >> sudo getfattr -d -m . -e hex >> /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >> getfattr: Removing leading '/' from absolute path names >> # file: >> data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >> trusted.afr.storage2-client-0=0x000000000000000000000000 >> trusted.gfid=0x99cdfe0773d74d7ab29592ce40cf47f3 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.dht.mds=0x00000000 >> >> sudo getfattr -d -m . -e hex >> /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >> getfattr: Removing leading '/' from absolute path names >> # file: >> data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >> trusted.afr.storage2-client-0=0x000000000000000000000000 >> trusted.gfid=0x8a96a80ddf8a4163ac00bb12369389df >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.dht.mds=0x00000000 >> >> sudo getfattr -d -m . -e hex >> /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >> getfattr: Removing leading '/' from absolute path names >> # file: >> data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >> trusted.afr.storage2-client-0=0x000000000000000000000000 >> trusted.gfid=0x04d087a96cc6421ca7bc356fc20d7753 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.dht.mds=0x00000000 >> >> sudo getfattr -d -m . 
-e hex >> /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >> getfattr: Removing leading '/' from absolute path names >> # file: >> data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >> trusted.afr.dirty=0x000000000000000000000000 >> trusted.afr.storage2-client-0=0x000000000000000000000000 >> trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.dht.mds=0x00000000 >> >> ???????????????????????????????????????????????????????????? >> >> sudo stat >> /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >> File: >> '/data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59' >> Size: 33 Blocks: 0 IO Block: 4096 directory >> Device: 807h/2055d Inode: 42232631305 Links: 3 >> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >> Access: 2019-03-20 11:06:26.994047597 +0100 >> Modify: 2019-03-20 11:28:28.294689870 +0100 >> Change: 2019-03-21 13:01:03.078748131 +0100 >> Birth: - >> >> sudo stat >> /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >> File: >> '/data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617' >> Size: 33 Blocks: 0 IO Block: 4096 directory >> Device: 807h/2055d Inode: 78589109305 Links: 3 >> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >> Access: 2019-03-20 11:07:20.342140927 +0100 >> Modify: 2019-03-20 11:28:28.318690015 +0100 >> Change: 2019-03-21 13:01:03.134748477 +0100 >> Birth: - >> >> sudo stat >> /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >> File: >> '/data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf' >> Size: 33 Blocks: 0 IO Block: 4096 directory >> Device: 807h/2055d Inode: 54972096517 Links: 3 >> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >> Access: 2019-03-20 11:06:55.414097315 +0100 >> Modify: 2019-03-20 11:28:28.362690281 +0100 >> Change: 2019-03-21 
13:01:03.162748650 +0100 >> Birth: - >> >> sudo stat >> /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >> File: >> '/data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b' >> Size: 33 Blocks: 0 IO Block: 4096 directory >> Device: 807h/2055d Inode: 40821259275 Links: 3 >> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >> Access: 2019-03-20 11:07:08.558120309 +0100 >> Modify: 2019-03-20 11:28:14.226604869 +0100 >> Change: 2019-03-21 13:01:03.194748848 +0100 >> Birth: - >> >> sudo stat >> /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >> File: >> '/data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b' >> Size: 33 Blocks: 0 IO Block: 4096 directory >> Device: 807h/2055d Inode: 15876654 Links: 3 >> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >> Access: 2019-03-20 11:06:02.070003998 +0100 >> Modify: 2019-03-20 11:28:28.458690861 +0100 >> Change: 2019-03-21 13:01:03.282749392 +0100 >> Birth: - >> >> sudo stat >> /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >> File: >> '/data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e' >> Size: 33 Blocks: 0 IO Block: 4096 directory >> Device: 807h/2055d Inode: 49408944650 Links: 3 >> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >> Access: 2019-03-20 11:28:10.826584325 +0100 >> Modify: 2019-03-20 11:28:10.834584374 +0100 >> Change: 2019-03-20 14:06:07.940849268 +0100 >> Birth: - >> ???????????????????????????????????????????????????????????? >> ???????????????????????????????????????????????????????????? >> >> The file is from brick 2 that I upgraded and started the heal on. >> >> >> - Kindest regards, >> >> Milos Cuculovic >> IT Manager >> >> --- >> MDPI AG >> Postfach, CH-4020 Basel, Switzerland >> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >> Tel. 
+41 61 683 77 35 >> Fax +41 61 302 89 18 >> Email: cuculovic at mdpi.com >> Skype: milos.cuculovic.mdpi >> >> Disclaimer: The information and files contained in this message >> are confidential and intended solely for the use of the individual or >> entity to whom they are addressed. If you have received this message in >> error, please notify me and delete this message from your system. You may >> not copy this message in its entirety or in part, or disclose its contents >> to anyone. >> >> On 21 Mar 2019, at 13:05, Karthik Subrahmanya >> wrote: >> >> Can you give me the stat & getfattr output of all those 6 entries from >> both the bricks and the glfsheal-.log file from the node where you >> run this command? >> Meanwhile can you also try running this with the source-brick option? >> >> On Thu, Mar 21, 2019 at 5:22 PM Milos Cuculovic >> wrote: >> >>> Thank you Karthik, >>> >>> I have run this for all files (see example below) and it says the file >>> is not in split-brain: >>> >>> sudo gluster volume heal storage2 split-brain latest-mtime >>> /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >>> Healing /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 failed: File >>> not in split-brain. >>> Volume heal failed. >>> >>> >>> - Kindest regards, >>> >>> Milos Cuculovic >>> IT Manager >>> >>> --- >>> MDPI AG >>> Postfach, CH-4020 Basel, Switzerland >>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>> Tel. +41 61 683 77 35 >>> Fax +41 61 302 89 18 >>> Email: cuculovic at mdpi.com >>> Skype: milos.cuculovic.mdpi >>> >>> Disclaimer: The information and files contained in this message >>> are confidential and intended solely for the use of the individual or >>> entity to whom they are addressed. If you have received this message in >>> error, please notify me and delete this message from your system. You may >>> not copy this message in its entirety or in part, or disclose its contents >>> to anyone. 
>>> >>> On 21 Mar 2019, at 12:36, Karthik Subrahmanya >>> wrote: >>> >>> Hi Milos, >>> >>> Thanks for the logs and the getfattr output. >>> From the logs I can see that there are 6 entries under the >>> directory "/data/data-cluster/dms/final_archive" named >>> 41be9ff5ec05c4b1c989c6053e709e59 >>> 5543982fab4b56060aa09f667a8ae617 >>> a8b7f31775eebc8d1867e7f9de7b6eaf >>> c1d3f3c2d7ae90e891e671e2f20d5d4b >>> e5934699809a3b6dcfc5945f408b978b >>> e7cdc94f60d390812a5f9754885e119e >>> which are having gfid mismatch, so the heal is failing on this directory. >>> >>> You can use the CLI option to resolve these files from gfid mismatch. >>> You can use any of the 3 methods available: >>> 1. bigger-file >>> gluster volume heal split-brain bigger-file >>> >>> 2. latest-mtime >>> gluster volume heal split-brain latest-mtime >>> >>> 3. source-brick >>> gluster volume heal split-brain source-brick >>> >>> >>> where must be absolute path w.r.t. the volume, starting with '/'. >>> If all those entries are directories then go for either >>> latest-mtime/source-brick option. >>> After you resolve all these gfid-mismatches, run the "gluster volume >>> heal " command. Then check the heal info and let me know the >>> result. >>> >>> Regards, >>> Karthik >>> >>> On Thu, Mar 21, 2019 at 4:27 PM Milos Cuculovic >>> wrote: >>> >>>> Sure, thank you for following up. >>>> >>>> About the commands, here is what I see: >>>> >>>> brick1: >>>> ????????????????????????????????????? >>>> sudo gluster volume heal storage2 info >>>> Brick storage3:/data/data-cluster >>>> >>>> >>>> /dms/final_archive - Possibly undergoing heal >>>> >>>> Status: Connected >>>> Number of entries: 3 >>>> >>>> Brick storage4:/data/data-cluster >>>> >>>> /dms/final_archive - Possibly undergoing heal >>>> >>>> Status: Connected >>>> Number of entries: 2 >>>> ????????????????????????????????????? >>>> sudo getfattr -d -m . 
-e hex /data/data-cluster/dms/final_archive >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: data/data-cluster/dms/final_archive >>>> trusted.afr.dirty=0x000000000000000000000000 >>>> trusted.afr.storage2-client-1=0x000000000000000000000010 >>>> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.dht.mds=0x00000000 >>>> ????????????????????????????????????? >>>> stat /data/data-cluster/dms/final_archive >>>> File: '/data/data-cluster/dms/final_archive' >>>> Size: 3497984 Blocks: 8768 IO Block: 4096 directory >>>> Device: 807h/2055d Inode: 26427748396 Links: 72123 >>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>> Access: 2018-10-09 04:22:40.514629044 +0200 >>>> Modify: 2019-03-21 11:55:37.382278863 +0100 >>>> Change: 2019-03-21 11:55:37.382278863 +0100 >>>> Birth: - >>>> ????????????????????????????????????? >>>> ????????????????????????????????????? >>>> >>>> brick2: >>>> ????????????????????????????????????? >>>> sudo gluster volume heal storage2 info >>>> Brick storage3:/data/data-cluster >>>> >>>> >>>> /dms/final_archive - Possibly undergoing heal >>>> >>>> Status: Connected >>>> Number of entries: 3 >>>> >>>> Brick storage4:/data/data-cluster >>>> >>>> /dms/final_archive - Possibly undergoing heal >>>> >>>> Status: Connected >>>> Number of entries: 2 >>>> ????????????????????????????????????? >>>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: data/data-cluster/dms/final_archive >>>> trusted.afr.dirty=0x000000000000000000000000 >>>> trusted.afr.storage2-client-0=0x000000000000000000000001 >>>> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.dht.mds=0x00000000 >>>> ????????????????????????????????????? 
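The trusted.afr.<volume>-client-N values in the getfattr dumps above are AFR's pending changelogs: 12 bytes holding three big-endian 32-bit counters for data, metadata and entry operations pending against the peer brick. A small bash sketch to decode them (the helper name is illustrative, not a gluster tool; assumes the standard 12-byte changelog layout):

```shell
#!/usr/bin/env bash
# Decode a trusted.afr.* xattr value into its three pending
# changelog counters: data, metadata and entry operations.
decode_afr() {
    local v=${1#0x}                 # strip the 0x prefix
    printf 'data=%d metadata=%d entry=%d\n' \
        "$((16#${v:0:8}))" "$((16#${v:8:8}))" "$((16#${v:16:8}))"
}

# Value seen on brick1 for final_archive in this thread:
decode_afr 0x000000000000000000000010
# -> data=0 metadata=0 entry=16
```

Read this way, brick1 carrying a non-zero entry count against client-1 while brick2 carries one against client-0 is the mutual-blame pattern that heal info reports as entry split-brain on the directory.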
>>>> stat /data/data-cluster/dms/final_archive >>>> File: '/data/data-cluster/dms/final_archive' >>>> Size: 3497984 Blocks: 8760 IO Block: 4096 directory >>>> Device: 807h/2055d Inode: 13563551265 Links: 72124 >>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>> Access: 2018-10-09 04:22:40.514629044 +0200 >>>> Modify: 2019-03-21 11:55:46.382565124 +0100 >>>> Change: 2019-03-21 11:55:46.382565124 +0100 >>>> Birth: - >>>> ------------------------------------- >>>> >>>> Hope this helps. >>>> >>>> - Kindest regards, >>>> >>>> Milos Cuculovic >>>> IT Manager >>>> >>>> --- >>>> MDPI AG >>>> Postfach, CH-4020 Basel, Switzerland >>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>>> Tel. +41 61 683 77 35 >>>> Fax +41 61 302 89 18 >>>> Email: cuculovic at mdpi.com >>>> Skype: milos.cuculovic.mdpi >>>> >>>> Disclaimer: The information and files contained in this message >>>> are confidential and intended solely for the use of the individual or >>>> entity to whom they are addressed. If you have received this message in >>>> error, please notify me and delete this message from your system. You may >>>> not copy this message in its entirety or in part, or disclose its contents >>>> to anyone. >>>> >>>> On 21 Mar 2019, at 11:43, Karthik Subrahmanya >>>> wrote: >>>> >>>> Can you attach the "glustershd.log" file which will be present under >>>> "/var/log/glusterfs/" from both the nodes and the "stat" & "getfattr -d -m >>>> . -e hex " output of all the entries listed in the heal >>>> info output from both the bricks? >>>> >>>> On Thu, Mar 21, 2019 at 3:54 PM Milos Cuculovic >>>> wrote: >>>> >>>>> Thanks Karthik! >>>>> >>>>> I was trying to find some resolution methods from [2] but >>>>> unfortunately none worked (I can explain what I tried if needed). >>>>> >>>>> I guess the volume you are talking about is of type replica-2 (1x2). >>>>> >>>>> That's correct, aware of the arbiter solution but still didn't take >>>>> time to implement. 
>>>>> >>>>> From the info results I posted, how to know in which situation I am. >>>>> No files are mentioned in split brain, only directories. One brick has 3 >>>>> entries and the other has two. >>>>> >>>>> sudo gluster volume heal storage2 info >>>>> [sudo] password for sshadmin: >>>>> Brick storage3:/data/data-cluster >>>>> >>>>> >>>>> /dms/final_archive - Possibly undergoing heal >>>>> >>>>> Status: Connected >>>>> Number of entries: 3 >>>>> >>>>> Brick storage4:/data/data-cluster >>>>> >>>>> /dms/final_archive - Possibly undergoing heal >>>>> >>>>> Status: Connected >>>>> Number of entries: 2 >>>>> >>>>> - Kindest regards, >>>>> >>>>> Milos Cuculovic >>>>> IT Manager >>>>> >>>>> --- >>>>> MDPI AG >>>>> Postfach, CH-4020 Basel, Switzerland >>>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>>>> Tel. +41 61 683 77 35 >>>>> Fax +41 61 302 89 18 >>>>> Email: cuculovic at mdpi.com >>>>> Skype: milos.cuculovic.mdpi >>>>> >>>>> Disclaimer: The information and files contained in this message >>>>> are confidential and intended solely for the use of the individual or >>>>> entity to whom they are addressed. If you have received this message in >>>>> error, please notify me and delete this message from your system. You may >>>>> not copy this message in its entirety or in part, or disclose its contents >>>>> to anyone. >>>>> >>>>> On 21 Mar 2019, at 10:27, Karthik Subrahmanya >>>>> wrote: >>>>> >>>>> Hi, >>>>> >>>>> Note: I guess the volume you are talking about is of type replica-2 >>>>> (1x2). Usually replica 2 volumes are prone to split-brain. If you can >>>>> consider converting them to arbiter or replica-3, they will handle most of >>>>> the cases which can lead to split-brains. For more information see [1]. >>>>> >>>>> Resolving the split-brain: [2] talks about how to interpret the heal >>>>> info output and different ways to resolve them using the CLI/manually/using >>>>> the favorite-child-policy. 
>>>>> If you are having entry split brain, and is a gfid split-brain >>>>> (file/dir having different gfids on the replica bricks) then you can use >>>>> the CLI option to resolve them. If a directory is in gfid split-brain in a >>>>> distributed-replicate volume and you are using the source-brick option >>>>> please make sure you use the brick of this subvolume, which has the same >>>>> gfid as that of the other distribute subvolume(s) where you have the >>>>> correct gfid, as the source. >>>>> If you are having a type mismatch then follow the steps in [3] to >>>>> resolve the split-brain. >>>>> >>>>> [1] >>>>> https://docs.gluster.org/en/v3/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/ >>>>> [2] >>>>> https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/ >>>>> [3] >>>>> https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#dir-split-brain >>>>> >>>>> HTH, >>>>> Karthik >>>>> >>>>> On Thu, Mar 21, 2019 at 1:45 PM Milos Cuculovic >>>>> wrote: >>>>> >>>>>> I was now able to catch the split brain log: >>>>>> >>>>>> sudo gluster volume heal storage2 info >>>>>> Brick storage3:/data/data-cluster >>>>>> >>>>>> >>>>>> /dms/final_archive - Is in split-brain >>>>>> >>>>>> Status: Connected >>>>>> Number of entries: 3 >>>>>> >>>>>> Brick storage4:/data/data-cluster >>>>>> >>>>>> /dms/final_archive - Is in split-brain >>>>>> >>>>>> Status: Connected >>>>>> Number of entries: 2 >>>>>> >>>>>> Milos >>>>>> >>>>>> On 21 Mar 2019, at 09:07, Milos Cuculovic wrote: >>>>>> >>>>>> Since 24h, after upgrading from 4.0 to 4.1.7 one of the servers, the >>>>>> heal shows this: >>>>>> >>>>>> sudo gluster volume heal storage2 info >>>>>> Brick storage3:/data/data-cluster >>>>>> >>>>>> >>>>>> /dms/final_archive - Possibly undergoing heal >>>>>> >>>>>> Status: Connected >>>>>> Number of entries: 3 >>>>>> >>>>>> Brick storage4:/data/data-cluster >>>>>> >>>>>> /dms/final_archive - Possibly undergoing heal >>>>>> >>>>>> Status: 
Connected >>>>>> Number of entries: 2 >>>>>> >>>>>> The same files stay there. From time to time the status of the >>>>>> /dms/final_archive is in split brain at the following command shows: >>>>>> >>>>>> sudo gluster volume heal storage2 info split-brain >>>>>> Brick storage3:/data/data-cluster >>>>>> /dms/final_archive >>>>>> Status: Connected >>>>>> Number of entries in split-brain: 1 >>>>>> >>>>>> Brick storage4:/data/data-cluster >>>>>> /dms/final_archive >>>>>> Status: Connected >>>>>> Number of entries in split-brain: 1 >>>>>> >>>>>> How to know the file who is in split brain? The files in >>>>>> /dms/final_archive are not very important, fine to remove (ideally resolve >>>>>> the split brain) for the ones that differ. >>>>>> >>>>>> I can only see the directory and GFID. Any idea on how to resolve >>>>>> this situation as I would like to continue with the upgrade on the 2nd >>>>>> server, and for this the heal needs to be done with 0 entries in sudo >>>>>> gluster volume heal storage2 info >>>>>> >>>>>> Thank you in advance, Milos. >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Gluster-users mailing list >>>>>> Gluster-users at gluster.org >>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>> >>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pkalever at redhat.com Thu Mar 21 14:23:19 2019 From: pkalever at redhat.com (Prasanna Kalever) Date: Thu, 21 Mar 2019 19:53:19 +0530 Subject: [Gluster-users] [Gluster-devel] Network Block device (NBD) on top of glusterfs In-Reply-To: <072c6bb2-eee1-8374-9b53-b9561deebfc7@redhat.com> References: <072c6bb2-eee1-8374-9b53-b9561deebfc7@redhat.com> Message-ID: On Thu, Mar 21, 2019 at 6:31 PM Xiubo Li wrote: > On 2019/3/21 18:09, Prasanna Kalever wrote: > > > > On Thu, Mar 21, 2019 at 9:00 AM Xiubo Li wrote: > >> All, >> >> I am one of the contributors to the gluster-block >> [1] project, and I also >> contribute to the linux kernel and open-iscsi >> project.[2] >> >> NBD has been around for some time, but recently the linux kernel's Network >> Block Device (NBD) was enhanced and made to work with more devices, and the >> option to integrate with netlink was added. So, I tried to provide a >> glusterfs client based NBD driver recently. Please refer to github issue >> #633 [3], and the good news >> is I have working code, with the most basic things, @ the nbd-runner project >> [4]. >> >> While this email is about announcing the project and asking for more >> collaboration, I would also like to discuss the placement of the >> project itself. Currently the nbd-runner project is expected to be shared by >> our friends at the Ceph project too, to provide an NBD driver for Ceph. I have >> personally worked with some of them closely while contributing to the >> open-iSCSI project, and we would like to take this project to great success. >> >> Now a few questions: >> >> 1. Can I continue to use http://github.com/gluster/nbd-runner as home >> for this project, even if it's shared by other filesystem projects? >> >> >> - I personally am fine with this. >> >> >> 2. Should there be a separate organization for this repo? >> >> >> - While it may make sense in future, for now, I am not planning to >> start any new thing. 
>> >> It would be great if we have some consensus on this soon, as nbd-runner is >> a new repository. If there are no concerns, I will continue to contribute >> to the existing repository. >> > > Thanks Xiubo Li, for finally sending this email out. Since this email is > out on the gluster mailing list, I would like to take a stand from the gluster > community point of view *only* and share my views. > > My honest answer is "If we want to maintain this within the gluster org, then > 80% of the effort is common/duplicate of what we did all these days with > gluster-block". > > The great idea came from Mike Christie days ago and the nbd-runner > project's framework is initially emulated from tcmu-runner. This is why I > named this project nbd-runner, which will work for all the other > distributed storages, such as Gluster/Ceph/Azure, as discussed with Mike > before. > > nbd-runner (NBD proto) and tcmu-runner (iSCSI proto) are almost the same, and > both work as the lower IO (READ/WRITE/...) layer, not the management > layer like ceph-iscsi-gateway and gluster-block currently do. > > Currently, since I have only implemented the Gluster handler and am also using > RPC like glusterfs and gluster-block, most of the other code (about 70%) in > nbd-runner is for the NBD proto, and this is very different from the > tcmu-runner/glusterfs/gluster-block projects; there are also many new > features in the NBD module that are not yet supported, so there will be more > differences in future. > > The framework coding has been done and the nbd-runner project is already > stable and works well for me now. > > like: > * rpc/socket code > * cli/daemon parser/helper logics > * gfapi util functions > * logger framework > * inotify & dyn-config threads > > Yeah, these features were initially from the tcmu-runner project, which Mike and I > coded two years ago. Currently nbd-runner has also copied them from > tcmu-runner. 
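[Editor's note] As background on the "NBD proto" layer discussed here: the handshake an NBD server/daemon must implement is defined by the public NBD protocol specification, not by any of the projects in this thread. A minimal sketch of the fixed-newstyle server greeting, with the magic constants taken from that spec (nbd-runner's actual sources are not quoted here):

```python
import struct

# Constants from the public NBD protocol specification (proto.md)
NBD_INIT_MAGIC = b"NBDMAGIC"   # INIT_PASSWD, first 8 bytes the server sends
NBD_OPTS_MAGIC = b"IHAVEOPT"   # magic announcing newstyle option negotiation
NBD_FLAG_FIXED_NEWSTYLE = 1 << 0
NBD_FLAG_NO_ZEROES = 1 << 1

def server_greeting(flags: int = NBD_FLAG_FIXED_NEWSTYLE) -> bytes:
    """Build the initial greeting a fixed-newstyle NBD server sends:
    two 8-byte magics followed by 16 bits of handshake flags (big-endian)."""
    return NBD_INIT_MAGIC + NBD_OPTS_MAGIC + struct.pack(">H", flags)

print(server_greeting().hex())
```

After this greeting, the client replies with its own flags and option haggling (e.g. NBD_OPT_GO) begins; everything above is generic protocol plumbing that any NBD backend, Gluster or Ceph, has to speak.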
> I don't think tcmu-runner has any of, -> cli/daemon approach routines -> rpc low-level clnt/svc routines -> gfapi level file create/delete util functions -> Json parser support -> socket bound/listener related functionalities -> autoMake build frame-work, and -> many other maintenance files I actually can go into detail and furnish a long list of references made here, and you cannot deny the fact, but it's **all okay** to take references from other alike projects. But my intention was not to point out the copying made here, but rather to say that we are just wasting our efforts rewriting, copy-pasting, maintaining and fixing the same functionality framework. Again, the point I'm trying to make is: if at all you want to maintain an nbd client as part of gluster.org, why not use gluster-block itself? which is well tested and stable enough. Apart from all the examples I have mentioned in my previous thread, there are other great advantages from the user perspective as well, like: * The top layers such as heketi consuming gluster's block storage really don't have to care whether the backend provider is tcmu-runner or nbd-runner or qemu-tcmu or kernel loopback or fileIO or something else ... They simply call gluster-block and get a block device out there. * We can reuse the existing gluster-block's rest api interface too. ** Believe me, over the years I have learned from my experience, and it is a fact that we can save a lot of energy and time by reusing an existing stable framework rather than building a new one from scratch ** I will try to spend a few hours over my weekends and send an nbd client application PR for gluster-block (IMO this shouldn't exceed ~200 lines), and will request your review there. Cheers! -- Prasanna > Very appreciated for your great ideas here Prasanna, and I hope nbd-runner > can be used more generically and successfully in future. > > BRs > > Xiubo Li > > > * configure/Makefile/specfiles > * docsAboutGluster and etc .. 
> > The repository gluster-block is actually a home for all the block related > stuff within gluster and it's designed to accommodate alike functionalities, > if I were you I would have simply copied nbd-runner.c into > https://github.com/gluster/gluster-block/tree/master/daemon/ just like > ceph plays it here > https://github.com/ceph/ceph/blob/master/src/tools/rbd_nbd/rbd-nbd.cc and > be done. > > Advantages of keeping the nbd client within gluster-block: > -> No worry about maintenance code burden > -> No worry about monitoring a new component > -> shipping packages to fedora/centos/rhel is handled > -> This helps improve and stabilize the current gluster-block framework > -> We can build a common CI > -> We can reuse the common test framework and etc .. > > If you have an impression that gluster-block is for management, then I > would really want to correct you at this point. > > Some of my near future plans for gluster-block: > * Allow exporting blocks with FUSE access via the fileIO backstore to improve > large-file workloads, draft: > https://github.com/gluster/gluster-block/pull/58 > * Accommodate kernel loopback handling for local only applications > * The same way we can accommodate an nbd app/client, and IMHO this effort > shouldn't take 1 or 2 days to get it merged within gluster-block and ready > for a go release. > > Hope that clarifies it. > > Best Regards, > -- > Prasanna > > >> Regards, >> Xiubo Li (@lxbsz) >> >> [1] - https://github.com/gluster/gluster-block >> [2] - https://github.com/open-iscsi >> [3] - https://github.com/gluster/glusterfs/issues/633 >> [4] - https://github.com/gluster/nbd-runner >> _______________________________________________ >> Gluster-devel mailing list >> Gluster-devel at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-devel > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cuculovic at mdpi.com Fri Mar 22 07:36:24 2019 From: cuculovic at mdpi.com (Milos Cuculovic) Date: Fri, 22 Mar 2019 08:36:24 +0100 Subject: [Gluster-users] Heal flapping between Possibly undergoing heal and In split brain In-Reply-To: References: <132679F1-A96E-402F-BDF0-79FE200AD9F7@mdpi.com> <548844D2-E06E-4A60-972E-72213C61D264@mdpi.com> <14B1CACB-4049-42DD-AB69-3B75FBD6BE30@mdpi.com> <990A42AA-1F60-441E-BD5A-97B8333E2083@mdpi.com> Message-ID: <76354C5F-C197-475A-B4D3-D6089CED12EB@mdpi.com> I have run a few minutes ago the info and here are the results: sudo gluster volume heal storage2 info Brick storage3:/data/data-cluster Status: Connected Number of entries: 2 Brick storage4:/data/data-cluster Status: Connected Number of entries: 6 sudo gluster volume heal storage2 info split-brain Brick storage3:/data/data-cluster Status: Connected Number of entries in split-brain: 0 Brick storage4:/data/data-cluster Status: Connected Number of entries in split-brain: 0 The heal info (2 + 6) are there since yesterday and do not change. > If they are still there can you try doing a lookup on those entries from client and see whether they are getting healed? How can I do this having the gfid only? - Kindest regards, Milos Cuculovic IT Manager --- MDPI AG Postfach, CH-4020 Basel, Switzerland Office: St. Alban-Anlage 66, 4052 Basel, Switzerland Tel. +41 61 683 77 35 Fax +41 61 302 89 18 Email: cuculovic at mdpi.com Skype: milos.cuculovic.mdpi Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. > On 21 Mar 2019, at 14:34, Karthik Subrahmanya wrote: > > Now the split-brain on the directory is resolved. 
> Are these entries which are there in the latest heal info output not getting healed? Are they still present in the heal info output? > If they are still there can you try doing a lookup on those entries from client and see whether they are getting healed? > > > On Thu, Mar 21, 2019 at 6:49 PM Milos Cuculovic > wrote: > Hey Karthik, > >> Can you run the "gluster volume heal <volname>"? > sudo gluster volume heal storage2 > Launching heal operation to perform index self heal on volume storage2 has been successful > Use heal info commands to check status. > >> "gluster volume heal <volname> info"? > sudo gluster volume heal storage2 info > Brick storage3:/data/data-cluster > > > Status: Connected > Number of entries: 2 > > Brick storage4:/data/data-cluster > > > > > > > Status: Connected > Number of entries: 6 > > > > - Kindest regards, > > Milos Cuculovic > IT Manager > > --- > MDPI AG > Postfach, CH-4020 Basel, Switzerland > Office: St. Alban-Anlage 66, 4052 Basel, Switzerland > Tel. +41 61 683 77 35 > Fax +41 61 302 89 18 > Email: cuculovic at mdpi.com > Skype: milos.cuculovic.mdpi > > Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. > >> On 21 Mar 2019, at 14:07, Karthik Subrahmanya > wrote: >> >> Hey Milos, >> >> I see that gfid got healed for those directories from the getfattr output and the glfsheal log also has messages corresponding to deleting the entries on one brick as part of healing which then got recreated on the brick with the correct gfid. Can you run the "gluster volume heal <volname>" & "gluster volume heal <volname> info" commands and paste the output here? 
>> If you still see entries pending heal, give the latest glustershd.log files from both the nodes along with the getfattr output of the files which are listed in the heal info output from both the bricks. >> >> Regards, >> Karthik >> >> On Thu, Mar 21, 2019 at 6:03 PM Milos Cuculovic > wrote: >> Sure: >> >> brick1: >> ------------------------------------------------------------ >> ------------------------------------------------------------ >> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >> getfattr: Removing leading '/' from absolute path names >> # file: data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >> trusted.gfid=0xe358ff34504241d387efe1e76eb28bb0 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.dht.mds=0x00000000 >> >> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >> getfattr: Removing leading '/' from absolute path names >> # file: data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >> trusted.gfid=0x33585a577c4a4c55b39b9abc07eacff1 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.dht.mds=0x00000000 >> >> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >> getfattr: Removing leading '/' from absolute path names >> # file: data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >> trusted.gfid=0x99cdfe0773d74d7ab29592ce40cf47f3 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.dht.mds=0x00000000 >> >> sudo getfattr -d -m . 
-e hex /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >> getfattr: Removing leading '/' from absolute path names >> # file: data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >> trusted.gfid=0x8a96a80ddf8a4163ac00bb12369389df >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.dht.mds=0x00000000 >> >> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >> getfattr: Removing leading '/' from absolute path names >> # file: data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >> trusted.gfid=0x04d087a96cc6421ca7bc356fc20d7753 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.dht.mds=0x00000000 >> >> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >> getfattr: Removing leading '/' from absolute path names >> # file: data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >> trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.dht.mds=0x00000000 >> ------------------------------------------------------------ 
>> sudo stat /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >> File: '/data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59' >> Size: 33 Blocks: 0 IO Block: 4096 directory >> Device: 807h/2055d Inode: 40809094709 Links: 3 >> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >> Access: 2019-03-20 11:06:26.994047597 +0100 >> Modify: 2019-03-20 11:28:28.294689870 +0100 >> Change: 2019-03-21 13:01:03.077654239 +0100 >> Birth: - >> >> sudo stat /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >> File: '/data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617' >> Size: 33 Blocks: 0 IO Block: 4096 directory >> Device: 807h/2055d Inode: 49399908865 Links: 3 >> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >> Access: 2019-03-20 11:07:20.342140927 +0100 >> Modify: 2019-03-20 11:28:28.318690015 +0100 >> Change: 2019-03-21 13:01:03.133654344 +0100 >> Birth: - >> >> sudo stat /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >> File: '/data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf' >> Size: 33 Blocks: 0 IO Block: 4096 directory >> Device: 807h/2055d Inode: 53706303549 Links: 3 >> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >> Access: 2019-03-20 11:06:55.414097315 +0100 >> Modify: 2019-03-20 11:28:28.362690281 +0100 >> Change: 2019-03-21 13:01:03.141654359 +0100 >> Birth: - >> >> sudo stat /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >> File: '/data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b' >> Size: 33 Blocks: 0 IO Block: 4096 directory >> Device: 807h/2055d Inode: 57990935591 Links: 3 >> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >> Access: 2019-03-20 11:07:08.558120309 +0100 >> Modify: 2019-03-20 11:28:14.226604869 +0100 >> Change: 2019-03-21 13:01:03.189654448 +0100 >> Birth: - >> >> sudo stat 
/data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >> File: '/data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b' >> Size: 33 Blocks: 0 IO Block: 4096 directory >> Device: 807h/2055d Inode: 62291339781 Links: 3 >> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >> Access: 2019-03-20 11:06:02.070003998 +0100 >> Modify: 2019-03-20 11:28:28.458690861 +0100 >> Change: 2019-03-21 13:01:03.281654621 +0100 >> Birth: - >> >> sudo stat /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >> File: '/data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e' >> Size: 33 Blocks: 0 IO Block: 4096 directory >> Device: 807h/2055d Inode: 66574223479 Links: 3 >> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >> Access: 2019-03-20 11:28:10.826584325 +0100 >> Modify: 2019-03-20 11:28:10.834584374 +0100 >> Change: 2019-03-20 14:06:07.937449353 +0100 >> Birth: - >> root at storage3:/var/log/glusterfs# >> ------------------------------------------------------------ >> ------------------------------------------------------------ >> >> brick2: >> ------------------------------------------------------------ >> ------------------------------------------------------------ >> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >> getfattr: Removing leading '/' from absolute path names >> # file: data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >> trusted.afr.dirty=0x000000000000000000000000 >> trusted.afr.storage2-client-0=0x000000000000000000000000 >> trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.dht.mds=0x00000000 >> >> sudo getfattr -d -m . 
-e hex /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >> getfattr: Removing leading '/' from absolute path names >> # file: data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >> trusted.afr.storage2-client-0=0x000000000000000000000000 >> trusted.gfid=0x33585a577c4a4c55b39b9abc07eacff1 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.dht.mds=0x00000000 >> >> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >> getfattr: Removing leading '/' from absolute path names >> # file: data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >> trusted.afr.storage2-client-0=0x000000000000000000000000 >> trusted.gfid=0x99cdfe0773d74d7ab29592ce40cf47f3 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.dht.mds=0x00000000 >> >> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >> getfattr: Removing leading '/' from absolute path names >> # file: data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >> trusted.afr.storage2-client-0=0x000000000000000000000000 >> trusted.gfid=0x8a96a80ddf8a4163ac00bb12369389df >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.dht.mds=0x00000000 >> >> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >> getfattr: Removing leading '/' from absolute path names >> # file: data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >> trusted.afr.storage2-client-0=0x000000000000000000000000 >> trusted.gfid=0x04d087a96cc6421ca7bc356fc20d7753 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.dht.mds=0x00000000 >> >> sudo getfattr -d -m . 
-e hex /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >> getfattr: Removing leading '/' from absolute path names >> # file: data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >> trusted.afr.dirty=0x000000000000000000000000 >> trusted.afr.storage2-client-0=0x000000000000000000000000 >> trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.dht.mds=0x00000000 >> >> ------------------------------------------------------------ >> >> sudo stat /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >> File: '/data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59' >> Size: 33 Blocks: 0 IO Block: 4096 directory >> Device: 807h/2055d Inode: 42232631305 Links: 3 >> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >> Access: 2019-03-20 11:06:26.994047597 +0100 >> Modify: 2019-03-20 11:28:28.294689870 +0100 >> Change: 2019-03-21 13:01:03.078748131 +0100 >> Birth: - >> >> sudo stat /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >> File: '/data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617' >> Size: 33 Blocks: 0 IO Block: 4096 directory >> Device: 807h/2055d Inode: 78589109305 Links: 3 >> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >> Access: 2019-03-20 11:07:20.342140927 +0100 >> Modify: 2019-03-20 11:28:28.318690015 +0100 >> Change: 2019-03-21 13:01:03.134748477 +0100 >> Birth: - >> >> sudo stat /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >> File: '/data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf' >> Size: 33 Blocks: 0 IO Block: 4096 directory >> Device: 807h/2055d Inode: 54972096517 Links: 3 >> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >> Access: 2019-03-20 11:06:55.414097315 +0100 >> Modify: 2019-03-20 11:28:28.362690281 +0100 >> Change: 2019-03-21 13:01:03.162748650 +0100 >> Birth: - 
>> >> sudo stat /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >> File: '/data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b' >> Size: 33 Blocks: 0 IO Block: 4096 directory >> Device: 807h/2055d Inode: 40821259275 Links: 3 >> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >> Access: 2019-03-20 11:07:08.558120309 +0100 >> Modify: 2019-03-20 11:28:14.226604869 +0100 >> Change: 2019-03-21 13:01:03.194748848 +0100 >> Birth: - >> >> sudo stat /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >> File: '/data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b' >> Size: 33 Blocks: 0 IO Block: 4096 directory >> Device: 807h/2055d Inode: 15876654 Links: 3 >> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >> Access: 2019-03-20 11:06:02.070003998 +0100 >> Modify: 2019-03-20 11:28:28.458690861 +0100 >> Change: 2019-03-21 13:01:03.282749392 +0100 >> Birth: - >> >> sudo stat /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >> File: '/data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e' >> Size: 33 Blocks: 0 IO Block: 4096 directory >> Device: 807h/2055d Inode: 49408944650 Links: 3 >> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >> Access: 2019-03-20 11:28:10.826584325 +0100 >> Modify: 2019-03-20 11:28:10.834584374 +0100 >> Change: 2019-03-20 14:06:07.940849268 +0100 >> Birth: - >> ------------------------------------------------------------ >> ------------------------------------------------------------ >> >> The file is from brick 2 that I upgraded and started the heal on. >> >> >> - Kindest regards, >> >> Milos Cuculovic >> IT Manager >> >> --- >> MDPI AG >> Postfach, CH-4020 Basel, Switzerland >> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >> Tel. 
+41 61 683 77 35 >> Fax +41 61 302 89 18 >> Email: cuculovic at mdpi.com >> Skype: milos.cuculovic.mdpi >> >> Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. >> >>> On 21 Mar 2019, at 13:05, Karthik Subrahmanya > wrote: >>> >>> Can you give me the stat & getfattr output of all those 6 entries from both the bricks and the glfsheal-<volname>.log file from the node where you run this command? >>> Meanwhile can you also try running this with the source-brick option? >>> >>> On Thu, Mar 21, 2019 at 5:22 PM Milos Cuculovic > wrote: >>> Thank you Karthik, >>> >>> I have run this for all files (see example below) and it says the file is not in split-brain: >>> >>> sudo gluster volume heal storage2 split-brain latest-mtime /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >>> Healing /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 failed: File not in split-brain. >>> Volume heal failed. >>> >>> >>> - Kindest regards, >>> >>> Milos Cuculovic >>> IT Manager >>> >>> --- >>> MDPI AG >>> Postfach, CH-4020 Basel, Switzerland >>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>> Tel. +41 61 683 77 35 >>> Fax +41 61 302 89 18 >>> Email: cuculovic at mdpi.com >>> Skype: milos.cuculovic.mdpi >>> >>> Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. 
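[Editor's note] The heal info output in this thread identifies entries only by gfid (e.g. trusted.gfid=0xe358ff34504241d387efe1e76eb28bb0). On a brick, every gfid is also reachable under the hidden .glusterfs directory at a path derived from the gfid string, which is what the find -samefile technique relies on. A small helper sketch of that standard backend layout (the brick path and gfid below are the ones pasted in this thread):

```python
import os
import uuid

def gfid_to_backend_path(brick: str, gfid_hex: str) -> str:
    """Map a trusted.gfid value (as printed by getfattr -e hex) to the brick's
    .glusterfs backend entry: .glusterfs/<first 2 hex chars>/<next 2>/<canonical uuid>.
    For files this is a hardlink; for directories it is a symlink."""
    if gfid_hex.startswith("0x"):
        gfid_hex = gfid_hex[2:]
    canonical = str(uuid.UUID(gfid_hex))  # insert the dashes: 8-4-4-4-12
    return os.path.join(brick, ".glusterfs", canonical[:2], canonical[2:4], canonical)

print(gfid_to_backend_path("/data/data-cluster", "0xe358ff34504241d387efe1e76eb28bb0"))
```

Running `find <brickpath> -samefile <that backend path>` (for files) or `ls -l` on it (for directories, to read the symlink target) then reveals the real pathname, as suggested later in the thread.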
>>> >>>> On 21 Mar 2019, at 12:36, Karthik Subrahmanya > wrote: >>>> >>>> Hi Milos, >>>> >>>> Thanks for the logs and the getfattr output. >>>> From the logs I can see that there are 6 entries under the directory "/data/data-cluster/dms/final_archive" named >>>> 41be9ff5ec05c4b1c989c6053e709e59 >>>> 5543982fab4b56060aa09f667a8ae617 >>>> a8b7f31775eebc8d1867e7f9de7b6eaf >>>> c1d3f3c2d7ae90e891e671e2f20d5d4b >>>> e5934699809a3b6dcfc5945f408b978b >>>> e7cdc94f60d390812a5f9754885e119e >>>> which are having gfid mismatch, so the heal is failing on this directory. >>>> >>>> You can use the CLI option to resolve these files from gfid mismatch. You can use any of the 3 methods available: >>>> 1. bigger-file >>>> gluster volume heal <VOLNAME> split-brain bigger-file <FILE> >>>> >>>> 2. latest-mtime >>>> gluster volume heal <VOLNAME> split-brain latest-mtime <FILE> >>>> >>>> 3. source-brick >>>> gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME> <FILE> >>>> >>>> where <FILE> must be an absolute path w.r.t. the volume, starting with '/'. >>>> If all those entries are directories then go for either the latest-mtime or source-brick option. >>>> After you resolve all these gfid-mismatches, run the "gluster volume heal <VOLNAME>" command. Then check the heal info and let me know the result. >>>> >>>> Regards, >>>> Karthik >>>> >>>> On Thu, Mar 21, 2019 at 4:27 PM Milos Cuculovic > wrote: >>>> Sure, thank you for following up. >>>> >>>> About the commands, here is what I see: >>>> >>>> brick1: >>>> ------------------------------------- >>>> sudo gluster volume heal storage2 info >>>> Brick storage3:/data/data-cluster >>>> >>>> >>>> /dms/final_archive - Possibly undergoing heal >>>> >>>> Status: Connected >>>> Number of entries: 3 >>>> >>>> Brick storage4:/data/data-cluster >>>> >>>> /dms/final_archive - Possibly undergoing heal >>>> >>>> Status: Connected >>>> Number of entries: 2 >>>> ------------------------------------- >>>> sudo getfattr -d -m . 
-e hex /data/data-cluster/dms/final_archive >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: data/data-cluster/dms/final_archive >>>> trusted.afr.dirty=0x000000000000000000000000 >>>> trusted.afr.storage2-client-1=0x000000000000000000000010 >>>> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.dht.mds=0x00000000 >>>> ------------------------------------- >>>> stat /data/data-cluster/dms/final_archive >>>> File: '/data/data-cluster/dms/final_archive' >>>> Size: 3497984 Blocks: 8768 IO Block: 4096 directory >>>> Device: 807h/2055d Inode: 26427748396 Links: 72123 >>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>> Access: 2018-10-09 04:22:40.514629044 +0200 >>>> Modify: 2019-03-21 11:55:37.382278863 +0100 >>>> Change: 2019-03-21 11:55:37.382278863 +0100 >>>> Birth: - >>>> ------------------------------------- >>>> ------------------------------------- >>>> >>>> brick2: >>>> ------------------------------------- >>>> sudo gluster volume heal storage2 info >>>> Brick storage3:/data/data-cluster >>>> >>>> >>>> /dms/final_archive - Possibly undergoing heal >>>> >>>> Status: Connected >>>> Number of entries: 3 >>>> >>>> Brick storage4:/data/data-cluster >>>> >>>> /dms/final_archive - Possibly undergoing heal >>>> >>>> Status: Connected >>>> Number of entries: 2 >>>> ------------------------------------- >>>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: data/data-cluster/dms/final_archive >>>> trusted.afr.dirty=0x000000000000000000000000 >>>> trusted.afr.storage2-client-0=0x000000000000000000000001 >>>> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.dht.mds=0x00000000 >>>> ------------------------------------- 
>>>> stat /data/data-cluster/dms/final_archive >>>> File: '/data/data-cluster/dms/final_archive' >>>> Size: 3497984 Blocks: 8760 IO Block: 4096 directory >>>> Device: 807h/2055d Inode: 13563551265 Links: 72124 >>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>> Access: 2018-10-09 04:22:40.514629044 +0200 >>>> Modify: 2019-03-21 11:55:46.382565124 +0100 >>>> Change: 2019-03-21 11:55:46.382565124 +0100 >>>> Birth: - >>>> ------------------------------------- >>>> >>>> Hope this helps. >>>> >>>> - Kindest regards, >>>> >>>> Milos Cuculovic >>>> IT Manager >>>> >>>> --- >>>> MDPI AG >>>> Postfach, CH-4020 Basel, Switzerland >>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>>> Tel. +41 61 683 77 35 >>>> Fax +41 61 302 89 18 >>>> Email: cuculovic at mdpi.com >>>> Skype: milos.cuculovic.mdpi >>>> >>>> Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. >>>> >>>>> On 21 Mar 2019, at 11:43, Karthik Subrahmanya > wrote: >>>>> >>>>> Can you attach the "glustershd.log" file which will be present under "/var/log/glusterfs/" from both the nodes and the "stat" & "getfattr -d -m . -e hex <file-path-on-brick>" output of all the entries listed in the heal info output from both the bricks? >>>>> >>>>> On Thu, Mar 21, 2019 at 3:54 PM Milos Cuculovic > wrote: >>>>> Thanks Karthik! >>>>> >>>>> I was trying to find some resolution methods from [2] but unfortunately none worked (I can explain what I tried if needed). >>>>> >>>>>> I guess the volume you are talking about is of type replica-2 (1x2). >>>>> That's correct, aware of the arbiter solution but still didn't take the time to implement. 
>>>>> >>>>> From the info results I posted, how to know in which situation I am. No files are mentioned in split brain, only directories. One brick has 3 entries and one two entries. >>>>> >>>>> sudo gluster volume heal storage2 info >>>>> [sudo] password for sshadmin: >>>>> Brick storage3:/data/data-cluster >>>>> >>>>> >>>>> /dms/final_archive - Possibly undergoing heal >>>>> >>>>> Status: Connected >>>>> Number of entries: 3 >>>>> >>>>> Brick storage4:/data/data-cluster >>>>> >>>>> /dms/final_archive - Possibly undergoing heal >>>>> >>>>> Status: Connected >>>>> Number of entries: 2 >>>>> >>>>> - Kindest regards, >>>>> >>>>> Milos Cuculovic >>>>> IT Manager >>>>> >>>>> --- >>>>> MDPI AG >>>>> Postfach, CH-4020 Basel, Switzerland >>>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>>>> Tel. +41 61 683 77 35 >>>>> Fax +41 61 302 89 18 >>>>> Email: cuculovic at mdpi.com >>>>> Skype: milos.cuculovic.mdpi >>>>> >>>>> Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. >>>>> >>>>>> On 21 Mar 2019, at 10:27, Karthik Subrahmanya > wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> Note: I guess the volume you are talking about is of type replica-2 (1x2). Usually replica 2 volumes are prone to split-brain. If you can consider converting them to arbiter or replica-3, they will handle most of the cases which can lead to split-brains. For more information see [1]. >>>>>> >>>>>> Resolving the split-brain: [2] talks about how to interpret the heal info output and different ways to resolve them using the CLI/manually/using the favorite-child-policy. 
>>>>>> If you are having entry split brain, and is a gfid split-brain (file/dir having different gfids on the replica bricks) then you can use the CLI option to resolve them. If a directory is in gfid split-brain in a distributed-replicate volume and you are using the source-brick option please make sure you use the brick of this subvolume, which has the same gfid as that of the other distribute subvolume(s) where you have the correct gfid, as the source. >>>>>> If you are having a type mismatch then follow the steps in [3] to resolve the split-brain. >>>>>> >>>>>> [1] https://docs.gluster.org/en/v3/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/ >>>>>> [2] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/ >>>>>> [3] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#dir-split-brain >>>>>> >>>>>> HTH, >>>>>> Karthik >>>>>> >>>>>> On Thu, Mar 21, 2019 at 1:45 PM Milos Cuculovic > wrote: >>>>>> I was now able to catch the split brain log: >>>>>> >>>>>> sudo gluster volume heal storage2 info >>>>>> Brick storage3:/data/data-cluster >>>>>> >>>>>> >>>>>> /dms/final_archive - Is in split-brain >>>>>> >>>>>> Status: Connected >>>>>> Number of entries: 3 >>>>>> >>>>>> Brick storage4:/data/data-cluster >>>>>> >>>>>> /dms/final_archive - Is in split-brain >>>>>> >>>>>> Status: Connected >>>>>> Number of entries: 2 >>>>>> >>>>>> Milos >>>>>> >>>>>>> On 21 Mar 2019, at 09:07, Milos Cuculovic > wrote: >>>>>>> >>>>>>> Since 24h, after upgrading from 4.0 to 4.1.7 one of the servers, the heal shows this: >>>>>>> >>>>>>> sudo gluster volume heal storage2 info >>>>>>> Brick storage3:/data/data-cluster >>>>>>> >>>>>>> >>>>>>> /dms/final_archive - Possibly undergoing heal >>>>>>> >>>>>>> Status: Connected >>>>>>> Number of entries: 3 >>>>>>> >>>>>>> Brick storage4:/data/data-cluster >>>>>>> >>>>>>> /dms/final_archive - Possibly undergoing heal >>>>>>> >>>>>>> Status: Connected >>>>>>> Number of entries: 2 >>>>>>> 
>>>>>>> The same files stay there. From time to time the status of the /dms/final_archive is in split brain at the following command shows: >>>>>>> >>>>>>> sudo gluster volume heal storage2 info split-brain >>>>>>> Brick storage3:/data/data-cluster >>>>>>> /dms/final_archive >>>>>>> Status: Connected >>>>>>> Number of entries in split-brain: 1 >>>>>>> >>>>>>> Brick storage4:/data/data-cluster >>>>>>> /dms/final_archive >>>>>>> Status: Connected >>>>>>> Number of entries in split-brain: 1 >>>>>>> >>>>>>> How to know the file who is in split brain? The files in /dms/final_archive are not very important, fine to remove (ideally resolve the split brain) for the ones that differ. >>>>>>> >>>>>>> I can only see the directory and GFID. Any idea on how to resolve this situation as I would like to continue with the upgrade on the 2nd server, and for this the heal needs to be done with 0 entries in sudo gluster volume heal storage2 info >>>>>>> >>>>>>> Thank you in advance, Milos. >>>>>> >>>>>> _______________________________________________ >>>>>> Gluster-users mailing list >>>>>> Gluster-users at gluster.org >>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
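A note on the question above ("I can only see the directory and GFID"): each gfid reported by heal info has a backend entry under the brick at `.glusterfs/<first two characters>/<next two characters>/<full gfid>` — a hard link for regular files, a symlink for directories — which can be traced back to a real path. A sketch, using the brick path and one gfid that appear later in this thread:

```bash
# Build the .glusterfs backend path for a gfid on a given brick.
# For regular files this path is a hard link to the file; for
# directories it is a symlink to the real directory.
gfid_backend_path() {
    brick=$1
    gfid=$2    # canonical dashed form, e.g. e358ff34-5042-41d3-...
    printf '%s/.glusterfs/%s/%s/%s\n' \
        "$brick" "${gfid:0:2}" "${gfid:2:2}" "$gfid"
}

# Brick path and gfid taken from this thread
# (trusted.gfid=0xe358ff34504241d387efe1e76eb28bb0, dashed form):
p=$(gfid_backend_path /data/data-cluster e358ff34-5042-41d3-87ef-e1e76eb28bb0)
echo "$p"
# On the brick host one would then resolve it with:
#   find /data/data-cluster -samefile "$p"   # regular file
#   ls -l "$p"                               # directory (shows symlink target)
```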
URL: From ksubrahm at redhat.com Fri Mar 22 07:51:04 2019 From: ksubrahm at redhat.com (Karthik Subrahmanya) Date: Fri, 22 Mar 2019 13:21:04 +0530 Subject: [Gluster-users] Heal flapping between Possibly undergoing heal and In split brain In-Reply-To: <76354C5F-C197-475A-B4D3-D6089CED12EB@mdpi.com> References: <132679F1-A96E-402F-BDF0-79FE200AD9F7@mdpi.com> <548844D2-E06E-4A60-972E-72213C61D264@mdpi.com> <14B1CACB-4049-42DD-AB69-3B75FBD6BE30@mdpi.com> <990A42AA-1F60-441E-BD5A-97B8333E2083@mdpi.com> <76354C5F-C197-475A-B4D3-D6089CED12EB@mdpi.com> Message-ID: Hi, If it is a file then you can find the filename from the gfid by running the following on the nodes hosting the bricks: find <brickpath> -samefile <brickpath>/.glusterfs/<first two characters of gfid>/<next two characters of gfid>/<full gfid> If it is a directory you can run the following on the nodes hosting the bricks: ls -l <brickpath>/.glusterfs/<first two characters of gfid>/<next two characters of gfid>/<full gfid> Run these on both the nodes and paste the output of these commands before running the lookup from client on these entries. Regards, Karthik On Fri, Mar 22, 2019 at 1:06 PM Milos Cuculovic wrote: > I ran the info command a few minutes ago and here are the results: > > sudo gluster volume heal storage2 info > Brick storage3:/data/data-cluster > > > Status: Connected > Number of entries: 2 > > Brick storage4:/data/data-cluster > > > > > > > Status: Connected > Number of entries: 6 > > > sudo gluster volume heal storage2 info split-brain > Brick storage3:/data/data-cluster > Status: Connected > Number of entries in split-brain: 0 > > Brick storage4:/data/data-cluster > Status: Connected > Number of entries in split-brain: 0 > > The heal info entries (2 + 6) have been there since yesterday and do not change. > > > If they are still there can you try doing a lookup on those entries from > client and see whether they are getting healed? > > How can I do this having the gfid only? > > - Kindest regards, > > Milos Cuculovic > IT Manager > > --- > MDPI AG > Postfach, CH-4020 Basel, Switzerland > Office: St. Alban-Anlage 66, 4052 Basel, Switzerland > Tel.
+41 61 683 77 35 > Fax +41 61 302 89 18 > Email: cuculovic at mdpi.com > Skype: milos.cuculovic.mdpi > > Disclaimer: The information and files contained in this message > are confidential and intended solely for the use of the individual or > entity to whom they are addressed. If you have received this message in > error, please notify me and delete this message from your system. You may > not copy this message in its entirety or in part, or disclose its contents > to anyone. > > On 21 Mar 2019, at 14:34, Karthik Subrahmanya wrote: > > Now the slit-brain on the directory is resolved. > Are these entries which are there in the latest heal info output not > getting healed? Are they still present in the heal info output? > If they are still there can you try doing a lookup on those entries from > client and see whether they are getting healed? > > > On Thu, Mar 21, 2019 at 6:49 PM Milos Cuculovic > wrote: > >> Hey Karthik, >> >> Can you run the "guster volume heal ? >> >> sudo gluster volume heal storage2 >> Launching heal operation to perform index self heal on volume storage2 >> has been successful >> Use heal info commands to check status. >> >> "gluster volume heal info? >> >> sudo gluster volume heal storage2 info >> Brick storage3:/data/data-cluster >> >> >> Status: Connected >> Number of entries: 2 >> >> Brick storage4:/data/data-cluster >> >> >> >> >> >> >> Status: Connected >> Number of entries: 6 >> >> >> >> - Kindest regards, >> >> Milos Cuculovic >> IT Manager >> >> --- >> MDPI AG >> Postfach, CH-4020 Basel, Switzerland >> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >> Tel. +41 61 683 77 35 >> Fax +41 61 302 89 18 >> Email: cuculovic at mdpi.com >> Skype: milos.cuculovic.mdpi >> >> Disclaimer: The information and files contained in this message >> are confidential and intended solely for the use of the individual or >> entity to whom they are addressed. 
If you have received this message in >> error, please notify me and delete this message from your system. You may >> not copy this message in its entirety or in part, or disclose its contents >> to anyone. >> >> On 21 Mar 2019, at 14:07, Karthik Subrahmanya >> wrote: >> >> Hey Milos, >> >> I see that gfid got healed for those directories from the getfattr output >> and the glfsheal log also has messages corresponding to deleting the >> entries on one brick as part of healing which then got recreated on the >> brick with the correct gfid. Can you run the "guster volume heal " >> & "gluster volume heal info" command and paste the output here? >> If you still see entries pending heal, give the latest glustershd.log >> files from both the nodes along with the getfattr output of the files which >> are listed in the heal info output. >> >> Regards, >> Karthik >> >> On Thu, Mar 21, 2019 at 6:03 PM Milos Cuculovic >> wrote: >> >>> Sure: >>> >>> brick1: >>> ???????????????????????????????????????????????????????????? >>> ???????????????????????????????????????????????????????????? >>> sudo getfattr -d -m . -e hex >>> /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >>> getfattr: Removing leading '/' from absolute path names >>> # file: >>> data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >>> trusted.gfid=0xe358ff34504241d387efe1e76eb28bb0 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.dht.mds=0x00000000 >>> >>> sudo getfattr -d -m . -e hex >>> /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>> getfattr: Removing leading '/' from absolute path names >>> # file: >>> data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>> trusted.gfid=0x33585a577c4a4c55b39b9abc07eacff1 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.dht.mds=0x00000000 >>> >>> sudo getfattr -d -m . 
-e hex >>> /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>> getfattr: Removing leading '/' from absolute path names >>> # file: >>> data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>> trusted.gfid=0x99cdfe0773d74d7ab29592ce40cf47f3 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.dht.mds=0x00000000 >>> >>> sudo getfattr -d -m . -e hex >>> /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>> getfattr: Removing leading '/' from absolute path names >>> # file: >>> data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>> trusted.gfid=0x8a96a80ddf8a4163ac00bb12369389df >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.dht.mds=0x00000000 >>> >>> sudo getfattr -d -m . -e hex >>> /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>> getfattr: Removing leading '/' from absolute path names >>> # file: >>> data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>> trusted.gfid=0x04d087a96cc6421ca7bc356fc20d7753 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.dht.mds=0x00000000 >>> >>> sudo getfattr -d -m . -e hex >>> /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>> getfattr: Removing leading '/' from absolute path names >>> # file: >>> data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>> trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.dht.mds=0x00000000 >>> ???????????????????????????????????????????????????????????? 
>>> sudo stat >>> /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >>> File: >>> '/data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59' >>> Size: 33 Blocks: 0 IO Block: 4096 directory >>> Device: 807h/2055d Inode: 40809094709 Links: 3 >>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>> Access: 2019-03-20 11:06:26.994047597 +0100 >>> Modify: 2019-03-20 11:28:28.294689870 +0100 >>> Change: 2019-03-21 13:01:03.077654239 +0100 >>> Birth: - >>> >>> sudo stat >>> /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>> File: >>> '/data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617' >>> Size: 33 Blocks: 0 IO Block: 4096 directory >>> Device: 807h/2055d Inode: 49399908865 Links: 3 >>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>> Access: 2019-03-20 11:07:20.342140927 +0100 >>> Modify: 2019-03-20 11:28:28.318690015 +0100 >>> Change: 2019-03-21 13:01:03.133654344 +0100 >>> Birth: - >>> >>> sudo stat >>> /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>> File: >>> '/data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf' >>> Size: 33 Blocks: 0 IO Block: 4096 directory >>> Device: 807h/2055d Inode: 53706303549 Links: 3 >>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>> Access: 2019-03-20 11:06:55.414097315 +0100 >>> Modify: 2019-03-20 11:28:28.362690281 +0100 >>> Change: 2019-03-21 13:01:03.141654359 +0100 >>> Birth: - >>> >>> sudo stat >>> /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>> File: >>> '/data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b' >>> Size: 33 Blocks: 0 IO Block: 4096 directory >>> Device: 807h/2055d Inode: 57990935591 Links: 3 >>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>> Access: 2019-03-20 11:07:08.558120309 +0100 >>> Modify: 2019-03-20 11:28:14.226604869 +0100 >>> Change: 2019-03-21 13:01:03.189654448 +0100 
>>> Birth: - >>> >>> sudo stat >>> /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>> File: >>> '/data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b' >>> Size: 33 Blocks: 0 IO Block: 4096 directory >>> Device: 807h/2055d Inode: 62291339781 Links: 3 >>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>> Access: 2019-03-20 11:06:02.070003998 +0100 >>> Modify: 2019-03-20 11:28:28.458690861 +0100 >>> Change: 2019-03-21 13:01:03.281654621 +0100 >>> Birth: - >>> >>> sudo stat >>> /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>> File: >>> '/data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e' >>> Size: 33 Blocks: 0 IO Block: 4096 directory >>> Device: 807h/2055d Inode: 66574223479 Links: 3 >>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>> Access: 2019-03-20 11:28:10.826584325 +0100 >>> Modify: 2019-03-20 11:28:10.834584374 +0100 >>> Change: 2019-03-20 14:06:07.937449353 +0100 >>> Birth: - >>> root at storage3:/var/log/glusterfs# >>> ???????????????????????????????????????????????????????????? >>> ???????????????????????????????????????????????????????????? >>> >>> brick2: >>> ???????????????????????????????????????????????????????????? >>> ???????????????????????????????????????????????????????????? >>> sudo getfattr -d -m . -e hex >>> /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>> getfattr: Removing leading '/' from absolute path names >>> # file: >>> data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>> trusted.afr.dirty=0x000000000000000000000000 >>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>> trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.dht.mds=0x00000000 >>> >>> sudo getfattr -d -m . 
-e hex >>> /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>> getfattr: Removing leading '/' from absolute path names >>> # file: >>> data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>> trusted.gfid=0x33585a577c4a4c55b39b9abc07eacff1 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.dht.mds=0x00000000 >>> >>> sudo getfattr -d -m . -e hex >>> /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>> getfattr: Removing leading '/' from absolute path names >>> # file: >>> data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>> trusted.gfid=0x99cdfe0773d74d7ab29592ce40cf47f3 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.dht.mds=0x00000000 >>> >>> sudo getfattr -d -m . -e hex >>> /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>> getfattr: Removing leading '/' from absolute path names >>> # file: >>> data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>> trusted.gfid=0x8a96a80ddf8a4163ac00bb12369389df >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.dht.mds=0x00000000 >>> >>> sudo getfattr -d -m . -e hex >>> /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>> getfattr: Removing leading '/' from absolute path names >>> # file: >>> data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>> trusted.gfid=0x04d087a96cc6421ca7bc356fc20d7753 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.dht.mds=0x00000000 >>> >>> sudo getfattr -d -m . 
-e hex >>> /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>> getfattr: Removing leading '/' from absolute path names >>> # file: >>> data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>> trusted.afr.dirty=0x000000000000000000000000 >>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>> trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.dht.mds=0x00000000 >>> >>> ???????????????????????????????????????????????????????????? >>> >>> sudo stat >>> /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >>> File: >>> '/data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59' >>> Size: 33 Blocks: 0 IO Block: 4096 directory >>> Device: 807h/2055d Inode: 42232631305 Links: 3 >>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>> Access: 2019-03-20 11:06:26.994047597 +0100 >>> Modify: 2019-03-20 11:28:28.294689870 +0100 >>> Change: 2019-03-21 13:01:03.078748131 +0100 >>> Birth: - >>> >>> sudo stat >>> /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>> File: >>> '/data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617' >>> Size: 33 Blocks: 0 IO Block: 4096 directory >>> Device: 807h/2055d Inode: 78589109305 Links: 3 >>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>> Access: 2019-03-20 11:07:20.342140927 +0100 >>> Modify: 2019-03-20 11:28:28.318690015 +0100 >>> Change: 2019-03-21 13:01:03.134748477 +0100 >>> Birth: - >>> >>> sudo stat >>> /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>> File: >>> '/data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf' >>> Size: 33 Blocks: 0 IO Block: 4096 directory >>> Device: 807h/2055d Inode: 54972096517 Links: 3 >>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>> Access: 2019-03-20 11:06:55.414097315 +0100 >>> Modify: 2019-03-20 
11:28:28.362690281 +0100 >>> Change: 2019-03-21 13:01:03.162748650 +0100 >>> Birth: - >>> >>> sudo stat >>> /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>> File: >>> '/data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b' >>> Size: 33 Blocks: 0 IO Block: 4096 directory >>> Device: 807h/2055d Inode: 40821259275 Links: 3 >>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>> Access: 2019-03-20 11:07:08.558120309 +0100 >>> Modify: 2019-03-20 11:28:14.226604869 +0100 >>> Change: 2019-03-21 13:01:03.194748848 +0100 >>> Birth: - >>> >>> sudo stat >>> /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>> File: >>> '/data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b' >>> Size: 33 Blocks: 0 IO Block: 4096 directory >>> Device: 807h/2055d Inode: 15876654 Links: 3 >>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>> Access: 2019-03-20 11:06:02.070003998 +0100 >>> Modify: 2019-03-20 11:28:28.458690861 +0100 >>> Change: 2019-03-21 13:01:03.282749392 +0100 >>> Birth: - >>> >>> sudo stat >>> /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>> File: >>> '/data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e' >>> Size: 33 Blocks: 0 IO Block: 4096 directory >>> Device: 807h/2055d Inode: 49408944650 Links: 3 >>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>> Access: 2019-03-20 11:28:10.826584325 +0100 >>> Modify: 2019-03-20 11:28:10.834584374 +0100 >>> Change: 2019-03-20 14:06:07.940849268 +0100 >>> Birth: - >>> ???????????????????????????????????????????????????????????? >>> ???????????????????????????????????????????????????????????? >>> >>> The file is from brick 2 that I upgraded and started the heal on. >>> >>> >>> - Kindest regards, >>> >>> Milos Cuculovic >>> IT Manager >>> >>> --- >>> MDPI AG >>> Postfach, CH-4020 Basel, Switzerland >>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>> Tel. 
+41 61 683 77 35 >>> Fax +41 61 302 89 18 >>> Email: cuculovic at mdpi.com >>> Skype: milos.cuculovic.mdpi >>> >>> Disclaimer: The information and files contained in this message >>> are confidential and intended solely for the use of the individual or >>> entity to whom they are addressed. If you have received this message in >>> error, please notify me and delete this message from your system. You may >>> not copy this message in its entirety or in part, or disclose its contents >>> to anyone. >>> >>> On 21 Mar 2019, at 13:05, Karthik Subrahmanya >>> wrote: >>> >>> Can you give me the stat & getfattr output of all those 6 entries from >>> both the bricks and the glfsheal-.log file from the node where you >>> run this command? >>> Meanwhile can you also try running this with the source-brick option? >>> >>> On Thu, Mar 21, 2019 at 5:22 PM Milos Cuculovic >>> wrote: >>> >>>> Thank you Karthik, >>>> >>>> I have run this for all files (see example below) and it says the file >>>> is not in split-brain: >>>> >>>> sudo gluster volume heal storage2 split-brain latest-mtime >>>> /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >>>> Healing /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 failed: >>>> File not in split-brain. >>>> Volume heal failed. >>>> >>>> >>>> - Kindest regards, >>>> >>>> Milos Cuculovic >>>> IT Manager >>>> >>>> --- >>>> MDPI AG >>>> Postfach, CH-4020 Basel, Switzerland >>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>>> Tel. +41 61 683 77 35 >>>> Fax +41 61 302 89 18 >>>> Email: cuculovic at mdpi.com >>>> Skype: milos.cuculovic.mdpi >>>> >>>> Disclaimer: The information and files contained in this message >>>> are confidential and intended solely for the use of the individual or >>>> entity to whom they are addressed. If you have received this message in >>>> error, please notify me and delete this message from your system. You may >>>> not copy this message in its entirety or in part, or disclose its contents >>>> to anyone. 
>>>> >>>> On 21 Mar 2019, at 12:36, Karthik Subrahmanya >>>> wrote: >>>> >>>> Hi Milos, >>>> >>>> Thanks for the logs and the getfattr output. >>>> From the logs I can see that there are 6 entries under the >>>> directory "/data/data-cluster/dms/final_archive" named >>>> 41be9ff5ec05c4b1c989c6053e709e59 >>>> 5543982fab4b56060aa09f667a8ae617 >>>> a8b7f31775eebc8d1867e7f9de7b6eaf >>>> c1d3f3c2d7ae90e891e671e2f20d5d4b >>>> e5934699809a3b6dcfc5945f408b978b >>>> e7cdc94f60d390812a5f9754885e119e >>>> which are having gfid mismatch, so the heal is failing on this >>>> directory. >>>> >>>> You can use the CLI option to resolve these files from gfid mismatch. >>>> You can use any of the 3 methods available: >>>> 1. bigger-file >>>> gluster volume heal split-brain bigger-file >>>> >>>> 2. latest-mtime >>>> gluster volume heal split-brain latest-mtime >>>> >>>> 3. source-brick >>>> gluster volume heal split-brain source-brick >>>> >>>> >>>> where must be absolute path w.r.t. the volume, starting with '/'. >>>> If all those entries are directories then go for either >>>> latest-mtime/source-brick option. >>>> After you resolve all these gfid-mismatches, run the "gluster volume >>>> heal " command. Then check the heal info and let me know the >>>> result. >>>> >>>> Regards, >>>> Karthik >>>> >>>> On Thu, Mar 21, 2019 at 4:27 PM Milos Cuculovic >>>> wrote: >>>> >>>>> Sure, thank you for following up. >>>>> >>>>> About the commands, here is what I see: >>>>> >>>>> brick1: >>>>> ????????????????????????????????????? >>>>> sudo gluster volume heal storage2 info >>>>> Brick storage3:/data/data-cluster >>>>> >>>>> >>>>> /dms/final_archive - Possibly undergoing heal >>>>> >>>>> Status: Connected >>>>> Number of entries: 3 >>>>> >>>>> Brick storage4:/data/data-cluster >>>>> >>>>> /dms/final_archive - Possibly undergoing heal >>>>> >>>>> Status: Connected >>>>> Number of entries: 2 >>>>> ????????????????????????????????????? >>>>> sudo getfattr -d -m . 
-e hex /data/data-cluster/dms/final_archive >>>>> getfattr: Removing leading '/' from absolute path names >>>>> # file: data/data-cluster/dms/final_archive >>>>> trusted.afr.dirty=0x000000000000000000000000 >>>>> trusted.afr.storage2-client-1=0x000000000000000000000010 >>>>> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87 >>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>>> trusted.glusterfs.dht.mds=0x00000000 >>>>> ????????????????????????????????????? >>>>> stat /data/data-cluster/dms/final_archive >>>>> File: '/data/data-cluster/dms/final_archive' >>>>> Size: 3497984 Blocks: 8768 IO Block: 4096 directory >>>>> Device: 807h/2055d Inode: 26427748396 Links: 72123 >>>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( >>>>> 33/www-data) >>>>> Access: 2018-10-09 04:22:40.514629044 +0200 >>>>> Modify: 2019-03-21 11:55:37.382278863 +0100 >>>>> Change: 2019-03-21 11:55:37.382278863 +0100 >>>>> Birth: - >>>>> ????????????????????????????????????? >>>>> ????????????????????????????????????? >>>>> >>>>> brick2: >>>>> ????????????????????????????????????? >>>>> sudo gluster volume heal storage2 info >>>>> Brick storage3:/data/data-cluster >>>>> >>>>> >>>>> /dms/final_archive - Possibly undergoing heal >>>>> >>>>> Status: Connected >>>>> Number of entries: 3 >>>>> >>>>> Brick storage4:/data/data-cluster >>>>> >>>>> /dms/final_archive - Possibly undergoing heal >>>>> >>>>> Status: Connected >>>>> Number of entries: 2 >>>>> ????????????????????????????????????? >>>>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive >>>>> getfattr: Removing leading '/' from absolute path names >>>>> # file: data/data-cluster/dms/final_archive >>>>> trusted.afr.dirty=0x000000000000000000000000 >>>>> trusted.afr.storage2-client-0=0x000000000000000000000001 >>>>> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87 >>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>>> trusted.glusterfs.dht.mds=0x00000000 >>>>> ????????????????????????????????????? 
>>>>> stat /data/data-cluster/dms/final_archive >>>>> File: '/data/data-cluster/dms/final_archive' >>>>> Size: 3497984 Blocks: 8760 IO Block: 4096 directory >>>>> Device: 807h/2055d Inode: 13563551265 Links: 72124 >>>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( >>>>> 33/www-data) >>>>> Access: 2018-10-09 04:22:40.514629044 +0200 >>>>> Modify: 2019-03-21 11:55:46.382565124 +0100 >>>>> Change: 2019-03-21 11:55:46.382565124 +0100 >>>>> Birth: - >>>>> ????????????????????????????????????? >>>>> >>>>> Hope this helps. >>>>> >>>>> - Kindest regards, >>>>> >>>>> Milos Cuculovic >>>>> IT Manager >>>>> >>>>> --- >>>>> MDPI AG >>>>> Postfach, CH-4020 Basel, Switzerland >>>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>>>> Tel. +41 61 683 77 35 >>>>> Fax +41 61 302 89 18 >>>>> Email: cuculovic at mdpi.com >>>>> Skype: milos.cuculovic.mdpi >>>>> >>>>> Disclaimer: The information and files contained in this message >>>>> are confidential and intended solely for the use of the individual or >>>>> entity to whom they are addressed. If you have received this message in >>>>> error, please notify me and delete this message from your system. You may >>>>> not copy this message in its entirety or in part, or disclose its contents >>>>> to anyone. >>>>> >>>>> On 21 Mar 2019, at 11:43, Karthik Subrahmanya >>>>> wrote: >>>>> >>>>> Can you attach the "glustershd.log" file which will be present under >>>>> "/var/log/glusterfs/" from both the nodes and the "stat" & "getfattr -d -m >>>>> . -e hex " output of all the entries listed in the heal >>>>> info output from both the bricks? >>>>> >>>>> On Thu, Mar 21, 2019 at 3:54 PM Milos Cuculovic >>>>> wrote: >>>>> >>>>>> Thanks Karthik! >>>>>> >>>>>> I was trying to find some resolution methods from [2] but >>>>>> unfortunately none worked (I can explain what I tried if needed). >>>>>> >>>>>> I guess the volume you are talking about is of type replica-2 (1x2). 
>>>>>> >>>>>> That?s correct, aware of the arbiter solution but still didn?t took >>>>>> time to implement. >>>>>> >>>>>> From the info results I posted, how to know in which situation I am. >>>>>> No files are mentioned in spit brain, only directories. One brick has 3 >>>>>> entries and one two entries. >>>>>> >>>>>> sudo gluster volume heal storage2 info >>>>>> [sudo] password for sshadmin: >>>>>> Brick storage3:/data/data-cluster >>>>>> >>>>>> >>>>>> /dms/final_archive - Possibly undergoing heal >>>>>> >>>>>> Status: Connected >>>>>> Number of entries: 3 >>>>>> >>>>>> Brick storage4:/data/data-cluster >>>>>> >>>>>> /dms/final_archive - Possibly undergoing heal >>>>>> >>>>>> Status: Connected >>>>>> Number of entries: 2 >>>>>> >>>>>> - Kindest regards, >>>>>> >>>>>> Milos Cuculovic >>>>>> IT Manager >>>>>> >>>>>> --- >>>>>> MDPI AG >>>>>> Postfach, CH-4020 Basel, Switzerland >>>>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>>>>> Tel. +41 61 683 77 35 >>>>>> Fax +41 61 302 89 18 >>>>>> Email: cuculovic at mdpi.com >>>>>> Skype: milos.cuculovic.mdpi >>>>>> >>>>>> Disclaimer: The information and files contained in this message >>>>>> are confidential and intended solely for the use of the individual or >>>>>> entity to whom they are addressed. If you have received this message in >>>>>> error, please notify me and delete this message from your system. You may >>>>>> not copy this message in its entirety or in part, or disclose its contents >>>>>> to anyone. >>>>>> >>>>>> On 21 Mar 2019, at 10:27, Karthik Subrahmanya >>>>>> wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> Note: I guess the volume you are talking about is of type replica-2 >>>>>> (1x2). Usually replica 2 volumes are prone to split-brain. If you can >>>>>> consider converting them to arbiter or replica-3, they will handle most of >>>>>> the cases which can lead to slit-brains. For more information see [1]. 
>>>>>> >>>>>> Resolving the split-brain: [2] talks about how to interpret the heal >>>>>> info output and different ways to resolve them using the CLI/manually/using >>>>>> the favorite-child-policy. >>>>>> If you are having entry split brain, and is a gfid split-brain >>>>>> (file/dir having different gfids on the replica bricks) then you can use >>>>>> the CLI option to resolve them. If a directory is in gfid split-brain in a >>>>>> distributed-replicate volume and you are using the source-brick option >>>>>> please make sure you use the brick of this subvolume, which has the same >>>>>> gfid as that of the other distribute subvolume(s) where you have the >>>>>> correct gfid, as the source. >>>>>> If you are having a type mismatch then follow the steps in [3] to >>>>>> resolve the split-brain. >>>>>> >>>>>> [1] >>>>>> https://docs.gluster.org/en/v3/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/ >>>>>> [2] >>>>>> https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/ >>>>>> [3] >>>>>> https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#dir-split-brain >>>>>> >>>>>> HTH, >>>>>> Karthik >>>>>> >>>>>> On Thu, Mar 21, 2019 at 1:45 PM Milos Cuculovic >>>>>> wrote: >>>>>> >>>>>>> I was now able to catch the split brain log: >>>>>>> >>>>>>> sudo gluster volume heal storage2 info >>>>>>> Brick storage3:/data/data-cluster >>>>>>> >>>>>>> >>>>>>> /dms/final_archive - Is in split-brain >>>>>>> >>>>>>> Status: Connected >>>>>>> Number of entries: 3 >>>>>>> >>>>>>> Brick storage4:/data/data-cluster >>>>>>> >>>>>>> /dms/final_archive - Is in split-brain >>>>>>> >>>>>>> Status: Connected >>>>>>> Number of entries: 2 >>>>>>> >>>>>>> Milos >>>>>>> >>>>>>> On 21 Mar 2019, at 09:07, Milos Cuculovic >>>>>>> wrote: >>>>>>> >>>>>>> Since 24h, after upgrading from 4.0 to 4.1.7 one of the servers, the >>>>>>> heal shows this: >>>>>>> >>>>>>> sudo gluster volume heal storage2 info >>>>>>> Brick storage3:/data/data-cluster 
>>>>>>> >>>>>>> >>>>>>> /dms/final_archive - Possibly undergoing heal >>>>>>> >>>>>>> Status: Connected >>>>>>> Number of entries: 3 >>>>>>> >>>>>>> Brick storage4:/data/data-cluster >>>>>>> >>>>>>> /dms/final_archive - Possibly undergoing heal >>>>>>> >>>>>>> Status: Connected >>>>>>> Number of entries: 2 >>>>>>> >>>>>>> The same files stay there. From time to time the status of the >>>>>>> /dms/final_archive is in split brain at the following command shows: >>>>>>> >>>>>>> sudo gluster volume heal storage2 info split-brain >>>>>>> Brick storage3:/data/data-cluster >>>>>>> /dms/final_archive >>>>>>> Status: Connected >>>>>>> Number of entries in split-brain: 1 >>>>>>> >>>>>>> Brick storage4:/data/data-cluster >>>>>>> /dms/final_archive >>>>>>> Status: Connected >>>>>>> Number of entries in split-brain: 1 >>>>>>> >>>>>>> How to know the file who is in split brain? The files in >>>>>>> /dms/final_archive are not very important, fine to remove (ideally resolve >>>>>>> the split brain) for the ones that differ. >>>>>>> >>>>>>> I can only see the directory and GFID. Any idea on how to resolve >>>>>>> this situation as I would like to continue with the upgrade on the 2nd >>>>>>> server, and for this the heal needs to be done with 0 entries in sudo >>>>>>> gluster volume heal storage2 info >>>>>>> >>>>>>> Thank you in advance, Milos. >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Gluster-users mailing list >>>>>>> Gluster-users at gluster.org >>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>> >>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
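For readers of the archive: the `trusted.afr.<vol>-client-<N>` values in the getfattr outputs quoted earlier in this thread are 12-byte changelog counters — three big-endian 32-bit fields for pending data, metadata, and entry operations that the brick holding the xattr accuses the other brick of missing. Non-zero entry counts on both bricks, each blaming the other, is what makes the directory flap between "possibly undergoing heal" and "in split-brain". A small bash decoder (not part of Gluster, just an illustration) applied to the two values seen on the directory:

```bash
# Decode a trusted.afr.* changelog value into its three pending-op
# counters: data (bytes 0-3), metadata (bytes 4-7), entry (bytes 8-11),
# each a big-endian 32-bit integer.
decode_afr() {
    local hex=${1#0x}
    local data=$((16#${hex:0:8}))
    local meta=$((16#${hex:8:8}))
    local entry=$((16#${hex:16:8}))
    echo "data=$data metadata=$meta entry=$entry"
}

# Values taken from the getfattr outputs earlier in this thread:
decode_afr 0x000000000000000000000010   # storage3: trusted.afr.storage2-client-1
decode_afr 0x000000000000000000000001   # storage4: trusted.afr.storage2-client-0
```

The first decodes to 16 pending entry operations, the second to 1 — each brick blames the other for entry changes on `/dms/final_archive`, which is an entry (gfid) split-brain on the directory's contents.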
URL: From mauryam at gmail.com Fri Mar 22 08:36:57 2019 From: mauryam at gmail.com (Maurya M) Date: Fri, 22 Mar 2019 14:06:57 +0530 Subject: [Gluster-users] Geo-replication status always on 'Created' In-Reply-To: References: Message-ID: hi Sunny, Passwordless ssh to : ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 azureuser at 172.16.201.35 is login, but when the whole command is run getting permission issues again:: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 azureuser at 172.16.201.35 gluster --xml --remote-host=localhost volume info vol_a5aee81a873c043c99a938adcb5b5781 -v ERROR: failed to create logfile "/var/log/glusterfs/cli.log" (Permission denied) ERROR: failed to open logfile /var/log/glusterfs/cli.log any idea here ? thanks, Maurya On Thu, Mar 21, 2019 at 2:43 PM Maurya M wrote: > hi Sunny, > i did use the [1] link for the setup, when i encountered this error > during ssh-copy-id : (so setup the passwordless ssh, by manually copied the > private/ public keys to all the nodes , both master & slave) > > [root at k8s-agentpool1-24779565-1 ~]# ssh-copy-id geouser at xxx.xx.xxx.x > /usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: > "/root/.ssh/id_rsa.pub" > The authenticity of host ' xxx.xx.xxx.x ( xxx.xx.xxx.x )' can't be > established. > ECDSA key fingerprint is > SHA256:B2rNaocIcPjRga13oTnopbJ5KjI/7l5fMANXc+KhA9s. > ECDSA key fingerprint is > MD5:1b:70:f9:7a:bf:35:33:47:0c:f2:c1:cd:21:e2:d3:75. > Are you sure you want to continue connecting (yes/no)? yes > /usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to > filter out any that are already installed > /usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are > prompted now it is to install the new keys > Permission denied (publickey). > > To start afresh what all needs to teardown / delete, do we have any script > for it ? 
where all the pem keys do i need to delete? > > thanks, > Maurya > > On Thu, Mar 21, 2019 at 2:12 PM Sunny Kumar wrote: > >> Hey you can start a fresh I think you are not following proper setup >> steps. >> >> Please follow these steps [1] to create geo-rep session, you can >> delete the old one and do a fresh start. Or alternative you can use >> this tool[2] to setup geo-rep. >> >> >> [1]. >> https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/ >> [2]. http://aravindavk.in/blog/gluster-georep-tools/ >> >> >> /Sunny >> >> On Thu, Mar 21, 2019 at 11:28 AM Maurya M wrote: >> > >> > Hi Sunil, >> > I did run the on the slave node : >> > /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh azureuser >> vol_041afbc53746053368a1840607636e97 vol_a5aee81a873c043c99a938adcb5b5781 >> > getting this message "/home/azureuser/common_secret.pem.pub not >> present. Please run geo-replication command on master with push-pem option >> to generate the file" >> > >> > So went back and created the session again, no change, so manually >> copied the common_secret.pem.pub to /home/azureuser/ but still the >> set_geo_rep_pem_keys.sh is looking the pem file in different name : >> COMMON_SECRET_PEM_PUB=${master_vol}_${slave_vol}_common_secret.pem.pub , >> change the name of pem , ran the command again : >> > >> > /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh azureuser >> vol_041afbc53746053368a1840607636e97 vol_a5aee81a873c043c99a938adcb5b5781 >> > Successfully copied file. >> > Command executed successfully. >> > >> > >> > - went back and created the session , start the geo-replication , still >> seeing the same error in logs. Any ideas ? >> > >> > thanks, >> > Maurya >> > >> > >> > >> > On Wed, Mar 20, 2019 at 11:07 PM Sunny Kumar >> wrote: >> >> >> >> Hi Maurya, >> >> >> >> I guess you missed last trick to distribute keys in slave node. 
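As an aside, the renaming workaround described above follows from how set_geo_rep_pem_keys.sh composes the pem pub filename (the COMMON_SECRET_PEM_PUB variable quoted in this thread). A minimal sketch of that naming convention, using the volume names from this discussion — nothing here runs gluster, it only shows the expected filename:

```shell
# Compose the filename set_geo_rep_pem_keys.sh looks for; the convention
# is <master_vol>_<slave_vol>_common_secret.pem.pub, per the script line
# quoted in this thread. Volume names are the ones from this discussion.
master_vol="vol_041afbc53746053368a1840607636e97"
slave_vol="vol_a5aee81a873c043c99a938adcb5b5781"
expected="${master_vol}_${slave_vol}_common_secret.pem.pub"
echo "$expected"
# A common_secret.pem.pub copied over manually has to be renamed to the
# name above before the script will find it.
```

If the pem pub was copied by hand, `mv common_secret.pem.pub "$expected"` in the geo-rep user's home directory matches the manual rename described above.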
I see >> >> this is non-root geo-rep setup so please try this: >> >> >> >> >> >> Run the following command as root in any one of Slave node. >> >> >> >> /usr/local/libexec/glusterfs/set_geo_rep_pem_keys.sh >> >> >> >> >> >> - Sunny >> >> >> >> On Wed, Mar 20, 2019 at 10:47 PM Maurya M wrote: >> >> > >> >> > Hi all, >> >> > Have setup a 3 master nodes - 3 slave nodes (gluster 4.1) for >> geo-replication, but once have the geo-replication configure the status is >> always on "Created', >> >> > even after have force start the session. >> >> > >> >> > On close inspect of the logs on the master node seeing this error: >> >> > >> >> > "E [syncdutils(monitor):801:errlog] Popen: command returned error >> cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i >> /var/lib/glusterd/geo-replication/secret.pem -p 22 azureuser at xxxxx.xxxx..xxx. >> gluster --xml --remote-host=localhost volume info >> vol_a5ae34341a873c043c99a938adcb5b5781 error=255" >> >> > >> >> > Any ideas what is issue? >> >> > >> >> > thanks, >> >> > Maurya >> >> > >> >> > _______________________________________________ >> >> > Gluster-users mailing list >> >> > Gluster-users at gluster.org >> >> > https://lists.gluster.org/mailman/listinfo/gluster-users >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cuculovic at mdpi.com Fri Mar 22 11:32:11 2019 From: cuculovic at mdpi.com (Milos Cuculovic) Date: Fri, 22 Mar 2019 12:32:11 +0100 Subject: [Gluster-users] Heal flapping between Possibly undergoing heal and In split brain In-Reply-To: References: <132679F1-A96E-402F-BDF0-79FE200AD9F7@mdpi.com> <548844D2-E06E-4A60-972E-72213C61D264@mdpi.com> <14B1CACB-4049-42DD-AB69-3B75FBD6BE30@mdpi.com> <990A42AA-1F60-441E-BD5A-97B8333E2083@mdpi.com> <76354C5F-C197-475A-B4D3-D6089CED12EB@mdpi.com> Message-ID: <4B662F79-4947-4DFE-BDC7-B6B61A1054FF@mdpi.com> Thank you Karthik, The 2nd command works for all of them, those are directories: sudo ls -l /data/data-cluster/.glusterfs/27/6f/276fec9a-1c9b-4efe-9715-dcf4207e99b0 lrwxrwxrwx 1 root root 60 Jun 14 2018 /data/data-cluster/.glusterfs/27/6f/276fec9a-1c9b-4efe-9715-dcf4207e99b0 -> ../../a9/6e/a96e940d-3130-45d1-9efe-7aff463fec3d/final_files But now, what to do with this info? Since yesterday, the heal info shows the samge gfids. - Kindest regards, Milos Cuculovic IT Manager --- MDPI AG Postfach, CH-4020 Basel, Switzerland Office: St. Alban-Anlage 66, 4052 Basel, Switzerland Tel. +41 61 683 77 35 Fax +41 61 302 89 18 Email: cuculovic at mdpi.com Skype: milos.cuculovic.mdpi Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. 
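For anyone following along: the symlink trick above works because bricks index every entry by gfid under .glusterfs. A runnable sketch of the mapping using a throwaway mock brick — the gfid and symlink target are the ones quoted in this thread, and no gluster is involved:

```shell
# A gfid G lives at <brick>/.glusterfs/<G[0:2]>/<G[2:4]>/<G>.
# For a directory that entry is a symlink whose target ends in the
# directory's real name; mimic the one quoted above in a temp dir.
brick=$(mktemp -d)
gfid="276fec9a-1c9b-4efe-9715-dcf4207e99b0"
gfid_path="$brick/.glusterfs/${gfid:0:2}/${gfid:2:2}/$gfid"
mkdir -p "$(dirname "$gfid_path")"
ln -s "../../a9/6e/a96e940d-3130-45d1-9efe-7aff463fec3d/final_files" "$gfid_path"
name=$(basename "$(readlink "$gfid_path")")
echo "$name"   # -> final_files
rm -rf "$brick"
```

Repeating the same readlink on the parent's gfid (a96e940d-...) walks the tree upward until the full path under the brick is recovered. Regular files are hardlinked (not symlinked) into .glusterfs, which is why the thread uses `find <brick> -samefile <gfid path>` for them instead.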
> On 22 Mar 2019, at 08:51, Karthik Subrahmanya wrote: > > Hi, > > If it is a file then you can find the filename from the gfid by running the following on the nodes hosting the bricks > find <brickpath> -samefile <brickpath>/.glusterfs/<first two chars of gfid>/<next two chars of gfid>/<full gfid> > > If it is a directory you can run the following on the nodes hosting the bricks > ls -l <brickpath>/.glusterfs/<first two chars of gfid>/<next two chars of gfid>/<full gfid> > > Run these on both the nodes and paste the output of these commands before running the lookup from client on these entries. > > Regards, > Karthik > > On Fri, Mar 22, 2019 at 1:06 PM Milos Cuculovic > wrote: > I have run a few minutes ago the info and here are the results: > > sudo gluster volume heal storage2 info > Brick storage3:/data/data-cluster > > > Status: Connected > Number of entries: 2 > > Brick storage4:/data/data-cluster > > > > > > > Status: Connected > Number of entries: 6 > > > sudo gluster volume heal storage2 info split-brain > Brick storage3:/data/data-cluster > Status: Connected > Number of entries in split-brain: 0 > > Brick storage4:/data/data-cluster > Status: Connected > Number of entries in split-brain: 0 > > The heal info (2 + 6) are there since yesterday and do not change. > > >> If they are still there can you try doing a lookup on those entries from client and see whether they are getting healed? > How can I do this having the gfid only? > > - Kindest regards, > > Milos Cuculovic > IT Manager > > --- > MDPI AG > Postfach, CH-4020 Basel, Switzerland > Office: St. Alban-Anlage 66, 4052 Basel, Switzerland > Tel. +41 61 683 77 35 > Fax +41 61 302 89 18 > Email:?cuculovic at mdpi.com > Skype: milos.cuculovic.mdpi > > Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. 
> >> On 21 Mar 2019, at 14:34, Karthik Subrahmanya > wrote: >> >> Now the slit-brain on the directory is resolved. >> Are these entries which are there in the latest heal info output not getting healed? Are they still present in the heal info output? >> If they are still there can you try doing a lookup on those entries from client and see whether they are getting healed? >> >> >> On Thu, Mar 21, 2019 at 6:49 PM Milos Cuculovic > wrote: >> Hey Karthik, >> >>> Can you run the "guster volume heal ? >> sudo gluster volume heal storage2 >> Launching heal operation to perform index self heal on volume storage2 has been successful >> Use heal info commands to check status. >> >>> "gluster volume heal info? >> sudo gluster volume heal storage2 info >> Brick storage3:/data/data-cluster >> >> >> Status: Connected >> Number of entries: 2 >> >> Brick storage4:/data/data-cluster >> >> >> >> >> >> >> Status: Connected >> Number of entries: 6 >> >> >> >> - Kindest regards, >> >> Milos Cuculovic >> IT Manager >> >> --- >> MDPI AG >> Postfach, CH-4020 Basel, Switzerland >> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >> Tel. +41 61 683 77 35 >> Fax +41 61 302 89 18 >> Email:?cuculovic at mdpi.com >> Skype: milos.cuculovic.mdpi >> >> Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. >> >>> On 21 Mar 2019, at 14:07, Karthik Subrahmanya > wrote: >>> >>> Hey Milos, >>> >>> I see that gfid got healed for those directories from the getfattr output and the glfsheal log also has messages corresponding to deleting the entries on one brick as part of healing which then got recreated on the brick with the correct gfid. 
Can you run the "guster volume heal " & "gluster volume heal info" command and paste the output here? >>> If you still see entries pending heal, give the latest glustershd.log files from both the nodes along with the getfattr output of the files which are listed in the heal info output. >>> >>> Regards, >>> Karthik >>> >>> On Thu, Mar 21, 2019 at 6:03 PM Milos Cuculovic > wrote: >>> Sure: >>> >>> brick1: >>> ???????????????????????????????????????????????????????????? >>> ???????????????????????????????????????????????????????????? >>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >>> getfattr: Removing leading '/' from absolute path names >>> # file: data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >>> trusted.gfid=0xe358ff34504241d387efe1e76eb28bb0 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.dht.mds=0x00000000 >>> >>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>> getfattr: Removing leading '/' from absolute path names >>> # file: data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>> trusted.gfid=0x33585a577c4a4c55b39b9abc07eacff1 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.dht.mds=0x00000000 >>> >>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>> getfattr: Removing leading '/' from absolute path names >>> # file: data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>> trusted.gfid=0x99cdfe0773d74d7ab29592ce40cf47f3 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.dht.mds=0x00000000 >>> >>> sudo getfattr -d -m . 
-e hex /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>> getfattr: Removing leading '/' from absolute path names >>> # file: data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>> trusted.gfid=0x8a96a80ddf8a4163ac00bb12369389df >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.dht.mds=0x00000000 >>> >>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>> getfattr: Removing leading '/' from absolute path names >>> # file: data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>> trusted.gfid=0x04d087a96cc6421ca7bc356fc20d7753 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.dht.mds=0x00000000 >>> >>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>> getfattr: Removing leading '/' from absolute path names >>> # file: data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>> trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.dht.mds=0x00000000 >>> ???????????????????????????????????????????????????????????? 
>>> sudo stat /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >>> File: '/data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59' >>> Size: 33 Blocks: 0 IO Block: 4096 directory >>> Device: 807h/2055d Inode: 40809094709 Links: 3 >>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>> Access: 2019-03-20 11:06:26.994047597 +0100 >>> Modify: 2019-03-20 11:28:28.294689870 +0100 >>> Change: 2019-03-21 13:01:03.077654239 +0100 >>> Birth: - >>> >>> sudo stat /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>> File: '/data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617' >>> Size: 33 Blocks: 0 IO Block: 4096 directory >>> Device: 807h/2055d Inode: 49399908865 Links: 3 >>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>> Access: 2019-03-20 11:07:20.342140927 +0100 >>> Modify: 2019-03-20 11:28:28.318690015 +0100 >>> Change: 2019-03-21 13:01:03.133654344 +0100 >>> Birth: - >>> >>> sudo stat /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>> File: '/data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf' >>> Size: 33 Blocks: 0 IO Block: 4096 directory >>> Device: 807h/2055d Inode: 53706303549 Links: 3 >>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>> Access: 2019-03-20 11:06:55.414097315 +0100 >>> Modify: 2019-03-20 11:28:28.362690281 +0100 >>> Change: 2019-03-21 13:01:03.141654359 +0100 >>> Birth: - >>> >>> sudo stat /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>> File: '/data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b' >>> Size: 33 Blocks: 0 IO Block: 4096 directory >>> Device: 807h/2055d Inode: 57990935591 Links: 3 >>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>> Access: 2019-03-20 11:07:08.558120309 +0100 >>> Modify: 2019-03-20 11:28:14.226604869 +0100 >>> Change: 2019-03-21 13:01:03.189654448 +0100 >>> Birth: - >>> >>> sudo stat 
/data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>> File: '/data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b' >>> Size: 33 Blocks: 0 IO Block: 4096 directory >>> Device: 807h/2055d Inode: 62291339781 Links: 3 >>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>> Access: 2019-03-20 11:06:02.070003998 +0100 >>> Modify: 2019-03-20 11:28:28.458690861 +0100 >>> Change: 2019-03-21 13:01:03.281654621 +0100 >>> Birth: - >>> >>> sudo stat /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>> File: '/data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e' >>> Size: 33 Blocks: 0 IO Block: 4096 directory >>> Device: 807h/2055d Inode: 66574223479 Links: 3 >>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>> Access: 2019-03-20 11:28:10.826584325 +0100 >>> Modify: 2019-03-20 11:28:10.834584374 +0100 >>> Change: 2019-03-20 14:06:07.937449353 +0100 >>> Birth: - >>> root at storage3:/var/log/glusterfs# >>> ???????????????????????????????????????????????????????????? >>> ???????????????????????????????????????????????????????????? >>> >>> brick2: >>> ???????????????????????????????????????????????????????????? >>> ???????????????????????????????????????????????????????????? >>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>> getfattr: Removing leading '/' from absolute path names >>> # file: data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>> trusted.afr.dirty=0x000000000000000000000000 >>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>> trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.dht.mds=0x00000000 >>> >>> sudo getfattr -d -m . 
-e hex /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>> getfattr: Removing leading '/' from absolute path names >>> # file: data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>> trusted.gfid=0x33585a577c4a4c55b39b9abc07eacff1 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.dht.mds=0x00000000 >>> >>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>> getfattr: Removing leading '/' from absolute path names >>> # file: data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>> trusted.gfid=0x99cdfe0773d74d7ab29592ce40cf47f3 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.dht.mds=0x00000000 >>> >>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>> getfattr: Removing leading '/' from absolute path names >>> # file: data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>> trusted.gfid=0x8a96a80ddf8a4163ac00bb12369389df >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.dht.mds=0x00000000 >>> >>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>> getfattr: Removing leading '/' from absolute path names >>> # file: data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>> trusted.gfid=0x04d087a96cc6421ca7bc356fc20d7753 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.dht.mds=0x00000000 >>> >>> sudo getfattr -d -m . 
-e hex /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>> getfattr: Removing leading '/' from absolute path names >>> # file: data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>> trusted.afr.dirty=0x000000000000000000000000 >>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>> trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.dht.mds=0x00000000 >>> >>> ???????????????????????????????????????????????????????????? >>> >>> sudo stat /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >>> File: '/data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59' >>> Size: 33 Blocks: 0 IO Block: 4096 directory >>> Device: 807h/2055d Inode: 42232631305 Links: 3 >>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>> Access: 2019-03-20 11:06:26.994047597 +0100 >>> Modify: 2019-03-20 11:28:28.294689870 +0100 >>> Change: 2019-03-21 13:01:03.078748131 +0100 >>> Birth: - >>> >>> sudo stat /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>> File: '/data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617' >>> Size: 33 Blocks: 0 IO Block: 4096 directory >>> Device: 807h/2055d Inode: 78589109305 Links: 3 >>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>> Access: 2019-03-20 11:07:20.342140927 +0100 >>> Modify: 2019-03-20 11:28:28.318690015 +0100 >>> Change: 2019-03-21 13:01:03.134748477 +0100 >>> Birth: - >>> >>> sudo stat /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>> File: '/data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf' >>> Size: 33 Blocks: 0 IO Block: 4096 directory >>> Device: 807h/2055d Inode: 54972096517 Links: 3 >>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>> Access: 2019-03-20 11:06:55.414097315 +0100 >>> Modify: 2019-03-20 11:28:28.362690281 +0100 >>> Change: 
2019-03-21 13:01:03.162748650 +0100 >>> Birth: - >>> >>> sudo stat /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>> File: '/data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b' >>> Size: 33 Blocks: 0 IO Block: 4096 directory >>> Device: 807h/2055d Inode: 40821259275 Links: 3 >>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>> Access: 2019-03-20 11:07:08.558120309 +0100 >>> Modify: 2019-03-20 11:28:14.226604869 +0100 >>> Change: 2019-03-21 13:01:03.194748848 +0100 >>> Birth: - >>> >>> sudo stat /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>> File: '/data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b' >>> Size: 33 Blocks: 0 IO Block: 4096 directory >>> Device: 807h/2055d Inode: 15876654 Links: 3 >>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>> Access: 2019-03-20 11:06:02.070003998 +0100 >>> Modify: 2019-03-20 11:28:28.458690861 +0100 >>> Change: 2019-03-21 13:01:03.282749392 +0100 >>> Birth: - >>> >>> sudo stat /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>> File: '/data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e' >>> Size: 33 Blocks: 0 IO Block: 4096 directory >>> Device: 807h/2055d Inode: 49408944650 Links: 3 >>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>> Access: 2019-03-20 11:28:10.826584325 +0100 >>> Modify: 2019-03-20 11:28:10.834584374 +0100 >>> Change: 2019-03-20 14:06:07.940849268 +0100 >>> Birth: - >>> ???????????????????????????????????????????????????????????? >>> ???????????????????????????????????????????????????????????? >>> >>> The file is from brick 2 that I upgraded and started the heal on. >>> >>> >>> - Kindest regards, >>> >>> Milos Cuculovic >>> IT Manager >>> >>> --- >>> MDPI AG >>> Postfach, CH-4020 Basel, Switzerland >>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>> Tel. 
+41 61 683 77 35 >>> Fax +41 61 302 89 18 >>> Email:?cuculovic at mdpi.com >>> Skype: milos.cuculovic.mdpi >>> >>> Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. >>> >>>> On 21 Mar 2019, at 13:05, Karthik Subrahmanya > wrote: >>>> >>>> Can you give me the stat & getfattr output of all those 6 entries from both the bricks and the glfsheal-.log file from the node where you run this command? >>>> Meanwhile can you also try running this with the source-brick option? >>>> >>>> On Thu, Mar 21, 2019 at 5:22 PM Milos Cuculovic > wrote: >>>> Thank you Karthik, >>>> >>>> I have run this for all files (see example below) and it says the file is not in split-brain: >>>> >>>> sudo gluster volume heal storage2 split-brain latest-mtime /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >>>> Healing /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 failed: File not in split-brain. >>>> Volume heal failed. >>>> >>>> >>>> - Kindest regards, >>>> >>>> Milos Cuculovic >>>> IT Manager >>>> >>>> --- >>>> MDPI AG >>>> Postfach, CH-4020 Basel, Switzerland >>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>>> Tel. +41 61 683 77 35 >>>> Fax +41 61 302 89 18 >>>> Email:?cuculovic at mdpi.com >>>> Skype: milos.cuculovic.mdpi >>>> >>>> Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. 
>>>> >>>>> On 21 Mar 2019, at 12:36, Karthik Subrahmanya > wrote: >>>>> >>>>> Hi Milos, >>>>> >>>>> Thanks for the logs and the getfattr output. >>>>> From the logs I can see that there are 6 entries under the directory "/data/data-cluster/dms/final_archive" named >>>>> 41be9ff5ec05c4b1c989c6053e709e59 >>>>> 5543982fab4b56060aa09f667a8ae617 >>>>> a8b7f31775eebc8d1867e7f9de7b6eaf >>>>> c1d3f3c2d7ae90e891e671e2f20d5d4b >>>>> e5934699809a3b6dcfc5945f408b978b >>>>> e7cdc94f60d390812a5f9754885e119e >>>>> which are having gfid mismatch, so the heal is failing on this directory. >>>>> >>>>> You can use the CLI option to resolve these files from gfid mismatch. You can use any of the 3 methods available: >>>>> 1. bigger-file >>>>> gluster volume heal <VOLNAME> split-brain bigger-file <FILE> >>>>> >>>>> 2. latest-mtime >>>>> gluster volume heal <VOLNAME> split-brain latest-mtime <FILE> >>>>> >>>>> 3. source-brick >>>>> gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME> <FILE> >>>>> >>>>> where <FILE> must be absolute path w.r.t. the volume, starting with '/'. >>>>> If all those entries are directories then go for either latest-mtime/source-brick option. >>>>> After you resolve all these gfid-mismatches, run the "gluster volume heal <VOLNAME>" command. Then check the heal info and let me know the result. >>>>> >>>>> Regards, >>>>> Karthik >>>>> >>>>> On Thu, Mar 21, 2019 at 4:27 PM Milos Cuculovic > wrote: >>>>> Sure, thank you for following up. >>>>> >>>>> About the commands, here is what I see: >>>>> >>>>> brick1: >>>>> ????????????????????????????????????? >>>>> sudo gluster volume heal storage2 info >>>>> Brick storage3:/data/data-cluster >>>>> >>>>> >>>>> /dms/final_archive - Possibly undergoing heal >>>>> >>>>> Status: Connected >>>>> Number of entries: 3 >>>>> >>>>> Brick storage4:/data/data-cluster >>>>> >>>>> /dms/final_archive - Possibly undergoing heal >>>>> >>>>> Status: Connected >>>>> Number of entries: 2 >>>>> ????????????????????????????????????? >>>>> sudo getfattr -d -m . 
-e hex /data/data-cluster/dms/final_archive >>>>> getfattr: Removing leading '/' from absolute path names >>>>> # file: data/data-cluster/dms/final_archive >>>>> trusted.afr.dirty=0x000000000000000000000000 >>>>> trusted.afr.storage2-client-1=0x000000000000000000000010 >>>>> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87 >>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>>> trusted.glusterfs.dht.mds=0x00000000 >>>>> ????????????????????????????????????? >>>>> stat /data/data-cluster/dms/final_archive >>>>> File: '/data/data-cluster/dms/final_archive' >>>>> Size: 3497984 Blocks: 8768 IO Block: 4096 directory >>>>> Device: 807h/2055d Inode: 26427748396 Links: 72123 >>>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>>> Access: 2018-10-09 04:22:40.514629044 +0200 >>>>> Modify: 2019-03-21 11:55:37.382278863 +0100 >>>>> Change: 2019-03-21 11:55:37.382278863 +0100 >>>>> Birth: - >>>>> ????????????????????????????????????? >>>>> ????????????????????????????????????? >>>>> >>>>> brick2: >>>>> ????????????????????????????????????? >>>>> sudo gluster volume heal storage2 info >>>>> Brick storage3:/data/data-cluster >>>>> >>>>> >>>>> /dms/final_archive - Possibly undergoing heal >>>>> >>>>> Status: Connected >>>>> Number of entries: 3 >>>>> >>>>> Brick storage4:/data/data-cluster >>>>> >>>>> /dms/final_archive - Possibly undergoing heal >>>>> >>>>> Status: Connected >>>>> Number of entries: 2 >>>>> ????????????????????????????????????? >>>>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive >>>>> getfattr: Removing leading '/' from absolute path names >>>>> # file: data/data-cluster/dms/final_archive >>>>> trusted.afr.dirty=0x000000000000000000000000 >>>>> trusted.afr.storage2-client-0=0x000000000000000000000001 >>>>> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87 >>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>>> trusted.glusterfs.dht.mds=0x00000000 >>>>> ????????????????????????????????????? 
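The trusted.afr.* values in the getfattr output above can be read by hand. To the best of my understanding of AFR's changelog scheme, the 12-byte value packs three big-endian 32-bit counters — pending data, metadata, and entry operations that this brick records against its peer. A sketch decoding the brick1 value quoted above:

```shell
# Decode trusted.afr.storage2-client-1 from the getfattr output above.
# Assumed layout (AFR changelog): bytes 0-3 data, 4-7 metadata,
# 8-11 entry - each a big-endian count of pending operations.
xattr="0x000000000000000000000010"
hex=${xattr#0x}
data=$((16#${hex:0:8}))
meta=$((16#${hex:8:8}))
entry=$((16#${hex:16:8}))
echo "pending: data=$data metadata=$meta entry=$entry"
# A non-zero entry count with zero data/metadata counts points at
# pending entry (create/delete/rename) heals under the directory,
# not a data split-brain.
```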
>>>>> stat /data/data-cluster/dms/final_archive >>>>> File: '/data/data-cluster/dms/final_archive' >>>>> Size: 3497984 Blocks: 8760 IO Block: 4096 directory >>>>> Device: 807h/2055d Inode: 13563551265 Links: 72124 >>>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>>> Access: 2018-10-09 04:22:40.514629044 +0200 >>>>> Modify: 2019-03-21 11:55:46.382565124 +0100 >>>>> Change: 2019-03-21 11:55:46.382565124 +0100 >>>>> Birth: - >>>>> ????????????????????????????????????? >>>>> >>>>> Hope this helps. >>>>> >>>>> - Kindest regards, >>>>> >>>>> Milos Cuculovic >>>>> IT Manager >>>>> >>>>> --- >>>>> MDPI AG >>>>> Postfach, CH-4020 Basel, Switzerland >>>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>>>> Tel. +41 61 683 77 35 >>>>> Fax +41 61 302 89 18 >>>>> Email:?cuculovic at mdpi.com >>>>> Skype: milos.cuculovic.mdpi >>>>> >>>>> Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. >>>>> >>>>>> On 21 Mar 2019, at 11:43, Karthik Subrahmanya > wrote: >>>>>> >>>>>> Can you attach the "glustershd.log" file which will be present under "/var/log/glusterfs/" from both the nodes and the "stat" & "getfattr -d -m . -e hex " output of all the entries listed in the heal info output from both the bricks? >>>>>> >>>>>> On Thu, Mar 21, 2019 at 3:54 PM Milos Cuculovic > wrote: >>>>>> Thanks Karthik! >>>>>> >>>>>> I was trying to find some resolution methods from [2] but unfortunately none worked (I can explain what I tried if needed). >>>>>> >>>>>>> I guess the volume you are talking about is of type replica-2 (1x2). >>>>>> That?s correct, aware of the arbiter solution but still didn?t took time to implement. 
>>>>>> >>>>>> From the info results I posted, how to know in which situation I am. No files are mentioned in spit brain, only directories. One brick has 3 entries and one two entries. >>>>>> >>>>>> sudo gluster volume heal storage2 info >>>>>> [sudo] password for sshadmin: >>>>>> Brick storage3:/data/data-cluster >>>>>> >>>>>> >>>>>> /dms/final_archive - Possibly undergoing heal >>>>>> >>>>>> Status: Connected >>>>>> Number of entries: 3 >>>>>> >>>>>> Brick storage4:/data/data-cluster >>>>>> >>>>>> /dms/final_archive - Possibly undergoing heal >>>>>> >>>>>> Status: Connected >>>>>> Number of entries: 2 >>>>>> >>>>>> - Kindest regards, >>>>>> >>>>>> Milos Cuculovic >>>>>> IT Manager >>>>>> >>>>>> --- >>>>>> MDPI AG >>>>>> Postfach, CH-4020 Basel, Switzerland >>>>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>>>>> Tel. +41 61 683 77 35 >>>>>> Fax +41 61 302 89 18 >>>>>> Email:?cuculovic at mdpi.com >>>>>> Skype: milos.cuculovic.mdpi >>>>>> >>>>>> Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. >>>>>> >>>>>>> On 21 Mar 2019, at 10:27, Karthik Subrahmanya > wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> Note: I guess the volume you are talking about is of type replica-2 (1x2). Usually replica 2 volumes are prone to split-brain. If you can consider converting them to arbiter or replica-3, they will handle most of the cases which can lead to slit-brains. For more information see [1]. >>>>>>> >>>>>>> Resolving the split-brain: [2] talks about how to interpret the heal info output and different ways to resolve them using the CLI/manually/using the favorite-child-policy. 
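To make the latest-mtime policy mentioned here concrete, a small mock — plain files in temp directories, GNU touch/stat assumed, gluster itself not involved: the replica whose copy carries the newer modification time is the one the policy keeps as the heal source.

```shell
# Two mock "bricks" holding the same entry with different mtimes;
# latest-mtime resolution keeps the newer copy as the source.
# Timestamps below reuse the Modify/Change times seen in this thread.
b1=$(mktemp -d); b2=$(mktemp -d)
touch -d '2019-03-20 11:28:28' "$b1/final_archive"
touch -d '2019-03-21 13:01:03' "$b2/final_archive"
m1=$(stat -c %Y "$b1/final_archive")
m2=$(stat -c %Y "$b2/final_archive")
if [ "$m2" -gt "$m1" ]; then src="brick2"; else src="brick1"; fi
echo "latest-mtime would take $src as the source"
rm -rf "$b1" "$b2"
```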
>>>>>>> If you are having entry split brain, and is a gfid split-brain (file/dir having different gfids on the replica bricks) then you can use the CLI option to resolve them. If a directory is in gfid split-brain in a distributed-replicate volume and you are using the source-brick option please make sure you use the brick of this subvolume, which has the same gfid as that of the other distribute subvolume(s) where you have the correct gfid, as the source. >>>>>>> If you are having a type mismatch then follow the steps in [3] to resolve the split-brain. >>>>>>> >>>>>>> [1] https://docs.gluster.org/en/v3/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/ >>>>>>> [2] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/ >>>>>>> [3] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#dir-split-brain >>>>>>> >>>>>>> HTH, >>>>>>> Karthik >>>>>>> >>>>>>> On Thu, Mar 21, 2019 at 1:45 PM Milos Cuculovic > wrote: >>>>>>> I was now able to catch the split brain log: >>>>>>> >>>>>>> sudo gluster volume heal storage2 info >>>>>>> Brick storage3:/data/data-cluster >>>>>>> >>>>>>> >>>>>>> /dms/final_archive - Is in split-brain >>>>>>> >>>>>>> Status: Connected >>>>>>> Number of entries: 3 >>>>>>> >>>>>>> Brick storage4:/data/data-cluster >>>>>>> >>>>>>> /dms/final_archive - Is in split-brain >>>>>>> >>>>>>> Status: Connected >>>>>>> Number of entries: 2 >>>>>>> >>>>>>> Milos >>>>>>> >>>>>>>> On 21 Mar 2019, at 09:07, Milos Cuculovic > wrote: >>>>>>>> >>>>>>>> Since 24h, after upgrading from 4.0 to 4.1.7 one of the servers, the heal shows this: >>>>>>>> >>>>>>>> sudo gluster volume heal storage2 info >>>>>>>> Brick storage3:/data/data-cluster >>>>>>>> >>>>>>>> >>>>>>>> /dms/final_archive - Possibly undergoing heal >>>>>>>> >>>>>>>> Status: Connected >>>>>>>> Number of entries: 3 >>>>>>>> >>>>>>>> Brick storage4:/data/data-cluster >>>>>>>> >>>>>>>> /dms/final_archive - Possibly undergoing heal >>>>>>>> >>>>>>>> Status: 
Connected >>>>>>>> Number of entries: 2 >>>>>>>> >>>>>>>> The same files stay there. From time to time the status of the /dms/final_archive is in split brain at the following command shows: >>>>>>>> >>>>>>>> sudo gluster volume heal storage2 info split-brain >>>>>>>> Brick storage3:/data/data-cluster >>>>>>>> /dms/final_archive >>>>>>>> Status: Connected >>>>>>>> Number of entries in split-brain: 1 >>>>>>>> >>>>>>>> Brick storage4:/data/data-cluster >>>>>>>> /dms/final_archive >>>>>>>> Status: Connected >>>>>>>> Number of entries in split-brain: 1 >>>>>>>> >>>>>>>> How to know the file who is in split brain? The files in /dms/final_archive are not very important, fine to remove (ideally resolve the split brain) for the ones that differ. >>>>>>>> >>>>>>>> I can only see the directory and GFID. Any idea on how to resolve this situation as I would like to continue with the upgrade on the 2nd server, and for this the heal needs to be done with 0 entries in sudo gluster volume heal storage2 info >>>>>>>> >>>>>>>> Thank you in advance, Milos. >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Gluster-users mailing list >>>>>>> Gluster-users at gluster.org >>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ksubrahm at redhat.com Fri Mar 22 11:58:43 2019 From: ksubrahm at redhat.com (Karthik Subrahmanya) Date: Fri, 22 Mar 2019 17:28:43 +0530 Subject: [Gluster-users] Heal flapping between Possibly undergoing heal and In split brain In-Reply-To: <4B662F79-4947-4DFE-BDC7-B6B61A1054FF@mdpi.com> References: <132679F1-A96E-402F-BDF0-79FE200AD9F7@mdpi.com> <548844D2-E06E-4A60-972E-72213C61D264@mdpi.com> <14B1CACB-4049-42DD-AB69-3B75FBD6BE30@mdpi.com> <990A42AA-1F60-441E-BD5A-97B8333E2083@mdpi.com> <76354C5F-C197-475A-B4D3-D6089CED12EB@mdpi.com> <4B662F79-4947-4DFE-BDC7-B6B61A1054FF@mdpi.com> Message-ID: On Fri, Mar 22, 2019 at 5:02 PM Milos Cuculovic wrote: > Thank you Karthik, > > The 2nd command works for all of them, those are directories: > sudo ls -l > /data/data-cluster/.glusterfs/27/6f/276fec9a-1c9b-4efe-9715-dcf4207e99b0 > lrwxrwxrwx 1 root root 60 Jun 14 2018 > /data/data-cluster/.glusterfs/27/6f/276fec9a-1c9b-4efe-9715-dcf4207e99b0 -> > ../../a9/6e/a96e940d-3130-45d1-9efe-7aff463fec3d/final_files > > But now, what to do with this info? Since yesterday, the heal info shows > the samge gfids. > You can make use of the same ls - l command to get the actual path of the parent directory (a96e940d-3130-45d1-9efe-7aff463fec3d). Once you get the complete path you can run the lookup on those directories from the client and see whether they are getting healed? If not send the getfattr output of all the directories which are pending heal, glustershd.log from both the nodes, and client mount log from where you run the lookup. > > - Kindest regards, > > Milos Cuculovic > IT Manager > > --- > MDPI AG > Postfach, CH-4020 Basel, Switzerland > Office: St. Alban-Anlage 66, 4052 Basel, Switzerland > Tel. 
+41 61 683 77 35 > Fax +41 61 302 89 18 > Email: cuculovic at mdpi.com > Skype: milos.cuculovic.mdpi > > Disclaimer: The information and files contained in this message > are confidential and intended solely for the use of the individual or > entity to whom they are addressed. If you have received this message in > error, please notify me and delete this message from your system. You may > not copy this message in its entirety or in part, or disclose its contents > to anyone. > > On 22 Mar 2019, at 08:51, Karthik Subrahmanya wrote: > > Hi, > > If it is a file then you can find the filename from the gfid by running > the following on the nodes hosting the bricks > find -samefile gfid>// > > If it is a directory you can run the following on the nodes hosting the > bricks > ls -l / gfid>/ > > Run these on both the nodes and paste the output of these commands before > running the lookup from client on these entries. > > Regards, > Karthik > > On Fri, Mar 22, 2019 at 1:06 PM Milos Cuculovic > wrote: > >> I have run a few minutes ago the info and here are the results: >> >> sudo gluster volume heal storage2 info >> Brick storage3:/data/data-cluster >> >> >> Status: Connected >> Number of entries: 2 >> >> Brick storage4:/data/data-cluster >> >> >> >> >> >> >> Status: Connected >> Number of entries: 6 >> >> >> sudo gluster volume heal storage2 info split-brain >> Brick storage3:/data/data-cluster >> Status: Connected >> Number of entries in split-brain: 0 >> >> Brick storage4:/data/data-cluster >> Status: Connected >> Number of entries in split-brain: 0 >> >> The heal info (2 + 6) are there since yesterday and do not change. >> >> >> If they are still there can you try doing a lookup on those entries from >> client and see whether they are getting healed? >> >> How can I do this having the gfid only? >> >> - Kindest regards, >> >> Milos Cuculovic >> IT Manager >> >> --- >> MDPI AG >> Postfach, CH-4020 Basel, Switzerland >> Office: St. 
Alban-Anlage 66, 4052 Basel, Switzerland >> Tel. +41 61 683 77 35 >> Fax +41 61 302 89 18 >> Email: cuculovic at mdpi.com >> Skype: milos.cuculovic.mdpi >> >> Disclaimer: The information and files contained in this message >> are confidential and intended solely for the use of the individual or >> entity to whom they are addressed. If you have received this message in >> error, please notify me and delete this message from your system. You may >> not copy this message in its entirety or in part, or disclose its contents >> to anyone. >> >> On 21 Mar 2019, at 14:34, Karthik Subrahmanya >> wrote: >> >> Now the slit-brain on the directory is resolved. >> Are these entries which are there in the latest heal info output not >> getting healed? Are they still present in the heal info output? >> If they are still there can you try doing a lookup on those entries from >> client and see whether they are getting healed? >> >> >> On Thu, Mar 21, 2019 at 6:49 PM Milos Cuculovic >> wrote: >> >>> Hey Karthik, >>> >>> Can you run the "guster volume heal ? >>> >>> sudo gluster volume heal storage2 >>> Launching heal operation to perform index self heal on volume storage2 >>> has been successful >>> Use heal info commands to check status. >>> >>> "gluster volume heal info? >>> >>> sudo gluster volume heal storage2 info >>> Brick storage3:/data/data-cluster >>> >>> >>> Status: Connected >>> Number of entries: 2 >>> >>> Brick storage4:/data/data-cluster >>> >>> >>> >>> >>> >>> >>> Status: Connected >>> Number of entries: 6 >>> >>> >>> >>> - Kindest regards, >>> >>> Milos Cuculovic >>> IT Manager >>> >>> --- >>> MDPI AG >>> Postfach, CH-4020 Basel, Switzerland >>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>> Tel. 
+41 61 683 77 35 >>> Fax +41 61 302 89 18 >>> Email: cuculovic at mdpi.com >>> Skype: milos.cuculovic.mdpi >>> >>> Disclaimer: The information and files contained in this message >>> are confidential and intended solely for the use of the individual or >>> entity to whom they are addressed. If you have received this message in >>> error, please notify me and delete this message from your system. You may >>> not copy this message in its entirety or in part, or disclose its contents >>> to anyone. >>> >>> On 21 Mar 2019, at 14:07, Karthik Subrahmanya >>> wrote: >>> >>> Hey Milos, >>> >>> I see that gfid got healed for those directories from the getfattr >>> output and the glfsheal log also has messages corresponding to deleting the >>> entries on one brick as part of healing which then got recreated on the >>> brick with the correct gfid. Can you run the "guster volume heal " >>> & "gluster volume heal info" command and paste the output here? >>> If you still see entries pending heal, give the latest glustershd.log >>> files from both the nodes along with the getfattr output of the files which >>> are listed in the heal info output. >>> >>> Regards, >>> Karthik >>> >>> On Thu, Mar 21, 2019 at 6:03 PM Milos Cuculovic >>> wrote: >>> >>>> Sure: >>>> >>>> brick1: >>>> ???????????????????????????????????????????????????????????? >>>> ???????????????????????????????????????????????????????????? >>>> sudo getfattr -d -m . -e hex >>>> /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: >>>> data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >>>> trusted.gfid=0xe358ff34504241d387efe1e76eb28bb0 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.dht.mds=0x00000000 >>>> >>>> sudo getfattr -d -m . 
-e hex >>>> /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: >>>> data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>>> trusted.gfid=0x33585a577c4a4c55b39b9abc07eacff1 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.dht.mds=0x00000000 >>>> >>>> sudo getfattr -d -m . -e hex >>>> /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: >>>> data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>>> trusted.gfid=0x99cdfe0773d74d7ab29592ce40cf47f3 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.dht.mds=0x00000000 >>>> >>>> sudo getfattr -d -m . -e hex >>>> /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: >>>> data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>>> trusted.gfid=0x8a96a80ddf8a4163ac00bb12369389df >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.dht.mds=0x00000000 >>>> >>>> sudo getfattr -d -m . -e hex >>>> /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: >>>> data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>>> trusted.gfid=0x04d087a96cc6421ca7bc356fc20d7753 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.dht.mds=0x00000000 >>>> >>>> sudo getfattr -d -m . 
-e hex >>>> /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: >>>> data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>>> trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.dht.mds=0x00000000 >>>> ???????????????????????????????????????????????????????????? >>>> sudo stat >>>> /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >>>> File: >>>> '/data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59' >>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>> Device: 807h/2055d Inode: 40809094709 Links: 3 >>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>> Access: 2019-03-20 11:06:26.994047597 +0100 >>>> Modify: 2019-03-20 11:28:28.294689870 +0100 >>>> Change: 2019-03-21 13:01:03.077654239 +0100 >>>> Birth: - >>>> >>>> sudo stat >>>> /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>>> File: >>>> '/data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617' >>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>> Device: 807h/2055d Inode: 49399908865 Links: 3 >>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>> Access: 2019-03-20 11:07:20.342140927 +0100 >>>> Modify: 2019-03-20 11:28:28.318690015 +0100 >>>> Change: 2019-03-21 13:01:03.133654344 +0100 >>>> Birth: - >>>> >>>> sudo stat >>>> /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>>> File: >>>> '/data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf' >>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>> Device: 807h/2055d Inode: 53706303549 Links: 3 >>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>> Access: 2019-03-20 11:06:55.414097315 +0100 >>>> Modify: 2019-03-20 11:28:28.362690281 +0100 >>>> Change: 2019-03-21 13:01:03.141654359 +0100 >>>> Birth: - 
>>>> >>>> sudo stat >>>> /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>>> File: >>>> '/data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b' >>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>> Device: 807h/2055d Inode: 57990935591 Links: 3 >>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>> Access: 2019-03-20 11:07:08.558120309 +0100 >>>> Modify: 2019-03-20 11:28:14.226604869 +0100 >>>> Change: 2019-03-21 13:01:03.189654448 +0100 >>>> Birth: - >>>> >>>> sudo stat >>>> /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>>> File: >>>> '/data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b' >>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>> Device: 807h/2055d Inode: 62291339781 Links: 3 >>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>> Access: 2019-03-20 11:06:02.070003998 +0100 >>>> Modify: 2019-03-20 11:28:28.458690861 +0100 >>>> Change: 2019-03-21 13:01:03.281654621 +0100 >>>> Birth: - >>>> >>>> sudo stat >>>> /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>>> File: >>>> '/data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e' >>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>> Device: 807h/2055d Inode: 66574223479 Links: 3 >>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>> Access: 2019-03-20 11:28:10.826584325 +0100 >>>> Modify: 2019-03-20 11:28:10.834584374 +0100 >>>> Change: 2019-03-20 14:06:07.937449353 +0100 >>>> Birth: - >>>> root at storage3:/var/log/glusterfs# >>>> ???????????????????????????????????????????????????????????? >>>> ???????????????????????????????????????????????????????????? >>>> >>>> brick2: >>>> ???????????????????????????????????????????????????????????? >>>> ???????????????????????????????????????????????????????????? >>>> sudo getfattr -d -m . 
-e hex >>>> /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: >>>> data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>>> trusted.afr.dirty=0x000000000000000000000000 >>>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>>> trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.dht.mds=0x00000000 >>>> >>>> sudo getfattr -d -m . -e hex >>>> /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: >>>> data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>>> trusted.gfid=0x33585a577c4a4c55b39b9abc07eacff1 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.dht.mds=0x00000000 >>>> >>>> sudo getfattr -d -m . -e hex >>>> /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: >>>> data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>>> trusted.gfid=0x99cdfe0773d74d7ab29592ce40cf47f3 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.dht.mds=0x00000000 >>>> >>>> sudo getfattr -d -m . -e hex >>>> /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: >>>> data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>>> trusted.gfid=0x8a96a80ddf8a4163ac00bb12369389df >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.dht.mds=0x00000000 >>>> >>>> sudo getfattr -d -m . 
-e hex >>>> /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: >>>> data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>>> trusted.gfid=0x04d087a96cc6421ca7bc356fc20d7753 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.dht.mds=0x00000000 >>>> >>>> sudo getfattr -d -m . -e hex >>>> /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: >>>> data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>>> trusted.afr.dirty=0x000000000000000000000000 >>>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>>> trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.dht.mds=0x00000000 >>>> >>>> ???????????????????????????????????????????????????????????? 
>>>> >>>> sudo stat >>>> /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >>>> File: >>>> '/data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59' >>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>> Device: 807h/2055d Inode: 42232631305 Links: 3 >>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>> Access: 2019-03-20 11:06:26.994047597 +0100 >>>> Modify: 2019-03-20 11:28:28.294689870 +0100 >>>> Change: 2019-03-21 13:01:03.078748131 +0100 >>>> Birth: - >>>> >>>> sudo stat >>>> /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>>> File: >>>> '/data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617' >>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>> Device: 807h/2055d Inode: 78589109305 Links: 3 >>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>> Access: 2019-03-20 11:07:20.342140927 +0100 >>>> Modify: 2019-03-20 11:28:28.318690015 +0100 >>>> Change: 2019-03-21 13:01:03.134748477 +0100 >>>> Birth: - >>>> >>>> sudo stat >>>> /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>>> File: >>>> '/data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf' >>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>> Device: 807h/2055d Inode: 54972096517 Links: 3 >>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>> Access: 2019-03-20 11:06:55.414097315 +0100 >>>> Modify: 2019-03-20 11:28:28.362690281 +0100 >>>> Change: 2019-03-21 13:01:03.162748650 +0100 >>>> Birth: - >>>> >>>> sudo stat >>>> /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>>> File: >>>> '/data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b' >>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>> Device: 807h/2055d Inode: 40821259275 Links: 3 >>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>> Access: 2019-03-20 11:07:08.558120309 +0100 >>>> Modify: 2019-03-20 11:28:14.226604869 
+0100 >>>> Change: 2019-03-21 13:01:03.194748848 +0100 >>>> Birth: - >>>> >>>> sudo stat >>>> /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>>> File: >>>> '/data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b' >>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>> Device: 807h/2055d Inode: 15876654 Links: 3 >>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>> Access: 2019-03-20 11:06:02.070003998 +0100 >>>> Modify: 2019-03-20 11:28:28.458690861 +0100 >>>> Change: 2019-03-21 13:01:03.282749392 +0100 >>>> Birth: - >>>> >>>> sudo stat >>>> /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>>> File: >>>> '/data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e' >>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>> Device: 807h/2055d Inode: 49408944650 Links: 3 >>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>> Access: 2019-03-20 11:28:10.826584325 +0100 >>>> Modify: 2019-03-20 11:28:10.834584374 +0100 >>>> Change: 2019-03-20 14:06:07.940849268 +0100 >>>> Birth: - >>>> ???????????????????????????????????????????????????????????? >>>> ???????????????????????????????????????????????????????????? >>>> >>>> The file is from brick 2 that I upgraded and started the heal on. >>>> >>>> >>>> - Kindest regards, >>>> >>>> Milos Cuculovic >>>> IT Manager >>>> >>>> --- >>>> MDPI AG >>>> Postfach, CH-4020 Basel, Switzerland >>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>>> Tel. +41 61 683 77 35 >>>> Fax +41 61 302 89 18 >>>> Email: cuculovic at mdpi.com >>>> Skype: milos.cuculovic.mdpi >>>> >>>> Disclaimer: The information and files contained in this message >>>> are confidential and intended solely for the use of the individual or >>>> entity to whom they are addressed. If you have received this message in >>>> error, please notify me and delete this message from your system. 
You may >>>> not copy this message in its entirety or in part, or disclose its contents >>>> to anyone. >>>> >>>> On 21 Mar 2019, at 13:05, Karthik Subrahmanya >>>> wrote: >>>> >>>> Can you give me the stat & getfattr output of all those 6 entries from >>>> both the bricks and the glfsheal-.log file from the node where you >>>> run this command? >>>> Meanwhile can you also try running this with the source-brick option? >>>> >>>> On Thu, Mar 21, 2019 at 5:22 PM Milos Cuculovic >>>> wrote: >>>> >>>>> Thank you Karthik, >>>>> >>>>> I have run this for all files (see example below) and it says the file >>>>> is not in split-brain: >>>>> >>>>> sudo gluster volume heal storage2 split-brain latest-mtime >>>>> /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >>>>> Healing /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 failed: >>>>> File not in split-brain. >>>>> Volume heal failed. >>>>> >>>>> >>>>> - Kindest regards, >>>>> >>>>> Milos Cuculovic >>>>> IT Manager >>>>> >>>>> --- >>>>> MDPI AG >>>>> Postfach, CH-4020 Basel, Switzerland >>>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>>>> Tel. +41 61 683 77 35 >>>>> Fax +41 61 302 89 18 >>>>> Email: cuculovic at mdpi.com >>>>> Skype: milos.cuculovic.mdpi >>>>> >>>>> Disclaimer: The information and files contained in this message >>>>> are confidential and intended solely for the use of the individual or >>>>> entity to whom they are addressed. If you have received this message in >>>>> error, please notify me and delete this message from your system. You may >>>>> not copy this message in its entirety or in part, or disclose its contents >>>>> to anyone. >>>>> >>>>> On 21 Mar 2019, at 12:36, Karthik Subrahmanya >>>>> wrote: >>>>> >>>>> Hi Milos, >>>>> >>>>> Thanks for the logs and the getfattr output. 
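[Editor's aside] The gfid-to-path mapping used throughout this thread (e.g. `ls -l /data/data-cluster/.glusterfs/27/6f/276fec9a-...` shown earlier) follows a fixed layout: a gfid lives under `<brick>/.glusterfs/<first-2-hex-chars>/<next-2-hex-chars>/<gfid>`. A minimal sketch of that path construction in plain POSIX shell (the brick path and gfid below are the real ones from this thread; no Gluster installation is needed to run it):

```shell
# Build the .glusterfs backend path for a gfid on a given brick.
# For a directory this path is a symlink; `ls -l` on it reveals the
# parent gfid and the directory's real name, as shown earlier in the thread.
gfid_to_backend_path() {
    brick=$1
    gfid=$2
    p1=$(printf '%s' "$gfid" | cut -c1-2)   # first two hex chars
    p2=$(printf '%s' "$gfid" | cut -c3-4)   # next two hex chars
    printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "$p1" "$p2" "$gfid"
}

gfid_to_backend_path /data/data-cluster 276fec9a-1c9b-4efe-9715-dcf4207e99b0
# → /data/data-cluster/.glusterfs/27/6f/276fec9a-1c9b-4efe-9715-dcf4207e99b0
```

On the brick you would then run `ls -l` on the printed path to resolve the symlink to the actual directory name.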
>>>>> From the logs I can see that there are 6 entries under the >>>>> directory "/data/data-cluster/dms/final_archive" named >>>>> 41be9ff5ec05c4b1c989c6053e709e59 >>>>> 5543982fab4b56060aa09f667a8ae617 >>>>> a8b7f31775eebc8d1867e7f9de7b6eaf >>>>> c1d3f3c2d7ae90e891e671e2f20d5d4b >>>>> e5934699809a3b6dcfc5945f408b978b >>>>> e7cdc94f60d390812a5f9754885e119e >>>>> which are having gfid mismatch, so the heal is failing on this >>>>> directory. >>>>> >>>>> You can use the CLI option to resolve these files from gfid mismatch. >>>>> You can use any of the 3 methods available: >>>>> 1. bigger-file >>>>> gluster volume heal split-brain bigger-file >>>>> >>>>> 2. latest-mtime >>>>> gluster volume heal split-brain latest-mtime >>>>> >>>>> 3. source-brick >>>>> gluster volume heal split-brain source-brick >>>>> >>>>> >>>>> where must be absolute path w.r.t. the volume, starting with >>>>> '/'. >>>>> If all those entries are directories then go for either >>>>> latest-mtime/source-brick option. >>>>> After you resolve all these gfid-mismatches, run the "gluster volume >>>>> heal " command. Then check the heal info and let me know the >>>>> result. >>>>> >>>>> Regards, >>>>> Karthik >>>>> >>>>> On Thu, Mar 21, 2019 at 4:27 PM Milos Cuculovic >>>>> wrote: >>>>> >>>>>> Sure, thank you for following up. >>>>>> >>>>>> About the commands, here is what I see: >>>>>> >>>>>> brick1: >>>>>> ????????????????????????????????????? >>>>>> sudo gluster volume heal storage2 info >>>>>> Brick storage3:/data/data-cluster >>>>>> >>>>>> >>>>>> /dms/final_archive - Possibly undergoing heal >>>>>> >>>>>> Status: Connected >>>>>> Number of entries: 3 >>>>>> >>>>>> Brick storage4:/data/data-cluster >>>>>> >>>>>> /dms/final_archive - Possibly undergoing heal >>>>>> >>>>>> Status: Connected >>>>>> Number of entries: 2 >>>>>> ????????????????????????????????????? >>>>>> sudo getfattr -d -m . 
-e hex /data/data-cluster/dms/final_archive >>>>>> getfattr: Removing leading '/' from absolute path names >>>>>> # file: data/data-cluster/dms/final_archive >>>>>> trusted.afr.dirty=0x000000000000000000000000 >>>>>> trusted.afr.storage2-client-1=0x000000000000000000000010 >>>>>> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87 >>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>>>> trusted.glusterfs.dht.mds=0x00000000 >>>>>> ????????????????????????????????????? >>>>>> stat /data/data-cluster/dms/final_archive >>>>>> File: '/data/data-cluster/dms/final_archive' >>>>>> Size: 3497984 Blocks: 8768 IO Block: 4096 directory >>>>>> Device: 807h/2055d Inode: 26427748396 Links: 72123 >>>>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( >>>>>> 33/www-data) >>>>>> Access: 2018-10-09 04:22:40.514629044 +0200 >>>>>> Modify: 2019-03-21 11:55:37.382278863 +0100 >>>>>> Change: 2019-03-21 11:55:37.382278863 +0100 >>>>>> Birth: - >>>>>> ????????????????????????????????????? >>>>>> ????????????????????????????????????? >>>>>> >>>>>> brick2: >>>>>> ????????????????????????????????????? >>>>>> sudo gluster volume heal storage2 info >>>>>> Brick storage3:/data/data-cluster >>>>>> >>>>>> >>>>>> /dms/final_archive - Possibly undergoing heal >>>>>> >>>>>> Status: Connected >>>>>> Number of entries: 3 >>>>>> >>>>>> Brick storage4:/data/data-cluster >>>>>> >>>>>> /dms/final_archive - Possibly undergoing heal >>>>>> >>>>>> Status: Connected >>>>>> Number of entries: 2 >>>>>> ????????????????????????????????????? >>>>>> sudo getfattr -d -m . 
-e hex /data/data-cluster/dms/final_archive >>>>>> getfattr: Removing leading '/' from absolute path names >>>>>> # file: data/data-cluster/dms/final_archive >>>>>> trusted.afr.dirty=0x000000000000000000000000 >>>>>> trusted.afr.storage2-client-0=0x000000000000000000000001 >>>>>> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87 >>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>>>> trusted.glusterfs.dht.mds=0x00000000 >>>>>> ????????????????????????????????????? >>>>>> stat /data/data-cluster/dms/final_archive >>>>>> File: '/data/data-cluster/dms/final_archive' >>>>>> Size: 3497984 Blocks: 8760 IO Block: 4096 directory >>>>>> Device: 807h/2055d Inode: 13563551265 Links: 72124 >>>>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( >>>>>> 33/www-data) >>>>>> Access: 2018-10-09 04:22:40.514629044 +0200 >>>>>> Modify: 2019-03-21 11:55:46.382565124 +0100 >>>>>> Change: 2019-03-21 11:55:46.382565124 +0100 >>>>>> Birth: - >>>>>> ????????????????????????????????????? >>>>>> >>>>>> Hope this helps. >>>>>> >>>>>> - Kindest regards, >>>>>> >>>>>> Milos Cuculovic >>>>>> IT Manager >>>>>> >>>>>> --- >>>>>> MDPI AG >>>>>> Postfach, CH-4020 Basel, Switzerland >>>>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>>>>> Tel. +41 61 683 77 35 >>>>>> Fax +41 61 302 89 18 >>>>>> Email: cuculovic at mdpi.com >>>>>> Skype: milos.cuculovic.mdpi >>>>>> >>>>>> Disclaimer: The information and files contained in this message >>>>>> are confidential and intended solely for the use of the individual or >>>>>> entity to whom they are addressed. If you have received this message in >>>>>> error, please notify me and delete this message from your system. You may >>>>>> not copy this message in its entirety or in part, or disclose its contents >>>>>> to anyone. 
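[Editor's aside] The `trusted.afr.<volume>-client-N` values in the getfattr dumps above (e.g. `0x000000000000000000000010`) encode three big-endian 32-bit counters: data, metadata and entry pending operations blamed on the other brick. A small decoding helper, as a sketch (the interpretation follows AFR's changelog xattr format; verify against your Gluster version's docs):

```shell
# Decode a trusted.afr.* xattr value ("0x" + 24 hex digits) into its three
# 32-bit big-endian counters: data, metadata and entry pending operations.
decode_afr_xattr() {
    v=${1#0x}
    data=$((0x$(printf '%s' "$v" | cut -c1-8)))
    meta=$((0x$(printf '%s' "$v" | cut -c9-16)))
    entry=$((0x$(printf '%s' "$v" | cut -c17-24)))
    printf 'data=%d metadata=%d entry=%d\n' "$data" "$meta" "$entry"
}

# Value seen on brick1 for /dms/final_archive earlier in the thread:
decode_afr_xattr 0x000000000000000000000010
# → data=0 metadata=0 entry=16  (16 pending entry operations)
```

A non-zero entry counter on only one side is consistent with entries existing on one brick but not the other, which matches the heal-info output in this thread.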
>>>>>> >>>>>> On 21 Mar 2019, at 11:43, Karthik Subrahmanya >>>>>> wrote: >>>>>> >>>>>> Can you attach the "glustershd.log" file which will be present under >>>>>> "/var/log/glusterfs/" from both the nodes and the "stat" & "getfattr -d -m >>>>>> . -e hex " output of all the entries listed in the heal >>>>>> info output from both the bricks? >>>>>> >>>>>> On Thu, Mar 21, 2019 at 3:54 PM Milos Cuculovic >>>>>> wrote: >>>>>> >>>>>>> Thanks Karthik! >>>>>>> >>>>>>> I was trying to find some resolution methods from [2] but >>>>>>> unfortunately none worked (I can explain what I tried if needed). >>>>>>> >>>>>>> I guess the volume you are talking about is of type replica-2 (1x2). >>>>>>> >>>>>>> That?s correct, aware of the arbiter solution but still didn?t took >>>>>>> time to implement. >>>>>>> >>>>>>> From the info results I posted, how to know in which situation I am. >>>>>>> No files are mentioned in spit brain, only directories. One brick has 3 >>>>>>> entries and one two entries. >>>>>>> >>>>>>> sudo gluster volume heal storage2 info >>>>>>> [sudo] password for sshadmin: >>>>>>> Brick storage3:/data/data-cluster >>>>>>> >>>>>>> >>>>>>> /dms/final_archive - Possibly undergoing heal >>>>>>> >>>>>>> Status: Connected >>>>>>> Number of entries: 3 >>>>>>> >>>>>>> Brick storage4:/data/data-cluster >>>>>>> >>>>>>> /dms/final_archive - Possibly undergoing heal >>>>>>> >>>>>>> Status: Connected >>>>>>> Number of entries: 2 >>>>>>> >>>>>>> - Kindest regards, >>>>>>> >>>>>>> Milos Cuculovic >>>>>>> IT Manager >>>>>>> >>>>>>> --- >>>>>>> MDPI AG >>>>>>> Postfach, CH-4020 Basel, Switzerland >>>>>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>>>>>> Tel. +41 61 683 77 35 >>>>>>> Fax +41 61 302 89 18 >>>>>>> Email: cuculovic at mdpi.com >>>>>>> Skype: milos.cuculovic.mdpi >>>>>>> >>>>>>> Disclaimer: The information and files contained in this message >>>>>>> are confidential and intended solely for the use of the individual or >>>>>>> entity to whom they are addressed. 
If you have received this message in >>>>>>> error, please notify me and delete this message from your system. You may >>>>>>> not copy this message in its entirety or in part, or disclose its contents >>>>>>> to anyone. >>>>>>> >>>>>>> On 21 Mar 2019, at 10:27, Karthik Subrahmanya >>>>>>> wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> Note: I guess the volume you are talking about is of type replica-2 >>>>>>> (1x2). Usually replica 2 volumes are prone to split-brain. If you can >>>>>>> consider converting them to arbiter or replica-3, they will handle most of >>>>>>> the cases which can lead to slit-brains. For more information see [1]. >>>>>>> >>>>>>> Resolving the split-brain: [2] talks about how to interpret the heal >>>>>>> info output and different ways to resolve them using the CLI/manually/using >>>>>>> the favorite-child-policy. >>>>>>> If you are having entry split brain, and is a gfid split-brain >>>>>>> (file/dir having different gfids on the replica bricks) then you can use >>>>>>> the CLI option to resolve them. If a directory is in gfid split-brain in a >>>>>>> distributed-replicate volume and you are using the source-brick option >>>>>>> please make sure you use the brick of this subvolume, which has the same >>>>>>> gfid as that of the other distribute subvolume(s) where you have the >>>>>>> correct gfid, as the source. >>>>>>> If you are having a type mismatch then follow the steps in [3] to >>>>>>> resolve the split-brain. 
>>>>>>> >>>>>>> [1] >>>>>>> https://docs.gluster.org/en/v3/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/ >>>>>>> [2] >>>>>>> https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/ >>>>>>> [3] >>>>>>> https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#dir-split-brain >>>>>>> >>>>>>> HTH, >>>>>>> Karthik >>>>>>> >>>>>>> On Thu, Mar 21, 2019 at 1:45 PM Milos Cuculovic >>>>>>> wrote: >>>>>>> >>>>>>>> I was now able to catch the split brain log: >>>>>>>> >>>>>>>> sudo gluster volume heal storage2 info >>>>>>>> Brick storage3:/data/data-cluster >>>>>>>> >>>>>>>> >>>>>>>> /dms/final_archive - Is in split-brain >>>>>>>> >>>>>>>> Status: Connected >>>>>>>> Number of entries: 3 >>>>>>>> >>>>>>>> Brick storage4:/data/data-cluster >>>>>>>> >>>>>>>> /dms/final_archive - Is in split-brain >>>>>>>> >>>>>>>> Status: Connected >>>>>>>> Number of entries: 2 >>>>>>>> >>>>>>>> Milos >>>>>>>> >>>>>>>> On 21 Mar 2019, at 09:07, Milos Cuculovic >>>>>>>> wrote: >>>>>>>> >>>>>>>> Since 24h, after upgrading from 4.0 to 4.1.7 one of the servers, >>>>>>>> the heal shows this: >>>>>>>> >>>>>>>> sudo gluster volume heal storage2 info >>>>>>>> Brick storage3:/data/data-cluster >>>>>>>> >>>>>>>> >>>>>>>> /dms/final_archive - Possibly undergoing heal >>>>>>>> >>>>>>>> Status: Connected >>>>>>>> Number of entries: 3 >>>>>>>> >>>>>>>> Brick storage4:/data/data-cluster >>>>>>>> >>>>>>>> /dms/final_archive - Possibly undergoing heal >>>>>>>> >>>>>>>> Status: Connected >>>>>>>> Number of entries: 2 >>>>>>>> >>>>>>>> The same files stay there. 
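[Editor's aside] The "lookup from the client" suggested above simply means accessing the pending path through the volume's FUSE mount, which makes AFR inspect and heal the entry. A hedged sketch — the mount point is hypothetical and must be replaced with the real client-side mount of the volume:

```shell
# Trigger heal by performing a name lookup on pending paths via a client mount.
# The paths come from 'gluster volume heal <vol> info' output.
trigger_lookup() {
    mnt=$1; shift
    for p in "$@"; do
        # stat performs the lookup; output is discarded, success is reported
        stat "$mnt/$p" >/dev/null 2>&1 && printf 'lookup done: %s\n' "$p"
    done
}

# Example (hypothetical mount point /mnt/storage2):
#   trigger_lookup /mnt/storage2 dms/final_archive
```

After the lookup, re-run `gluster volume heal storage2 info` to see whether the entry count drops.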
From time to time the status of the >>>>>>>> /dms/final_archive is in split brain at the following command shows: >>>>>>>> >>>>>>>> sudo gluster volume heal storage2 info split-brain >>>>>>>> Brick storage3:/data/data-cluster >>>>>>>> /dms/final_archive >>>>>>>> Status: Connected >>>>>>>> Number of entries in split-brain: 1 >>>>>>>> >>>>>>>> Brick storage4:/data/data-cluster >>>>>>>> /dms/final_archive >>>>>>>> Status: Connected >>>>>>>> Number of entries in split-brain: 1 >>>>>>>> >>>>>>>> How to know the file who is in split brain? The files in >>>>>>>> /dms/final_archive are not very important, fine to remove (ideally resolve >>>>>>>> the split brain) for the ones that differ. >>>>>>>> >>>>>>>> I can only see the directory and GFID. Any idea on how to resolve >>>>>>>> this situation as I would like to continue with the upgrade on the 2nd >>>>>>>> server, and for this the heal needs to be done with 0 entries in sudo >>>>>>>> gluster volume heal storage2 info >>>>>>>> >>>>>>>> Thank you in advance, Milos. >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Gluster-users mailing list >>>>>>>> Gluster-users at gluster.org >>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cuculovic at mdpi.com Fri Mar 22 12:52:50 2019 From: cuculovic at mdpi.com (Milos Cuculovic) Date: Fri, 22 Mar 2019 13:52:50 +0100 Subject: [Gluster-users] Heal flapping between Possibly undergoing heal and In split brain In-Reply-To: References: <132679F1-A96E-402F-BDF0-79FE200AD9F7@mdpi.com> <548844D2-E06E-4A60-972E-72213C61D264@mdpi.com> <14B1CACB-4049-42DD-AB69-3B75FBD6BE30@mdpi.com> <990A42AA-1F60-441E-BD5A-97B8333E2083@mdpi.com> <76354C5F-C197-475A-B4D3-D6089CED12EB@mdpi.com> <4B662F79-4947-4DFE-BDC7-B6B61A1054FF@mdpi.com> Message-ID: <2D0FE76B-F6B0-4CB4-92F7-88F030E150A1@mdpi.com> > You can make use of the same ls -l command to get the actual path of the parent directory (a96e940d-3130-45d1-9efe-7aff463fec3d). It seems the problem is that those directories exist on one brick but not on the other, as we can see here: sudo gluster volume heal storage2 info Brick storage3:/data/data-cluster Status: Connected Number of entries: 2 Brick storage4:/data/data-cluster Status: Connected Number of entries: 6 The first part shows the two gfids available on brick1 only; the second part shows the gfids available on the 2nd brick only. > Once you get the complete path you can run the lookup on those directories from the client and see whether they are getting healed? How to do this? :) > If not send the getfattr output of all the directories which are pending heal, glustershd.log from both the nodes, and client mount log from where you run the lookup. Just in case, here attached are the log files. I really do not think those get healed as they stay there since yesterday, and those are directories with not so large files; it should be done within seconds. - Kindest regards, Milos Cuculovic IT Manager --- MDPI AG Postfach, CH-4020 Basel, Switzerland Office: St. Alban-Anlage 66, 4052 Basel, Switzerland Tel.
+41 61 683 77 35 Fax +41 61 302 89 18 Email: cuculovic at mdpi.com Skype: milos.cuculovic.mdpi Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. > On 22 Mar 2019, at 12:58, Karthik Subrahmanya wrote: > > > > On Fri, Mar 22, 2019 at 5:02 PM Milos Cuculovic > wrote: > Thank you Karthik, > > The 2nd command works for all of them, those are directories: > sudo ls -l /data/data-cluster/.glusterfs/27/6f/276fec9a-1c9b-4efe-9715-dcf4207e99b0 > lrwxrwxrwx 1 root root 60 Jun 14 2018 /data/data-cluster/.glusterfs/27/6f/276fec9a-1c9b-4efe-9715-dcf4207e99b0 -> ../../a9/6e/a96e940d-3130-45d1-9efe-7aff463fec3d/final_files > > But now, what to do with this info? Since yesterday, the heal info shows the same gfids. > You can make use of the same ls -l command to get the actual path of the parent directory (a96e940d-3130-45d1-9efe-7aff463fec3d). Once you get the complete path you can run the lookup on those directories from the client and see whether they are getting healed? > If not send the getfattr output of all the directories which are pending heal, glustershd.log from both the nodes, and client mount log from where you run the lookup. > > - Kindest regards, > > Milos Cuculovic > IT Manager > > --- > MDPI AG > Postfach, CH-4020 Basel, Switzerland > Office: St. Alban-Anlage 66, 4052 Basel, Switzerland > Tel. +41 61 683 77 35 > Fax +41 61 302 89 18 > Email: cuculovic at mdpi.com > Skype: milos.cuculovic.mdpi > > Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed.
If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. > >> On 22 Mar 2019, at 08:51, Karthik Subrahmanya > wrote: >> >> Hi, >> >> If it is a file then you can find the filename from the gfid by running the following on the nodes hosting the bricks >> find -samefile // >> >> If it is a directory you can run the following on the nodes hosting the bricks >> ls -l // >> >> Run these on both the nodes and paste the output of these commands before running the lookup from client on these entries. >> >> Regards, >> Karthik >> >> On Fri, Mar 22, 2019 at 1:06 PM Milos Cuculovic > wrote: >> I have run a few minutes ago the info and here are the results: >> >> sudo gluster volume heal storage2 info >> Brick storage3:/data/data-cluster >> >> >> Status: Connected >> Number of entries: 2 >> >> Brick storage4:/data/data-cluster >> >> >> >> >> >> >> Status: Connected >> Number of entries: 6 >> >> >> sudo gluster volume heal storage2 info split-brain >> Brick storage3:/data/data-cluster >> Status: Connected >> Number of entries in split-brain: 0 >> >> Brick storage4:/data/data-cluster >> Status: Connected >> Number of entries in split-brain: 0 >> >> The heal info (2 + 6) are there since yesterday and do not change. >> >> >>> If they are still there can you try doing a lookup on those entries from client and see whether they are getting healed? >> How can I do this having the gfid only? >> >> - Kindest regards, >> >> Milos Cuculovic >> IT Manager >> >> --- >> MDPI AG >> Postfach, CH-4020 Basel, Switzerland >> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >> Tel. +41 61 683 77 35 >> Fax +41 61 302 89 18 >> Email:?cuculovic at mdpi.com >> Skype: milos.cuculovic.mdpi >> >> Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. 
If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. >> >>> On 21 Mar 2019, at 14:34, Karthik Subrahmanya > wrote: >>> >>> Now the split-brain on the directory is resolved. >>> Are these entries which are there in the latest heal info output not getting healed? Are they still present in the heal info output? >>> If they are still there can you try doing a lookup on those entries from client and see whether they are getting healed? >>> >>> >>> On Thu, Mar 21, 2019 at 6:49 PM Milos Cuculovic > wrote: >>> Hey Karthik, >>> >>>> Can you run the "gluster volume heal"? >>> sudo gluster volume heal storage2 >>> Launching heal operation to perform index self heal on volume storage2 has been successful >>> Use heal info commands to check status. >>> >>>> "gluster volume heal info"? >>> sudo gluster volume heal storage2 info >>> Brick storage3:/data/data-cluster >>> >>> >>> Status: Connected >>> Number of entries: 2 >>> >>> Brick storage4:/data/data-cluster >>> >>> >>> >>> >>> >>> >>> Status: Connected >>> Number of entries: 6 >>> >>> >>> >>> - Kindest regards, >>> >>> Milos Cuculovic >>> IT Manager >>> >>> --- >>> MDPI AG >>> Postfach, CH-4020 Basel, Switzerland >>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>> Tel. +41 61 683 77 35 >>> Fax +41 61 302 89 18 >>> Email: cuculovic at mdpi.com >>> Skype: milos.cuculovic.mdpi >>> >>> Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone.
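[Editor's note] The gfid-to-path lookup described in this thread (ls -l for directories, find -samefile for files; the <brickpath>/.glusterfs/... placeholders were stripped by the archive's HTML scrubbing) can be sketched as follows. The brick path and gfid are the ones that appear verbatim elsewhere in the thread; everything is run on a node hosting the brick.

```shell
# A gfid's .glusterfs entry lives two levels deep: the first two and next
# two hex characters of the gfid form the directory names.
brick=/data/data-cluster
gfid=276fec9a-1c9b-4efe-9715-dcf4207e99b0
p1=$(printf %s "$gfid" | cut -c1-2)   # "27"
p2=$(printf %s "$gfid" | cut -c3-4)   # "6f"
link="$brick/.glusterfs/$p1/$p2/$gfid"
echo "$link"
# prints: /data/data-cluster/.glusterfs/27/6f/276fec9a-1c9b-4efe-9715-dcf4207e99b0
# For a directory, the .glusterfs entry is a symlink into its parent, so:
#   ls -l "$link"        # target reveals <parent-gfid>/<dirname>
# For a regular file, the entry is a hard link; recover the real name with:
#   find "$brick" -samefile "$link"
```

Following the symlink target back up (repeating the lookup on the parent gfid if needed) yields the full path of the entry pending heal.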
>>> >>>> On 21 Mar 2019, at 14:07, Karthik Subrahmanya > wrote: >>>> >>>> Hey Milos, >>>> >>>> I see that gfid got healed for those directories from the getfattr output and the glfsheal log also has messages corresponding to deleting the entries on one brick as part of healing which then got recreated on the brick with the correct gfid. Can you run the "guster volume heal " & "gluster volume heal info" command and paste the output here? >>>> If you still see entries pending heal, give the latest glustershd.log files from both the nodes along with the getfattr output of the files which are listed in the heal info output. >>>> >>>> Regards, >>>> Karthik >>>> >>>> On Thu, Mar 21, 2019 at 6:03 PM Milos Cuculovic > wrote: >>>> Sure: >>>> >>>> brick1: >>>> ???????????????????????????????????????????????????????????? >>>> ???????????????????????????????????????????????????????????? >>>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >>>> trusted.gfid=0xe358ff34504241d387efe1e76eb28bb0 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.dht.mds=0x00000000 >>>> >>>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>>> trusted.gfid=0x33585a577c4a4c55b39b9abc07eacff1 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.dht.mds=0x00000000 >>>> >>>> sudo getfattr -d -m . 
-e hex /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>>> trusted.gfid=0x99cdfe0773d74d7ab29592ce40cf47f3 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.dht.mds=0x00000000 >>>> >>>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>>> trusted.gfid=0x8a96a80ddf8a4163ac00bb12369389df >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.dht.mds=0x00000000 >>>> >>>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>>> trusted.gfid=0x04d087a96cc6421ca7bc356fc20d7753 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.dht.mds=0x00000000 >>>> >>>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>>> trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.dht.mds=0x00000000 >>>> ???????????????????????????????????????????????????????????? 
>>>> sudo stat /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >>>> File: '/data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59' >>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>> Device: 807h/2055d Inode: 40809094709 Links: 3 >>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>> Access: 2019-03-20 11:06:26.994047597 +0100 >>>> Modify: 2019-03-20 11:28:28.294689870 +0100 >>>> Change: 2019-03-21 13:01:03.077654239 +0100 >>>> Birth: - >>>> >>>> sudo stat /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>>> File: '/data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617' >>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>> Device: 807h/2055d Inode: 49399908865 Links: 3 >>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>> Access: 2019-03-20 11:07:20.342140927 +0100 >>>> Modify: 2019-03-20 11:28:28.318690015 +0100 >>>> Change: 2019-03-21 13:01:03.133654344 +0100 >>>> Birth: - >>>> >>>> sudo stat /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>>> File: '/data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf' >>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>> Device: 807h/2055d Inode: 53706303549 Links: 3 >>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>> Access: 2019-03-20 11:06:55.414097315 +0100 >>>> Modify: 2019-03-20 11:28:28.362690281 +0100 >>>> Change: 2019-03-21 13:01:03.141654359 +0100 >>>> Birth: - >>>> >>>> sudo stat /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>>> File: '/data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b' >>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>> Device: 807h/2055d Inode: 57990935591 Links: 3 >>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>> Access: 2019-03-20 11:07:08.558120309 +0100 >>>> Modify: 2019-03-20 11:28:14.226604869 +0100 >>>> Change: 2019-03-21 13:01:03.189654448 
+0100 >>>> Birth: - >>>> >>>> sudo stat /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>>> File: '/data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b' >>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>> Device: 807h/2055d Inode: 62291339781 Links: 3 >>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>> Access: 2019-03-20 11:06:02.070003998 +0100 >>>> Modify: 2019-03-20 11:28:28.458690861 +0100 >>>> Change: 2019-03-21 13:01:03.281654621 +0100 >>>> Birth: - >>>> >>>> sudo stat /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>>> File: '/data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e' >>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>> Device: 807h/2055d Inode: 66574223479 Links: 3 >>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>> Access: 2019-03-20 11:28:10.826584325 +0100 >>>> Modify: 2019-03-20 11:28:10.834584374 +0100 >>>> Change: 2019-03-20 14:06:07.937449353 +0100 >>>> Birth: - >>>> root at storage3:/var/log/glusterfs# >>>> ???????????????????????????????????????????????????????????? >>>> ???????????????????????????????????????????????????????????? >>>> >>>> brick2: >>>> ???????????????????????????????????????????????????????????? >>>> ???????????????????????????????????????????????????????????? >>>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>>> trusted.afr.dirty=0x000000000000000000000000 >>>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>>> trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.dht.mds=0x00000000 >>>> >>>> sudo getfattr -d -m . 
-e hex /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>>> trusted.gfid=0x33585a577c4a4c55b39b9abc07eacff1 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.dht.mds=0x00000000 >>>> >>>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>>> trusted.gfid=0x99cdfe0773d74d7ab29592ce40cf47f3 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.dht.mds=0x00000000 >>>> >>>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>>> trusted.gfid=0x8a96a80ddf8a4163ac00bb12369389df >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.dht.mds=0x00000000 >>>> >>>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>>> trusted.gfid=0x04d087a96cc6421ca7bc356fc20d7753 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.dht.mds=0x00000000 >>>> >>>> sudo getfattr -d -m . 
-e hex /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>>> trusted.afr.dirty=0x000000000000000000000000 >>>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>>> trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.dht.mds=0x00000000 >>>> >>>> ???????????????????????????????????????????????????????????? >>>> >>>> sudo stat /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >>>> File: '/data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59' >>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>> Device: 807h/2055d Inode: 42232631305 Links: 3 >>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>> Access: 2019-03-20 11:06:26.994047597 +0100 >>>> Modify: 2019-03-20 11:28:28.294689870 +0100 >>>> Change: 2019-03-21 13:01:03.078748131 +0100 >>>> Birth: - >>>> >>>> sudo stat /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>>> File: '/data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617' >>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>> Device: 807h/2055d Inode: 78589109305 Links: 3 >>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>> Access: 2019-03-20 11:07:20.342140927 +0100 >>>> Modify: 2019-03-20 11:28:28.318690015 +0100 >>>> Change: 2019-03-21 13:01:03.134748477 +0100 >>>> Birth: - >>>> >>>> sudo stat /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>>> File: '/data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf' >>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>> Device: 807h/2055d Inode: 54972096517 Links: 3 >>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>> Access: 2019-03-20 11:06:55.414097315 +0100 >>>> Modify: 2019-03-20 
11:28:28.362690281 +0100 >>>> Change: 2019-03-21 13:01:03.162748650 +0100 >>>> Birth: - >>>> >>>> sudo stat /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>>> File: '/data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b' >>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>> Device: 807h/2055d Inode: 40821259275 Links: 3 >>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>> Access: 2019-03-20 11:07:08.558120309 +0100 >>>> Modify: 2019-03-20 11:28:14.226604869 +0100 >>>> Change: 2019-03-21 13:01:03.194748848 +0100 >>>> Birth: - >>>> >>>> sudo stat /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>>> File: '/data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b' >>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>> Device: 807h/2055d Inode: 15876654 Links: 3 >>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>> Access: 2019-03-20 11:06:02.070003998 +0100 >>>> Modify: 2019-03-20 11:28:28.458690861 +0100 >>>> Change: 2019-03-21 13:01:03.282749392 +0100 >>>> Birth: - >>>> >>>> sudo stat /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>>> File: '/data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e' >>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>> Device: 807h/2055d Inode: 49408944650 Links: 3 >>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>> Access: 2019-03-20 11:28:10.826584325 +0100 >>>> Modify: 2019-03-20 11:28:10.834584374 +0100 >>>> Change: 2019-03-20 14:06:07.940849268 +0100 >>>> Birth: - >>>> ???????????????????????????????????????????????????????????? >>>> ???????????????????????????????????????????????????????????? >>>> >>>> The file is from brick 2 that I upgraded and started the heal on. >>>> >>>> >>>> - Kindest regards, >>>> >>>> Milos Cuculovic >>>> IT Manager >>>> >>>> --- >>>> MDPI AG >>>> Postfach, CH-4020 Basel, Switzerland >>>> Office: St. 
Alban-Anlage 66, 4052 Basel, Switzerland >>>> Tel. +41 61 683 77 35 >>>> Fax +41 61 302 89 18 >>>> Email:?cuculovic at mdpi.com >>>> Skype: milos.cuculovic.mdpi >>>> >>>> Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. >>>> >>>>> On 21 Mar 2019, at 13:05, Karthik Subrahmanya > wrote: >>>>> >>>>> Can you give me the stat & getfattr output of all those 6 entries from both the bricks and the glfsheal-.log file from the node where you run this command? >>>>> Meanwhile can you also try running this with the source-brick option? >>>>> >>>>> On Thu, Mar 21, 2019 at 5:22 PM Milos Cuculovic > wrote: >>>>> Thank you Karthik, >>>>> >>>>> I have run this for all files (see example below) and it says the file is not in split-brain: >>>>> >>>>> sudo gluster volume heal storage2 split-brain latest-mtime /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >>>>> Healing /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 failed: File not in split-brain. >>>>> Volume heal failed. >>>>> >>>>> >>>>> - Kindest regards, >>>>> >>>>> Milos Cuculovic >>>>> IT Manager >>>>> >>>>> --- >>>>> MDPI AG >>>>> Postfach, CH-4020 Basel, Switzerland >>>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>>>> Tel. +41 61 683 77 35 >>>>> Fax +41 61 302 89 18 >>>>> Email:?cuculovic at mdpi.com >>>>> Skype: milos.cuculovic.mdpi >>>>> >>>>> Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. 
You may not copy this message in its entirety or in part, or disclose its contents to anyone. >>>>> >>>>>> On 21 Mar 2019, at 12:36, Karthik Subrahmanya > wrote: >>>>>> >>>>>> Hi Milos, >>>>>> >>>>>> Thanks for the logs and the getfattr output. >>>>>> From the logs I can see that there are 6 entries under the directory "/data/data-cluster/dms/final_archive" named >>>>>> 41be9ff5ec05c4b1c989c6053e709e59 >>>>>> 5543982fab4b56060aa09f667a8ae617 >>>>>> a8b7f31775eebc8d1867e7f9de7b6eaf >>>>>> c1d3f3c2d7ae90e891e671e2f20d5d4b >>>>>> e5934699809a3b6dcfc5945f408b978b >>>>>> e7cdc94f60d390812a5f9754885e119e >>>>>> which are having gfid mismatch, so the heal is failing on this directory. >>>>>> >>>>>> You can use the CLI option to resolve these files from gfid mismatch. You can use any of the 3 methods available: >>>>>> 1. bigger-file >>>>>> gluster volume heal split-brain bigger-file >>>>>> >>>>>> 2. latest-mtime >>>>>> gluster volume heal split-brain latest-mtime >>>>>> >>>>>> 3. source-brick >>>>>> gluster volume heal split-brain source-brick >>>>>> >>>>>> where must be absolute path w.r.t. the volume, starting with '/'. >>>>>> If all those entries are directories then go for either latest-mtime/source-brick option. >>>>>> After you resolve all these gfid-mismatches, run the "gluster volume heal " command. Then check the heal info and let me know the result. >>>>>> >>>>>> Regards, >>>>>> Karthik >>>>>> >>>>>> On Thu, Mar 21, 2019 at 4:27 PM Milos Cuculovic > wrote: >>>>>> Sure, thank you for following up. >>>>>> >>>>>> About the commands, here is what I see: >>>>>> >>>>>> brick1: >>>>>> ????????????????????????????????????? 
>>>>>> sudo gluster volume heal storage2 info >>>>>> Brick storage3:/data/data-cluster >>>>>> >>>>>> >>>>>> /dms/final_archive - Possibly undergoing heal >>>>>> >>>>>> Status: Connected >>>>>> Number of entries: 3 >>>>>> >>>>>> Brick storage4:/data/data-cluster >>>>>> >>>>>> /dms/final_archive - Possibly undergoing heal >>>>>> >>>>>> Status: Connected >>>>>> Number of entries: 2 >>>>>> ????????????????????????????????????? >>>>>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive >>>>>> getfattr: Removing leading '/' from absolute path names >>>>>> # file: data/data-cluster/dms/final_archive >>>>>> trusted.afr.dirty=0x000000000000000000000000 >>>>>> trusted.afr.storage2-client-1=0x000000000000000000000010 >>>>>> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87 >>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>>>> trusted.glusterfs.dht.mds=0x00000000 >>>>>> ????????????????????????????????????? >>>>>> stat /data/data-cluster/dms/final_archive >>>>>> File: '/data/data-cluster/dms/final_archive' >>>>>> Size: 3497984 Blocks: 8768 IO Block: 4096 directory >>>>>> Device: 807h/2055d Inode: 26427748396 Links: 72123 >>>>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>>>> Access: 2018-10-09 04:22:40.514629044 +0200 >>>>>> Modify: 2019-03-21 11:55:37.382278863 +0100 >>>>>> Change: 2019-03-21 11:55:37.382278863 +0100 >>>>>> Birth: - >>>>>> ????????????????????????????????????? >>>>>> ????????????????????????????????????? >>>>>> >>>>>> brick2: >>>>>> ????????????????????????????????????? >>>>>> sudo gluster volume heal storage2 info >>>>>> Brick storage3:/data/data-cluster >>>>>> >>>>>> >>>>>> /dms/final_archive - Possibly undergoing heal >>>>>> >>>>>> Status: Connected >>>>>> Number of entries: 3 >>>>>> >>>>>> Brick storage4:/data/data-cluster >>>>>> >>>>>> /dms/final_archive - Possibly undergoing heal >>>>>> >>>>>> Status: Connected >>>>>> Number of entries: 2 >>>>>> ????????????????????????????????????? 
>>>>>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive >>>>>> getfattr: Removing leading '/' from absolute path names >>>>>> # file: data/data-cluster/dms/final_archive >>>>>> trusted.afr.dirty=0x000000000000000000000000 >>>>>> trusted.afr.storage2-client-0=0x000000000000000000000001 >>>>>> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87 >>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>>>> trusted.glusterfs.dht.mds=0x00000000 >>>>>> ????????????????????????????????????? >>>>>> stat /data/data-cluster/dms/final_archive >>>>>> File: '/data/data-cluster/dms/final_archive' >>>>>> Size: 3497984 Blocks: 8760 IO Block: 4096 directory >>>>>> Device: 807h/2055d Inode: 13563551265 Links: 72124 >>>>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) >>>>>> Access: 2018-10-09 04:22:40.514629044 +0200 >>>>>> Modify: 2019-03-21 11:55:46.382565124 +0100 >>>>>> Change: 2019-03-21 11:55:46.382565124 +0100 >>>>>> Birth: - >>>>>> ????????????????????????????????????? >>>>>> >>>>>> Hope this helps. >>>>>> >>>>>> - Kindest regards, >>>>>> >>>>>> Milos Cuculovic >>>>>> IT Manager >>>>>> >>>>>> --- >>>>>> MDPI AG >>>>>> Postfach, CH-4020 Basel, Switzerland >>>>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>>>>> Tel. +41 61 683 77 35 >>>>>> Fax +41 61 302 89 18 >>>>>> Email:?cuculovic at mdpi.com >>>>>> Skype: milos.cuculovic.mdpi >>>>>> >>>>>> Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. 
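[Editor's note] The trusted.afr.* values in the getfattr output above can be read as three big-endian 32-bit counters: pending data, metadata, and entry operations that this brick believes the named client (i.e. the other brick) still needs healed. This layout is AFR's changelog convention; the decoding below is an illustrative sketch, not a Gluster tool.

```shell
# Decode trusted.afr.storage2-client-1 = 0x000000000000000000000010,
# the value seen on brick storage3 in the output above.
xattr=000000000000000000000010
data=$((  0x$(printf %s "$xattr" | cut -c1-8)  ))   # pending data ops
meta=$((  0x$(printf %s "$xattr" | cut -c9-16) ))   # pending metadata ops
entry=$(( 0x$(printf %s "$xattr" | cut -c17-24) ))  # pending entry ops
echo "pending: data=$data metadata=$meta entry=$entry"
# prints: pending: data=0 metadata=0 entry=16
```

A non-zero entry counter here would mean storage3 is still accusing storage4 (client-1) of pending entry operations under this directory, which is consistent with heal info repeatedly listing it.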
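[Editor's note] Once the gfid-mismatched entries are identified, the three CLI resolution policies quoted earlier in the thread (whose <VOLNAME> and <FILE> placeholders were stripped by the archiver) would look roughly like this. The volume name and entry path are taken from the thread; the choice of source brick is a hypothetical example, and the commands are only echoed here as a dry run since they require a live cluster.

```shell
vol=storage2
entry=/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59  # path w.r.t. the volume root
srcbrick=storage4:/data/data-cluster                       # hypothetical source choice

# 1. bigger-file: keep the larger copy (files only, not directories)
echo gluster volume heal "$vol" split-brain bigger-file "$entry"
# 2. latest-mtime: keep the most recently modified copy (usable for directories)
echo gluster volume heal "$vol" split-brain latest-mtime "$entry"
# 3. source-brick: keep whatever the named brick holds
echo gluster volume heal "$vol" split-brain source-brick "$srcbrick" "$entry"
```

As noted above, the path must be given relative to the volume root and start with '/'; after resolving the mismatches, run "gluster volume heal storage2" and re-check heal info.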
>>>>>> >>>>>>> On 21 Mar 2019, at 11:43, Karthik Subrahmanya > wrote: >>>>>>> >>>>>>> Can you attach the "glustershd.log" file which will be present under "/var/log/glusterfs/" from both the nodes and the "stat" & "getfattr -d -m . -e hex " output of all the entries listed in the heal info output from both the bricks? >>>>>>> >>>>>>> On Thu, Mar 21, 2019 at 3:54 PM Milos Cuculovic > wrote: >>>>>>> Thanks Karthik! >>>>>>> >>>>>>> I was trying to find some resolution methods from [2] but unfortunately none worked (I can explain what I tried if needed). >>>>>>> >>>>>>>> I guess the volume you are talking about is of type replica-2 (1x2). >>>>>>> That's correct, aware of the arbiter solution but still didn't take time to implement it. >>>>>>> >>>>>>> From the info results I posted, how do I know which situation I am in? No files are mentioned in split brain, only directories. One brick has 3 entries and one has two entries. >>>>>>> >>>>>>> sudo gluster volume heal storage2 info >>>>>>> [sudo] password for sshadmin: >>>>>>> Brick storage3:/data/data-cluster >>>>>>> >>>>>>> >>>>>>> /dms/final_archive - Possibly undergoing heal >>>>>>> >>>>>>> Status: Connected >>>>>>> Number of entries: 3 >>>>>>> >>>>>>> Brick storage4:/data/data-cluster >>>>>>> >>>>>>> /dms/final_archive - Possibly undergoing heal >>>>>>> >>>>>>> Status: Connected >>>>>>> Number of entries: 2 >>>>>>> >>>>>>> - Kindest regards, >>>>>>> >>>>>>> Milos Cuculovic >>>>>>> IT Manager >>>>>>> >>>>>>> --- >>>>>>> MDPI AG >>>>>>> Postfach, CH-4020 Basel, Switzerland >>>>>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>>>>>> Tel. +41 61 683 77 35 >>>>>>> Fax +41 61 302 89 18 >>>>>>> Email: cuculovic at mdpi.com >>>>>>> Skype: milos.cuculovic.mdpi >>>>>>> >>>>>>> Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed.
If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone. >>>>>>> >>>>>>>> On 21 Mar 2019, at 10:27, Karthik Subrahmanya > wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> Note: I guess the volume you are talking about is of type replica-2 (1x2). Usually replica 2 volumes are prone to split-brain. If you can consider converting them to arbiter or replica-3, they will handle most of the cases which can lead to split-brains. For more information see [1]. >>>>>>>> >>>>>>>> Resolving the split-brain: [2] talks about how to interpret the heal info output and different ways to resolve them using the CLI/manually/using the favorite-child-policy. >>>>>>>> If you are having an entry split-brain, and it is a gfid split-brain (file/dir having different gfids on the replica bricks) then you can use the CLI option to resolve them. If a directory is in gfid split-brain in a distributed-replicate volume and you are using the source-brick option please make sure you use the brick of this subvolume, which has the same gfid as that of the other distribute subvolume(s) where you have the correct gfid, as the source. >>>>>>>> If you are having a type mismatch then follow the steps in [3] to resolve the split-brain.
>>>>>>>> >>>>>>>> [1] https://docs.gluster.org/en/v3/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/ >>>>>>>> [2] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/ >>>>>>>> [3] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#dir-split-brain >>>>>>>> >>>>>>>> HTH, >>>>>>>> Karthik >>>>>>>> >>>>>>>> On Thu, Mar 21, 2019 at 1:45 PM Milos Cuculovic > wrote: >>>>>>>> I was now able to catch the split brain log: >>>>>>>> >>>>>>>> sudo gluster volume heal storage2 info >>>>>>>> Brick storage3:/data/data-cluster >>>>>>>> >>>>>>>> >>>>>>>> /dms/final_archive - Is in split-brain >>>>>>>> >>>>>>>> Status: Connected >>>>>>>> Number of entries: 3 >>>>>>>> >>>>>>>> Brick storage4:/data/data-cluster >>>>>>>> >>>>>>>> /dms/final_archive - Is in split-brain >>>>>>>> >>>>>>>> Status: Connected >>>>>>>> Number of entries: 2 >>>>>>>> >>>>>>>> Milos >>>>>>>> >>>>>>>>> On 21 Mar 2019, at 09:07, Milos Cuculovic > wrote: >>>>>>>>> >>>>>>>>> Since 24h, after upgrading from 4.0 to 4.1.7 one of the servers, the heal shows this: >>>>>>>>> >>>>>>>>> sudo gluster volume heal storage2 info >>>>>>>>> Brick storage3:/data/data-cluster >>>>>>>>> >>>>>>>>> >>>>>>>>> /dms/final_archive - Possibly undergoing heal >>>>>>>>> >>>>>>>>> Status: Connected >>>>>>>>> Number of entries: 3 >>>>>>>>> >>>>>>>>> Brick storage4:/data/data-cluster >>>>>>>>> >>>>>>>>> /dms/final_archive - Possibly undergoing heal >>>>>>>>> >>>>>>>>> Status: Connected >>>>>>>>> Number of entries: 2 >>>>>>>>> >>>>>>>>> The same files stay there. 
From time to time the status of the /dms/final_archive is in split brain as the following command shows: >>>>>>>>> >>>>>>>>> sudo gluster volume heal storage2 info split-brain >>>>>>>>> Brick storage3:/data/data-cluster >>>>>>>>> /dms/final_archive >>>>>>>>> Status: Connected >>>>>>>>> Number of entries in split-brain: 1 >>>>>>>>> >>>>>>>>> Brick storage4:/data/data-cluster >>>>>>>>> /dms/final_archive >>>>>>>>> Status: Connected >>>>>>>>> Number of entries in split-brain: 1 >>>>>>>>> >>>>>>>>> How to know the file which is in split brain? The files in /dms/final_archive are not very important, fine to remove (ideally resolve the split brain) for the ones that differ. >>>>>>>>> >>>>>>>>> I can only see the directory and GFID. Any idea on how to resolve this situation as I would like to continue with the upgrade on the 2nd server, and for this the heal needs to be done with 0 entries in sudo gluster volume heal storage2 info >>>>>>>>> >>>>>>>>> Thank you in advance, Milos. >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Gluster-users mailing list >>>>>>>> Gluster-users at gluster.org >>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: glustershd_storage4_brick2.log Type: application/octet-stream Size: 741124 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: glustershd_storage3._brick1log.log Type: application/octet-stream Size: 1198992 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed...
URL: From rightkicktech at gmail.com Fri Mar 22 16:42:31 2019 From: rightkicktech at gmail.com (Alex K) Date: Fri, 22 Mar 2019 18:42:31 +0200 Subject: [Gluster-users] Gluster and bonding In-Reply-To: References: Message-ID: Hi all, I had the opportunity to test the setup on actual hardware, as I managed to arrange for a downtime at customer. The results were that, when cables were split between two switches, even though servers were able to ping each other, gluster was not able to start the volumes and the only relevant log I noticed was: [2019-03-21 14:16:15.043714] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: *Staging failed* on gluster2. Please check log file for details. [2019-03-21 14:16:15.044034] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on gluster2. Please check log file for details. [2019-03-21 14:16:15.044292] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on gluster2. Please check log file for details. [2019-03-21 14:49:11.278724] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on gluster2. Please check log file for details. [2019-03-21 14:49:40.904596] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on gluster1. Please check log file for details. Does anyone have any idea what this staging error means? I don't have the hardware anymore available for testing and I will try to reproduce on virtual env. Thanx Alex On Mon, Mar 18, 2019 at 12:52 PM Alex K wrote: > Performed some tests simulating the setup on OVS. > When using mode 6 I had mixed results for both scenarios (see below): > > [image: image.png] > > There were times that hosts were not able to reach each other (simple ping > tests) and other time where hosts were able to reach each other with ping > but gluster volumes were down due to connectivity issues being reported > (endpoint is not connected).
systemctl restart network usually resolved the > gluster connectivity issue. This was regardless of the scenario (interlink > or not). I will need to do some more tests. > > On Tue, Feb 26, 2019 at 4:14 PM Alex K wrote: >> >> >> Thank you to all for your suggestions. >> >> I came here since only gluster was having issues to start. Ping and other >> networking services were showing everything fine, so I guess there is sth >> at gluster that does not like what I tried to do. >> Unfortunately I have this system in production and I cannot experiment. >> It was a customer request to add redundancy to the switch and I went with >> what I assumed would work. >> I guess I have to have the switches stacked, but the current ones do not >> support this. They are just simple managed switches. >> >> Multiple IPs per peers could be a solution. >> I will search a little more and in case I have sth I will get back. >> >> On Tue, Feb 26, 2019 at 6:52 AM Strahil wrote: >>> >>> Hi Alex, >>> >>> As per the following ( https:// >>> community.cisco.com/t5/switching/lacp-load-balancing-in-2-switches-part-of-3750-stack-switch/td-p/2268111 >>> ) your switches need to be stacked in order to support lacp with your setup. >>> Yet, I'm not sure if balance-alb will work with 2 separate switches - >>> maybe some special configuration is needed ?!? >>> As far as I know gluster can have multiple IPs matched to a single peer, >>> but I'm not sure if having 2 separate networks will be used as >>> active-backup or active-active. >>> >>> Someone more experienced should jump in. >>> >>> Best Regards, >>> Strahil Nikolov >>> On Feb 25, 2019 12:43, Alex K wrote: >>> >>> Hi All, >>> >>> I was asking if it is possible to have the two separate cables connected >>> to two different physical switches. When trying mode6 or mode1 in this >>> setup gluster was refusing to start the volumes, giving me "transport >>> endpoint is not connected".
>>> >>> server1: cable1 ---------------- switch1 --------------------- server2: >>> cable1 >>> | >>> server1: cable2 ---------------- switch2 --------------------- server2: >>> cable2 >>> >>> Both switches are connected with each other also. This is done to >>> achieve redundancy for the switches. >>> When disconnecting cable2 from both servers, then gluster was happy. >>> What could be the problem? >>> >>> Thanx, >>> Alex >>> >>> >>> On Mon, Feb 25, 2019 at 11:32 AM Jorick Astrego >>> wrote: >>> >>> Hi, >>> >>> We use bonding mode 6 (balance-alb) for GlusterFS traffic >>> >>> >>> >>> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html/administration_guide/network4 >>> >>> Preferred bonding mode for Red Hat Gluster Storage client is mode 6 >>> (balance-alb), this allows client to transmit writes in parallel on >>> separate NICs much of the time. >>> >>> Regards, >>> >>> Jorick Astrego >>> On 2/25/19 5:41 AM, Dmitry Melekhov wrote: >>> >>> 23.02.2019 19:54, Alex K wrote: >>> >>> Hi all, >>> >>> I have a replica 3 setup where each server was configured with a dual >>> interfaces in mode 6 bonding. All cables were connected to one common >>> network switch. >>> >>> To add redundancy to the switch, and avoid being a single point of >>> failure, I connected each second cable of each server to a second switch. >>> This turned out to not function as gluster was refusing to start the volume >>> logging "transport endpoint is disconnected" although all nodes were able >>> to reach each other (ping) in the storage network. I switched the mode to >>> mode 1 (active/passive) and initially it worked but following a reboot of >>> all cluster same issue appeared. Gluster is not starting the volumes. >>> >>> Isn't active/passive supposed to work like that? Can one have such >>> redundant network setup or are there any other recommended approaches? >>> >>> >>> Yes, we use lacp, I guess this is mode 4 ( we use teamd ), it is, no >>> doubt, best way.
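Whether both links of a bond actually joined one 802.3ad aggregate can be read from the kernel's bonding state files. A diagnostic sketch only; the bond device name is whatever your distribution created, and nothing below comes from the thread itself:

```shell
# Show the negotiated mode and per-slave aggregator IDs for every bond.
# With two non-stacked switches, slaves landing in *different* aggregator
# IDs is the classic sign that LACP cannot span the switches.
for bond in /proc/net/bonding/*; do
    [ -e "$bond" ] || continue   # no bonding configured on this host
    echo "== $bond =="
    grep -E 'Bonding Mode|MII Status|Aggregator ID' "$bond"
done
```

Run on each server while pulling one cable at a time to see which slave the kernel considers up.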
>>> >>> >>> Thanx, >>> Alex >>> >>> _______________________________________________ >>> Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 31291 bytes Desc: not available URL: From sunkumar at redhat.com Fri Mar 22 16:52:36 2019 From: sunkumar at redhat.com (Sunny Kumar) Date: Fri, 22 Mar 2019 22:22:36 +0530 Subject: [Gluster-users] Geo-replication status always on 'Created' In-Reply-To: References: Message-ID: Hi Maurya, Looks like the hook script failed to set permissions for azureuser on "/var/log/glusterfs". You can assign the permissions manually for that directory, then it will work. -Sunny On Fri, Mar 22, 2019 at 2:07 PM Maurya M wrote: > > hi Sunny, > Passwordless ssh to : > > ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 azureuser at 172.16.201.35 > > is login, but when the whole command is run getting permission issues again:: > > ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 azureuser at 172.16.201.35 gluster --xml --remote-host=localhost volume info vol_a5aee81a873c043c99a938adcb5b5781 -v > ERROR: failed to create logfile "/var/log/glusterfs/cli.log" (Permission denied) > ERROR: failed to open logfile /var/log/glusterfs/cli.log > > any idea here ?
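The "Permission denied" on /var/log/glusterfs/cli.log matches Sunny's diagnosis above. One way the fix could look, run as root on each slave node; the group-based variant below is an assumption on my part, and an ACL (setfacl -m u:azureuser:rwx /var/log/glusterfs) would do equally well:

```shell
# Let the non-root geo-rep user write under the gluster log directory so
# that remotely invoked gluster CLI commands can create cli.log.
fix_log_perms() {
    logdir=$1 georep_group=$2
    chgrp "$georep_group" "$logdir" 2>/dev/null || true  # needs root on a real node
    chmod g+rwx "$logdir"
}
# fix_log_perms /var/log/glusterfs azureuser
```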
> > thanks, > Maurya > > > On Thu, Mar 21, 2019 at 2:43 PM Maurya M wrote: >> >> hi Sunny, >> i did use the [1] link for the setup, when i encountered this error during ssh-copy-id : (so setup the passwordless ssh, by manually copied the private/ public keys to all the nodes , both master & slave) >> >> [root at k8s-agentpool1-24779565-1 ~]# ssh-copy-id geouser at xxx.xx.xxx.x >> /usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub" >> The authenticity of host ' xxx.xx.xxx.x ( xxx.xx.xxx.x )' can't be established. >> ECDSA key fingerprint is SHA256:B2rNaocIcPjRga13oTnopbJ5KjI/7l5fMANXc+KhA9s. >> ECDSA key fingerprint is MD5:1b:70:f9:7a:bf:35:33:47:0c:f2:c1:cd:21:e2:d3:75. >> Are you sure you want to continue connecting (yes/no)? yes >> /usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed >> /usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys >> Permission denied (publickey). >> >> To start afresh what all needs to teardown / delete, do we have any script for it ? where all the pem keys do i need to delete? >> >> thanks, >> Maurya >> >> On Thu, Mar 21, 2019 at 2:12 PM Sunny Kumar wrote: >>> >>> Hey you can start a fresh I think you are not following proper setup steps. >>> >>> Please follow these steps [1] to create geo-rep session, you can >>> delete the old one and do a fresh start. Or alternative you can use >>> this tool[2] to setup geo-rep. >>> >>> >>> [1]. https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/ >>> [2]. 
http://aravindavk.in/blog/gluster-georep-tools/ >>> >>> >>> /Sunny >>> >>> On Thu, Mar 21, 2019 at 11:28 AM Maurya M wrote: >>> > >>> > Hi Sunil, >>> > I did run the on the slave node : >>> > /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh azureuser vol_041afbc53746053368a1840607636e97 vol_a5aee81a873c043c99a938adcb5b5781 >>> > getting this message "/home/azureuser/common_secret.pem.pub not present. Please run geo-replication command on master with push-pem option to generate the file" >>> > >>> > So went back and created the session again, no change, so manually copied the common_secret.pem.pub to /home/azureuser/ but still the set_geo_rep_pem_keys.sh is looking the pem file in different name : COMMON_SECRET_PEM_PUB=${master_vol}_${slave_vol}_common_secret.pem.pub , change the name of pem , ran the command again : >>> > >>> > /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh azureuser vol_041afbc53746053368a1840607636e97 vol_a5aee81a873c043c99a938adcb5b5781 >>> > Successfully copied file. >>> > Command executed successfully. >>> > >>> > >>> > - went back and created the session , start the geo-replication , still seeing the same error in logs. Any ideas ? >>> > >>> > thanks, >>> > Maurya >>> > >>> > >>> > >>> > On Wed, Mar 20, 2019 at 11:07 PM Sunny Kumar wrote: >>> >> >>> >> Hi Maurya, >>> >> >>> >> I guess you missed last trick to distribute keys in slave node. I see >>> >> this is non-root geo-rep setup so please try this: >>> >> >>> >> >>> >> Run the following command as root in any one of Slave node. >>> >> >>> >> /usr/local/libexec/glusterfs/set_geo_rep_pem_keys.sh >>> >> >>> >> >>> >> - Sunny >>> >> >>> >> On Wed, Mar 20, 2019 at 10:47 PM Maurya M wrote: >>> >> > >>> >> > Hi all, >>> >> > Have setup a 3 master nodes - 3 slave nodes (gluster 4.1) for geo-replication, but once have the geo-replication configure the status is always on "Created', >>> >> > even after have force start the session. 
>>> >> > >>> >> > On close inspect of the logs on the master node seeing this error: >>> >> > >>> >> > "E [syncdutils(monitor):801:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 azureuser at xxxxx.xxxx..xxx. gluster --xml --remote-host=localhost volume info vol_a5ae34341a873c043c99a938adcb5b5781 error=255" >>> >> > >>> >> > Any ideas what is issue? >>> >> > >>> >> > thanks, >>> >> > Maurya >>> >> > >>> >> > _______________________________________________ >>> >> > Gluster-users mailing list >>> >> > Gluster-users at gluster.org >>> >> > https://lists.gluster.org/mailman/listinfo/gluster-users From xiubli at redhat.com Sat Mar 23 00:47:59 2019 From: xiubli at redhat.com (Xiubo Li) Date: Sat, 23 Mar 2019 08:47:59 +0800 Subject: [Gluster-users] Network Block device (NBD) on top of glusterfs In-Reply-To: References: Message-ID: On 2019/3/21 11:29, Xiubo Li wrote: > > All, > > I am one of the contributors for the gluster-block > [1] project, and also I > contribute to the linux kernel and open-iscsi > project. [2] > > NBD was around for some time, but in recent time, the linux kernel's > Network Block Device (NBD) is enhanced and made to work with more > devices and also the option to integrate with netlink is added. So, I > tried to provide a glusterfs client based NBD driver recently. Please > refer to github issue #633 > [3], and good news is > I have a working code, with most basic things @nbd-runner project > [4]. > As mentioned, the nbd-runner (NBD proto) will work in the same layer with tcmu-runner (iSCSI proto); this is not trying to replace the gluster-block/ceph-iscsi-gateway great projects. It just provides the common library to do the low level stuff, like the sysfs/netlink operations and the IOs from the nbd kernel socket, and the great tcmu-runner project is doing the sysfs/uio operations and IOs from the kernel SCSI/iSCSI.
The nbd-cli tool will work like the iscsi-initiator-utils, and the nbd-runner daemon will work like the tcmu-runner daemon, that's all. In tcmu-runner, the different backend storages have separate handlers: the glfs.c handler for Gluster, the rbd.c handler for Ceph, etc. And the handlers here are doing the actual IOs with the backend storage services once the IO paths setup are done by ceph-iscsi-gateway/gluster-block.... Then we can support all the kinds of backend storages, like the Gluster/Ceph/Azure... as one separate handler in nbd-runner, which need not care about the low-level NBD updates and changes. Thanks. > While this email is about announcing the project, and asking for more > collaboration, I would also like to discuss more about the placement > of the project itself. Currently the nbd-runner project is expected to be > shared by our friends at the Ceph project too, to provide an NBD driver for > Ceph. I have personally worked with some of them closely while > contributing to the open-iSCSI project, and we would like to take this > project to great success. > > Now a few questions: > > 1. Can I continue to use http://github.com/gluster/nbd-runner as home > for this project, even if it's shared by other filesystem projects? > > * I personally am fine with this. > > 2. Should there be a separate organization for this repo? > > * While it may make sense in future, for now, I am not planning to > start any new thing. > > It would be great if we have some consensus on this soon as nbd-runner > is a new repository. If there are no concerns, I will continue to > contribute to the existing repository.
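For orientation, the classic kernel NBD flow that nbd-runner automates looks roughly like the following; the export and host names are placeholders, the commands come from the generic nbd userspace tools, and nbd-cli's own syntax may well differ:

```shell
# The generic nbd-client flow (from the nbd userspace tools, not from
# nbd-runner): attach a remote export to /dev/nbdX, use it like a disk.
nbd_flow=$(cat <<'EOF'
modprobe nbd
nbd-client -N myexport nbd-server-host /dev/nbd0   # connect the export
mount /dev/nbd0 /mnt                               # then a normal block device
nbd-client -d /dev/nbd0                            # disconnect
EOF
)
printf '%s\n' "$nbd_flow"
```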
> > Regards, > Xiubo Li (@lxbsz) > > [1] - https://github.com/gluster/gluster-block > [2] - https://github.com/open-iscsi > [3] - https://github.com/gluster/glusterfs/issues/633 > [4] - https://github.com/gluster/nbd-runner > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From mangoo at wpkg.org Sat Mar 23 03:39:48 2019 From: mangoo at wpkg.org (Tomasz Chmielewski) Date: Sat, 23 Mar 2019 12:39:48 +0900 Subject: [Gluster-users] "gluster volume heal info" does not show all bricks Message-ID: There are three replicated bricks: repo01, repo02 and repo03. All bricks are online and show the same info for commands like "gluster volume info" or "gluster volume status"; "gluster peer status" show the other bricks connected. However, "gluster volume heal storage info" only shows the first two bricks - does anyone have an idea why? If it matters, repo03 was added later. Running gluster 5.5.
# gluster volume heal storage info Brick repo01:/gluster/data Status: Connected Number of entries: 0 Brick repo02:/gluster/data Status: Connected Number of entries: 0 Other info: # gluster volume info Volume Name: storage Type: Replicate Volume ID: 8e533781-01fc-4c8a-b220-9691346fbe3c Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: repo01:/gluster/data Brick2: repo02:/gluster/data Brick3: repo03:/gluster/data Options Reconfigured: transport.address-family: inet performance.readdir-ahead: on nfs.disable: on auth.allow: 127.0.0.1,10.192.0.30,10.192.0.31,10.192.0.32 # gluster volume status Status of volume: storage Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick repo01:/gluster/data 49153 0 Y 1829 Brick repo02:/gluster/data 49153 0 Y 81077 Brick repo03:/gluster/data 49153 0 Y 2497 Self-heal Daemon on localhost N/A N/A Y 81100 Self-heal Daemon on 10.192.0.30 N/A N/A Y 1852 Self-heal Daemon on repo03 N/A N/A Y 2520 Task Status of Volume storage ------------------------------------------------------------------------------ There are no active volume tasks Tomasz Chmielewski https://lxadm.com From hunter86_bg at yahoo.com Sat Mar 23 19:50:20 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Sat, 23 Mar 2019 21:50:20 +0200 Subject: [Gluster-users] Gluster and bonding Message-ID: Hi Alex, Did you setup LACP using links to both switches ? Best Regards, Strahil Nikolov On Mar 22, 2019 18:42, Alex K wrote: > > Hi all, > > I had the opportunity to test the setup on actual hardware, as I managed to arrange for a downtime at customer. 
> > The results were that, when cables were split between two switches, even though servers were able to ping each other, gluster was not able to start the volumes and the only relevant log I noticed was: > > [2019-03-21 14:16:15.043714] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on gluster2. Please check log file for details. > [2019-03-21 14:16:15.044034] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on gluster2. Please check log file for details. > [2019-03-21 14:16:15.044292] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on gluster2. Please check log file for details. > [2019-03-21 14:49:11.278724] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on gluster2. Please check log file for details. > [2019-03-21 14:49:40.904596] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on gluster1. Please check log file for details. > > Does anyone has any idea what does this staging error mean? > I don't have the hardware anymore available for testing and I will try to reproduce on virtual env. > > Thanx > Alex > > On Mon, Mar 18, 2019 at 12:52 PM Alex K wrote: >> >> Performed some tests simulating the setup on OVS. >> When using mode 6 I had mixed results for both scenarios (see below): >> >> >> >> There were times that hosts were not able to reach each other (simple ping tests) and other time where hosts were able to reach each other with ping but gluster volumes were down due to connectivity issues being reported (endpoint is not connected). systemctl restart network usually resolved the gluster connectivity issue. This was regardless of the scenario (interlink or not). I will need to do some more tests. >> >> On Tue, Feb 26, 2019 at 4:14 PM Alex K wrote: >>> >>> >>> Thank you to all for your suggestions. >>> >>> I came here since only gluster was having issues to start. 
Ping and other networking services were showing everything fine, so I guess there is sth at gluster that does not like what I tried to do. >>> Unfortunately I have this system in production and I cannot experiment. It was a customer request to add redundancy to the switch and I went with what I assumed would work. >>> I guess I have to have the switches stacked, but the current ones do not support this. They are just simple managed switches. >>> >>> Multiple IPs per peers could be a solution. >>> I will search a little more and in case I have sth I will get back. >>> >>> On Tue, Feb 26, 2019 at 6:52 AM Strahil wrote: >>>> >>>> Hi Alex, >>>> >>>> As per the following ( https://community.cisco.com/t5/switching/lacp-load-balancing-in-2-switches-part-of-3750-stack-switch/td-p/2268111 ) your switches need to be stacked in order to support lacp with your setup. >>>> Yet, I'm not sure if balance-alb will work with 2 separate switches - maybe some special configuration is needed ?!? >>>> As far as I know gluster can have multiple IPs matched to a single peer, but I'm not sure if having 2 separate networks will be used as active-backup or active-active. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Sat Mar 23 19:54:16 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Sat, 23 Mar 2019 21:54:16 +0200 Subject: [Gluster-users] "gluster volume heal info" does not show all bricks Message-ID: Hi Tomasz, Do you have a firewall in between the nodes? Can you test with local firewall (on each node) down ? Best Regards, Strahil Nikolov On Mar 23, 2019 05:39, Tomasz Chmielewski wrote: > > There are three replicated bricks: repo01, repo02 and repo03. > > All bricks are online and show the same info for commands like "gluster > volume info" or "gluster volume status"; "gluster peer status" show the > other bricks connected.
However, "gluster volume heal storage info" only > shows the first two bricks - does anyone have an idea why? If it > matters, repo03 was added later. Running gluster 5.5. > > > # gluster volume heal storage info > Brick repo01:/gluster/data > Status: Connected > Number of entries: 0 > > Brick repo02:/gluster/data > Status: Connected > Number of entries: 0 > > > > Other info: > > > # gluster volume info > > Volume Name: storage > Type: Replicate > Volume ID: 8e533781-01fc-4c8a-b220-9691346fbe3c > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x 3 = 3 > Transport-type: tcp > Bricks: > Brick1: repo01:/gluster/data > Brick2: repo02:/gluster/data > Brick3: repo03:/gluster/data > Options Reconfigured: > transport.address-family: inet > performance.readdir-ahead: on > nfs.disable: on > auth.allow: 127.0.0.1,10.192.0.30,10.192.0.31,10.192.0.32 > > > > # gluster volume status > Status of volume: storage > Gluster process TCP Port RDMA Port Online > Pid > ------------------------------------------------------------------------------ > Brick repo01:/gluster/data > 49153 0 Y 1829 > Brick repo02:/gluster/data > 49153 0 Y 81077 > Brick repo03:/gluster/data > 49153 0 Y 2497 > Self-heal Daemon on localhost N/A N/A Y > 81100 > Self-heal Daemon on 10.192.0.30 N/A N/A Y > 1852 > Self-heal Daemon on repo03 N/A > N/A Y
2520 > > Task Status of Volume storage > ------------------------------------------------------------------------------ > There are no active volume tasks > > > Tomasz Chmielewski > https://lxadm.com > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From mangoo at wpkg.org Sun Mar 24 02:11:27 2019 From: mangoo at wpkg.org (Tomasz Chmielewski) Date: Sun, 24 Mar 2019 11:11:27 +0900 Subject: [Gluster-users] "gluster volume heal info" does not show all bricks In-Reply-To: References: Message-ID: <6e8f67713b3f30bdecfb582ea57df70b@wpkg.org> No, there is no firewall. Tomasz Chmielewski On 2019-03-24 04:54, Strahil wrote: > Hi Tomasz, > > Do you have a firewall in between the nodes? > Can you test with local firewall (on each node) down ? > > Best Regards, > Strahil Nikolov On Mar 23, 2019 05:39, Tomasz Chmielewski > wrote: >> >> There are three replicated bricks: repo01, repo02 and repo03. >> >> All bricks are online and show the same info for commands like >> "gluster >> volume info" or "gluster volume status"; "gluster peer status" show >> the >> other bricks connected. However, "gluster volume heal storage info" >> only >> shows the first two bricks - does anyone have an idea why? If it >> matters, repo03 was added later. Running gluster 5.5.
>> >> >> # gluster volume heal storage info >> Brick repo01:/gluster/data >> Status: Connected >> Number of entries: 0 >> >> Brick repo02:/gluster/data >> Status: Connected >> Number of entries: 0 >> >> >> >> Other info: >> >> >> # gluster volume info >> >> Volume Name: storage >> Type: Replicate >> Volume ID: 8e533781-01fc-4c8a-b220-9691346fbe3c >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 1 x 3 = 3 >> Transport-type: tcp >> Bricks: >> Brick1: repo01:/gluster/data >> Brick2: repo02:/gluster/data >> Brick3: repo03:/gluster/data >> Options Reconfigured: >> transport.address-family: inet >> performance.readdir-ahead: on >> nfs.disable: on >> auth.allow: 127.0.0.1,10.192.0.30,10.192.0.31,10.192.0.32 >> >> >> >> # gluster volume status >> Status of volume: storage >> Gluster process TCP Port RDMA Port >> Online >> Pid >> ------------------------------------------------------------------------------ >> Brick repo01:/gluster/data >> 49153 0 Y 1829 >> Brick repo02:/gluster/data >> 49153 0 Y 81077 >> Brick repo03:/gluster/data >> 49153 0 Y 2497 >> Self-heal Daemon on localhost N/A N/A >> Y >> 81100 >> Self-heal Daemon on 10.192.0.30 N/A N/A >> Y >> 1852 >> Self-heal Daemon on repo03 >> N/A >> N/A Y
2520 >> >> Task Status of Volume storage >> ------------------------------------------------------------------------------ >> There are no active volume tasks >> >> >> Tomasz Chmielewski >> https://lxadm.com >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users From rightkicktech at gmail.com Sun Mar 24 09:01:02 2019 From: rightkicktech at gmail.com (Alex K) Date: Sun, 24 Mar 2019 11:01:02 +0200 Subject: [Gluster-users] Gluster and bonding In-Reply-To: References: Message-ID: Hi Strahil, On Sat, Mar 23, 2019, 21:50 Strahil wrote: > Hi Alex, > > Did you setup LACP using links to both switches ? > When I configured LACP (2 cables connecting the switches) servers were not communicating with each other when one of their cables was removed (leaving one cable at each server connected to different switches), indicating that lacp was not functioning. Connecting the switches with one cable only without lacp and stp enabled servers were able to reach one another though gluster was logging staging errors and volumes did not start. When both cables were connected at same switch, gluster was ok.
> [2019-03-21 14:16:15.044292] E [MSGID: 106153] > [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on > gluster2. Please check log file for details. > [2019-03-21 14:49:11.278724] E [MSGID: 106153] > [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on > gluster2. Please check log file for details. > [2019-03-21 14:49:40.904596] E [MSGID: 106153] > [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on > gluster1. Please check log file for details. > > Does anyone has any idea what does this staging error mean? > I don't have the hardware anymore available for testing and I will try to > reproduce on virtual env. > > Thanx > Alex > > On Mon, Mar 18, 2019 at 12:52 PM Alex K wrote: > > Performed some tests simulating the setup on OVS. > When using mode 6 I had mixed results for both scenarios (see below): > > [image: image.png] > > There were times that hosts were not able to reach each other (simple ping > tests) and other time where hosts were able to reach each other with ping > but gluster volumes were down due to connectivity issues being reported > (endpoint is not connected). systemctl restart network usually resolved the > gluster connectivity issue. This was regardless of the scenario (interlink > or not). I will need to do some more tests. > > On Tue, Feb 26, 2019 at 4:14 PM Alex K wrote: > > > Thank you to all for your suggestions. > > I came here since only gluster was having issues to start. Ping and other > networking services were showing everything fine, so I guess there is sth > at gluster that does not like what I tried to do. > Unfortunately I have this system in production and I cannot experiment. It > was a customer request to add redundancy to the switch and I went with what > I assumed would work. > I guess I have to have the switches stacked, but the current ones do not > support this. They are just simple managed switches. > > Multiple IPs per peers could be a solution. 
> I will search a little more and in case I have something I will get back. > > On Tue, Feb 26, 2019 at 6:52 AM Strahil wrote: > > Hi Alex, > > As per the following ( https://community.cisco.com/t5/switching/lacp-load-balancing-in-2-switches-part-of-3750-stack-switch/td-p/2268111 > ) your switches need to be stacked in order to support LACP with your setup. > Yet, I'm not sure if balance-alb will work with 2 separate switches - > maybe some special configuration is needed ?!? > As far as I know gluster can have multiple IPs matched to a single peer, > but I'm not sure if having 2 separate networks will be used as > active-backup or active-active. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mauryam at gmail.com Sun Mar 24 11:00:20 2019 From: mauryam at gmail.com (Maurya M) Date: Sun, 24 Mar 2019 16:30:20 +0530 Subject: [Gluster-users] Configure SSH -command Message-ID: Hi All, Have tried to config the ssh-command to use -p 2222 using this config option: config ssh-command 'ssh -p 2222' , but it failed. Do you know the correct syntax? Also, where can I check the logs for a failed geo-replication command? TIA, Maurya -------------- next part -------------- An HTML attachment was scrubbed...
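For the log-location half of the question just above: on a stock install, geo-replication logs normally sit under the glusterfs log directory. The paths below are the usual defaults and are an assumption — they can differ if glusterfs was built with a different logdir:

```shell
# Master side: per-session gsyncd monitor/worker logs
ls /var/log/glusterfs/geo-replication/

# Slave side: logs of the receiving gsyncd
ls /var/log/glusterfs/geo-replication-slaves/

# Failures of the gluster CLI command itself (e.g. bad config syntax)
tail -n 50 /var/log/glusterfs/cli.log
```

The `cli.log` path is the same one that appears in the permission-denied errors later in this digest, so it is worth checking first when a `config` command fails silently.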
URL: From mauryam at gmail.com Sun Mar 24 11:08:20 2019 From: mauryam at gmail.com (Maurya M) Date: Sun, 24 Mar 2019 16:38:20 +0530 Subject: [Gluster-users] Geo-replication status always on 'Created' In-Reply-To: References: Message-ID: Did give the permission on both "/var/log/glusterfs/" & "/var/lib/glusterd/" too, but it seems the directory where I mounted using heketi is having issues: [2019-03-22 09:48:21.546308] E [syncdutils(worker /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3eab2394433f02f5617012d4ae3c28f/brick):305:log_raise_exception] : connection to peer is broken [2019-03-22 09:48:21.546662] E [syncdutils(worker /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3eab2394433f02f5617012d4ae3c28f/brick):309:log_raise_exception] : getting "No such file or directory"errors is most likely due to MISCONFIGURATION, please remove all the public keys added by geo-replication from authorized_keys file in slave nodes and run Geo-replication create command again.
[2019-03-22 09:48:21.546736] E [syncdutils(worker /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3eab2394433f02f5617012d4ae3c28f/brick):316:log_raise_exception] : If `gsec_create container` was used, then run `gluster volume geo-replication [@]:: config remote-gsyncd (Example GSYNCD_PATH: `/usr/libexec/glusterfs/gsyncd`) [2019-03-22 09:48:21.546858] E [syncdutils(worker /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3eab2394433f02f5617012d4ae3c28f/brick):801:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-OaPGc3/c784230c9648efa4d529975bd779c551.sock azureuser at 172.16.201.35 /nonexistent/gsyncd slave vol_041afbc53746053368a1840607636e97 azureuser at 172.16.201.35::vol_a5aee81a873c043c99a938adcb5b5781 --master-node 172.16.189.4 --master-node-id dd4efc35-4b86-4901-9c00-483032614c35 --master-brick /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3eab2394433f02f5617012d4ae3c28f/brick --local-node 172.16.201.35 --local-node-id 7eb0a2b6-c4d6-41b1-a346-0638dbf8d779 --slave-timeout 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/sbin error=127 [2019-03-22 09:48:21.546977] E [syncdutils(worker /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3eab2394433f02f5617012d4ae3c28f/brick):805:logerr] Popen: ssh> bash: /nonexistent/gsyncd: No such file or directory [2019-03-22 09:48:21.565583] I [repce(agent /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3eab2394433f02f5617012d4ae3c28f/brick):80:service_loop] RepceServer: terminating on reaching EOF. 
[2019-03-22 09:48:21.565745] I [monitor(monitor):266:monitor] Monitor: worker died before establishing connection brick=/var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3eab2394433f02f5617012d4ae3c28f/brick [2019-03-22 09:48:21.579195] I [gsyncdstatus(monitor):245:set_worker_status] GeorepStatus: Worker Status Change status=Faulty On Fri, Mar 22, 2019 at 10:23 PM Sunny Kumar wrote: > Hi Maurya, > > Looks like hook script is failed to set permissions for azureuser on > "/var/log/glusterfs". > You can assign permission manually for directory then it will work. > > -Sunny > > On Fri, Mar 22, 2019 at 2:07 PM Maurya M wrote: > > > > hi Sunny, > > Passwordless ssh to : > > > > ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i > /var/lib/glusterd/geo-replication/secret.pem -p 22 azureuser at 172.16.201.35 > > > > is login, but when the whole command is run getting permission issues > again:: > > > > ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i > /var/lib/glusterd/geo-replication/secret.pem -p 22 azureuser at 172.16.201.35 > gluster --xml --remote-host=localhost volume info > vol_a5aee81a873c043c99a938adcb5b5781 -v > > ERROR: failed to create logfile "/var/log/glusterfs/cli.log" (Permission > denied) > > ERROR: failed to open logfile /var/log/glusterfs/cli.log > > > > any idea here ? > > > > thanks, > > Maurya > > > > > > On Thu, Mar 21, 2019 at 2:43 PM Maurya M wrote: > >> > >> hi Sunny, > >> i did use the [1] link for the setup, when i encountered this error > during ssh-copy-id : (so setup the passwordless ssh, by manually copied the > private/ public keys to all the nodes , both master & slave) > >> > >> [root at k8s-agentpool1-24779565-1 ~]# ssh-copy-id geouser at xxx.xx.xxx.x > >> /usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: > "/root/.ssh/id_rsa.pub" > >> The authenticity of host ' xxx.xx.xxx.x ( xxx.xx.xxx.x )' can't be > established. 
> >> ECDSA key fingerprint is > SHA256:B2rNaocIcPjRga13oTnopbJ5KjI/7l5fMANXc+KhA9s. > >> ECDSA key fingerprint is > MD5:1b:70:f9:7a:bf:35:33:47:0c:f2:c1:cd:21:e2:d3:75. > >> Are you sure you want to continue connecting (yes/no)? yes > >> /usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), > to filter out any that are already installed > >> /usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you > are prompted now it is to install the new keys > >> Permission denied (publickey). > >> > >> To start afresh what all needs to teardown / delete, do we have any > script for it ? where all the pem keys do i need to delete? > >> > >> thanks, > >> Maurya > >> > >> On Thu, Mar 21, 2019 at 2:12 PM Sunny Kumar > wrote: > >>> > >>> Hey you can start a fresh I think you are not following proper setup > steps. > >>> > >>> Please follow these steps [1] to create geo-rep session, you can > >>> delete the old one and do a fresh start. Or alternative you can use > >>> this tool[2] to setup geo-rep. > >>> > >>> > >>> [1]. > https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/ > >>> [2]. http://aravindavk.in/blog/gluster-georep-tools/ > >>> > >>> > >>> /Sunny > >>> > >>> On Thu, Mar 21, 2019 at 11:28 AM Maurya M wrote: > >>> > > >>> > Hi Sunil, > >>> > I did run the on the slave node : > >>> > /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh azureuser > vol_041afbc53746053368a1840607636e97 vol_a5aee81a873c043c99a938adcb5b5781 > >>> > getting this message "/home/azureuser/common_secret.pem.pub not > present. 
Please run geo-replication command on master with push-pem option > to generate the file" > >>> > > >>> > So went back and created the session again, no change, so manually > copied the common_secret.pem.pub to /home/azureuser/ but still the > set_geo_rep_pem_keys.sh is looking the pem file in different name : > COMMON_SECRET_PEM_PUB=${master_vol}_${slave_vol}_common_secret.pem.pub , > change the name of pem , ran the command again : > >>> > > >>> > /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh azureuser > vol_041afbc53746053368a1840607636e97 vol_a5aee81a873c043c99a938adcb5b5781 > >>> > Successfully copied file. > >>> > Command executed successfully. > >>> > > >>> > > >>> > - went back and created the session , start the geo-replication , > still seeing the same error in logs. Any ideas ? > >>> > > >>> > thanks, > >>> > Maurya > >>> > > >>> > > >>> > > >>> > On Wed, Mar 20, 2019 at 11:07 PM Sunny Kumar > wrote: > >>> >> > >>> >> Hi Maurya, > >>> >> > >>> >> I guess you missed last trick to distribute keys in slave node. I > see > >>> >> this is non-root geo-rep setup so please try this: > >>> >> > >>> >> > >>> >> Run the following command as root in any one of Slave node. > >>> >> > >>> >> /usr/local/libexec/glusterfs/set_geo_rep_pem_keys.sh > >>> >> > >>> >> > >>> >> - Sunny > >>> >> > >>> >> On Wed, Mar 20, 2019 at 10:47 PM Maurya M > wrote: > >>> >> > > >>> >> > Hi all, > >>> >> > Have setup a 3 master nodes - 3 slave nodes (gluster 4.1) for > geo-replication, but once have the geo-replication configure the status is > always on "Created', > >>> >> > even after have force start the session. > >>> >> > > >>> >> > On close inspect of the logs on the master node seeing this error: > >>> >> > > >>> >> > "E [syncdutils(monitor):801:errlog] Popen: command returned > error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i > /var/lib/glusterd/geo-replication/secret.pem -p 22 azureuser at xxxxx.xxxx..xxx. 
> gluster --xml --remote-host=localhost volume info > vol_a5ae34341a873c043c99a938adcb5b5781 error=255" > >>> >> > > >>> >> > Any ideas what is issue? > >>> >> > > >>> >> > thanks, > >>> >> > Maurya > >>> >> > > >>> >> > _______________________________________________ > >>> >> > Gluster-users mailing list > >>> >> > Gluster-users at gluster.org > >>> >> > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrevolodin at gmail.com Sun Mar 24 11:09:32 2019 From: andrevolodin at gmail.com (Andrey Volodin) Date: Sun, 24 Mar 2019 11:09:32 +0000 Subject: [Gluster-users] Configure SSH -command In-Reply-To: References: Message-ID: normally you should be able to edit the config file /etc/ssh/sshd_config However, at the page https://serverfault.com/questions/826893/how-to-temporarily-run-a-secondary-ssh-server-on-a-separate-port I can see a one-time run solution as /usr/sbin/sshd -D -p 22200 On Sun, Mar 24, 2019 at 11:00 AM Maurya M wrote: > Hi All, > Have tried config the ssh-command to use -p 2222 using this config option: > > config ssh-command 'ssh -p 2222' , but failed, do you know the correct > syntax , also where can i check for failed georeplication command logs. > > TIA, > Maurya > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From mauryam at gmail.com Sun Mar 24 11:28:34 2019 From: mauryam at gmail.com (Maurya M) Date: Sun, 24 Mar 2019 16:58:34 +0530 Subject: [Gluster-users] Configure SSH -command In-Reply-To: References: Message-ID: Hi Andrey, For some reason the sshd_config Ports pararmeter is not allowing to change when i restart the ssh service. 
so following this https://stackoverflow.com/questions/27525456/glusterfs-geo-replication-on-non-standard-ssh-port/55303479#55303479 tried config options as described. I did try your suggestion, config ssh-command '/usr/sbin/sshd -D -p 2222' , but the geo-replication command failed thanks, Maurya On Sun, Mar 24, 2019 at 4:39 PM Andrey Volodin wrote: > normally you should be able to edit the config file /etc/ssh/sshd_config > However, at the page > https://serverfault.com/questions/826893/how-to-temporarily-run-a-secondary-ssh-server-on-a-separate-port > I can see a one-time run solution as /usr/sbin/sshd -D -p 22200 > > On Sun, Mar 24, 2019 at 11:00 AM Maurya M wrote: > >> Hi All, >> Have tried config the ssh-command to use -p 2222 using this config >> option: >> >> config ssh-command 'ssh -p 2222' , but failed, do you know the correct >> syntax , also where can i check for failed georeplication command logs. >> >> TIA, >> Maurya >> >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > -------------- next part -------------- An HTML attachment was scrubbed...
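A note on why pointing `ssh-command` at `/usr/sbin/sshd` cannot work: gsyncd uses the configured command as an ssh *client* to run `gluster --xml ... volume info` on the peer and parses the captured stdout as XML (the `Volinfo`/`XET.fromstring` frames in the tracebacks in this thread). Anything other than XML on stdout — a daemon usage message, a shell error — fails at the very first byte. A minimal stdlib-only sketch of that failure mode (the `good` sample is a hypothetical, trimmed-down `--xml` reply; the `bad` string is taken from the logs earlier in this digest):

```python
import xml.etree.ElementTree as XET

# What gsyncd expects back from the remote "gluster --xml ... volume info"
# invocation: an XML document it can pick volume details out of.
good = "<cliOutput><opRet>0</opRet></cliOutput>"
print(XET.fromstring(good).find("opRet").text)  # -> 0

# What comes back when the configured ssh-command is not an ssh client:
# plain shell text, which the XML parser rejects at byte one.
bad = "bash: /nonexistent/gsyncd: No such file or directory"
try:
    XET.fromstring(bad)
except XET.ParseError as err:
    print(err)  # -> syntax error: line 1, column 0
```

This is exactly the `ParseError: syntax error: line 1, column 0` reported later in the thread, which is why the fix is to keep `ssh-command` a client invocation (or use `ssh-port`), not to launch a second daemon.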
URL: From mauryam at gmail.com Sun Mar 24 11:35:53 2019 From: mauryam at gmail.com (Maurya M) Date: Sun, 24 Mar 2019 17:05:53 +0530 Subject: [Gluster-users] Configure SSH -command In-Reply-To: References: Message-ID: not sure where is the syntax error: gluster volume geo-replication vol_041afbc53746053368a1840607636e97 xxx.xx.xxx.xx::vol_a5aee81a873c043c99a938adcb5b5781 config ssh-command '/usr/sbin/sshd -D -p 2222' [2019-03-24 11:33:25.855839] E [syncdutils(monitor):332:log_raise_exception] : FAIL: Traceback (most recent call last): File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main func(args) File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 50, in subcmd_monitor return monitor.monitor(local, remote) File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 427, in monitor return Monitor().multiplex(*distribute(local, remote)) File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 386, in distribute svol = Volinfo(slave.volume, "localhost", prelude) File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 860, in __init__ vi = XET.fromstring(vix) File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1300, in XML parser.feed(text) File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1642, in feed self._raiseerror(v) File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror raise err ParseError: syntax error: line 1, column 0 On Sun, Mar 24, 2019 at 4:58 PM Maurya M wrote: > Hi Andrey, > For some reason the sshd_config Ports pararmeter is not allowing to > change when i restart the ssh service. > so following this > https://stackoverflow.com/questions/27525456/glusterfs-geo-replication-on-non-standard-ssh-port/55303479#55303479 > tried config options as described. 
> > i did try you usuggestion config ssh-command '/usr/sbin/sshd -D -p 2222' > , > geo-replication command failed > > thanks, > Maurya > > On Sun, Mar 24, 2019 at 4:39 PM Andrey Volodin > wrote: > >> normally you should be able to edit the config file /etc/ssh/sshd_config >> However, at the page >> https://serverfault.com/questions/826893/how-to-temporarily-run-a-secondary-ssh-server-on-a-separate-port >> I can see a one-time run solution as /usr/sbin/sshd -D -p 22200 >> >> On Sun, Mar 24, 2019 at 11:00 AM Maurya M wrote: >> >>> Hi All, >>> Have tried config the ssh-command to use -p 2222 using this config >>> option: >>> >>> config ssh-command 'ssh -p 2222' , but failed, do you know the correct >>> syntax , also where can i check for failed georeplication command logs. >>> >>> TIA, >>> Maurya >>> >>> >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mauryam at gmail.com Sun Mar 24 11:43:36 2019 From: mauryam at gmail.com (Maurya M) Date: Sun, 24 Mar 2019 17:13:36 +0530 Subject: [Gluster-users] Geo-replication status always on 'Created' In-Reply-To: References: Message-ID: did all the suggestion as mentioned in the log trace , have another setup using root user , but there i have issue on the ssh command as i am unable to change the ssh port to use default 22, but my servers (azure aks engine) are configure to using 2222 where i am unable to change the ports , restart of ssh service giving me error! 
Is this syntax correct to config the ssh-command: gluster volume geo-replication vol_041afbc53746053368a1840607636e97 xxx.xx.xxx.xx::vol_a5aee81a873c043c99a938adcb5b5781 *config ssh-command '/usr/sbin/sshd -D -p 2222'* On Sun, Mar 24, 2019 at 4:38 PM Maurya M wrote: > Did give the persmission on both "/var/log/glusterfs/" & > "/var/lib/glusterd/" too, but seems the directory where i mounted using > heketi is having issues: > > [2019-03-22 09:48:21.546308] E [syncdutils(worker > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3eab2394433f02f5617012d4ae3c28f/brick):305:log_raise_exception] > : connection to peer is broken > > [2019-03-22 09:48:21.546662] E [syncdutils(worker > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3eab2394433f02f5617012d4ae3c28f/brick):309:log_raise_exception] > : getting "No such file or directory"errors is most likely due to > MISCONFIGURATION, please remove all the public keys added by > geo-replication from authorized_keys file in slave nodes and run > Geo-replication create command again. 
> > [2019-03-22 09:48:21.546736] E [syncdutils(worker > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3eab2394433f02f5617012d4ae3c28f/brick):316:log_raise_exception] > : If `gsec_create container` was used, then run `gluster volume > geo-replication [@]:: config > remote-gsyncd (Example GSYNCD_PATH: > `/usr/libexec/glusterfs/gsyncd`) > > [2019-03-22 09:48:21.546858] E [syncdutils(worker > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3eab2394433f02f5617012d4ae3c28f/brick):801:errlog] > Popen: command returned error cmd=ssh -oPasswordAuthentication=no > -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem > -p 22 -oControlMaster=auto -S > /tmp/gsyncd-aux-ssh-OaPGc3/c784230c9648efa4d529975bd779c551.sock > azureuser at 172.16.201.35 /nonexistent/gsyncd slave > vol_041afbc53746053368a1840607636e97 > azureuser at 172.16.201.35::vol_a5aee81a873c043c99a938adcb5b5781 > --master-node 172.16.189.4 --master-node-id > dd4efc35-4b86-4901-9c00-483032614c35 --master-brick > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3eab2394433f02f5617012d4ae3c28f/brick > --local-node 172.16.201.35 --local-node-id > 7eb0a2b6-c4d6-41b1-a346-0638dbf8d779 --slave-timeout 120 --slave-log-level > INFO --slave-gluster-log-level INFO --slave-gluster-command-dir > /usr/sbin error=127 > > [2019-03-22 09:48:21.546977] E [syncdutils(worker > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3eab2394433f02f5617012d4ae3c28f/brick):805:logerr] > Popen: ssh> bash: /nonexistent/gsyncd: No such file or directory > > [2019-03-22 09:48:21.565583] I [repce(agent > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3eab2394433f02f5617012d4ae3c28f/brick):80:service_loop] > RepceServer: terminating on reaching EOF. 
> > [2019-03-22 09:48:21.565745] I [monitor(monitor):266:monitor] Monitor: > worker died before establishing connection > brick=/var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3eab2394433f02f5617012d4ae3c28f/brick > > [2019-03-22 09:48:21.579195] I > [gsyncdstatus(monitor):245:set_worker_status] GeorepStatus: Worker Status > Change status=Faulty > > On Fri, Mar 22, 2019 at 10:23 PM Sunny Kumar wrote: > >> Hi Maurya, >> >> Looks like hook script is failed to set permissions for azureuser on >> "/var/log/glusterfs". >> You can assign permission manually for directory then it will work. >> >> -Sunny >> >> On Fri, Mar 22, 2019 at 2:07 PM Maurya M wrote: >> > >> > hi Sunny, >> > Passwordless ssh to : >> > >> > ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i >> /var/lib/glusterd/geo-replication/secret.pem -p 22 >> azureuser at 172.16.201.35 >> > >> > is login, but when the whole command is run getting permission issues >> again:: >> > >> > ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i >> /var/lib/glusterd/geo-replication/secret.pem -p 22 >> azureuser at 172.16.201.35 gluster --xml --remote-host=localhost volume >> info vol_a5aee81a873c043c99a938adcb5b5781 -v >> > ERROR: failed to create logfile "/var/log/glusterfs/cli.log" >> (Permission denied) >> > ERROR: failed to open logfile /var/log/glusterfs/cli.log >> > >> > any idea here ? >> > >> > thanks, >> > Maurya >> > >> > >> > On Thu, Mar 21, 2019 at 2:43 PM Maurya M wrote: >> >> >> >> hi Sunny, >> >> i did use the [1] link for the setup, when i encountered this error >> during ssh-copy-id : (so setup the passwordless ssh, by manually copied the >> private/ public keys to all the nodes , both master & slave) >> >> >> >> [root at k8s-agentpool1-24779565-1 ~]# ssh-copy-id geouser at xxx.xx.xxx.x >> >> /usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: >> "/root/.ssh/id_rsa.pub" >> >> The authenticity of host ' xxx.xx.xxx.x ( xxx.xx.xxx.x )' can't be >> established. 
>> >> ECDSA key fingerprint is >> SHA256:B2rNaocIcPjRga13oTnopbJ5KjI/7l5fMANXc+KhA9s. >> >> ECDSA key fingerprint is >> MD5:1b:70:f9:7a:bf:35:33:47:0c:f2:c1:cd:21:e2:d3:75. >> >> Are you sure you want to continue connecting (yes/no)? yes >> >> /usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), >> to filter out any that are already installed >> >> /usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you >> are prompted now it is to install the new keys >> >> Permission denied (publickey). >> >> >> >> To start afresh what all needs to teardown / delete, do we have any >> script for it ? where all the pem keys do i need to delete? >> >> >> >> thanks, >> >> Maurya >> >> >> >> On Thu, Mar 21, 2019 at 2:12 PM Sunny Kumar >> wrote: >> >>> >> >>> Hey you can start a fresh I think you are not following proper setup >> steps. >> >>> >> >>> Please follow these steps [1] to create geo-rep session, you can >> >>> delete the old one and do a fresh start. Or alternative you can use >> >>> this tool[2] to setup geo-rep. >> >>> >> >>> >> >>> [1]. >> https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/ >> >>> [2]. http://aravindavk.in/blog/gluster-georep-tools/ >> >>> >> >>> >> >>> /Sunny >> >>> >> >>> On Thu, Mar 21, 2019 at 11:28 AM Maurya M wrote: >> >>> > >> >>> > Hi Sunil, >> >>> > I did run the on the slave node : >> >>> > /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh azureuser >> vol_041afbc53746053368a1840607636e97 vol_a5aee81a873c043c99a938adcb5b5781 >> >>> > getting this message "/home/azureuser/common_secret.pem.pub not >> present. 
Please run geo-replication command on master with push-pem option >> to generate the file" >> >>> > >> >>> > So went back and created the session again, no change, so manually >> copied the common_secret.pem.pub to /home/azureuser/ but still the >> set_geo_rep_pem_keys.sh is looking the pem file in different name : >> COMMON_SECRET_PEM_PUB=${master_vol}_${slave_vol}_common_secret.pem.pub , >> change the name of pem , ran the command again : >> >>> > >> >>> > /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh azureuser >> vol_041afbc53746053368a1840607636e97 vol_a5aee81a873c043c99a938adcb5b5781 >> >>> > Successfully copied file. >> >>> > Command executed successfully. >> >>> > >> >>> > >> >>> > - went back and created the session , start the geo-replication , >> still seeing the same error in logs. Any ideas ? >> >>> > >> >>> > thanks, >> >>> > Maurya >> >>> > >> >>> > >> >>> > >> >>> > On Wed, Mar 20, 2019 at 11:07 PM Sunny Kumar >> wrote: >> >>> >> >> >>> >> Hi Maurya, >> >>> >> >> >>> >> I guess you missed last trick to distribute keys in slave node. I >> see >> >>> >> this is non-root geo-rep setup so please try this: >> >>> >> >> >>> >> >> >>> >> Run the following command as root in any one of Slave node. >> >>> >> >> >>> >> /usr/local/libexec/glusterfs/set_geo_rep_pem_keys.sh >> >>> >> >> >>> >> >> >>> >> - Sunny >> >>> >> >> >>> >> On Wed, Mar 20, 2019 at 10:47 PM Maurya M >> wrote: >> >>> >> > >> >>> >> > Hi all, >> >>> >> > Have setup a 3 master nodes - 3 slave nodes (gluster 4.1) for >> geo-replication, but once have the geo-replication configure the status is >> always on "Created', >> >>> >> > even after have force start the session. 
>> >>> >> > >> >>> >> > On close inspect of the logs on the master node seeing this >> error: >> >>> >> > >> >>> >> > "E [syncdutils(monitor):801:errlog] Popen: command returned >> error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i >> /var/lib/glusterd/geo-replication/secret.pem -p 22 azureuser at xxxxx.xxxx..xxx. >> gluster --xml --remote-host=localhost volume info >> vol_a5ae34341a873c043c99a938adcb5b5781 error=255" >> >>> >> > >> >>> >> > Any ideas what is issue? >> >>> >> > >> >>> >> > thanks, >> >>> >> > Maurya >> >>> >> > >> >>> >> > _______________________________________________ >> >>> >> > Gluster-users mailing list >> >>> >> > Gluster-users at gluster.org >> >>> >> > https://lists.gluster.org/mailman/listinfo/gluster-users >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrevolodin at gmail.com Sun Mar 24 12:04:27 2019 From: andrevolodin at gmail.com (Andrey Volodin) Date: Sun, 24 Mar 2019 12:04:27 +0000 Subject: [Gluster-users] Configure SSH -command In-Reply-To: References: Message-ID: you may find some reference here: http://fatphil.org/linux/ssh_ports.html On Sun, Mar 24, 2019 at 11:28 AM Maurya M wrote: > Hi Andrey, > For some reason the sshd_config Ports pararmeter is not allowing to > change when i restart the ssh service. > so following this > https://stackoverflow.com/questions/27525456/glusterfs-geo-replication-on-non-standard-ssh-port/55303479#55303479 > tried config options as described. 
> > i did try you usuggestion config ssh-command '/usr/sbin/sshd -D -p 2222' > , > geo-replication command failed > > thanks, > Maurya > > On Sun, Mar 24, 2019 at 4:39 PM Andrey Volodin > wrote: > >> normally you should be able to edit the config file /etc/ssh/sshd_config >> However, at the page >> https://serverfault.com/questions/826893/how-to-temporarily-run-a-secondary-ssh-server-on-a-separate-port >> I can see a one-time run solution as /usr/sbin/sshd -D -p 22200 >> >> On Sun, Mar 24, 2019 at 11:00 AM Maurya M wrote: >> >>> Hi All, >>> Have tried config the ssh-command to use -p 2222 using this config >>> option: >>> >>> config ssh-command 'ssh -p 2222' , but failed, do you know the correct >>> syntax , also where can i check for failed georeplication command logs. >>> >>> TIA, >>> Maurya >>> >>> >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jim.kinney at gmail.com Sun Mar 24 12:45:33 2019 From: jim.kinney at gmail.com (Jim Kinney) Date: Sun, 24 Mar 2019 08:45:33 -0400 Subject: [Gluster-users] Configure SSH -command In-Reply-To: References: Message-ID: If the os is using selinux, a policy change is needed to allow the ssh daemon to connect to the new port. Look at audit2allow for a solution. On March 24, 2019 8:04:27 AM EDT, Andrey Volodin wrote: >you may find some reference here: >http://fatphil.org/linux/ssh_ports.html > >On Sun, Mar 24, 2019 at 11:28 AM Maurya M wrote: > >> Hi Andrey, >> For some reason the sshd_config Ports pararmeter is not allowing to >> change when i restart the ssh service. >> so following this >> >https://stackoverflow.com/questions/27525456/glusterfs-geo-replication-on-non-standard-ssh-port/55303479#55303479 >> tried config options as described. 
>> >> i did try you usuggestion config ssh-command '/usr/sbin/sshd -D -p >2222' >> , >> geo-replication command failed >> >> thanks, >> Maurya >> >> On Sun, Mar 24, 2019 at 4:39 PM Andrey Volodin > >> wrote: >> >>> normally you should be able to edit the config file >/etc/ssh/sshd_config >>> However, at the page >>> >https://serverfault.com/questions/826893/how-to-temporarily-run-a-secondary-ssh-server-on-a-separate-port >>> I can see a one-time run solution as /usr/sbin/sshd -D -p 22200 >>> >>> On Sun, Mar 24, 2019 at 11:00 AM Maurya M wrote: >>> >>>> Hi All, >>>> Have tried config the ssh-command to use -p 2222 using this config >>>> option: >>>> >>>> config ssh-command 'ssh -p 2222' , but failed, do you know the >correct >>>> syntax , also where can i check for failed georeplication command >logs. >>>> >>>> TIA, >>>> Maurya >>>> >>>> >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> -- Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From avishwan at redhat.com Mon Mar 25 03:38:00 2019 From: avishwan at redhat.com (Aravinda) Date: Mon, 25 Mar 2019 09:08:00 +0530 Subject: [Gluster-users] Geo-replication status always on 'Created' In-Reply-To: References: Message-ID: Use `ssh-port ` while creating the Geo-rep session Ref: https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/#creating-the-session And set the ssh-port option before start. 
``` gluster volume geo-replication \ [@]:: config ssh-port 2222 ``` -- regards Aravinda http://aravindavk.in On Sun, 2019-03-24 at 17:13 +0530, Maurya M wrote: > did all the suggestion as mentioned in the log trace , have another > setup using root user , but there i have issue on the ssh command as > i am unable to change the ssh port to use default 22, but my servers > (azure aks engine) are configure to using 2222 where i am unable to > change the ports , restart of ssh service giving me error! > > Is this syntax correct to config the ssh-command: > gluster volume geo-replication vol_041afbc53746053368a1840607636e97 > xxx.xx.xxx.xx::vol_a5aee81a873c043c99a938adcb5b5781 config ssh- > command '/usr/sbin/sshd -D -p 2222' > > On Sun, Mar 24, 2019 at 4:38 PM Maurya M wrote: > > Did give the persmission on both "/var/log/glusterfs/" & > > "/var/lib/glusterd/" too, but seems the directory where i mounted > > using heketi is having issues: > > > > [2019-03-22 09:48:21.546308] E [syncdutils(worker > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3 > > eab2394433f02f5617012d4ae3c28f/brick):305:log_raise_exception] > > : connection to peer is broken > > [2019-03-22 09:48:21.546662] E [syncdutils(worker > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3 > > eab2394433f02f5617012d4ae3c28f/brick):309:log_raise_exception] > > : getting "No such file or directory"errors is most likely due > > to MISCONFIGURATION, please remove all the public keys added by > > geo-replication from authorized_keys file in slave nodes and run > > Geo-replication create command again. 
> > [2019-03-22 09:48:21.546736] E [syncdutils(worker > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3 > > eab2394433f02f5617012d4ae3c28f/brick):316:log_raise_exception] > > : If `gsec_create container` was used, then run `gluster > > volume geo-replication > > [@]:: config remote-gsyncd > > (Example GSYNCD_PATH: > > `/usr/libexec/glusterfs/gsyncd`) > > [2019-03-22 09:48:21.546858] E [syncdutils(worker > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3 > > eab2394433f02f5617012d4ae3c28f/brick):801:errlog] Popen: command > > returned error cmd=ssh -oPasswordAuthentication=no > > -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo- > > replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd- > > aux-ssh-OaPGc3/c784230c9648efa4d529975bd779c551.sock > > azureuser at 172.16.201.35 /nonexistent/gsyncd slave > > vol_041afbc53746053368a1840607636e97 azureuser at 172.16.201.35::vol_a > > 5aee81a873c043c99a938adcb5b5781 --master-node 172.16.189.4 -- > > master-node-id dd4efc35-4b86-4901-9c00-483032614c35 --master-brick > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3 > > eab2394433f02f5617012d4ae3c28f/brick --local-node 172.16.201.35 -- > > local-node-id 7eb0a2b6-c4d6-41b1-a346-0638dbf8d779 --slave-timeout > > 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave- > > gluster-command-dir /usr/sbin error=127 > > [2019-03-22 09:48:21.546977] E [syncdutils(worker > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3 > > eab2394433f02f5617012d4ae3c28f/brick):805:logerr] Popen: ssh> bash: > > /nonexistent/gsyncd: No such file or directory > > [2019-03-22 09:48:21.565583] I [repce(agent > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3 > > eab2394433f02f5617012d4ae3c28f/brick):80:service_loop] RepceServer: > > terminating on reaching EOF. 
> > [2019-03-22 09:48:21.565745] I [monitor(monitor):266:monitor] > > Monitor: worker died before establishing connection > > brick=/var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/br > > ick_b3eab2394433f02f5617012d4ae3c28f/brick > > [2019-03-22 09:48:21.579195] I > > [gsyncdstatus(monitor):245:set_worker_status] GeorepStatus: Worker > > Status Change status=Faulty > > > > On Fri, Mar 22, 2019 at 10:23 PM Sunny Kumar > > wrote: > > > Hi Maurya, > > > > > > Looks like hook script is failed to set permissions for azureuser > > > on > > > "/var/log/glusterfs". > > > You can assign permission manually for directory then it will > > > work. > > > > > > -Sunny > > > > > > On Fri, Mar 22, 2019 at 2:07 PM Maurya M > > > wrote: > > > > > > > > hi Sunny, > > > > Passwordless ssh to : > > > > > > > > ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i > > > /var/lib/glusterd/geo-replication/secret.pem -p 22 > > > azureuser at 172.16.201.35 > > > > > > > > is login, but when the whole command is run getting permission > > > issues again:: > > > > > > > > ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i > > > /var/lib/glusterd/geo-replication/secret.pem -p 22 > > > azureuser at 172.16.201.35 gluster --xml --remote-host=localhost > > > volume info vol_a5aee81a873c043c99a938adcb5b5781 -v > > > > ERROR: failed to create logfile "/var/log/glusterfs/cli.log" > > > (Permission denied) > > > > ERROR: failed to open logfile /var/log/glusterfs/cli.log > > > > > > > > any idea here ? 
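[Editor's note] Following up Sunny's suggestion above: the "ERROR: failed to create logfile /var/log/glusterfs/cli.log (Permission denied)" means the non-root geo-rep user (azureuser) cannot write the gluster CLI log on the slave. A hedged sketch of one way to grant that access; it is demonstrated on a scratch directory so it is safe to run, and on a real slave you would point `LOGDIR` at `/var/log/glusterfs` and use the geo-rep user's group instead of your own:

```shell
# Sketch: make the gluster CLI log writable by a non-root geo-rep user.
# A scratch directory stands in for /var/log/glusterfs; "$(id -gn)" here
# stands in for the geo-rep user's group on a real slave node.
LOGDIR=$(mktemp -d)
touch "$LOGDIR/cli.log"
chgrp "$(id -gn)" "$LOGDIR" "$LOGDIR/cli.log"
chmod 775 "$LOGDIR"            # group may create new log files
chmod 664 "$LOGDIR/cli.log"    # group may append to the existing one
ls -ld "$LOGDIR" "$LOGDIR/cli.log"
```

On the real slave the hook script normally does this for you; the manual version is only needed when, as here, the hook has failed.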
> > > > > > > > thanks, > > > > Maurya > > > > > > > > > > > > On Thu, Mar 21, 2019 at 2:43 PM Maurya M > > > wrote: > > > >> > > > >> hi Sunny, > > > >> i did use the [1] link for the setup, when i encountered this > > > error during ssh-copy-id : (so setup the passwordless ssh, by > > > manually copied the private/ public keys to all the nodes , both > > > master & slave) > > > >> > > > >> [root at k8s-agentpool1-24779565-1 ~]# ssh-copy-id > > > geouser at xxx.xx.xxx.x > > > >> /usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: > > > "/root/.ssh/id_rsa.pub" > > > >> The authenticity of host ' xxx.xx.xxx.x ( xxx.xx.xxx.x )' > > > can't be established. > > > >> ECDSA key fingerprint is > > > SHA256:B2rNaocIcPjRga13oTnopbJ5KjI/7l5fMANXc+KhA9s. > > > >> ECDSA key fingerprint is > > > MD5:1b:70:f9:7a:bf:35:33:47:0c:f2:c1:cd:21:e2:d3:75. > > > >> Are you sure you want to continue connecting (yes/no)? yes > > > >> /usr/bin/ssh-copy-id: INFO: attempting to log in with the new > > > key(s), to filter out any that are already installed > > > >> /usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- > > > if you are prompted now it is to install the new keys > > > >> Permission denied (publickey). > > > >> > > > >> To start afresh what all needs to teardown / delete, do we > > > have any script for it ? where all the pem keys do i need to > > > delete? > > > >> > > > >> thanks, > > > >> Maurya > > > >> > > > >> On Thu, Mar 21, 2019 at 2:12 PM Sunny Kumar < > > > sunkumar at redhat.com> wrote: > > > >>> > > > >>> Hey you can start a fresh I think you are not following > > > proper setup steps. > > > >>> > > > >>> Please follow these steps [1] to create geo-rep session, you > > > can > > > >>> delete the old one and do a fresh start. Or alternative you > > > can use > > > >>> this tool[2] to setup geo-rep. > > > >>> > > > >>> > > > >>> [1]. > > > https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/ > > > >>> [2]. 
http://aravindavk.in/blog/gluster-georep-tools/ > > > >>> > > > >>> > > > >>> /Sunny > > > >>> > > > >>> On Thu, Mar 21, 2019 at 11:28 AM Maurya M > > > wrote: > > > >>> > > > > >>> > Hi Sunil, > > > >>> > I did run the on the slave node : > > > >>> > /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh azureuser > > > vol_041afbc53746053368a1840607636e97 > > > vol_a5aee81a873c043c99a938adcb5b5781 > > > >>> > getting this message "/home/azureuser/common_secret.pem.pub > > > not present. Please run geo-replication command on master with > > > push-pem option to generate the file" > > > >>> > > > > >>> > So went back and created the session again, no change, so > > > manually copied the common_secret.pem.pub to /home/azureuser/ but > > > still the set_geo_rep_pem_keys.sh is looking the pem file in > > > different name : > > > COMMON_SECRET_PEM_PUB=${master_vol}_${slave_vol}_common_secret.pe > > > m.pub , change the name of pem , ran the command again : > > > >>> > > > > >>> > /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh azureuser > > > vol_041afbc53746053368a1840607636e97 > > > vol_a5aee81a873c043c99a938adcb5b5781 > > > >>> > Successfully copied file. > > > >>> > Command executed successfully. > > > >>> > > > > >>> > > > > >>> > - went back and created the session , start the geo- > > > replication , still seeing the same error in logs. Any ideas ? > > > >>> > > > > >>> > thanks, > > > >>> > Maurya > > > >>> > > > > >>> > > > > >>> > > > > >>> > On Wed, Mar 20, 2019 at 11:07 PM Sunny Kumar < > > > sunkumar at redhat.com> wrote: > > > >>> >> > > > >>> >> Hi Maurya, > > > >>> >> > > > >>> >> I guess you missed last trick to distribute keys in slave > > > node. I see > > > >>> >> this is non-root geo-rep setup so please try this: > > > >>> >> > > > >>> >> > > > >>> >> Run the following command as root in any one of Slave > > > node. 
> > > >>> >> > > > >>> >> /usr/local/libexec/glusterfs/set_geo_rep_pem_keys.sh > > > > > > >>> >> > > > >>> >> > > > >>> >> - Sunny > > > >>> >> > > > >>> >> On Wed, Mar 20, 2019 at 10:47 PM Maurya M < > > > mauryam at gmail.com> wrote: > > > >>> >> > > > > >>> >> > Hi all, > > > >>> >> > Have setup a 3 master nodes - 3 slave nodes (gluster > > > 4.1) for geo-replication, but once have the geo-replication > > > configure the status is always on "Created', > > > >>> >> > even after have force start the session. > > > >>> >> > > > > >>> >> > On close inspect of the logs on the master node seeing > > > this error: > > > >>> >> > > > > >>> >> > "E [syncdutils(monitor):801:errlog] Popen: command > > > returned error cmd=ssh -oPasswordAuthentication=no > > > -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo- > > > replication/secret.pem -p 22 azureuser at xxxxx.xxxx..xxx. gluster > > > --xml --remote-host=localhost volume info > > > vol_a5ae34341a873c043c99a938adcb5b5781 error=255" > > > >>> >> > > > > >>> >> > Any ideas what is issue? 
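[Editor's note] On the `set_geo_rep_pem_keys.sh` failure quoted above ("/home/azureuser/common_secret.pem.pub not present"): as Maurya found, the helper derives the filename it looks for from the volume names, so a file copied in as plain `common_secret.pem.pub` is never found. A small sketch of that naming rule, using the volume names from this sub-thread:

```shell
# set_geo_rep_pem_keys.sh looks for
#   ${master_vol}_${slave_vol}_common_secret.pem.pub
# not plain common_secret.pem.pub. Reproduce the expected name:
master_vol=vol_041afbc53746053368a1840607636e97
slave_vol=vol_a5aee81a873c043c99a938adcb5b5781
pem_pub="${master_vol}_${slave_vol}_common_secret.pem.pub"
echo "$pem_pub"
```

Running `create ... push-pem` on the master generates the file under this name, which is why a manually copied key had to be renamed before the script succeeded.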
> > > >>> >> > > > > >>> >> > thanks, > > > >>> >> > Maurya > > > >>> >> > > > > >>> >> > _______________________________________________ > > > >>> >> > Gluster-users mailing list > > > >>> >> > Gluster-users at gluster.org > > > >>> >> > https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From mauryam at gmail.com Mon Mar 25 04:16:41 2019 From: mauryam at gmail.com (Maurya M) Date: Mon, 25 Mar 2019 09:46:41 +0530 Subject: [Gluster-users] Geo-replication status always on 'Created' In-Reply-To: References: Message-ID: hi Aravinda, had the session created using : create ssh-port 2222 push-pem and also the : gluster volume geo-replication vol_75a5fd373d88ba687f591f3353fa05cf 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f config ssh-port 2222 hitting this message: geo-replication config-set failed for vol_75a5fd373d88ba687f591f3353fa05cf 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f geo-replication command failed Below is snap of status: [root at k8s-agentpool1-24779565-1 vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f]# gluster volume geo-replication vol_75a5fd373d88ba687f591f3353fa05cf 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f status MASTER NODE MASTER VOL MASTER BRICK SLAVE USER SLAVE SLAVE NODE STATUS CRAWL STATUS LAST_SYNCED -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 172.16.189.4 vol_75a5fd373d88ba687f591f3353fa05cf /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_116fb9427fb26f752d9ba8e45e183cb1/brick root 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A Created N/A N/A 
172.16.189.35 vol_75a5fd373d88ba687f591f3353fa05cf /var/lib/heketi/mounts/vg_05708751110fe60b3e7da15bdcf6d4d4/brick_266bb08f0d466d346f8c0b19569736fb/brick root 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A Created N/A N/A 172.16.189.66 vol_75a5fd373d88ba687f591f3353fa05cf /var/lib/heketi/mounts/vg_4b92a2b687e59b7311055d3809b77c06/brick_dfa44c9380cdedac708e27e2c2a443a0/brick root 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A Created N/A N/A any ideas ? where can find logs for the failed commands check in gysncd.log , the trace is as below: [2019-03-25 04:04:42.295043] I [gsyncd(monitor):297:main] : Using session config file path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf [2019-03-25 04:04:42.387192] E [syncdutils(monitor):332:log_raise_exception] : FAIL: Traceback (most recent call last): File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main func(args) File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 50, in subcmd_monitor return monitor.monitor(local, remote) File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 427, in monitor return Monitor().multiplex(*distribute(local, remote)) File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 370, in distribute mvol = Volinfo(master.volume, master.host) File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 860, in __init__ print "debug varible " %vix TypeError: not all arguments converted during string formatting [2019-03-25 04:04:48.997519] I [gsyncd(config-get):297:main] : Using session config file path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf [2019-03-25 04:04:49.93528] I [gsyncd(status):297:main] : Using session config file 
path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf [2019-03-25 04:08:07.194348] I [gsyncd(config-get):297:main] : Using session config file path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf [2019-03-25 04:08:07.262588] I [gsyncd(config-get):297:main] : Using session config file path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf [2019-03-25 04:08:07.550080] I [gsyncd(config-get):297:main] : Using session config file path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf [2019-03-25 04:08:18.933028] I [gsyncd(config-get):297:main] : Using session config file path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf [2019-03-25 04:08:19.25285] I [gsyncd(status):297:main] : Using session config file path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf [2019-03-25 04:09:15.766882] I [gsyncd(config-get):297:main] : Using session config file path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf [2019-03-25 04:09:16.30267] I [gsyncd(config-get):297:main] : Using session config file path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf [2019-03-25 04:09:16.89006] I [gsyncd(config-set):297:main] : Using session config file path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf regards, Maurya On Mon, Mar 25, 2019 at 
9:08 AM Aravinda wrote: > Use `ssh-port ` while creating the Geo-rep session > > Ref: > > https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/#creating-the-session > > And set the ssh-port option before start. > > ``` > gluster volume geo-replication \ > [@]:: config > ssh-port 2222 > ``` > > -- > regards > Aravinda > http://aravindavk.in > > > On Sun, 2019-03-24 at 17:13 +0530, Maurya M wrote: > > did all the suggestion as mentioned in the log trace , have another > > setup using root user , but there i have issue on the ssh command as > > i am unable to change the ssh port to use default 22, but my servers > > (azure aks engine) are configure to using 2222 where i am unable to > > change the ports , restart of ssh service giving me error! > > > > Is this syntax correct to config the ssh-command: > > gluster volume geo-replication vol_041afbc53746053368a1840607636e97 > > xxx.xx.xxx.xx::vol_a5aee81a873c043c99a938adcb5b5781 config ssh- > > command '/usr/sbin/sshd -D -p 2222' > > > > On Sun, Mar 24, 2019 at 4:38 PM Maurya M wrote: > > > Did give the persmission on both "/var/log/glusterfs/" & > > > "/var/lib/glusterd/" too, but seems the directory where i mounted > > > using heketi is having issues: > > > > > > [2019-03-22 09:48:21.546308] E [syncdutils(worker > > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3 > > > eab2394433f02f5617012d4ae3c28f/brick):305:log_raise_exception] > > > : connection to peer is broken > > > [2019-03-22 09:48:21.546662] E [syncdutils(worker > > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3 > > > eab2394433f02f5617012d4ae3c28f/brick):309:log_raise_exception] > > > : getting "No such file or directory"errors is most likely due > > > to MISCONFIGURATION, please remove all the public keys added by > > > geo-replication from authorized_keys file in slave nodes and run > > > Geo-replication create command again. 
> > > [2019-03-22 09:48:21.546736] E [syncdutils(worker > > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3 > > > eab2394433f02f5617012d4ae3c28f/brick):316:log_raise_exception] > > > : If `gsec_create container` was used, then run `gluster > > > volume geo-replication > > > [@]:: config remote-gsyncd > > > (Example GSYNCD_PATH: > > > `/usr/libexec/glusterfs/gsyncd`) > > > [2019-03-22 09:48:21.546858] E [syncdutils(worker > > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3 > > > eab2394433f02f5617012d4ae3c28f/brick):801:errlog] Popen: command > > > returned error cmd=ssh -oPasswordAuthentication=no > > > -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo- > > > replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd- > > > aux-ssh-OaPGc3/c784230c9648efa4d529975bd779c551.sock > > > azureuser at 172.16.201.35 /nonexistent/gsyncd slave > > > vol_041afbc53746053368a1840607636e97 azureuser at 172.16.201.35::vol_a > > > 5aee81a873c043c99a938adcb5b5781 --master-node 172.16.189.4 -- > > > master-node-id dd4efc35-4b86-4901-9c00-483032614c35 --master-brick > > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3 > > > eab2394433f02f5617012d4ae3c28f/brick --local-node 172.16.201.35 -- > > > local-node-id 7eb0a2b6-c4d6-41b1-a346-0638dbf8d779 --slave-timeout > > > 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave- > > > gluster-command-dir /usr/sbin error=127 > > > [2019-03-22 09:48:21.546977] E [syncdutils(worker > > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3 > > > eab2394433f02f5617012d4ae3c28f/brick):805:logerr] Popen: ssh> bash: > > > /nonexistent/gsyncd: No such file or directory > > > [2019-03-22 09:48:21.565583] I [repce(agent > > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3 > > > eab2394433f02f5617012d4ae3c28f/brick):80:service_loop] RepceServer: > > > terminating on reaching EOF. 
> > > [2019-03-22 09:48:21.565745] I [monitor(monitor):266:monitor] > > > Monitor: worker died before establishing connection > > > brick=/var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/br > > > ick_b3eab2394433f02f5617012d4ae3c28f/brick > > > [2019-03-22 09:48:21.579195] I > > > [gsyncdstatus(monitor):245:set_worker_status] GeorepStatus: Worker > > > Status Change status=Faulty > > > > > > On Fri, Mar 22, 2019 at 10:23 PM Sunny Kumar > > > wrote: > > > > Hi Maurya, > > > > > > > > Looks like hook script is failed to set permissions for azureuser > > > > on > > > > "/var/log/glusterfs". > > > > You can assign permission manually for directory then it will > > > > work. > > > > > > > > -Sunny > > > > > > > > On Fri, Mar 22, 2019 at 2:07 PM Maurya M > > > > wrote: > > > > > > > > > > hi Sunny, > > > > > Passwordless ssh to : > > > > > > > > > > ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i > > > > /var/lib/glusterd/geo-replication/secret.pem -p 22 > > > > azureuser at 172.16.201.35 > > > > > > > > > > is login, but when the whole command is run getting permission > > > > issues again:: > > > > > > > > > > ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i > > > > /var/lib/glusterd/geo-replication/secret.pem -p 22 > > > > azureuser at 172.16.201.35 gluster --xml --remote-host=localhost > > > > volume info vol_a5aee81a873c043c99a938adcb5b5781 -v > > > > > ERROR: failed to create logfile "/var/log/glusterfs/cli.log" > > > > (Permission denied) > > > > > ERROR: failed to open logfile /var/log/glusterfs/cli.log > > > > > > > > > > any idea here ? 
> > > > > > > > > > thanks, > > > > > Maurya > > > > > > > > > > > > > > > On Thu, Mar 21, 2019 at 2:43 PM Maurya M > > > > wrote: > > > > >> > > > > >> hi Sunny, > > > > >> i did use the [1] link for the setup, when i encountered this > > > > error during ssh-copy-id : (so setup the passwordless ssh, by > > > > manually copied the private/ public keys to all the nodes , both > > > > master & slave) > > > > >> > > > > >> [root at k8s-agentpool1-24779565-1 ~]# ssh-copy-id > > > > geouser at xxx.xx.xxx.x > > > > >> /usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: > > > > "/root/.ssh/id_rsa.pub" > > > > >> The authenticity of host ' xxx.xx.xxx.x ( xxx.xx.xxx.x )' > > > > can't be established. > > > > >> ECDSA key fingerprint is > > > > SHA256:B2rNaocIcPjRga13oTnopbJ5KjI/7l5fMANXc+KhA9s. > > > > >> ECDSA key fingerprint is > > > > MD5:1b:70:f9:7a:bf:35:33:47:0c:f2:c1:cd:21:e2:d3:75. > > > > >> Are you sure you want to continue connecting (yes/no)? yes > > > > >> /usr/bin/ssh-copy-id: INFO: attempting to log in with the new > > > > key(s), to filter out any that are already installed > > > > >> /usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- > > > > if you are prompted now it is to install the new keys > > > > >> Permission denied (publickey). > > > > >> > > > > >> To start afresh what all needs to teardown / delete, do we > > > > have any script for it ? where all the pem keys do i need to > > > > delete? > > > > >> > > > > >> thanks, > > > > >> Maurya > > > > >> > > > > >> On Thu, Mar 21, 2019 at 2:12 PM Sunny Kumar < > > > > sunkumar at redhat.com> wrote: > > > > >>> > > > > >>> Hey you can start a fresh I think you are not following > > > > proper setup steps. > > > > >>> > > > > >>> Please follow these steps [1] to create geo-rep session, you > > > > can > > > > >>> delete the old one and do a fresh start. Or alternative you > > > > can use > > > > >>> this tool[2] to setup geo-rep. > > > > >>> > > > > >>> > > > > >>> [1]. 
> > > > > https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/ > > > > >>> [2]. http://aravindavk.in/blog/gluster-georep-tools/ > > > > >>> > > > > >>> > > > > >>> /Sunny > > > > >>> > > > > >>> On Thu, Mar 21, 2019 at 11:28 AM Maurya M > > > > wrote: > > > > >>> > > > > > >>> > Hi Sunil, > > > > >>> > I did run the on the slave node : > > > > >>> > /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh azureuser > > > > vol_041afbc53746053368a1840607636e97 > > > > vol_a5aee81a873c043c99a938adcb5b5781 > > > > >>> > getting this message "/home/azureuser/common_secret.pem.pub > > > > not present. Please run geo-replication command on master with > > > > push-pem option to generate the file" > > > > >>> > > > > > >>> > So went back and created the session again, no change, so > > > > manually copied the common_secret.pem.pub to /home/azureuser/ but > > > > still the set_geo_rep_pem_keys.sh is looking the pem file in > > > > different name : > > > > COMMON_SECRET_PEM_PUB=${master_vol}_${slave_vol}_common_secret.pe > > > > m.pub , change the name of pem , ran the command again : > > > > >>> > > > > > >>> > /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh azureuser > > > > vol_041afbc53746053368a1840607636e97 > > > > vol_a5aee81a873c043c99a938adcb5b5781 > > > > >>> > Successfully copied file. > > > > >>> > Command executed successfully. > > > > >>> > > > > > >>> > > > > > >>> > - went back and created the session , start the geo- > > > > replication , still seeing the same error in logs. Any ideas ? > > > > >>> > > > > > >>> > thanks, > > > > >>> > Maurya > > > > >>> > > > > > >>> > > > > > >>> > > > > > >>> > On Wed, Mar 20, 2019 at 11:07 PM Sunny Kumar < > > > > sunkumar at redhat.com> wrote: > > > > >>> >> > > > > >>> >> Hi Maurya, > > > > >>> >> > > > > >>> >> I guess you missed last trick to distribute keys in slave > > > > node. 
I see > > > > >>> >> this is non-root geo-rep setup so please try this: > > > > >>> >> > > > > >>> >> > > > > >>> >> Run the following command as root in any one of Slave > > > > node. > > > > >>> >> > > > > >>> >> /usr/local/libexec/glusterfs/set_geo_rep_pem_keys.sh > > > > > > > > >>> >> > > > > >>> >> > > > > >>> >> - Sunny > > > > >>> >> > > > > >>> >> On Wed, Mar 20, 2019 at 10:47 PM Maurya M < > > > > mauryam at gmail.com> wrote: > > > > >>> >> > > > > > >>> >> > Hi all, > > > > >>> >> > Have setup a 3 master nodes - 3 slave nodes (gluster > > > > 4.1) for geo-replication, but once have the geo-replication > > > > configure the status is always on "Created', > > > > >>> >> > even after have force start the session. > > > > >>> >> > > > > > >>> >> > On close inspect of the logs on the master node seeing > > > > this error: > > > > >>> >> > > > > > >>> >> > "E [syncdutils(monitor):801:errlog] Popen: command > > > > returned error cmd=ssh -oPasswordAuthentication=no > > > > -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo- > > > > replication/secret.pem -p 22 azureuser at xxxxx.xxxx..xxx. gluster > > > > --xml --remote-host=localhost volume info > > > > vol_a5ae34341a873c043c99a938adcb5b5781 error=255" > > > > >>> >> > > > > > >>> >> > Any ideas what is issue? > > > > >>> >> > > > > > >>> >> > thanks, > > > > >>> >> > Maurya > > > > >>> >> > > > > > >>> >> > _______________________________________________ > > > > >>> >> > Gluster-users mailing list > > > > >>> >> > Gluster-users at gluster.org > > > > >>> >> > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mauryam at gmail.com Mon Mar 25 04:19:59 2019 From: mauryam at gmail.com (Maurya M) Date: Mon, 25 Mar 2019 09:49:59 +0530 Subject: [Gluster-users] Geo-replication status always on 'Created' In-Reply-To: References: Message-ID: tried even this, did not work : [root at k8s-agentpool1-24779565-1 vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f]# gluster volume geo-replication vol_75a5fd373d88ba687f591f3353fa05cf 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f* config ssh-command 'ssh -p 2222'* geo-replication config-set failed for vol_75a5fd373d88ba687f591f3353fa05cf 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f geo-replication command failed On Mon, Mar 25, 2019 at 9:46 AM Maurya M wrote: > hi Aravinda, > had the session created using : create ssh-port 2222 push-pem and also > the : > > gluster volume geo-replication vol_75a5fd373d88ba687f591f3353fa05cf > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f config ssh-port 2222 > > hitting this message: > geo-replication config-set failed for vol_75a5fd373d88ba687f591f3353fa05cf > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f > geo-replication command failed > > Below is snap of status: > > [root at k8s-agentpool1-24779565-1 > vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f]# > gluster volume geo-replication vol_75a5fd373d88ba687f591f3353fa05cf > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f status > > MASTER NODE MASTER VOL MASTER BRICK > > SLAVE USER SLAVE > SLAVE NODE STATUS CRAWL STATUS LAST_SYNCED > > -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > 172.16.189.4 vol_75a5fd373d88ba687f591f3353fa05cf > 
/var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_116fb9427fb26f752d9ba8e45e183cb1/brick > root 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f > N/A Created N/A N/A > 172.16.189.35 vol_75a5fd373d88ba687f591f3353fa05cf > /var/lib/heketi/mounts/vg_05708751110fe60b3e7da15bdcf6d4d4/brick_266bb08f0d466d346f8c0b19569736fb/brick > root 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f > N/A Created N/A N/A > 172.16.189.66 vol_75a5fd373d88ba687f591f3353fa05cf > /var/lib/heketi/mounts/vg_4b92a2b687e59b7311055d3809b77c06/brick_dfa44c9380cdedac708e27e2c2a443a0/brick > root 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f > N/A Created N/A N/A > > any ideas ? where can find logs for the failed commands check in > gysncd.log , the trace is as below: > > [2019-03-25 04:04:42.295043] I [gsyncd(monitor):297:main] : Using > session config file > path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf > [2019-03-25 04:04:42.387192] E > [syncdutils(monitor):332:log_raise_exception] : FAIL: > Traceback (most recent call last): > File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in > main > func(args) > File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 50, in > subcmd_monitor > return monitor.monitor(local, remote) > File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 427, in > monitor > return Monitor().multiplex(*distribute(local, remote)) > File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 370, in > distribute > mvol = Volinfo(master.volume, master.host) > File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 860, > in __init__ > print "debug varible " %vix > TypeError: not all arguments converted during string formatting > [2019-03-25 04:04:48.997519] I [gsyncd(config-get):297:main] : Using > session config file > 
path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf > [2019-03-25 04:04:49.93528] I [gsyncd(status):297:main] : Using > session config file > path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf > [2019-03-25 04:08:07.194348] I [gsyncd(config-get):297:main] : Using > session config file > path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf > [2019-03-25 04:08:07.262588] I [gsyncd(config-get):297:main] : Using > session config file > path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf > [2019-03-25 04:08:07.550080] I [gsyncd(config-get):297:main] : Using > session config file > path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf > [2019-03-25 04:08:18.933028] I [gsyncd(config-get):297:main] : Using > session config file > path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf > [2019-03-25 04:08:19.25285] I [gsyncd(status):297:main] : Using > session config file > path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf > [2019-03-25 04:09:15.766882] I [gsyncd(config-get):297:main] : Using > session config file > path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf > [2019-03-25 04:09:16.30267] I [gsyncd(config-get):297:main] : Using > session config file > path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf 
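[Editor's note] The monitor crash in the traceback above comes from what appears to be a stray Python 2 debug line in syncdutils.py, `print "debug varible " %vix`. The `%` operator raises TypeError when the format string contains no placeholder to consume the argument, which kills the monitor and explains why config commands fail and the session never leaves 'Created'. A minimal reproduction (Python 3 syntax; `vix` is just the variable name from the trace):

```python
# Reproduce the error from the traceback:
#     print "debug varible " %vix
#     TypeError: not all arguments converted during string formatting
vix = 3

try:
    "debug varible " % vix        # no %s/%d, so the argument is left over
except TypeError as exc:
    print(exc)                    # not all arguments converted ...

print("debug varible %s" % vix)   # the presumably intended form
```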
> [2019-03-25 04:09:16.89006] I [gsyncd(config-set):297:main] : Using > session config file > path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > regards, > Maurya > > On Mon, Mar 25, 2019 at 9:08 AM Aravinda wrote: > >> Use `ssh-port ` while creating the Geo-rep session >> >> Ref: >> >> https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/#creating-the-session >> >> And set the ssh-port option before start. >> >> ``` >> gluster volume geo-replication \ >> [@]:: config >> ssh-port 2222 >> ``` >> >> -- >> regards >> Aravinda >> http://aravindavk.in >> >> >> On Sun, 2019-03-24 at 17:13 +0530, Maurya M wrote: >> > did all the suggestion as mentioned in the log trace , have another >> > setup using root user , but there i have issue on the ssh command as >> > i am unable to change the ssh port to use default 22, but my servers >> > (azure aks engine) are configure to using 2222 where i am unable to >> > change the ports , restart of ssh service giving me error! 
>> > >> > Is this syntax correct to config the ssh-command: >> > gluster volume geo-replication vol_041afbc53746053368a1840607636e97 >> > xxx.xx.xxx.xx::vol_a5aee81a873c043c99a938adcb5b5781 config ssh- >> > command '/usr/sbin/sshd -D -p 2222' >> > >> > On Sun, Mar 24, 2019 at 4:38 PM Maurya M wrote: >> > > Did give the persmission on both "/var/log/glusterfs/" & >> > > "/var/lib/glusterd/" too, but seems the directory where i mounted >> > > using heketi is having issues: >> > > >> > > [2019-03-22 09:48:21.546308] E [syncdutils(worker >> > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3 >> > > eab2394433f02f5617012d4ae3c28f/brick):305:log_raise_exception] >> > > : connection to peer is broken >> > > [2019-03-22 09:48:21.546662] E [syncdutils(worker >> > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3 >> > > eab2394433f02f5617012d4ae3c28f/brick):309:log_raise_exception] >> > > : getting "No such file or directory"errors is most likely due >> > > to MISCONFIGURATION, please remove all the public keys added by >> > > geo-replication from authorized_keys file in slave nodes and run >> > > Geo-replication create command again. 
>> > > [2019-03-22 09:48:21.546736] E [syncdutils(worker >> > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3 >> > > eab2394433f02f5617012d4ae3c28f/brick):316:log_raise_exception] >> > > : If `gsec_create container` was used, then run `gluster >> > > volume geo-replication >> > > [@]:: config remote-gsyncd >> > > (Example GSYNCD_PATH: >> > > `/usr/libexec/glusterfs/gsyncd`) >> > > [2019-03-22 09:48:21.546858] E [syncdutils(worker >> > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3 >> > > eab2394433f02f5617012d4ae3c28f/brick):801:errlog] Popen: command >> > > returned error cmd=ssh -oPasswordAuthentication=no >> > > -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo- >> > > replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd- >> > > aux-ssh-OaPGc3/c784230c9648efa4d529975bd779c551.sock >> > > azureuser at 172.16.201.35 /nonexistent/gsyncd slave >> > > vol_041afbc53746053368a1840607636e97 azureuser at 172.16.201.35::vol_a >> > > 5aee81a873c043c99a938adcb5b5781 --master-node 172.16.189.4 -- >> > > master-node-id dd4efc35-4b86-4901-9c00-483032614c35 --master-brick >> > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3 >> > > eab2394433f02f5617012d4ae3c28f/brick --local-node 172.16.201.35 -- >> > > local-node-id 7eb0a2b6-c4d6-41b1-a346-0638dbf8d779 --slave-timeout >> > > 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave- >> > > gluster-command-dir /usr/sbin error=127 >> > > [2019-03-22 09:48:21.546977] E [syncdutils(worker >> > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3 >> > > eab2394433f02f5617012d4ae3c28f/brick):805:logerr] Popen: ssh> bash: >> > > /nonexistent/gsyncd: No such file or directory >> > > [2019-03-22 09:48:21.565583] I [repce(agent >> > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_b3 >> > > eab2394433f02f5617012d4ae3c28f/brick):80:service_loop] RepceServer: >> > > terminating on reaching EOF. 
>> > > [2019-03-22 09:48:21.565745] I [monitor(monitor):266:monitor] >> > > Monitor: worker died before establishing connection >> > > brick=/var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/br >> > > ick_b3eab2394433f02f5617012d4ae3c28f/brick >> > > [2019-03-22 09:48:21.579195] I >> > > [gsyncdstatus(monitor):245:set_worker_status] GeorepStatus: Worker >> > > Status Change status=Faulty >> > > >> > > On Fri, Mar 22, 2019 at 10:23 PM Sunny Kumar >> > > wrote: >> > > > Hi Maurya, >> > > > >> > > > Looks like hook script is failed to set permissions for azureuser >> > > > on >> > > > "/var/log/glusterfs". >> > > > You can assign permission manually for directory then it will >> > > > work. >> > > > >> > > > -Sunny >> > > > >> > > > On Fri, Mar 22, 2019 at 2:07 PM Maurya M >> > > > wrote: >> > > > > >> > > > > hi Sunny, >> > > > > Passwordless ssh to : >> > > > > >> > > > > ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i >> > > > /var/lib/glusterd/geo-replication/secret.pem -p 22 >> > > > azureuser at 172.16.201.35 >> > > > > >> > > > > is login, but when the whole command is run getting permission >> > > > issues again:: >> > > > > >> > > > > ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i >> > > > /var/lib/glusterd/geo-replication/secret.pem -p 22 >> > > > azureuser at 172.16.201.35 gluster --xml --remote-host=localhost >> > > > volume info vol_a5aee81a873c043c99a938adcb5b5781 -v >> > > > > ERROR: failed to create logfile "/var/log/glusterfs/cli.log" >> > > > (Permission denied) >> > > > > ERROR: failed to open logfile /var/log/glusterfs/cli.log >> > > > > >> > > > > any idea here ? 
>> > > > > >> > > > > thanks, >> > > > > Maurya >> > > > > >> > > > > >> > > > > On Thu, Mar 21, 2019 at 2:43 PM Maurya M >> > > > wrote: >> > > > >> >> > > > >> hi Sunny, >> > > > >> i did use the [1] link for the setup, when i encountered this >> > > > error during ssh-copy-id : (so setup the passwordless ssh, by >> > > > manually copied the private/ public keys to all the nodes , both >> > > > master & slave) >> > > > >> >> > > > >> [root at k8s-agentpool1-24779565-1 ~]# ssh-copy-id >> > > > geouser at xxx.xx.xxx.x >> > > > >> /usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: >> > > > "/root/.ssh/id_rsa.pub" >> > > > >> The authenticity of host ' xxx.xx.xxx.x ( xxx.xx.xxx.x )' >> > > > can't be established. >> > > > >> ECDSA key fingerprint is >> > > > SHA256:B2rNaocIcPjRga13oTnopbJ5KjI/7l5fMANXc+KhA9s. >> > > > >> ECDSA key fingerprint is >> > > > MD5:1b:70:f9:7a:bf:35:33:47:0c:f2:c1:cd:21:e2:d3:75. >> > > > >> Are you sure you want to continue connecting (yes/no)? yes >> > > > >> /usr/bin/ssh-copy-id: INFO: attempting to log in with the new >> > > > key(s), to filter out any that are already installed >> > > > >> /usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- >> > > > if you are prompted now it is to install the new keys >> > > > >> Permission denied (publickey). >> > > > >> >> > > > >> To start afresh what all needs to teardown / delete, do we >> > > > have any script for it ? where all the pem keys do i need to >> > > > delete? >> > > > >> >> > > > >> thanks, >> > > > >> Maurya >> > > > >> >> > > > >> On Thu, Mar 21, 2019 at 2:12 PM Sunny Kumar < >> > > > sunkumar at redhat.com> wrote: >> > > > >>> >> > > > >>> Hey you can start a fresh I think you are not following >> > > > proper setup steps. >> > > > >>> >> > > > >>> Please follow these steps [1] to create geo-rep session, you >> > > > can >> > > > >>> delete the old one and do a fresh start. Or alternative you >> > > > can use >> > > > >>> this tool[2] to setup geo-rep. 
>> > > > >>> >> > > > >>> >> > > > >>> [1]. >> > > > >> https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/ >> > > > >>> [2]. http://aravindavk.in/blog/gluster-georep-tools/ >> > > > >>> >> > > > >>> >> > > > >>> /Sunny >> > > > >>> >> > > > >>> On Thu, Mar 21, 2019 at 11:28 AM Maurya M >> > > > wrote: >> > > > >>> > >> > > > >>> > Hi Sunil, >> > > > >>> > I did run the on the slave node : >> > > > >>> > /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh azureuser >> > > > vol_041afbc53746053368a1840607636e97 >> > > > vol_a5aee81a873c043c99a938adcb5b5781 >> > > > >>> > getting this message "/home/azureuser/common_secret.pem.pub >> > > > not present. Please run geo-replication command on master with >> > > > push-pem option to generate the file" >> > > > >>> > >> > > > >>> > So went back and created the session again, no change, so >> > > > manually copied the common_secret.pem.pub to /home/azureuser/ but >> > > > still the set_geo_rep_pem_keys.sh is looking the pem file in >> > > > different name : >> > > > COMMON_SECRET_PEM_PUB=${master_vol}_${slave_vol}_common_secret.pe >> > > > m.pub , change the name of pem , ran the command again : >> > > > >>> > >> > > > >>> > /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh azureuser >> > > > vol_041afbc53746053368a1840607636e97 >> > > > vol_a5aee81a873c043c99a938adcb5b5781 >> > > > >>> > Successfully copied file. >> > > > >>> > Command executed successfully. >> > > > >>> > >> > > > >>> > >> > > > >>> > - went back and created the session , start the geo- >> > > > replication , still seeing the same error in logs. Any ideas ? >> > > > >>> > >> > > > >>> > thanks, >> > > > >>> > Maurya >> > > > >>> > >> > > > >>> > >> > > > >>> > >> > > > >>> > On Wed, Mar 20, 2019 at 11:07 PM Sunny Kumar < >> > > > sunkumar at redhat.com> wrote: >> > > > >>> >> >> > > > >>> >> Hi Maurya, >> > > > >>> >> >> > > > >>> >> I guess you missed last trick to distribute keys in slave >> > > > node. 
I see >> > > > >>> >> this is non-root geo-rep setup so please try this: >> > > > >>> >> >> > > > >>> >> >> > > > >>> >> Run the following command as root in any one of Slave >> > > > node. >> > > > >>> >> >> > > > >>> >> /usr/local/libexec/glusterfs/set_geo_rep_pem_keys.sh >> > > > >> > > > >>> >> >> > > > >>> >> >> > > > >>> >> - Sunny >> > > > >>> >> >> > > > >>> >> On Wed, Mar 20, 2019 at 10:47 PM Maurya M < >> > > > mauryam at gmail.com> wrote: >> > > > >>> >> > >> > > > >>> >> > Hi all, >> > > > >>> >> > Have setup a 3 master nodes - 3 slave nodes (gluster >> > > > 4.1) for geo-replication, but once have the geo-replication >> > > > configure the status is always on "Created', >> > > > >>> >> > even after have force start the session. >> > > > >>> >> > >> > > > >>> >> > On close inspect of the logs on the master node seeing >> > > > this error: >> > > > >>> >> > >> > > > >>> >> > "E [syncdutils(monitor):801:errlog] Popen: command >> > > > returned error cmd=ssh -oPasswordAuthentication=no >> > > > -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo- >> > > > replication/secret.pem -p 22 azureuser at xxxxx.xxxx..xxx. gluster >> > > > --xml --remote-host=localhost volume info >> > > > vol_a5ae34341a873c043c99a938adcb5b5781 error=255" >> > > > >>> >> > >> > > > >>> >> > Any ideas what is issue? >> > > > >>> >> > >> > > > >>> >> > thanks, >> > > > >>> >> > Maurya >> > > > >>> >> > >> > > > >>> >> > _______________________________________________ >> > > > >>> >> > Gluster-users mailing list >> > > > >>> >> > Gluster-users at gluster.org >> > > > >>> >> > https://lists.gluster.org/mailman/listinfo/gluster-users >> > >> > _______________________________________________ >> > Gluster-users mailing list >> > Gluster-users at gluster.org >> > https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From avishwan at redhat.com Mon Mar 25 05:21:57 2019 From: avishwan at redhat.com (Aravinda) Date: Mon, 25 Mar 2019 10:51:57 +0530 Subject: [Gluster-users] Geo-replication status always on 'Created' In-Reply-To: References: Message-ID: Below print statement looks wrong. Latest Glusterfs code doesn't have this print statement. Please let us know which version of glusterfs you are using. ``` File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 860, in __init__ print "debug varible " %vix ``` As a workaround, edit that file and comment the print line and test the geo-rep config command. On Mon, 2019-03-25 at 09:46 +0530, Maurya M wrote: > hi Aravinda, > had the session created using : create ssh-port 2222 push-pem and > also the : > > gluster volume geo-replication vol_75a5fd373d88ba687f591f3353fa05cf > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f config ssh-port > 2222 > > hitting this message: > geo-replication config-set failed for > vol_75a5fd373d88ba687f591f3353fa05cf > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f > geo-replication command failed > > Below is snap of status: > > [root at k8s-agentpool1-24779565-1 > vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f]# gluster volume geo-replication vol_75a5fd373d88ba687f591f3353fa05cf 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f status > > MASTER NODE MASTER VOL MASTER > BRICK > SLAVE USER SLAVE > SLAVE NODE STATUS CRAWL STATUS > LAST_SYNCED > ------------------------------------------------------------------- > ------------------------------------------------------------------- > ------------------------------------------------------------------- > ------------------------------------------------------------------- > ---------------- > 172.16.189.4 vol_75a5fd373d88ba687f591f3353fa05cf > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_116f > b9427fb26f752d9ba8e45e183cb1/brick root > 
172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A > Created N/A N/A > 172.16.189.35 vol_75a5fd373d88ba687f591f3353fa05cf > /var/lib/heketi/mounts/vg_05708751110fe60b3e7da15bdcf6d4d4/brick_266b > b08f0d466d346f8c0b19569736fb/brick root > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A > Created N/A N/A > 172.16.189.66 vol_75a5fd373d88ba687f591f3353fa05cf > /var/lib/heketi/mounts/vg_4b92a2b687e59b7311055d3809b77c06/brick_dfa4 > 4c9380cdedac708e27e2c2a443a0/brick root > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A > Created N/A N/A > > any ideas ? where can find logs for the failed commands check in > gysncd.log , the trace is as below: > > [2019-03-25 04:04:42.295043] I [gsyncd(monitor):297:main] : > Using session config file path=/var/lib/glusterd/geo- > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > [2019-03-25 04:04:42.387192] E > [syncdutils(monitor):332:log_raise_exception] : FAIL: > Traceback (most recent call last): > File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line > 311, in main > func(args) > File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line > 50, in subcmd_monitor > return monitor.monitor(local, remote) > File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line > 427, in monitor > return Monitor().multiplex(*distribute(local, remote)) > File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line > 370, in distribute > mvol = Volinfo(master.volume, master.host) > File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line > 860, in __init__ > print "debug varible " %vix > TypeError: not all arguments converted during string formatting > [2019-03-25 04:04:48.997519] I [gsyncd(config-get):297:main] : > Using session config file path=/var/lib/glusterd/geo- > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > [2019-03-25 04:04:49.93528] I 
[gsyncd(status):297:main] : Using > session config file path=/var/lib/glusterd/geo- > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > [2019-03-25 04:08:07.194348] I [gsyncd(config-get):297:main] : > Using session config file path=/var/lib/glusterd/geo- > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > [2019-03-25 04:08:07.262588] I [gsyncd(config-get):297:main] : > Using session config file path=/var/lib/glusterd/geo- > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > [2019-03-25 04:08:07.550080] I [gsyncd(config-get):297:main] : > Using session config file path=/var/lib/glusterd/geo- > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > [2019-03-25 04:08:18.933028] I [gsyncd(config-get):297:main] : > Using session config file path=/var/lib/glusterd/geo- > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > [2019-03-25 04:08:19.25285] I [gsyncd(status):297:main] : Using > session config file path=/var/lib/glusterd/geo- > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > [2019-03-25 04:09:15.766882] I [gsyncd(config-get):297:main] : > Using session config file path=/var/lib/glusterd/geo- > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > [2019-03-25 04:09:16.30267] I [gsyncd(config-get):297:main] : > Using session config file path=/var/lib/glusterd/geo- > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > [2019-03-25 04:09:16.89006] I [gsyncd(config-set):297:main] : > Using session config file path=/var/lib/glusterd/geo- > 
replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > regards, > Maurya > > On Mon, Mar 25, 2019 at 9:08 AM Aravinda wrote: > > Use `ssh-port ` while creating the Geo-rep session > > > > Ref: > > https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/#creating-the-session > > > > And set the ssh-port option before start. > > > > ``` > > gluster volume geo-replication \ > > [@]:: config > > ssh-port 2222 > > ``` > > -- regards Aravinda From mauryam at gmail.com Mon Mar 25 05:59:08 2019 From: mauryam at gmail.com (Maurya M) Date: Mon, 25 Mar 2019 11:29:08 +0530 Subject: [Gluster-users] Geo-replication status always on 'Created' In-Reply-To: References: Message-ID: Sorry my bad, had put the print line to debug, i am using gluster 4.1.7, will remove the print line. On Mon, Mar 25, 2019 at 10:52 AM Aravinda wrote: > Below print statement looks wrong. Latest Glusterfs code doesn't have > this print statement. Please let us know which version of glusterfs you > are using. > > > ``` > File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line > 860, in __init__ > print "debug varible " %vix > ``` > > As a workaround, edit that file and comment the print line and test the > geo-rep config command. 
> > > On Mon, 2019-03-25 at 09:46 +0530, Maurya M wrote: > > hi Aravinda, > > had the session created using : create ssh-port 2222 push-pem and > > also the : > > > > gluster volume geo-replication vol_75a5fd373d88ba687f591f3353fa05cf > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f config ssh-port > > 2222 > > > > hitting this message: > > geo-replication config-set failed for > > vol_75a5fd373d88ba687f591f3353fa05cf > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f > > geo-replication command failed > > > > Below is snap of status: > > > > [root at k8s-agentpool1-24779565-1 > > > vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f]# > gluster volume geo-replication vol_75a5fd373d88ba687f591f3353fa05cf > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f status > > > > MASTER NODE MASTER VOL MASTER > > BRICK > > SLAVE USER SLAVE > > SLAVE NODE STATUS CRAWL STATUS > > LAST_SYNCED > > ------------------------------------------------------------------- > > ------------------------------------------------------------------- > > ------------------------------------------------------------------- > > ------------------------------------------------------------------- > > ---------------- > > 172.16.189.4 vol_75a5fd373d88ba687f591f3353fa05cf > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_116f > > b9427fb26f752d9ba8e45e183cb1/brick root > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A > > Created N/A N/A > > 172.16.189.35 vol_75a5fd373d88ba687f591f3353fa05cf > > /var/lib/heketi/mounts/vg_05708751110fe60b3e7da15bdcf6d4d4/brick_266b > > b08f0d466d346f8c0b19569736fb/brick root > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A > > Created N/A N/A > > 172.16.189.66 vol_75a5fd373d88ba687f591f3353fa05cf > > /var/lib/heketi/mounts/vg_4b92a2b687e59b7311055d3809b77c06/brick_dfa4 > > 4c9380cdedac708e27e2c2a443a0/brick root > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A > > 
Created N/A N/A > > > > any ideas ? where can find logs for the failed commands check in > > gysncd.log , the trace is as below: > > > > [2019-03-25 04:04:42.295043] I [gsyncd(monitor):297:main] : > > Using session config file path=/var/lib/glusterd/geo- > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > [2019-03-25 04:04:42.387192] E > > [syncdutils(monitor):332:log_raise_exception] : FAIL: > > Traceback (most recent call last): > > File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line > > 311, in main > > func(args) > > File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line > > 50, in subcmd_monitor > > return monitor.monitor(local, remote) > > File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line > > 427, in monitor > > return Monitor().multiplex(*distribute(local, remote)) > > File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line > > 370, in distribute > > mvol = Volinfo(master.volume, master.host) > > File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line > > 860, in __init__ > > print "debug varible " %vix > > TypeError: not all arguments converted during string formatting > > [2019-03-25 04:04:48.997519] I [gsyncd(config-get):297:main] : > > Using session config file path=/var/lib/glusterd/geo- > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > [2019-03-25 04:04:49.93528] I [gsyncd(status):297:main] : Using > > session config file path=/var/lib/glusterd/geo- > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > [2019-03-25 04:08:07.194348] I [gsyncd(config-get):297:main] : > > Using session config file path=/var/lib/glusterd/geo- > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > [2019-03-25 04:08:07.262588] I 
[gsyncd(config-get):297:main] : > > Using session config file path=/var/lib/glusterd/geo- > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > [2019-03-25 04:08:07.550080] I [gsyncd(config-get):297:main] : > > Using session config file path=/var/lib/glusterd/geo- > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > [2019-03-25 04:08:18.933028] I [gsyncd(config-get):297:main] : > > Using session config file path=/var/lib/glusterd/geo- > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > [2019-03-25 04:08:19.25285] I [gsyncd(status):297:main] : Using > > session config file path=/var/lib/glusterd/geo- > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > [2019-03-25 04:09:15.766882] I [gsyncd(config-get):297:main] : > > Using session config file path=/var/lib/glusterd/geo- > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > [2019-03-25 04:09:16.30267] I [gsyncd(config-get):297:main] : > > Using session config file path=/var/lib/glusterd/geo- > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > [2019-03-25 04:09:16.89006] I [gsyncd(config-set):297:main] : > > Using session config file path=/var/lib/glusterd/geo- > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > regards, > > Maurya > > > > On Mon, Mar 25, 2019 at 9:08 AM Aravinda wrote: > > > Use `ssh-port ` while creating the Geo-rep session > > > > > > Ref: > > > > https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/#creating-the-session > > > > > > And set the ssh-port option before start. 
> > > > > > ``` > > > gluster volume geo-replication \ > > > [@]:: config > > > ssh-port 2222 > > > ``` > > > > -- > regards > Aravinda > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mauryam at gmail.com Mon Mar 25 06:13:32 2019 From: mauryam at gmail.com (Maurya M) Date: Mon, 25 Mar 2019 11:43:32 +0530 Subject: [Gluster-users] Geo-replication status always on 'Created' In-Reply-To: References: Message-ID: Now the error is on the same line 860 : as highlighted below: [2019-03-25 06:11:52.376238] E [syncdutils(monitor):332:log_raise_exception] : FAIL: Traceback (most recent call last): File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main func(args) File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 50, in subcmd_monitor return monitor.monitor(local, remote) File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 427, in monitor return Monitor().multiplex(*distribute(local, remote)) File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 386, in distribute svol = Volinfo(slave.volume, "localhost", prelude) File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 860, in __init__ * vi = XET.fromstring(vix)* File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1300, in XML parser.feed(text) File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1642, in feed self._raiseerror(v) File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror raise err ParseError: syntax error: line 1, column 0 On Mon, Mar 25, 2019 at 11:29 AM Maurya M wrote: > Sorry my bad, had put the print line to debug, i am using gluster 4.1.7, > will remove the print line. > > On Mon, Mar 25, 2019 at 10:52 AM Aravinda wrote: > >> Below print statement looks wrong. Latest Glusterfs code doesn't have >> this print statement. Please let us know which version of glusterfs you >> are using. 
>> >> >> ``` >> File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line >> 860, in __init__ >> print "debug varible " %vix >> ``` >> >> As a workaround, edit that file and comment the print line and test the >> geo-rep config command. >> >> >> On Mon, 2019-03-25 at 09:46 +0530, Maurya M wrote: >> > hi Aravinda, >> > had the session created using : create ssh-port 2222 push-pem and >> > also the : >> > >> > gluster volume geo-replication vol_75a5fd373d88ba687f591f3353fa05cf >> > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f config ssh-port >> > 2222 >> > >> > hitting this message: >> > geo-replication config-set failed for >> > vol_75a5fd373d88ba687f591f3353fa05cf >> > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f >> > geo-replication command failed >> > >> > Below is snap of status: >> > >> > [root at k8s-agentpool1-24779565-1 >> > >> vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f]# >> gluster volume geo-replication vol_75a5fd373d88ba687f591f3353fa05cf >> 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f status >> > >> > MASTER NODE MASTER VOL MASTER >> > BRICK >> > SLAVE USER SLAVE >> > SLAVE NODE STATUS CRAWL STATUS >> > LAST_SYNCED >> > ------------------------------------------------------------------- >> > ------------------------------------------------------------------- >> > ------------------------------------------------------------------- >> > ------------------------------------------------------------------- >> > ---------------- >> > 172.16.189.4 vol_75a5fd373d88ba687f591f3353fa05cf >> > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_116f >> > b9427fb26f752d9ba8e45e183cb1/brick root >> > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A >> > Created N/A N/A >> > 172.16.189.35 vol_75a5fd373d88ba687f591f3353fa05cf >> > /var/lib/heketi/mounts/vg_05708751110fe60b3e7da15bdcf6d4d4/brick_266b >> > b08f0d466d346f8c0b19569736fb/brick root >> > 
172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A >> > Created N/A N/A >> > 172.16.189.66 vol_75a5fd373d88ba687f591f3353fa05cf >> > /var/lib/heketi/mounts/vg_4b92a2b687e59b7311055d3809b77c06/brick_dfa4 >> > 4c9380cdedac708e27e2c2a443a0/brick root >> > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A >> > Created N/A N/A >> > >> > any ideas ? where can find logs for the failed commands check in >> > gysncd.log , the trace is as below: >> > >> > [2019-03-25 04:04:42.295043] I [gsyncd(monitor):297:main] : >> > Using session config file path=/var/lib/glusterd/geo- >> > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 >> > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf >> > [2019-03-25 04:04:42.387192] E >> > [syncdutils(monitor):332:log_raise_exception] : FAIL: >> > Traceback (most recent call last): >> > File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line >> > 311, in main >> > func(args) >> > File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line >> > 50, in subcmd_monitor >> > return monitor.monitor(local, remote) >> > File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line >> > 427, in monitor >> > return Monitor().multiplex(*distribute(local, remote)) >> > File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line >> > 370, in distribute >> > mvol = Volinfo(master.volume, master.host) >> > File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line >> > 860, in __init__ >> > print "debug varible " %vix >> > TypeError: not all arguments converted during string formatting >> > [2019-03-25 04:04:48.997519] I [gsyncd(config-get):297:main] : >> > Using session config file path=/var/lib/glusterd/geo- >> > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 >> > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf >> > [2019-03-25 04:04:49.93528] I [gsyncd(status):297:main] : Using >> > session config file path=/var/lib/glusterd/geo- >> > 
replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 >> > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf >> > [2019-03-25 04:08:07.194348] I [gsyncd(config-get):297:main] : >> > Using session config file path=/var/lib/glusterd/geo- >> > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 >> > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf >> > [2019-03-25 04:08:07.262588] I [gsyncd(config-get):297:main] : >> > Using session config file path=/var/lib/glusterd/geo- >> > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 >> > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf >> > [2019-03-25 04:08:07.550080] I [gsyncd(config-get):297:main] : >> > Using session config file path=/var/lib/glusterd/geo- >> > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 >> > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf >> > [2019-03-25 04:08:18.933028] I [gsyncd(config-get):297:main] : >> > Using session config file path=/var/lib/glusterd/geo- >> > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 >> > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf >> > [2019-03-25 04:08:19.25285] I [gsyncd(status):297:main] : Using >> > session config file path=/var/lib/glusterd/geo- >> > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 >> > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf >> > [2019-03-25 04:09:15.766882] I [gsyncd(config-get):297:main] : >> > Using session config file path=/var/lib/glusterd/geo- >> > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 >> > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf >> > [2019-03-25 04:09:16.30267] I [gsyncd(config-get):297:main] : >> > Using session config file path=/var/lib/glusterd/geo- >> > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 >> > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf >> > [2019-03-25 04:09:16.89006] I [gsyncd(config-set):297:main] : >> > Using session config file path=/var/lib/glusterd/geo- >> > 
replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e7 >> > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf >> > >> > regards, >> > Maurya >> > >> > On Mon, Mar 25, 2019 at 9:08 AM Aravinda wrote: >> > > Use `ssh-port ` while creating the Geo-rep session >> > > >> > > Ref: >> > > >> https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/#creating-the-session >> > > >> > > And set the ssh-port option before start. >> > > >> > > ``` >> > > gluster volume geo-replication \ >> > > [@]:: config >> > > ssh-port 2222 >> > > ``` >> > > >> -- >> regards >> Aravinda >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From vbellur at redhat.com Mon Mar 25 06:36:46 2019 From: vbellur at redhat.com (Vijay Bellur) Date: Sun, 24 Mar 2019 23:36:46 -0700 Subject: [Gluster-users] Network Block device (NBD) on top of glusterfs In-Reply-To: References: Message-ID: Hi Xiubo, On Fri, Mar 22, 2019 at 5:48 PM Xiubo Li wrote: > On 2019/3/21 11:29, Xiubo Li wrote: > > All, > > I am one of the contributor for gluster-block > [1] project, and also I > contribute to linux kernel and open-iscsi > project.[2] > > NBD was around for some time, but in recent time, linux kernel?s Network > Block Device (NBD) is enhanced and made to work with more devices and also > the option to integrate with netlink is added. So, I tried to provide a > glusterfs client based NBD driver recently. Please refer github issue #633 > [3], and good news is I > have a working code, with most basic things @ nbd-runner project > [4]. > > This is nice. Thank you for your work! > As mentioned the nbd-runner(NBD proto) will work in the same layer with > tcmu-runner(iSCSI proto), this is not trying to replace the > gluster-block/ceph-iscsi-gateway great projects. 
> > It just provides the common library to do the low level stuff, like the > sysfs/netlink operations and the IOs from the nbd kernel socket, and the > great tcmu-runner project is doing the sysfs/uio operations and IOs from > the kernel SCSI/iSCSI. > > The nbd-cli tool will work like the iscsi-initiator-utils, and the > nbd-runner daemon will work like the tcmu-runner daemon, that's all. > Do you have thoughts on how nbd-runner currently differs or would differ from tcmu-runner? It might be useful to document the differences in github (or elsewhere) so that users can make an informed choice between nbd-runner & tcmu-runner. In tcmu-runner for different backend storages, they have separate handlers, > glfs.c handler for Gluster, rbd.c handler for Ceph, etc. And what the > handlers here are doing the actual IOs with the backend storage services > once the IO paths setup are done by ceph-iscsi-gateway/gluster-block.... > > Then we can support all the kind of backend storages, like the > Gluster/Ceph/Azure... as one separate handler in nbd-runner, which no need > to care about the NBD low level's stuff updates and changes. > Given that the charter for this project is to support multiple backend storage projects, would not it be better to host the project in the github repository associated with nbd [5]? Doing it that way could provide a more neutral (as perceived by users) venue for hosting nbd-runner and help you in getting more adoption for your work. Thanks, Vijay [5] https://github.com/NetworkBlockDevice/nbd > Thanks. > > > While this email is about announcing the project, and asking for more > collaboration, I would also like to discuss more about the placement of the > project itself. Currently nbd-runner project is expected to be shared by > our friends at Ceph project too, to provide NBD driver for Ceph. I have > personally worked with some of them closely while contributing to > open-iSCSI project, and we would like to take this project to great success. 
> > Now few questions: > > 1. Can I continue to use http://github.com/gluster/nbd-runner as home > for this project, even if it's shared by other filesystem projects? > > > - I personally am fine with this. > > > 2. Should there be a separate organization for this repo? > > > - While it may make sense in future, for now, I am not planning to > start any new thing? > > It would be great if we have some consensus on this soon as nbd-runner is > a new repository. If there are no concerns, I will continue to contribute > to the existing repository. > > Regards, > Xiubo Li (@lxbsz) > > [1] - https://github.com/gluster/gluster-block > [2] - https://github.com/open-iscsi > [3] - https://github.com/gluster/glusterfs/issues/633 > [4] - https://github.com/gluster/nbd-runner > > _______________________________________________ > Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From avishwan at redhat.com Mon Mar 25 08:43:37 2019 From: avishwan at redhat.com (Aravinda) Date: Mon, 25 Mar 2019 14:13:37 +0530 Subject: [Gluster-users] Geo-replication status always on 'Created' In-Reply-To: References: Message-ID: <5342e4c8e5bff06a22edbc6be704e3f10bd67a4e.camel@redhat.com> Geo-rep is running `ssh -i /var/lib/glusterd/geo-replication/secret.pem root@ gluster volume info --xml` and parsing its output. Please try to run the command from the same node and let us know the output.
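For reference, the failure mode in the traceback can be reproduced with a few lines of Python. This is only a minimal sketch of the parse step that Volinfo in syncdutils.py performs (not the actual gsyncd code): the captured ssh output is handed straight to ElementTree, so anything the remote shell prints before the XML document triggers the reported error.

```python
# Minimal sketch (assumption: simplified stand-in for Volinfo's parse step).
import xml.etree.ElementTree as XET

# A well-formed CLI output parses fine:
clean = '<?xml version="1.0"?><cliOutput><opRet>0</opRet></cliOutput>'
root = XET.fromstring(clean)
assert root.find("opRet").text == "0"

# Any non-XML prefix in the captured output (ssh banner, MOTD, shell
# warning) fails exactly like the traceback above:
noisy = "Warning: Permanently added 'host' to known hosts.\n" + clean
try:
    XET.fromstring(noisy)
except XET.ParseError as err:
    print(err)  # syntax error: line 1, column 0
```

So if the remote command's output starts with anything other than the XML declaration (check with the ssh command above), silencing that noise on the slave node should let the same output parse cleanly.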
On Mon, 2019-03-25 at 11:43 +0530, Maurya M wrote: > Now the error is on the same line 860 : as highlighted below: > > [2019-03-25 06:11:52.376238] E > [syncdutils(monitor):332:log_raise_exception] : FAIL: > Traceback (most recent call last): > File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line > 311, in main > func(args) > File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line > 50, in subcmd_monitor > return monitor.monitor(local, remote) > File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line > 427, in monitor > return Monitor().multiplex(*distribute(local, remote)) > File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line > 386, in distribute > svol = Volinfo(slave.volume, "localhost", prelude) > File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line > 860, in __init__ > vi = XET.fromstring(vix) > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1300, in > XML > parser.feed(text) > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1642, in > feed > self._raiseerror(v) > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1506, in > _raiseerror > raise err > ParseError: syntax error: line 1, column 0 > > > On Mon, Mar 25, 2019 at 11:29 AM Maurya M wrote: > > Sorry my bad, had put the print line to debug, i am using gluster > > 4.1.7, will remove the print line. > > > > On Mon, Mar 25, 2019 at 10:52 AM Aravinda > > wrote: > > > Below print statement looks wrong. Latest Glusterfs code doesn't > > > have > > > this print statement. Please let us know which version of > > > glusterfs you > > > are using. > > > > > > > > > ``` > > > File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", > > > line > > > 860, in __init__ > > > print "debug varible " %vix > > > ``` > > > > > > As a workaround, edit that file and comment the print line and > > > test the > > > geo-rep config command. 
> > > > > > > > > On Mon, 2019-03-25 at 09:46 +0530, Maurya M wrote: > > > > hi Aravinda, > > > > had the session created using : create ssh-port 2222 push-pem > > > and > > > > also the : > > > > > > > > gluster volume geo-replication > > > vol_75a5fd373d88ba687f591f3353fa05cf > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f config ssh- > > > port > > > > 2222 > > > > > > > > hitting this message: > > > > geo-replication config-set failed for > > > > vol_75a5fd373d88ba687f591f3353fa05cf > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f > > > > geo-replication command failed > > > > > > > > Below is snap of status: > > > > > > > > [root at k8s-agentpool1-24779565-1 > > > > > > > vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a73057 > > > 8e45ed9d51b9a80df6c33f]# gluster volume geo-replication > > > vol_75a5fd373d88ba687f591f3353fa05cf > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f status > > > > > > > > MASTER NODE MASTER VOL MASTER > > > > BRICK > > > > > > > SLAVE USER SLAVE > > > > > > > SLAVE NODE STATUS CRAWL > > > STATUS > > > > LAST_SYNCED > > > > ------------------------------------------------------------- > > > ------ > > > > ------------------------------------------------------------- > > > ------ > > > > ------------------------------------------------------------- > > > ------ > > > > ------------------------------------------------------------- > > > ------ > > > > ---------------- > > > > 172.16.189.4 vol_75a5fd373d88ba687f591f3353fa05cf > > > > > > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_ > > > 116f > > > > b9427fb26f752d9ba8e45e183cb1/brick root > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A > > > > > > > Created N/A N/A > > > > 172.16.189.35 vol_75a5fd373d88ba687f591f3353fa05cf > > > > > > > /var/lib/heketi/mounts/vg_05708751110fe60b3e7da15bdcf6d4d4/brick_ > > > 266b > > > > b08f0d466d346f8c0b19569736fb/brick root > > > > 
172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A > > > > > > > Created N/A N/A > > > > 172.16.189.66 vol_75a5fd373d88ba687f591f3353fa05cf > > > > > > > /var/lib/heketi/mounts/vg_4b92a2b687e59b7311055d3809b77c06/brick_ > > > dfa4 > > > > 4c9380cdedac708e27e2c2a443a0/brick root > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A > > > > > > > Created N/A N/A > > > > > > > > any ideas ? where can find logs for the failed commands check > > > in > > > > gysncd.log , the trace is as below: > > > > > > > > [2019-03-25 04:04:42.295043] I [gsyncd(monitor):297:main] > > > : > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > l_e7 > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > [2019-03-25 04:04:42.387192] E > > > > [syncdutils(monitor):332:log_raise_exception] : FAIL: > > > > Traceback (most recent call last): > > > > File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", > > > line > > > > 311, in main > > > > func(args) > > > > File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", > > > line > > > > 50, in subcmd_monitor > > > > return monitor.monitor(local, remote) > > > > File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", > > > line > > > > 427, in monitor > > > > return Monitor().multiplex(*distribute(local, remote)) > > > > File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", > > > line > > > > 370, in distribute > > > > mvol = Volinfo(master.volume, master.host) > > > > File > > > "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line > > > > 860, in __init__ > > > > print "debug varible " %vix > > > > TypeError: not all arguments converted during string formatting > > > > [2019-03-25 04:04:48.997519] I [gsyncd(config-get):297:main] > > > : > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > l_e7 > > > > 
83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > [2019-03-25 04:04:49.93528] I [gsyncd(status):297:main] : > > > Using > > > > session config file path=/var/lib/glusterd/geo- > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > l_e7 > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > [2019-03-25 04:08:07.194348] I [gsyncd(config-get):297:main] > > > : > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > l_e7 > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > [2019-03-25 04:08:07.262588] I [gsyncd(config-get):297:main] > > > : > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > l_e7 > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > [2019-03-25 04:08:07.550080] I [gsyncd(config-get):297:main] > > > : > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > l_e7 > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > [2019-03-25 04:08:18.933028] I [gsyncd(config-get):297:main] > > > : > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > l_e7 > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > [2019-03-25 04:08:19.25285] I [gsyncd(status):297:main] : > > > Using > > > > session config file path=/var/lib/glusterd/geo- > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > l_e7 > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > [2019-03-25 04:09:15.766882] I [gsyncd(config-get):297:main] > > > : > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > l_e7 > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf 
> > > > [2019-03-25 04:09:16.30267] I [gsyncd(config-get):297:main] > > > : > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > l_e7 > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > [2019-03-25 04:09:16.89006] I [gsyncd(config-set):297:main] > > > : > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > l_e7 > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > > > regards, > > > > Maurya > > > > > > > > On Mon, Mar 25, 2019 at 9:08 AM Aravinda > > > wrote: > > > > > Use `ssh-port ` while creating the Geo-rep session > > > > > > > > > > Ref: > > > > > > > > https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/#creating-the-session > > > > > > > > > > And set the ssh-port option before start. > > > > > > > > > > ``` > > > > > gluster volume geo-replication \ > > > > > [@]:: config > > > > > ssh-port 2222 > > > > > ``` > > > > > -- regards Aravinda From atumball at redhat.com Mon Mar 25 09:18:59 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Mon, 25 Mar 2019 14:48:59 +0530 Subject: [Gluster-users] GlusterFS v7.0 (and v8.0) roadmap discussion Message-ID: Hello Gluster Members, We are now done with glusterfs-6.0 release, and the next up is glusterfs-7.0. But considering that, for many 'initiatives', 3-4 months are not enough to complete the tasks, we would like to call for a road-map discussion meeting for calendar year 2019 (covering both glusterfs-7.0 and 8.0). It would be good to use the community meeting slot for this. While talking to the team locally, I compiled a presentation here: < https://docs.google.com/presentation/d/1rtn38S4YBe77KK5IjczWmoAR-ZSO-i3tNHg9pAH8Wt8/edit?usp=sharing>, please go through it and let me know what can be added or dropped.
We can start having discussions in https://hackmd.io/jlnWqzwCRvC9uoEU2h01Zw Regards, Amar -------------- next part -------------- An HTML attachment was scrubbed... URL: From atumball at redhat.com Mon Mar 25 09:25:27 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Mon, 25 Mar 2019 14:55:27 +0530 Subject: [Gluster-users] Proposal: Changes in Gluster Community meetings Message-ID: All, We currently have 3 meetings which are public: 1. Maintainer's Meeting - Runs once in 2 weeks (on Mondays), and current attendance is around 3-5 on average, and not much is discussed. - Without majority attendance, we can't take any decisions either. 2. Community meeting - Supposed to happen on #gluster-meeting, every 2 weeks, and is the only meeting which is for 'Community/Users'. Others are for developers as of now. Sadly attendance is getting closer to 0 in recent times. 3. GCS meeting - We started it as an effort inside the Red Hat gluster team, and opened it up for the community from Jan 2019, but the attendance was always from RHT members, and we haven't seen any traction from the wider group. So, I have a proposal to call for cancelling all these meetings, and keeping just one weekly 'Community' meeting, where even topics related to maintainers and GCS and other projects can be discussed. I have a draft template @ https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g Please feel free to suggest improvements, both in agenda and in timings. That way we can have more participation from members of the community, which allows more user-developer interaction and hence improves the quality of the project. Waiting for feedback, Regards, Amar -------------- next part -------------- An HTML attachment was scrubbed...
URL: From mauryam at gmail.com Mon Mar 25 10:07:49 2019 From: mauryam at gmail.com (Maurya M) Date: Mon, 25 Mar 2019 15:37:49 +0530 Subject: [Gluster-users] Geo-replication status always on 'Created' In-Reply-To: <5342e4c8e5bff06a22edbc6be704e3f10bd67a4e.camel@redhat.com> References: <5342e4c8e5bff06a22edbc6be704e3f10bd67a4e.camel@redhat.com> Message-ID: ran this command : ssh -p 2222 -i /var/lib/glusterd/geo-replication/secret.pem root@gluster volume info --xml attaching the output. On Mon, Mar 25, 2019 at 2:13 PM Aravinda wrote: > Geo-rep is running `ssh -i /var/lib/glusterd/geo-replication/secret.pem > root@ gluster volume info --xml` and parsing its output. > Please try to to run the command from the same node and let us know the > output. > > > On Mon, 2019-03-25 at 11:43 +0530, Maurya M wrote: > > Now the error is on the same line 860 : as highlighted below: > > > > [2019-03-25 06:11:52.376238] E > > [syncdutils(monitor):332:log_raise_exception] : FAIL: > > Traceback (most recent call last): > > File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line > > 311, in main > > func(args) > > File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line > > 50, in subcmd_monitor > > return monitor.monitor(local, remote) > > File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line > > 427, in monitor > > return Monitor().multiplex(*distribute(local, remote)) > > File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line > > 386, in distribute > > svol = Volinfo(slave.volume, "localhost", prelude) > > File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line > > 860, in __init__ > > vi = XET.fromstring(vix) > > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1300, in > > XML > > parser.feed(text) > > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1642, in > > feed > > self._raiseerror(v) > > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1506, in > > _raiseerror > > raise err > > ParseError: syntax 
error: line 1, column 0 > > > > > > On Mon, Mar 25, 2019 at 11:29 AM Maurya M wrote: > > > Sorry my bad, had put the print line to debug, i am using gluster > > > 4.1.7, will remove the print line. > > > > > > On Mon, Mar 25, 2019 at 10:52 AM Aravinda > > > wrote: > > > > Below print statement looks wrong. Latest Glusterfs code doesn't > > > > have > > > > this print statement. Please let us know which version of > > > > glusterfs you > > > > are using. > > > > > > > > > > > > ``` > > > > File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", > > > > line > > > > 860, in __init__ > > > > print "debug varible " %vix > > > > ``` > > > > > > > > As a workaround, edit that file and comment the print line and > > > > test the > > > > geo-rep config command. > > > > > > > > > > > > On Mon, 2019-03-25 at 09:46 +0530, Maurya M wrote: > > > > > hi Aravinda, > > > > > had the session created using : create ssh-port 2222 push-pem > > > > and > > > > > also the : > > > > > > > > > > gluster volume geo-replication > > > > vol_75a5fd373d88ba687f591f3353fa05cf > > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f config ssh- > > > > port > > > > > 2222 > > > > > > > > > > hitting this message: > > > > > geo-replication config-set failed for > > > > > vol_75a5fd373d88ba687f591f3353fa05cf > > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f > > > > > geo-replication command failed > > > > > > > > > > Below is snap of status: > > > > > > > > > > [root at k8s-agentpool1-24779565-1 > > > > > > > > > vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a73057 > > > > 8e45ed9d51b9a80df6c33f]# gluster volume geo-replication > > > > vol_75a5fd373d88ba687f591f3353fa05cf > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f status > > > > > > > > > > MASTER NODE MASTER VOL MASTER > > > > > BRICK > > > > > > > > > SLAVE USER SLAVE > > > > > > > > > SLAVE NODE STATUS CRAWL > > > > STATUS > > > > > LAST_SYNCED > > > > > 
------------------------------------------------------------- > > > > ------ > > > > > ------------------------------------------------------------- > > > > ------ > > > > > ------------------------------------------------------------- > > > > ------ > > > > > ------------------------------------------------------------- > > > > ------ > > > > > ---------------- > > > > > 172.16.189.4 vol_75a5fd373d88ba687f591f3353fa05cf > > > > > > > > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_ > > > > 116f > > > > > b9427fb26f752d9ba8e45e183cb1/brick root > > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A > > > > > > > > > Created N/A N/A > > > > > 172.16.189.35 vol_75a5fd373d88ba687f591f3353fa05cf > > > > > > > > > /var/lib/heketi/mounts/vg_05708751110fe60b3e7da15bdcf6d4d4/brick_ > > > > 266b > > > > > b08f0d466d346f8c0b19569736fb/brick root > > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A > > > > > > > > > Created N/A N/A > > > > > 172.16.189.66 vol_75a5fd373d88ba687f591f3353fa05cf > > > > > > > > > /var/lib/heketi/mounts/vg_4b92a2b687e59b7311055d3809b77c06/brick_ > > > > dfa4 > > > > > 4c9380cdedac708e27e2c2a443a0/brick root > > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A > > > > > > > > > Created N/A N/A > > > > > > > > > > any ideas ? 
where can find logs for the failed commands check > > > > in > > > > > gysncd.log , the trace is as below: > > > > > > > > > > [2019-03-25 04:04:42.295043] I [gsyncd(monitor):297:main] > > > > : > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > l_e7 > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > [2019-03-25 04:04:42.387192] E > > > > > [syncdutils(monitor):332:log_raise_exception] : FAIL: > > > > > Traceback (most recent call last): > > > > > File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", > > > > line > > > > > 311, in main > > > > > func(args) > > > > > File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", > > > > line > > > > > 50, in subcmd_monitor > > > > > return monitor.monitor(local, remote) > > > > > File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", > > > > line > > > > > 427, in monitor > > > > > return Monitor().multiplex(*distribute(local, remote)) > > > > > File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", > > > > line > > > > > 370, in distribute > > > > > mvol = Volinfo(master.volume, master.host) > > > > > File > > > > "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line > > > > > 860, in __init__ > > > > > print "debug varible " %vix > > > > > TypeError: not all arguments converted during string formatting > > > > > [2019-03-25 04:04:48.997519] I [gsyncd(config-get):297:main] > > > > : > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > l_e7 > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > [2019-03-25 04:04:49.93528] I [gsyncd(status):297:main] : > > > > Using > > > > > session config file path=/var/lib/glusterd/geo- > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > l_e7 > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > 
> > > > [2019-03-25 04:08:07.194348] I [gsyncd(config-get):297:main] > > > > : > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > l_e7 > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > [2019-03-25 04:08:07.262588] I [gsyncd(config-get):297:main] > > > > : > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > l_e7 > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > [2019-03-25 04:08:07.550080] I [gsyncd(config-get):297:main] > > > > : > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > l_e7 > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > [2019-03-25 04:08:18.933028] I [gsyncd(config-get):297:main] > > > > : > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > l_e7 > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > [2019-03-25 04:08:19.25285] I [gsyncd(status):297:main] : > > > > Using > > > > > session config file path=/var/lib/glusterd/geo- > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > l_e7 > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > [2019-03-25 04:09:15.766882] I [gsyncd(config-get):297:main] > > > > : > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > l_e7 > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > [2019-03-25 04:09:16.30267] I [gsyncd(config-get):297:main] > > > > : > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > 
l_e7 > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > [2019-03-25 04:09:16.89006] I [gsyncd(config-set):297:main] > > > > : > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > l_e7 > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > > > > > regards, > > > > > Maurya > > > > > > > > > > On Mon, Mar 25, 2019 at 9:08 AM Aravinda > > > > wrote: > > > > > > Use `ssh-port ` while creating the Geo-rep session > > > > > > > > > > > > Ref: > > > > > > > > > > > https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/#creating-the-session > > > > > > > > > > > > And set the ssh-port option before start. > > > > > > > > > > > > ``` > > > > > > gluster volume geo-replication \ > > > > > > [@]:: config > > > > > > ssh-port 2222 > > > > > > ``` > > > > > > > -- > regards > Aravinda > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- [root at k8s-agentpool1-24779565-1 geo-replication]# ssh -p 2222 -i /var/lib/glusterd/geo-replication/secret.pem root at 172.16.201.35 gluster volume info --xml
[The attached `gluster volume info --xml` output was flattened when the HTML attachment was scrubbed; it listed four Started Replicate (1 x 3) volumes, heketidbstorage, vol_a5aee81a873c043c99a938adcb5b5781, vol_a7568f9c87d4aaf20bc56d8909390e24 and vol_e783a730578e45ed9d51b9a80df6c33f, each with bricks on 172.16.201.4, 172.16.201.35 and 172.16.201.66.]
From xiubli at redhat.com Mon Mar 25 10:32:25 2019 From: xiubli at redhat.com (Xiubo Li) Date: Mon, 25 Mar 2019 18:32:25 +0800 Subject: [Gluster-users] Network Block device (NBD) on top of glusterfs In-Reply-To: References: Message-ID: On 2019/3/25 14:36, Vijay Bellur wrote: > > Hi Xiubo, > > On Fri, Mar 22, 2019 at 5:48 PM Xiubo Li > wrote: > > On 2019/3/21 11:29, Xiubo Li wrote: >> >> All, >> >> I am one of the contributor for gluster-block >> [1] project, and also I >> contribute to linux kernel and open-iscsi >> project.[2] >> >> NBD was around for some time, but in recent time, linux kernel's >> Network Block Device (NBD) is enhanced and made to work with more >> devices and also the option to integrate with netlink is added. >> So, I tried to provide a glusterfs client based NBD driver >> recently. Please refer github issue #633 >> [3], and good >> news is I have a working code, with most basic things @ nbd-runner >> project [4]. >> > > This is nice. Thank you for your work! > > As mentioned the nbd-runner(NBD proto) will work in the same layer > with tcmu-runner(iSCSI proto), this is not trying to replace the > gluster-block/ceph-iscsi-gateway great projects. > > It just provides the common library to do the low level stuff, > like the sysfs/netlink operations and the IOs from the nbd kernel > socket, and the great tcmu-runner project is doing the sysfs/uio > operations and IOs from the kernel SCSI/iSCSI. > > The nbd-cli tool will work like the iscsi-initiator-utils, and the > nbd-runner daemon will work like the tcmu-runner daemon, that's all. > > > Do you have thoughts on how nbd-runner currently differs or would > differ from tcmu-runner?
It might be useful to > document the > differences in github (or elsewhere) so that users can make an > informed choice between nbd-runner & tcmu-runner. Yeah, this makes sense and I will figure it out in the github. Currently for the open-iscsi/tcmu-runner, there are already many existing tools to help productize it, and for NBD we may need to implement them, correct me if I am wrong here :-) > > In tcmu-runner for different backend storages, they have separate > handlers, glfs.c handler for Gluster, rbd.c handler for Ceph, etc. > And what the handlers here are doing the actual IOs with the > backend storage services once the IO paths setup are done by > ceph-iscsi-gateway/gluster-block.... > > Then we can support all the kind of backend storages, like the > Gluster/Ceph/Azure... as one separate handler in nbd-runner, which > no need to care about the NBD low level's stuff updates and changes. > > > Given that the charter for this project is to support multiple backend > storage projects, wouldn't it be better to host the project in the > github repository associated with nbd [5]? Doing it that way could > provide a more neutral (as perceived by users) venue for hosting > nbd-runner and help you in getting more adoption for your work. > This is a good idea, I will try to push this forward. Thanks very much Vijay. BRs Xiubo Li > Thanks, > Vijay > > [5] https://github.com/NetworkBlockDevice/nbd > > > Thanks. > > >> While this email is about announcing the project, and asking for >> more collaboration, I would also like to discuss more about the >> placement of the project itself. Currently nbd-runner project is >> expected to be shared by our friends at Ceph project too, to >> provide NBD driver for Ceph. I have personally worked with some >> of them closely while contributing to open-iSCSI project, and we >> would like to take this project to great success. >> >> Now few questions: >> >> 1.
Can I continue to use http://github.com/gluster/nbd-runner as >> home for this project, even if it's shared by other filesystem >> projects? >> >> * I personally am fine with this. >> >> 2. Should there be a separate organization for this repo? >> >> * While it may make sense in future, for now, I am not planning >> to start any new thing? >> >> It would be great if we have some consensus on this soon as >> nbd-runner is a new repository. If there are no concerns, I will >> continue to contribute to the existing repository. >> >> Regards, >> Xiubo Li (@lxbsz) >> >> [1] - https://github.com/gluster/gluster-block >> [2] - https://github.com/open-iscsi >> [3] - https://github.com/gluster/glusterfs/issues/633 >> [4] - https://github.com/gluster/nbd-runner >> >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From mauryam at gmail.com Mon Mar 25 12:46:00 2019 From: mauryam at gmail.com (Maurya M) Date: Mon, 25 Mar 2019 18:16:00 +0530 Subject: [Gluster-users] Geo-replication status always on 'Created' In-Reply-To: References: <5342e4c8e5bff06a22edbc6be704e3f10bd67a4e.camel@redhat.com> Message-ID: some additional logs from gverify-mastermnt.log & gverify-slavemnt.log: [2019-03-25 12:13:23.819665] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-vol_75a5fd373d88ba687f591f3353fa05cf-client-2: error returned while attempting to connect to host:(null), port:0 [2019-03-25 12:13:23.819814] W [dict.c:923:str_to_data] (-->/usr/lib64/glusterfs/4.1.7/xlator/protocol/client.so(+0x40c0a) [0x7f3eb4d86c0a] -->/lib64/libglusterfs.so.0(dict_set_str+0x16) [0x7f3ebc334266] -->/lib64/libglusterfs.so.0(str_to_data+0x91) [0x7f3ebc330ea1] ) 0-dict: *value is NULL [Invalid argument]* Any idea how to fix this? If there is any patch file I can try, please share. thanks, Maurya On Mon, Mar 25, 2019 at 3:37 PM Maurya M wrote: > ran this command : ssh -p 2222 -i > /var/lib/glusterd/geo-replication/secret.pem root@gluster > volume info --xml > > attaching the output. > > > > On Mon, Mar 25, 2019 at 2:13 PM Aravinda wrote: > >> Geo-rep is running `ssh -i /var/lib/glusterd/geo-replication/secret.pem >> root@ gluster volume info --xml` and parsing its output. >> Please try to run the command from the same node and let us know the >> output.
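For context, the ParseError in this thread comes from geo-rep feeding whatever that ssh command prints into ElementTree. A minimal self-contained sketch of that parse step (illustrative only, not the actual syncdutils code; the function name is made up):

```python
# Illustrative sketch (not the actual syncdutils code): geo-rep runs
# `ssh ... gluster volume info --xml` and parses the result with
# ElementTree. If ssh prints an error banner instead of XML, parsing
# fails with "syntax error: line 1, column 0", as in the traceback.
import xml.etree.ElementTree as XET

def parse_volume_info(vix):
    """Return the parsed XML root, or None if the output was not XML."""
    try:
        return XET.fromstring(vix)
    except XET.ParseError as err:
        print("could not parse volume info: %s" % err)
        return None

# A healthy reply parses; a non-XML reply reproduces the failure mode.
good = parse_volume_info("<cliOutput><opRet>0</opRet></cliOutput>")
bad = parse_volume_info("Permission denied (publickey).")
```

This is why running the same ssh command by hand is the right diagnostic: if the hand-run command prints anything other than XML, the monitor fails exactly this way.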
>> >> >> On Mon, 2019-03-25 at 11:43 +0530, Maurya M wrote: >> > Now the error is on the same line 860 : as highlighted below: >> > >> > [2019-03-25 06:11:52.376238] E >> > [syncdutils(monitor):332:log_raise_exception] : FAIL: >> > Traceback (most recent call last): >> > File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line >> > 311, in main >> > func(args) >> > File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line >> > 50, in subcmd_monitor >> > return monitor.monitor(local, remote) >> > File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line >> > 427, in monitor >> > return Monitor().multiplex(*distribute(local, remote)) >> > File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line >> > 386, in distribute >> > svol = Volinfo(slave.volume, "localhost", prelude) >> > File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line >> > 860, in __init__ >> > vi = XET.fromstring(vix) >> > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1300, in >> > XML >> > parser.feed(text) >> > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1642, in >> > feed >> > self._raiseerror(v) >> > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1506, in >> > _raiseerror >> > raise err >> > ParseError: syntax error: line 1, column 0 >> > >> > >> > On Mon, Mar 25, 2019 at 11:29 AM Maurya M wrote: >> > > Sorry my bad, had put the print line to debug, i am using gluster >> > > 4.1.7, will remove the print line. >> > > >> > > On Mon, Mar 25, 2019 at 10:52 AM Aravinda >> > > wrote: >> > > > Below print statement looks wrong. Latest Glusterfs code doesn't >> > > > have >> > > > this print statement. Please let us know which version of >> > > > glusterfs you >> > > > are using. 
>> > > > >> > > > >> > > > ``` >> > > > File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", >> > > > line >> > > > 860, in __init__ >> > > > print "debug varible " %vix >> > > > ``` >> > > > >> > > > As a workaround, edit that file and comment the print line and >> > > > test the >> > > > geo-rep config command. >> > > > >> > > > >> > > > On Mon, 2019-03-25 at 09:46 +0530, Maurya M wrote: >> > > > > hi Aravinda, >> > > > > had the session created using : create ssh-port 2222 push-pem >> > > > and >> > > > > also the : >> > > > > >> > > > > gluster volume geo-replication >> > > > vol_75a5fd373d88ba687f591f3353fa05cf >> > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f config ssh- >> > > > port >> > > > > 2222 >> > > > > >> > > > > hitting this message: >> > > > > geo-replication config-set failed for >> > > > > vol_75a5fd373d88ba687f591f3353fa05cf >> > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f >> > > > > geo-replication command failed >> > > > > >> > > > > Below is snap of status: >> > > > > >> > > > > [root at k8s-agentpool1-24779565-1 >> > > > > >> > > > vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a73057 >> > > > 8e45ed9d51b9a80df6c33f]# gluster volume geo-replication >> > > > vol_75a5fd373d88ba687f591f3353fa05cf >> > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f status >> > > > > >> > > > > MASTER NODE MASTER VOL MASTER >> > > > > BRICK >> > > > >> > > > > SLAVE USER SLAVE >> > > > >> > > > > SLAVE NODE STATUS CRAWL >> > > > STATUS >> > > > > LAST_SYNCED >> > > > > ------------------------------------------------------------- >> > > > ------ >> > > > > ------------------------------------------------------------- >> > > > ------ >> > > > > ------------------------------------------------------------- >> > > > ------ >> > > > > ------------------------------------------------------------- >> > > > ------ >> > > > > ---------------- >> > > > > 172.16.189.4 vol_75a5fd373d88ba687f591f3353fa05cf >> > > > > 
>> > > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_ >> > > > 116f >> > > > > b9427fb26f752d9ba8e45e183cb1/brick root >> > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A >> > > > >> > > > > Created N/A N/A >> > > > > 172.16.189.35 vol_75a5fd373d88ba687f591f3353fa05cf >> > > > > >> > > > /var/lib/heketi/mounts/vg_05708751110fe60b3e7da15bdcf6d4d4/brick_ >> > > > 266b >> > > > > b08f0d466d346f8c0b19569736fb/brick root >> > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A >> > > > >> > > > > Created N/A N/A >> > > > > 172.16.189.66 vol_75a5fd373d88ba687f591f3353fa05cf >> > > > > >> > > > /var/lib/heketi/mounts/vg_4b92a2b687e59b7311055d3809b77c06/brick_ >> > > > dfa4 >> > > > > 4c9380cdedac708e27e2c2a443a0/brick root >> > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A >> > > > >> > > > > Created N/A N/A >> > > > > >> > > > > any ideas ? where can find logs for the failed commands check >> > > > in >> > > > > gysncd.log , the trace is as below: >> > > > > >> > > > > [2019-03-25 04:04:42.295043] I [gsyncd(monitor):297:main] >> > > > : >> > > > > Using session config file path=/var/lib/glusterd/geo- >> > > > > >> > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo >> > > > l_e7 >> > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf >> > > > > [2019-03-25 04:04:42.387192] E >> > > > > [syncdutils(monitor):332:log_raise_exception] : FAIL: >> > > > > Traceback (most recent call last): >> > > > > File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", >> > > > line >> > > > > 311, in main >> > > > > func(args) >> > > > > File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", >> > > > line >> > > > > 50, in subcmd_monitor >> > > > > return monitor.monitor(local, remote) >> > > > > File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", >> > > > line >> > > > > 427, in monitor >> > > > > return Monitor().multiplex(*distribute(local, remote)) >> > > > > File 
"/usr/libexec/glusterfs/python/syncdaemon/monitor.py", >> > > > line >> > > > > 370, in distribute >> > > > > mvol = Volinfo(master.volume, master.host) >> > > > > File >> > > > "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line >> > > > > 860, in __init__ >> > > > > print "debug varible " %vix >> > > > > TypeError: not all arguments converted during string formatting >> > > > > [2019-03-25 04:04:48.997519] I [gsyncd(config-get):297:main] >> > > > : >> > > > > Using session config file path=/var/lib/glusterd/geo- >> > > > > >> > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo >> > > > l_e7 >> > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf >> > > > > [2019-03-25 04:04:49.93528] I [gsyncd(status):297:main] : >> > > > Using >> > > > > session config file path=/var/lib/glusterd/geo- >> > > > > >> > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo >> > > > l_e7 >> > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf >> > > > > [2019-03-25 04:08:07.194348] I [gsyncd(config-get):297:main] >> > > > : >> > > > > Using session config file path=/var/lib/glusterd/geo- >> > > > > >> > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo >> > > > l_e7 >> > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf >> > > > > [2019-03-25 04:08:07.262588] I [gsyncd(config-get):297:main] >> > > > : >> > > > > Using session config file path=/var/lib/glusterd/geo- >> > > > > >> > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo >> > > > l_e7 >> > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf >> > > > > [2019-03-25 04:08:07.550080] I [gsyncd(config-get):297:main] >> > > > : >> > > > > Using session config file path=/var/lib/glusterd/geo- >> > > > > >> > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo >> > > > l_e7 >> > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf >> > > > > [2019-03-25 04:08:18.933028] I [gsyncd(config-get):297:main] >> > > > : >> > > > > Using session 
config file path=/var/lib/glusterd/geo- >> > > > > >> > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo >> > > > l_e7 >> > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf >> > > > > [2019-03-25 04:08:19.25285] I [gsyncd(status):297:main] : >> > > > Using >> > > > > session config file path=/var/lib/glusterd/geo- >> > > > > >> > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo >> > > > l_e7 >> > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf >> > > > > [2019-03-25 04:09:15.766882] I [gsyncd(config-get):297:main] >> > > > : >> > > > > Using session config file path=/var/lib/glusterd/geo- >> > > > > >> > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo >> > > > l_e7 >> > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf >> > > > > [2019-03-25 04:09:16.30267] I [gsyncd(config-get):297:main] >> > > > : >> > > > > Using session config file path=/var/lib/glusterd/geo- >> > > > > >> > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo >> > > > l_e7 >> > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf >> > > > > [2019-03-25 04:09:16.89006] I [gsyncd(config-set):297:main] >> > > > : >> > > > > Using session config file path=/var/lib/glusterd/geo- >> > > > > >> > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo >> > > > l_e7 >> > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf >> > > > > >> > > > > regards, >> > > > > Maurya >> > > > > >> > > > > On Mon, Mar 25, 2019 at 9:08 AM Aravinda >> > > > wrote: >> > > > > > Use `ssh-port ` while creating the Geo-rep session >> > > > > > >> > > > > > Ref: >> > > > > > >> > > > >> https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/#creating-the-session >> > > > > > >> > > > > > And set the ssh-port option before start. 
>> > > > > > >> > > > > > ``` >> > > > > > gluster volume geo-replication \ >> > > > > > [@]:: config >> > > > > > ssh-port 2222 >> > > > > > ``` >> > > > > > >> -- >> regards >> Aravinda >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: gverify-slavemnt.log Type: application/octet-stream Size: 3556 bytes Desc: not available URL: From srangana at redhat.com Mon Mar 25 14:28:40 2019 From: srangana at redhat.com (Shyam Ranganathan) Date: Mon, 25 Mar 2019 10:28:40 -0400 Subject: [Gluster-users] Announcing Gluster Release 6 Message-ID: The Gluster community is pleased to announce the release of 6.0, our latest release. This is a major release that includes a range of code improvements and stability fixes along with a few features as noted below. A selection of the key features and bugs addressed are documented in this [1] page. Announcements: 1. Releases that receive maintenance updates post release 6 are 4.1 and 5 [2] 2. Release 6 will receive maintenance updates around the 10th of every month for the first 3 months post release (i.e. Apr'19, May'19, Jun'19). Post the initial 3 months, it will receive maintenance updates every 2 months till EOL. [3] A series of features/xlators have been deprecated in release 6 as follows; for upgrade procedures from volumes that use these features to release 6 refer to the release 6 upgrade guide [4].
Features deprecated: - Block device (bd) xlator - Decompounder feature - Crypt xlator - Symlink-cache xlator - Stripe feature - Tiering support (tier xlator and changetimerecorder) Highlights of this release are: - Several stability fixes addressing coverity, clang-scan, address sanitizer and valgrind reported issues - Removal of unused and hence, deprecated code and features - Client side inode garbage collection - This release addresses one of the major concerns regarding FUSE mount process memory footprint, by introducing client side inode garbage collection - Performance Improvements - "--auto-invalidation" on FUSE mounts to leverage kernel page cache more effectively Bugs addressed are provided towards the end, in the release notes [1] Thank you, Gluster community References: [1] Release notes: https://docs.gluster.org/en/latest/release-notes/6.0/ [2] Release schedule: https://www.gluster.org/release-schedule/ [3] Gluster release cadence and version changes: https://lists.gluster.org/pipermail/announce/2018-July/000103.html [4] Upgrade guide to release-6: https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_6/ From budic at onholyground.com Mon Mar 25 15:10:05 2019 From: budic at onholyground.com (Darrell Budic) Date: Mon, 25 Mar 2019 10:10:05 -0500 Subject: [Gluster-users] Proposal: Changes in Gluster Community meetings In-Reply-To: References: Message-ID: <62104B6F-99CF-4C22-80FC-9C177F73E897@onholyground.com> As a user, I'd like to visit more of these, but the time slot is my 3AM. Any possibility for a rolling schedule (move meeting +6 hours each week with rolling attendance from maintainers?) or an occasional regional meeting 12 hours opposed to the one you're proposing? -Darrell > On Mar 25, 2019, at 4:25 AM, Amar Tumballi Suryanarayan wrote: > > All, > > We currently have 3 meetings which are public: > > 1. Maintainer's Meeting > > - Runs once in 2 weeks (on Mondays), and current attendance is around 3-5 on an avg, and not much is discussed.
> - Without majority attendance, we can't take any decisions either. > > 2. Community meeting > > - Supposed to happen on #gluster-meeting, every 2 weeks, and is the only meeting which is for 'Community/Users'. Others are for developers as of now. > Sadly attendance is getting closer to 0 in recent times. > > 3. GCS meeting > > - We started it as an effort inside Red Hat gluster team, and opened it up for community from Jan 2019, but the attendance was always from RHT members, and haven't seen any traction from wider group. > > So, I have a proposal to call out for cancelling all these meetings, and keeping just 1 weekly 'Community' meeting, where even topics related to maintainers and GCS and other projects can be discussed. > > I have a draft template @ https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g > > Please feel free to suggest improvements, both in agenda and in timings. So, we can have more participation from members of community, which allows more user-developer interactions, and hence improves the quality of the project. > > Waiting for feedback, > > Regards, > Amar > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From atumball at redhat.com Mon Mar 25 15:23:43 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Mon, 25 Mar 2019 20:53:43 +0530 Subject: [Gluster-users] Proposal: Changes in Gluster Community meetings In-Reply-To: <62104B6F-99CF-4C22-80FC-9C177F73E897@onholyground.com> References: <62104B6F-99CF-4C22-80FC-9C177F73E897@onholyground.com> Message-ID: Thanks for the feedback Darrell, The new proposal is to have one in North America 'morning' time (10AM PST), and another in ASIA day time, which is evening 7pm/6pm in Australia, 9pm New Zealand, 5pm Tokyo, 4pm Beijing.
For example, if we choose every other Tuesday for the meeting, and the 1st of the month is a Tuesday, we would have the North America time on the 1st, and on the 15th it would be the ASIA/Pacific time. Hopefully, this way, we can cover all the timezones, and meeting minutes would be committed to the github repo, so that way, it will be easier for everyone to be aware of what is happening. Regards, Amar On Mon, Mar 25, 2019 at 8:40 PM Darrell Budic wrote: > As a user, I'd like to visit more of these, but the time slot is my 3AM. > Any possibility for a rolling schedule (move meeting +6 hours each week > with rolling attendance from maintainers?) or an occasional regional > meeting 12 hours opposed to the one you're proposing? > > -Darrell > > On Mar 25, 2019, at 4:25 AM, Amar Tumballi Suryanarayan < > atumball at redhat.com> wrote: > > All, > > We currently have 3 meetings which are public: > > 1. Maintainer's Meeting > > - Runs once in 2 weeks (on Mondays), and current attendance is around 3-5 > on an avg, and not much is discussed. > - Without majority attendance, we can't take any decisions either. > > 2. Community meeting > > - Supposed to happen on #gluster-meeting, every 2 weeks, and is the only > meeting which is for 'Community/Users'. Others are for developers as of > now. > Sadly attendance is getting closer to 0 in recent times. > > 3. GCS meeting > > - We started it as an effort inside Red Hat gluster team, and opened it up > for community from Jan 2019, but the attendance was always from RHT > members, and haven't seen any traction from wider group. > > So, I have a proposal to call out for cancelling all these meetings, and > keeping just 1 weekly 'Community' meeting, where even topics related to > maintainers and GCS and other projects can be discussed. > > I have a draft template @ > https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g > > Please feel free to suggest improvements, both in agenda and in timings.
> So, we can have more participation from members of community, which allows > more user-developer interactions, and hence improves the quality of the project. > > Waiting for feedback, > > Regards, > Amar > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From pkalever at redhat.com Mon Mar 25 16:00:14 2019 From: pkalever at redhat.com (Prasanna Kalever) Date: Mon, 25 Mar 2019 21:30:14 +0530 Subject: [Gluster-users] Help: gluster-block In-Reply-To: References: Message-ID: [ adding +gluster-users for archive purpose ] On Sat, Mar 23, 2019 at 1:51 AM Jeffrey Chin wrote: > > Hello Mr. Kalever, Hello Jeffrey, > > I am currently working on a project to utilize GlusterFS for VMware VMs. In our research, we found that utilizing block devices with GlusterFS would be the best approach for our use case (correct me if I am wrong). I saw the gluster utility that you are a contributor for called gluster-block (https://github.com/gluster/gluster-block), and I had a question about the configuration. From what I understand, gluster-block only works on the servers that are serving the gluster volume. Would it be possible to run the gluster-block utility on a client machine that has a gluster volume mounted to it? Yes, that is right! At the moment gluster-block is coupled with glusterd for simplicity. But we have made some changes here [1] to provide a way to specify a server address (volfile-server) outside the gluster-blockd node, please take a look. Although it is not a complete solution, it should at least help for some use cases. Feel free to raise an issue [2] with the details about your use case, etc., or submit a PR yourself :-) We never picked it up, as we never had a use case needing separation of gluster-blockd and glusterd.
> > I also have another question: how do I make the iSCSI targets persist if all of the gluster nodes were rebooted? It seems like once all of the nodes reboot, I am unable to reconnect to the iSCSI targets created by the gluster-block utility. Do you mean rebooting the iSCSI initiator, or the gluster-block/gluster target/server nodes? 1. For the initiator to automatically connect to block devices post reboot, we need to make the below change in /etc/iscsi/iscsid.conf: node.startup = automatic 2. If you mean, just in case all the gluster nodes go down, on the initiator all the available HA paths will be down, but we still want the IO to be queued on the initiator until one of the paths (gluster nodes) is available: for this, in the gluster-block specific section of multipath.conf you need to replace 'no_path_retry 120' with 'no_path_retry queue' Note: refer to the README for current multipath.conf setting recommendations. [1] https://github.com/gluster/gluster-block/pull/161 [2] https://github.com/gluster/gluster-block/issues/new BRs, -- Prasanna From amye at redhat.com Mon Mar 25 22:18:06 2019 From: amye at redhat.com (Amye Scavarda) Date: Mon, 25 Mar 2019 12:18:06 -1000 Subject: [Gluster-users] Gluster 6 Retrospective Open Until April 8 Message-ID: Congrats to the team for getting 6 released! We're doing another retrospective, please come give us your feedback! This retrospective will be open until April 8. https://www.gluster.org/gluster-6-0-retrospective/ Thanks!
- amye -- Amye Scavarda | amye at redhat.com | Gluster Community Lead From ksubrahm at redhat.com Tue Mar 26 06:05:23 2019 From: ksubrahm at redhat.com (Karthik Subrahmanya) Date: Tue, 26 Mar 2019 11:35:23 +0530 Subject: [Gluster-users] Heal flapping between Possibly undergoing heal and In split brain In-Reply-To: <2D0FE76B-F6B0-4CB4-92F7-88F030E150A1@mdpi.com> References: <132679F1-A96E-402F-BDF0-79FE200AD9F7@mdpi.com> <548844D2-E06E-4A60-972E-72213C61D264@mdpi.com> <14B1CACB-4049-42DD-AB69-3B75FBD6BE30@mdpi.com> <990A42AA-1F60-441E-BD5A-97B8333E2083@mdpi.com> <76354C5F-C197-475A-B4D3-D6089CED12EB@mdpi.com> <4B662F79-4947-4DFE-BDC7-B6B61A1054FF@mdpi.com> <2D0FE76B-F6B0-4CB4-92F7-88F030E150A1@mdpi.com> Message-ID: Hi, Sorry for the delay. Comments inline. On Fri, Mar 22, 2019 at 6:23 PM Milos Cuculovic wrote: > You can make use of the same ls -l command to get the actual path of the > parent directory (a96e940d-3130-45d1-9efe-7aff463fec3d). > > It seems the problem is that those directories exist on one brick but not > on the other, as we can see here: > > sudo gluster volume heal storage2 info > Brick storage3:/data/data-cluster > > > Status: Connected > Number of entries: 2 > > Brick storage4:/data/data-cluster > > > > > > > Status: Connected > Number of entries: 6 > > The first part shows the two gfids are available on brick1 only. The second > part shows the gfids available on 2nd brick only. >
> > If not send the getfattr output of all the directories which are pending > heal, glustershd.log from both the nodes, and client mount log from where > you run the lookup. > >> Just in case, here attached are the log files. I really do not think > those get healed as they stay there since yesterday, and those are > directories with not so large files, should be done within seconds. > I don't see any heal related messages for some of the entries which are listed in the heal info output. Can you do the lookup and see whether it gets healed? If not provide the details which are requested in the previous mail. Regards, Karthik > > > - Kindest regards, > > Milos Cuculovic > IT Manager > > --- > MDPI AG > Postfach, CH-4020 Basel, Switzerland > Office: St. Alban-Anlage 66, 4052 Basel, Switzerland > Tel. +41 61 683 77 35 > Fax +41 61 302 89 18 > Email: cuculovic at mdpi.com > Skype: milos.cuculovic.mdpi > > Disclaimer: The information and files contained in this message > are confidential and intended solely for the use of the individual or > entity to whom they are addressed. If you have received this message in > error, please notify me and delete this message from your system. You may > not copy this message in its entirety or in part, or disclose its contents > to anyone. > > On 22 Mar 2019, at 12:58, Karthik Subrahmanya wrote: > > > > On Fri, Mar 22, 2019 at 5:02 PM Milos Cuculovic > wrote: > >> Thank you Karthik, >> >> The 2nd command works for all of them, those are directories: >> sudo ls -l >> /data/data-cluster/.glusterfs/27/6f/276fec9a-1c9b-4efe-9715-dcf4207e99b0 >> lrwxrwxrwx 1 root root 60 Jun 14 2018 >> /data/data-cluster/.glusterfs/27/6f/276fec9a-1c9b-4efe-9715-dcf4207e99b0 -> >> ../../a9/6e/a96e940d-3130-45d1-9efe-7aff463fec3d/final_files >> >> But now, what to do with this info? Since yesterday, the heal info shows >> the samge gfids. 
>> > You can make use of the same ls - l command to get the actual path of the > parent directory (a96e940d-3130-45d1-9efe-7aff463fec3d). Once you get the > complete path you can run the lookup on those directories from the client > and see whether they are getting healed? > If not send the getfattr output of all the directories which are pending > heal, glustershd.log from both the nodes, and client mount log from where > you run the lookup. > >> >> - Kindest regards, >> >> Milos Cuculovic >> IT Manager >> >> --- >> MDPI AG >> Postfach, CH-4020 Basel, Switzerland >> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >> Tel. +41 61 683 77 35 >> Fax +41 61 302 89 18 >> Email: cuculovic at mdpi.com >> Skype: milos.cuculovic.mdpi >> >> Disclaimer: The information and files contained in this message >> are confidential and intended solely for the use of the individual or >> entity to whom they are addressed. If you have received this message in >> error, please notify me and delete this message from your system. You may >> not copy this message in its entirety or in part, or disclose its contents >> to anyone. >> >> On 22 Mar 2019, at 08:51, Karthik Subrahmanya >> wrote: >> >> Hi, >> >> If it is a file then you can find the filename from the gfid by running >> the following on the nodes hosting the bricks >> find -samefile > gfid>// >> >> If it is a directory you can run the following on the nodes hosting the >> bricks >> ls -l /> gfid>/ >> >> Run these on both the nodes and paste the output of these commands before >> running the lookup from client on these entries. 
>> >> Regards, >> Karthik >> >> On Fri, Mar 22, 2019 at 1:06 PM Milos Cuculovic >> wrote: >> >>> I have run a few minutes ago the info and here are the results: >>> >>> sudo gluster volume heal storage2 info >>> Brick storage3:/data/data-cluster >>> >>> >>> Status: Connected >>> Number of entries: 2 >>> >>> Brick storage4:/data/data-cluster >>> >>> >>> >>> >>> >>> >>> Status: Connected >>> Number of entries: 6 >>> >>> >>> sudo gluster volume heal storage2 info split-brain >>> Brick storage3:/data/data-cluster >>> Status: Connected >>> Number of entries in split-brain: 0 >>> >>> Brick storage4:/data/data-cluster >>> Status: Connected >>> Number of entries in split-brain: 0 >>> >>> The heal info (2 + 6) are there since yesterday and do not change. >>> >>> >>> If they are still there can you try doing a lookup on those entries from >>> client and see whether they are getting healed? >>> >>> How can I do this having the gfid only? >>> >>> - Kindest regards, >>> >>> Milos Cuculovic >>> IT Manager >>> >>> --- >>> MDPI AG >>> Postfach, CH-4020 Basel, Switzerland >>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>> Tel. +41 61 683 77 35 >>> Fax +41 61 302 89 18 >>> Email: cuculovic at mdpi.com >>> Skype: milos.cuculovic.mdpi >>> >>> Disclaimer: The information and files contained in this message >>> are confidential and intended solely for the use of the individual or >>> entity to whom they are addressed. If you have received this message in >>> error, please notify me and delete this message from your system. You may >>> not copy this message in its entirety or in part, or disclose its contents >>> to anyone. >>> >>> On 21 Mar 2019, at 14:34, Karthik Subrahmanya >>> wrote: >>> >>> Now the slit-brain on the directory is resolved. >>> Are these entries which are there in the latest heal info output not >>> getting healed? Are they still present in the heal info output? 
>>> If they are still there can you try doing a lookup on those entries from >>> client and see whether they are getting healed? >>> >>> >>> On Thu, Mar 21, 2019 at 6:49 PM Milos Cuculovic >>> wrote: >>> >>>> Hey Karthik, >>>> >>>> Can you run the "guster volume heal ? >>>> >>>> sudo gluster volume heal storage2 >>>> Launching heal operation to perform index self heal on volume storage2 >>>> has been successful >>>> Use heal info commands to check status. >>>> >>>> "gluster volume heal info? >>>> >>>> sudo gluster volume heal storage2 info >>>> Brick storage3:/data/data-cluster >>>> >>>> >>>> Status: Connected >>>> Number of entries: 2 >>>> >>>> Brick storage4:/data/data-cluster >>>> >>>> >>>> >>>> >>>> >>>> >>>> Status: Connected >>>> Number of entries: 6 >>>> >>>> >>>> >>>> - Kindest regards, >>>> >>>> Milos Cuculovic >>>> IT Manager >>>> >>>> --- >>>> MDPI AG >>>> Postfach, CH-4020 Basel, Switzerland >>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>>> Tel. +41 61 683 77 35 >>>> Fax +41 61 302 89 18 >>>> Email: cuculovic at mdpi.com >>>> Skype: milos.cuculovic.mdpi >>>> >>>> Disclaimer: The information and files contained in this message >>>> are confidential and intended solely for the use of the individual or >>>> entity to whom they are addressed. If you have received this message in >>>> error, please notify me and delete this message from your system. You may >>>> not copy this message in its entirety or in part, or disclose its contents >>>> to anyone. >>>> >>>> On 21 Mar 2019, at 14:07, Karthik Subrahmanya >>>> wrote: >>>> >>>> Hey Milos, >>>> >>>> I see that gfid got healed for those directories from the getfattr >>>> output and the glfsheal log also has messages corresponding to deleting the >>>> entries on one brick as part of healing which then got recreated on the >>>> brick with the correct gfid. Can you run the "guster volume heal " >>>> & "gluster volume heal info" command and paste the output here? 
>>>> If you still see entries pending heal, give the latest glustershd.log >>>> files from both the nodes along with the getfattr output of the files which >>>> are listed in the heal info output. >>>> >>>> Regards, >>>> Karthik >>>> >>>> On Thu, Mar 21, 2019 at 6:03 PM Milos Cuculovic >>>> wrote: >>>> >>>>> Sure: >>>>> >>>>> brick1: >>>>> ???????????????????????????????????????????????????????????? >>>>> ???????????????????????????????????????????????????????????? >>>>> sudo getfattr -d -m . -e hex >>>>> /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >>>>> getfattr: Removing leading '/' from absolute path names >>>>> # file: >>>>> data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >>>>> trusted.gfid=0xe358ff34504241d387efe1e76eb28bb0 >>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>>> trusted.glusterfs.dht.mds=0x00000000 >>>>> >>>>> sudo getfattr -d -m . -e hex >>>>> /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>>>> getfattr: Removing leading '/' from absolute path names >>>>> # file: >>>>> data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>>>> trusted.gfid=0x33585a577c4a4c55b39b9abc07eacff1 >>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>>> trusted.glusterfs.dht.mds=0x00000000 >>>>> >>>>> sudo getfattr -d -m . -e hex >>>>> /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>>>> getfattr: Removing leading '/' from absolute path names >>>>> # file: >>>>> data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>>>> trusted.gfid=0x99cdfe0773d74d7ab29592ce40cf47f3 >>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>>> trusted.glusterfs.dht.mds=0x00000000 >>>>> >>>>> sudo getfattr -d -m . 
-e hex >>>>> /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>>>> getfattr: Removing leading '/' from absolute path names >>>>> # file: >>>>> data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>>>> trusted.gfid=0x8a96a80ddf8a4163ac00bb12369389df >>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>>> trusted.glusterfs.dht.mds=0x00000000 >>>>> >>>>> sudo getfattr -d -m . -e hex >>>>> /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>>>> getfattr: Removing leading '/' from absolute path names >>>>> # file: >>>>> data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>>>> trusted.gfid=0x04d087a96cc6421ca7bc356fc20d7753 >>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>>> trusted.glusterfs.dht.mds=0x00000000 >>>>> >>>>> sudo getfattr -d -m . -e hex >>>>> /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>>>> getfattr: Removing leading '/' from absolute path names >>>>> # file: >>>>> data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>>>> trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 >>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>>> trusted.glusterfs.dht.mds=0x00000000 >>>>> ???????????????????????????????????????????????????????????? 
>>>>> sudo stat >>>>> /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >>>>> File: >>>>> '/data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59' >>>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>>> Device: 807h/2055d Inode: 40809094709 Links: 3 >>>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( >>>>> 33/www-data) >>>>> Access: 2019-03-20 11:06:26.994047597 +0100 >>>>> Modify: 2019-03-20 11:28:28.294689870 +0100 >>>>> Change: 2019-03-21 13:01:03.077654239 +0100 >>>>> Birth: - >>>>> >>>>> sudo stat >>>>> /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>>>> File: >>>>> '/data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617' >>>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>>> Device: 807h/2055d Inode: 49399908865 Links: 3 >>>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( >>>>> 33/www-data) >>>>> Access: 2019-03-20 11:07:20.342140927 +0100 >>>>> Modify: 2019-03-20 11:28:28.318690015 +0100 >>>>> Change: 2019-03-21 13:01:03.133654344 +0100 >>>>> Birth: - >>>>> >>>>> sudo stat >>>>> /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>>>> File: >>>>> '/data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf' >>>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>>> Device: 807h/2055d Inode: 53706303549 Links: 3 >>>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( >>>>> 33/www-data) >>>>> Access: 2019-03-20 11:06:55.414097315 +0100 >>>>> Modify: 2019-03-20 11:28:28.362690281 +0100 >>>>> Change: 2019-03-21 13:01:03.141654359 +0100 >>>>> Birth: - >>>>> >>>>> sudo stat >>>>> /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>>>> File: >>>>> '/data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b' >>>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>>> Device: 807h/2055d Inode: 57990935591 Links: 3 >>>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( >>>>> 33/www-data) >>>>> Access: 2019-03-20 
11:07:08.558120309 +0100 >>>>> Modify: 2019-03-20 11:28:14.226604869 +0100 >>>>> Change: 2019-03-21 13:01:03.189654448 +0100 >>>>> Birth: - >>>>> >>>>> sudo stat >>>>> /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>>>> File: >>>>> '/data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b' >>>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>>> Device: 807h/2055d Inode: 62291339781 Links: 3 >>>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( >>>>> 33/www-data) >>>>> Access: 2019-03-20 11:06:02.070003998 +0100 >>>>> Modify: 2019-03-20 11:28:28.458690861 +0100 >>>>> Change: 2019-03-21 13:01:03.281654621 +0100 >>>>> Birth: - >>>>> >>>>> sudo stat >>>>> /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>>>> File: >>>>> '/data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e' >>>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>>> Device: 807h/2055d Inode: 66574223479 Links: 3 >>>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( >>>>> 33/www-data) >>>>> Access: 2019-03-20 11:28:10.826584325 +0100 >>>>> Modify: 2019-03-20 11:28:10.834584374 +0100 >>>>> Change: 2019-03-20 14:06:07.937449353 +0100 >>>>> Birth: - >>>>> root at storage3:/var/log/glusterfs# >>>>> ???????????????????????????????????????????????????????????? >>>>> ???????????????????????????????????????????????????????????? >>>>> >>>>> brick2: >>>>> ???????????????????????????????????????????????????????????? >>>>> ???????????????????????????????????????????????????????????? >>>>> sudo getfattr -d -m . 
-e hex >>>>> /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>>>> getfattr: Removing leading '/' from absolute path names >>>>> # file: >>>>> data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>>>> trusted.afr.dirty=0x000000000000000000000000 >>>>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>>>> trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 >>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>>> trusted.glusterfs.dht.mds=0x00000000 >>>>> >>>>> sudo getfattr -d -m . -e hex >>>>> /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>>>> getfattr: Removing leading '/' from absolute path names >>>>> # file: >>>>> data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>>>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>>>> trusted.gfid=0x33585a577c4a4c55b39b9abc07eacff1 >>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>>> trusted.glusterfs.dht.mds=0x00000000 >>>>> >>>>> sudo getfattr -d -m . -e hex >>>>> /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>>>> getfattr: Removing leading '/' from absolute path names >>>>> # file: >>>>> data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>>>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>>>> trusted.gfid=0x99cdfe0773d74d7ab29592ce40cf47f3 >>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>>> trusted.glusterfs.dht.mds=0x00000000 >>>>> >>>>> sudo getfattr -d -m . 
-e hex >>>>> /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>>>> getfattr: Removing leading '/' from absolute path names >>>>> # file: >>>>> data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>>>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>>>> trusted.gfid=0x8a96a80ddf8a4163ac00bb12369389df >>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>>> trusted.glusterfs.dht.mds=0x00000000 >>>>> >>>>> sudo getfattr -d -m . -e hex >>>>> /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>>>> getfattr: Removing leading '/' from absolute path names >>>>> # file: >>>>> data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>>>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>>>> trusted.gfid=0x04d087a96cc6421ca7bc356fc20d7753 >>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>>> trusted.glusterfs.dht.mds=0x00000000 >>>>> >>>>> sudo getfattr -d -m . -e hex >>>>> /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>>>> getfattr: Removing leading '/' from absolute path names >>>>> # file: >>>>> data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>>>> trusted.afr.dirty=0x000000000000000000000000 >>>>> trusted.afr.storage2-client-0=0x000000000000000000000000 >>>>> trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5 >>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>>> trusted.glusterfs.dht.mds=0x00000000 >>>>> >>>>> ???????????????????????????????????????????????????????????? 
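As an illustration (not from the original thread): the gfid split-brain Karthik describes is simply the same path carrying different trusted.gfid values on the two bricks. A minimal Python sketch of that comparison, using a gfid value quoted in the getfattr output above; the helper name is invented for illustration:

```python
def gfid_mismatches(brick_a, brick_b):
    """Given {path: trusted.gfid} maps collected from two bricks via
    `getfattr -e hex`, return the paths whose gfids disagree."""
    common = brick_a.keys() & brick_b.keys()
    return sorted(p for p in common if brick_a[p] != brick_b[p])

# trusted.gfid values as reported above -- after the heal both bricks agree:
path = "dms/final_archive/5543982fab4b56060aa09f667a8ae617"
storage3 = {path: "0x33585a577c4a4c55b39b9abc07eacff1"}
storage4 = {path: "0x33585a577c4a4c55b39b9abc07eacff1"}
print(gfid_mismatches(storage3, storage4))  # → [] (no gfid mismatch left)
```

A non-empty result corresponds to the entries heal keeps failing on until one copy is chosen via the bigger-file/latest-mtime/source-brick CLI options.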
>>>>> >>>>> sudo stat >>>>> /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >>>>> File: >>>>> '/data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59' >>>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>>> Device: 807h/2055d Inode: 42232631305 Links: 3 >>>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( >>>>> 33/www-data) >>>>> Access: 2019-03-20 11:06:26.994047597 +0100 >>>>> Modify: 2019-03-20 11:28:28.294689870 +0100 >>>>> Change: 2019-03-21 13:01:03.078748131 +0100 >>>>> Birth: - >>>>> >>>>> sudo stat >>>>> /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617 >>>>> File: >>>>> '/data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617' >>>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>>> Device: 807h/2055d Inode: 78589109305 Links: 3 >>>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( >>>>> 33/www-data) >>>>> Access: 2019-03-20 11:07:20.342140927 +0100 >>>>> Modify: 2019-03-20 11:28:28.318690015 +0100 >>>>> Change: 2019-03-21 13:01:03.134748477 +0100 >>>>> Birth: - >>>>> >>>>> sudo stat >>>>> /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf >>>>> File: >>>>> '/data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf' >>>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>>> Device: 807h/2055d Inode: 54972096517 Links: 3 >>>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( >>>>> 33/www-data) >>>>> Access: 2019-03-20 11:06:55.414097315 +0100 >>>>> Modify: 2019-03-20 11:28:28.362690281 +0100 >>>>> Change: 2019-03-21 13:01:03.162748650 +0100 >>>>> Birth: - >>>>> >>>>> sudo stat >>>>> /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b >>>>> File: >>>>> '/data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b' >>>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>>> Device: 807h/2055d Inode: 40821259275 Links: 3 >>>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( >>>>> 33/www-data) >>>>> Access: 2019-03-20 
11:07:08.558120309 +0100 >>>>> Modify: 2019-03-20 11:28:14.226604869 +0100 >>>>> Change: 2019-03-21 13:01:03.194748848 +0100 >>>>> Birth: - >>>>> >>>>> sudo stat >>>>> /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b >>>>> File: >>>>> '/data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b' >>>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>>> Device: 807h/2055d Inode: 15876654 Links: 3 >>>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( >>>>> 33/www-data) >>>>> Access: 2019-03-20 11:06:02.070003998 +0100 >>>>> Modify: 2019-03-20 11:28:28.458690861 +0100 >>>>> Change: 2019-03-21 13:01:03.282749392 +0100 >>>>> Birth: - >>>>> >>>>> sudo stat >>>>> /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e >>>>> File: >>>>> '/data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e' >>>>> Size: 33 Blocks: 0 IO Block: 4096 directory >>>>> Device: 807h/2055d Inode: 49408944650 Links: 3 >>>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( >>>>> 33/www-data) >>>>> Access: 2019-03-20 11:28:10.826584325 +0100 >>>>> Modify: 2019-03-20 11:28:10.834584374 +0100 >>>>> Change: 2019-03-20 14:06:07.940849268 +0100 >>>>> Birth: - >>>>> ???????????????????????????????????????????????????????????? >>>>> ???????????????????????????????????????????????????????????? >>>>> >>>>> The file is from brick 2 that I upgraded and started the heal on. >>>>> >>>>> >>>>> - Kindest regards, >>>>> >>>>> Milos Cuculovic >>>>> IT Manager >>>>> >>>>> --- >>>>> MDPI AG >>>>> Postfach, CH-4020 Basel, Switzerland >>>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>>>> Tel. +41 61 683 77 35 >>>>> Fax +41 61 302 89 18 >>>>> Email: cuculovic at mdpi.com >>>>> Skype: milos.cuculovic.mdpi >>>>> >>>>> Disclaimer: The information and files contained in this message >>>>> are confidential and intended solely for the use of the individual or >>>>> entity to whom they are addressed. 
If you have received this message in >>>>> error, please notify me and delete this message from your system. You may >>>>> not copy this message in its entirety or in part, or disclose its contents >>>>> to anyone. >>>>> >>>>> On 21 Mar 2019, at 13:05, Karthik Subrahmanya >>>>> wrote: >>>>> >>>>> Can you give me the stat & getfattr output of all those 6 entries from >>>>> both the bricks and the glfsheal-<volname>.log file from the node where you >>>>> run this command? >>>>> Meanwhile can you also try running this with the source-brick option? >>>>> >>>>> On Thu, Mar 21, 2019 at 5:22 PM Milos Cuculovic >>>>> wrote: >>>>> >>>>>> Thank you Karthik, >>>>>> >>>>>> I have run this for all files (see example below) and it says the >>>>>> file is not in split-brain: >>>>>> >>>>>> sudo gluster volume heal storage2 split-brain latest-mtime >>>>>> /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >>>>>> Healing /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 failed: >>>>>> File not in split-brain. >>>>>> Volume heal failed. >>>>>> >>>>>> >>>>>> - Kindest regards, >>>>>> >>>>>> Milos Cuculovic >>>>>> IT Manager >>>>>> >>>>>> --- >>>>>> MDPI AG >>>>>> Postfach, CH-4020 Basel, Switzerland >>>>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>>>>> Tel. +41 61 683 77 35 >>>>>> Fax +41 61 302 89 18 >>>>>> Email: cuculovic at mdpi.com >>>>>> Skype: milos.cuculovic.mdpi >>>>>> >>>>>> Disclaimer: The information and files contained in this message >>>>>> are confidential and intended solely for the use of the individual or >>>>>> entity to whom they are addressed. If you have received this message in >>>>>> error, please notify me and delete this message from your system. You may >>>>>> not copy this message in its entirety or in part, or disclose its contents >>>>>> to anyone. >>>>>> >>>>>> On 21 Mar 2019, at 12:36, Karthik Subrahmanya >>>>>> wrote: >>>>>> >>>>>> Hi Milos, >>>>>> >>>>>> Thanks for the logs and the getfattr output. 
>>>>>> From the logs I can see that there are 6 entries under the >>>>>> directory "/data/data-cluster/dms/final_archive" named >>>>>> 41be9ff5ec05c4b1c989c6053e709e59 >>>>>> 5543982fab4b56060aa09f667a8ae617 >>>>>> a8b7f31775eebc8d1867e7f9de7b6eaf >>>>>> c1d3f3c2d7ae90e891e671e2f20d5d4b >>>>>> e5934699809a3b6dcfc5945f408b978b >>>>>> e7cdc94f60d390812a5f9754885e119e >>>>>> which are having gfid mismatch, so the heal is failing on this >>>>>> directory. >>>>>> >>>>>> You can use the CLI option to resolve these files from gfid mismatch. >>>>>> You can use any of the 3 methods available: >>>>>> 1. bigger-file >>>>>> gluster volume heal <VOLNAME> split-brain bigger-file <FILE> >>>>>> >>>>>> 2. latest-mtime >>>>>> gluster volume heal <VOLNAME> split-brain latest-mtime <FILE> >>>>>> >>>>>> 3. source-brick >>>>>> gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME> <FILE> >>>>>> >>>>>> >>>>>> where <FILE> must be absolute path w.r.t. the volume, starting with >>>>>> '/'. >>>>>> If all those entries are directories then go for either the >>>>>> latest-mtime/source-brick option. >>>>>> After you resolve all these gfid-mismatches, run the "gluster volume >>>>>> heal <VOLNAME>" command. Then check the heal info and let me know the >>>>>> result. >>>>>> >>>>>> Regards, >>>>>> Karthik >>>>>> >>>>>> On Thu, Mar 21, 2019 at 4:27 PM Milos Cuculovic >>>>>> wrote: >>>>>> >>>>>>> Sure, thank you for following up. >>>>>>> >>>>>>> About the commands, here is what I see: >>>>>>> >>>>>>> brick1: >>>>>>> ????????????????????????????????????? >>>>>>> sudo gluster volume heal storage2 info >>>>>>> Brick storage3:/data/data-cluster >>>>>>> >>>>>>> >>>>>>> /dms/final_archive - Possibly undergoing heal >>>>>>> >>>>>>> Status: Connected >>>>>>> Number of entries: 3 >>>>>>> >>>>>>> Brick storage4:/data/data-cluster >>>>>>> >>>>>>> /dms/final_archive - Possibly undergoing heal >>>>>>> >>>>>>> Status: Connected >>>>>>> Number of entries: 2 >>>>>>> ????????????????????????????????????? >>>>>>> sudo getfattr -d -m . 
-e hex /data/data-cluster/dms/final_archive >>>>>>> getfattr: Removing leading '/' from absolute path names >>>>>>> # file: data/data-cluster/dms/final_archive >>>>>>> trusted.afr.dirty=0x000000000000000000000000 >>>>>>> trusted.afr.storage2-client-1=0x000000000000000000000010 >>>>>>> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87 >>>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>>>>> trusted.glusterfs.dht.mds=0x00000000 >>>>>>> ????????????????????????????????????? >>>>>>> stat /data/data-cluster/dms/final_archive >>>>>>> File: '/data/data-cluster/dms/final_archive' >>>>>>> Size: 3497984 Blocks: 8768 IO Block: 4096 directory >>>>>>> Device: 807h/2055d Inode: 26427748396 Links: 72123 >>>>>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( >>>>>>> 33/www-data) >>>>>>> Access: 2018-10-09 04:22:40.514629044 +0200 >>>>>>> Modify: 2019-03-21 11:55:37.382278863 +0100 >>>>>>> Change: 2019-03-21 11:55:37.382278863 +0100 >>>>>>> Birth: - >>>>>>> ????????????????????????????????????? >>>>>>> ????????????????????????????????????? >>>>>>> >>>>>>> brick2: >>>>>>> ????????????????????????????????????? >>>>>>> sudo gluster volume heal storage2 info >>>>>>> Brick storage3:/data/data-cluster >>>>>>> >>>>>>> >>>>>>> /dms/final_archive - Possibly undergoing heal >>>>>>> >>>>>>> Status: Connected >>>>>>> Number of entries: 3 >>>>>>> >>>>>>> Brick storage4:/data/data-cluster >>>>>>> >>>>>>> /dms/final_archive - Possibly undergoing heal >>>>>>> >>>>>>> Status: Connected >>>>>>> Number of entries: 2 >>>>>>> ????????????????????????????????????? >>>>>>> sudo getfattr -d -m . 
-e hex /data/data-cluster/dms/final_archive >>>>>>> getfattr: Removing leading '/' from absolute path names >>>>>>> # file: data/data-cluster/dms/final_archive >>>>>>> trusted.afr.dirty=0x000000000000000000000000 >>>>>>> trusted.afr.storage2-client-0=0x000000000000000000000001 >>>>>>> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87 >>>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>>>>> trusted.glusterfs.dht.mds=0x00000000 >>>>>>> ????????????????????????????????????? >>>>>>> stat /data/data-cluster/dms/final_archive >>>>>>> File: '/data/data-cluster/dms/final_archive' >>>>>>> Size: 3497984 Blocks: 8760 IO Block: 4096 directory >>>>>>> Device: 807h/2055d Inode: 13563551265 Links: 72124 >>>>>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( >>>>>>> 33/www-data) >>>>>>> Access: 2018-10-09 04:22:40.514629044 +0200 >>>>>>> Modify: 2019-03-21 11:55:46.382565124 +0100 >>>>>>> Change: 2019-03-21 11:55:46.382565124 +0100 >>>>>>> Birth: - >>>>>>> ????????????????????????????????????? >>>>>>> >>>>>>> Hope this helps. >>>>>>> >>>>>>> - Kindest regards, >>>>>>> >>>>>>> Milos Cuculovic >>>>>>> IT Manager >>>>>>> >>>>>>> --- >>>>>>> MDPI AG >>>>>>> Postfach, CH-4020 Basel, Switzerland >>>>>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>>>>>> Tel. +41 61 683 77 35 >>>>>>> Fax +41 61 302 89 18 >>>>>>> Email: cuculovic at mdpi.com >>>>>>> Skype: milos.cuculovic.mdpi >>>>>>> >>>>>>> Disclaimer: The information and files contained in this message >>>>>>> are confidential and intended solely for the use of the individual or >>>>>>> entity to whom they are addressed. If you have received this message in >>>>>>> error, please notify me and delete this message from your system. You may >>>>>>> not copy this message in its entirety or in part, or disclose its contents >>>>>>> to anyone. 
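A side note not in the original exchange: the trusted.afr.storage2-client-N values quoted above are, by AFR convention, three big-endian 32-bit pending counters (data, metadata, entry). A small illustrative decoder, assuming that layout:

```python
import struct

def decode_afr_pending(xattr_hex):
    """Decode a trusted.afr.<volname>-client-<N> value as printed by
    `getfattr -e hex` into its (data, metadata, entry) pending counters."""
    hex_str = xattr_hex[2:] if xattr_hex.startswith("0x") else xattr_hex
    data, metadata, entry = struct.unpack(">III", bytes.fromhex(hex_str))
    return {"data": data, "metadata": metadata, "entry": entry}

# The two parent-directory xattrs quoted in this thread:
print(decode_afr_pending("0x000000000000000000000010"))  # brick1 blaming client-1
print(decode_afr_pending("0x000000000000000000000001"))  # brick2 blaming client-0
```

Non-zero entry counters on both replicas, each blaming the other, is what surfaces as the directory being "Possibly undergoing heal" or in split-brain.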
>>>>>>> >>>>>>> On 21 Mar 2019, at 11:43, Karthik Subrahmanya >>>>>>> wrote: >>>>>>> >>>>>>> Can you attach the "glustershd.log" file which will be present >>>>>>> under "/var/log/glusterfs/" from both the nodes and the "stat" & "getfattr >>>>>>> -d -m . -e hex <file-path>" output of all the entries listed in >>>>>>> the heal info output from both the bricks? >>>>>>> >>>>>>> On Thu, Mar 21, 2019 at 3:54 PM Milos Cuculovic >>>>>>> wrote: >>>>>>> >>>>>>>> Thanks Karthik! >>>>>>>> >>>>>>>> I was trying to find some resolution methods from [2] but >>>>>>>> unfortunately none worked (I can explain what I tried if needed). >>>>>>>> >>>>>>>> I guess the volume you are talking about is of type replica-2 (1x2). >>>>>>>> >>>>>>>> That's correct, aware of the arbiter solution but still didn't take >>>>>>>> time to implement it. >>>>>>>> >>>>>>>> From the info results I posted, how to know in which situation I >>>>>>>> am. No files are mentioned in split brain, only directories. One brick has 3 >>>>>>>> entries and one two entries. >>>>>>>> >>>>>>>> sudo gluster volume heal storage2 info >>>>>>>> [sudo] password for sshadmin: >>>>>>>> Brick storage3:/data/data-cluster >>>>>>>> >>>>>>>> >>>>>>>> /dms/final_archive - Possibly undergoing heal >>>>>>>> >>>>>>>> Status: Connected >>>>>>>> Number of entries: 3 >>>>>>>> >>>>>>>> Brick storage4:/data/data-cluster >>>>>>>> >>>>>>>> /dms/final_archive - Possibly undergoing heal >>>>>>>> >>>>>>>> Status: Connected >>>>>>>> Number of entries: 2 >>>>>>>> >>>>>>>> - Kindest regards, >>>>>>>> >>>>>>>> Milos Cuculovic >>>>>>>> IT Manager >>>>>>>> >>>>>>>> --- >>>>>>>> MDPI AG >>>>>>>> Postfach, CH-4020 Basel, Switzerland >>>>>>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >>>>>>>> Tel. 
+41 61 683 77 35 >>>>>>>> Fax +41 61 302 89 18 >>>>>>>> Email: cuculovic at mdpi.com >>>>>>>> Skype: milos.cuculovic.mdpi >>>>>>>> >>>>>>>> Disclaimer: The information and files contained in this message >>>>>>>> are confidential and intended solely for the use of the individual or >>>>>>>> entity to whom they are addressed. If you have received this message in >>>>>>>> error, please notify me and delete this message from your system. You may >>>>>>>> not copy this message in its entirety or in part, or disclose its contents >>>>>>>> to anyone. >>>>>>>> >>>>>>>> On 21 Mar 2019, at 10:27, Karthik Subrahmanya >>>>>>>> wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> Note: I guess the volume you are talking about is of type replica-2 >>>>>>>> (1x2). Usually replica 2 volumes are prone to split-brain. If you can >>>>>>>> consider converting them to arbiter or replica-3, they will handle most of >>>>>>>> the cases which can lead to split-brains. For more information see [1]. >>>>>>>> >>>>>>>> Resolving the split-brain: [2] talks about how to interpret the >>>>>>>> heal info output and different ways to resolve them using the >>>>>>>> CLI/manually/using the favorite-child-policy. >>>>>>>> If you are having entry split brain, and it is a gfid split-brain >>>>>>>> (file/dir having different gfids on the replica bricks) then you can use >>>>>>>> the CLI option to resolve them. If a directory is in gfid split-brain in a >>>>>>>> distributed-replicate volume and you are using the source-brick option >>>>>>>> please make sure you use the brick of this subvolume, which has the same >>>>>>>> gfid as that of the other distribute subvolume(s) where you have the >>>>>>>> correct gfid, as the source. >>>>>>>> If you are having a type mismatch then follow the steps in [3] to >>>>>>>> resolve the split-brain. 
>>>>>>>> >>>>>>>> [1] >>>>>>>> https://docs.gluster.org/en/v3/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/ >>>>>>>> [2] >>>>>>>> https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/ >>>>>>>> [3] >>>>>>>> https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#dir-split-brain >>>>>>>> >>>>>>>> HTH, >>>>>>>> Karthik >>>>>>>> >>>>>>>> On Thu, Mar 21, 2019 at 1:45 PM Milos Cuculovic >>>>>>>> wrote: >>>>>>>> >>>>>>>>> I was now able to catch the split brain log: >>>>>>>>> >>>>>>>>> sudo gluster volume heal storage2 info >>>>>>>>> Brick storage3:/data/data-cluster >>>>>>>>> >>>>>>>>> >>>>>>>>> /dms/final_archive - Is in split-brain >>>>>>>>> >>>>>>>>> Status: Connected >>>>>>>>> Number of entries: 3 >>>>>>>>> >>>>>>>>> Brick storage4:/data/data-cluster >>>>>>>>> >>>>>>>>> /dms/final_archive - Is in split-brain >>>>>>>>> >>>>>>>>> Status: Connected >>>>>>>>> Number of entries: 2 >>>>>>>>> >>>>>>>>> Milos >>>>>>>>> >>>>>>>>> On 21 Mar 2019, at 09:07, Milos Cuculovic >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Since 24h, after upgrading from 4.0 to 4.1.7 one of the servers, >>>>>>>>> the heal shows this: >>>>>>>>> >>>>>>>>> sudo gluster volume heal storage2 info >>>>>>>>> Brick storage3:/data/data-cluster >>>>>>>>> >>>>>>>>> >>>>>>>>> /dms/final_archive - Possibly undergoing heal >>>>>>>>> >>>>>>>>> Status: Connected >>>>>>>>> Number of entries: 3 >>>>>>>>> >>>>>>>>> Brick storage4:/data/data-cluster >>>>>>>>> >>>>>>>>> /dms/final_archive - Possibly undergoing heal >>>>>>>>> >>>>>>>>> Status: Connected >>>>>>>>> Number of entries: 2 >>>>>>>>> >>>>>>>>> The same files stay there. 
From time to time the status of the >>>>>>>>> /dms/final_archive is in split brain at the following command shows: >>>>>>>>> >>>>>>>>> sudo gluster volume heal storage2 info split-brain >>>>>>>>> Brick storage3:/data/data-cluster >>>>>>>>> /dms/final_archive >>>>>>>>> Status: Connected >>>>>>>>> Number of entries in split-brain: 1 >>>>>>>>> >>>>>>>>> Brick storage4:/data/data-cluster >>>>>>>>> /dms/final_archive >>>>>>>>> Status: Connected >>>>>>>>> Number of entries in split-brain: 1 >>>>>>>>> >>>>>>>>> How to know the file who is in split brain? The files in >>>>>>>>> /dms/final_archive are not very important, fine to remove (ideally resolve >>>>>>>>> the split brain) for the ones that differ. >>>>>>>>> >>>>>>>>> I can only see the directory and GFID. Any idea on how to resolve >>>>>>>>> this situation as I would like to continue with the upgrade on the 2nd >>>>>>>>> server, and for this the heal needs to be done with 0 entries in sudo >>>>>>>>> gluster volume heal storage2 info >>>>>>>>> >>>>>>>>> Thank you in advance, Milos. >>>>>>>>> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Gluster-users mailing list >>>>>>>>> Gluster-users at gluster.org >>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From avishwan at redhat.com Tue Mar 26 08:10:23 2019 From: avishwan at redhat.com (Aravinda) Date: Tue, 26 Mar 2019 13:40:23 +0530 Subject: [Gluster-users] Geo-replication status always on 'Created' In-Reply-To: References: <5342e4c8e5bff06a22edbc6be704e3f10bd67a4e.camel@redhat.com> Message-ID: <47ea47b7c4709d16677c7086fe683203bdd1662e.camel@redhat.com> I got chance to investigate this issue further and identified a issue with Geo-replication config set and sent patch to fix the same. 
BUG: https://bugzilla.redhat.com/show_bug.cgi?id=1692666 Patch: https://review.gluster.org/22418 On Mon, 2019-03-25 at 15:37 +0530, Maurya M wrote: > ran this command : ssh -p 2222 -i /var/lib/glusterd/geo- > replication/secret.pem root@gluster volume info --xml > > attaching the output. > > > > On Mon, Mar 25, 2019 at 2:13 PM Aravinda wrote: > > Geo-rep is running `ssh -i /var/lib/glusterd/geo- > > replication/secret.pem > > root@ gluster volume info --xml` and parsing its output. > > Please try to to run the command from the same node and let us know > > the > > output. > > > > > > On Mon, 2019-03-25 at 11:43 +0530, Maurya M wrote: > > > Now the error is on the same line 860 : as highlighted below: > > > > > > [2019-03-25 06:11:52.376238] E > > > [syncdutils(monitor):332:log_raise_exception] : FAIL: > > > Traceback (most recent call last): > > > File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line > > > 311, in main > > > func(args) > > > File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", > > line > > > 50, in subcmd_monitor > > > return monitor.monitor(local, remote) > > > File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", > > line > > > 427, in monitor > > > return Monitor().multiplex(*distribute(local, remote)) > > > File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", > > line > > > 386, in distribute > > > svol = Volinfo(slave.volume, "localhost", prelude) > > > File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", > > line > > > 860, in __init__ > > > vi = XET.fromstring(vix) > > > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line > > 1300, in > > > XML > > > parser.feed(text) > > > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line > > 1642, in > > > feed > > > self._raiseerror(v) > > > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line > > 1506, in > > > _raiseerror > > > raise err > > > ParseError: syntax error: line 1, column 0 > > > > > > > > > On Mon, Mar 25, 2019 at 11:29 AM Maurya 
M > > wrote: > > > > Sorry my bad, had put the print line to debug, i am using > > gluster > > > > 4.1.7, will remove the print line. > > > > > > > > On Mon, Mar 25, 2019 at 10:52 AM Aravinda > > > > wrote: > > > > > Below print statement looks wrong. Latest Glusterfs code > > doesn't > > > > > have > > > > > this print statement. Please let us know which version of > > > > > glusterfs you > > > > > are using. > > > > > > > > > > > > > > > ``` > > > > > File > > "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", > > > > > line > > > > > 860, in __init__ > > > > > print "debug varible " %vix > > > > > ``` > > > > > > > > > > As a workaround, edit that file and comment the print line > > and > > > > > test the > > > > > geo-rep config command. > > > > > > > > > > > > > > > On Mon, 2019-03-25 at 09:46 +0530, Maurya M wrote: > > > > > > hi Aravinda, > > > > > > had the session created using : create ssh-port 2222 push- > > pem > > > > > and > > > > > > also the : > > > > > > > > > > > > gluster volume geo-replication > > > > > vol_75a5fd373d88ba687f591f3353fa05cf > > > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f config > > ssh- > > > > > port > > > > > > 2222 > > > > > > > > > > > > hitting this message: > > > > > > geo-replication config-set failed for > > > > > > vol_75a5fd373d88ba687f591f3353fa05cf > > > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f > > > > > > geo-replication command failed > > > > > > > > > > > > Below is snap of status: > > > > > > > > > > > > [root at k8s-agentpool1-24779565-1 > > > > > > > > > > > > > vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a73057 > > > > > 8e45ed9d51b9a80df6c33f]# gluster volume geo-replication > > > > > vol_75a5fd373d88ba687f591f3353fa05cf > > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f status > > > > > > > > > > > > MASTER NODE MASTER VOL > > MASTER > > > > > > BRICK > > > > > > > > > > > > > SLAVE USER SLAVE > > > > > > > > > > > > > SLAVE NODE STATUS > > 
CRAWL > > > > > STATUS > > > > > > LAST_SYNCED > > > > > > --------------------------------------------------------- > > ---- > > > > > ------ > > > > > > --------------------------------------------------------- > > ---- > > > > > ------ > > > > > > --------------------------------------------------------- > > ---- > > > > > ------ > > > > > > --------------------------------------------------------- > > ---- > > > > > ------ > > > > > > ---------------- > > > > > > 172.16.189.4 vol_75a5fd373d88ba687f591f3353fa05cf > > > > > > > > > > > > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_ > > > > > 116f > > > > > > b9427fb26f752d9ba8e45e183cb1/brick root > > > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A > > > > > > > > > > > > > Created N/A N/A > > > > > > 172.16.189.35 vol_75a5fd373d88ba687f591f3353fa05cf > > > > > > > > > > > > > /var/lib/heketi/mounts/vg_05708751110fe60b3e7da15bdcf6d4d4/brick_ > > > > > 266b > > > > > > b08f0d466d346f8c0b19569736fb/brick root > > > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A > > > > > > > > > > > > > Created N/A N/A > > > > > > 172.16.189.66 vol_75a5fd373d88ba687f591f3353fa05cf > > > > > > > > > > > > > /var/lib/heketi/mounts/vg_4b92a2b687e59b7311055d3809b77c06/brick_ > > > > > dfa4 > > > > > > 4c9380cdedac708e27e2c2a443a0/brick root > > > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A > > > > > > > > > > > > > Created N/A N/A > > > > > > > > > > > > any ideas ? 
where can find logs for the failed commands > > check > > > > > in > > > > > > gysncd.log , the trace is as below: > > > > > > > > > > > > [2019-03-25 04:04:42.295043] I [gsyncd(monitor):297:main] > > > > > : > > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > l_e7 > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > [2019-03-25 04:04:42.387192] E > > > > > > [syncdutils(monitor):332:log_raise_exception] : FAIL: > > > > > > Traceback (most recent call last): > > > > > > File > > "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", > > > > > line > > > > > > 311, in main > > > > > > func(args) > > > > > > File > > "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", > > > > > line > > > > > > 50, in subcmd_monitor > > > > > > return monitor.monitor(local, remote) > > > > > > File > > "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", > > > > > line > > > > > > 427, in monitor > > > > > > return Monitor().multiplex(*distribute(local, remote)) > > > > > > File > > "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", > > > > > line > > > > > > 370, in distribute > > > > > > mvol = Volinfo(master.volume, master.host) > > > > > > File > > > > > "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", > > line > > > > > > 860, in __init__ > > > > > > print "debug varible " %vix > > > > > > TypeError: not all arguments converted during string > > formatting > > > > > > [2019-03-25 04:04:48.997519] I [gsyncd(config- > > get):297:main] > > > > > : > > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > l_e7 > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > [2019-03-25 04:04:49.93528] I [gsyncd(status):297:main] > > : > > > > > Using > > > > > > session config file path=/var/lib/glusterd/geo- > > > > > > > > > > 
> > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > l_e7 > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > [2019-03-25 04:08:07.194348] I [gsyncd(config- > > get):297:main] > > > > > : > > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > l_e7 > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > [2019-03-25 04:08:07.262588] I [gsyncd(config- > > get):297:main] > > > > > : > > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > l_e7 > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > [2019-03-25 04:08:07.550080] I [gsyncd(config- > > get):297:main] > > > > > : > > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > l_e7 > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > [2019-03-25 04:08:18.933028] I [gsyncd(config- > > get):297:main] > > > > > : > > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > l_e7 > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > [2019-03-25 04:08:19.25285] I [gsyncd(status):297:main] > > : > > > > > Using > > > > > > session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > l_e7 > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > [2019-03-25 04:09:15.766882] I [gsyncd(config- > > get):297:main] > > > > > : > > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > l_e7 > > > > > 
> 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > [2019-03-25 04:09:16.30267] I [gsyncd(config-get):297:main] > > > > > : > > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > l_e7 > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > [2019-03-25 04:09:16.89006] I [gsyncd(config-set):297:main] > > > > > : > > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > l_e7 > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > > > > > > > regards, > > > > > > Maurya > > > > > > > > > > > > On Mon, Mar 25, 2019 at 9:08 AM Aravinda < > > avishwan at redhat.com> > > > > > wrote: > > > > > > > Use `ssh-port ` while creating the Geo-rep session > > > > > > > > > > > > > > Ref: > > > > > > > > > > > > > > https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/#creating-the-session > > > > > > > > > > > > > > And set the ssh-port option before start. > > > > > > > > > > > > > > ``` > > > > > > > gluster volume geo-replication \ > > > > > > > [@]:: config > > > > > > > ssh-port 2222 > > > > > > > ``` > > > > > > > -- regards Aravinda From nux at li.nux.ro Tue Mar 26 11:21:54 2019 From: nux at li.nux.ro (Nux!) Date: Tue, 26 Mar 2019 11:21:54 +0000 (GMT) Subject: [Gluster-users] Prioritise local bricks for IO? Message-ID: <29221907.583.1553599314586.JavaMail.zimbra@li.nux.ro> Hello, I'm trying to set up a distributed backup storage (no replicas), but I'd like to prioritise the local bricks for any IO done on the volume. This will be a backup store, so in other words, I'd like the files to be written locally if there is space, so as to save the NICs for other traffic. Does anyone know how this might be achievable, if at all? -- Sent from the Delta quadrant using Borg technology! Nux! 
www.nux.ro From alvin at netvel.net Tue Mar 26 12:40:08 2019 From: alvin at netvel.net (Alvin Starr) Date: Tue, 26 Mar 2019 08:40:08 -0400 Subject: [Gluster-users] recovery from reboot time? In-Reply-To: References: Message-ID: After almost a week of doing nothing, the brick failed and we were able to stop and restart glusterd and then could start a manual heal. Interestingly, when the heal started the estimated time to completion was just about 21 days, but as it worked through the 300,000-some entries it got faster, to the point where it completed in 2 days. Now I have 2 gfids that refuse to heal. We have also been looking at converting these systems to RHEL and buying support from RH, but it seems that the sales arm is not interested in calling people back. On 3/20/19 1:39 AM, Amar Tumballi Suryanarayan wrote: > There are 2 things that happen after a reboot. > > 1. glusterd (management layer) does a sanity check of its volumes, and > sees if anything is different from when it went down, and tries to > correct its state. > - This is fine as long as the number of volumes or nodes is > small (small meaning < 100). > > 2. If it is a replicate or disperse volume, the self-heal daemon > checks whether any self-heals are pending. > - This does an 'index' crawl to find which files actually changed > while one of the bricks/nodes was down. > - If this list is big, it can sometimes take some time. > > But 'days/weeks/months' is not an expected/observed behavior. Are there > any errors in the log file? If not, can you run 'strace -f' on the pid > which is consuming major CPU? (a 1-minute strace sample is good enough). > > -Amar > > > On Wed, Mar 20, 2019 at 2:05 AM Alvin Starr > wrote: > > We have a simple replicated volume with one 17TB brick on each node. > > There are something like 35M files and directories on the volume. > > One of the servers rebooted and is now "doing something". 
> > It kind of looks like it's doing some kind of sanity check with the > node > that did not reboot, but it's hard to say, and it looks like it may > run for > hours/days/months.... > > Will Gluster take a long time with lots of little files to resync? > > > -- > Alvin Starr || land: (905)513-7688 > Netvel Inc. || Cell: (416)806-0133 > alvin at netvel.net || > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Amar Tumballi (amarts) -- Alvin Starr || land: (905)513-7688 Netvel Inc. || Cell: (416)806-0133 alvin at netvel.net || -------------- next part -------------- An HTML attachment was scrubbed... URL: From sabose at redhat.com Tue Mar 26 13:23:57 2019 From: sabose at redhat.com (Sahina Bose) Date: Tue, 26 Mar 2019 18:53:57 +0530 Subject: [Gluster-users] [ovirt-users] VM disk corruption with LSM on Gluster In-Reply-To: <723e9e09-1c2b-8422-cb37-d72ba3bb26dc@hoentjen.eu> References: <723e9e09-1c2b-8422-cb37-d72ba3bb26dc@hoentjen.eu> Message-ID: +Krutika Dhananjay and gluster ml On Tue, Mar 26, 2019 at 6:16 PM Sander Hoentjen wrote: > > Hello, > > tl;dr We have disk corruption when doing live storage migration on oVirt > 4.2 with gluster 3.12.15. Any idea why? > > We have a 3-node oVirt cluster that is both compute and gluster-storage. > The manager runs on separate hardware. We are running out of space on > this volume, so we added another Gluster volume that is bigger, put a > storage domain on it and then migrated VM's to it with LSM. After > some time, we noticed that (some of) the migrated VM's had corrupted > filesystems. After moving everything back with export-import to the old > domain where possible, and recovering from backups where needed, we set > off to investigate this issue. > > We are now at the point where we can reproduce this issue within a day. 
> What we have found so far: > 1) The corruption occurs at the very end of the replication step, most > probably between START and FINISH of diskReplicateFinish, before the > START merge step > 2) In the corrupted VM, at some place where data should be, the data is > replaced by zeros. This can be file contents or a directory structure > or whatever. > 3) The source gluster volume has different settings than the destination > (mostly because the defaults were different at creation time): > > Setting old(src) new(dst) > cluster.op-version 30800 30800 (the same) > cluster.max-op-version 31202 31202 (the same) > cluster.metadata-self-heal off on > cluster.data-self-heal off on > cluster.entry-self-heal off on > performance.low-prio-threads 16 32 > performance.strict-o-direct off on > network.ping-timeout 42 30 > network.remote-dio enable off > transport.address-family - inet > performance.stat-prefetch off on > features.shard-block-size 512MB 64MB > cluster.shd-max-threads 1 8 > cluster.shd-wait-qlength 1024 10000 > cluster.locking-scheme full granular > cluster.granular-entry-heal no enable > > 4) To test, we migrate some VM's back and forth. The corruption does not > occur every time. So far it has only occurred from old to new, but we > don't have enough data points to be sure about that. > > Does anybody have an idea what is causing the corruption? Is this the best list to > ask, or should I ask on a Gluster list? I am not sure whether this is oVirt > specific or Gluster specific though. Do you have logs from the old and new gluster volumes? Any errors in the new volume's fuse mount logs? 
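As a quick aid for reading the table above: the entries that actually differ (and so are worth ruling out first) can be isolated mechanically. A throwaway sketch, using the values pasted in the mail; on a live cluster the same comparison can be run against the output of `gluster volume get <volname> all` for each volume:

```python
# Throwaway comparison of the volume options pasted above: print only
# the options whose effective value differs between old(src) and new(dst).
settings = {
    # option: (old_src, new_dst) -- values copied from the mail
    "cluster.op-version":           ("30800", "30800"),
    "cluster.max-op-version":       ("31202", "31202"),
    "cluster.metadata-self-heal":   ("off", "on"),
    "cluster.data-self-heal":       ("off", "on"),
    "cluster.entry-self-heal":      ("off", "on"),
    "performance.low-prio-threads": ("16", "32"),
    "performance.strict-o-direct":  ("off", "on"),
    "network.ping-timeout":         ("42", "30"),
    "network.remote-dio":           ("enable", "off"),
    "transport.address-family":     ("-", "inet"),
    "performance.stat-prefetch":    ("off", "on"),
    "features.shard-block-size":    ("512MB", "64MB"),
    "cluster.shd-max-threads":      ("1", "8"),
    "cluster.shd-wait-qlength":     ("1024", "10000"),
    "cluster.locking-scheme":       ("full", "granular"),
    "cluster.granular-entry-heal":  ("no", "enable"),
}
diffs = {k: v for k, v in settings.items() if v[0] != v[1]}
for opt, (old, new) in sorted(diffs.items()):
    print(f"{opt}: {old} -> {new}")
```

This is only a bookkeeping aid, not a diagnosis; 14 of the 16 pasted options differ between the two volumes.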
> > Kind regards, > Sander Hoentjen > _______________________________________________ > Users mailing list -- users at ovirt.org > To unsubscribe send an email to users-leave at ovirt.org > Privacy Statement: https://www.ovirt.org/site/privacy-policy/ > oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ > List Archives: https://lists.ovirt.org/archives/list/users at ovirt.org/message/43E2QYJYDHPYTIU3IFS53WS4WL5OFXUV/ From mauryam at gmail.com Tue Mar 26 14:14:55 2019 From: mauryam at gmail.com (Maurya M) Date: Tue, 26 Mar 2019 19:44:55 +0530 Subject: [Gluster-users] Geo-replication status always on 'Created' In-Reply-To: <47ea47b7c4709d16677c7086fe683203bdd1662e.camel@redhat.com> References: <5342e4c8e5bff06a22edbc6be704e3f10bd67a4e.camel@redhat.com> <47ea47b7c4709d16677c7086fe683203bdd1662e.camel@redhat.com> Message-ID: Hi Aravinda, I have patched my setup with your fix and re-run the setup, but this time I am getting a different error: it failed to commit the ssh-port on my other 2 nodes on the master cluster, so I manually copied: *[vars]* *ssh-port = 2222* into gsyncd.conf, and the status reported back is as shown below. Any ideas how to troubleshoot this? 
MASTER NODE MASTER VOL MASTER BRICK SLAVE USER SLAVE SLAVE NODE STATUS CRAWL STATUS LAST_SYNCED ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 172.16.189.4 vol_75a5fd373d88ba687f591f3353fa05cf /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_116fb9427fb26f752d9ba8e45e183cb1/brick root 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f 172.16.201.4 *Passive *N/A N/A 172.16.189.35 vol_75a5fd373d88ba687f591f3353fa05cf /var/lib/heketi/mounts/vg_05708751110fe60b3e7da15bdcf6d4d4/brick_266bb08f0d466d346f8c0b19569736fb/brick root 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A *Faulty *N/A N/A 172.16.189.66 vol_75a5fd373d88ba687f591f3353fa05cf /var/lib/heketi/mounts/vg_4b92a2b687e59b7311055d3809b77c06/brick_dfa44c9380cdedac708e27e2c2a443a0/brick root 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A *Initializing*... N/A N/A On Tue, Mar 26, 2019 at 1:40 PM Aravinda wrote: > I got chance to investigate this issue further and identified a issue > with Geo-replication config set and sent patch to fix the same. > > BUG: https://bugzilla.redhat.com/show_bug.cgi?id=1692666 > Patch: https://review.gluster.org/22418 > > On Mon, 2019-03-25 at 15:37 +0530, Maurya M wrote: > > ran this command : ssh -p 2222 -i /var/lib/glusterd/geo- > > replication/secret.pem root@gluster volume info --xml > > > > attaching the output. > > > > > > > > On Mon, Mar 25, 2019 at 2:13 PM Aravinda wrote: > > > Geo-rep is running `ssh -i /var/lib/glusterd/geo- > > > replication/secret.pem > > > root@ gluster volume info --xml` and parsing its output. > > > Please try to to run the command from the same node and let us know > > > the > > > output. 
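For context on the `ParseError: syntax error: line 1, column 0` quoted further down: `Volinfo` hands the raw output of that ssh command straight to `xml.etree.ElementTree`, so if the remote command prints anything other than well-formed XML (an error message, a shell banner), the parser fails on the very first byte with exactly that message. A minimal reproduction (the sample strings are illustrative, not actual gsyncd output):

```python
import xml.etree.ElementTree as XET  # same alias syncdutils.py uses

# Anything that is not well-formed XML -- e.g. an error line printed by
# the remote `gluster volume info --xml` invocation -- makes expat fail
# at offset 0, which is the "syntax error: line 1, column 0" in the
# quoted traceback. Well-formed output parses normally.
for vix in ("some error text from ssh", "<volInfo><volumes/></volInfo>"):
    try:
        print("parsed OK:", XET.fromstring(vix).tag)
    except XET.ParseError as exc:
        print("ParseError:", exc)
```

So checking what the ssh command actually prints (as Aravinda asks above) is the right first step: the ParseError only says the output was not XML.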
> > > > > > > > > On Mon, 2019-03-25 at 11:43 +0530, Maurya M wrote: > > > > Now the error is on the same line 860 : as highlighted below: > > > > > > > > [2019-03-25 06:11:52.376238] E > > > > [syncdutils(monitor):332:log_raise_exception] : FAIL: > > > > Traceback (most recent call last): > > > > File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line > > > > 311, in main > > > > func(args) > > > > File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", > > > line > > > > 50, in subcmd_monitor > > > > return monitor.monitor(local, remote) > > > > File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", > > > line > > > > 427, in monitor > > > > return Monitor().multiplex(*distribute(local, remote)) > > > > File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", > > > line > > > > 386, in distribute > > > > svol = Volinfo(slave.volume, "localhost", prelude) > > > > File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", > > > line > > > > 860, in __init__ > > > > vi = XET.fromstring(vix) > > > > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line > > > 1300, in > > > > XML > > > > parser.feed(text) > > > > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line > > > 1642, in > > > > feed > > > > self._raiseerror(v) > > > > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line > > > 1506, in > > > > _raiseerror > > > > raise err > > > > ParseError: syntax error: line 1, column 0 > > > > > > > > > > > > On Mon, Mar 25, 2019 at 11:29 AM Maurya M > > > wrote: > > > > > Sorry my bad, had put the print line to debug, i am using > > > gluster > > > > > 4.1.7, will remove the print line. > > > > > > > > > > On Mon, Mar 25, 2019 at 10:52 AM Aravinda > > > > > wrote: > > > > > > Below print statement looks wrong. Latest Glusterfs code > > > doesn't > > > > > > have > > > > > > this print statement. Please let us know which version of > > > > > > glusterfs you > > > > > > are using. 
> > > > > > > > > > > > > > > > > > ``` > > > > > > File > > > "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", > > > > > > line > > > > > > 860, in __init__ > > > > > > print "debug varible " %vix > > > > > > ``` > > > > > > > > > > > > As a workaround, edit that file and comment the print line > > > and > > > > > > test the > > > > > > geo-rep config command. > > > > > > > > > > > > > > > > > > On Mon, 2019-03-25 at 09:46 +0530, Maurya M wrote: > > > > > > > hi Aravinda, > > > > > > > had the session created using : create ssh-port 2222 push- > > > pem > > > > > > and > > > > > > > also the : > > > > > > > > > > > > > > gluster volume geo-replication > > > > > > vol_75a5fd373d88ba687f591f3353fa05cf > > > > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f config > > > ssh- > > > > > > port > > > > > > > 2222 > > > > > > > > > > > > > > hitting this message: > > > > > > > geo-replication config-set failed for > > > > > > > vol_75a5fd373d88ba687f591f3353fa05cf > > > > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f > > > > > > > geo-replication command failed > > > > > > > > > > > > > > Below is snap of status: > > > > > > > > > > > > > > [root at k8s-agentpool1-24779565-1 > > > > > > > > > > > > > > > > vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a73057 > > > > > > 8e45ed9d51b9a80df6c33f]# gluster volume geo-replication > > > > > > vol_75a5fd373d88ba687f591f3353fa05cf > > > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f status > > > > > > > > > > > > > > MASTER NODE MASTER VOL > > > MASTER > > > > > > > BRICK > > > > > > > > > > > > > > > > SLAVE USER SLAVE > > > > > > > > > > > > > > > > SLAVE NODE STATUS > > > CRAWL > > > > > > STATUS > > > > > > > LAST_SYNCED > > > > > > > --------------------------------------------------------- > > > ---- > > > > > > ------ > > > > > > > --------------------------------------------------------- > > > ---- > > > > > > ------ > > > > > > > 
--------------------------------------------------------- > > > ---- > > > > > > ------ > > > > > > > --------------------------------------------------------- > > > ---- > > > > > > ------ > > > > > > > ---------------- > > > > > > > 172.16.189.4 vol_75a5fd373d88ba687f591f3353fa05cf > > > > > > > > > > > > > > > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_ > > > > > > 116f > > > > > > > b9427fb26f752d9ba8e45e183cb1/brick root > > > > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A > > > > > > > > > > > > > > > > Created N/A N/A > > > > > > > 172.16.189.35 vol_75a5fd373d88ba687f591f3353fa05cf > > > > > > > > > > > > > > > > /var/lib/heketi/mounts/vg_05708751110fe60b3e7da15bdcf6d4d4/brick_ > > > > > > 266b > > > > > > > b08f0d466d346f8c0b19569736fb/brick root > > > > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A > > > > > > > > > > > > > > > > Created N/A N/A > > > > > > > 172.16.189.66 vol_75a5fd373d88ba687f591f3353fa05cf > > > > > > > > > > > > > > > > /var/lib/heketi/mounts/vg_4b92a2b687e59b7311055d3809b77c06/brick_ > > > > > > dfa4 > > > > > > > 4c9380cdedac708e27e2c2a443a0/brick root > > > > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A > > > > > > > > > > > > > > > > Created N/A N/A > > > > > > > > > > > > > > any ideas ? 
where can find logs for the failed commands > > > check > > > > > > in > > > > > > > gysncd.log , the trace is as below: > > > > > > > > > > > > > > [2019-03-25 04:04:42.295043] I [gsyncd(monitor):297:main] > > > > > > : > > > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > > l_e7 > > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > > [2019-03-25 04:04:42.387192] E > > > > > > > [syncdutils(monitor):332:log_raise_exception] : FAIL: > > > > > > > Traceback (most recent call last): > > > > > > > File > > > "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", > > > > > > line > > > > > > > 311, in main > > > > > > > func(args) > > > > > > > File > > > "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", > > > > > > line > > > > > > > 50, in subcmd_monitor > > > > > > > return monitor.monitor(local, remote) > > > > > > > File > > > "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", > > > > > > line > > > > > > > 427, in monitor > > > > > > > return Monitor().multiplex(*distribute(local, remote)) > > > > > > > File > > > "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", > > > > > > line > > > > > > > 370, in distribute > > > > > > > mvol = Volinfo(master.volume, master.host) > > > > > > > File > > > > > > "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", > > > line > > > > > > > 860, in __init__ > > > > > > > print "debug varible " %vix > > > > > > > TypeError: not all arguments converted during string > > > formatting > > > > > > > [2019-03-25 04:04:48.997519] I [gsyncd(config- > > > get):297:main] > > > > > > : > > > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > > l_e7 > > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > > [2019-03-25 04:04:49.93528] I 
[gsyncd(status):297:main] > > > : > > > > > > Using > > > > > > > session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > > l_e7 > > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > > [2019-03-25 04:08:07.194348] I [gsyncd(config- > > > get):297:main] > > > > > > : > > > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > > l_e7 > > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > > [2019-03-25 04:08:07.262588] I [gsyncd(config- > > > get):297:main] > > > > > > : > > > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > > l_e7 > > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > > [2019-03-25 04:08:07.550080] I [gsyncd(config- > > > get):297:main] > > > > > > : > > > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > > l_e7 > > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > > [2019-03-25 04:08:18.933028] I [gsyncd(config- > > > get):297:main] > > > > > > : > > > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > > l_e7 > > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > > [2019-03-25 04:08:19.25285] I [gsyncd(status):297:main] > > > : > > > > > > Using > > > > > > > session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > > l_e7 > > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > > [2019-03-25 
04:09:15.766882] I [gsyncd(config- > > > get):297:main] > > > > > > : > > > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > > l_e7 > > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > > [2019-03-25 04:09:16.30267] I [gsyncd(config-get):297:main] > > > > > > : > > > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > > l_e7 > > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > > [2019-03-25 04:09:16.89006] I [gsyncd(config-set):297:main] > > > > > > : > > > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > > l_e7 > > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > > > > > > > > > regards, > > > > > > > Maurya > > > > > > > > > > > > > > On Mon, Mar 25, 2019 at 9:08 AM Aravinda < > > > avishwan at redhat.com> > > > > > > wrote: > > > > > > > > Use `ssh-port ` while creating the Geo-rep session > > > > > > > > > > > > > > > > Ref: > > > > > > > > > > > > > > > > > > https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/#creating-the-session > > > > > > > > > > > > > > > > And set the ssh-port option before start. > > > > > > > > > > > > > > > > ``` > > > > > > > > gluster volume geo-replication \ > > > > > > > > [@]:: config > > > > > > > > ssh-port 2222 > > > > > > > > ``` > > > > > > > > > -- > regards > Aravinda > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From avishwan at redhat.com Tue Mar 26 15:03:56 2019 From: avishwan at redhat.com (Aravinda) Date: Tue, 26 Mar 2019 20:33:56 +0530 Subject: [Gluster-users] Geo-replication status always on 'Created' In-Reply-To: References: <5342e4c8e5bff06a22edbc6be704e3f10bd67a4e.camel@redhat.com> <47ea47b7c4709d16677c7086fe683203bdd1662e.camel@redhat.com> Message-ID: <526b85c223325f79256dd7d991c6340a7e40ba14.camel@redhat.com> Please check error message in gsyncd.log file in /var/log/glusterfs/geo-replication/ On Tue, 2019-03-26 at 19:44 +0530, Maurya M wrote: > Hi Arvind, > Have patched my setup with your fix: re-run the setup, but this time > getting a different error where it failed to commit the ssh-port on > my other 2 nodes on the master cluster, so manually copied the : > [vars] > ssh-port = 2222 > > into gsyncd.conf > > and status reported back is as shown below : Any ideas how to > troubleshoot this? > > MASTER NODE MASTER VOL MASTER > BRICK > SLAVE USER SLAVE > SLAVE NODE STATUS > CRAWL STATUS LAST_SYNCED > ------------------------------------------------------------------- > ------------------------------------------------------------------- > ------------------------------------------------------------------- > ------------------------------------------------------------------- > -------------------------- > 172.16.189.4 vol_75a5fd373d88ba687f591f3353fa05cf > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_116f > b9427fb26f752d9ba8e45e183cb1/brick root > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f 172.16.201.4 > Passive N/A N/A > 172.16.189.35 vol_75a5fd373d88ba687f591f3353fa05cf > /var/lib/heketi/mounts/vg_05708751110fe60b3e7da15bdcf6d4d4/brick_266b > b08f0d466d346f8c0b19569736fb/brick root > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A > Faulty N/A N/A > 172.16.189.66 vol_75a5fd373d88ba687f591f3353fa05cf > /var/lib/heketi/mounts/vg_4b92a2b687e59b7311055d3809b77c06/brick_dfa4 > 4c9380cdedac708e27e2c2a443a0/brick root > 
172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A > Initializing... N/A N/A > > > > > On Tue, Mar 26, 2019 at 1:40 PM Aravinda wrote: > > I got chance to investigate this issue further and identified a > > issue > > with Geo-replication config set and sent patch to fix the same. > > > > BUG: https://bugzilla.redhat.com/show_bug.cgi?id=1692666 > > Patch: https://review.gluster.org/22418 > > > > On Mon, 2019-03-25 at 15:37 +0530, Maurya M wrote: > > > ran this command : ssh -p 2222 -i /var/lib/glusterd/geo- > > > replication/secret.pem root@gluster volume info -- > > xml > > > > > > attaching the output. > > > > > > > > > > > > On Mon, Mar 25, 2019 at 2:13 PM Aravinda > > wrote: > > > > Geo-rep is running `ssh -i /var/lib/glusterd/geo- > > > > replication/secret.pem > > > > root@ gluster volume info --xml` and parsing its > > output. > > > > Please try to to run the command from the same node and let us > > know > > > > the > > > > output. > > > > > > > > > > > > On Mon, 2019-03-25 at 11:43 +0530, Maurya M wrote: > > > > > Now the error is on the same line 860 : as highlighted below: > > > > > > > > > > [2019-03-25 06:11:52.376238] E > > > > > [syncdutils(monitor):332:log_raise_exception] : FAIL: > > > > > Traceback (most recent call last): > > > > > File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", > > line > > > > > 311, in main > > > > > func(args) > > > > > File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", > > > > line > > > > > 50, in subcmd_monitor > > > > > return monitor.monitor(local, remote) > > > > > File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", > > > > line > > > > > 427, in monitor > > > > > return Monitor().multiplex(*distribute(local, remote)) > > > > > File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", > > > > line > > > > > 386, in distribute > > > > > svol = Volinfo(slave.volume, "localhost", prelude) > > > > > File > > "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", > > > > line > > > > > 
860, in __init__ > > > > > vi = XET.fromstring(vix) > > > > > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line > > > > 1300, in > > > > > XML > > > > > parser.feed(text) > > > > > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line > > > > 1642, in > > > > > feed > > > > > self._raiseerror(v) > > > > > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line > > > > 1506, in > > > > > _raiseerror > > > > > raise err > > > > > ParseError: syntax error: line 1, column 0 > > > > > > > > > > > > > > > On Mon, Mar 25, 2019 at 11:29 AM Maurya M > > > > wrote: > > > > > > Sorry my bad, had put the print line to debug, i am using > > > > gluster > > > > > > 4.1.7, will remove the print line. > > > > > > > > > > > > On Mon, Mar 25, 2019 at 10:52 AM Aravinda < > > avishwan at redhat.com> > > > > > > wrote: > > > > > > > Below print statement looks wrong. Latest Glusterfs code > > > > doesn't > > > > > > > have > > > > > > > this print statement. Please let us know which version of > > > > > > > glusterfs you > > > > > > > are using. > > > > > > > > > > > > > > > > > > > > > ``` > > > > > > > File > > > > "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", > > > > > > > line > > > > > > > 860, in __init__ > > > > > > > print "debug varible " %vix > > > > > > > ``` > > > > > > > > > > > > > > As a workaround, edit that file and comment the print > > line > > > > and > > > > > > > test the > > > > > > > geo-rep config command. 
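For reference, the `TypeError: not all arguments converted during string formatting` quoted earlier in this thread follows directly from that print line: the format string contains no `%s` conversion specifier, so the `%` operator cannot consume its right-hand operand. A minimal reproduction (Python 3 syntax; the original file is Python 2):

```python
vix = "<volInfo/>"  # stand-in for the XML text the debug line printed

# The quoted debug line, `print "debug varible " %vix`, has no %s
# placeholder, so the right-hand operand is never consumed -> TypeError:
# "not all arguments converted during string formatting".
try:
    print("debug varible " % vix)
except TypeError as exc:
    print("TypeError:", exc)

# With a placeholder, the same statement formats cleanly:
print("debug varible %s" % vix)
```

This is why commenting out (or fixing) the stray debug line is enough to get past the monitor crash.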
On Mon, 2019-03-25 at 09:46 +0530, Maurya M wrote:
> hi Aravinda,
>
> had the session created using : create ssh-port 2222 push-pem and
> also the :
>
> gluster volume geo-replication vol_75a5fd373d88ba687f591f3353fa05cf
> 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f config ssh-port 2222
>
> hitting this message:
> geo-replication config-set failed for
> vol_75a5fd373d88ba687f591f3353fa05cf
> 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f
> geo-replication command failed
>
> Below is snap of status:
>
> [root at k8s-agentpool1-24779565-1 vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f]# gluster volume geo-replication vol_75a5fd373d88ba687f591f3353fa05cf 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f status
>
> MASTER NODE      MASTER VOL                              MASTER BRICK                                                                                                SLAVE USER    SLAVE                                                  SLAVE NODE    STATUS     CRAWL STATUS    LAST_SYNCED
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 172.16.189.4     vol_75a5fd373d88ba687f591f3353fa05cf    /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_116fb9427fb26f752d9ba8e45e183cb1/brick    root          172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f    N/A           Created    N/A             N/A
> 172.16.189.35    vol_75a5fd373d88ba687f591f3353fa05cf    /var/lib/heketi/mounts/vg_05708751110fe60b3e7da15bdcf6d4d4/brick_266bb08f0d466d346f8c0b19569736fb/brick    root          172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f    N/A           Created    N/A             N/A
> 172.16.189.66    vol_75a5fd373d88ba687f591f3353fa05cf    /var/lib/heketi/mounts/vg_4b92a2b687e59b7311055d3809b77c06/brick_dfa44c9380cdedac708e27e2c2a443a0/brick    root          172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f    N/A           Created    N/A             N/A
>
> any ideas ? where can I find logs for the failed commands? I checked
> in gsyncd.log, the trace is as below:
>
> [2019-03-25 04:04:42.295043] I [gsyncd(monitor):297:main] : Using session config file path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf
> [2019-03-25 04:04:42.387192] E [syncdutils(monitor):332:log_raise_exception] : FAIL:
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
>     func(args)
>   File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 50, in subcmd_monitor
>     return monitor.monitor(local, remote)
>   File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 427, in monitor
>     return Monitor().multiplex(*distribute(local, remote))
>   File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 370, in distribute
>     mvol = Volinfo(master.volume, master.host)
>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 860, in __init__
>     print "debug varible " %vix
> TypeError: not all arguments converted during string formatting
> [2019-03-25 04:04:48.997519] I [gsyncd(config-get):297:main] : Using session config file path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf
> [2019-03-25 04:04:49.93528] I [gsyncd(status):297:main] : Using session config file path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf
> [2019-03-25 04:08:07.194348] I [gsyncd(config-get):297:main] : Using session config file path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf
> [2019-03-25 04:08:07.262588] I [gsyncd(config-get):297:main] : Using session config file path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf
> [2019-03-25 04:08:07.550080] I [gsyncd(config-get):297:main] : Using session config file path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf
> [2019-03-25 04:08:18.933028] I [gsyncd(config-get):297:main] : Using session config file path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf
> [2019-03-25 04:08:19.25285] I [gsyncd(status):297:main] : Using session config file path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf
> [2019-03-25 04:09:15.766882] I [gsyncd(config-get):297:main] : Using session config file path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf
> [2019-03-25 04:09:16.30267] I [gsyncd(config-get):297:main] : Using session config file path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf
> [2019-03-25 04:09:16.89006] I [gsyncd(config-set):297:main] : Using session config file path=/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf
>
> regards,
> Maurya
>
> On Mon, Mar 25, 2019 at 9:08 AM Aravinda <avishwan at redhat.com> wrote:
> > Use `ssh-port ` while creating the Geo-rep session
> >
> > Ref:
> > https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/#creating-the-session
> >
> > And set the ssh-port option before start.
> >
> > ```
> > gluster volume geo-replication \
> >     [@]:: config
> >     ssh-port 2222
> > ```
> > --
> > regards
> > Aravinda

From sankarshan.mukhopadhyay at gmail.com  Tue Mar 26 15:45:58 2019
From: sankarshan.mukhopadhyay at gmail.com (Sankarshan Mukhopadhyay)
Date: Tue, 26 Mar 2019 21:15:58 +0530
Subject: [Gluster-users] recovery from reboot time?
In-Reply-To:
References:
Message-ID:

On Tue, Mar 26, 2019 at 6:10 PM Alvin Starr wrote:
>
> After almost a week of doing nothing the brick failed and we were able to stop and restart glusterd and then could start a manual heal.
>
> It was interesting when the heal started the time to completion was just about 21 days but as it worked through the 300000 some entries it got faster to the point where it completed in 2 days.
>
> Now I have 2 gfids that refuse to heal.
>
Do you need help from the developers on that topic?

From budic at onholyground.com  Tue Mar 26 16:26:00 2019
From: budic at onholyground.com (Darrell Budic)
Date: Tue, 26 Mar 2019 11:26:00 -0500
Subject: [Gluster-users] Announcing Gluster release 5.5
In-Reply-To: <71be9d39-2794-bfab-ba58-6b904d22e1a1@redhat.com>
References: <71be9d39-2794-bfab-ba58-6b904d22e1a1@redhat.com>
Message-ID:

Heads up for the CentOS storage maintainers, I've tested 5.5 on my dev cluster and it behaves well. It also resolved rolling upgrade issues in a hyperconverged oVirt cluster for me, so I recommend moving it out of testing.

  -Darrell

> On Mar 21, 2019, at 6:06 AM, Shyam Ranganathan wrote:
>
> The Gluster community is pleased to announce the release of Gluster
> 5.5 (packages available at [1]).
>
> Release notes for the release can be found at [2].
>
> Major changes, features and limitations addressed in this release:
>
> - Release 5.4 introduced an incompatible change that prevented rolling
> upgrades, and hence was never announced to the lists. As a result we are
> jumping a release version and going to 5.5 from 5.3, that does not have
> the problem.
>
> Thanks,
> Gluster community
>
> [1] Packages for 5.5:
> https://download.gluster.org/pub/gluster/glusterfs/5/5.5/
>
> [2] Release notes for 5.5:
> https://docs.gluster.org/en/latest/release-notes/5.5/
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

From ndevos at redhat.com  Tue Mar 26 17:01:33 2019
From: ndevos at redhat.com (Niels de Vos)
Date: Tue, 26 Mar 2019 18:01:33 +0100
Subject: [Gluster-users] [Gluster-Maintainers] Announcing Gluster release 5.5
In-Reply-To:
References: <71be9d39-2794-bfab-ba58-6b904d22e1a1@redhat.com>
Message-ID: <20190326170133.GD2684@ndevos-x270.lan.nixpanic.net>

On Tue, Mar 26, 2019 at 11:26:00AM -0500, Darrell Budic wrote:
> Heads up for the CentOS storage maintainers, I've tested 5.5 on my dev cluster and it behaves well. It also resolved rolling upgrade issues in a hyperconverged oVirt cluster for me, so I recommend moving it out of testing.

Thanks for the info! Packages have been pushed to the CentOS mirrors
yesterday already. Some mirrors take a little more time to catch up, but
I expect that all have the update by now.

Niels

>
> -Darrell
>
> > On Mar 21, 2019, at 6:06 AM, Shyam Ranganathan wrote:
> >
> > The Gluster community is pleased to announce the release of Gluster
> > 5.5 (packages available at [1]).
> >
> > Release notes for the release can be found at [2].
> >
> > Major changes, features and limitations addressed in this release:
> >
> > - Release 5.4 introduced an incompatible change that prevented rolling
> > upgrades, and hence was never announced to the lists. As a result we are
> > jumping a release version and going to 5.5 from 5.3, that does not have
> > the problem.
> > > > Thanks, > > Gluster community > > > > [1] Packages for 5.5: > > https://download.gluster.org/pub/gluster/glusterfs/5/5.5/ > > > > [2] Release notes for 5.5: > > https://docs.gluster.org/en/latest/release-notes/5.5/ > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > maintainers mailing list > maintainers at gluster.org > https://lists.gluster.org/mailman/listinfo/maintainers From sander at hoentjen.eu Tue Mar 26 16:08:40 2019 From: sander at hoentjen.eu (Sander Hoentjen) Date: Tue, 26 Mar 2019 17:08:40 +0100 Subject: [Gluster-users] [ovirt-users] Re: VM disk corruption with LSM on Gluster In-Reply-To: References: <723e9e09-1c2b-8422-cb37-d72ba3bb26dc@hoentjen.eu> Message-ID: <0bd6f1c5-1448-50aa-b3c6-d3ee6f0a069d@hoentjen.eu> On 26-03-19 14:23, Sahina Bose wrote: > +Krutika Dhananjay and gluster ml > > On Tue, Mar 26, 2019 at 6:16 PM Sander Hoentjen wrote: >> Hello, >> >> tl;dr We have disk corruption when doing live storage migration on oVirt >> 4.2 with gluster 3.12.15. Any idea why? >> >> We have a 3-node oVirt cluster that is both compute and gluster-storage. >> The manager runs on separate hardware. We are running out of space on >> this volume, so we added another Gluster volume that is bigger, put a >> storage domain on it and then we migrated VM's to it with LSM. After >> some time, we noticed that (some of) the migrated VM's had corrupted >> filesystems. After moving everything back with export-import to the old >> domain where possible, and recovering from backups where needed we set >> off to investigate this issue. >> >> We are now at the point where we can reproduce this issue within a day. 
>> What we have found so far: >> 1) The corruption occurs at the very end of the replication step, most >> probably between START and FINISH of diskReplicateFinish, before the >> START merge step >> 2) In the corrupted VM, at some place where data should be, this data is >> replaced by zero's. This can be file-contents or a directory-structure >> or whatever. >> 3) The source gluster volume has different settings then the destination >> (Mostly because the defaults were different at creation time): >> >> Setting old(src) new(dst) >> cluster.op-version 30800 30800 (the same) >> cluster.max-op-version 31202 31202 (the same) >> cluster.metadata-self-heal off on >> cluster.data-self-heal off on >> cluster.entry-self-heal off on >> performance.low-prio-threads 16 32 >> performance.strict-o-direct off on >> network.ping-timeout 42 30 >> network.remote-dio enable off >> transport.address-family - inet >> performance.stat-prefetch off on >> features.shard-block-size 512MB 64MB >> cluster.shd-max-threads 1 8 >> cluster.shd-wait-qlength 1024 10000 >> cluster.locking-scheme full granular >> cluster.granular-entry-heal no enable >> >> 4) To test, we migrate some VM's back and forth. The corruption does not >> occur every time. To this point it only occurs from old to new, but we >> don't have enough data-points to be sure about that. >> >> Anybody an idea what is causing the corruption? Is this the best list to >> ask, or should I ask on a Gluster list? I am not sure if this is oVirt >> specific or Gluster specific though. > Do you have logs from old and new gluster volumes? Any errors in the > new volume's fuse mount logs? Around the time of corruption I see the message: The message "I [MSGID: 133017] [shard.c:4941:shard_seek] 0-ZoneA_Gluster1-shard: seek called on 7fabc273-3d8a-4a49-8906-b8ccbea4a49f. 
[Operation not supported]" repeated 231 times between [2019-03-26 13:14:22.297333] and [2019-03-26 13:15:42.912170]

I also see this message at other times, when I don't see the corruption occur, though.

--
Sander

From alvin at netvel.net  Tue Mar 26 17:24:08 2019
From: alvin at netvel.net (Alvin Starr)
Date: Tue, 26 Mar 2019 13:24:08 -0400
Subject: [Gluster-users] recovery from reboot time?
In-Reply-To:
References:
Message-ID:

I tracked down the 2 gfids and it looks like they were "partly?" configured.

I copied the data off the gluster volume they existed on and then removed the files on the server and recreated them on the client.

Things seem to be sane again but at this point I am not amazingly confident in the consistency of the filesystem.

I will try running a bit-rot scan against the system to see if there are any errors.

On 3/26/19 11:45 AM, Sankarshan Mukhopadhyay wrote:
> On Tue, Mar 26, 2019 at 6:10 PM Alvin Starr wrote:
>> After almost a week of doing nothing the brick failed and we were able to stop and restart glusterd and then could start a manual heal.
>>
>> It was interesting when the heal started the time to completion was just about 21 days but as it worked through the 300000 some entries it got faster to the point where it completed in 2 days.
>>
>> Now I have 2 gfids that refuse to heal.
>>
> Do you need help from the developers on that topic?
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

--
Alvin Starr                   ||   land:  (905)513-7688
Netvel Inc.
|| Cell: (416)806-0133 alvin at netvel.net || From rgowdapp at redhat.com Wed Mar 27 01:48:07 2019 From: rgowdapp at redhat.com (Raghavendra Gowdappa) Date: Wed, 27 Mar 2019 07:18:07 +0530 Subject: [Gluster-users] POSIX locks and disconnections between clients and bricks Message-ID: All, Glusterfs cleans up POSIX locks held on an fd when the client/mount through which those locks are held disconnects from bricks/server. This helps Glusterfs to not run into a stale lock problem later (For eg., if application unlocks while the connection was still down). However, this means the lock is no longer exclusive as other applications/clients can acquire the same lock. To communicate that locks are no longer valid, we are planning to mark the fd (which has POSIX locks) bad on a disconnect so that any future operations on that fd will fail, forcing the application to re-open the fd and re-acquire locks it needs [1]. Note that with AFR/replicate in picture we can prevent errors to application as long as Quorum number of children "never ever" lost connection with bricks after locks have been acquired. I am using the term "never ever" as locks are not healed back after re-connection and hence first disconnect would've marked the fd bad and the fd remains so even after re-connection happens. So, its not just Quorum number of children "currently online", but Quorum number of children "never having disconnected with bricks after locks are acquired". However, this use case is not affected if the application don't acquire any POSIX locks. So, I am interested in knowing * whether your use cases use POSIX locks? * Is it feasible for your application to re-open fds and re-acquire locks on seeing EBADFD errors? [1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7 regards, Raghavendra -------------- next part -------------- An HTML attachment was scrubbed... 
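[Editor's note] Raghavendra's last question — whether applications can re-open fds and re-acquire locks on seeing EBADFD errors — comes down to a retry pattern like the Python sketch below. This is illustrative only: the helper name, the retry policy, and the use of `fcntl.lockf` for POSIX locks are the editor's assumptions, not Gluster or application code.

```python
import errno
import fcntl
import os
import tempfile


def locked_write(path, data, retries=1):
    """Write data under an exclusive POSIX lock.

    If the fd has gone bad (EBADF) -- e.g. because the mount invalidated
    it after a brick disconnect, as proposed in the thread -- discard it,
    reopen the path and re-acquire the lock before retrying.
    """
    for attempt in range(retries + 1):
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX)   # acquire the POSIX (fcntl) lock
            os.write(fd, data)
            fcntl.lockf(fd, fcntl.LOCK_UN)
            return True
        except OSError as e:
            if e.errno != errno.EBADF or attempt == retries:
                raise
            # fd went bad: loop around, reopen and re-acquire the lock
        finally:
            try:
                os.close(fd)
            except OSError:
                pass                         # closing a bad fd may itself fail
    return False


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    print(locked_write(path, b"hello"))      # prints True on a POSIX system
    os.unlink(path)
```

On EBADF the loop throws away the dead descriptor, reopens the file and takes the lock again — which is exactly the burden the proposal in [1] would shift onto applications.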
URL: From vladkopy at gmail.com Wed Mar 27 03:15:57 2019 From: vladkopy at gmail.com (Vlad Kopylov) Date: Tue, 26 Mar 2019 23:15:57 -0400 Subject: [Gluster-users] Prioritise local bricks for IO? In-Reply-To: <29221907.583.1553599314586.JavaMail.zimbra@li.nux.ro> References: <29221907.583.1553599314586.JavaMail.zimbra@li.nux.ro> Message-ID: I don't remember if it still in works NUFA https://github.com/gluster/glusterfs-specs/blob/master/done/Features/nufa.md v On Tue, Mar 26, 2019 at 7:27 AM Nux! wrote: > Hello, > > I'm trying to set up a distributed backup storage (no replicas), but I'd > like to prioritise the local bricks for any IO done on the volume. > This will be a backup stor, so in other words, I'd like the files to be > written locally if there is space, so as to save the NICs for other traffic. > > Anyone knows how this might be achievable, if at all? > > -- > Sent from the Delta quadrant using Borg technology! > > Nux! > www.nux.ro > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kdhananj at redhat.com Wed Mar 27 05:02:07 2019 From: kdhananj at redhat.com (Krutika Dhananjay) Date: Wed, 27 Mar 2019 10:32:07 +0530 Subject: [Gluster-users] [ovirt-users] Re: VM disk corruption with LSM on Gluster In-Reply-To: <0bd6f1c5-1448-50aa-b3c6-d3ee6f0a069d@hoentjen.eu> References: <723e9e09-1c2b-8422-cb37-d72ba3bb26dc@hoentjen.eu> <0bd6f1c5-1448-50aa-b3c6-d3ee6f0a069d@hoentjen.eu> Message-ID: Could you enable strict-o-direct and disable remote-dio on the src volume as well, restart the vms on "old" and retry migration? 
# gluster volume set performance.strict-o-direct on # gluster volume set network.remote-dio off -Krutika On Tue, Mar 26, 2019 at 10:32 PM Sander Hoentjen wrote: > On 26-03-19 14:23, Sahina Bose wrote: > > +Krutika Dhananjay and gluster ml > > > > On Tue, Mar 26, 2019 at 6:16 PM Sander Hoentjen > wrote: > >> Hello, > >> > >> tl;dr We have disk corruption when doing live storage migration on oVirt > >> 4.2 with gluster 3.12.15. Any idea why? > >> > >> We have a 3-node oVirt cluster that is both compute and gluster-storage. > >> The manager runs on separate hardware. We are running out of space on > >> this volume, so we added another Gluster volume that is bigger, put a > >> storage domain on it and then we migrated VM's to it with LSM. After > >> some time, we noticed that (some of) the migrated VM's had corrupted > >> filesystems. After moving everything back with export-import to the old > >> domain where possible, and recovering from backups where needed we set > >> off to investigate this issue. > >> > >> We are now at the point where we can reproduce this issue within a day. > >> What we have found so far: > >> 1) The corruption occurs at the very end of the replication step, most > >> probably between START and FINISH of diskReplicateFinish, before the > >> START merge step > >> 2) In the corrupted VM, at some place where data should be, this data is > >> replaced by zero's. This can be file-contents or a directory-structure > >> or whatever. 
> >> 3) The source gluster volume has different settings then the destination > >> (Mostly because the defaults were different at creation time): > >> > >> Setting old(src) new(dst) > >> cluster.op-version 30800 30800 (the same) > >> cluster.max-op-version 31202 31202 (the same) > >> cluster.metadata-self-heal off on > >> cluster.data-self-heal off on > >> cluster.entry-self-heal off on > >> performance.low-prio-threads 16 32 > >> performance.strict-o-direct off on > >> network.ping-timeout 42 30 > >> network.remote-dio enable off > >> transport.address-family - inet > >> performance.stat-prefetch off on > >> features.shard-block-size 512MB 64MB > >> cluster.shd-max-threads 1 8 > >> cluster.shd-wait-qlength 1024 10000 > >> cluster.locking-scheme full granular > >> cluster.granular-entry-heal no enable > >> > >> 4) To test, we migrate some VM's back and forth. The corruption does not > >> occur every time. To this point it only occurs from old to new, but we > >> don't have enough data-points to be sure about that. > >> > >> Anybody an idea what is causing the corruption? Is this the best list to > >> ask, or should I ask on a Gluster list? I am not sure if this is oVirt > >> specific or Gluster specific though. > > Do you have logs from old and new gluster volumes? Any errors in the > > new volume's fuse mount logs? > > Around the time of corruption I see the message: > The message "I [MSGID: 133017] [shard.c:4941:shard_seek] > 0-ZoneA_Gluster1-shard: seek called on > 7fabc273-3d8a-4a49-8906-b8ccbea4a49f. [Operation not supported]" repeated > 231 times between [2019-03-26 13:14:22.297333] and [2019-03-26 > 13:15:42.912170] > > I also see this message at other times, when I don't see the corruption > occur, though. 
> > -- > Sander > _______________________________________________ > Users mailing list -- users at ovirt.org > To unsubscribe send an email to users-leave at ovirt.org > Privacy Statement: https://www.ovirt.org/site/privacy-policy/ > oVirt Code of Conduct: > https://www.ovirt.org/community/about/community-guidelines/ > List Archives: > https://lists.ovirt.org/archives/list/users at ovirt.org/message/M3T2VGGGV6DE643ZKKJUAF274VSWTJFH/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nux at li.nux.ro Wed Mar 27 07:01:58 2019 From: nux at li.nux.ro (Lucian) Date: Wed, 27 Mar 2019 07:01:58 +0000 Subject: [Gluster-users] Prioritise local bricks for IO? In-Reply-To: References: <29221907.583.1553599314586.JavaMail.zimbra@li.nux.ro> Message-ID: Oh, that's just what the doctor ordered! Hope it works, thanks On 27 March 2019 03:15:57 GMT, Vlad Kopylov wrote: >I don't remember if it still in works >NUFA >https://github.com/gluster/glusterfs-specs/blob/master/done/Features/nufa.md > >v > >On Tue, Mar 26, 2019 at 7:27 AM Nux! wrote: > >> Hello, >> >> I'm trying to set up a distributed backup storage (no replicas), but >I'd >> like to prioritise the local bricks for any IO done on the volume. >> This will be a backup stor, so in other words, I'd like the files to >be >> written locally if there is space, so as to save the NICs for other >traffic. >> >> Anyone knows how this might be achievable, if at all? >> >> -- >> Sent from the Delta quadrant using Borg technology! >> >> Nux! >> www.nux.ro >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users >> -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jahernan at redhat.com Wed Mar 27 07:25:57 2019 From: jahernan at redhat.com (Xavi Hernandez) Date: Wed, 27 Mar 2019 08:25:57 +0100 Subject: [Gluster-users] POSIX locks and disconnections between clients and bricks In-Reply-To: References: Message-ID: Hi Raghavendra, On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa wrote: > All, > > Glusterfs cleans up POSIX locks held on an fd when the client/mount > through which those locks are held disconnects from bricks/server. This > helps Glusterfs to not run into a stale lock problem later (For eg., if > application unlocks while the connection was still down). However, this > means the lock is no longer exclusive as other applications/clients can > acquire the same lock. To communicate that locks are no longer valid, we > are planning to mark the fd (which has POSIX locks) bad on a disconnect so > that any future operations on that fd will fail, forcing the application to > re-open the fd and re-acquire locks it needs [1]. > Wouldn't it be better to retake the locks when the brick is reconnected if the lock is still in use ? BTW, the referenced bug is not public. Should we open another bug to track this ? > > Note that with AFR/replicate in picture we can prevent errors to > application as long as Quorum number of children "never ever" lost > connection with bricks after locks have been acquired. I am using the term > "never ever" as locks are not healed back after re-connection and hence > first disconnect would've marked the fd bad and the fd remains so even > after re-connection happens. So, its not just Quorum number of children > "currently online", but Quorum number of children "never having > disconnected with bricks after locks are acquired". > I think this requisite is not feasible. In a distributed file system, sooner or later all bricks will be disconnected. It could be because of failures or because an upgrade is done, but it will happen. The difference here is how long are fd's kept open. 
If applications open and close files frequently enough (i.e. the fd is not kept open more time than it takes to have more than Quorum bricks disconnected) then there's no problem. The problem can only appear on applications that open files for a long time and also use posix locks. In this case, the only good solution I see is to retake the locks on brick reconnection. > However, this use case is not affected if the application don't acquire > any POSIX locks. So, I am interested in knowing > * whether your use cases use POSIX locks? > * Is it feasible for your application to re-open fds and re-acquire locks > on seeing EBADFD errors? > I think that many applications are not prepared to handle that. Xavi > > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7 > > regards, > Raghavendra > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From riccardo.murri at gmail.com Wed Mar 27 08:03:42 2019 From: riccardo.murri at gmail.com (Riccardo Murri) Date: Wed, 27 Mar 2019 09:03:42 +0100 Subject: [Gluster-users] what versions are packaged for what Linux distro? Message-ID: Hello, following the announcement of GlusterFS 6, I tried to install the package from the Ubuntu PPA on a 16.04 "xenial" machine, only to find out that GlusterFS 6 is only packaged for Ubuntu "bionic" and up. Is there an online page with a table or matrix detailing what versions are packaged for what Linux distribution? 
Thanks, Riccardo From kdhananj at redhat.com Wed Mar 27 09:00:43 2019 From: kdhananj at redhat.com (Krutika Dhananjay) Date: Wed, 27 Mar 2019 14:30:43 +0530 Subject: [Gluster-users] [ovirt-users] Re: VM disk corruption with LSM on Gluster In-Reply-To: References: <723e9e09-1c2b-8422-cb37-d72ba3bb26dc@hoentjen.eu> <0bd6f1c5-1448-50aa-b3c6-d3ee6f0a069d@hoentjen.eu> Message-ID: This is needed to prevent any inconsistencies stemming from buffered writes/caching file data during live VM migration. Besides, for Gluster to truly honor direct-io behavior in qemu's 'cache=none' mode (which is what oVirt uses), one needs to turn on performance.strict-o-direct and disable remote-dio. -Krutika On Wed, Mar 27, 2019 at 12:24 PM Leo David wrote: > Hi, > I can confirm that after setting these two options, I haven't encountered > disk corruptions anymore. > The downside, is that at least for me it had a pretty big impact on > performance. > The iops really went down - performing inside vm fio tests. > > On Wed, Mar 27, 2019, 07:03 Krutika Dhananjay wrote: > >> Could you enable strict-o-direct and disable remote-dio on the src volume >> as well, restart the vms on "old" and retry migration? >> >> # gluster volume set performance.strict-o-direct on >> # gluster volume set network.remote-dio off >> >> -Krutika >> >> On Tue, Mar 26, 2019 at 10:32 PM Sander Hoentjen >> wrote: >> >>> On 26-03-19 14:23, Sahina Bose wrote: >>> > +Krutika Dhananjay and gluster ml >>> > >>> > On Tue, Mar 26, 2019 at 6:16 PM Sander Hoentjen >>> wrote: >>> >> Hello, >>> >> >>> >> tl;dr We have disk corruption when doing live storage migration on >>> oVirt >>> >> 4.2 with gluster 3.12.15. Any idea why? >>> >> >>> >> We have a 3-node oVirt cluster that is both compute and >>> gluster-storage. >>> >> The manager runs on separate hardware. We are running out of space on >>> >> this volume, so we added another Gluster volume that is bigger, put a >>> >> storage domain on it and then we migrated VM's to it with LSM. 
After >>> >> some time, we noticed that (some of) the migrated VM's had corrupted >>> >> filesystems. After moving everything back with export-import to the >>> old >>> >> domain where possible, and recovering from backups where needed we set >>> >> off to investigate this issue. >>> >> >>> >> We are now at the point where we can reproduce this issue within a >>> day. >>> >> What we have found so far: >>> >> 1) The corruption occurs at the very end of the replication step, most >>> >> probably between START and FINISH of diskReplicateFinish, before the >>> >> START merge step >>> >> 2) In the corrupted VM, at some place where data should be, this data >>> is >>> >> replaced by zero's. This can be file-contents or a directory-structure >>> >> or whatever. >>> >> 3) The source gluster volume has different settings then the >>> destination >>> >> (Mostly because the defaults were different at creation time): >>> >> >>> >> Setting old(src) new(dst) >>> >> cluster.op-version 30800 30800 (the same) >>> >> cluster.max-op-version 31202 31202 (the same) >>> >> cluster.metadata-self-heal off on >>> >> cluster.data-self-heal off on >>> >> cluster.entry-self-heal off on >>> >> performance.low-prio-threads 16 32 >>> >> performance.strict-o-direct off on >>> >> network.ping-timeout 42 30 >>> >> network.remote-dio enable off >>> >> transport.address-family - inet >>> >> performance.stat-prefetch off on >>> >> features.shard-block-size 512MB 64MB >>> >> cluster.shd-max-threads 1 8 >>> >> cluster.shd-wait-qlength 1024 10000 >>> >> cluster.locking-scheme full granular >>> >> cluster.granular-entry-heal no enable >>> >> >>> >> 4) To test, we migrate some VM's back and forth. The corruption does >>> not >>> >> occur every time. To this point it only occurs from old to new, but we >>> >> don't have enough data-points to be sure about that. >>> >> >>> >> Anybody an idea what is causing the corruption? Is this the best list >>> to >>> >> ask, or should I ask on a Gluster list? 
I am not sure if this is oVirt >>> >> specific or Gluster specific though. >>> > Do you have logs from old and new gluster volumes? Any errors in the >>> > new volume's fuse mount logs? >>> >>> Around the time of corruption I see the message: >>> The message "I [MSGID: 133017] [shard.c:4941:shard_seek] >>> 0-ZoneA_Gluster1-shard: seek called on >>> 7fabc273-3d8a-4a49-8906-b8ccbea4a49f. [Operation not supported]" repeated >>> 231 times between [2019-03-26 13:14:22.297333] and [2019-03-26 >>> 13:15:42.912170] >>> >>> I also see this message at other times, when I don't see the corruption >>> occur, though. >>> >>> -- >>> Sander >>> _______________________________________________ >>> Users mailing list -- users at ovirt.org >>> To unsubscribe send an email to users-leave at ovirt.org >>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/ >>> oVirt Code of Conduct: >>> https://www.ovirt.org/community/about/community-guidelines/ >>> List Archives: >>> https://lists.ovirt.org/archives/list/users at ovirt.org/message/M3T2VGGGV6DE643ZKKJUAF274VSWTJFH/ >>> >> _______________________________________________ >> Users mailing list -- users at ovirt.org >> To unsubscribe send an email to users-leave at ovirt.org >> Privacy Statement: https://www.ovirt.org/site/privacy-policy/ >> oVirt Code of Conduct: >> https://www.ovirt.org/community/about/community-guidelines/ >> List Archives: >> https://lists.ovirt.org/archives/list/users at ovirt.org/message/ZUIRM5PT4Y4USOSDGSUEP3YEE23LE4WG/ >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sander at hoentjen.eu Wed Mar 27 09:17:29 2019 From: sander at hoentjen.eu (Sander Hoentjen) Date: Wed, 27 Mar 2019 10:17:29 +0100 Subject: [Gluster-users] [ovirt-users] Re: VM disk corruption with LSM on Gluster In-Reply-To: References: <723e9e09-1c2b-8422-cb37-d72ba3bb26dc@hoentjen.eu> <0bd6f1c5-1448-50aa-b3c6-d3ee6f0a069d@hoentjen.eu> Message-ID: Hi Krutika, Leo, Sounds promising. 
I will test this too, and report back tomorrow (or maybe sooner, if corruption occurs again). -- Sander On 27-03-19 10:00, Krutika Dhananjay wrote: > This is needed to prevent any inconsistencies stemming from buffered > writes/caching file data during live VM migration. > Besides, for Gluster to truly honor direct-io behavior in qemu's > 'cache=none' mode (which is what oVirt uses), > one needs to turn on performance.strict-o-direct and disable remote-dio. > > -Krutika > > On Wed, Mar 27, 2019 at 12:24 PM Leo David > wrote: > > Hi, > I can confirm that after setting these two options, I haven't > encountered disk corruptions anymore. > The downside is that at least for me it had a pretty big impact > on performance. > The iops really went down - performing fio tests inside the vm. > > On Wed, Mar 27, 2019, 07:03 Krutika Dhananjay > wrote: > > Could you enable strict-o-direct and disable remote-dio on the > src volume as well, restart the vms on "old" and retry migration? > > # gluster volume set <volname> performance.strict-o-direct on > # gluster volume set <volname> network.remote-dio off > > -Krutika > > On Tue, Mar 26, 2019 at 10:32 PM Sander Hoentjen > > wrote: > > On 26-03-19 14:23, Sahina Bose wrote: > > +Krutika Dhananjay and gluster ml > > > > On Tue, Mar 26, 2019 at 6:16 PM Sander Hoentjen > > wrote: > >> Hello, > >> > >> tl;dr We have disk corruption when doing live storage > migration on oVirt > >> 4.2 with gluster 3.12.15. Any idea why? > >> > >> We have a 3-node oVirt cluster that is both compute and > gluster-storage. > >> The manager runs on separate hardware. We are running > out of space on > >> this volume, so we added another Gluster volume that is > bigger, put a > >> storage domain on it and then we migrated VM's to it > with LSM. After > >> some time, we noticed that (some of) the migrated VM's > had corrupted > >> filesystems.
After moving everything back with > export-import to the old > >> domain where possible, and recovering from backups > where needed we set > >> off to investigate this issue. > >> > >> We are now at the point where we can reproduce this > issue within a day. > >> What we have found so far: > >> 1) The corruption occurs at the very end of the > replication step, most > >> probably between START and FINISH of > diskReplicateFinish, before the > >> START merge step > >> 2) In the corrupted VM, at some place where data should > be, this data is > >> replaced by zero's. This can be file-contents or a > directory-structure > >> or whatever. > >> 3) The source gluster volume has different settings > than the destination > >> (Mostly because the defaults were different at creation > time): > >> > >> Setting                                  old(src)  new(dst) > >> cluster.op-version                       30800     30800 > (the same) > >> cluster.max-op-version                   31202     31202 > (the same) > >> cluster.metadata-self-heal               off       on > >> cluster.data-self-heal                   off       on > >> cluster.entry-self-heal                  off       on > >> performance.low-prio-threads             16        32 > >> performance.strict-o-direct              off       on > >> network.ping-timeout                     42        30 > >> network.remote-dio                       enable    off > >> transport.address-family                 -         inet > >> performance.stat-prefetch                off       on > >> features.shard-block-size                512MB     64MB > >> cluster.shd-max-threads                  1         8 > >> cluster.shd-wait-qlength                 1024      10000 > >> cluster.locking-scheme                   full      granular > >> cluster.granular-entry-heal              no        enable > >> > >> 4) To test, we migrate some VM's back and forth. The > corruption does not
To this point it only occurs from old > to new, but we > >> don't have enough data-points to be sure about that. > >> > >> Anybody an idea what is causing the corruption? Is this > the best list to > >> ask, or should I ask on a Gluster list? I am not sure > if this is oVirt > >> specific or Gluster specific though. > > Do you have logs from old and new gluster volumes? Any > errors in the > > new volume's fuse mount logs? > > Around the time of corruption I see the message: > The message "I [MSGID: 133017] [shard.c:4941:shard_seek] > 0-ZoneA_Gluster1-shard: seek called on > 7fabc273-3d8a-4a49-8906-b8ccbea4a49f. [Operation not > supported]" repeated 231 times between [2019-03-26 > 13:14:22.297333] and [2019-03-26 13:15:42.912170] > > I also see this message at other times, when I don't see > the corruption occur, though. > > -- > Sander > _______________________________________________ > Users mailing list -- users at ovirt.org > To unsubscribe send an email to users-leave at ovirt.org > > Privacy Statement: https://www.ovirt.org/site/privacy-policy/ > oVirt Code of Conduct: > https://www.ovirt.org/community/about/community-guidelines/ > List Archives: > https://lists.ovirt.org/archives/list/users at ovirt.org/message/M3T2VGGGV6DE643ZKKJUAF274VSWTJFH/ > > _______________________________________________ > Users mailing list -- users at ovirt.org > To unsubscribe send an email to users-leave at ovirt.org > > Privacy Statement: https://www.ovirt.org/site/privacy-policy/ > oVirt Code of Conduct: > https://www.ovirt.org/community/about/community-guidelines/ > List Archives: > https://lists.ovirt.org/archives/list/users at ovirt.org/message/ZUIRM5PT4Y4USOSDGSUEP3YEE23LE4WG/ > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From riccardo.murri at gmail.com Wed Mar 27 09:39:19 2019 From: riccardo.murri at gmail.com (riccardo.murri at gmail.com) Date: 
Wed, 27 Mar 2019 10:39:19 +0100 Subject: [Gluster-users] cannot add server back to cluster after reinstallation Message-ID: <87h8bok660.fsf@gmail.com> Hello, a couple days ago, the OS disk of one of the server of a local GlusterFS cluster suffered a bad crash, and I had to reinstall everything from scratch. However, when I restart the GlusterFS service on the server that has been reinstalled, I see that it sends back a "RJT" response to other servers of the cluster, which then list it as "State: Peer Rejected (Connected)"; the reinstalled server instead shows "Number of peers: 0". The DEBUG level log on the reinstalled machine shows these lines after the peer probe from another server in the cluster: I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 9a5763d2-1941-4e5d-8d33-8d6756f7f318 D [MSGID: 0] [glusterd-peer-utils.c:208:glusterd_peerinfo_find_by_uuid] 0-management: Friend with uuid: 9a5763d2-1941-4e5d-8d33-8d6756f7f318, not found D [MSGID: 0] [glusterd-peer-utils.c:234:glusterd_peerinfo_find] 0-management: Unable to find peer by uuid: 9a5763d2-1941-4e5d-8d33-8d6756f7f318 D [MSGID: 0] [glusterd-peer-utils.c:132:glusterd_peerinfo_find_by_hostname] 0-management: Unable to find friend: glusterfs-server-004 D [MSGID: 0] [glusterd-peer-utils.c:246:glusterd_peerinfo_find] 0-management: Unable to find hostname: glusterfs-server-004 I [MSGID: 106493] [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to glusterfs-server-004 (24007), ret: 0, op_ret: -1 What can I do to re-add the reinstalled server into the cluster? Is it safe (= keeps data) to "peer detach" it and then "peer probe" again? Additional info: * The actual GlusterFS brick data was on a different disk and so is safe and mounted back in the original location. 
* I copied back the `/etc/glusterfs/glusterd.vol` from the other servers in the cluster and restored the UUID into `/var/lib/glusterfs/glusterd.info` * I have checked that `max.op-version` is the same on all servers of the cluster, including the reinstalled one. * All servers run Ubuntu 16.04 Thanks for any suggestion! Riccardo From skoduri at redhat.com Wed Mar 27 09:53:35 2019 From: skoduri at redhat.com (Soumya Koduri) Date: Wed, 27 Mar 2019 15:23:35 +0530 Subject: [Gluster-users] POSIX locks and disconnections between clients and bricks In-Reply-To: References: Message-ID: On 3/27/19 12:55 PM, Xavi Hernandez wrote: > Hi?Raghavendra, > > On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa > > wrote: > > All, > > Glusterfs cleans up POSIX locks held on an fd when the client/mount > through which those locks are held disconnects from bricks/server. > This helps Glusterfs to not run into a stale lock problem later (For > eg., if application unlocks while the connection was still down). > However, this means the lock is no longer exclusive as other > applications/clients can acquire the same lock. To communicate that > locks are no longer valid, we are planning to mark the fd (which has > POSIX locks) bad on a disconnect so that any future operations on > that fd will fail, forcing the application to re-open the fd and > re-acquire locks it needs [1]. > > > Wouldn't it be better to retake the locks when the brick is reconnected > if the lock is still in use ? > > BTW, the referenced bug is not public. Should we open another bug to > track this ? > > > Note that with AFR/replicate in picture we can prevent errors to > application as long as Quorum number of children "never ever" lost > connection with bricks after locks have been acquired. I am using > the term "never ever" as locks are not healed back after > re-connection and hence first disconnect would've marked the fd bad > and the fd remains so even after re-connection happens. 
So, its not > just Quorum number of children "currently online", but Quorum number > of children "never having disconnected with bricks after locks are > acquired". > > > I think this requisite is not feasible. In a distributed file system, > sooner or later all bricks will be disconnected. It could be because of > failures or because an upgrade is done, but it will happen. > > The difference here is how long are fd's kept open. If applications open > and close files frequently enough (i.e. the fd is not kept open more > time than it takes to have more than Quorum bricks disconnected) then > there's no problem. The problem can only appear on applications that > open files for a long time and also use posix locks. In this case, the > only good solution I see is to retake the locks on brick reconnection. > > > However, this use case is not affected if the application don't > acquire any POSIX locks. So, I am interested in knowing > * whether your use cases use POSIX locks? > * Is it feasible for your application to re-open fds and re-acquire > locks on seeing EBADFD errors? > > > I think that many applications are not prepared to handle that. +1 to all the points mentioned by Xavi. This has been day-1 issue for all the applications using locks (like NFS-Ganesha and Samba). Not many applications re-open and re-acquire the locks. On receiving EBADFD, that error is most likely propagated to application clients. Agree with Xavi that its better to heal/re-acquire the locks on brick reconnects before it accepts any fresh requests. I also suggest to have this healing mechanism generic enough (if possible) to heal any server-side state (like upcall, leases etc). 
Thanks, Soumya > > Xavi > > > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7 > > regards, > Raghavendra > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > From riccardo.murri at gmail.com Wed Mar 27 09:53:52 2019 From: riccardo.murri at gmail.com (Riccardo Murri) Date: Wed, 27 Mar 2019 10:53:52 +0100 Subject: [Gluster-users] cannot add server back to cluster after reinstallation In-Reply-To: <87h8bok660.fsf@gmail.com> References: <87h8bok660.fsf@gmail.com> Message-ID: I managed to put the reinstalled server back into connected state with this procedure: 1. Run `for other_server in ...; do gluster peer probe $other_server; done` on the reinstalled server 2. Now all the peers on the reinstalled server show up as "Accepted Peer Request", which I fixed with the procedure outlined in the last paragraph of https://docs.gluster.org/en/v3/Troubleshooting/troubleshooting-glusterd/#debugging-glusterd Can anyone confirm that this is a good way to proceed and I won't be heading quickly towards corrupting volume data? Thanks, Riccardo From ksubrahm at redhat.com Wed Mar 27 10:01:01 2019 From: ksubrahm at redhat.com (Karthik Subrahmanya) Date: Wed, 27 Mar 2019 15:31:01 +0530 Subject: [Gluster-users] cannot add server back to cluster after reinstallation In-Reply-To: References: <87h8bok660.fsf@gmail.com> Message-ID: +Sanju Rakonde & +Atin Mukherjee adding glusterd folks who can help here. On Wed, Mar 27, 2019 at 3:24 PM Riccardo Murri wrote: > I managed to put the reinstalled server back into connected state with > this procedure: > > 1. Run `for other_server in ...; do gluster peer probe $other_server; > done` on the reinstalled server > 2. 
Now all the peers on the reinstalled server show up as "Accepted > Peer Request", which I fixed with the procedure outlined in the last > paragraph of > https://docs.gluster.org/en/v3/Troubleshooting/troubleshooting-glusterd/#debugging-glusterd > > Can anyone confirm that this is a good way to proceed and I won't be > heading quickly towards corrupting volume data? > > Thanks, > Riccardo > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From atin.mukherjee83 at gmail.com Wed Mar 27 10:03:28 2019 From: atin.mukherjee83 at gmail.com (Atin Mukherjee) Date: Wed, 27 Mar 2019 15:33:28 +0530 Subject: [Gluster-users] cannot add server back to cluster after reinstallation In-Reply-To: References: <87h8bok660.fsf@gmail.com> Message-ID: On Wed, 27 Mar 2019 at 15:24, Riccardo Murri wrote: > I managed to put the reinstalled server back into connected state with > this procedure: > > 1. Run `for other_server in ...; do gluster peer probe $other_server; > done` on the reinstalled server > 2. Now all the peers on the reinstalled server show up as "Accepted > Peer Request", which I fixed with the procedure outlined in the last > paragraph of > https://docs.gluster.org/en/v3/Troubleshooting/troubleshooting-glusterd/#debugging-glusterd > > Can anyone confirm that this is a good way to proceed and I won't be > heading quickly towards corrupting volume data? Check cluster.op-version, peer status, volume status output. If they are all fine you?re good. > > Thanks, > Riccardo > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -- --Atin -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From riccardo.murri at gmail.com Wed Mar 27 10:31:55 2019 From: riccardo.murri at gmail.com (Riccardo Murri) Date: Wed, 27 Mar 2019 11:31:55 +0100 Subject: [Gluster-users] cannot add server back to cluster after reinstallation In-Reply-To: References: <87h8bok660.fsf@gmail.com> Message-ID: Hello Atin, > Check cluster.op-version, peer status, volume status output. If they are all fine you?re good. Both `op-version` and `peer status` look fine: ``` # gluster volume get all cluster.max-op-version Option Value ------ ----- cluster.max-op-version 31202 # gluster peer status Number of Peers: 4 Hostname: glusterfs-server-004 Uuid: 9a5763d2-1941-4e5d-8d33-8d6756f7f318 State: Peer in Cluster (Connected) Hostname: glusterfs-server-005 Uuid: d53398f6-19d4-4633-8bc3-e493dac41789 State: Peer in Cluster (Connected) Hostname: glusterfs-server-003 Uuid: 3c74d2b4-a4f3-42d4-9511-f6174b0a641d State: Peer in Cluster (Connected) Hostname: glusterfs-server-001 Uuid: 60bcc47e-ccbe-493e-b4ea-d45d63123977 State: Peer in Cluster (Connected) ``` However, `volume status` shows a missing snapshotd on the reinstalled server (the 002 one). We're not using snapshots so I guess this is fine too? 
``` # gluster volume status Status of volume: glusterfs Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick glusterfs-server-005:/s rv/glusterfs 49152 0 Y 1410 Brick glusterfs-server-004:/s rv/glusterfs 49152 0 Y 1416 Brick glusterfs-server-003:/s rv/glusterfs 49152 0 Y 1520 Brick glusterfs-server-001:/s rv/glusterfs 49152 0 Y 1266 Brick glusterfs-server-002:/s rv/glusterfs 49152 0 Y 3011 Snapshot Daemon on localhost N/A N/A Y 3029 Snapshot Daemon on glusterfs- server-001 49153 0 Y 1361 Snapshot Daemon on glusterfs- server-005 49153 0 Y 1478 Snapshot Daemon on glusterfs- server-004 49153 0 Y 1490 Snapshot Daemon on glusterfs- server-003 49153 0 Y 1563 Task Status of Volume glusterfs ------------------------------------------------------------------------------ Task : Rebalance ID : 0eaf6ad1-df95-48f4-b941-17488010ddcc Status : failed ``` Thanks, Riccardo From atin.mukherjee83 at gmail.com Wed Mar 27 10:37:42 2019 From: atin.mukherjee83 at gmail.com (Atin Mukherjee) Date: Wed, 27 Mar 2019 16:07:42 +0530 Subject: [Gluster-users] cannot add server back to cluster after reinstallation In-Reply-To: References: <87h8bok660.fsf@gmail.com> Message-ID: On Wed, 27 Mar 2019 at 16:02, Riccardo Murri wrote: > Hello Atin, > > > Check cluster.op-version, peer status, volume status output. If they are > all fine you?re good. 
> > Both `op-version` and `peer status` look fine: > ``` > # gluster volume get all cluster.max-op-version > Option Value > ------ ----- > cluster.max-op-version 31202 > > # gluster peer status > Number of Peers: 4 > > Hostname: glusterfs-server-004 > Uuid: 9a5763d2-1941-4e5d-8d33-8d6756f7f318 > State: Peer in Cluster (Connected) > > Hostname: glusterfs-server-005 > Uuid: d53398f6-19d4-4633-8bc3-e493dac41789 > State: Peer in Cluster (Connected) > > Hostname: glusterfs-server-003 > Uuid: 3c74d2b4-a4f3-42d4-9511-f6174b0a641d > State: Peer in Cluster (Connected) > > Hostname: glusterfs-server-001 > Uuid: 60bcc47e-ccbe-493e-b4ea-d45d63123977 > State: Peer in Cluster (Connected) > ``` > > However, `volume status` shows a missing snapshotd on the reinstalled > server (the 002 one). I believe you ran this command on 002? And in that case its showing as localhost. > We're not using snapshots so I guess this is fine too? Is features.uss enabled for this volume? Otherwise we don?t show snapd information in status output. Rafi - am I correct? 
> > ``` > # gluster volume status > Status of volume: glusterfs > Gluster process TCP Port RDMA Port Online > Pid > > ------------------------------------------------------------------------------ > Brick glusterfs-server-005:/s > rv/glusterfs 49152 0 Y > 1410 > Brick glusterfs-server-004:/s > rv/glusterfs 49152 0 Y > 1416 > Brick glusterfs-server-003:/s > rv/glusterfs 49152 0 Y > 1520 > Brick glusterfs-server-001:/s > rv/glusterfs 49152 0 Y > 1266 > Brick glusterfs-server-002:/s > rv/glusterfs 49152 0 Y > 3011 > Snapshot Daemon on localhost N/A N/A Y > 3029 > Snapshot Daemon on glusterfs- > server-001 49153 0 Y > 1361 > Snapshot Daemon on glusterfs- > server-005 49153 0 Y > 1478 > Snapshot Daemon on glusterfs- > server-004 49153 0 Y > 1490 > Snapshot Daemon on glusterfs- > server-003 49153 0 Y > 1563 > > Task Status of Volume glusterfs > > ------------------------------------------------------------------------------ > Task : Rebalance > ID : 0eaf6ad1-df95-48f4-b941-17488010ddcc > Status : failed > ``` > > Thanks, > Riccardo > -- --Atin -------------- next part -------------- An HTML attachment was scrubbed... URL: From rgowdapp at redhat.com Wed Mar 27 10:52:45 2019 From: rgowdapp at redhat.com (Raghavendra Gowdappa) Date: Wed, 27 Mar 2019 16:22:45 +0530 Subject: [Gluster-users] POSIX locks and disconnections between clients and bricks In-Reply-To: References: Message-ID: On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez wrote: > Hi Raghavendra, > > On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa > wrote: > >> All, >> >> Glusterfs cleans up POSIX locks held on an fd when the client/mount >> through which those locks are held disconnects from bricks/server. This >> helps Glusterfs to not run into a stale lock problem later (For eg., if >> application unlocks while the connection was still down). However, this >> means the lock is no longer exclusive as other applications/clients can >> acquire the same lock. 
To communicate that locks are no longer valid, we >> are planning to mark the fd (which has POSIX locks) bad on a disconnect so >> that any future operations on that fd will fail, forcing the application to >> re-open the fd and re-acquire locks it needs [1]. >> > > Wouldn't it be better to retake the locks when the brick is reconnected if > the lock is still in use ? > There is also a possibility that clients may never reconnect. That's the primary reason why bricks assume the worst (client will not reconnect) and cleanup the locks. > BTW, the referenced bug is not public. Should we open another bug to track > this ? > I've just opened up the comment to give enough context. I'll open a bug upstream too. > > >> >> Note that with AFR/replicate in picture we can prevent errors to >> application as long as Quorum number of children "never ever" lost >> connection with bricks after locks have been acquired. I am using the term >> "never ever" as locks are not healed back after re-connection and hence >> first disconnect would've marked the fd bad and the fd remains so even >> after re-connection happens. So, its not just Quorum number of children >> "currently online", but Quorum number of children "never having >> disconnected with bricks after locks are acquired". >> > > I think this requisite is not feasible. In a distributed file system, > sooner or later all bricks will be disconnected. It could be because of > failures or because an upgrade is done, but it will happen. > > The difference here is how long are fd's kept open. If applications open > and close files frequently enough (i.e. the fd is not kept open more time > than it takes to have more than Quorum bricks disconnected) then there's no > problem. The problem can only appear on applications that open files for a > long time and also use posix locks. In this case, the only good solution I > see is to retake the locks on brick reconnection. > Agree. 
But lock-healing should be done only by HA layers like AFR/EC as only they know whether there are enough online bricks to have prevented any conflicting lock. Protocol/client itself doesn't have enough information to do that. If its a plain distribute, I don't see a way to heal locks without loosing the property of exclusivity of locks. What I proposed is a short term solution. mid to long term solution should be lock healing feature implemented in AFR/EC. In fact I had this conversation with +Karampuri, Pranith before posting this msg to ML. > >> However, this use case is not affected if the application don't acquire >> any POSIX locks. So, I am interested in knowing >> * whether your use cases use POSIX locks? >> * Is it feasible for your application to re-open fds and re-acquire locks >> on seeing EBADFD errors? >> > > I think that many applications are not prepared to handle that. > I too suspected that and in fact not too happy with the solution. But went ahead with this mail as I heard implementing lock-heal in AFR will take time and hence there are no alternative short term solutions. > Xavi > > >> >> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7 >> >> regards, >> Raghavendra >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rgowdapp at redhat.com Wed Mar 27 10:54:12 2019 From: rgowdapp at redhat.com (Raghavendra Gowdappa) Date: Wed, 27 Mar 2019 16:24:12 +0530 Subject: [Gluster-users] POSIX locks and disconnections between clients and bricks In-Reply-To: References: Message-ID: On Wed, Mar 27, 2019 at 4:22 PM Raghavendra Gowdappa wrote: > > > On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez > wrote: > >> Hi Raghavendra, >> >> On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa >> wrote: >> >>> All, >>> >>> Glusterfs cleans up POSIX locks held on an fd when the client/mount >>> through which those locks are held disconnects from bricks/server. This >>> helps Glusterfs to not run into a stale lock problem later (For eg., if >>> application unlocks while the connection was still down). However, this >>> means the lock is no longer exclusive as other applications/clients can >>> acquire the same lock. To communicate that locks are no longer valid, we >>> are planning to mark the fd (which has POSIX locks) bad on a disconnect so >>> that any future operations on that fd will fail, forcing the application to >>> re-open the fd and re-acquire locks it needs [1]. >>> >> >> Wouldn't it be better to retake the locks when the brick is reconnected >> if the lock is still in use ? >> > > There is also a possibility that clients may never reconnect. That's the > primary reason why bricks assume the worst (client will not reconnect) and > cleanup the locks. > > >> BTW, the referenced bug is not public. Should we open another bug to >> track this ? >> > > I've just opened up the comment to give enough context. I'll open a bug > upstream too. > > >> >> >>> >>> Note that with AFR/replicate in picture we can prevent errors to >>> application as long as Quorum number of children "never ever" lost >>> connection with bricks after locks have been acquired. 
I am using the term >>> "never ever" as locks are not healed back after re-connection and hence >>> first disconnect would've marked the fd bad and the fd remains so even >>> after re-connection happens. So, its not just Quorum number of children >>> "currently online", but Quorum number of children "never having >>> disconnected with bricks after locks are acquired". >>> >> >> I think this requisite is not feasible. In a distributed file system, >> sooner or later all bricks will be disconnected. It could be because of >> failures or because an upgrade is done, but it will happen. >> >> The difference here is how long are fd's kept open. If applications open >> and close files frequently enough (i.e. the fd is not kept open more time >> than it takes to have more than Quorum bricks disconnected) then there's no >> problem. The problem can only appear on applications that open files for a >> long time and also use posix locks. In this case, the only good solution I >> see is to retake the locks on brick reconnection. >> > > Agree. But lock-healing should be done only by HA layers like AFR/EC as > only they know whether there are enough online bricks to have prevented any > conflicting lock. Protocol/client itself doesn't have enough information to > do that. If its a plain distribute, I don't see a way to heal locks without > loosing the property of exclusivity of locks. > > What I proposed is a short term solution. mid to long term solution should > be lock healing feature implemented in AFR/EC. In fact I had this > conversation with +Karampuri, Pranith before > posting this msg to ML. > > >> >>> However, this use case is not affected if the application don't acquire >>> any POSIX locks. So, I am interested in knowing >>> * whether your use cases use POSIX locks? >>> * Is it feasible for your application to re-open fds and re-acquire >>> locks on seeing EBADFD errors? >>> >> >> I think that many applications are not prepared to handle that. 
>> > > I too suspected that and in fact not too happy with the solution. But went > ahead with this mail as I heard implementing lock-heal in AFR will take > time and hence there are no alternative short term solutions. > Also failing loudly is preferred to silently dropping locks. > > >> Xavi >> >> >>> >>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7 >>> >>> regards, >>> Raghavendra >>> >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From thorgeir.marthinussen at basefarm.com Wed Mar 27 11:15:36 2019 From: thorgeir.marthinussen at basefarm.com (Thorgeir Marthinussen) Date: Wed, 27 Mar 2019 11:15:36 +0000 Subject: [Gluster-users] Weird issue with logrotate of bitd.log on GlusterFS 4.1 Message-ID: <40800bdd57a1df3b5d3c3d290cbb76e13b7cf05a.camel@basefarm.com> All, We're seeing some issues with the default provided logrotate configuration in regards to the bitd.log files. Logrotate has a postrotate-script to run "killall -HUP glusterfs", to make the processes release the filehandles and create a new logfile, and using "delaycompress". Recently we noticed that the 'df' reported usage on our /var/log didn't match "actual" usage reported with 'du'. Checking 'lsof' we found that basically all "bitd.log.1" files are listed as open but "deleted", when lograte did the compression. This only applies to the bitrot-daemon logs, none of the other logs. In addition to this we are also seeing that the bitd.log file is significantly larger on the "second" replica-node in the cluster (the "first" node is the one used in fstab on the clients). 
Please note, we are currently running a two-node replica set, we have a plan to introduce an arbiter-node, but need to complete some internal testing, as one of the volumes currently contain over 20 million files, and we are unsure how the introduction of the arbiter will impact the volume. We are running glusterfs-4.1.5-1.el7.x86_64 'lsof' output from "first" node glusterfs 12698 root 5w REG 253,11 611193834 50333986 /var/log/glusterfs/bitd.log.1 (deleted) glusterfs 12698 root 8w REG 253,11 611193834 50333986 /var/log/glusterfs/bitd.log.1 (deleted) glusterfs 12698 root 12w REG 253,11 611193834 50333986 /var/log/glusterfs/bitd.log.1 (deleted) 'lsof' output from "second" node glusterfs 12742 root 5w REG 253,11 12959954668 50351288 /var/log/glusterfs/bitd.log.1 (deleted) glusterfs 12742 root 8w REG 253,11 12959954668 50351288 /var/log/glusterfs/bitd.log.1 (deleted) glusterfs 12742 root 11w REG 253,11 12959954668 50351288 /var/log/glusterfs/bitd.log.1 (deleted) Relevant part of logrotate-config /var/log/glusterfs/*.log { sharedscripts weekly rotate 52 missingok compress delaycompress notifempty postrotate /usr/bin/killall -HUP glusterfs > /dev/null 2>&1 || true /usr/bin/killall -HUP glusterd > /dev/null 2>&1 || true endscript } Best regards -- THORGEIR MARTHINUSSEN Systems Consultant -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jahernan at redhat.com Wed Mar 27 11:43:11 2019 From: jahernan at redhat.com (Xavi Hernandez) Date: Wed, 27 Mar 2019 12:43:11 +0100 Subject: [Gluster-users] POSIX locks and disconnections between clients and bricks In-Reply-To: References: Message-ID: On Wed, Mar 27, 2019 at 11:52 AM Raghavendra Gowdappa wrote: > > > On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez > wrote: > >> Hi Raghavendra, >> >> On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa >> wrote: >> >>> All, >>> >>> Glusterfs cleans up POSIX locks held on an fd when the client/mount >>> through which those locks are held disconnects from bricks/server. This >>> helps Glusterfs to not run into a stale lock problem later (For eg., if >>> application unlocks while the connection was still down). However, this >>> means the lock is no longer exclusive as other applications/clients can >>> acquire the same lock. To communicate that locks are no longer valid, we >>> are planning to mark the fd (which has POSIX locks) bad on a disconnect so >>> that any future operations on that fd will fail, forcing the application to >>> re-open the fd and re-acquire locks it needs [1]. >>> >> >> Wouldn't it be better to retake the locks when the brick is reconnected >> if the lock is still in use ? >> > > There is also a possibility that clients may never reconnect. That's the > primary reason why bricks assume the worst (client will not reconnect) and > cleanup the locks. > True, so it's fine to cleanup the locks. I'm not saying that locks shouldn't be released on disconnect. The assumption is that if the client has really died, it will also disconnect from other bricks, who will release the locks. So, eventually, another client will have enough quorum to attempt a lock that will succeed. In other words, if a client gets disconnected from too many bricks simultaneously (loses Quorum), then that client can be considered as bad and can return errors to the application. 
This should also cause to release the locks on the remaining connected bricks. On the other hand, if the disconnection is very short and the client has not died, it will keep enough locked files (it has quorum) to avoid other clients to successfully acquire a lock. In this case, if the brick is reconnected, all existing locks should be reacquired to recover the original state before the disconnection. > >> BTW, the referenced bug is not public. Should we open another bug to >> track this ? >> > > I've just opened up the comment to give enough context. I'll open a bug > upstream too. > > >> >> >>> >>> Note that with AFR/replicate in picture we can prevent errors to >>> application as long as Quorum number of children "never ever" lost >>> connection with bricks after locks have been acquired. I am using the term >>> "never ever" as locks are not healed back after re-connection and hence >>> first disconnect would've marked the fd bad and the fd remains so even >>> after re-connection happens. So, its not just Quorum number of children >>> "currently online", but Quorum number of children "never having >>> disconnected with bricks after locks are acquired". >>> >> >> I think this requisite is not feasible. In a distributed file system, >> sooner or later all bricks will be disconnected. It could be because of >> failures or because an upgrade is done, but it will happen. >> >> The difference here is how long are fd's kept open. If applications open >> and close files frequently enough (i.e. the fd is not kept open more time >> than it takes to have more than Quorum bricks disconnected) then there's no >> problem. The problem can only appear on applications that open files for a >> long time and also use posix locks. In this case, the only good solution I >> see is to retake the locks on brick reconnection. >> > > Agree. 
But lock-healing should be done only by HA layers like AFR/EC as > only they know whether there are enough online bricks to have prevented any > conflicting lock. Protocol/client itself doesn't have enough information to > do that. If it's a plain distribute, I don't see a way to heal locks without > losing the property of exclusivity of locks. > Lock-healing of locks acquired while a brick was disconnected needs to be handled by AFR/EC. However, locks already present at the moment of disconnection could be recovered by client xlator itself as long as the file has not been closed (which client xlator already knows). Xavi > What I proposed is a short-term solution. The mid- to long-term solution should > be a lock-healing feature implemented in AFR/EC. In fact I had this > conversation with +Karampuri, Pranith before > posting this msg to ML. > > >> >>> However, this use case is not affected if the application doesn't acquire >>> any POSIX locks. So, I am interested in knowing >>> * whether your use cases use POSIX locks? >>> * Is it feasible for your application to re-open fds and re-acquire >>> locks on seeing EBADFD errors? >>> >> >> I think that many applications are not prepared to handle that. >> > > I too suspected that and in fact am not too happy with the solution. But went > ahead with this mail as I heard implementing lock-heal in AFR will take > time and hence there are no alternative short-term solutions. > > >> Xavi >> >> >>> >>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7 >>> >>> regards, >>> Raghavendra >>> >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> -------------- next part -------------- An HTML attachment was scrubbed...
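On the question raised above — whether applications can re-open fds and re-acquire locks on seeing EBADFD — the retry itself is mechanical with plain POSIX fcntl locks, as in this Python sketch (a hypothetical helper, not anything Gluster ships; note it cannot tell whether another client modified the file while the lock was not held, which is exactly the exclusivity problem discussed in this thread):

```python
import errno
import fcntl
import os

# EBADFD is Linux-specific; fall back to EBADF elsewhere.
EBADFD = getattr(errno, "EBADFD", errno.EBADF)

def relock(path, fd):
    """Assert an exclusive POSIX lock on fd; if the fd has gone bad,
    reopen the file and relock.

    Returns a (possibly new) locked fd. The caller must assume the file
    may have been changed by another client while the lock was not held.
    """
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd
    except OSError as e:
        if e.errno not in (errno.EBADF, EBADFD):
            raise
    # fd is bad: re-open and re-acquire, as the proposal suggests
    fd = os.open(path, os.O_RDWR)
    fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    return fd
```

Whether retrofitting such a retry loop into existing applications is realistic is, of course, the open question in this thread.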
URL: From jahernan at redhat.com Wed Mar 27 11:51:41 2019 From: jahernan at redhat.com (Xavi Hernandez) Date: Wed, 27 Mar 2019 12:51:41 +0100 Subject: [Gluster-users] POSIX locks and disconnections between clients and bricks In-Reply-To: References: Message-ID: On Wed, Mar 27, 2019 at 11:54 AM Raghavendra Gowdappa wrote: > > > On Wed, Mar 27, 2019 at 4:22 PM Raghavendra Gowdappa > wrote: > >> >> >> On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez >> wrote: >> >>> Hi Raghavendra, >>> >>> On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa < >>> rgowdapp at redhat.com> wrote: >>> >>>> All, >>>> >>>> Glusterfs cleans up POSIX locks held on an fd when the client/mount >>>> through which those locks are held disconnects from bricks/server. This >>>> helps Glusterfs to not run into a stale lock problem later (For eg., if >>>> application unlocks while the connection was still down). However, this >>>> means the lock is no longer exclusive as other applications/clients can >>>> acquire the same lock. To communicate that locks are no longer valid, we >>>> are planning to mark the fd (which has POSIX locks) bad on a disconnect so >>>> that any future operations on that fd will fail, forcing the application to >>>> re-open the fd and re-acquire locks it needs [1]. >>>> >>> >>> Wouldn't it be better to retake the locks when the brick is reconnected >>> if the lock is still in use ? >>> >> >> There is also a possibility that clients may never reconnect. That's the >> primary reason why bricks assume the worst (client will not reconnect) and >> cleanup the locks. >> >> >>> BTW, the referenced bug is not public. Should we open another bug to >>> track this ? >>> >> >> I've just opened up the comment to give enough context. I'll open a bug >> upstream too. >> >> >>> >>> >>>> >>>> Note that with AFR/replicate in picture we can prevent errors to >>>> application as long as Quorum number of children "never ever" lost >>>> connection with bricks after locks have been acquired. 
I am using the term >>>> "never ever" as locks are not healed back after re-connection and hence >>>> first disconnect would've marked the fd bad and the fd remains so even >>>> after re-connection happens. So, its not just Quorum number of children >>>> "currently online", but Quorum number of children "never having >>>> disconnected with bricks after locks are acquired". >>>> >>> >>> I think this requisite is not feasible. In a distributed file system, >>> sooner or later all bricks will be disconnected. It could be because of >>> failures or because an upgrade is done, but it will happen. >>> >>> The difference here is how long are fd's kept open. If applications open >>> and close files frequently enough (i.e. the fd is not kept open more time >>> than it takes to have more than Quorum bricks disconnected) then there's no >>> problem. The problem can only appear on applications that open files for a >>> long time and also use posix locks. In this case, the only good solution I >>> see is to retake the locks on brick reconnection. >>> >> >> Agree. But lock-healing should be done only by HA layers like AFR/EC as >> only they know whether there are enough online bricks to have prevented any >> conflicting lock. Protocol/client itself doesn't have enough information to >> do that. If its a plain distribute, I don't see a way to heal locks without >> loosing the property of exclusivity of locks. >> >> What I proposed is a short term solution. mid to long term solution >> should be lock healing feature implemented in AFR/EC. In fact I had this >> conversation with +Karampuri, Pranith before >> posting this msg to ML. >> >> >>> >>>> However, this use case is not affected if the application don't acquire >>>> any POSIX locks. So, I am interested in knowing >>>> * whether your use cases use POSIX locks? >>>> * Is it feasible for your application to re-open fds and re-acquire >>>> locks on seeing EBADFD errors? 
>>>> >>> >>> I think that many applications are not prepared to handle that. >>> >> >> I too suspected that and in fact not too happy with the solution. But >> went ahead with this mail as I heard implementing lock-heal in AFR will >> take time and hence there are no alternative short term solutions. >> > > Also failing loudly is preferred to silently dropping locks. > Yes. Silently dropping locks can cause corruption, which is worse. However causing application failures doesn't improve user experience either. Unfortunately I'm not aware of any other short term solution right now. > >> >> >>> Xavi >>> >>> >>>> >>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7 >>>> >>>> regards, >>>> Raghavendra >>>> >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From pkarampu at redhat.com Wed Mar 27 12:13:06 2019 From: pkarampu at redhat.com (Pranith Kumar Karampuri) Date: Wed, 27 Mar 2019 17:43:06 +0530 Subject: [Gluster-users] POSIX locks and disconnections between clients and bricks In-Reply-To: References: Message-ID: On Wed, Mar 27, 2019 at 5:13 PM Xavi Hernandez wrote: > On Wed, Mar 27, 2019 at 11:52 AM Raghavendra Gowdappa > wrote: > >> >> >> On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez >> wrote: >> >>> Hi Raghavendra, >>> >>> On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa < >>> rgowdapp at redhat.com> wrote: >>> >>>> All, >>>> >>>> Glusterfs cleans up POSIX locks held on an fd when the client/mount >>>> through which those locks are held disconnects from bricks/server. This >>>> helps Glusterfs to not run into a stale lock problem later (For eg., if >>>> application unlocks while the connection was still down). However, this >>>> means the lock is no longer exclusive as other applications/clients can >>>> acquire the same lock. 
To communicate that locks are no longer valid, we >>>> are planning to mark the fd (which has POSIX locks) bad on a disconnect so >>>> that any future operations on that fd will fail, forcing the application to >>>> re-open the fd and re-acquire locks it needs [1]. >>>> >>> >>> Wouldn't it be better to retake the locks when the brick is reconnected >>> if the lock is still in use ? >>> >> >> There is also a possibility that clients may never reconnect. That's the >> primary reason why bricks assume the worst (client will not reconnect) and >> cleanup the locks. >> > > True, so it's fine to cleanup the locks. I'm not saying that locks > shouldn't be released on disconnect. The assumption is that if the client > has really died, it will also disconnect from other bricks, who will > release the locks. So, eventually, another client will have enough quorum > to attempt a lock that will succeed. In other words, if a client gets > disconnected from too many bricks simultaneously (loses Quorum), then that > client can be considered as bad and can return errors to the application. > This should also cause to release the locks on the remaining connected > bricks. > > On the other hand, if the disconnection is very short and the client has > not died, it will keep enough locked files (it has quorum) to avoid other > clients to successfully acquire a lock. In this case, if the brick is > reconnected, all existing locks should be reacquired to recover the > original state before the disconnection. > > >> >>> BTW, the referenced bug is not public. Should we open another bug to >>> track this ? >>> >> >> I've just opened up the comment to give enough context. I'll open a bug >> upstream too. >> >> >>> >>> >>>> >>>> Note that with AFR/replicate in picture we can prevent errors to >>>> application as long as Quorum number of children "never ever" lost >>>> connection with bricks after locks have been acquired. 
I am using the term >>>> "never ever" as locks are not healed back after re-connection and hence >>>> first disconnect would've marked the fd bad and the fd remains so even >>>> after re-connection happens. So, its not just Quorum number of children >>>> "currently online", but Quorum number of children "never having >>>> disconnected with bricks after locks are acquired". >>>> >>> >>> I think this requisite is not feasible. In a distributed file system, >>> sooner or later all bricks will be disconnected. It could be because of >>> failures or because an upgrade is done, but it will happen. >>> >>> The difference here is how long are fd's kept open. If applications open >>> and close files frequently enough (i.e. the fd is not kept open more time >>> than it takes to have more than Quorum bricks disconnected) then there's no >>> problem. The problem can only appear on applications that open files for a >>> long time and also use posix locks. In this case, the only good solution I >>> see is to retake the locks on brick reconnection. >>> >> >> Agree. But lock-healing should be done only by HA layers like AFR/EC as >> only they know whether there are enough online bricks to have prevented any >> conflicting lock. Protocol/client itself doesn't have enough information to >> do that. If its a plain distribute, I don't see a way to heal locks without >> loosing the property of exclusivity of locks. >> > > Lock-healing of locks acquired while a brick was disconnected need to be > handled by AFR/EC. However, locks already present at the moment of > disconnection could be recovered by client xlator itself as long as the > file has not been closed (which client xlator already knows). > What if another client (say mount-2) took locks at the time of disconnect from mount-1 and modified the file and unlocked? client xlator doing the heal may not be a good idea. > > Xavi > > >> What I proposed is a short term solution. 
mid to long term solution >> should be lock healing feature implemented in AFR/EC. In fact I had this >> conversation with +Karampuri, Pranith before >> posting this msg to ML. >> >> >>> >>>> However, this use case is not affected if the application don't acquire >>>> any POSIX locks. So, I am interested in knowing >>>> * whether your use cases use POSIX locks? >>>> * Is it feasible for your application to re-open fds and re-acquire >>>> locks on seeing EBADFD errors? >>>> >>> >>> I think that many applications are not prepared to handle that. >>> >> >> I too suspected that and in fact not too happy with the solution. But >> went ahead with this mail as I heard implementing lock-heal in AFR will >> take time and hence there are no alternative short term solutions. >> > >> >>> Xavi >>> >>> >>>> >>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7 >>>> >>>> regards, >>>> Raghavendra >>>> >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> -- Pranith -------------- next part -------------- An HTML attachment was scrubbed... URL: From rkavunga at redhat.com Wed Mar 27 12:31:35 2019 From: rkavunga at redhat.com (Rafi Kavungal Chundattu Parambil) Date: Wed, 27 Mar 2019 08:31:35 -0400 (EDT) Subject: [Gluster-users] cannot add server back to cluster after reinstallation In-Reply-To: References: <87h8bok660.fsf@gmail.com> Message-ID: <552718422.11052209.1553689895752.JavaMail.zimbra@redhat.com> ----- Original Message ----- From: "Atin Mukherjee" To: "Rafi Kavungal Chundattu Parambil" , "Riccardo Murri" Cc: gluster-users at gluster.org Sent: Wednesday, March 27, 2019 4:07:42 PM Subject: Re: [Gluster-users] cannot add server back to cluster after reinstallation On Wed, 27 Mar 2019 at 16:02, Riccardo Murri wrote: > Hello Atin, > > > Check cluster.op-version, peer status, volume status output. If they are > all fine you?re good. 
> > Both `op-version` and `peer status` look fine:
> ```
> # gluster volume get all cluster.max-op-version
> Option                  Value
> ------                  -----
> cluster.max-op-version  31202
>
> # gluster peer status
> Number of Peers: 4
>
> Hostname: glusterfs-server-004
> Uuid: 9a5763d2-1941-4e5d-8d33-8d6756f7f318
> State: Peer in Cluster (Connected)
>
> Hostname: glusterfs-server-005
> Uuid: d53398f6-19d4-4633-8bc3-e493dac41789
> State: Peer in Cluster (Connected)
>
> Hostname: glusterfs-server-003
> Uuid: 3c74d2b4-a4f3-42d4-9511-f6174b0a641d
> State: Peer in Cluster (Connected)
>
> Hostname: glusterfs-server-001
> Uuid: 60bcc47e-ccbe-493e-b4ea-d45d63123977
> State: Peer in Cluster (Connected)
> ```
>
> However, `volume status` shows a missing snapshotd on the reinstalled
> server (the 002 one).

I believe you ran this command on 002? And in that case it's showing as localhost.

> We're not using snapshots so I guess this is fine too?

Is features.uss enabled for this volume? Otherwise we don't show snapd information in status output. Rafi - am I correct?

Yes. We don't show snapd information unless uss is enabled. So please check whether uss is enabled or not. You can use `gluster v get glusterfs features.uss`. If you are not using snapshots then it doesn't make sense to use uss. You can disable it using `gluster v set glusterfs features.uss disable`. Please note that if you are doing a rolling upgrade, it is not recommended to make any configuration changes. In that case you can disable it after completing the upgrade.
Rafi KC

> ```
> # gluster volume status
> Status of volume: glusterfs
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick glusterfs-server-005:/srv/glusterfs   49152     0          Y       1410
> Brick glusterfs-server-004:/srv/glusterfs   49152     0          Y       1416
> Brick glusterfs-server-003:/srv/glusterfs   49152     0          Y       1520
> Brick glusterfs-server-001:/srv/glusterfs   49152     0          Y       1266
> Brick glusterfs-server-002:/srv/glusterfs   49152     0          Y       3011
> Snapshot Daemon on localhost                N/A       N/A        Y       3029
> Snapshot Daemon on glusterfs-server-001     49153     0          Y       1361
> Snapshot Daemon on glusterfs-server-005     49153     0          Y       1478
> Snapshot Daemon on glusterfs-server-004     49153     0          Y       1490
> Snapshot Daemon on glusterfs-server-003     49153     0          Y       1563
>
> Task Status of Volume glusterfs
> ------------------------------------------------------------------------------
> Task   : Rebalance
> ID     : 0eaf6ad1-df95-48f4-b941-17488010ddcc
> Status : failed
> ```
>
> Thanks,
> Riccardo

--
--Atin

From jahernan at redhat.com Wed Mar 27 13:08:08 2019 From: jahernan at redhat.com (Xavi Hernandez) Date: Wed, 27 Mar 2019 14:08:08 +0100 Subject: [Gluster-users] POSIX locks and disconnections between clients and bricks In-Reply-To: References: Message-ID: On Wed, Mar 27, 2019 at 1:13 PM Pranith Kumar Karampuri wrote: > > > On Wed, Mar 27, 2019 at 5:13 PM Xavi Hernandez > wrote: > >> On Wed, Mar 27, 2019 at 11:52 AM Raghavendra Gowdappa < >> rgowdapp at redhat.com> wrote: >> >>> >>> >>> On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez >>> wrote: >>> >>>> Hi Raghavendra, >>>> >>>> On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa < >>>> rgowdapp at redhat.com> wrote: >>>> >>>>> All, >>>>> >>>>> Glusterfs cleans up POSIX locks held on an fd when the client/mount >>>>> through which those locks are held disconnects from bricks/server.
This >>>>> helps Glusterfs to not run into a stale lock problem later (For eg., if >>>>> application unlocks while the connection was still down). However, this >>>>> means the lock is no longer exclusive as other applications/clients can >>>>> acquire the same lock. To communicate that locks are no longer valid, we >>>>> are planning to mark the fd (which has POSIX locks) bad on a disconnect so >>>>> that any future operations on that fd will fail, forcing the application to >>>>> re-open the fd and re-acquire locks it needs [1]. >>>>> >>>> >>>> Wouldn't it be better to retake the locks when the brick is reconnected >>>> if the lock is still in use ? >>>> >>> >>> There is also a possibility that clients may never reconnect. That's >>> the primary reason why bricks assume the worst (client will not reconnect) >>> and cleanup the locks. >>> >> >> True, so it's fine to cleanup the locks. I'm not saying that locks >> shouldn't be released on disconnect. The assumption is that if the client >> has really died, it will also disconnect from other bricks, who will >> release the locks. So, eventually, another client will have enough quorum >> to attempt a lock that will succeed. In other words, if a client gets >> disconnected from too many bricks simultaneously (loses Quorum), then that >> client can be considered as bad and can return errors to the application. >> This should also cause to release the locks on the remaining connected >> bricks. >> >> On the other hand, if the disconnection is very short and the client has >> not died, it will keep enough locked files (it has quorum) to avoid other >> clients to successfully acquire a lock. In this case, if the brick is >> reconnected, all existing locks should be reacquired to recover the >> original state before the disconnection. >> >> >>> >>>> BTW, the referenced bug is not public. Should we open another bug to >>>> track this ? >>>> >>> >>> I've just opened up the comment to give enough context. 
I'll open a bug >>> upstream too. >>> >>> >>>> >>>> >>>>> >>>>> Note that with AFR/replicate in picture we can prevent errors to >>>>> application as long as Quorum number of children "never ever" lost >>>>> connection with bricks after locks have been acquired. I am using the term >>>>> "never ever" as locks are not healed back after re-connection and hence >>>>> first disconnect would've marked the fd bad and the fd remains so even >>>>> after re-connection happens. So, its not just Quorum number of children >>>>> "currently online", but Quorum number of children "never having >>>>> disconnected with bricks after locks are acquired". >>>>> >>>> >>>> I think this requisite is not feasible. In a distributed file system, >>>> sooner or later all bricks will be disconnected. It could be because of >>>> failures or because an upgrade is done, but it will happen. >>>> >>>> The difference here is how long are fd's kept open. If applications >>>> open and close files frequently enough (i.e. the fd is not kept open more >>>> time than it takes to have more than Quorum bricks disconnected) then >>>> there's no problem. The problem can only appear on applications that open >>>> files for a long time and also use posix locks. In this case, the only good >>>> solution I see is to retake the locks on brick reconnection. >>>> >>> >>> Agree. But lock-healing should be done only by HA layers like AFR/EC as >>> only they know whether there are enough online bricks to have prevented any >>> conflicting lock. Protocol/client itself doesn't have enough information to >>> do that. If its a plain distribute, I don't see a way to heal locks without >>> loosing the property of exclusivity of locks. >>> >> >> Lock-healing of locks acquired while a brick was disconnected need to be >> handled by AFR/EC. However, locks already present at the moment of >> disconnection could be recovered by client xlator itself as long as the >> file has not been closed (which client xlator already knows). 
>> > > What if another client (say mount-2) took locks at the time of disconnect > from mount-1 and modified the file and unlocked? client xlator doing the > heal may not be a good idea. > To avoid that we should ensure that any lock/unlocks are sent to the client, even if we know it's disconnected, so that client xlator can track them. The alternative is to duplicate and maintain code both on AFR and EC (and not sure if even in DHT depending on how we want to handle some cases). A similar thing could be done for open fd, since the current solution duplicates code in AFR and EC, but this is another topic... > >> >> Xavi >> >> >>> What I proposed is a short term solution. mid to long term solution >>> should be lock healing feature implemented in AFR/EC. In fact I had this >>> conversation with +Karampuri, Pranith before >>> posting this msg to ML. >>> >>> >>>> >>>>> However, this use case is not affected if the application don't >>>>> acquire any POSIX locks. So, I am interested in knowing >>>>> * whether your use cases use POSIX locks? >>>>> * Is it feasible for your application to re-open fds and re-acquire >>>>> locks on seeing EBADFD errors? >>>>> >>>> >>>> I think that many applications are not prepared to handle that. >>>> >>> >>> I too suspected that and in fact not too happy with the solution. But >>> went ahead with this mail as I heard implementing lock-heal in AFR will >>> take time and hence there are no alternative short term solutions. >>> >> >>> >>>> Xavi >>>> >>>> >>>>> >>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7 >>>>> >>>>> regards, >>>>> Raghavendra >>>>> >>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> Gluster-users at gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> > > -- > Pranith > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pkarampu at redhat.com Wed Mar 27 13:19:52 2019 From: pkarampu at redhat.com (Pranith Kumar Karampuri) Date: Wed, 27 Mar 2019 18:49:52 +0530 Subject: [Gluster-users] POSIX locks and disconnections between clients and bricks In-Reply-To: References: Message-ID: On Wed, Mar 27, 2019 at 6:38 PM Xavi Hernandez wrote: > On Wed, Mar 27, 2019 at 1:13 PM Pranith Kumar Karampuri < > pkarampu at redhat.com> wrote: > >> >> >> On Wed, Mar 27, 2019 at 5:13 PM Xavi Hernandez >> wrote: >> >>> On Wed, Mar 27, 2019 at 11:52 AM Raghavendra Gowdappa < >>> rgowdapp at redhat.com> wrote: >>> >>>> >>>> >>>> On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez >>>> wrote: >>>> >>>>> Hi Raghavendra, >>>>> >>>>> On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa < >>>>> rgowdapp at redhat.com> wrote: >>>>> >>>>>> All, >>>>>> >>>>>> Glusterfs cleans up POSIX locks held on an fd when the client/mount >>>>>> through which those locks are held disconnects from bricks/server. This >>>>>> helps Glusterfs to not run into a stale lock problem later (For eg., if >>>>>> application unlocks while the connection was still down). However, this >>>>>> means the lock is no longer exclusive as other applications/clients can >>>>>> acquire the same lock. To communicate that locks are no longer valid, we >>>>>> are planning to mark the fd (which has POSIX locks) bad on a disconnect so >>>>>> that any future operations on that fd will fail, forcing the application to >>>>>> re-open the fd and re-acquire locks it needs [1]. >>>>>> >>>>> >>>>> Wouldn't it be better to retake the locks when the brick is >>>>> reconnected if the lock is still in use ? >>>>> >>>> >>>> There is also a possibility that clients may never reconnect. That's >>>> the primary reason why bricks assume the worst (client will not reconnect) >>>> and cleanup the locks. >>>> >>> >>> True, so it's fine to cleanup the locks. I'm not saying that locks >>> shouldn't be released on disconnect. 
The assumption is that if the client >>> has really died, it will also disconnect from other bricks, who will >>> release the locks. So, eventually, another client will have enough quorum >>> to attempt a lock that will succeed. In other words, if a client gets >>> disconnected from too many bricks simultaneously (loses Quorum), then that >>> client can be considered as bad and can return errors to the application. >>> This should also cause to release the locks on the remaining connected >>> bricks. >>> >>> On the other hand, if the disconnection is very short and the client has >>> not died, it will keep enough locked files (it has quorum) to avoid other >>> clients to successfully acquire a lock. In this case, if the brick is >>> reconnected, all existing locks should be reacquired to recover the >>> original state before the disconnection. >>> >>> >>>> >>>>> BTW, the referenced bug is not public. Should we open another bug to >>>>> track this ? >>>>> >>>> >>>> I've just opened up the comment to give enough context. I'll open a bug >>>> upstream too. >>>> >>>> >>>>> >>>>> >>>>>> >>>>>> Note that with AFR/replicate in picture we can prevent errors to >>>>>> application as long as Quorum number of children "never ever" lost >>>>>> connection with bricks after locks have been acquired. I am using the term >>>>>> "never ever" as locks are not healed back after re-connection and hence >>>>>> first disconnect would've marked the fd bad and the fd remains so even >>>>>> after re-connection happens. So, its not just Quorum number of children >>>>>> "currently online", but Quorum number of children "never having >>>>>> disconnected with bricks after locks are acquired". >>>>>> >>>>> >>>>> I think this requisite is not feasible. In a distributed file system, >>>>> sooner or later all bricks will be disconnected. It could be because of >>>>> failures or because an upgrade is done, but it will happen. >>>>> >>>>> The difference here is how long are fd's kept open. 
If applications >>>>> open and close files frequently enough (i.e. the fd is not kept open more >>>>> time than it takes to have more than Quorum bricks disconnected) then >>>>> there's no problem. The problem can only appear on applications that open >>>>> files for a long time and also use posix locks. In this case, the only good >>>>> solution I see is to retake the locks on brick reconnection. >>>>> >>>> >>>> Agree. But lock-healing should be done only by HA layers like AFR/EC as >>>> only they know whether there are enough online bricks to have prevented any >>>> conflicting lock. Protocol/client itself doesn't have enough information to >>>> do that. If its a plain distribute, I don't see a way to heal locks without >>>> loosing the property of exclusivity of locks. >>>> >>> >>> Lock-healing of locks acquired while a brick was disconnected need to be >>> handled by AFR/EC. However, locks already present at the moment of >>> disconnection could be recovered by client xlator itself as long as the >>> file has not been closed (which client xlator already knows). >>> >> >> What if another client (say mount-2) took locks at the time of disconnect >> from mount-1 and modified the file and unlocked? client xlator doing the >> heal may not be a good idea. >> > > To avoid that we should ensure that any lock/unlocks are sent to the > client, even if we know it's disconnected, so that client xlator can track > them. The alternative is to duplicate and maintain code both on AFR and EC > (and not sure if even in DHT depending on how we want to handle some > cases). > Didn't understand the solution. I wanted to highlight that client xlator by itself can't make a decision about healing locks because it doesn't know what happened on other replicas. If we have replica-3 volume and all 3 bricks get disconnected to their respective bricks. Now another mount process can take a lock on that file modify it and unlock. 
Now upon reconnection, the old mount process which had locks would think it always had the lock if client xlator independently tries to heal its own locks because file is not closed on it so far. But that is wrong. Let me know if it makes sense.... > A similar thing could be done for open fd, since the current solution > duplicates code in AFR and EC, but this is another topic... > > >> >>> >>> Xavi >>> >>> >>>> What I proposed is a short term solution. mid to long term solution >>>> should be lock healing feature implemented in AFR/EC. In fact I had this >>>> conversation with +Karampuri, Pranith before >>>> posting this msg to ML. >>>> >>>> >>>>> >>>>>> However, this use case is not affected if the application don't >>>>>> acquire any POSIX locks. So, I am interested in knowing >>>>>> * whether your use cases use POSIX locks? >>>>>> * Is it feasible for your application to re-open fds and re-acquire >>>>>> locks on seeing EBADFD errors? >>>>>> >>>>> >>>>> I think that many applications are not prepared to handle that. >>>>> >>>> >>>> I too suspected that and in fact not too happy with the solution. But >>>> went ahead with this mail as I heard implementing lock-heal in AFR will >>>> take time and hence there are no alternative short term solutions. >>>> >>> >>>> >>>>> Xavi >>>>> >>>>> >>>>>> >>>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7 >>>>>> >>>>>> regards, >>>>>> Raghavendra >>>>>> >>>>>> _______________________________________________ >>>>>> Gluster-users mailing list >>>>>> Gluster-users at gluster.org >>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>> >>>>> >> >> -- >> Pranith >> > -- Pranith -------------- next part -------------- An HTML attachment was scrubbed... 
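To make the race described above concrete, here is a toy model in Python (pure illustration — none of these classes exist in Gluster): a brick cleans up a client's lock on disconnect, a second mount locks, writes, and unlocks during the window, so a client xlator that naively re-acquired its own lock on reconnect would see its lock "restored" even though the file changed underneath it:

```python
class ToyBrick:
    """Minimal single-lock brick: grants one exclusive lock, drops it on disconnect."""

    def __init__(self):
        self.holder = None
        self.version = 0                 # bumped on every successful write

    def lock(self, client):
        if self.holder in (None, client):
            self.holder = client
            return True
        return False

    def unlock(self, client):
        if self.holder == client:
            self.holder = None

    def write(self, client):
        assert self.holder == client, "write without holding the lock"
        self.version += 1

    def disconnect(self, client):        # server-side lock cleanup
        if self.holder == client:
            self.holder = None

brick = ToyBrick()
assert brick.lock("mount-1")
v_seen_by_mount1 = brick.version

brick.disconnect("mount-1")              # brick cleans up mount-1's lock
assert brick.lock("mount-2")             # another client sneaks in...
brick.write("mount-2")
brick.unlock("mount-2")

assert brick.lock("mount-1")             # naive client-side "heal" succeeds,
assert brick.version != v_seen_by_mount1 # ...but the file changed meanwhile
```

This is why the healing decision needs the view of an HA layer (or some server-side validation such as a lock generation number): the client xlator alone cannot distinguish this history from an uneventful reconnection.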
URL: From tmgreene364 at gmail.com Wed Mar 27 13:39:49 2019 From: tmgreene364 at gmail.com (Tami Greene) Date: Wed, 27 Mar 2019 09:39:49 -0400 Subject: [Gluster-users] Inconsistent issues with a client Message-ID: The system is a 5 server, 20 brick distributed system with a hardware configured RAID 6 underneath with xfs as filesystem. This client is a data collection node which transfers data to specific directories within one of the gluster volumes. I have a client with submounted directories (glustervolume/project) rather than the entire volume. Some files can be transferred no problem, but others send an error about transport endpoint not connected. The transfer is handled by an rsync script triggered as a cron job. When remotely connected to this client, user access to these files does not always behave as they are set: 2770 for directories and 440 for files. Owners are not always able to move the files, processes run as the owners are not always able to move files; root is not always allowed to move or delete these files. This process seemed to work smoothly before adding another server and 4 storage bricks to the volume; logs indicate there were intermittent issues at least a month before the last server was added. While a new collection device has been streaming to this one machine, the issue started the day before. Is there another level for permissions and ownership that I am not aware of that needs to be sync'd? -- Tami -------------- next part -------------- An HTML attachment was scrubbed... URL: From greenet at ornl.gov Wed Mar 27 13:37:52 2019 From: greenet at ornl.gov (Greene, Tami McFarlin) Date: Wed, 27 Mar 2019 13:37:52 +0000 Subject: [Gluster-users] Issues with submounted directories on a client Message-ID: <6016F36B-F5E1-48EB-9EFF-EF9D8F419AD2@ornl.gov> The system is a 5 server, 20 brick distributed system with a hardware configured RAID 6 underneath with xfs as filesystem.
This client is a data collection node which transfers data to specific directories within one of the gluster volumes. I have a client with submounted directories (glustervolume/project) rather than the entire volume. Some files can be transferred no problem, but others send an error about transport endpoint not connected. The transfer is handled by an rsync script triggered as a cron job. When remotely connected to this client, user access to these files does not always behave as they are set: 2770 for directories and 440 for files. Owners are not always able to move the files, processes run as the owners are not always able to move files; root is not always allowed to move or delete these files. This process seemed to work smoothly before adding another server and 4 storage bricks to the volume; logs indicate there were intermittent issues at least a month before the last server was added. While a new collection device has been streaming to this one machine, the issue started the day before. Is there another level for permissions and ownership that I am not aware of that needs to be sync'd? Tami Tami McFarlin Greene Lab Technician RF, Communications, and Intelligent Systems Group Electrical and Electronics System Research Division Oak Ridge National Laboratory Bldg. 3500, Rm. A15 greenet at ornl.gov (865) 643-0401 -------------- next part -------------- An HTML attachment was scrubbed...
URL: From a.talikov at gmail.com Wed Mar 27 14:09:41 2019 From: a.talikov at gmail.com (Alexey Talikov) Date: Wed, 27 Mar 2019 17:09:41 +0300 Subject: [Gluster-users] Gluster GEO replication fault after write over nfs-ganesha Message-ID: I have two clusters with dispersed volumes (2+1) with GEO replication. It works fine as long as I use glusterfs-fuse, but as soon as even one file is written over nfs-ganesha, replication goes to Faulty and recovers only after I remove this file (sometimes after stop/start). I think nfs-ganesha writes the file in some way that causes a problem with replication: OSError: [Errno 61] No data available: '.gfid/9c9514ce-a310-4a1c-a87b-a800a32a99f8' but if I check over glusterfs mounted with aux-gfid-mount getfattr -n trusted.glusterfs.pathinfo -e text /mnt/TEST/.gfid/9c9514ce-a310-4a1c-a87b-a800a32a99f8 getfattr: Removing leading '/' from absolute path names # file: mnt/TEST/.gfid/9c9514ce-a310-4a1c-a87b-a800a32a99f8 trusted.glusterfs.pathinfo="( ( ))" File exists Details available here: https://github.com/nfs-ganesha/nfs-ganesha/issues/408 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mailinglists at lucassen.org Wed Mar 27 14:23:35 2019 From: mailinglists at lucassen.org (richard lucassen) Date: Wed, 27 Mar 2019 15:23:35 +0100 Subject: [Gluster-users] glusterfs unusable? Message-ID: <20190327152335.21df41c96bd99cb3d63efe05@lucassen.org> Hello list, glusterfs 5.4-1 on Debian Buster (both servers and clients) I'm quite new to GFS and it's an old problem I know. When running a simple "ls -alR" on a local directory containing 50MB and 3468 files it takes: real 0m0.567s user 0m0.084s sys 0m0.168s Same thing for a copy of that dir on GFS takes more than 5 seconds: real 0m5.557s user 0m0.128s sys 0m0.208s Ok. But from my workstation at home, an "ls -alR" of that directory takes more than half an hour and the upload is more than 2GB (no typo: TWO Gigabytes).
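A traversal like that can also be timed directly, without ls; the following minimal Python sketch (the two root paths are hypothetical) lstat()s every entry the way "ls -alR" does, so local and GlusterFS metadata latency can be compared on the same tree:

```python
#!/usr/bin/env python3
# Time a full metadata traversal (what "ls -alR" effectively does:
# readdir every directory, then lstat every entry). The two roots in
# the main block are hypothetical examples: point one at a local copy
# of the tree and one at the same tree on the GlusterFS mount.
import os
import time

def walk_and_stat(root):
    """lstat() every entry under root; return (entry_count, elapsed_seconds)."""
    count = 0
    start = time.monotonic()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            os.lstat(os.path.join(dirpath, name))
            count += 1
    return count, time.monotonic() - start

if __name__ == "__main__":
    # Hypothetical paths -- adjust to your local copy and gluster mount.
    for root in ("/srv/local-copy", "/mnt/glustervol"):
        if os.path.isdir(root):
            n, secs = walk_and_stat(root)
            print(f"{root}: {n} entries in {secs:.3f}s")
```

Each lstat() on a FUSE mount is a round trip (or several, with replication), which is why the same walk that is sub-second locally can take minutes over a latent link.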
To keep it simple, the ls of a few directories: $ time ls all xabc-db xabc-dc1 xabc-gluster xabc-mail xabc-otp xabc-smtp real 0m5.766s user 0m0.001s sys 0m0.003s it receives 56kB and sends 2.3 MB for a simple ls. This is weird isn't it? Why this huge upload? Changing these options mentioned here doesn't make any difference: https://lists.gluster.org/pipermail/gluster-users/2016-January/024865.html Anyone a hint? Or should I drop GFS? This is unusable IMHO. Richard. -- richard lucassen http://contact.xaq.nl/ From mhterres at gmail.com Wed Mar 27 14:37:14 2019 From: mhterres at gmail.com (Marcelo Terres) Date: Wed, 27 Mar 2019 14:37:14 +0000 Subject: [Gluster-users] glusterfs unusable? In-Reply-To: <20190327152335.21df41c96bd99cb3d63efe05@lucassen.org> References: <20190327152335.21df41c96bd99cb3d63efe05@lucassen.org> Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1673058 Regards, Marcelo H. Terres https://www.mundoopensource.com.br https://twitter.com/mhterres https://linkedin.com/in/marceloterres On Wed, 27 Mar 2019 at 14:32, richard lucassen wrote: > Hello list, > > glusterfs 5.4-1 on Debian Buster (both servers and clients) > > I'm quite new to GFS and it's an old problem I know. When running a > simple "ls -alR" on a local directory containing 50MB and 3468 files it > takes: > > real 0m0.567s > user 0m0.084s > sys 0m0.168s > > Same thing for a copy of that dir on GFS takes more than 5 seconds: > > real 0m5.557s > user 0m0.128s > sys 0m0.208s > > Ok. But from my workstation at home, an "ls -alR" of that directory > takes more than half an hour and the upload is more than 2GB (no typo: > TWO Gigabytes). To keep it simple, the ls of a few directories: > > $ time ls > all xabc-db xabc-dc1 xabc-gluster xabc-mail xabc-otp xabc-smtp > > real 0m5.766s > user 0m0.001s > sys 0m0.003s > > it receives 56kB and sends 2.3 MB for a simple ls. > > This is weird isn't it? Why this huge upload? 
> > Changing these options mentioned here doesn't make any difference: > > https://lists.gluster.org/pipermail/gluster-users/2016-January/024865.html > > Anyone a hint? Or should I drop GFS? This is unusable IMHO. > > Richard. > > -- > richard lucassen > http://contact.xaq.nl/ > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mailinglists at lucassen.org Wed Mar 27 14:56:01 2019 From: mailinglists at lucassen.org (richard lucassen) Date: Wed, 27 Mar 2019 15:56:01 +0100 Subject: [Gluster-users] glusterfs unusable? In-Reply-To: References: <20190327152335.21df41c96bd99cb3d63efe05@lucassen.org> Message-ID: <20190327155601.d4ee4c35f8ed108cbc89a4b3@lucassen.org> On Wed, 27 Mar 2019 14:37:14 +0000 Marcelo Terres wrote: > https://bugzilla.redhat.com/show_bug.cgi?id=1673058 Ok, thanks, I missed that one (I guess I didn't use the proper search arguments). Hope this will resolve the problem. There is a 5.5-1 in Debian experimental from the 25th of March; I don't think that version will resolve the issue, as there's no changelog AFAICS. I'll try to compile and apply the patches tonight or tomorrow. -- richard lucassen http://contact.xaq.nl/ From pgurusid at redhat.com Wed Mar 27 14:56:13 2019 From: pgurusid at redhat.com (Poornima Gurusiddaiah) Date: Wed, 27 Mar 2019 20:26:13 +0530 Subject: [Gluster-users] Prioritise local bricks for IO? In-Reply-To: References: <29221907.583.1553599314586.JavaMail.zimbra@li.nux.ro> Message-ID: This feature is not under active development, as it was not widely used. AFAIK it is not a supported feature. +Nithya +Raghavendra for further clarifications. Regards, Poornima On Wed, Mar 27, 2019 at 12:33 PM Lucian wrote: > Oh, that's just what the doctor ordered!
> Hope it works, thanks > > On 27 March 2019 03:15:57 GMT, Vlad Kopylov wrote: >> >> I don't remember if it still in works >> NUFA >> >> https://github.com/gluster/glusterfs-specs/blob/master/done/Features/nufa.md >> >> v >> >> On Tue, Mar 26, 2019 at 7:27 AM Nux! wrote: >> >>> Hello, >>> >>> I'm trying to set up a distributed backup storage (no replicas), but I'd >>> like to prioritise the local bricks for any IO done on the volume. >>> This will be a backup stor, so in other words, I'd like the files to be >>> written locally if there is space, so as to save the NICs for other traffic. >>> >>> Anyone knows how this might be achievable, if at all? >>> >>> -- >>> Sent from the Delta quadrant using Borg technology! >>> >>> Nux! >>> www.nux.ro >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >> > -- > Sent from my Android device with K-9 Mail. Please excuse my brevity. > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From joe at julianfamily.org Wed Mar 27 14:53:55 2019 From: joe at julianfamily.org (Joe Julian) Date: Wed, 27 Mar 2019 07:53:55 -0700 Subject: [Gluster-users] glusterfs unusable? In-Reply-To: <20190327152335.21df41c96bd99cb3d63efe05@lucassen.org> References: <20190327152335.21df41c96bd99cb3d63efe05@lucassen.org> Message-ID: First, your statement and subject is hyperbolic and combative. In general it's best not to begin any approach for help with an uneducated attack on a community. GFS (Global File System) is an entirely different project but I'm going to assume you're in the right place and actually asking about GlusterFS. 
You haven't described your use case so I'll make an assumption that your intent is to sync files from your office to your home. I'll further guess that you're replicating one brick at home and the other at the office. Yes, this is generally an unusable use case due to latency and connectivity reasons. Your 2GB transfer was very likely a self heal due to a connectivity problem from one of your clients. When your home client performed a lookup() of the files, it caught the discrepancy and fixed it. The latency is multiplied due to the very nature of clustering and your latent connection. For a more useful answer, I'd suggest describing your needs and asking for help. There are plenty of experienced storage professionals here who are happy to share their knowledge and advice. On March 27, 2019 7:23:35 AM PDT, richard lucassen wrote: >Hello list, > >glusterfs 5.4-1 on Debian Buster (both servers and clients) > >I'm quite new to GFS and it's an old problem I know. When running a >simple "ls -alR" on a local directory containing 50MB and 3468 files >it takes: > >real 0m0.567s >user 0m0.084s >sys 0m0.168s > >Same thing for a copy of that dir on GFS takes more than 5 seconds: > >real 0m5.557s >user 0m0.128s >sys 0m0.208s > >Ok. But from my workstation at home, an "ls -alR" of that directory >takes more than half an hour and the upload is more than 2GB (no >typo: TWO Gigabytes). To keep it simple, the ls of a few directories: > >$ time ls >all xabc-db xabc-dc1 xabc-gluster xabc-mail xabc-otp xabc-smtp > >real 0m5.766s >user 0m0.001s >sys 0m0.003s > >it receives 56kB and sends 2.3 MB for a simple ls. > >This is weird isn't it? Why this huge upload?
> >-- >richard lucassen >http://contact.xaq.nl/ >_______________________________________________ >Gluster-users mailing list >Gluster-users at gluster.org >https://lists.gluster.org/mailman/listinfo/gluster-users -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jahernan at redhat.com Wed Mar 27 15:08:34 2019 From: jahernan at redhat.com (Xavi Hernandez) Date: Wed, 27 Mar 2019 16:08:34 +0100 Subject: [Gluster-users] POSIX locks and disconnections between clients and bricks In-Reply-To: References: Message-ID: On Wed, Mar 27, 2019 at 2:20 PM Pranith Kumar Karampuri wrote: > > > On Wed, Mar 27, 2019 at 6:38 PM Xavi Hernandez > wrote: > >> On Wed, Mar 27, 2019 at 1:13 PM Pranith Kumar Karampuri < >> pkarampu at redhat.com> wrote: >> >>> >>> >>> On Wed, Mar 27, 2019 at 5:13 PM Xavi Hernandez >>> wrote: >>> >>>> On Wed, Mar 27, 2019 at 11:52 AM Raghavendra Gowdappa < >>>> rgowdapp at redhat.com> wrote: >>>> >>>>> >>>>> >>>>> On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez >>>>> wrote: >>>>> >>>>>> Hi Raghavendra, >>>>>> >>>>>> On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa < >>>>>> rgowdapp at redhat.com> wrote: >>>>>> >>>>>>> All, >>>>>>> >>>>>>> Glusterfs cleans up POSIX locks held on an fd when the client/mount >>>>>>> through which those locks are held disconnects from bricks/server. This >>>>>>> helps Glusterfs to not run into a stale lock problem later (For eg., if >>>>>>> application unlocks while the connection was still down). However, this >>>>>>> means the lock is no longer exclusive as other applications/clients can >>>>>>> acquire the same lock. To communicate that locks are no longer valid, we >>>>>>> are planning to mark the fd (which has POSIX locks) bad on a disconnect so >>>>>>> that any future operations on that fd will fail, forcing the application to >>>>>>> re-open the fd and re-acquire locks it needs [1]. 
>>>>>>> >>>>>> >>>>>> Wouldn't it be better to retake the locks when the brick is >>>>>> reconnected if the lock is still in use ? >>>>>> >>>>> >>>>> There is also a possibility that clients may never reconnect. That's >>>>> the primary reason why bricks assume the worst (client will not reconnect) >>>>> and cleanup the locks. >>>>> >>>> >>>> True, so it's fine to cleanup the locks. I'm not saying that locks >>>> shouldn't be released on disconnect. The assumption is that if the client >>>> has really died, it will also disconnect from other bricks, who will >>>> release the locks. So, eventually, another client will have enough quorum >>>> to attempt a lock that will succeed. In other words, if a client gets >>>> disconnected from too many bricks simultaneously (loses Quorum), then that >>>> client can be considered as bad and can return errors to the application. >>>> This should also cause to release the locks on the remaining connected >>>> bricks. >>>> >>>> On the other hand, if the disconnection is very short and the client >>>> has not died, it will keep enough locked files (it has quorum) to avoid >>>> other clients to successfully acquire a lock. In this case, if the brick is >>>> reconnected, all existing locks should be reacquired to recover the >>>> original state before the disconnection. >>>> >>>> >>>>> >>>>>> BTW, the referenced bug is not public. Should we open another bug to >>>>>> track this ? >>>>>> >>>>> >>>>> I've just opened up the comment to give enough context. I'll open a >>>>> bug upstream too. >>>>> >>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> Note that with AFR/replicate in picture we can prevent errors to >>>>>>> application as long as Quorum number of children "never ever" lost >>>>>>> connection with bricks after locks have been acquired. 
I am using the term >>>>>>> "never ever" as locks are not healed back after re-connection and hence >>>>>>> first disconnect would've marked the fd bad and the fd remains so even >>>>>>> after re-connection happens. So, its not just Quorum number of children >>>>>>> "currently online", but Quorum number of children "never having >>>>>>> disconnected with bricks after locks are acquired". >>>>>>> >>>>>> >>>>>> I think this requisite is not feasible. In a distributed file system, >>>>>> sooner or later all bricks will be disconnected. It could be because of >>>>>> failures or because an upgrade is done, but it will happen. >>>>>> >>>>>> The difference here is how long are fd's kept open. If applications >>>>>> open and close files frequently enough (i.e. the fd is not kept open more >>>>>> time than it takes to have more than Quorum bricks disconnected) then >>>>>> there's no problem. The problem can only appear on applications that open >>>>>> files for a long time and also use posix locks. In this case, the only good >>>>>> solution I see is to retake the locks on brick reconnection. >>>>>> >>>>> >>>>> Agree. But lock-healing should be done only by HA layers like AFR/EC >>>>> as only they know whether there are enough online bricks to have prevented >>>>> any conflicting lock. Protocol/client itself doesn't have enough >>>>> information to do that. If its a plain distribute, I don't see a way to >>>>> heal locks without loosing the property of exclusivity of locks. >>>>> >>>> >>>> Lock-healing of locks acquired while a brick was disconnected need to >>>> be handled by AFR/EC. However, locks already present at the moment of >>>> disconnection could be recovered by client xlator itself as long as the >>>> file has not been closed (which client xlator already knows). >>>> >>> >>> What if another client (say mount-2) took locks at the time of >>> disconnect from mount-1 and modified the file and unlocked? client xlator >>> doing the heal may not be a good idea. 
>>> >> >> To avoid that we should ensure that any lock/unlocks are sent to the >> client, even if we know it's disconnected, so that client xlator can track >> them. The alternative is to duplicate and maintain code both on AFR and EC >> (and not sure if even in DHT depending on how we want to handle some >> cases). >> > > Didn't understand the solution. I wanted to highlight that client xlator > by itself can't make a decision about healing locks because it doesn't know > what happened on other replicas. If we have replica-3 volume and all 3 > bricks get disconnected to their respective bricks. Now another mount > process can take a lock on that file modify it and unlock. Now upon > reconnection, the old mount process which had locks would think it always > had the lock if client xlator independently tries to heal its own locks > because file is not closed on it so far. But that is wrong. Let me know if > it makes sense.... > My point of view is that any configuration with these requirements will have an appropriate quorum value so that it's impossible to have two or more partitions of the nodes working at the same time. So, under these assumptions, mount-1 can be in two situations: 1. It has lost a single brick and it's still operational. The other bricks will continue locked and everything should work fine from the point of view of the application. Any other application trying to get a lock will fail due to lack of quorum. When the lost brick comes back and is reconnected, client xlator will still have the fd reference and locks taken (unless the application has released the lock or closed the fd, in which case client xlator should get notified and clear that information), so it should be able to recover the previous state. 2. It has lost 2 or 3 bricks. In this case mount-1 has lost quorum and any operation going to that file should fail with EIO. AFR should send a special request to client xlator so that it forgets any fd's and locks for that file.
If bricks reconnect after that, no fd reopen or lock recovery will happen. Eventually the application should close the fd and retry later. This may succeed or not, depending on whether mount-2 has taken the lock already or not. So, it's true that client xlator doesn't know the state of the other bricks, but it doesn't need to as long as AFR/EC strictly enforces quorum and updates client xlator when quorum is lost. I haven't worked out all the details of this approach, but I think it should work and it's simpler to maintain than trying to do the same for AFR and EC. Xavi > >> A similar thing could be done for open fd, since the current solution >> duplicates code in AFR and EC, but this is another topic... >> >> >>> >>>> >>>> Xavi >>>> >>>> >>>>> What I proposed is a short term solution. mid to long term solution >>>>> should be lock healing feature implemented in AFR/EC. In fact I had this >>>>> conversation with +Karampuri, Pranith before >>>>> posting this msg to ML. >>>>> >>>>> >>>>>> >>>>>>> However, this use case is not affected if the application don't >>>>>>> acquire any POSIX locks. So, I am interested in knowing >>>>>>> * whether your use cases use POSIX locks? >>>>>>> * Is it feasible for your application to re-open fds and re-acquire >>>>>>> locks on seeing EBADFD errors? >>>>>>> >>>>>> >>>>>> I think that many applications are not prepared to handle that. >>>>>> >>>>> >>>>> I too suspected that and in fact not too happy with the solution. But >>>>> went ahead with this mail as I heard implementing lock-heal in AFR will >>>>> take time and hence there are no alternative short term solutions.
>>>>> >>>> >>>>> >>>>>> Xavi >>>>>> >>>>>> >>>>>>> >>>>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7 >>>>>>> >>>>>>> regards, >>>>>>> Raghavendra >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Gluster-users mailing list >>>>>>> Gluster-users at gluster.org >>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>> >>>>>> >>> >>> -- >>> Pranith >>> >> > > -- > Pranith > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brandon at thinkhuge.net Wed Mar 27 16:16:33 2019 From: brandon at thinkhuge.net (brandon at thinkhuge.net) Date: Wed, 27 Mar 2019 09:16:33 -0700 Subject: [Gluster-users] Transport endpoint is not connected failures in Message-ID: <009a01d4e4b8$731a8420$594f8c60$@thinkhuge.net> Hello Amar and list, I wanted to follow-up to confirm that upgrading to 5.5 seem to fix the "Transport endpoint is not connected failures" for us. We did not have any of these failures in this past weekend backups cycle. Thank you very much for fixing whatever was the problem. I also removed some volume config options. One or more of the settings was contributing to the slow directory listing. Here is our current volume info. 
[root at lonbaknode3 ~]# gluster volume info Volume Name: volbackups Type: Distribute Volume ID: 32bf4fe9-5450-49f8-b6aa-05471d3bdffa Status: Started Snapshot Count: 0 Number of Bricks: 8 Transport-type: tcp Bricks: Brick1: lonbaknode3.domain.net:/lvbackups/brick Brick2: lonbaknode4.domain.net:/lvbackups/brick Brick3: lonbaknode5.domain.net:/lvbackups/brick Brick4: lonbaknode6.domain.net:/lvbackups/brick Brick5: lonbaknode7.domain.net:/lvbackups/brick Brick6: lonbaknode8.domain.net:/lvbackups/brick Brick7: lonbaknode9.domain.net:/lvbackups/brick Brick8: lonbaknode10.domain.net:/lvbackups/brick Options Reconfigured: performance.io-thread-count: 32 performance.client-io-threads: on client.event-threads: 8 diagnostics.brick-sys-log-level: WARNING diagnostics.brick-log-level: WARNING performance.cache-max-file-size: 2MB performance.cache-size: 256MB cluster.min-free-disk: 1% nfs.disable: on transport.address-family: inet server.event-threads: 8 [root at lonbaknode3 ~]# -------------- next part -------------- An HTML attachment was scrubbed... URL: From mailinglists at lucassen.org Wed Mar 27 16:17:26 2019 From: mailinglists at lucassen.org (richard lucassen) Date: Wed, 27 Mar 2019 17:17:26 +0100 Subject: [Gluster-users] glusterfs unusable? In-Reply-To: References: <20190327152335.21df41c96bd99cb3d63efe05@lucassen.org> Message-ID: <20190327171726.71dbb0bc68e5aa3d391ec340@lucassen.org> On Wed, 27 Mar 2019 07:53:55 -0700 Joe Julian wrote: Ok Joe, this is the situation: I have a glusterfs cluster using R630 Dell servers with 256GB of memory, a bunch of 3.4TB SSD's and Intel Xeon E5-2667 beasts. 
Using such power and seeing glusterfs taking 5 seconds for a simple "ls -alR" on a client directly connected over a 1Gbit cable to these servers is rather slow (this will be nominated for The Understatement Of The Week). Rather slow, not unusable (and I haven't even added an arbiter to these two servers yet). OTOH, contrary to what you suggest, I'm not using a brick at home, it is just a linux client connecting to these two servers, ok, I admit, over a slow line. I was just checking how long a simple "ls -alR" would take. And when this takes almost an hour consuming 2GB upload then I think I can say it's quite unusable. So I'm sorry Joe, I don't want to spoil your day, but I have to say that Glusterfs (sorry for the wrong abbreviation) did spoil my day because of this issue. Such a bad behaviour would certainly be a show stopper. I hope the patches will resolve these issues. R. > First, your statement and subject is hyperbolic and combative. In > general it's best not to begin any approach for help with an > uneducated attack on a community. > > GFS (Global File System) is an entirely different project but I'm > going to assume you're in the right place and actually asking about > GlusterFS. > > You haven't described your use case so I'll make an assumption that > your intent is to sync files from your office to your home. I'll > further guess that you're replicating one brick at home and the other > at the office. > > Yes, this is generally an unusable use case for to latency and > connectivity reasons. Your 2Gb transfer was very likely a self heal > due to a connectivity problem from one of your clients. When your > home client performed a lookup() of the files, it caught the > discrepancy and fixed it. The latency is multiplied due to the very > nature of clustering and your latent connection. > > For a more useful answer, I'd suggest describing your needs and > asking for help.
There is tons of experienced storage professionals > here that are happy to share their knowledge and advice. > > On March 27, 2019 7:23:35 AM PDT, richard lucassen > wrote: > >Hello list, > > > >glusterfs 5.4-1 on Debian Buster (both servers and clients) > > > >I'm quite new to GFS and it's an old problem I know. When running a > >simple "ls -alR" on a local directory containing 50MB and 3468 files > >it takes: > > > >real 0m0.567s > >user 0m0.084s > >sys 0m0.168s > > > >Same thing for a copy of that dir on GFS takes more than 5 seconds: > > > >real 0m5.557s > >user 0m0.128s > >sys 0m0.208s > > > >Ok. But from my workstation at home, an "ls -alR" of that directory > >takes more than half an hour and the upload is more than 2GB (no > >typo: TWO Gigabytes). To keep it simple, the ls of a few directories: > > > >$ time ls > >all xabc-db xabc-dc1 xabc-gluster xabc-mail xabc-otp xabc-smtp > > > >real 0m5.766s > >user 0m0.001s > >sys 0m0.003s > > > >it receives 56kB and sends 2.3 MB for a simple ls. > > > >This is weird isn't it? Why this huge upload? > > > >Changing these options mentioned here doesn't make any difference: > > > >https://lists.gluster.org/pipermail/gluster-users/2016-January/024865.html > > > >Anyone a hint? Or should I drop GFS? This is unusable IMHO. > > > >Richard. 
> > > >-- > >richard lucassen > >http://contact.xaq.nl/ > >_______________________________________________ > >Gluster-users mailing list > >Gluster-users at gluster.org > >https://lists.gluster.org/mailman/listinfo/gluster-users > -- richard lucassen http://contact.xaq.nl/ From riccardo.murri at gmail.com Wed Mar 27 16:30:19 2019 From: riccardo.murri at gmail.com (Riccardo Murri) Date: Wed, 27 Mar 2019 17:30:19 +0100 Subject: [Gluster-users] cannot add server back to cluster after reinstallation In-Reply-To: <552718422.11052209.1553689895752.JavaMail.zimbra@redhat.com> References: <87h8bok660.fsf@gmail.com> <552718422.11052209.1553689895752.JavaMail.zimbra@redhat.com> Message-ID: Thanks all for the help! The cluster has been up for a few hours now with no reported errors, so I guess replacement of the server went ultimately fine ;-) Ciao, R From pkarampu at redhat.com Wed Mar 27 17:26:41 2019 From: pkarampu at redhat.com (Pranith Kumar Karampuri) Date: Wed, 27 Mar 2019 22:56:41 +0530 Subject: [Gluster-users] POSIX locks and disconnections between clients and bricks In-Reply-To: References: Message-ID: On Wed, Mar 27, 2019 at 8:38 PM Xavi Hernandez wrote: > On Wed, Mar 27, 2019 at 2:20 PM Pranith Kumar Karampuri < > pkarampu at redhat.com> wrote: > >> >> >> On Wed, Mar 27, 2019 at 6:38 PM Xavi Hernandez >> wrote: >> >>> On Wed, Mar 27, 2019 at 1:13 PM Pranith Kumar Karampuri < >>> pkarampu at redhat.com> wrote: >>> >>>> >>>> >>>> On Wed, Mar 27, 2019 at 5:13 PM Xavi Hernandez >>>> wrote: >>>> >>>>> On Wed, Mar 27, 2019 at 11:52 AM Raghavendra Gowdappa < >>>>> rgowdapp at redhat.com> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez >>>>>> wrote: >>>>>> >>>>>>> Hi Raghavendra, >>>>>>> >>>>>>> On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa < >>>>>>> rgowdapp at redhat.com> wrote: >>>>>>> >>>>>>>> All, >>>>>>>> >>>>>>>> Glusterfs cleans up POSIX locks held on an fd when the client/mount >>>>>>>> through which those locks are held 
disconnects from bricks/server. This >>>>>>>> helps Glusterfs to not run into a stale lock problem later (For eg., if >>>>>>>> application unlocks while the connection was still down). However, this >>>>>>>> means the lock is no longer exclusive as other applications/clients can >>>>>>>> acquire the same lock. To communicate that locks are no longer valid, we >>>>>>>> are planning to mark the fd (which has POSIX locks) bad on a disconnect so >>>>>>>> that any future operations on that fd will fail, forcing the application to >>>>>>>> re-open the fd and re-acquire locks it needs [1]. >>>>>>>> >>>>>>> >>>>>>> Wouldn't it be better to retake the locks when the brick is >>>>>>> reconnected if the lock is still in use ? >>>>>>> >>>>>> >>>>>> There is also a possibility that clients may never reconnect. That's >>>>>> the primary reason why bricks assume the worst (client will not reconnect) >>>>>> and cleanup the locks. >>>>>> >>>>> >>>>> True, so it's fine to cleanup the locks. I'm not saying that locks >>>>> shouldn't be released on disconnect. The assumption is that if the client >>>>> has really died, it will also disconnect from other bricks, who will >>>>> release the locks. So, eventually, another client will have enough quorum >>>>> to attempt a lock that will succeed. In other words, if a client gets >>>>> disconnected from too many bricks simultaneously (loses Quorum), then that >>>>> client can be considered as bad and can return errors to the application. >>>>> This should also cause to release the locks on the remaining connected >>>>> bricks. >>>>> >>>>> On the other hand, if the disconnection is very short and the client >>>>> has not died, it will keep enough locked files (it has quorum) to avoid >>>>> other clients to successfully acquire a lock. In this case, if the brick is >>>>> reconnected, all existing locks should be reacquired to recover the >>>>> original state before the disconnection. 
>>>>> >>>>> >>>>>> >>>>>>> BTW, the referenced bug is not public. Should we open another bug to >>>>>>> track this ? >>>>>>> >>>>>> >>>>>> I've just opened up the comment to give enough context. I'll open a >>>>>> bug upstream too. >>>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> Note that with AFR/replicate in picture we can prevent errors to >>>>>>>> application as long as Quorum number of children "never ever" lost >>>>>>>> connection with bricks after locks have been acquired. I am using the term >>>>>>>> "never ever" as locks are not healed back after re-connection and hence >>>>>>>> first disconnect would've marked the fd bad and the fd remains so even >>>>>>>> after re-connection happens. So, its not just Quorum number of children >>>>>>>> "currently online", but Quorum number of children "never having >>>>>>>> disconnected with bricks after locks are acquired". >>>>>>>> >>>>>>> >>>>>>> I think this requisite is not feasible. In a distributed file >>>>>>> system, sooner or later all bricks will be disconnected. It could be >>>>>>> because of failures or because an upgrade is done, but it will happen. >>>>>>> >>>>>>> The difference here is how long are fd's kept open. If applications >>>>>>> open and close files frequently enough (i.e. the fd is not kept open more >>>>>>> time than it takes to have more than Quorum bricks disconnected) then >>>>>>> there's no problem. The problem can only appear on applications that open >>>>>>> files for a long time and also use posix locks. In this case, the only good >>>>>>> solution I see is to retake the locks on brick reconnection. >>>>>>> >>>>>> >>>>>> Agree. But lock-healing should be done only by HA layers like AFR/EC >>>>>> as only they know whether there are enough online bricks to have prevented >>>>>> any conflicting lock. Protocol/client itself doesn't have enough >>>>>> information to do that. If its a plain distribute, I don't see a way to >>>>>> heal locks without loosing the property of exclusivity of locks. 
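Under the short-term proposal (mark the fd bad on disconnect so further operations fail), recovery is pushed onto the application: on seeing EBADFD/EBADF it must re-open the file and re-acquire its lock before retrying. A hedged sketch of what such application-side handling could look like; `with_lock_retry` and its errno handling are illustrative assumptions, not an existing Gluster or libc API:

```python
import errno
import fcntl
import os


def with_lock_retry(path, process, attempts=3):
    """Run process(fd) under an exclusive POSIX lock, re-opening and
    re-locking whenever the fd has been marked bad (hypothetical
    application-side recovery loop for the proposal discussed above)."""
    for _ in range(attempts):
        fd = os.open(path, os.O_RDWR)
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX)   # (re-)acquire the POSIX lock
            return process(fd)
        except OSError as e:
            if e.errno not in (errno.EBADF, errno.EBADFD):
                raise                        # unrelated failure: propagate
            # fd was marked bad after a disconnect: loop re-opens, re-locks
        finally:
            try:
                os.close(fd)
            except OSError:
                pass
    raise OSError(errno.EBADFD, "could not reacquire lock")
```

As the thread notes, few real applications are written this way, which is exactly the concern with making EBADFD the contract.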
>>>>>> >>>>> >>>>> Lock-healing of locks acquired while a brick was disconnected need to >>>>> be handled by AFR/EC. However, locks already present at the moment of >>>>> disconnection could be recovered by client xlator itself as long as the >>>>> file has not been closed (which client xlator already knows). >>>>> >>>> >>>> What if another client (say mount-2) took locks at the time of >>>> disconnect from mount-1 and modified the file and unlocked? client xlator >>>> doing the heal may not be a good idea. >>>> >>> >>> To avoid that we should ensure that any lock/unlocks are sent to the >>> client, even if we know it's disconnected, so that client xlator can track >>> them. The alternative is to duplicate and maintain code both on AFR and EC >>> (and not sure if even in DHT depending on how we want to handle some >>> cases). >>> >> >> Didn't understand the solution. I wanted to highlight that client xlator >> by itself can't make a decision about healing locks because it doesn't know >> what happened on other replicas. If we have replica-3 volume and all 3 >> bricks get disconnected to their respective bricks. Now another mount >> process can take a lock on that file modify it and unlock. Now upon >> reconnection, the old mount process which had locks would think it always >> had the lock if client xlator independently tries to heal its own locks >> because file is not closed on it so far. But that is wrong. Let me know if >> it makes sense.... >> > > My point of view is that any configuration with these requirements will > have an appropriate quorum value so that it's impossible to have two or > more partitions of the nodes working at the same time. So, under this > assumptions, mount-1 can be in two situations: > > 1. It has lost a single brick and it's still operational. The other bricks > will continue locked and everything should work fine from the point of view > of the application. 
Any other application trying to get a lock will fail > due to lack of quorum. When the lost brick comes back and is reconnected, > client xlator will still have the fd reference and locks taken (unless the > application has released the lock or closed the fd, in which case client > xlator should get notified and clear that information), so it should be > able to recover the previous state. > Application could be in blocked state as well if it tries to get blocking lock. So as soon as a disconnect happens, the lock will be granted on that brick to one of the blocked locks. On the other two bricks it would still be blocked. Trying to heal that will require a new operation that is not already present in locks code, which should be able to tell client as well about either changing the lock state to blocked on that brick or to retry lock operation. > > 2. It has lost 2 or 3 bricks. In this case mount-1 has lost quorum and any > operation going to that file should fail with EIO. AFR should send a > special request to client xlator so that it forgets any fd's and locks for > that file. If bricks reconnect after that, no fd reopen or lock recovery > will happen. Eventually the application should close the fd and retry > later. This may succeed to not, depending on whether mount-2 has taken the > lock already or not. > > So, it's true that client xlator doesn't know the state of the other > bricks, but it doesn't need to as long as AFR/EC strictly enforces quorum > and updates client xlator when quorum is lost. > This part seems good. > > I haven't worked out all the details of this approach, but I think it > should work and it's simpler to maintain than trying to do the same for AFR > and EC. > Let us spend some time on this on #gluster-dev when you get some time tomorrow to figure out the complete solution which handles the corner cases too. 
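The two situations described above reduce to a single quorum check on the client side. A minimal sketch, assuming a simple-majority quorum as on a replica-3 volume; the return strings are placeholders for illustration, not real xlator states:

```python
def lock_state_after_disconnect(replica_count, connected_bricks):
    """Decide what happens to locks held on one file after bricks drop.

    Situation 1: quorum still held -> keep fd and locks, re-acquire on
    the lost brick when it reconnects.
    Situation 2: quorum lost -> forget fd and locks, fail app I/O (EIO).
    """
    have_quorum = connected_bricks > replica_count // 2
    if have_quorum:
        return "keep-locks-and-reacquire-on-reconnect"   # situation 1
    return "drop-fd-return-EIO"                          # situation 2
```

For example, on a replica-3 volume, losing one brick keeps quorum (2 of 3) and the locks survive, while losing two drops quorum and the fd must be abandoned.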
> > Xavi > > >> >>> A similar thing could be done for open fd, since the current solution >>> duplicates code in AFR and EC, but this is another topic... >>> >>> >>>> >>>>> >>>>> Xavi >>>>> >>>>> >>>>>> What I proposed is a short term solution. mid to long term solution >>>>>> should be lock healing feature implemented in AFR/EC. In fact I had this >>>>>> conversation with +Karampuri, Pranith before >>>>>> posting this msg to ML. >>>>>> >>>>>> >>>>>>> >>>>>>>> However, this use case is not affected if the application don't >>>>>>>> acquire any POSIX locks. So, I am interested in knowing >>>>>>>> * whether your use cases use POSIX locks? >>>>>>>> * Is it feasible for your application to re-open fds and re-acquire >>>>>>>> locks on seeing EBADFD errors? >>>>>>>> >>>>>>> >>>>>>> I think that many applications are not prepared to handle that. >>>>>>> >>>>>> >>>>>> I too suspected that and in fact not too happy with the solution. But >>>>>> went ahead with this mail as I heard implementing lock-heal in AFR will >>>>>> take time and hence there are no alternative short term solutions. >>>>>> >>>>> >>>>>> >>>>>>> Xavi >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7 >>>>>>>> >>>>>>>> regards, >>>>>>>> Raghavendra >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Gluster-users mailing list >>>>>>>> Gluster-users at gluster.org >>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>> >>>>>>> >>>> >>>> -- >>>> Pranith >>>> >>> >> >> -- >> Pranith >> > -- Pranith -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jahernan at redhat.com Wed Mar 27 18:24:57 2019 From: jahernan at redhat.com (Xavi Hernandez) Date: Wed, 27 Mar 2019 19:24:57 +0100 Subject: [Gluster-users] POSIX locks and disconnections between clients and bricks In-Reply-To: References: Message-ID: On Wed, 27 Mar 2019, 18:26 Pranith Kumar Karampuri, wrote: > > > On Wed, Mar 27, 2019 at 8:38 PM Xavi Hernandez > wrote: > >> On Wed, Mar 27, 2019 at 2:20 PM Pranith Kumar Karampuri < >> pkarampu at redhat.com> wrote: >> >>> >>> >>> On Wed, Mar 27, 2019 at 6:38 PM Xavi Hernandez >>> wrote: >>> >>>> On Wed, Mar 27, 2019 at 1:13 PM Pranith Kumar Karampuri < >>>> pkarampu at redhat.com> wrote: >>>> >>>>> >>>>> >>>>> On Wed, Mar 27, 2019 at 5:13 PM Xavi Hernandez >>>>> wrote: >>>>> >>>>>> On Wed, Mar 27, 2019 at 11:52 AM Raghavendra Gowdappa < >>>>>> rgowdapp at redhat.com> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Raghavendra, >>>>>>>> >>>>>>>> On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa < >>>>>>>> rgowdapp at redhat.com> wrote: >>>>>>>> >>>>>>>>> All, >>>>>>>>> >>>>>>>>> Glusterfs cleans up POSIX locks held on an fd when the >>>>>>>>> client/mount through which those locks are held disconnects from >>>>>>>>> bricks/server. This helps Glusterfs to not run into a stale lock problem >>>>>>>>> later (For eg., if application unlocks while the connection was still >>>>>>>>> down). However, this means the lock is no longer exclusive as other >>>>>>>>> applications/clients can acquire the same lock. To communicate that locks >>>>>>>>> are no longer valid, we are planning to mark the fd (which has POSIX locks) >>>>>>>>> bad on a disconnect so that any future operations on that fd will fail, >>>>>>>>> forcing the application to re-open the fd and re-acquire locks it needs [1]. >>>>>>>>> >>>>>>>> >>>>>>>> Wouldn't it be better to retake the locks when the brick is >>>>>>>> reconnected if the lock is still in use ? 
>>>>>>>> >>>>>>> >>>>>>> There is also a possibility that clients may never reconnect. >>>>>>> That's the primary reason why bricks assume the worst (client will not >>>>>>> reconnect) and cleanup the locks. >>>>>>> >>>>>> >>>>>> True, so it's fine to cleanup the locks. I'm not saying that locks >>>>>> shouldn't be released on disconnect. The assumption is that if the client >>>>>> has really died, it will also disconnect from other bricks, who will >>>>>> release the locks. So, eventually, another client will have enough quorum >>>>>> to attempt a lock that will succeed. In other words, if a client gets >>>>>> disconnected from too many bricks simultaneously (loses Quorum), then that >>>>>> client can be considered as bad and can return errors to the application. >>>>>> This should also cause to release the locks on the remaining connected >>>>>> bricks. >>>>>> >>>>>> On the other hand, if the disconnection is very short and the client >>>>>> has not died, it will keep enough locked files (it has quorum) to avoid >>>>>> other clients to successfully acquire a lock. In this case, if the brick is >>>>>> reconnected, all existing locks should be reacquired to recover the >>>>>> original state before the disconnection. >>>>>> >>>>>> >>>>>>> >>>>>>>> BTW, the referenced bug is not public. Should we open another bug >>>>>>>> to track this ? >>>>>>>> >>>>>>> >>>>>>> I've just opened up the comment to give enough context. I'll open a >>>>>>> bug upstream too. >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> Note that with AFR/replicate in picture we can prevent errors to >>>>>>>>> application as long as Quorum number of children "never ever" lost >>>>>>>>> connection with bricks after locks have been acquired. I am using the term >>>>>>>>> "never ever" as locks are not healed back after re-connection and hence >>>>>>>>> first disconnect would've marked the fd bad and the fd remains so even >>>>>>>>> after re-connection happens. 
So, its not just Quorum number of children >>>>>>>>> "currently online", but Quorum number of children "never having >>>>>>>>> disconnected with bricks after locks are acquired". >>>>>>>>> >>>>>>>> >>>>>>>> I think this requisite is not feasible. In a distributed file >>>>>>>> system, sooner or later all bricks will be disconnected. It could be >>>>>>>> because of failures or because an upgrade is done, but it will happen. >>>>>>>> >>>>>>>> The difference here is how long are fd's kept open. If applications >>>>>>>> open and close files frequently enough (i.e. the fd is not kept open more >>>>>>>> time than it takes to have more than Quorum bricks disconnected) then >>>>>>>> there's no problem. The problem can only appear on applications that open >>>>>>>> files for a long time and also use posix locks. In this case, the only good >>>>>>>> solution I see is to retake the locks on brick reconnection. >>>>>>>> >>>>>>> >>>>>>> Agree. But lock-healing should be done only by HA layers like AFR/EC >>>>>>> as only they know whether there are enough online bricks to have prevented >>>>>>> any conflicting lock. Protocol/client itself doesn't have enough >>>>>>> information to do that. If its a plain distribute, I don't see a way to >>>>>>> heal locks without loosing the property of exclusivity of locks. >>>>>>> >>>>>> >>>>>> Lock-healing of locks acquired while a brick was disconnected need to >>>>>> be handled by AFR/EC. However, locks already present at the moment of >>>>>> disconnection could be recovered by client xlator itself as long as the >>>>>> file has not been closed (which client xlator already knows). >>>>>> >>>>> >>>>> What if another client (say mount-2) took locks at the time of >>>>> disconnect from mount-1 and modified the file and unlocked? client xlator >>>>> doing the heal may not be a good idea. 
>>>>> >>>> >>>> To avoid that we should ensure that any lock/unlocks are sent to the >>>> client, even if we know it's disconnected, so that client xlator can track >>>> them. The alternative is to duplicate and maintain code both on AFR and EC >>>> (and not sure if even in DHT depending on how we want to handle some >>>> cases). >>>> >>> >>> Didn't understand the solution. I wanted to highlight that client xlator >>> by itself can't make a decision about healing locks because it doesn't know >>> what happened on other replicas. If we have replica-3 volume and all 3 >>> bricks get disconnected to their respective bricks. Now another mount >>> process can take a lock on that file modify it and unlock. Now upon >>> reconnection, the old mount process which had locks would think it always >>> had the lock if client xlator independently tries to heal its own locks >>> because file is not closed on it so far. But that is wrong. Let me know if >>> it makes sense.... >>> >> >> My point of view is that any configuration with these requirements will >> have an appropriate quorum value so that it's impossible to have two or >> more partitions of the nodes working at the same time. So, under this >> assumptions, mount-1 can be in two situations: >> >> 1. It has lost a single brick and it's still operational. The other >> bricks will continue locked and everything should work fine from the point >> of view of the application. Any other application trying to get a lock will >> fail due to lack of quorum. When the lost brick comes back and is >> reconnected, client xlator will still have the fd reference and locks taken >> (unless the application has released the lock or closed the fd, in which >> case client xlator should get notified and clear that information), so it >> should be able to recover the previous state. >> > > Application could be in blocked state as well if it tries to get blocking > lock. 
So as soon as a disconnect happens, the lock will be granted on that > brick to one of the blocked locks. On the other two bricks it would still > be blocked. Trying to heal that will require a new operation that is not > already present in locks code, which should be able to tell client as well > about either changing the lock state to blocked on that brick or to retry > lock operation. > Yes, but this problem exists even if the lock-heal is done by AFR/EC. This is something that needs to be solved anyway, but it's independent of who does the lock-heal. > >> >> 2. It has lost 2 or 3 bricks. In this case mount-1 has lost quorum and >> any operation going to that file should fail with EIO. AFR should send a >> special request to client xlator so that it forgets any fd's and locks for >> that file. If bricks reconnect after that, no fd reopen or lock recovery >> will happen. Eventually the application should close the fd and retry >> later. This may succeed to not, depending on whether mount-2 has taken the >> lock already or not. >> >> So, it's true that client xlator doesn't know the state of the other >> bricks, but it doesn't need to as long as AFR/EC strictly enforces quorum >> and updates client xlator when quorum is lost. >> > > This part seems good. > > >> >> I haven't worked out all the details of this approach, but I think it >> should work and it's simpler to maintain than trying to do the same for AFR >> and EC. >> > > Let us spend some time on this on #gluster-dev when you get some time > tomorrow to figure out the complete solution which handles the corner cases > too. > > >> >> Xavi >> >> >>> >>>> A similar thing could be done for open fd, since the current solution >>>> duplicates code in AFR and EC, but this is another topic... >>>> >>>> >>>>> >>>>>> >>>>>> Xavi >>>>>> >>>>>> >>>>>>> What I proposed is a short term solution. mid to long term solution >>>>>>> should be lock healing feature implemented in AFR/EC. 
In fact I had this >>>>>>> conversation with +Karampuri, Pranith before >>>>>>> posting this msg to ML. >>>>>>> >>>>>>> >>>>>>>> >>>>>>>>> However, this use case is not affected if the application don't >>>>>>>>> acquire any POSIX locks. So, I am interested in knowing >>>>>>>>> * whether your use cases use POSIX locks? >>>>>>>>> * Is it feasible for your application to re-open fds and >>>>>>>>> re-acquire locks on seeing EBADFD errors? >>>>>>>>> >>>>>>>> >>>>>>>> I think that many applications are not prepared to handle that. >>>>>>>> >>>>>>> >>>>>>> I too suspected that and in fact not too happy with the solution. >>>>>>> But went ahead with this mail as I heard implementing lock-heal in AFR >>>>>>> will take time and hence there are no alternative short term solutions. >>>>>>> >>>>>> >>>>>>> >>>>>>>> Xavi >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7 >>>>>>>>> >>>>>>>>> regards, >>>>>>>>> Raghavendra >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Gluster-users mailing list >>>>>>>>> Gluster-users at gluster.org >>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>> >>>>>>>> >>>>> >>>>> -- >>>>> Pranith >>>>> >>>> >>> >>> -- >>> Pranith >>> >> > > -- > Pranith > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sankarshan.mukhopadhyay at gmail.com Wed Mar 27 23:57:49 2019 From: sankarshan.mukhopadhyay at gmail.com (Sankarshan Mukhopadhyay) Date: Thu, 28 Mar 2019 05:27:49 +0530 Subject: [Gluster-users] Transport endpoint is not connected failures in In-Reply-To: <009a01d4e4b8$731a8420$594f8c60$@thinkhuge.net> References: <009a01d4e4b8$731a8420$594f8c60$@thinkhuge.net> Message-ID: On Wed, Mar 27, 2019 at 9:46 PM wrote: > > Hello Amar and list, > > > > I wanted to follow-up to confirm that upgrading to 5.5 seem to fix the ?Transport endpoint is not connected failures? for us. 
> > > > We did not have any of these failures in this past weekend backups cycle. > > > > Thank you very much for fixing whatever was the problem. As always, thank you for circling back to the list and sharing that the issues have been addressed. > > I also removed some volume config options. One or more of the settings was contributing to the slow directory listing. > > > > Here is our current volume info. > This is very useful! > > [root at lonbaknode3 ~]# gluster volume info > > > > Volume Name: volbackups > > Type: Distribute > > Volume ID: 32bf4fe9-5450-49f8-b6aa-05471d3bdffa > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 8 > > Transport-type: tcp > > Bricks: > > Brick1: lonbaknode3.domain.net:/lvbackups/brick > > Brick2: lonbaknode4.domain.net:/lvbackups/brick > > Brick3: lonbaknode5.domain.net:/lvbackups/brick > > Brick4: lonbaknode6.domain.net:/lvbackups/brick > > Brick5: lonbaknode7.domain.net:/lvbackups/brick > > Brick6: lonbaknode8.domain.net:/lvbackups/brick > > Brick7: lonbaknode9.domain.net:/lvbackups/brick > > Brick8: lonbaknode10.domain.net:/lvbackups/brick > > Options Reconfigured: > > performance.io-thread-count: 32 > > performance.client-io-threads: on > > client.event-threads: 8 > > diagnostics.brick-sys-log-level: WARNING > > diagnostics.brick-log-level: WARNING > > performance.cache-max-file-size: 2MB > > performance.cache-size: 256MB > > cluster.min-free-disk: 1% > > nfs.disable: on > > transport.address-family: inet > > server.event-threads: 8 > > [root at lonbaknode3 ~]# > From sankarshan.mukhopadhyay at gmail.com Thu Mar 28 00:00:44 2019 From: sankarshan.mukhopadhyay at gmail.com (Sankarshan Mukhopadhyay) Date: Thu, 28 Mar 2019 05:30:44 +0530 Subject: [Gluster-users] [Gluster-infra] Gluster HA In-Reply-To: <7fee8396-f524-68ef-3eab-e7c5461c9bbd@ingtegration.com> References: <7fee8396-f524-68ef-3eab-e7c5461c9bbd@ingtegration.com> Message-ID: [This email was originally posted to the gluster-infra list. 
Since that list is used for coordination between members who work on the infrastructure of the project, I am redirecting it to gluster-users for better visibility and responses.] On Thu, Mar 28, 2019 at 1:13 AM Guy Boisvert wrote: > > Hi, > > New to this mailing list. I'm seeking people advice for GlusterFS > HA in the context of KVM Virtual Machines (VM) storage. We have 3 x KVM > servers that use a 3 x GlusterFS nodes. The Volumes are 3 way replicate. > > My question is: You guys, what is your network architecture / setup > for GlusterFS HA? I read many articles on the internet. Many people are > talking about bonding to a switch but i don't consider this as a good > solution. I'd like to have Gluster and KVM servers linked to at least 2 > switches to have switch / wire and network car redundancy. > > I saw people using 2 x dumb switches with bonding mode 6 on their > servers with mii monitoring. It seems to be about good but it could > append that mii is up but frames / packets won't flow. So it this case, > i can't imagine how the servers would handle this. > > Another setup is dual dumb switches and running Quagga on the > servers (OSPF / ECMP). This seems to be the best setup, what do you > think? Do you have experience with one of those setups? What are your > thoughts on this? Ah and lastly, how can i search in the list? > > > Thanks! > > > Guy > > -- > Guy Boisvert, ing. > IngTegration inc. > http://www.ingtegration.com > https://www.linkedin.com/pub/guy-boisvert/7/48/899/fr > > AVIS DE CONFIDENTIALITE : ce message peut contenir des > renseignements confidentiels appartenant exclusivement a > IngTegration Inc. ou a ses filiales. Si vous n'etes pas > le destinataire indique ou prevu dans ce message (ou > responsable de livrer ce message a la personne indiquee ou > prevue) ou si vous pensez que ce message vous a ete adresse > par erreur, vous ne pouvez pas utiliser ou reproduire ce > message, ni le livrer a quelqu'un d'autre. 
Dans ce cas, vous > devez le detruire et vous etes prie d'avertir l'expediteur > en repondant au courriel. > > CONFIDENTIALITY NOTICE : Proprietary/Confidential Information > belonging to IngTegration Inc. and its affiliates may be > contained in this message. If you are not a recipient > indicated or intended in this message (or responsible for > delivery of this message to such person), or you think for > any reason that this message may have been addressed to you > in error, you may not use or copy or deliver this message to > anyone else. In such case, you should destroy this message > and are asked to notify the sender by reply email. From thing.thing at gmail.com Thu Mar 28 00:04:41 2019 From: thing.thing at gmail.com (Thing) Date: Thu, 28 Mar 2019 13:04:41 +1300 Subject: [Gluster-users] Transport endpoint is not connected failures in In-Reply-To: References: <009a01d4e4b8$731a8420$594f8c60$@thinkhuge.net> Message-ID: I have this issue, for a few days with my new setup. I will have to get back to you on versions but it was centos7.6 patched yesterday (27/3/2019). On Thu, 28 Mar 2019 at 12:58, Sankarshan Mukhopadhyay < sankarshan.mukhopadhyay at gmail.com> wrote: > On Wed, Mar 27, 2019 at 9:46 PM wrote: > > > > Hello Amar and list, > > > > > > > > I wanted to follow-up to confirm that upgrading to 5.5 seem to fix the > ?Transport endpoint is not connected failures? for us. > > > > > > > > We did not have any of these failures in this past weekend backups cycle. > > > > > > > > Thank you very much for fixing whatever was the problem. > > As always, thank you for circling back to the list and sharing that > the issues have been addressed. > > > > I also removed some volume config options. One or more of the settings > was contributing to the slow directory listing. > > > > > > > > Here is our current volume info. > > > > This is very useful! 
> > > > > [root at lonbaknode3 ~]# gluster volume info > > > > > > > > Volume Name: volbackups > > > > Type: Distribute > > > > Volume ID: 32bf4fe9-5450-49f8-b6aa-05471d3bdffa > > > > Status: Started > > > > Snapshot Count: 0 > > > > Number of Bricks: 8 > > > > Transport-type: tcp > > > > Bricks: > > > > Brick1: lonbaknode3.domain.net:/lvbackups/brick > > > > Brick2: lonbaknode4.domain.net:/lvbackups/brick > > > > Brick3: lonbaknode5.domain.net:/lvbackups/brick > > > > Brick4: lonbaknode6.domain.net:/lvbackups/brick > > > > Brick5: lonbaknode7.domain.net:/lvbackups/brick > > > > Brick6: lonbaknode8.domain.net:/lvbackups/brick > > > > Brick7: lonbaknode9.domain.net:/lvbackups/brick > > > > Brick8: lonbaknode10.domain.net:/lvbackups/brick > > > > Options Reconfigured: > > > > performance.io-thread-count: 32 > > > > performance.client-io-threads: on > > > > client.event-threads: 8 > > > > diagnostics.brick-sys-log-level: WARNING > > > > diagnostics.brick-log-level: WARNING > > > > performance.cache-max-file-size: 2MB > > > > performance.cache-size: 256MB > > > > cluster.min-free-disk: 1% > > > > nfs.disable: on > > > > transport.address-family: inet > > > > server.event-threads: 8 > > > > [root at lonbaknode3 ~]# > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rgowdapp at redhat.com Thu Mar 28 01:54:03 2019 From: rgowdapp at redhat.com (Raghavendra Gowdappa) Date: Thu, 28 Mar 2019 07:24:03 +0530 Subject: [Gluster-users] Transport endpoint is not connected failures in In-Reply-To: <009a01d4e4b8$731a8420$594f8c60$@thinkhuge.net> References: <009a01d4e4b8$731a8420$594f8c60$@thinkhuge.net> Message-ID: On Wed, Mar 27, 2019 at 9:46 PM wrote: > Hello Amar and list, > > > > I wanted to follow-up to confirm that upgrading to 5.5 seem to fix the > ?Transport endpoint is not connected failures? for us. > What was the version you saw failures in? Were there any logs matching with the pattern "ping_timer_expired" earlier? > > We did not have any of these failures in this past weekend backups cycle. > > > > Thank you very much for fixing whatever was the problem. > > > > I also removed some volume config options. One or more of the settings > was contributing to the slow directory listing. > > > > Here is our current volume info. > > > > [root at lonbaknode3 ~]# gluster volume info > > > > Volume Name: volbackups > > Type: Distribute > > Volume ID: 32bf4fe9-5450-49f8-b6aa-05471d3bdffa > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 8 > > Transport-type: tcp > > Bricks: > > Brick1: lonbaknode3.domain.net:/lvbackups/brick > > Brick2: lonbaknode4.domain.net:/lvbackups/brick > > Brick3: lonbaknode5.domain.net:/lvbackups/brick > > Brick4: lonbaknode6.domain.net:/lvbackups/brick > > Brick5: lonbaknode7.domain.net:/lvbackups/brick > > Brick6: lonbaknode8.domain.net:/lvbackups/brick > > Brick7: lonbaknode9.domain.net:/lvbackups/brick > > Brick8: lonbaknode10.domain.net:/lvbackups/brick > > Options Reconfigured: > > performance.io-thread-count: 32 > > performance.client-io-threads: on > > client.event-threads: 8 > > diagnostics.brick-sys-log-level: WARNING > > diagnostics.brick-log-level: WARNING > > performance.cache-max-file-size: 2MB > > performance.cache-size: 256MB > > cluster.min-free-disk: 1% > > nfs.disable: 
on > > transport.address-family: inet > > server.event-threads: 8 > > [root at lonbaknode3 ~]# > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From rgowdapp at redhat.com Thu Mar 28 02:04:48 2019 From: rgowdapp at redhat.com (Raghavendra Gowdappa) Date: Thu, 28 Mar 2019 07:34:48 +0530 Subject: [Gluster-users] POSIX locks and disconnections between clients and bricks In-Reply-To: References: Message-ID: On Wed, Mar 27, 2019 at 8:38 PM Xavi Hernandez wrote: > On Wed, Mar 27, 2019 at 2:20 PM Pranith Kumar Karampuri < > pkarampu at redhat.com> wrote: > >> >> >> On Wed, Mar 27, 2019 at 6:38 PM Xavi Hernandez >> wrote: >> >>> On Wed, Mar 27, 2019 at 1:13 PM Pranith Kumar Karampuri < >>> pkarampu at redhat.com> wrote: >>> >>>> >>>> >>>> On Wed, Mar 27, 2019 at 5:13 PM Xavi Hernandez >>>> wrote: >>>> >>>>> On Wed, Mar 27, 2019 at 11:52 AM Raghavendra Gowdappa < >>>>> rgowdapp at redhat.com> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez >>>>>> wrote: >>>>>> >>>>>>> Hi Raghavendra, >>>>>>> >>>>>>> On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa < >>>>>>> rgowdapp at redhat.com> wrote: >>>>>>> >>>>>>>> All, >>>>>>>> >>>>>>>> Glusterfs cleans up POSIX locks held on an fd when the client/mount >>>>>>>> through which those locks are held disconnects from bricks/server. This >>>>>>>> helps Glusterfs to not run into a stale lock problem later (For eg., if >>>>>>>> application unlocks while the connection was still down). However, this >>>>>>>> means the lock is no longer exclusive as other applications/clients can >>>>>>>> acquire the same lock. 
To communicate that locks are no longer valid, we >>>>>>>> are planning to mark the fd (which has POSIX locks) bad on a disconnect so >>>>>>>> that any future operations on that fd will fail, forcing the application to >>>>>>>> re-open the fd and re-acquire locks it needs [1]. >>>>>>>> >>>>>>> >>>>>>> Wouldn't it be better to retake the locks when the brick is >>>>>>> reconnected if the lock is still in use ? >>>>>>> >>>>>> >>>>>> There is also a possibility that clients may never reconnect. That's >>>>>> the primary reason why bricks assume the worst (client will not reconnect) >>>>>> and cleanup the locks. >>>>>> >>>>> >>>>> True, so it's fine to cleanup the locks. I'm not saying that locks >>>>> shouldn't be released on disconnect. The assumption is that if the client >>>>> has really died, it will also disconnect from other bricks, who will >>>>> release the locks. So, eventually, another client will have enough quorum >>>>> to attempt a lock that will succeed. In other words, if a client gets >>>>> disconnected from too many bricks simultaneously (loses Quorum), then that >>>>> client can be considered as bad and can return errors to the application. >>>>> This should also cause to release the locks on the remaining connected >>>>> bricks. >>>>> >>>>> On the other hand, if the disconnection is very short and the client >>>>> has not died, it will keep enough locked files (it has quorum) to avoid >>>>> other clients to successfully acquire a lock. In this case, if the brick is >>>>> reconnected, all existing locks should be reacquired to recover the >>>>> original state before the disconnection. >>>>> >>>>> >>>>>> >>>>>>> BTW, the referenced bug is not public. Should we open another bug to >>>>>>> track this ? >>>>>>> >>>>>> >>>>>> I've just opened up the comment to give enough context. I'll open a >>>>>> bug upstream too. 
>>>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> Note that with AFR/replicate in picture we can prevent errors to >>>>>>>> application as long as Quorum number of children "never ever" lost >>>>>>>> connection with bricks after locks have been acquired. I am using the term >>>>>>>> "never ever" as locks are not healed back after re-connection and hence >>>>>>>> first disconnect would've marked the fd bad and the fd remains so even >>>>>>>> after re-connection happens. So, its not just Quorum number of children >>>>>>>> "currently online", but Quorum number of children "never having >>>>>>>> disconnected with bricks after locks are acquired". >>>>>>>> >>>>>>> >>>>>>> I think this requisite is not feasible. In a distributed file >>>>>>> system, sooner or later all bricks will be disconnected. It could be >>>>>>> because of failures or because an upgrade is done, but it will happen. >>>>>>> >>>>>>> The difference here is how long are fd's kept open. If applications >>>>>>> open and close files frequently enough (i.e. the fd is not kept open more >>>>>>> time than it takes to have more than Quorum bricks disconnected) then >>>>>>> there's no problem. The problem can only appear on applications that open >>>>>>> files for a long time and also use posix locks. In this case, the only good >>>>>>> solution I see is to retake the locks on brick reconnection. >>>>>>> >>>>>> >>>>>> Agree. But lock-healing should be done only by HA layers like AFR/EC >>>>>> as only they know whether there are enough online bricks to have prevented >>>>>> any conflicting lock. Protocol/client itself doesn't have enough >>>>>> information to do that. If its a plain distribute, I don't see a way to >>>>>> heal locks without loosing the property of exclusivity of locks. >>>>>> >>>>> >>>>> Lock-healing of locks acquired while a brick was disconnected need to >>>>> be handled by AFR/EC. 
However, locks already present at the moment of >>>>> disconnection could be recovered by client xlator itself as long as the >>>>> file has not been closed (which client xlator already knows). >>>>> >>>> >>>> What if another client (say mount-2) took locks at the time of >>>> disconnect from mount-1 and modified the file and unlocked? client xlator >>>> doing the heal may not be a good idea. >>>> >>> >>> To avoid that we should ensure that any lock/unlocks are sent to the >>> client, even if we know it's disconnected, so that client xlator can track >>> them. The alternative is to duplicate and maintain code both on AFR and EC >>> (and not sure if even in DHT depending on how we want to handle some >>> cases). >>> >> >> Didn't understand the solution. I wanted to highlight that client xlator >> by itself can't make a decision about healing locks because it doesn't know >> what happened on other replicas. If we have replica-3 volume and all 3 >> bricks get disconnected to their respective bricks. Now another mount >> process can take a lock on that file modify it and unlock. Now upon >> reconnection, the old mount process which had locks would think it always >> had the lock if client xlator independently tries to heal its own locks >> because file is not closed on it so far. But that is wrong. Let me know if >> it makes sense.... >> > > My point of view is that any configuration with these requirements will > have an appropriate quorum value so that it's impossible to have two or > more partitions of the nodes working at the same time. So, under this > assumptions, mount-1 can be in two situations: > > 1. It has lost a single brick and it's still operational. The other bricks > will continue locked and everything should work fine from the point of view > of the application. Any other application trying to get a lock will fail > due to lack of quorum. 
When the lost brick comes back and is reconnected, > client xlator will still have the fd reference and locks taken (unless the > application has released the lock or closed the fd, in which case client > xlator should get notified and clear that information), so it should be > able to recover the previous state. > > 2. It has lost 2 or 3 bricks. In this case mount-1 has lost quorum and any > operation going to that file should fail with EIO. AFR should send a > special request to client xlator so that it forgets any fd's and locks for > that file. If bricks reconnect after that, no fd reopen or lock recovery > will happen. Eventually the application should close the fd and retry > later. This may succeed or not, depending on whether mount-2 has taken the > lock already or not. > > So, it's true that client xlator doesn't know the state of the other > bricks, but it doesn't need to as long as AFR/EC strictly enforces quorum > and updates client xlator when quorum is lost. > Just curious. Is there any reason why you think delegating the actual responsibility of re-opening or forgetting the locks to protocol/client is better when compared to AFR/EC doing the actual work of re-opening files and reacquiring locks? Asking this because, in the case of plain distribute, DHT will also have to indicate Quorum loss on every disconnect (as Quorum consisted of just 1 brick). From what I understand, the design is the same one which Pranith, Anoop, Vijay and I had discussed (in essence) but varies in implementation details. > I haven't worked out all the details of this approach, but I think it > should work and it's simpler to maintain than trying to do the same for AFR > and EC. > > Xavi > > >> >>> A similar thing could be done for open fd, since the current solution >>> duplicates code in AFR and EC, but this is another topic... >>> >>> >>>> >>>>> >>>>> Xavi >>>>> >>>>> >>>>>> What I proposed is a short term solution. 
mid to long term solution >>>>>> should be lock healing feature implemented in AFR/EC. In fact I had this >>>>>> conversation with +Karampuri, Pranith before >>>>>> posting this msg to ML. >>>>>> >>>>>> >>>>>>> >>>>>>>> However, this use case is not affected if the application doesn't >>>>>>>> acquire any POSIX locks. So, I am interested in knowing >>>>>>>> * whether your use cases use POSIX locks? >>>>>>>> * Is it feasible for your application to re-open fds and re-acquire >>>>>>>> locks on seeing EBADFD errors? >>>>>>>> >>>>>>> >>>>>>> I think that many applications are not prepared to handle that. >>>>>>> >>>>>> >>>>>> I too suspected that and in fact not too happy with the solution. But >>>>>> went ahead with this mail as I heard implementing lock-heal in AFR will >>>>>> take time and hence there are no alternative short term solutions. >>>>> >>>>>> >>>>>>> Xavi >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7 >>>>>>>> >>>>>>>> regards, >>>>>>>> Raghavendra >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Gluster-users mailing list >>>>>>>> Gluster-users at gluster.org >>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>> >>>>>>> >>>> >>>> -- >>>> Pranith >>>> >>> >> >> -- >> Pranith >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nbalacha at redhat.com Thu Mar 28 03:12:47 2019 From: nbalacha at redhat.com (Nithya Balachandran) Date: Thu, 28 Mar 2019 08:42:47 +0530 Subject: [Gluster-users] Transport endpoint is not connected failures in In-Reply-To: <009a01d4e4b8$731a8420$594f8c60$@thinkhuge.net> References: <009a01d4e4b8$731a8420$594f8c60$@thinkhuge.net> Message-ID: On Wed, 27 Mar 2019 at 21:47, wrote: > Hello Amar and list, > > > > I wanted to follow-up to confirm that upgrading to 5.5 seems to fix the > "Transport endpoint is not connected" failures for us. 
> > > > We did not have any of these failures in this past weekend backups cycle. > > > > Thank you very much for fixing whatever was the problem. > > > > I also removed some volume config options. One or more of the settings > was contributing to the slow directory listing. > Hi Brandon, Which options were removed? Thanks, Nithya > > > Here is our current volume info. > > > > [root at lonbaknode3 ~]# gluster volume info > > > > Volume Name: volbackups > > Type: Distribute > > Volume ID: 32bf4fe9-5450-49f8-b6aa-05471d3bdffa > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 8 > > Transport-type: tcp > > Bricks: > > Brick1: lonbaknode3.domain.net:/lvbackups/brick > > Brick2: lonbaknode4.domain.net:/lvbackups/brick > > Brick3: lonbaknode5.domain.net:/lvbackups/brick > > Brick4: lonbaknode6.domain.net:/lvbackups/brick > > Brick5: lonbaknode7.domain.net:/lvbackups/brick > > Brick6: lonbaknode8.domain.net:/lvbackups/brick > > Brick7: lonbaknode9.domain.net:/lvbackups/brick > > Brick8: lonbaknode10.domain.net:/lvbackups/brick > > Options Reconfigured: > > performance.io-thread-count: 32 > > performance.client-io-threads: on > > client.event-threads: 8 > > diagnostics.brick-sys-log-level: WARNING > > diagnostics.brick-log-level: WARNING > > performance.cache-max-file-size: 2MB > > performance.cache-size: 256MB > > cluster.min-free-disk: 1% > > nfs.disable: on > > transport.address-family: inet > > server.event-threads: 8 > > [root at lonbaknode3 ~]# > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jahernan at redhat.com Thu Mar 28 09:07:33 2019 From: jahernan at redhat.com (Xavi Hernandez) Date: Thu, 28 Mar 2019 10:07:33 +0100 Subject: [Gluster-users] POSIX locks and disconnections between clients and bricks In-Reply-To: References: Message-ID: On Thu, Mar 28, 2019 at 3:05 AM Raghavendra Gowdappa wrote: > > > On Wed, Mar 27, 2019 at 8:38 PM Xavi Hernandez > wrote: > >> On Wed, Mar 27, 2019 at 2:20 PM Pranith Kumar Karampuri < >> pkarampu at redhat.com> wrote: >> >>> >>> >>> On Wed, Mar 27, 2019 at 6:38 PM Xavi Hernandez >>> wrote: >>> >>>> On Wed, Mar 27, 2019 at 1:13 PM Pranith Kumar Karampuri < >>>> pkarampu at redhat.com> wrote: >>>> >>>>> >>>>> >>>>> On Wed, Mar 27, 2019 at 5:13 PM Xavi Hernandez >>>>> wrote: >>>>> >>>>>> On Wed, Mar 27, 2019 at 11:52 AM Raghavendra Gowdappa < >>>>>> rgowdapp at redhat.com> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Raghavendra, >>>>>>>> >>>>>>>> On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa < >>>>>>>> rgowdapp at redhat.com> wrote: >>>>>>>> >>>>>>>>> All, >>>>>>>>> >>>>>>>>> Glusterfs cleans up POSIX locks held on an fd when the >>>>>>>>> client/mount through which those locks are held disconnects from >>>>>>>>> bricks/server. This helps Glusterfs to not run into a stale lock problem >>>>>>>>> later (For eg., if application unlocks while the connection was still >>>>>>>>> down). However, this means the lock is no longer exclusive as other >>>>>>>>> applications/clients can acquire the same lock. To communicate that locks >>>>>>>>> are no longer valid, we are planning to mark the fd (which has POSIX locks) >>>>>>>>> bad on a disconnect so that any future operations on that fd will fail, >>>>>>>>> forcing the application to re-open the fd and re-acquire locks it needs [1]. >>>>>>>>> >>>>>>>> >>>>>>>> Wouldn't it be better to retake the locks when the brick is >>>>>>>> reconnected if the lock is still in use ? 
>>>>>>>> >>>>>>> >>>>>>> There is also a possibility that clients may never reconnect. >>>>>>> That's the primary reason why bricks assume the worst (client will not >>>>>>> reconnect) and cleanup the locks. >>>>>>> >>>>>> >>>>>> True, so it's fine to cleanup the locks. I'm not saying that locks >>>>>> shouldn't be released on disconnect. The assumption is that if the client >>>>>> has really died, it will also disconnect from other bricks, who will >>>>>> release the locks. So, eventually, another client will have enough quorum >>>>>> to attempt a lock that will succeed. In other words, if a client gets >>>>>> disconnected from too many bricks simultaneously (loses Quorum), then that >>>>>> client can be considered as bad and can return errors to the application. >>>>>> This should also cause to release the locks on the remaining connected >>>>>> bricks. >>>>>> >>>>>> On the other hand, if the disconnection is very short and the client >>>>>> has not died, it will keep enough locked files (it has quorum) to avoid >>>>>> other clients to successfully acquire a lock. In this case, if the brick is >>>>>> reconnected, all existing locks should be reacquired to recover the >>>>>> original state before the disconnection. >>>>>> >>>>>> >>>>>>> >>>>>>>> BTW, the referenced bug is not public. Should we open another bug >>>>>>>> to track this ? >>>>>>>> >>>>>>> >>>>>>> I've just opened up the comment to give enough context. I'll open a >>>>>>> bug upstream too. >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> Note that with AFR/replicate in picture we can prevent errors to >>>>>>>>> application as long as Quorum number of children "never ever" lost >>>>>>>>> connection with bricks after locks have been acquired. I am using the term >>>>>>>>> "never ever" as locks are not healed back after re-connection and hence >>>>>>>>> first disconnect would've marked the fd bad and the fd remains so even >>>>>>>>> after re-connection happens. 
So, its not just Quorum number of children >>>>>>>>> "currently online", but Quorum number of children "never having >>>>>>>>> disconnected with bricks after locks are acquired". >>>>>>>>> >>>>>>>> >>>>>>>> I think this requisite is not feasible. In a distributed file >>>>>>>> system, sooner or later all bricks will be disconnected. It could be >>>>>>>> because of failures or because an upgrade is done, but it will happen. >>>>>>>> >>>>>>>> The difference here is how long are fd's kept open. If applications >>>>>>>> open and close files frequently enough (i.e. the fd is not kept open more >>>>>>>> time than it takes to have more than Quorum bricks disconnected) then >>>>>>>> there's no problem. The problem can only appear on applications that open >>>>>>>> files for a long time and also use posix locks. In this case, the only good >>>>>>>> solution I see is to retake the locks on brick reconnection. >>>>>>>> >>>>>>> >>>>>>> Agree. But lock-healing should be done only by HA layers like AFR/EC >>>>>>> as only they know whether there are enough online bricks to have prevented >>>>>>> any conflicting lock. Protocol/client itself doesn't have enough >>>>>>> information to do that. If its a plain distribute, I don't see a way to >>>>>>> heal locks without loosing the property of exclusivity of locks. >>>>>>> >>>>>> >>>>>> Lock-healing of locks acquired while a brick was disconnected need to >>>>>> be handled by AFR/EC. However, locks already present at the moment of >>>>>> disconnection could be recovered by client xlator itself as long as the >>>>>> file has not been closed (which client xlator already knows). >>>>>> >>>>> >>>>> What if another client (say mount-2) took locks at the time of >>>>> disconnect from mount-1 and modified the file and unlocked? client xlator >>>>> doing the heal may not be a good idea. 
>>>>> >>>> >>>> To avoid that we should ensure that any lock/unlocks are sent to the >>>> client, even if we know it's disconnected, so that client xlator can track >>>> them. The alternative is to duplicate and maintain code both on AFR and EC >>>> (and not sure if even in DHT depending on how we want to handle some >>>> cases). >>>> >>> >>> Didn't understand the solution. I wanted to highlight that client xlator >>> by itself can't make a decision about healing locks because it doesn't know >>> what happened on other replicas. If we have replica-3 volume and all 3 >>> bricks get disconnected to their respective bricks. Now another mount >>> process can take a lock on that file modify it and unlock. Now upon >>> reconnection, the old mount process which had locks would think it always >>> had the lock if client xlator independently tries to heal its own locks >>> because file is not closed on it so far. But that is wrong. Let me know if >>> it makes sense.... >>> >> >> My point of view is that any configuration with these requirements will >> have an appropriate quorum value so that it's impossible to have two or >> more partitions of the nodes working at the same time. So, under this >> assumptions, mount-1 can be in two situations: >> >> 1. It has lost a single brick and it's still operational. The other >> bricks will continue locked and everything should work fine from the point >> of view of the application. Any other application trying to get a lock will >> fail due to lack of quorum. When the lost brick comes back and is >> reconnected, client xlator will still have the fd reference and locks taken >> (unless the application has released the lock or closed the fd, in which >> case client xlator should get notified and clear that information), so it >> should be able to recover the previous state. >> >> 2. It has lost 2 or 3 bricks. In this case mount-1 has lost quorum and >> any operation going to that file should fail with EIO. 
AFR should send a >> special request to client xlator so that it forgets any fd's and locks for >> that file. If bricks reconnect after that, no fd reopen or lock recovery >> will happen. Eventually the application should close the fd and retry >> later. This may succeed to not, depending on whether mount-2 has taken the >> lock already or not. >> >> So, it's true that client xlator doesn't know the state of the other >> bricks, but it doesn't need to as long as AFR/EC strictly enforces quorum >> and updates client xlator when quorum is lost. >> > > Just curious. Is there any reason why you think delegating the actual > responsibility of re-opening or forgetting the locks to protocol/client is > better when compared to AFR/EC doing the actual work of re-opening files > and reacquiring locks? Asking this because, in the case of plain > distribute, DHT will also have to indicate Quorum loss on every disconnect > (as Quorum consisted of just 1 brick). > The basic reason is that doing that on AFR and EC requires code duplication. The code is not expected to be simple either, so it can contain bugs or it could require improvements eventually. Every time we want to do a change, we should fix both AFR and EC, but this has not happened in many cases in the past on features that are already duplicated in AFR and EC, so it's quite unlikely that this will happen in the future. Regarding the requirement of sending a quorum loss notification from DHT, I agree it's a new thing, but it's way simpler to do than the fd and lock heal logic. Xavi > From what I understand, the design is the same one which me, Pranith, > Anoop and Vijay had discussed (in essence) but varies in implementation > details. > > >> I haven't worked out all the details of this approach, but I think it >> should work and it's simpler to maintain than trying to do the same for AFR >> and EC. 
>> >> Xavi >> >> >>> >>>> A similar thing could be done for open fd, since the current solution >>>> duplicates code in AFR and EC, but this is another topic... >>>> >>>> >>>>> >>>>>> >>>>>> Xavi >>>>>> >>>>>> >>>>>>> What I proposed is a short term solution. mid to long term solution >>>>>>> should be lock healing feature implemented in AFR/EC. In fact I had this >>>>>>> conversation with +Karampuri, Pranith before >>>>>>> posting this msg to ML. >>>>>>> >>>>>>> >>>>>>>> >>>>>>>>> However, this use case is not affected if the application don't >>>>>>>>> acquire any POSIX locks. So, I am interested in knowing >>>>>>>>> * whether your use cases use POSIX locks? >>>>>>>>> * Is it feasible for your application to re-open fds and >>>>>>>>> re-acquire locks on seeing EBADFD errors? >>>>>>>>> >>>>>>>> >>>>>>>> I think that many applications are not prepared to handle that. >>>>>>>> >>>>>>> >>>>>>> I too suspected that and in fact not too happy with the solution. >>>>>>> But went ahead with this mail as I heard implementing lock-heal in AFR >>>>>>> will take time and hence there are no alternative short term solutions. >>>>>>> >>>>>> >>>>>>> >>>>>>>> Xavi >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7 >>>>>>>>> >>>>>>>>> regards, >>>>>>>>> Raghavendra >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Gluster-users mailing list >>>>>>>>> Gluster-users at gluster.org >>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>> >>>>>>>> >>>>> >>>>> -- >>>>> Pranith >>>>> >>>> >>> >>> -- >>> Pranith >>> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rgowdapp at redhat.com Thu Mar 28 09:18:12 2019 From: rgowdapp at redhat.com (Raghavendra Gowdappa) Date: Thu, 28 Mar 2019 14:48:12 +0530 Subject: [Gluster-users] POSIX locks and disconnections between clients and bricks In-Reply-To: References: Message-ID: On Thu, Mar 28, 2019 at 2:37 PM Xavi Hernandez wrote: > On Thu, Mar 28, 2019 at 3:05 AM Raghavendra Gowdappa > wrote: > >> >> >> On Wed, Mar 27, 2019 at 8:38 PM Xavi Hernandez >> wrote: >> >>> On Wed, Mar 27, 2019 at 2:20 PM Pranith Kumar Karampuri < >>> pkarampu at redhat.com> wrote: >>> >>>> >>>> >>>> On Wed, Mar 27, 2019 at 6:38 PM Xavi Hernandez >>>> wrote: >>>> >>>>> On Wed, Mar 27, 2019 at 1:13 PM Pranith Kumar Karampuri < >>>>> pkarampu at redhat.com> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Wed, Mar 27, 2019 at 5:13 PM Xavi Hernandez >>>>>> wrote: >>>>>> >>>>>>> On Wed, Mar 27, 2019 at 11:52 AM Raghavendra Gowdappa < >>>>>>> rgowdapp at redhat.com> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez < >>>>>>>> jahernan at redhat.com> wrote: >>>>>>>> >>>>>>>>> Hi Raghavendra, >>>>>>>>> >>>>>>>>> On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa < >>>>>>>>> rgowdapp at redhat.com> wrote: >>>>>>>>> >>>>>>>>>> All, >>>>>>>>>> >>>>>>>>>> Glusterfs cleans up POSIX locks held on an fd when the >>>>>>>>>> client/mount through which those locks are held disconnects from >>>>>>>>>> bricks/server. This helps Glusterfs to not run into a stale lock problem >>>>>>>>>> later (For eg., if application unlocks while the connection was still >>>>>>>>>> down). However, this means the lock is no longer exclusive as other >>>>>>>>>> applications/clients can acquire the same lock. To communicate that locks >>>>>>>>>> are no longer valid, we are planning to mark the fd (which has POSIX locks) >>>>>>>>>> bad on a disconnect so that any future operations on that fd will fail, >>>>>>>>>> forcing the application to re-open the fd and re-acquire locks it needs [1]. 
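[Editor's note: the application-side recovery implied by the proposal just quoted -- re-open the fd and re-acquire locks on a bad-fd error -- looks roughly like the self-contained sketch below. It uses plain POSIX fcntl locks on a local file and approximates gluster's EBADFD with the EBADF the OS raises for a closed descriptor; all function and variable names are illustrative, not GlusterFS API.]

```python
import errno
import fcntl
import os
import tempfile

def lock_with_retry(path, fd):
    """Lock fd; if the descriptor has gone bad, reopen the file and re-lock.

    This is the pattern an application would need once glusterfs starts
    marking fds bad after a disconnect: every EBADF/EBADFD means
    "reopen, then re-acquire the locks you still need".
    """
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd
    except OSError as e:
        bad = {errno.EBADF, getattr(errno, "EBADFD", errno.EBADF)}
        if e.errno not in bad:
            raise
        fd = os.open(path, os.O_RDWR)                    # re-open ...
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)   # ... re-acquire
        return fd

tmp_fd, path = tempfile.mkstemp()
os.close(tmp_fd)

fd = os.open(path, os.O_RDWR)
fd = lock_with_retry(path, fd)    # locks fine the first time
print("locked")

os.close(fd)                      # simulate the fd being marked bad
fd = lock_with_retry(path, fd)    # bad-fd error -> reopen + re-acquire
print("re-locked after bad fd")

os.close(fd)
os.remove(path)
```

Whether this retry loop is acceptable depends on the application; as noted elsewhere in the thread, many programs that use POSIX locks are not written to recover like this.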
>>>>>>>>>> >>>>>>>>> >>>>>>>>> Wouldn't it be better to retake the locks when the brick is >>>>>>>>> reconnected if the lock is still in use ? >>>>>>>>> >>>>>>>> >>>>>>>> There is also a possibility that clients may never reconnect. >>>>>>>> That's the primary reason why bricks assume the worst (client will not >>>>>>>> reconnect) and cleanup the locks. >>>>>>>> >>>>>>> >>>>>>> True, so it's fine to cleanup the locks. I'm not saying that locks >>>>>>> shouldn't be released on disconnect. The assumption is that if the client >>>>>>> has really died, it will also disconnect from other bricks, who will >>>>>>> release the locks. So, eventually, another client will have enough quorum >>>>>>> to attempt a lock that will succeed. In other words, if a client gets >>>>>>> disconnected from too many bricks simultaneously (loses Quorum), then that >>>>>>> client can be considered as bad and can return errors to the application. >>>>>>> This should also cause to release the locks on the remaining connected >>>>>>> bricks. >>>>>>> >>>>>>> On the other hand, if the disconnection is very short and the client >>>>>>> has not died, it will keep enough locked files (it has quorum) to avoid >>>>>>> other clients to successfully acquire a lock. In this case, if the brick is >>>>>>> reconnected, all existing locks should be reacquired to recover the >>>>>>> original state before the disconnection. >>>>>>> >>>>>>> >>>>>>>> >>>>>>>>> BTW, the referenced bug is not public. Should we open another bug >>>>>>>>> to track this ? >>>>>>>>> >>>>>>>> >>>>>>>> I've just opened up the comment to give enough context. I'll open a >>>>>>>> bug upstream too. >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Note that with AFR/replicate in picture we can prevent errors to >>>>>>>>>> application as long as Quorum number of children "never ever" lost >>>>>>>>>> connection with bricks after locks have been acquired. 
I am using the term >>>>>>>>>> "never ever" as locks are not healed back after re-connection and hence >>>>>>>>>> first disconnect would've marked the fd bad and the fd remains so even >>>>>>>>>> after re-connection happens. So, its not just Quorum number of children >>>>>>>>>> "currently online", but Quorum number of children "never having >>>>>>>>>> disconnected with bricks after locks are acquired". >>>>>>>>>> >>>>>>>>> >>>>>>>>> I think this requisite is not feasible. In a distributed file >>>>>>>>> system, sooner or later all bricks will be disconnected. It could be >>>>>>>>> because of failures or because an upgrade is done, but it will happen. >>>>>>>>> >>>>>>>>> The difference here is how long are fd's kept open. If >>>>>>>>> applications open and close files frequently enough (i.e. the fd is not >>>>>>>>> kept open more time than it takes to have more than Quorum bricks >>>>>>>>> disconnected) then there's no problem. The problem can only appear on >>>>>>>>> applications that open files for a long time and also use posix locks. In >>>>>>>>> this case, the only good solution I see is to retake the locks on brick >>>>>>>>> reconnection. >>>>>>>>> >>>>>>>> >>>>>>>> Agree. But lock-healing should be done only by HA layers like >>>>>>>> AFR/EC as only they know whether there are enough online bricks to have >>>>>>>> prevented any conflicting lock. Protocol/client itself doesn't have enough >>>>>>>> information to do that. If its a plain distribute, I don't see a way to >>>>>>>> heal locks without loosing the property of exclusivity of locks. >>>>>>>> >>>>>>> >>>>>>> Lock-healing of locks acquired while a brick was disconnected need >>>>>>> to be handled by AFR/EC. However, locks already present at the moment of >>>>>>> disconnection could be recovered by client xlator itself as long as the >>>>>>> file has not been closed (which client xlator already knows). 
>>>>>>> >>>>>> >>>>>> What if another client (say mount-2) took locks at the time of >>>>>> disconnect from mount-1 and modified the file and unlocked? client xlator >>>>>> doing the heal may not be a good idea. >>>>>> >>>>> >>>>> To avoid that we should ensure that any lock/unlocks are sent to the >>>>> client, even if we know it's disconnected, so that client xlator can track >>>>> them. The alternative is to duplicate and maintain code both on AFR and EC >>>>> (and not sure if even in DHT depending on how we want to handle some >>>>> cases). >>>>> >>>> >>>> Didn't understand the solution. I wanted to highlight that client >>>> xlator by itself can't make a decision about healing locks because it >>>> doesn't know what happened on other replicas. If we have replica-3 volume >>>> and all 3 bricks get disconnected to their respective bricks. Now another >>>> mount process can take a lock on that file modify it and unlock. Now upon >>>> reconnection, the old mount process which had locks would think it always >>>> had the lock if client xlator independently tries to heal its own locks >>>> because file is not closed on it so far. But that is wrong. Let me know if >>>> it makes sense.... >>>> >>> >>> My point of view is that any configuration with these requirements will >>> have an appropriate quorum value so that it's impossible to have two or >>> more partitions of the nodes working at the same time. So, under this >>> assumptions, mount-1 can be in two situations: >>> >>> 1. It has lost a single brick and it's still operational. The other >>> bricks will continue locked and everything should work fine from the point >>> of view of the application. Any other application trying to get a lock will >>> fail due to lack of quorum. 
When the lost brick comes back and is >>> reconnected, client xlator will still have the fd reference and locks taken >>> (unless the application has released the lock or closed the fd, in which >>> case client xlator should get notified and clear that information), so it >>> should be able to recover the previous state. >>> >>> 2. It has lost 2 or 3 bricks. In this case mount-1 has lost quorum and >>> any operation going to that file should fail with EIO. AFR should send a >>> special request to client xlator so that it forgets any fd's and locks for >>> that file. If bricks reconnect after that, no fd reopen or lock recovery >>> will happen. Eventually the application should close the fd and retry >>> later. This may succeed to not, depending on whether mount-2 has taken the >>> lock already or not. >>> >>> So, it's true that client xlator doesn't know the state of the other >>> bricks, but it doesn't need to as long as AFR/EC strictly enforces quorum >>> and updates client xlator when quorum is lost. >>> >> >> Just curious. Is there any reason why you think delegating the actual >> responsibility of re-opening or forgetting the locks to protocol/client is >> better when compared to AFR/EC doing the actual work of re-opening files >> and reacquiring locks? Asking this because, in the case of plain >> distribute, DHT will also have to indicate Quorum loss on every disconnect >> (as Quorum consisted of just 1 brick). >> > > The basic reason is that doing that on AFR and EC requires code > duplication. The code is not expected to be simple either, so it can > contain bugs or it could require improvements eventually. Every time we > want to do a change, we should fix both AFR and EC, but this has not > happened in many cases in the past on features that are already duplicated > in AFR and EC, so it's quite unlikely that this will happen in the future. > That's a good reason. +1. 
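[Editor's note: the failure mode Pranith describes -- a client independently replaying its own locks after a full disconnect -- can be made concrete with a toy model. Everything below is a hypothetical sketch for the discussion, not GlusterFS code: a per-fd lock table in the client, with the quorum-loss notification from AFR/EC as the thing that prevents the wrong replay.]

```python
class ClientLockTable:
    """Toy per-fd lock tracking in a client translator (hypothetical)."""

    def __init__(self):
        self.locks = {}            # fd -> set of locked byte ranges

    def acquire(self, fd, region):
        self.locks.setdefault(fd, set()).add(region)

    def on_quorum_loss(self):
        # AFR/EC says quorum is gone: forget everything; the application
        # sees EIO and must reopen and re-lock explicitly.
        self.locks.clear()

    def on_reconnect(self):
        # Naive replay: re-acquire whatever is still tracked.
        return {fd: set(r) for fd, r in self.locks.items()}


# mount-1 loses ALL bricks; the servers clean up its locks. Meanwhile
# mount-2 locks the same region, modifies the file and unlocks.
m1 = ClientLockTable()
m1.acquire(fd=3, region=(0, 100))
replayed = m1.on_reconnect()       # independent replay, no quorum check
print((0, 100) in replayed[3])     # True: m1 wrongly believes it always
                                   # held the lock -- the bug Pranith raised

# With the quorum notification, AFR/EC intervenes before any replay:
m1.on_quorum_loss()
print(m1.on_reconnect())           # {} -- nothing stale is replayed
```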
> > Regarding the requirement of sending a quorum loss notification from DHT, > I agree it's a new thing, but it's way simpler to do than the fd and lock > heal logic. > > Xavi > > >> From what I understand, the design is the same one which me, Pranith, >> Anoop and Vijay had discussed (in essence) but varies in implementation >> details. >> >> >>> I haven't worked out all the details of this approach, but I think it >>> should work and it's simpler to maintain than trying to do the same for AFR >>> and EC. >>> >>> Xavi >>> >>> >>>> >>>>> A similar thing could be done for open fd, since the current solution >>>>> duplicates code in AFR and EC, but this is another topic... >>>>> >>>>> >>>>>> >>>>>>> >>>>>>> Xavi >>>>>>> >>>>>>> >>>>>>>> What I proposed is a short term solution. mid to long term solution >>>>>>>> should be lock healing feature implemented in AFR/EC. In fact I had this >>>>>>>> conversation with +Karampuri, Pranith before >>>>>>>> posting this msg to ML. >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>>> However, this use case is not affected if the application don't >>>>>>>>>> acquire any POSIX locks. So, I am interested in knowing >>>>>>>>>> * whether your use cases use POSIX locks? >>>>>>>>>> * Is it feasible for your application to re-open fds and >>>>>>>>>> re-acquire locks on seeing EBADFD errors? >>>>>>>>>> >>>>>>>>> >>>>>>>>> I think that many applications are not prepared to handle that. >>>>>>>>> >>>>>>>> >>>>>>>> I too suspected that and in fact not too happy with the solution. >>>>>>>> But went ahead with this mail as I heard implementing lock-heal in AFR >>>>>>>> will take time and hence there are no alternative short term solutions. 
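[Editor's note: the quorum arithmetic behind the two scenarios Xavi lists can be sketched in a few lines. This is a toy model assuming a strict-majority quorum; none of these names are GlusterFS API.]

```python
def quorum(total_bricks):
    """Minimum connected bricks for the client to stay operational."""
    return total_bricks // 2 + 1

def on_disconnect(total_bricks, still_connected):
    """What AFR/EC should tell the client xlator about held locks."""
    if still_connected >= quorum(total_bricks):
        # Scenario 1: still operational; keep lock state so it can be
        # recovered when the lost brick reconnects.
        return "keep-locks"
    # Scenario 2: quorum lost; fail fds with EIO and forget locks so
    # nothing stale is replayed on reconnection.
    return "forget-locks-EIO"

print(on_disconnect(3, 2))   # keep-locks: replica-3 survives one loss
print(on_disconnect(3, 1))   # forget-locks-EIO: two losses is fatal
print(on_disconnect(1, 0))   # forget-locks-EIO: plain distribute --
                             # any disconnect loses its 1-brick "quorum"
```

This also shows why, as Raghavendra notes, plain distribute would have to signal quorum loss on every disconnect: its quorum is the single brick itself.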
>>>>>>>> >>>>>>> >>>>>>>> >>>>>>>>> Xavi >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7 >>>>>>>>>> >>>>>>>>>> regards, >>>>>>>>>> Raghavendra >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Gluster-users mailing list >>>>>>>>>> Gluster-users at gluster.org >>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>> >>>>>>>>> >>>>>> >>>>>> -- >>>>>> Pranith >>>>>> >>>>> >>>> >>>> -- >>>> Pranith >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From nbalacha at redhat.com Thu Mar 28 09:38:16 2019 From: nbalacha at redhat.com (Nithya Balachandran) Date: Thu, 28 Mar 2019 15:08:16 +0530 Subject: [Gluster-users] Prioritise local bricks for IO? In-Reply-To: References: <29221907.583.1553599314586.JavaMail.zimbra@li.nux.ro> Message-ID: On Wed, 27 Mar 2019 at 20:27, Poornima Gurusiddaiah wrote: > This feature is not under active development as it was not used widely. > AFAIK it's not a supported feature. > +Nithya +Raghavendra for further clarifications. > This is not actively supported - there has been no work done on this feature for a long time. Regards, Nithya > > Regards, > Poornima > > On Wed, Mar 27, 2019 at 12:33 PM Lucian wrote: > >> Oh, that's just what the doctor ordered! >> Hope it works, thanks >> >> On 27 March 2019 03:15:57 GMT, Vlad Kopylov wrote: >>> >>> I don't remember if it still works >>> NUFA >>> >>> https://github.com/gluster/glusterfs-specs/blob/master/done/Features/nufa.md >>> >>> v >>> >>> On Tue, Mar 26, 2019 at 7:27 AM Nux! wrote: >>> >>>> Hello, >>>> >>>> I'm trying to set up a distributed backup storage (no replicas), but >>>> I'd like to prioritise the local bricks for any IO done on the volume. >>>> This will be a backup store, so in other words, I'd like the files to be >>>> written locally if there is space, so as to save the NICs for other traffic. 
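[Editor's note: the behaviour being asked for here is what the NUFA translator implemented -- write new files to the local brick while it has free space, falling back to the usual hash-based placement otherwise. A rough sketch of that policy follows; the structures and threshold are illustrative, not the NUFA code, and the feature is unmaintained per Nithya's reply above.]

```python
def pick_brick(bricks, local_brick, filename, min_free=0.01):
    """NUFA-style placement: prefer the local brick if it has headroom,
    otherwise fall back to DHT-like hashing across all bricks."""
    b = bricks[local_brick]
    if b["free_bytes"] / b["size_bytes"] > min_free:
        return local_brick
    names = sorted(bricks)
    return names[hash(filename) % len(names)]

bricks = {
    "node1:/lvbackups/brick": {"size_bytes": 100, "free_bytes": 40},
    "node2:/lvbackups/brick": {"size_bytes": 100, "free_bytes": 0},
}
# plenty of local space -> the file stays local, sparing the NICs
print(pick_brick(bricks, "node1:/lvbackups/brick", "backup.tar"))
# local brick full -> hashed onto one of the bricks as usual
print(pick_brick(bricks, "node2:/lvbackups/brick", "backup.tar") in bricks)
```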
>>>> >>>> Anyone knows how this might be achievable, if at all? >>>> >>>> -- >>>> Sent from the Delta quadrant using Borg technology! >>>> >>>> Nux! >>>> www.nux.ro >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>> >> -- >> Sent from my Android device with K-9 Mail. Please excuse my brevity. >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From skoduri at redhat.com Thu Mar 28 09:53:38 2019 From: skoduri at redhat.com (Soumya Koduri) Date: Thu, 28 Mar 2019 15:23:38 +0530 Subject: [Gluster-users] glusterfs 4.1.7 + nfs-ganesha 2.7.1 freeze during write In-Reply-To: References: <1FBA8430-F957-40B3-8422-2E0D25265B68@gmail.com> <22FDC703-87F4-472D-8229-9B26F440FAB1@gmail.com> Message-ID: On 2/8/19 11:53 AM, Soumya Koduri wrote: > > > On 2/8/19 3:20 AM, Maurits Lamers wrote: >> Hi, >> >>> >>>> [2019-02-07 10:11:24.812606] E [MSGID: 104055] >>>> [glfs-fops.c:4955:glfs_cbk_upcall_data] 0-gfapi: Synctak for Upcall >>>> event_type(1) and gfid(y???? 
>>>> ?????????Mz???SL4_@) failed >>>> [2019-02-07 10:11:24.819376] E [MSGID: 104055] >>>> [glfs-fops.c:4955:glfs_cbk_upcall_data] 0-gfapi: Synctak for Upcall >>>> event_type(1) and gfid(eTn?EU?H.>>> [2019-02-07 10:11:24.833299] E [MSGID: 104055] >>>> [glfs-fops.c:4955:glfs_cbk_upcall_data] 0-gfapi: Synctak for Upcall >>>> event_type(1) and gfid(g?L??F??0b??k) failed >>>> [2019-02-07 10:25:01.642509] C >>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-gv0-client-2: >>>> server [node1]:49152 has not responded in the last 42 seconds, >>>> disconnecting. >>>> [2019-02-07 10:25:01.642805] C >>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-gv0-client-1: >>>> server [node2]:49152 has not responded in the last 42 seconds, >>>> disconnecting. >>>> [2019-02-07 10:25:01.642946] C >>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-gv0-client-4: >>>> server [node3]:49152 has not responded in the last 42 seconds, >>>> disconnecting. >>>> [2019-02-07 10:25:02.643120] C >>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-gv0-client-3: >>>> server 127.0.1.1:49152 has not responded in the last 42 seconds, >>>> disconnecting. >>>> [2019-02-07 10:25:02.643314] C >>>> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-gv0-client-0: >>>> server [node4]:49152 has not responded in the last 42 seconds, >>>> disconnecting. >>> >>> Strange that synctask failed. Could you please turn off >>> features.cache-invalidation volume option and check if the issue >>> still persists. >>>> >> >> Turning the cache invalidation option off seems to have solved the >> freeze. Still testing, but it looks promising. >> > > If thats the case, please turn on cache invalidation option back and > collect couple of stack traces (using gstack) when the system freezes > again. FYI - Have got a chance to reproduce and RCA the issue [1]. 
Posted fix for review in the upstream [2] Thanks, Soumya [1] https://bugzilla.redhat.com/show_bug.cgi?id=1693575 [2] https://review.gluster.org/22436 > > Thanks, > Soumya >> cheers >> >> Maurits >> > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From skoduri at redhat.com Thu Mar 28 10:35:10 2019 From: skoduri at redhat.com (Soumya Koduri) Date: Thu, 28 Mar 2019 16:05:10 +0530 Subject: [Gluster-users] Gluster GEO replication fault after write over nfs-ganesha In-Reply-To: References: Message-ID: <1a5fb44e-fc3b-4edb-28ee-baa4ed077251@redhat.com> On 3/27/19 7:39 PM, Alexey Talikov wrote: > I have two clusters with dispersed volumes (2+1) with GEO replication > It works fine as long as I use glusterfs-fuse, but as soon as even one file is written > over nfs-ganesha, replication goes to Faulty and recovers after I remove > this file (sometimes after stop/start) > I think nfs-ganesha writes files in some way that produces a problem with > replication > I am not much familiar with geo-rep and not sure what/why exactly failed here. Requesting Kotresh (cc'ed) to take a look and provide his insights on the issue.
Thanks, Soumya > |OSError: [Errno 61] No data available: > '.gfid/9c9514ce-a310-4a1c-a87b-a800a32a99f8' | > > but if I check over glusterfs mounted with aux-gfid-mount > > |getfattr -n trusted.glusterfs.pathinfo -e text > /mnt/TEST/.gfid/9c9514ce-a310-4a1c-a87b-a800a32a99f8 getfattr: Removing > leading '/' from absolute path names # file: > mnt/TEST/.gfid/9c9514ce-a310-4a1c-a87b-a800a32a99f8 > trusted.glusterfs.pathinfo="( ( > ))" | > > File exists > Details available here https://github.com/nfs-ganesha/nfs-ganesha/issues/408 > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > From mauryam at gmail.com Thu Mar 28 10:46:55 2019 From: mauryam at gmail.com (Maurya M) Date: Thu, 28 Mar 2019 16:16:55 +0530 Subject: [Gluster-users] Geo-replication status always on 'Created' In-Reply-To: <526b85c223325f79256dd7d991c6340a7e40ba14.camel@redhat.com> References: <5342e4c8e5bff06a22edbc6be704e3f10bd67a4e.camel@redhat.com> <47ea47b7c4709d16677c7086fe683203bdd1662e.camel@redhat.com> <526b85c223325f79256dd7d991c6340a7e40ba14.camel@redhat.com> Message-ID: Hi, In my glusterd.log i am seeing this error message , is this related to the patch i applied? or do i need to open a new thread? I [MSGID: 106327] [glusterd-geo-rep.c:4483:glusterd_read_status_file] 0-management: Using passed config template(/var/lib/glusterd/geo-replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a730578e45ed9d51b9a80df6c33f/gsyncd.conf). 
[2019-03-28 10:39:29.493554] E [MSGID: 106293] [glusterd-geo-rep.c:679:glusterd_query_extutil_generic] 0-management: reading data from child failed [2019-03-28 10:39:29.493589] E [MSGID: 106305] [glusterd-geo-rep.c:4377:glusterd_fetch_values_from_config] 0-management: Unable to get configuration data for vol_75a5fd373d88ba687f591f3353fa05cf(master), 172.16.201.35: :vol_e783a730578e45ed9d51b9a80df6c33f(slave) [2019-03-28 10:39:29.493617] E [MSGID: 106328] [glusterd-geo-rep.c:4517:glusterd_read_status_file] 0-management: Unable to fetch config values for vol_75a5fd373d88ba687f591f3353fa05cf(master), 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f(slave). Trying default config template [2019-03-28 10:39:29.553846] E [MSGID: 106328] [glusterd-geo-rep.c:4525:glusterd_read_status_file] 0-management: Unable to fetch config values for vol_75a5fd373d88ba687f591f3353fa05cf(master), 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f(slave) [2019-03-28 10:39:29.553836] E [MSGID: 106293] [glusterd-geo-rep.c:679:glusterd_query_extutil_generic] 0-management: reading data from child failed [2019-03-28 10:39:29.553844] E [MSGID: 106305] [glusterd-geo-rep.c:4377:glusterd_fetch_values_from_config] 0-management: Unable to get configuration data for vol_75a5fd373d88ba687f591f3353fa05cf(master), 172.16.201.35: :vol_e783a730578e45ed9d51b9a80df6c33f(slave) also while do a status call, i am not seeing one of the nodes which was reporting 'Passive' before ( did not change any configuration ) , any ideas how to troubleshoot this? thanks for your help. 
Maurya On Tue, Mar 26, 2019 at 8:34 PM Aravinda wrote: > Please check error message in gsyncd.log file in > /var/log/glusterfs/geo-replication/ > > On Tue, 2019-03-26 at 19:44 +0530, Maurya M wrote: > > Hi Arvind, > > Have patched my setup with your fix: re-run the setup, but this time > > getting a different error where it failed to commit the ssh-port on > > my other 2 nodes on the master cluster, so manually copied the : > > [vars] > > ssh-port = 2222 > > > > into gsyncd.conf > > > > and status reported back is as shown below : Any ideas how to > > troubleshoot this? > > > > MASTER NODE MASTER VOL MASTER > > BRICK > > SLAVE USER SLAVE > > SLAVE NODE STATUS > > CRAWL STATUS LAST_SYNCED > > ------------------------------------------------------------------- > > ------------------------------------------------------------------- > > ------------------------------------------------------------------- > > ------------------------------------------------------------------- > > -------------------------- > > 172.16.189.4 vol_75a5fd373d88ba687f591f3353fa05cf > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_116f > > b9427fb26f752d9ba8e45e183cb1/brick root > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f 172.16.201.4 > > Passive N/A N/A > > 172.16.189.35 vol_75a5fd373d88ba687f591f3353fa05cf > > /var/lib/heketi/mounts/vg_05708751110fe60b3e7da15bdcf6d4d4/brick_266b > > b08f0d466d346f8c0b19569736fb/brick root > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A > > Faulty N/A N/A > > 172.16.189.66 vol_75a5fd373d88ba687f591f3353fa05cf > > /var/lib/heketi/mounts/vg_4b92a2b687e59b7311055d3809b77c06/brick_dfa4 > > 4c9380cdedac708e27e2c2a443a0/brick root > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f N/A > > Initializing... 
N/A N/A > > > > > > > > > > On Tue, Mar 26, 2019 at 1:40 PM Aravinda wrote: > > > I got chance to investigate this issue further and identified a > > > issue > > > with Geo-replication config set and sent patch to fix the same. > > > > > > BUG: https://bugzilla.redhat.com/show_bug.cgi?id=1692666 > > > Patch: https://review.gluster.org/22418 > > > > > > On Mon, 2019-03-25 at 15:37 +0530, Maurya M wrote: > > > > ran this command : ssh -p 2222 -i /var/lib/glusterd/geo- > > > > replication/secret.pem root@gluster volume info -- > > > xml > > > > > > > > attaching the output. > > > > > > > > > > > > > > > > On Mon, Mar 25, 2019 at 2:13 PM Aravinda > > > wrote: > > > > > Geo-rep is running `ssh -i /var/lib/glusterd/geo- > > > > > replication/secret.pem > > > > > root@ gluster volume info --xml` and parsing its > > > output. > > > > > Please try to to run the command from the same node and let us > > > know > > > > > the > > > > > output. > > > > > > > > > > > > > > > On Mon, 2019-03-25 at 11:43 +0530, Maurya M wrote: > > > > > > Now the error is on the same line 860 : as highlighted below: > > > > > > > > > > > > [2019-03-25 06:11:52.376238] E > > > > > > [syncdutils(monitor):332:log_raise_exception] : FAIL: > > > > > > Traceback (most recent call last): > > > > > > File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", > > > line > > > > > > 311, in main > > > > > > func(args) > > > > > > File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", > > > > > line > > > > > > 50, in subcmd_monitor > > > > > > return monitor.monitor(local, remote) > > > > > > File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", > > > > > line > > > > > > 427, in monitor > > > > > > return Monitor().multiplex(*distribute(local, remote)) > > > > > > File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", > > > > > line > > > > > > 386, in distribute > > > > > > svol = Volinfo(slave.volume, "localhost", prelude) > > > > > > File > > > 
"/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", > > > > > line > > > > > > 860, in __init__ > > > > > > vi = XET.fromstring(vix) > > > > > > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line > > > > > 1300, in > > > > > > XML > > > > > > parser.feed(text) > > > > > > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line > > > > > 1642, in > > > > > > feed > > > > > > self._raiseerror(v) > > > > > > File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line > > > > > 1506, in > > > > > > _raiseerror > > > > > > raise err > > > > > > ParseError: syntax error: line 1, column 0 > > > > > > > > > > > > > > > > > > On Mon, Mar 25, 2019 at 11:29 AM Maurya M > > > > > wrote: > > > > > > > Sorry my bad, had put the print line to debug, i am using > > > > > gluster > > > > > > > 4.1.7, will remove the print line. > > > > > > > > > > > > > > On Mon, Mar 25, 2019 at 10:52 AM Aravinda < > > > avishwan at redhat.com> > > > > > > > wrote: > > > > > > > > Below print statement looks wrong. Latest Glusterfs code > > > > > doesn't > > > > > > > > have > > > > > > > > this print statement. Please let us know which version of > > > > > > > > glusterfs you > > > > > > > > are using. > > > > > > > > > > > > > > > > > > > > > > > > ``` > > > > > > > > File > > > > > "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", > > > > > > > > line > > > > > > > > 860, in __init__ > > > > > > > > print "debug varible " %vix > > > > > > > > ``` > > > > > > > > > > > > > > > > As a workaround, edit that file and comment the print > > > line > > > > > and > > > > > > > > test the > > > > > > > > geo-rep config command. 
> > > > > > > > > > > > > > > > > > > > > > > > On Mon, 2019-03-25 at 09:46 +0530, Maurya M wrote: > > > > > > > > > hi Aravinda, > > > > > > > > > had the session created using : create ssh-port 2222 > > > push- > > > > > pem > > > > > > > > and > > > > > > > > > also the : > > > > > > > > > > > > > > > > > > gluster volume geo-replication > > > > > > > > vol_75a5fd373d88ba687f591f3353fa05cf > > > > > > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f > > > config > > > > > ssh- > > > > > > > > port > > > > > > > > > 2222 > > > > > > > > > > > > > > > > > > hitting this message: > > > > > > > > > geo-replication config-set failed for > > > > > > > > > vol_75a5fd373d88ba687f591f3353fa05cf > > > > > > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f > > > > > > > > > geo-replication command failed > > > > > > > > > > > > > > > > > > Below is snap of status: > > > > > > > > > > > > > > > > > > [root at k8s-agentpool1-24779565-1 > > > > > > > > > > > > > > > > > > > > > > > > > vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vol_e783a73057 > > > > > > > > 8e45ed9d51b9a80df6c33f]# gluster volume geo-replication > > > > > > > > vol_75a5fd373d88ba687f591f3353fa05cf > > > > > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f > > > status > > > > > > > > > > > > > > > > > > MASTER NODE MASTER VOL > > > > > > > > MASTER > > > > > > > > > BRICK > > > > > > > > > > > > > > > > > > > > > > > > > SLAVE USER SLAVE > > > > > > > > > > > > > > > > > > > > > > > > > SLAVE NODE STATUS > > > > > CRAWL > > > > > > > > STATUS > > > > > > > > > LAST_SYNCED > > > > > > > > > ----------------------------------------------------- > > > ---- > > > > > ---- > > > > > > > > ------ > > > > > > > > > ----------------------------------------------------- > > > ---- > > > > > ---- > > > > > > > > ------ > > > > > > > > > ----------------------------------------------------- > > > ---- > > > > > ---- > > > > > > > > ------ > > > > > > > > > 
----------------------------------------------------- > > > ---- > > > > > ---- > > > > > > > > ------ > > > > > > > > > ---------------- > > > > > > > > > 172.16.189.4 vol_75a5fd373d88ba687f591f3353fa05cf > > > > > > > > > > > > > > > > > > > > > > > > > > > > /var/lib/heketi/mounts/vg_aee3df7b0bb2451bc00a73358c5196a2/brick_ > > > > > > > > 116f > > > > > > > > > b9427fb26f752d9ba8e45e183cb1/brick root > > > > > > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f > > > N/A > > > > > > > > > > > > > > > > > > > > > > Created N/A N/A > > > > > > > > > 172.16.189.35 vol_75a5fd373d88ba687f591f3353fa05cf > > > > > > > > > > > > > > > > > > > > > > > > > > > > /var/lib/heketi/mounts/vg_05708751110fe60b3e7da15bdcf6d4d4/brick_ > > > > > > > > 266b > > > > > > > > > b08f0d466d346f8c0b19569736fb/brick root > > > > > > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f > > > N/A > > > > > > > > > > > > > > > > > > > > > > Created N/A N/A > > > > > > > > > 172.16.189.66 vol_75a5fd373d88ba687f591f3353fa05cf > > > > > > > > > > > > > > > > > > > > > > > > > > > > /var/lib/heketi/mounts/vg_4b92a2b687e59b7311055d3809b77c06/brick_ > > > > > > > > dfa4 > > > > > > > > > 4c9380cdedac708e27e2c2a443a0/brick root > > > > > > > > > 172.16.201.35::vol_e783a730578e45ed9d51b9a80df6c33f > > > N/A > > > > > > > > > > > > > > > > > > > > > > Created N/A N/A > > > > > > > > > > > > > > > > > > any ideas ? 
where can find logs for the failed commands > > > > > check > > > > > > > > in > > > > > > > > > gysncd.log , the trace is as below: > > > > > > > > > > > > > > > > > > [2019-03-25 04:04:42.295043] I > > > [gsyncd(monitor):297:main] > > > > > > > > : > > > > > > > > > Using session config file > > > path=/var/lib/glusterd/geo- > > > > > > > > > > > > > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > > > > l_e7 > > > > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > > > > [2019-03-25 04:04:42.387192] E > > > > > > > > > [syncdutils(monitor):332:log_raise_exception] : > > > FAIL: > > > > > > > > > Traceback (most recent call last): > > > > > > > > > File > > > > > "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", > > > > > > > > line > > > > > > > > > 311, in main > > > > > > > > > func(args) > > > > > > > > > File > > > > > "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", > > > > > > > > line > > > > > > > > > 50, in subcmd_monitor > > > > > > > > > return monitor.monitor(local, remote) > > > > > > > > > File > > > > > "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", > > > > > > > > line > > > > > > > > > 427, in monitor > > > > > > > > > return Monitor().multiplex(*distribute(local, > > > remote)) > > > > > > > > > File > > > > > "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", > > > > > > > > line > > > > > > > > > 370, in distribute > > > > > > > > > mvol = Volinfo(master.volume, master.host) > > > > > > > > > File > > > > > > > > "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", > > > > > line > > > > > > > > > 860, in __init__ > > > > > > > > > print "debug varible " %vix > > > > > > > > > TypeError: not all arguments converted during string > > > > > formatting > > > > > > > > > [2019-03-25 04:04:48.997519] I [gsyncd(config- > > > > > get):297:main] > > > > > > > > : > > > > > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > 
> > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > > > > l_e7 > > > > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > > > > [2019-03-25 04:04:49.93528] I [gsyncd(status):297:main] > > > > > : > > > > > > > > Using > > > > > > > > > session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > > > > l_e7 > > > > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > > > > [2019-03-25 04:08:07.194348] I [gsyncd(config- > > > > > get):297:main] > > > > > > > > : > > > > > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > > > > l_e7 > > > > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > > > > [2019-03-25 04:08:07.262588] I [gsyncd(config- > > > > > get):297:main] > > > > > > > > : > > > > > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > > > > l_e7 > > > > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > > > > [2019-03-25 04:08:07.550080] I [gsyncd(config- > > > > > get):297:main] > > > > > > > > : > > > > > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > > > > l_e7 > > > > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > > > > [2019-03-25 04:08:18.933028] I [gsyncd(config- > > > > > get):297:main] > > > > > > > > : > > > > > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > > > > l_e7 > > 
> > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > > > > [2019-03-25 04:08:19.25285] I [gsyncd(status):297:main] > > > > > : > > > > > > > > Using > > > > > > > > > session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > > > > l_e7 > > > > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > > > > [2019-03-25 04:09:15.766882] I [gsyncd(config- > > > > > get):297:main] > > > > > > > > : > > > > > > > > > Using session config file path=/var/lib/glusterd/geo- > > > > > > > > > > > > > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > > > > l_e7 > > > > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > > > > [2019-03-25 04:09:16.30267] I [gsyncd(config- > > > get):297:main] > > > > > > > > : > > > > > > > > > Using session config file > > > path=/var/lib/glusterd/geo- > > > > > > > > > > > > > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > > > > l_e7 > > > > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > > > > [2019-03-25 04:09:16.89006] I [gsyncd(config- > > > set):297:main] > > > > > > > > : > > > > > > > > > Using session config file > > > path=/var/lib/glusterd/geo- > > > > > > > > > > > > > > > > > > > > > > > > > replication/vol_75a5fd373d88ba687f591f3353fa05cf_172.16.201.35_vo > > > > > > > > l_e7 > > > > > > > > > 83a730578e45ed9d51b9a80df6c33f/gsyncd.conf > > > > > > > > > > > > > > > > > > regards, > > > > > > > > > Maurya > > > > > > > > > > > > > > > > > > On Mon, Mar 25, 2019 at 9:08 AM Aravinda < > > > > > avishwan at redhat.com> > > > > > > > > wrote: > > > > > > > > > > Use `ssh-port ` while creating the Geo-rep > > > session > > > > > > > > > > > > > > > > > > > > Ref: > > > > > > > > > > > > > > > > > > > > > > > > > > > 
https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/#creating-the-session > > > > > > > > > > > > > > > > > > > > And set the ssh-port option before start. > > > > > > > > > > > > > > > > > > > > ``` > > > > > > > > > > gluster volume geo-replication \ > > > > > > > > > > [@]:: > > > config > > > > > > > > > > ssh-port 2222 > > > > > > > > > > ``` > > > > > > > > > > > -- > regards > Aravinda > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sankarshan.mukhopadhyay at gmail.com Thu Mar 28 14:24:35 2019 From: sankarshan.mukhopadhyay at gmail.com (Sankarshan Mukhopadhyay) Date: Thu, 28 Mar 2019 19:54:35 +0530 Subject: [Gluster-users] [Event CfP Announce] DevConf events India and US in the month of August 2019 Message-ID: 2 editions of DevConf have their CfPs open [1] DevConf India : https://devconf.info/in (event dates 02, 03 Aug 2019, Bengaluru) [2] DevConf USA : https://devconf.info/us/ (event dates 15 -17 Aug, 2019, Boston) The DevConf events are well curated to get a good mix of developers and users. This note is to raise awareness and encourage submission of talks around Gluster, containerized storage and similar. From nbalacha at redhat.com Fri Mar 29 04:16:46 2019 From: nbalacha at redhat.com (Nithya Balachandran) Date: Fri, 29 Mar 2019 09:46:46 +0530 Subject: [Gluster-users] Inconsistent issues with a client In-Reply-To: References: Message-ID: Hi, If you know which directories are problematic, please check and see if the permissions on them are correct on the individual bricks. Please also provide the following: - *gluster volume info* for the volume - The gluster version you are running regards, Nithya On Wed, 27 Mar 2019 at 19:10, Tami Greene wrote: > The system is a 5 server, 20 brick distributed system with a hardware > configured RAID 6 underneath with xfs as filesystem. This client is a data > collection node which transfers data to specific directories within one of > the gluster volumes. 
> > > > I have a client with submounted directories (glustervolume/project) rather > than the entire volume. Some files can be transferred with no problem, but > others send an error about the transport endpoint not being connected. The transfer > is handled by an rsync script triggered as a cron job. > > > > When remotely connected to this client, user access to these files does > not always behave as set: 2770 for directories and 440. Owners > are not always able to move the files, processes run as the owners are not > always able to move files; root is not always allowed to move or delete > these files. > > > > This process seemed to work smoothly before adding another server and 4 > storage bricks to the volume; logs indicate there were intermittent issues > at least a month before the last server was added. While a new collection > device has been streaming to this one machine, the issue started the day > before. > > > > Is there another level of permissions and ownership that I am not aware > of that needs to be sync'd? > > > -- > Tami > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From rgowdapp at redhat.com Fri Mar 29 05:29:07 2019 From: rgowdapp at redhat.com (Raghavendra Gowdappa) Date: Fri, 29 Mar 2019 10:59:07 +0530 Subject: [Gluster-users] Lots of connections on clients - appropriate values for various thread parameters In-Reply-To: References: Message-ID: +Gluster-users Sorry about the delay. There is nothing suspicious about the per thread CPU utilization of the glusterfs process. However, looking at the volume profile attached I see a huge number of lookups. I think if we cut down the number of lookups we'll probably see improvements in performance.
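(As a rough sketch of how this kind of profile data can be gathered: the volume name "gv0", server "server1" and mount point below are placeholders, and option spellings may differ across glusterfs versions, so check the CLI help on your build first.)

```shell
# Server side: enable profiling, run the workload, then dump cumulative
# per-brick statistics (per-fop call counts and latencies).
gluster volume profile gv0 start
gluster volume profile gv0 info > brick-profile.txt

# Client side: mount with a fuse traffic dump enabled, so each request
# crossing /dev/fuse is recorded for later inspection.
glusterfs --volfile-server=server1 --volfile-id=gv0 \
          --dump-fuse=/tmp/gv0-fuse.dump /mnt/gv0
```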
I need the following information: * dump of fuse traffic under heavy load (use the --dump-fuse option while mounting) * client volume profile for the duration of heavy load - https://docs.gluster.org/en/latest/Administrator%20Guide/Performance%20Testing/ * corresponding brick volume profile Basically I need to find out * whether these lookups are on existing files or non-existent files * whether they are on directories or files * why/whether md-cache or the kernel attribute cache or nl-cache will help to cut down lookups. regards, Raghavendra On Mon, Mar 25, 2019 at 12:13 PM Hu Bert wrote: > Hi Raghavendra, > > sorry, this took a while. Over the last weeks the weather was bad -> less > traffic, but this weekend there was a massive peak. I made 3 profiles > with top, but at first look there's nothing special here. > > I also made a gluster profile (on one of the servers) at a later > moment. Maybe that helps. I also added some munin graphics from 2 of > the clients and 1 graphic of server network, just to show how massive > the problem is. > > Just wondering if the high iowait is related to the high network > traffic bug (https://bugzilla.redhat.com/show_bug.cgi?id=1673058); if > so, I could deactivate performance.quick-read and check if there is > less iowait. If that helps: wonderful - and yearningly awaiting > updated packages (e.g. v5.6). If not: maybe we have to switch from our > normal 10TB hdds (raid10) to SSDs if the problem is based on slow > hardware in the use case of small files (images). > > > Thx, > Hubert > > On Mon, 4 Mar 2019 at 16:59, Raghavendra Gowdappa > wrote: > > > > Were you seeing high iowait when you captured the top output? I guess > not, as you mentioned the load increases during the weekend. Please note that > this data has to be captured when you are experiencing problems. > > > > On Mon, Mar 4, 2019 at 8:02 PM Hu Bert wrote: > >> > >> Hi, > >> sending the link directly to you and not the list, you can distribute > >> if necessary.
the command ran for about half a minute. Is that enough? > >> More? Less? > >> > >> https://download.outdooractive.com/top.output.tar.gz > >> > >> On Mon, 4 Mar 2019 at 15:21, Raghavendra Gowdappa > >> wrote: > >> > > >> > > >> > On Mon, Mar 4, 2019 at 7:47 PM Raghavendra Gowdappa < rgowdapp at redhat.com> wrote: > >> >> > >> >> > >> >> > >> >> On Mon, Mar 4, 2019 at 4:26 PM Hu Bert > wrote: > >> >>> > >> >>> Hi Raghavendra, > >> >>> > >> >>> at the moment iowait and cpu consumption is quite low, the main > >> >>> problems appear during the weekend (high traffic, especially on > >> >>> sunday), so either we have to wait until next sunday or use a time > >> >>> machine ;-) > >> >>> > >> >>> I made a screenshot of top (https://abload.de/img/top-hvvjt2.jpg) and > >> >>> a text output (https://pastebin.com/TkTWnqxt), maybe that helps. Seems > >> >>> like processes like glfs_fuseproc (>204h) and glfs_epoll (64h for each > >> >>> process) consume a lot of CPU (uptime 24 days). Is that already > >> >>> helpful? > >> >> > >> >> > >> >> Not much. The TIME field just says the amount of time the thread has > been executing. Since it's a long-standing mount, we can expect such large > values. But the value itself doesn't indicate whether the thread > was overloaded at any (some) interval(s). > >> >> > >> >> Can you please collect the output of the following command and send back the > collected data? > >> >> > >> >> # top -bHd 3 > top.output > >> > > >> > > >> > Please collect this on problematic mounts and bricks. > >> > > >> >> > >> >>> > >> >>> > >> >>> Hubert > >> >>> > >> >>> On Mon, 4 Mar 2019 at 11:31, Raghavendra Gowdappa > >> >>> wrote: > >> >>> > > >> >>> > what is the per thread CPU usage like on these clients? With > highly concurrent workloads we've seen the single thread that reads requests > from /dev/fuse (the fuse reader thread) becoming a bottleneck. Would like to know > what the cpu usage of this thread looks like (you can use top -H).
> >> >>> > > >> >>> > On Mon, Mar 4, 2019 at 3:39 PM Hu Bert > wrote: > >> >>> >> > >> >>> >> Good morning, > >> >>> >> > >> >>> >> we use gluster v5.3 (replicate with 3 servers, 2 volumes, raid10 > as > >> >>> >> brick) with at the moment 10 clients; 3 of them do heavy I/O > >> >>> >> operations (apache tomcats, read+write of (small) images). These > 3 > >> >>> >> clients have a quite high I/O wait (stats from yesterday) as can > be > >> >>> >> seen here: > >> >>> >> > >> >>> >> client: https://abload.de/img/client1-cpu-dayulkza.png > >> >>> >> server: https://abload.de/img/server1-cpu-dayayjdq.png > >> >>> >> > >> >>> >> The iowait in the graphics differ a lot. I checked netstat for > the > >> >>> >> different clients; the other clients have 8 open connections: > >> >>> >> https://pastebin.com/bSN5fXwc > >> >>> >> > >> >>> >> 4 for each server and each volume. The 3 clients with the heavy > I/O > >> >>> >> have (at the moment) according to netstat 170, 139 and 153 > >> >>> >> connections. An example for one client can be found here: > >> >>> >> https://pastebin.com/2zfWXASZ > >> >>> >> > >> >>> >> gluster volume info: https://pastebin.com/13LXPhmd > >> >>> >> gluster volume status: https://pastebin.com/cYFnWjUJ > >> >>> >> > >> >>> >> I just was wondering if the iowait is based on the clients and > their > >> >>> >> workflow: requesting a lot of files (up to hundreds per second), > >> >>> >> opening a lot of connections and the servers aren't able to > answer > >> >>> >> properly. Maybe something can be tuned here? > >> >>> >> > >> >>> >> Especially the server|client.event-threads (both set to 4) and > >> >>> >> performance.(high|normal|low|least)-prio-threads (all at default > value > >> >>> >> 16) and performance.io-thread-count (32) options, maybe these > aren't > >> >>> >> properly configured for up to 170 client connections. 
> >> >>> >> > >> >>> >> Both servers and clients have a Xeon CPU (6 cores, 12 threads), a 10 > >> >>> >> GBit connection and 128G (servers) respectively 256G (clients) RAM. > >> >>> >> Enough power :-) > >> >>> >> > >> >>> >> > >> >>> >> Thx for reading && best regards, > >> >>> >> > >> >>> >> Hubert > >> >>> >> _______________________________________________ > >> >>> >> Gluster-users mailing list > >> >>> >> Gluster-users at gluster.org > >> >>> >> https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgowtham at redhat.com Fri Mar 29 06:12:33 2019 From: hgowtham at redhat.com (Hari Gowtham) Date: Fri, 29 Mar 2019 11:42:33 +0530 Subject: [Gluster-users] Upgrade testing to gluster 6 Message-ID: Hello Gluster users, As you are all aware, glusterfs-6 is out. We would like to inform you that we have spent a significant amount of time in testing glusterfs-6 in upgrade scenarios. We have done upgrade testing to glusterfs-6 from various releases like 3.12, 4.1 and 5.3. As glusterfs-6 has got a lot of changes, we wanted to test those portions. There were xlators (and respective options to enable/disable them) added and deprecated in glusterfs-6 from various versions [1]. We had to check the following upgrade scenarios for all such options identified in [1]: 1) option never enabled and upgraded 2) option enabled and then upgraded 3) option enabled and then disabled and then upgraded We weren't able to manually check all the combinations for all the options. So the options involving enabling and disabling xlators were prioritized. Below are the results of the ones tested. Never enabled and upgraded: checked from 3.12, 4.1, 5.3 to 6 the upgrade works. Enabled and upgraded: Tested for tier, which is deprecated; it is not a recommended upgrade. As expected the volume won't be consumable and will have a few more issues as well. Tested with 3.12, 4.1 and 5.3 to 6 upgrade. 
Enabled, disabled before upgrade. Tested for tier with 3.12 and the upgrade went fine. There is one common issue to note in every upgrade. The node being upgraded goes into a disconnected state. You have to flush the iptables and then restart glusterd on all nodes to fix this. The testing for enabling new options is still pending. The new options won't cause as many issues as the deprecated ones, so this was put at the end of the priority list. It would be nice to get contributions for this. For the disable testing, tier was used as it covers most of the xlators that were removed. And all of these tests were done on a replica 3 volume. Note: This is only for upgrade testing of the newly added and removed xlators. It does not involve the normal tests for the xlators. If you have any questions, please feel free to reach us. [1] https://docs.google.com/spreadsheets/d/1nh7T5AXaV6kc5KgILOy2pEqjzC3t_R47f1XUXSVFetI/edit?usp=sharing Regards, Hari and Sanju. From revirii at googlemail.com Fri Mar 29 06:47:30 2019 From: revirii at googlemail.com (Hu Bert) Date: Fri, 29 Mar 2019 07:47:30 +0100 Subject: [Gluster-users] Lots of connections on clients - appropriate values for various thread parameters In-Reply-To: References: Message-ID: Hi Raghavendra, i'll try to gather the information you need, hopefully this weekend. One thing i've done this week: deactivate performance.quick-read (https://bugzilla.redhat.com/show_bug.cgi?id=1673058), which (according to munin) ended in a massive drop in network traffic and a slightly lower iowait. Maybe that has helped already. We'll see. performance.nl-cache is deactivated due to unreadable files/directories; we have a highly concurrent workload. 
There are some nginx backend webservers that check if a requested file exists in the glusterfs filesystem; i counted the log entries, this can be up to 5 million entries a day; about 2/3 of the files are found in the filesystem, they get delivered to the frontend; if not: the nginx instances send the request via round robin to 3 backend tomcats, and they have to check whether a directory exists or not (and then create it and the requested files). So it happens that tomcatA creates a directory and a file in it, and within (milli)seconds tomcatB+C create additional files in this dir. Deactivating nl-cache helped to solve this issue, after having a conversation with Nithya and Ravishankar. Just wanted to explain that. Thx so far, Hubert On Fri, Mar 29, 2019 at 06:29, Raghavendra Gowdappa wrote: > > +Gluster-users > > Sorry about the delay. There is nothing suspicious about the per thread CPU utilization of the glusterfs process. However, looking at the volume profile attached I see a huge number of lookups. I think if we cut down the number of lookups we'll probably see improvements in performance. I need the following information: > > * dump of fuse traffic under heavy load (use the --dump-fuse option while mounting) > * client volume profile for the duration of heavy load - https://docs.gluster.org/en/latest/Administrator%20Guide/Performance%20Testing/ > * corresponding brick volume profile > > Basically I need to find out > * whether these lookups are on existing files or non-existent files > * whether they are on directories or files > * why/whether md-cache or kernel attribute cache or nl-cache will help to cut down lookups. > > regards, > Raghavendra > > On Mon, Mar 25, 2019 at 12:13 PM Hu Bert wrote: >> >> Hi Raghavendra, >> >> sorry, this took a while. The last weeks the weather was bad -> less >> traffic, but this weekend there was a massive peak. I made 3 profiles >> with top, but at first look there's nothing special here. 
>> >> I also made a gluster profile (on one of the servers) at a later >> moment. Maybe that helps. I also added some munin graphics from 2 of >> the clients and 1 graphic of server network, just to show how massive >> the problem is. >> >> Just wondering if the high io wait is related to the high network >> traffic bug (https://bugzilla.redhat.com/show_bug.cgi?id=1673058); if >> so, i could deactivate performance.quick-read and check if there is >> less iowait. If that helps: wonderful - and yearningly awaiting >> updated packages (e.g. v5.6). If not: maybe we have to switch from our >> normal 10TB hdds (raid10) to SSDs if the problem is based on slow >> hardware in the use case of small files (images). >> >> >> Thx, >> Hubert >> >> On Mon, Mar 4, 2019 at 16:59, Raghavendra Gowdappa >> wrote: >> > >> > Were you seeing high io-wait when you captured the top output? I guess not, as you mentioned the load increases during the weekend. Please note that this data has to be captured when you are experiencing problems. >> > >> > On Mon, Mar 4, 2019 at 8:02 PM Hu Bert wrote: >> >> >> >> Hi, >> >> sending the link directly to you and not the list, you can distribute >> >> if necessary. the command ran for about half a minute. Is that enough? >> >> More? Less? >> >> >> >> https://download.outdooractive.com/top.output.tar.gz >> >> >> >> On Mon, Mar 4, 
2019 at 15:21, Raghavendra Gowdappa >> >> wrote: >> >> > >> >> > >> >> > >> >> > On Mon, Mar 4, 2019 at 7:47 PM Raghavendra Gowdappa wrote: >> >> >> >> >> >> >> >> >> >> >> >> On Mon, Mar 4, 2019 at 4:26 PM Hu Bert wrote: >> >> >>> >> >> >>> Hi Raghavendra, >> >> >>> >> >> >>> at the moment iowait and cpu consumption is quite low, the main >> >> >>> problems appear during the weekend (high traffic, especially on >> >> >>> sunday), so either we have to wait until next sunday or use a time >> >> >>> machine ;-) >> >> >>> >> >> >>> I made a screenshot of top (https://abload.de/img/top-hvvjt2.jpg) and >> >> >>> a text output (https://pastebin.com/TkTWnqxt), maybe that helps. Seems >> >> >>> like processes like glfs_fuseproc (>204h) and glfs_epoll (64h for each >> >> >>> process) consume a lot of CPU (uptime 24 days). Is that already >> >> >>> helpful? >> >> >> >> >> >> >> >> >> Not much. The TIME field just says the amount of time the thread has been executing. Since it's a long-standing mount, we can expect such large values. But, the value itself doesn't indicate whether the thread itself was overloaded at any (some) interval(s). >> >> >> >> >> >> Can you please collect the output of the following command and send back the collected data? >> >> >> >> >> >> # top -bHd 3 > top.output >> >> > >> >> > >> >> > Please collect this on problematic mounts and bricks. >> >> > >> >> >> >> >> >>> >> >> >>> >> >> >>> Hubert >> >> >>> >> >> >>> On Mon, Mar 4, 2019 at 11:31, Raghavendra Gowdappa >> >> >>> wrote: >> >> >>> > >> >> >>> > what is the per thread CPU usage like on these clients? With highly concurrent workloads we've seen a single thread that reads requests from /dev/fuse (the fuse reader thread) becoming a bottleneck. Would like to know what the CPU usage of this thread looks like (you can use top -H). 
>> >> >>> > >> >> >>> > On Mon, Mar 4, 2019 at 3:39 PM Hu Bert wrote: >> >> >>> >> >> >> >>> >> Good morning, >> >> >>> >> >> >> >>> >> we use gluster v5.3 (replicate with 3 servers, 2 volumes, raid10 as >> >> >>> >> brick) with at the moment 10 clients; 3 of them do heavy I/O >> >> >>> >> operations (apache tomcats, read+write of (small) images). These 3 >> >> >>> >> clients have a quite high I/O wait (stats from yesterday) as can be >> >> >>> >> seen here: >> >> >>> >> >> >> >>> >> client: https://abload.de/img/client1-cpu-dayulkza.png >> >> >>> >> server: https://abload.de/img/server1-cpu-dayayjdq.png >> >> >>> >> >> >> >>> >> The iowait in the graphics differ a lot. I checked netstat for the >> >> >>> >> different clients; the other clients have 8 open connections: >> >> >>> >> https://pastebin.com/bSN5fXwc >> >> >>> >> >> >> >>> >> 4 for each server and each volume. The 3 clients with the heavy I/O >> >> >>> >> have (at the moment) according to netstat 170, 139 and 153 >> >> >>> >> connections. An example for one client can be found here: >> >> >>> >> https://pastebin.com/2zfWXASZ >> >> >>> >> >> >> >>> >> gluster volume info: https://pastebin.com/13LXPhmd >> >> >>> >> gluster volume status: https://pastebin.com/cYFnWjUJ >> >> >>> >> >> >> >>> >> I just was wondering if the iowait is based on the clients and their >> >> >>> >> workflow: requesting a lot of files (up to hundreds per second), >> >> >>> >> opening a lot of connections and the servers aren't able to answer >> >> >>> >> properly. Maybe something can be tuned here? >> >> >>> >> >> >> >>> >> Especially the server|client.event-threads (both set to 4) and >> >> >>> >> performance.(high|normal|low|least)-prio-threads (all at default value >> >> >>> >> 16) and performance.io-thread-count (32) options, maybe these aren't >> >> >>> >> properly configured for up to 170 client connections. 
>> >> >>> >> >> >> >>> >> Both servers and clients have a Xeon CPU (6 cores, 12 threads), a 10 >> >> >>> >> GBit connection and 128G (servers) respectively 256G (clients) RAM. >> >> >>> >> Enough power :-) >> >> >>> >> >> >> >>> >> >> >> >>> >> Thx for reading && best regards, >> >> >>> >> >> >> >>> >> Hubert >> >> >>> >> _______________________________________________ >> >> >>> >> Gluster-users mailing list >> >> >>> >> Gluster-users at gluster.org >> >> >>> >> https://lists.gluster.org/mailman/listinfo/gluster-users From kdhananj at redhat.com Fri Mar 29 07:16:33 2019 From: kdhananj at redhat.com (Krutika Dhananjay) Date: Fri, 29 Mar 2019 12:46:33 +0530 Subject: [Gluster-users] [ovirt-users] Re: Announcing Gluster release 5.5 In-Reply-To: <20190328164716.27693.35887@mail.ovirt.org> References: <20190328164716.27693.35887@mail.ovirt.org> Message-ID: Questions/comments inline ... On Thu, Mar 28, 2019 at 10:18 PM wrote: > Dear All, > > I wanted to share my experience upgrading from 4.2.8 to 4.3.1. While > previous upgrades from 4.1 to 4.2 etc. went rather smooth, this one was a > different experience. After first trying a test upgrade on a 3 node setup, > which went fine. i headed to upgrade the 9 node production platform, > unaware of the backward compatibility issues between gluster 3.12.15 -> > 5.3. After upgrading 2 nodes, the HA engine stopped and wouldn't start. > Vdsm wasn't able to mount the engine storage domain, since /dom_md/metadata > was missing or couldn't be accessed. Restoring this file by getting a good > copy of the underlying bricks, removing the file from the underlying bricks > where the file was 0 bytes and mark with the stickybit, and the > corresponding gfid's. Removing the file from the mount point, and copying > back the file on the mount point. 
Manually mounting the engine domain, and > manually creating the corresponding symbolic links in /rhev/data-center and > /var/run/vdsm/storage and fixing the ownership back to vdsm.kvm (which was > root.root), i was able to start the HA engine again. Since the engine was > up again, and things seemed rather unstable i decided to continue the > upgrade on the other nodes suspecting an incompatibility in gluster > versions, i thought would be best to have them all on the same version > rather soonish. However things went from bad to worse, the engine stopped > again, and all vm's stopped working as well. So on a machine outside the > setup and restored a backup of the engine taken from version 4.2.8 just > before the upgrade. With this engine I was at least able to start some vm's > again, and finalize the upgrade. Once the upgraded, things didn't stabilize > and also lose 2 vm's during the process due to image corruption. After > figuring out gluster 5.3 had quite some issues I was as lucky to see > gluster 5.5 was about to be released, on the moment the RPM's were > available I've installed those. This helped a lot in terms of stability, > for which I'm very grateful! However the performance is unfortunate > terrible, it's about 15% of what the performance was running gluster > 3.12.15. It's strange since a simple dd shows ok performance, but our > actual workload doesn't. While I would expect the performance to be better, > due to all improvements made since gluster version 3.12. Does anybody share > the same experience? > I really hope gluster 6 will soon be tested with ovirt and released, and > things start to perform and stabilize again..like the good old days. Of > course when I can do anything, I'm happy to help. > > I think the following short list of issues we have after the migration; > Gluster 5.5; > - Poor performance for our workload (mostly write dependent) > For this, could you share the volume-profile output specifically for the affected volume(s)? 
Here's what you need to do - 1. # gluster volume profile $VOLNAME stop 2. # gluster volume profile $VOLNAME start 3. Run the test inside the vm wherein you see bad performance 4. # gluster volume profile $VOLNAME info # save the output of this command into a file 5. # gluster volume profile $VOLNAME stop 6. and attach the output file gotten in step 4 - VM's randomly pause on unknown storage errors, which are "stale file's". corresponding log; Lookup > on shard 797 failed. Base file gfid = 8a27b91a-ff02-42dc-bd4c-caa019424de8 > [Stale file handle] > Could you share the complete gluster client log file (it would be a filename matching the pattern rhev-data-center-mnt-glusterSD-*) Also the output of `gluster volume info $VOLNAME` > - Some files are listed twice in a directory (probably related the > stale file issue?) > Example; > ls -la > /rhev/data-center/59cd53a9-0003-02d7-00eb-0000000001e3/313f5d25-76af-4ecd-9a20-82a2fe815a3c/images/4add6751-3731-4bbd-ae94-aaeed12ea450/ > total 3081 > drwxr-x---. 2 vdsm kvm 4096 Mar 18 11:34 . > drwxr-xr-x. 13 vdsm kvm 4096 Mar 19 09:42 .. > -rw-rw----. 1 vdsm kvm 1048576 Mar 28 12:55 > 1a7cf259-6b29-421d-9688-b25dfaafb13c > -rw-rw----. 1 vdsm kvm 1048576 Mar 28 12:55 > 1a7cf259-6b29-421d-9688-b25dfaafb13c > -rw-rw----. 1 vdsm kvm 1048576 Jan 27 2018 > 1a7cf259-6b29-421d-9688-b25dfaafb13c.lease > -rw-r--r--. 1 vdsm kvm 290 Jan 27 2018 > 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta > -rw-r--r--. 1 vdsm kvm 290 Jan 27 2018 > 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta > Adding DHT and readdir-ahead maintainers regarding entries getting listed twice. @Nithya Balachandran ^^ @Gowdappa, Raghavendra ^^ @Poornima Gurusiddaiah ^^ > > - brick processes sometimes starts multiple times. Sometimes I've 5 brick > processes for a single volume. Killing all glusterfsd's for the volume on > the machine and running gluster v start force usually just starts one > after the event, from then on things look all right. 
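Krutika's steps 1-6 above can be collected with a small helper - a sketch, not an official tool (the function name and the capture window are made up; GLUSTER is overridable, e.g. to prefix sudo):

```shell
GLUSTER=${GLUSTER:-gluster}

# profile_volume VOLNAME OUTFILE [SECONDS]
# Steps 1-6 from above: reset any stale profiling session, start a fresh
# one, leave a window in which to reproduce the slow workload, then save
# the counters to OUTFILE and stop profiling again.
profile_volume() {
    vol=$1; out=$2; window=${3:-300}
    $GLUSTER volume profile "$vol" stop >/dev/null 2>&1  # 1. (ignore "not started")
    $GLUSTER volume profile "$vol" start                 # 2.
    sleep "$window"                                      # 3. run the workload now
    $GLUSTER volume profile "$vol" info > "$out"         # 4.
    $GLUSTER volume profile "$vol" stop                  # 5.
    echo "attach $out to your reply"                     # 6.
}
```

For example, 'profile_volume data profile.out 600' while the VMs show the bad write performance described above.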
> Did you mean 5 brick processes for a single brick directory? +Mohit Agrawal ^^ -Krutika > Ovirt 4.3.2.1-1.el7 > - All vms images ownership are changed to root.root after the vm is > shutdown, probably related to; > https://bugzilla.redhat.com/show_bug.cgi?id=1666795 but not only scoped > to the HA engine. I'm still in compatibility mode 4.2 for the cluster and > for the vm's, but upgraded to version ovirt 4.3.2 > - The network provider is set to ovn, which is fine..actually cool, > only the "ovs-vswitchd" is a CPU hog, and utilizes 100% > - It seems on all nodes vdsm tries to get the stats for the HA > engine, which is filling the logs with (not sure if this is new); > [api.virt] FINISH getStats return={'status': {'message': "Virtual machine > does not exist: {'vmId': u'20d69acd-edfd-4aeb-a2ae-49e9c121b7e9'}", 'code': > 1}} from=::1,59290, vmId=20d69acd-edfd-4aeb-a2ae-49e9c121b7e9 (api:54) > - It seems the package os_brick [root] managedvolume not supported: > Managed Volume Not Supported. Missing package os-brick.: ('Cannot import > os_brick',) (caps:149) which fills the vdsm.log, but for this I also saw > another message, so I suspect this will already be resolved shortly > - The machine I used to run the backup HA engine, doesn't want to > get removed from the hosted-engine --vm-status, not even after running; > hosted-engine --clean-metadata --host-id=10 --force-clean or hosted-engine > --clean-metadata --force-clean from the machine itself. > > Think that's about it. > > Don't get me wrong, I don't want to rant, I just wanted to share my > experience and see where things can be made better. 
> > > Best Olaf > _______________________________________________ > Users mailing list -- users at ovirt.org > To unsubscribe send an email to users-leave at ovirt.org > Privacy Statement: https://www.ovirt.org/site/privacy-policy/ > oVirt Code of Conduct: > https://www.ovirt.org/community/about/community-guidelines/ > List Archives: > https://lists.ovirt.org/archives/list/users at ovirt.org/message/3CO35Q7VZMWNHS4LPUJNO7S47MGLSKS5/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amukherj at redhat.com Fri Mar 29 07:34:16 2019 From: amukherj at redhat.com (Atin Mukherjee) Date: Fri, 29 Mar 2019 13:04:16 +0530 Subject: [Gluster-users] [ovirt-users] Re: Announcing Gluster release 5.5 In-Reply-To: References: <20190328164716.27693.35887@mail.ovirt.org> Message-ID: On Fri, Mar 29, 2019 at 12:47 PM Krutika Dhananjay wrote: > Questions/comments inline ... > > On Thu, Mar 28, 2019 at 10:18 PM wrote: > >> Dear All, >> >> I wanted to share my experience upgrading from 4.2.8 to 4.3.1. While >> previous upgrades from 4.1 to 4.2 etc. went rather smooth, this one was a >> different experience. After first trying a test upgrade on a 3 node setup, >> which went fine. i headed to upgrade the 9 node production platform, >> unaware of the backward compatibility issues between gluster 3.12.15 -> >> 5.3. After upgrading 2 nodes, the HA engine stopped and wouldn't start. >> Vdsm wasn't able to mount the engine storage domain, since /dom_md/metadata >> was missing or couldn't be accessed. Restoring this file by getting a good >> copy of the underlying bricks, removing the file from the underlying bricks >> where the file was 0 bytes and mark with the stickybit, and the >> corresponding gfid's. Removing the file from the mount point, and copying >> back the file on the mount point. 
Manually mounting the engine domain, and >> manually creating the corresponding symbolic links in /rhev/data-center and >> /var/run/vdsm/storage and fixing the ownership back to vdsm.kvm (which was >> root.root), i was able to start the HA engine again. Since the engine was >> up again, and things seemed rather unstable i decided to continue the >> upgrade on the other nodes suspecting an incompatibility in gluster >> versions, i thought would be best to have them all on the same version >> rather soonish. However things went from bad to worse, the engine stopped >> again, and all vm's stopped working as well. So on a machine outside the >> setup and restored a backup of the engine taken from version 4.2.8 just >> before the upgrade. With this engine I was at least able to start some vm's >> again, and finalize the upgrade. Once the upgraded, things didn't stabilize >> and also lose 2 vm's during the process due to image corruption. After >> figuring out gluster 5.3 had quite some issues I was as lucky to see >> gluster 5.5 was about to be released, on the moment the RPM's were >> available I've installed those. This helped a lot in terms of stability, >> for which I'm very grateful! However the performance is unfortunate >> terrible, it's about 15% of what the performance was running gluster >> 3.12.15. It's strange since a simple dd shows ok performance, but our >> actual workload doesn't. While I would expect the performance to be better, >> due to all improvements made since gluster version 3.12. Does anybody share >> the same experience? >> I really hope gluster 6 will soon be tested with ovirt and released, and >> things start to perform and stabilize again..like the good old days. Of >> course when I can do anything, I'm happy to help. 
>> >> I think the following short list of issues we have after the migration; >> Gluster 5.5; >> - Poor performance for our workload (mostly write dependent) >> > > For this, could you share the volume-profile output specifically for the > affected volume(s)? Here's what you need to do - > > 1. # gluster volume profile $VOLNAME stop > 2. # gluster volume profile $VOLNAME start > 3. Run the test inside the vm wherein you see bad performance > 4. # gluster volume profile $VOLNAME info # save the output of this > command into a file > 5. # gluster volume profile $VOLNAME stop > 6. and attach the output file gotten in step 4 > > - VM's randomly pause on unknown storage errors, which are "stale file's". corresponding log; Lookup >> on shard 797 failed. Base file gfid = 8a27b91a-ff02-42dc-bd4c-caa019424de8 >> [Stale file handle] >> > > Could you share the complete gluster client log file (it would be a > filename matching the pattern rhev-data-center-mnt-glusterSD-*) > Also the output of `gluster volume info $VOLNAME` > > > >> - Some files are listed twice in a directory (probably related the >> stale file issue?) >> Example; >> ls -la >> /rhev/data-center/59cd53a9-0003-02d7-00eb-0000000001e3/313f5d25-76af-4ecd-9a20-82a2fe815a3c/images/4add6751-3731-4bbd-ae94-aaeed12ea450/ >> total 3081 >> drwxr-x---. 2 vdsm kvm 4096 Mar 18 11:34 . >> drwxr-xr-x. 13 vdsm kvm 4096 Mar 19 09:42 .. >> -rw-rw----. 1 vdsm kvm 1048576 Mar 28 12:55 >> 1a7cf259-6b29-421d-9688-b25dfaafb13c >> -rw-rw----. 1 vdsm kvm 1048576 Mar 28 12:55 >> 1a7cf259-6b29-421d-9688-b25dfaafb13c >> -rw-rw----. 1 vdsm kvm 1048576 Jan 27 2018 >> 1a7cf259-6b29-421d-9688-b25dfaafb13c.lease >> -rw-r--r--. 1 vdsm kvm 290 Jan 27 2018 >> 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta >> -rw-r--r--. 1 vdsm kvm 290 Jan 27 2018 >> 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta >> > > Adding DHT and readdir-ahead maintainers regarding entries getting listed > twice. 
> @Nithya Balachandran ^^ > @Gowdappa, Raghavendra ^^ > @Poornima Gurusiddaiah ^^ > > >> >> - brick processes sometimes starts multiple times. Sometimes I've 5 brick >> processes for a single volume. Killing all glusterfsd's for the volume on >> the machine and running gluster v start force usually just starts one >> after the event, from then on things look all right. >> > > Did you mean 5 brick processes for a single brick directory? > +Mohit Agrawal ^^ > Mohit - Could this be because of missing the following commit in release-5 branch? It might be worth to backport this fix. commit 66986594a9023c49e61b32769b7e6b260b600626 Author: Mohit Agrawal Date: Fri Mar 1 13:41:24 2019 +0530 glusterfsd: Multiple shd processes are spawned on brick_mux environment Problem: Multiple shd processes are spawned while starting volumes in the loop on brick_mux environment.glusterd spawn a process based on a pidfile and shd daemon is taking some time to update pid in pidfile due to that glusterd is not able to get shd pid Solution: Commit cd249f4cb783f8d79e79468c455732669e835a4f changed the code to update pidfile in parent for any gluster daemon after getting the status of forking child in parent.To resolve the same correct the condition update pidfile in parent only for glusterd and for rest of the daemon pidfile is updated in child Change-Id: Ifd14797fa949562594a285ec82d58384ad717e81 fixes: bz#1684404 Signed-off-by: Mohit Agrawal > > -Krutika > > >> Ovirt 4.3.2.1-1.el7 >> - All vms images ownership are changed to root.root after the vm is >> shutdown, probably related to; >> https://bugzilla.redhat.com/show_bug.cgi?id=1666795 but not only scoped >> to the HA engine. I'm still in compatibility mode 4.2 for the cluster and >> for the vm's, but upgraded to version ovirt 4.3.2 >> - The network provider is set to ovn, which is fine..actually cool, >> only the "ovs-vswitchd"
is a CPU hog, and utilizes 100% >> - It seems on all nodes vdsm tries to get the stats for the HA >> engine, which is filling the logs with (not sure if this is new); >> [api.virt] FINISH getStats return={'status': {'message': "Virtual machine >> does not exist: {'vmId': u'20d69acd-edfd-4aeb-a2ae-49e9c121b7e9'}", 'code': >> 1}} from=::1,59290, vmId=20d69acd-edfd-4aeb-a2ae-49e9c121b7e9 (api:54) >> - It seems the package os_brick [root] managedvolume not supported: >> Managed Volume Not Supported. Missing package os-brick.: ('Cannot import >> os_brick',) (caps:149) which fills the vdsm.log, but for this I also saw >> another message, so I suspect this will already be resolved shortly >> - The machine I used to run the backup HA engine, doesn't want to >> get removed from the hosted-engine --vm-status, not even after running; >> hosted-engine --clean-metadata --host-id=10 --force-clean or hosted-engine >> --clean-metadata --force-clean from the machine itself. >> >> Think that's about it. >> >> Don't get me wrong, I don't want to rant, I just wanted to share my >> experience and see where things can be made better. >> >> >> Best Olaf >> _______________________________________________ >> Users mailing list -- users at ovirt.org >> To unsubscribe send an email to users-leave at ovirt.org >> Privacy Statement: https://www.ovirt.org/site/privacy-policy/ >> oVirt Code of Conduct: >> https://www.ovirt.org/community/about/community-guidelines/ >> List Archives: >> https://lists.ovirt.org/archives/list/users at ovirt.org/message/3CO35Q7VZMWNHS4LPUJNO7S47MGLSKS5/ >> > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hgowtham at redhat.com Fri Mar 29 10:07:54 2019 From: hgowtham at redhat.com (Hari Gowtham) Date: Fri, 29 Mar 2019 15:37:54 +0530 Subject: [Gluster-users] Upgrade testing to gluster 6 In-Reply-To: References: Message-ID: Hi, I have added a few more details that were missed earlier. The disconnect issue being minor, we are working on it with a lower priority. But yes, it will be fixed soon. The bug to track this is: https://bugzilla.redhat.com/show_bug.cgi?id=1694010 The workaround to get over this, if it happens, is to upgrade the nodes one after the other to the latest version. Once the upgrade is done, 1) kill the glusterd process alone on all the nodes using the command "pkill glusterd" 2) then do an "iptables -F" to flush the iptables rules. 3) start glusterd using "glusterd" Note: users can use the systemctl stop/start glusterd.service commands as well instead of the above to kill and start glusterd. On Fri, Mar 29, 2019 at 11:42 AM Hari Gowtham wrote: > > Hello Gluster users, > > As you are all aware, glusterfs-6 is out. We would like to inform you > that we have spent a significant amount of time in testing > glusterfs-6 in upgrade scenarios. We have done upgrade testing to > glusterfs-6 from various releases like 3.12, 4.1 and 5.3. > > As glusterfs-6 has got a lot of changes, we wanted to test those portions. > There were xlators (and respective options to enable/disable them) > added and deprecated in glusterfs-6 from various versions [1]. > > We had to check the following upgrade scenarios for all such options > identified in [1]: > 1) option never enabled and upgraded > 2) option enabled and then upgraded > 3) option enabled and then disabled and then upgraded > > We weren't able to manually check all the combinations for all the options. > So the options involving enabling and disabling xlators were prioritized. > Below are the results of the ones tested. > > Never enabled and upgraded: > checked from 3.12, 4.1, 5.3 to 6 the upgrade works. 
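The three-step workaround above could be scripted per node like this - a sketch only (the function name and the DRY_RUN preview flag are invented here; run it as root, one node at a time):

```shell
# DRY_RUN=1 prints each command instead of executing it, so the sequence
# can be reviewed before touching a production node.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi; }

fix_disconnected_peer() {
    run pkill glusterd   # 1) kill the glusterd process alone on this node
    run iptables -F      # 2) flush the iptables rules
    run glusterd         # 3) start glusterd again
}
```

As noted above, 'systemctl stop/start glusterd.service' can replace steps 1) and 3).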
> > Enabled and upgraded: > Tested for tier, which is deprecated; it is not a recommended upgrade. > As expected the volume won't be consumable and will have a few more > issues as well. > Tested with 3.12, 4.1 and 5.3 to 6 upgrade. > > Enabled, disabled before upgrade. > Tested for tier with 3.12 and the upgrade went fine. > > There is one common issue to note in every upgrade. The node being > upgraded goes into a disconnected state. You have to flush the iptables > and then restart glusterd on all nodes to fix this. > > The testing for enabling new options is still pending. The new options > won't cause as many issues as the deprecated ones, so this was put at > the end of the priority list. It would be nice to get contributions > for this. > > For the disable testing, tier was used as it covers most of the xlators > that were removed. And all of these tests were done on a replica 3 volume. > > Note: This is only for upgrade testing of the newly added and removed > xlators. It does not involve the normal tests for the xlators. > > If you have any questions, please feel free to reach us. > > [1] https://docs.google.com/spreadsheets/d/1nh7T5AXaV6kc5KgILOy2pEqjzC3t_R47f1XUXSVFetI/edit?usp=sharing > > Regards, > Hari and Sanju. -- Regards, Hari Gowtham. From amukherj at redhat.com Fri Mar 29 13:41:35 2019 From: amukherj at redhat.com (Atin Mukherjee) Date: Fri, 29 Mar 2019 19:11:35 +0530 Subject: [Gluster-users] Quick update on glusterd's volume scalability improvements Message-ID: All, As many of you already know, the design logic with which GlusterD (here on to be referred to as GD1) was implemented has some fundamental scalability bottlenecks at the design level, especially around its way of handshaking configuration metadata and replicating it across all the peers. 
While the initial design assumed that GD1 would have to deal with just a few tens of nodes/peers and volumes, the magnitude of the scaling bottleneck this design could bring in was never realized and estimated. Ever since Gluster was adopted in the container storage world as one of the storage backends, the business needs have changed: from tens of volumes, the requirements have grown to hundreds and now to thousands. We introduced brick multiplexing, which gave some relief by providing better control over the memory footprint when many bricks/volumes are hosted on a node, but this wasn't enough. In one of our (I represent Red Hat) customers' deployments we had seen that on a 3 node cluster, whenever the number of volumes goes beyond ~1500 and one of the storage pods gets rebooted for some reason, the overall handshaking (not only a factor of n x n peer handshaking, but also the number of volume iterations, building up the dictionary and sending it over the wire) consumes so much time that the hard timeout of an RPC request, which is 10 minutes, expires, and we see the cluster going into a state where none of the CLI commands go through and they get stuck. With such a problem around and more demand for volume scalability, we started looking into these areas in GD1 to focus on improving (a) volume scalability and (b) node scalability. While (b) is a separate topic for some other day, we're going to focus more on (a) today. Taking a deep dive into this volume scalability problem, we realized that most of the bottleneck causing the overall delay in the friend handshake and in exchanging handshake packets between peers in the cluster was iterating over the in-memory data structures of the volumes and putting them into the dictionary sequentially. 
With ~2k volumes the function glusterd_add_volumes_to_export_dict () was quite costly and the most time consuming. From pstack output taken when the glusterd instance was restarted in one of the pods, we could always see that control was iterating in this function. Based on our testing on a 16 vCPU, 32 GB RAM, 3-node cluster, this function itself took almost *7.5 minutes*. The bottleneck is primarily the sequential iteration over volumes, sequentially updating the dictionary with a lot of (un)necessary keys. So what we tried first was making this loop work on a worker-thread model, so that multiple threads can each process a range of the volume list rather than all of it, giving us more parallelism within glusterd. But with that we still didn't see any improvement, and the primary reason was that our dictionary APIs need locking. So the next idea was to make the threads work on multiple dictionaries and then, once all the volumes are iterated, merge the per-thread dictionaries into a single one. Along with these changes there are a few other improvements: skipping the comparison of snapshots if no snap is available, and excluding tiering keys if the volume type is not tier. With this enhancement [1] the overall time it takes to build up the dictionary from the in-memory structure is *2 minutes 18 seconds*, which is close to a *~3x* improvement. We firmly believe that with this improvement we should be able to scale up to 2000 volumes on a 3-node cluster, and that'd benefit our users by supporting more PVCs/volumes. Patch [1] is still in testing and might undergo a few minor changes, but we welcome you to review and comment on it. We plan to get this work completed, tested and released in glusterfs-7. Last but not least, I'd like to give a shout to Mohit Agrawal (in cc) for all the work done on this over the last few days. Thank you Mohit! 
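The approach described above, where each worker thread fills its own dictionary over a slice of the volume list and the per-thread dictionaries are merged at the end, can be sketched as a toy Python model (the volume names, key layout and thread count here are illustrative only, not glusterd's actual C code or dictionary keys):

```python
import threading

def build_volume_dict(volumes):
    """Flatten one slice of volumes into a private dict (no shared lock needed)."""
    d = {}
    for name, info in volumes:
        for key, value in info.items():
            d["volume.%s.%s" % (name, key)] = value
    return d

def export_volumes_parallel(volumes, nthreads=4):
    """Each worker builds its own dict over a slice of the volume list;
    the per-thread dicts are merged into one at the end."""
    chunks = [volumes[i::nthreads] for i in range(nthreads)]
    results = [None] * nthreads

    def worker(i):
        results[i] = build_volume_dict(chunks[i])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    merged = {}
    for d in results:
        merged.update(d)  # single-threaded merge, so no locking is required
    return merged

# Toy stand-in for glusterd's in-memory volinfo list (2000 volumes).
vols = [("vol%d" % i, {"type": "replicate", "brick-count": 3}) for i in range(2000)]
exported = export_volumes_parallel(vols)
print(len(exported))  # 4000 keys: 2 per volume
```

In GD1 itself this is done in C over dict_t instances (patch [1] above); the point of the sketch is just the partition-then-merge shape, which avoids lock contention on a single shared dictionary.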
[1] https://review.gluster.org/#/c/glusterfs/+/22445/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jim.kinney at gmail.com Fri Mar 29 16:32:57 2019 From: jim.kinney at gmail.com (Jim Kinney) Date: Fri, 29 Mar 2019 12:32:57 -0400 Subject: [Gluster-users] upgrade best practices Message-ID: <629338fe8720f63420d43fa72cc7b080ba213a4c.camel@gmail.com> Currently running 3.12 on CentOS 7.6, doing cleanups on split-brain, out-of-sync, and needs-heal files. We need to migrate the three replica servers to gluster v5 or v6, and will also need to upgrade about 80 clients. Given that a complete removal of gluster will not touch the 200+TB of data on 12 volumes, we are looking at doing that process: stop all clients, stop all glusterd services, remove all of it, install the new version, set up new volumes from the old bricks, install new clients, mount everything. We would like to get some better performance from nfs-ganesha mounts but that doesn't look like an option (we haven't done any parameter tweaks in testing yet). At a bare minimum, we would like to minimize the total downtime of all systems. Does this process make more sense than a stepwise version upgrade to 4.1, then 5, then 6? What "gotchas" do I need to be ready for? I have until late May to prep and test on old, slow hardware with a small number of files and volumes. -- James P. Kinney III Every time you stop a school, you will have to build a jail. What you gain at one end you lose at the other. It's like feeding a dog on his own tail. It won't fatten the dog. - Speech 11/23/1900 Mark Twain http://heretothereideas.blogspot.com/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pgurusid at redhat.com Fri Mar 29 17:09:01 2019 From: pgurusid at redhat.com (Poornima Gurusiddaiah) Date: Fri, 29 Mar 2019 22:39:01 +0530 Subject: [Gluster-users] upgrade best practices In-Reply-To: <629338fe8720f63420d43fa72cc7b080ba213a4c.camel@gmail.com> References: <629338fe8720f63420d43fa72cc7b080ba213a4c.camel@gmail.com> Message-ID: On Fri, Mar 29, 2019, 10:03 PM Jim Kinney wrote: > Currently running 3.12 on Centos 7.6. Doing cleanups on split-brain and > out of sync, need heal files. > > We need to migrate the three replica servers to gluster v. 5 or 6. Also > will need to upgrade about 80 clients as well. Given that a complete > removal of gluster will not touch the 200+TB of data on 12 volumes, we are > looking at doing that process, Stop all clients, stop all glusterd > services, remove all of it, install new version, setup new volumes from old > bricks, install new clients, mount everything. > > We would like to get some better performance from nfs-ganesha mounts but > that doesn't look like an option (not done any parameter tweaks in testing > yet). At a bare minimum, we would like to minimize the total downtime of > all systems. > > Does this process make more sense than a version upgrade process to 4.1, > then 5, then 6? What "gotcha's" do I need to be ready for? I have until > late May to prep and test on old, slow hardware with a small amount of > files and volumes. > You can upgrade directly from 3.12 to 6.x; I would suggest that rather than deleting and recreating the Gluster volumes. +Hari and +Sanju for further guidelines on the upgrade, as they recently did upgrade tests. +Soumya to add on the nfs-ganesha aspect. Regards, Poornima > -- > > James P. Kinney III Every time you stop a school, you will have to build a > jail. What you gain at one end you lose at the other. It's like feeding a > dog on his own tail. It won't fatten the dog. 
- Speech 11/23/1900 Mark > Twain http://heretothereideas.blogspot.com/ > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From brandon at thinkhuge.net Fri Mar 29 19:32:36 2019 From: brandon at thinkhuge.net (brandon at thinkhuge.net) Date: Fri, 29 Mar 2019 12:32:36 -0700 Subject: [Gluster-users] Transport endpoint is not connected failures in In-Reply-To: References: <009a01d4e4b8$731a8420$594f8c60$@thinkhuge.net> Message-ID: <405101d4e666$2a97ef80$7fc7ce80$@thinkhuge.net> Hello Nithya, I removed several options that I admit I didn't quite understand and had added from Google searches. It was dumb of me to add them in the first place without understanding them. One of these options was apparently causing directory listings to take about 7 seconds; when I cut down to more minimal volume settings it was 1-2 seconds. That is with about 14,000 files in the largest directory. 
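Which options were removed, versus merely retuned, can be determined mechanically by diffing the two "Options Reconfigured" listings that follow; a quick throwaway sketch (using only a subset of the option names and values quoted in this message):

```python
def parse_options(text):
    """Parse an 'Options Reconfigured:' listing into an option -> value dict."""
    opts = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        opts[key.strip()] = value.strip()
    return opts

# Subset of the before/after listings from this message.
before = parse_options("""
performance.parallel-readdir: on
performance.cache-size: 8GB
performance.io-thread-count: 24
""")
after = parse_options("""
performance.cache-size: 256MB
performance.io-thread-count: 32
""")

removed = sorted(set(before) - set(after))
changed = sorted(k for k in before.keys() & after.keys() if before[k] != after[k])
print("removed:", removed)   # removed: ['performance.parallel-readdir']
print("changed:", changed)   # changed: ['performance.cache-size', 'performance.io-thread-count']
```

The same diff works on any pair of `gluster volume info` outputs, since they print the reconfigured options in this key: value form.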
Before: Options Reconfigured: transport.address-family: inet nfs.disable: on cluster.min-free-disk: 1% performance.cache-size: 8GB performance.cache-max-file-size: 128MB diagnostics.brick-log-level: WARNING diagnostics.brick-sys-log-level: WARNING client.event-threads: 3 performance.client-io-threads: on performance.io-thread-count: 24 network.inode-lru-limit: 1048576 performance.parallel-readdir: on performance.cache-invalidation: on performance.md-cache-timeout: 600 features.cache-invalidation: on features.cache-invalidation-timeout: 600 After: Options Reconfigured: performance.io-thread-count: 32 performance.client-io-threads: on client.event-threads: 8 diagnostics.brick-sys-log-level: WARNING diagnostics.brick-log-level: WARNING performance.cache-max-file-size: 2MB performance.cache-size: 256MB cluster.min-free-disk: 1% nfs.disable: on transport.address-family: inet server.event-threads: 8 ------ Hi Brandon, Which options were removed? Thanks, Nithya -------------- next part -------------- An HTML attachment was scrubbed... URL: From brandon at thinkhuge.net Fri Mar 29 19:48:08 2019 From: brandon at thinkhuge.net (brandon at thinkhuge.net) Date: Fri, 29 Mar 2019 12:48:08 -0700 Subject: [Gluster-users] Transport endpoint is not connected failures in In-Reply-To: References: <009a01d4e4b8$731a8420$594f8c60$@thinkhuge.net> Message-ID: <407701d4e668$5623a1b0$026ae510$@thinkhuge.net> Hello, Yes I did find some hits on this in the following logs. We started seeing failures after upgrading to 5.3 from 4.6. If you want me to check for something else let me know. Thank you all on the gluster team for finding and fixing that problem whatever it was! [root at lonbaknode3 glusterfs]# zgrep ping_timer /var/log/glusterfs/home-volbackups* .... /var/log/glusterfs/home-volbackups.log-20190317.gz:[2019-03-16 10:34:44.419605] C [rpc-clnt-ping.c:162:rpc_clnt_ping_timer_expired] 0-volbackups-client-3: server 1.2.3.4:49153 has not responded in the last 42 seconds, disconnecting. 
/var/log/glusterfs/home-volbackups.log-20190317.gz:[2019-03-16 10:34:44.419672] C [rpc-clnt-ping.c:162:rpc_clnt_ping_timer_expired] 0-volbackups-client-6: server 1.2.3.4:49153 has not responded in the last 42 seconds, disconnecting. /var/log/glusterfs/home-volbackups.log-20190317.gz:[2019-03-16 10:34:57.425211] C [rpc-clnt-ping.c:162:rpc_clnt_ping_timer_expired] 0-volbackups-client-9: server 1.2.3.4:49153 has not responded in the last 42 seconds, disconnecting. /var/log/glusterfs/home-volbackups.log-20190317.gz:[2019-03-16 11:46:25.768650] C [rpc-clnt-ping.c:162:rpc_clnt_ping_timer_expired] 0-volbackups-client-6: server 1.2.3.4:49153 has not responded in the last 42 seconds, disconnecting. /var/log/glusterfs/home-volbackups.log-20190317.gz:[2019-03-16 16:02:29.921450] C [rpc-clnt-ping.c:162:rpc_clnt_ping_timer_expired] 0-volbackups-client-3: server 1.2.3.4:49153 has not responded in the last 42 seconds, disconnecting. .... ----- What was the version you saw failures in? Were there any logs matching with the pattern "ping_timer_expired" earlier? -------------- next part -------------- An HTML attachment was scrubbed... URL: From vbellur at redhat.com Sat Mar 30 02:36:21 2019 From: vbellur at redhat.com (Vijay Bellur) Date: Fri, 29 Mar 2019 19:36:21 -0700 Subject: [Gluster-users] Quick update on glusterd's volume scalability improvements In-Reply-To: References: Message-ID: On Fri, Mar 29, 2019 at 6:42 AM Atin Mukherjee wrote: > All, > > As many of you already know that the design logic with which GlusterD > (here on to be referred as GD1) was implemented has some fundamental > scalability bottlenecks at design level, especially around it's way of > handshaking configuration meta data and replicating them across all the > peers. While the initial design was adopted with a factor in mind that GD1 > will have to deal with just few tens of nodes/peers and volumes, the > magnitude of the scaling bottleneck this design can bring in was never > realized and estimated. 
> > Ever since Gluster has been adopted in container storage land as one of > the storage backends, the business needs have changed. From tens of > volumes, the requirements have translated to hundreds and now to thousands. > We introduced brick multiplexing which had given some relief to have a > better control on the memory footprint when having many number of > bricks/volumes hosted in the node, but this wasn't enough. In one of our (I > represent Red Hat) customer's deployment we had seen on a 3 nodes cluster, > whenever the number of volumes go beyond ~1500 and for some reason if one > of the storage pods get rebooted, the overall time it takes to complete the > overall handshaking (not only in a factor of n X n peer handshaking but > also the number of volume iterations, building up the dictionary and > sending it over the write) consumes a huge time as part of the handshaking > process, the hard timeout of an rpc request which is 10 minutes gets > expired and we see cluster going into a state where none of the cli > commands go through and get stuck. > > With such problem being around and more demand of volume scalability, we > started looking into these areas in GD1 to focus on improving (a) volume > scalability (b) node scalability. While (b) is a separate topic for some > other day we're going to focus on more on (a) today. > > While looking into this volume scalability problem with a deep dive, we > realized that most of the bottleneck which was causing the overall delay in > the friend handshaking and exchanging handshake packets between peers in > the cluster was iterating over the in-memory data structures of the > volumes, putting them into the dictionary sequentially. With 2k like > volumes the function glusterd_add_volumes_to_export_dict () was quite > costly and most time consuming. From pstack output when glusterd instance > was restarted in one of the pods, we could always see that control was > iterating in this function. 
Based on our testing on a 16 vCPU, 32 GB RAM 3 > nodes cluster, this function itself took almost *7.5 minutes . *The > bottleneck is primarily because of sequential iteration of volumes, > sequentially updating the dictionary with lot of (un)necessary keys. > > So what we tried out was making this loop to work on a worker thread model > so that multiple threads can process a range of volume list and not all of > them so that we can get more parallelism within glusterd. But with that we > still didn't see any improvement and the primary reason for that was our > dictionary APIs need locking. So the next idea was to actually make threads > work on multiple dictionaries and then once all the volumes are iterated > the subsequent dictionaries to be merged into a single one. Along with > these changes there are few other improvements done on skipping comparison > of snapshots if there's no snap available, excluding tiering keys if the > volume type is not tier. With this enhancement [1] we see the overall time > it took to complete building up the dictionary from the in-memory structure > is *2 minutes 18 seconds* which is close* ~3x* improvement. We firmly > believe that with this improvement, we should be able to scale up to 2000 > volumes on a 3 node cluster and that'd help our users to get benefited with > supporting more PVCs/volumes. > > Patch [1] is still in testing and might undergo few minor changes. But we > welcome you for review and comment on it. We plan to get this work > completed, tested and release in glusterfs-7. > > Last but not the least, I'd like to give a shout to Mohit Agrawal (In cc) > for all the work done on this for last few days. Thank you Mohit! > > This sounds good! Thank you for the update on this work. Did you ever consider using etcd with GD1 (like as it is used with GD2)? Having etcd as a backing store for configuration could remove expensive handshaking as well as persistence of configuration on every node. 
I am interested in understanding if you are aware of any drawbacks with that approach. If there haven't been any thoughts in that direction, it might be a fun experiment to try. Thanks, Vijay -------------- next part -------------- An HTML attachment was scrubbed... URL: From amukherj at redhat.com Sat Mar 30 04:16:29 2019 From: amukherj at redhat.com (Atin Mukherjee) Date: Sat, 30 Mar 2019 09:46:29 +0530 Subject: [Gluster-users] Quick update on glusterd's volume scalability improvements In-Reply-To: References: Message-ID: On Sat, 30 Mar 2019 at 08:06, Vijay Bellur wrote: > > > On Fri, Mar 29, 2019 at 6:42 AM Atin Mukherjee > wrote: > >> All, >> >> As many of you already know that the design logic with which GlusterD >> (here on to be referred as GD1) was implemented has some fundamental >> scalability bottlenecks at design level, especially around it's way of >> handshaking configuration meta data and replicating them across all the >> peers. While the initial design was adopted with a factor in mind that GD1 >> will have to deal with just few tens of nodes/peers and volumes, the >> magnitude of the scaling bottleneck this design can bring in was never >> realized and estimated. >> >> Ever since Gluster has been adopted in container storage land as one of >> the storage backends, the business needs have changed. From tens of >> volumes, the requirements have translated to hundreds and now to thousands. >> We introduced brick multiplexing which had given some relief to have a >> better control on the memory footprint when having many number of >> bricks/volumes hosted in the node, but this wasn't enough. 
In one of our (I >> represent Red Hat) customer's deployment we had seen on a 3 nodes cluster, >> whenever the number of volumes go beyond ~1500 and for some reason if one >> of the storage pods get rebooted, the overall time it takes to complete the >> overall handshaking (not only in a factor of n X n peer handshaking but >> also the number of volume iterations, building up the dictionary and >> sending it over the write) consumes a huge time as part of the handshaking >> process, the hard timeout of an rpc request which is 10 minutes gets >> expired and we see cluster going into a state where none of the cli >> commands go through and get stuck. >> >> With such problem being around and more demand of volume scalability, we >> started looking into these areas in GD1 to focus on improving (a) volume >> scalability (b) node scalability. While (b) is a separate topic for some >> other day we're going to focus on more on (a) today. >> >> While looking into this volume scalability problem with a deep dive, we >> realized that most of the bottleneck which was causing the overall delay in >> the friend handshaking and exchanging handshake packets between peers in >> the cluster was iterating over the in-memory data structures of the >> volumes, putting them into the dictionary sequentially. With 2k like >> volumes the function glusterd_add_volumes_to_export_dict () was quite >> costly and most time consuming. From pstack output when glusterd instance >> was restarted in one of the pods, we could always see that control was >> iterating in this function. Based on our testing on a 16 vCPU, 32 GB RAM 3 >> nodes cluster, this function itself took almost *7.5 minutes . *The >> bottleneck is primarily because of sequential iteration of volumes, >> sequentially updating the dictionary with lot of (un)necessary keys. 
>> >> So what we tried out was making this loop to work on a worker thread >> model so that multiple threads can process a range of volume list and not >> all of them so that we can get more parallelism within glusterd. But with >> that we still didn't see any improvement and the primary reason for that >> was our dictionary APIs need locking. So the next idea was to actually make >> threads work on multiple dictionaries and then once all the volumes are >> iterated the subsequent dictionaries to be merged into a single one. Along >> with these changes there are few other improvements done on skipping >> comparison of snapshots if there's no snap available, excluding tiering >> keys if the volume type is not tier. With this enhancement [1] we see the >> overall time it took to complete building up the dictionary from the >> in-memory structure is *2 minutes 18 seconds* which is close* ~3x* >> improvement. We firmly believe that with this improvement, we should be >> able to scale up to 2000 volumes on a 3 node cluster and that'd help our >> users to get benefited with supporting more PVCs/volumes. >> >> Patch [1] is still in testing and might undergo few minor changes. But we >> welcome you for review and comment on it. We plan to get this work >> completed, tested and release in glusterfs-7. >> >> Last but not the least, I'd like to give a shout to Mohit Agrawal (In cc) >> for all the work done on this for last few days. Thank you Mohit! >> >> > > This sounds good! Thank you for the update on this work. > > Did you ever consider using etcd with GD1 (like as it is used with GD2)? > Honestly, I had thought about it a few times, but the primary reason for not going forward in that direction was bandwidth, as such improvements aren't short-term or tiny tasks, and we had to keep in mind that the GD2 tasks were on our plate too. If any other contributors are willing to take this up, I am more than happy to collaborate and provide guidance. 
Having etcd as a backing store for configuration could remove expensive > handshaking as well as persistence of configuration on every node. I am > interested in understanding if you are aware of any drawbacks with that > approach. If there haven't been any thoughts in that direction, it might be > a fun experiment to try. > > Thanks, > Vijay > -- - Atin (atinm) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rgowdapp at redhat.com Sun Mar 31 02:16:53 2019 From: rgowdapp at redhat.com (Raghavendra Gowdappa) Date: Sun, 31 Mar 2019 07:46:53 +0530 Subject: [Gluster-users] Transport endpoint is not connected failures in In-Reply-To: <407701d4e668$5623a1b0$026ae510$@thinkhuge.net> References: <009a01d4e4b8$731a8420$594f8c60$@thinkhuge.net> <407701d4e668$5623a1b0$026ae510$@thinkhuge.net> Message-ID: On Sat, Mar 30, 2019 at 1:18 AM wrote: > Hello, > > > > Yes I did find some hits on this in the following logs. We started seeing > failures after upgrading to 5.3 from 4.6. > There are no relevant fixes for ping timer expiry between 5.5 and 5.3. So, I attribute the failures no longer being seen to client.event-threads and server.event-threads having been increased to 8 in the current setup, from lower values earlier. If you want me to check for something else let me know. Thank you all on > the gluster team for finding and fixing that problem whatever it was! > > > > [root at lonbaknode3 glusterfs]# zgrep ping_timer > /var/log/glusterfs/home-volbackups* > > .... > > /var/log/glusterfs/home-volbackups.log-20190317.gz:[2019-03-16 > 10:34:44.419605] C [rpc-clnt-ping.c:162:rpc_clnt_ping_timer_expired] > 0-volbackups-client-3: server 1.2.3.4:49153 has not responded in the last > 42 seconds, disconnecting. > > /var/log/glusterfs/home-volbackups.log-20190317.gz:[2019-03-16 > 10:34:44.419672] C [rpc-clnt-ping.c:162:rpc_clnt_ping_timer_expired] > 0-volbackups-client-6: server 1.2.3.4:49153 has not responded in the last > 42 seconds, disconnecting. 
> > /var/log/glusterfs/home-volbackups.log-20190317.gz:[2019-03-16 > 10:34:57.425211] C [rpc-clnt-ping.c:162:rpc_clnt_ping_timer_expired] > 0-volbackups-client-9: server 1.2.3.4:49153 has not responded in the last > 42 seconds, disconnecting. > > /var/log/glusterfs/home-volbackups.log-20190317.gz:[2019-03-16 > 11:46:25.768650] C [rpc-clnt-ping.c:162:rpc_clnt_ping_timer_expired] > 0-volbackups-client-6: server 1.2.3.4:49153 has not responded in the last > 42 seconds, disconnecting. > > /var/log/glusterfs/home-volbackups.log-20190317.gz:[2019-03-16 > 16:02:29.921450] C [rpc-clnt-ping.c:162:rpc_clnt_ping_timer_expired] > 0-volbackups-client-3: server 1.2.3.4:49153 has not responded in the last > 42 seconds, disconnecting. > > .... > > > > ----- > > What was the version you saw failures in? Were there any logs matching > with the pattern "ping_timer_expired" earlier? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skoduri at redhat.com Sun Mar 31 17:31:58 2019 From: skoduri at redhat.com (Soumya Koduri) Date: Sun, 31 Mar 2019 23:01:58 +0530 Subject: [Gluster-users] upgrade best practices In-Reply-To: References: <629338fe8720f63420d43fa72cc7b080ba213a4c.camel@gmail.com> Message-ID: <9c792d30-0e79-98f7-6b76-9d168c947078@redhat.com> On 3/29/19 10:39 PM, Poornima Gurusiddaiah wrote: > > > On Fri, Mar 29, 2019, 10:03 PM Jim Kinney > wrote: > > Currently running 3.12 on Centos 7.6. Doing cleanups on split-brain > and out of sync, need heal files. > > We need to migrate the three replica servers to gluster v. 5 or 6. > Also will need to upgrade about 80 clients as well. Given that a > complete removal of gluster will not touch the 200+TB of data on 12 > volumes, we are looking at doing that process, Stop all clients, > stop all glusterd services, remove all of it, install new version, > setup new volumes from old bricks, install new clients, mount > everything. 
> We would like to get some better performance from nfs-ganesha mounts > but that doesn't look like an option (not done any parameter tweaks > in testing yet). At a bare minimum, we would like to minimize the > total downtime of all systems. Could you please be more specific here? As in, are you looking for better performance during the upgrade process, or in general? Compared to 3.12, there are a lot of perf improvements in both the glusterfs and especially the nfs-ganesha (latest stable - V2.7.x) stacks. If you could provide more information about your workloads (e.g., large-file, small-file, metadata-intensive), we can make some recommendations wrt configuration. Thanks, Soumya > > Does this process make more sense than a version upgrade process to > 4.1, then 5, then 6? What "gotcha's" do I need to be ready for? I > have until late May to prep and test on old, slow hardware with a > small amount of files and volumes. > > > You can directly upgrade from 3.12 to 6.x. I would suggest that rather > than deleting and creating Gluster volume. +Hari and +Sanju for further > guidelines on upgrade, as they recently did upgrade tests. +Soumya to > add to the nfs-ganesha aspect. > > Regards, > Poornima > > -- > > James P. Kinney III > > Every time you stop a school, you will have to build a jail. What you > gain at one end you lose at the other. It's like feeding a dog on his > own tail. It won't fatten the dog. > - Speech 11/23/1900 Mark Twain > > http://heretothereideas.blogspot.com/ > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > From amye at redhat.com Sun Mar 31 23:01:07 2019 From: amye at redhat.com (Amye Scavarda) Date: Sun, 31 Mar 2019 18:01:07 -0500 Subject: [Gluster-users] Gluster Monthly Newsletter, March 2019 Message-ID: Gluster Monthly Newsletter, March 2019 Congratulations to the team for getting Gluster 6 released! 
https://www.gluster.org/announcing-gluster-6/ https://lists.gluster.org/pipermail/gluster-users/2019-March/036144.html Our retrospective survey is open through April 8th, give us feedback on what we should start, stop or continue! https://lists.gluster.org/pipermail/gluster-users/2019-March/036144.html Gluster 7 Roadmap Discussion kicked off for our 7 roadmap on the mailing lists, see [Gluster-users] GlusterFS v7.0 (and v8.0) roadmap discussion https://lists.gluster.org/pipermail/gluster-users/2019-March/036139.html for more details. Gluster Friday Five: See all of our Friday Five casts at https://www.youtube.com/user/GlusterCommunity Contributors Top Contributing Companies: Red Hat, Comcast, DataLab, Gentoo Linux, Facebook, BioDec, Samsung, Etersoft Top Contributors in February: Yaniv Kaul, Pranith Kumar Karampuri, Aravinda VK, Ravishankar N Noteworthy Threads: [Gluster-users] Gluster : Improvements on "heal info" command https://lists.gluster.org/pipermail/gluster-users/2019-March/035955.html [Gluster-users] Announcing Gluster release 5.5 https://lists.gluster.org/pipermail/gluster-users/2019-March/036098.html [Gluster-users] GlusterFS v7.0 (and v8.0) roadmap discussion https://lists.gluster.org/pipermail/gluster-users/2019-March/036139.html [Gluster-users] Proposal: Changes in Gluster Community meetings https://lists.gluster.org/pipermail/gluster-users/2019-March/036140.html [Gluster-users] Help: gluster-block https://lists.gluster.org/pipermail/gluster-users/2019-March/036147.html [Gluster-users] POSIX locks and disconnections between clients and bricks https://lists.gluster.org/pipermail/gluster-users/2019-March/036161.html [Gluster-users] [Gluster-infra] Gluster HA https://lists.gluster.org/pipermail/gluster-users/2019-March/036200.html [Gluster-users] [Event CfP Announce] DevConf events India and US in the month of August 2019 https://lists.gluster.org/pipermail/gluster-users/2019-March/036211.html [Gluster-users] Upgrade testing to gluster 6 
https://lists.gluster.org/pipermail/gluster-users/2019-March/036214.html [Gluster-users] Quick update on glusterd's volume scalability improvements https://lists.gluster.org/pipermail/gluster-users/2019-March/036219.html [Gluster-devel] [Gluster-infra] 8/10 AWS jenkins builders disconnected https://lists.gluster.org/pipermail/gluster-devel/2019-March/055906.html [Gluster-devel] [Gluster-users] Experiences with FUSE in real world - Presentationat Vault 2019 https://lists.gluster.org/pipermail/gluster-devel/2019-March/055944.html [Gluster-devel] [Gluster-infra] Upgrading build.gluster.org https://lists.gluster.org/pipermail/gluster-devel/2019-March/055912.html [Gluster-devel] Github#268 Compatibility with Alpine Linux https://lists.gluster.org/pipermail/gluster-devel/2019-March/055921.html [Gluster-devel] GF_CALLOC to GF_MALLOC conversion - is it safe? https://lists.gluster.org/pipermail/gluster-devel/2019-March/055969.html [Gluster-devel] Issue with posix locks https://lists.gluster.org/pipermail/gluster-devel/2019-March/056027.html Events: Red Hat Summit, May 4-6, 2019 - https://www.redhat.com/en/summit/2019 Open Source Summit and KubeCon + CloudNativeCon Shanghai, June 24-26, 2019 https://www.lfasiallc.com/events/kubecon-cloudnativecon-china-2019/ DevConf India, August 2- 3 2019, Bengaluru - https://devconf.info/in DevConf USA, August 15-17, 2019, Boston - https://devconf.info/us/ -- Amye Scavarda | amye at redhat.com | Gluster Community Lead