[Gluster-devel] Need advice re some major issues with glusterfind

Vijaikumar Mallikarjuna vmallika at redhat.com
Fri Oct 23 02:36:05 UTC 2015


On Thu, Oct 22, 2015 at 8:41 AM, Sincock, John [FLCPTY] <J.Sincock at fugro.com
> wrote:

> Pls see below
>
> *From:* Vijaikumar Mallikarjuna [mailto:vmallika at redhat.com]
> *Sent:* Wednesday, 21 October 2015 6:37 PM
> *To:* Sincock, John [FLCPTY]
> *Cc:* gluster-devel at gluster.org
> *Subject:* Re: [Gluster-devel] Need advice re some major issues with
> glusterfind
>
> On Wed, Oct 21, 2015 at 5:53 AM, Sincock, John [FLCPTY] <
> J.Sincock at fugro.com> wrote:
>
> Hi Everybody,
>
> We have recently upgraded our 220 TB gluster to 3.7.4, and we've been
> trying to use the new glusterfind feature but have been having some serious
> problems with it. Overall, glusterfind looks very promising, so I don't
> want to offend anyone by raising these issues.
>
> If these issues can be resolved or worked around, glusterfind will be a
> great feature.  So I would really appreciate any information or advice:
>
> 1) What can be done about the vast number of tiny changelogs? We are often
> seeing 5+ small, 89-byte changelog files per minute on EACH brick (larger
> files when the brick is busier). After generating these changelogs for a few
> weeks we have in excess of 10,000-12,000 on most bricks. This makes
> glusterfind runs very, very slow, especially on a node with a lot of bricks,
> and it looks unsustainable. Why are these files so small, why are there so
> many of them, and how are they supposed to be managed in the long run? The
> sheer number of them looks certain to hurt performance.
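>
> For a sense of scale, this is roughly how I am counting them on one brick.
> I am assuming the rolled-over files follow the CHANGELOG.<timestamp> naming
> and that changelog.rollover-time is the option controlling how often a new
> file is started - please correct me if either assumption is wrong:
>
>     # count rolled-over changelog files on one brick
>     find /mnt/glusterfs/bricks/1/.glusterfs/changelogs -name 'CHANGELOG.*' | wc -l
>
>     # if changelog.rollover-time is the right knob, raising it should give
>     # fewer, larger changelog files per brick
>     gluster volume set vol00 changelog.rollover-time 300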
>
> 2) The pgfid extended attribute is wreaking havoc with our backup scheme -
> when gluster adds this xattr to a file it changes the file's ctime, which we
> were using to determine which files need to be archived. A warning should be
> added to the release notes & upgrade notes so people can plan for this if
> required.
>
> Also, we ran a rebalance immediately after the 3.7.4 upgrade, and it took
> about 5 days to complete, which looks like a major speed improvement over
> the older, more serial rebalance algorithm, so that's good. I was hoping the
> rebalance would also have the side effect of labelling every file with the
> pgfid attribute by the time it completed, or failing that, that building an
> mlocate database across our entire gluster would (since that should access
> every file, unless mlocate gets the info it needs from directory inodes
> alone). But ctimes are still being modified, and I think this can only be
> caused by files still being labelled with pgfids.
>
> How can we force gluster to finish this pgfid labelling for all the files
> that are already on the volume? We can't have gluster continuing to add
> pgfids in bursts here and there, e.g. when files are read for the first time
> since the upgrade. We have just had to turn off pgfid creation on the volume
> until we can force gluster to get it over and done with in one go.
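>
> For reference, the knob we toggled is along these lines - I am assuming
> storage.build-pgfid is the option behind what I am calling "pgfid creation",
> so please correct me if there is a better way to force the labelling in one
> pass:
>
>     # turned off for now, until our archiving has caught up
>     gluster volume set vol00 storage.build-pgfid off
>
>     # to be turned back on just before we force a full crawl of the volume
>     gluster volume set vol00 storage.build-pgfid on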
>
> Hi John,
>
> Was quota turned on or off before or after performing the rebalance? If the
> pgfid xattr is missing, it can be healed by running 'find <mount_point> |
> xargs stat' (see the sketch just below); every file gets looked up once, and
> the pgfid healing happens as part of that lookup.
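>
> On a volume this size, something along these lines run from a client FUSE
> mount should do it without choking on spaces in filenames (the mount point
> /mnt/vol00 is only an example):
>
>     # each lookup heals the pgfid xattr on the corresponding file
>     find /mnt/vol00 -print0 | xargs -0 stat > /dev/null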
>
> Also, could you please provide all the volume files under
> '/var/lib/glusterd/vols/<volname>/*.vol'?
>
> Thanks,
>
> Vijay
>
> Hi Vijay
>
> Quota has never been turned on in our gluster, so it can’t be quota-related
> xattrs resetting our ctimes; I’m pretty sure it must be the pgfids still
> being added.
>
> Thanks for the tip re using stat; if that triggers the pgfid build on each
> file, I will run it when I have a chance. We’ll have to get our archiving of
> data back up to date, re-enable the pgfid build option, and then run the
> stat crawl over a weekend or something, as it will take a while.
>
> I’m still quite concerned about the number of changelogs being generated. Do
> you know if there are any plans to change the way changelogs are generated,
> so there aren’t so many of them, and to process them more efficiently? I
> think this will be vital to improving glusterfind performance, as an
> enormous number of these small changelogs is currently being generated on
> each of our gluster bricks.
>
> Below is the volfile for one brick; the others are all equivalent. We
> haven’t tweaked the volume options much, besides increasing the io thread
> count to 32 and the client/event threads to 6, since we have a lot of small
> files on our gluster (30 million files, many of which are small, and some of
> which are large to very large). The matching volume-set commands are
> sketched after the volfile:
>

Hi John,

PGFID xattrs are updated only when update-link-count-parent is enabled in
the brick volume file, and that option is enabled when quota is enabled on a
volume. The volume file you provided below has update-link-count-parent
disabled, so I am wondering why the PGFID xattrs are being updated.
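
Could you also check, on one brick, whether the files whose ctime changed
actually carry pgfid xattrs, and which options are reconfigured on the
volume? Something like the following - I am assuming the xattrs are named
trusted.pgfid.<parent-gfid> and that storage.build-pgfid is the option you
turned off, so treat this as a sketch:

    # non-default options set on the volume (quota, build-pgfid, changelog, ...)
    gluster volume info vol00 | grep -A 20 'Options Reconfigured'

    # dump any pgfid xattrs on a file whose ctime changed recently
    getfattr -d -m '^trusted\.pgfid\.' -e hex /mnt/glusterfs/bricks/1/path/to/file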

Thanks,
Vijay



> [root at g-unit-1 sbin]# cat /var/lib/glusterd/vols/vol00/vol00.g-unit-1.mnt-glusterfs-bricks-1.vol
>
> volume vol00-posix
>     type storage/posix
>     option update-link-count-parent off
>     option volume-id 292b8701-d394-48ee-a224-b5a20ca7ce0f
>     option directory /mnt/glusterfs/bricks/1
> end-volume
>
> volume vol00-trash
>     type features/trash
>     option trash-internal-op off
>     option brick-path /mnt/glusterfs/bricks/1
>     option trash-dir .trashcan
>     subvolumes vol00-posix
> end-volume
>
> volume vol00-changetimerecorder
>     type features/changetimerecorder
>     option record-counters off
>     option ctr-enabled off
>     option record-entry on
>     option ctr_inode_heal_expire_period 300
>     option ctr_hardlink_heal_expire_period 300
>     option ctr_link_consistency off
>     option record-exit off
>     option db-path /mnt/glusterfs/bricks/1/.glusterfs/
>     option db-name 1.db
>     option hot-brick off
>     option db-type sqlite3
>     subvolumes vol00-trash
> end-volume
>
> volume vol00-changelog
>     type features/changelog
>     option capture-del-path on
>     option changelog-barrier-timeout 120
>     option changelog on
>     option changelog-dir /mnt/glusterfs/bricks/1/.glusterfs/changelogs
>     option changelog-brick /mnt/glusterfs/bricks/1
>     subvolumes vol00-changetimerecorder
> end-volume
>
> volume vol00-bitrot-stub
>     type features/bitrot-stub
>     option export /mnt/glusterfs/bricks/1
>     subvolumes vol00-changelog
> end-volume
>
> volume vol00-access-control
>     type features/access-control
>     subvolumes vol00-bitrot-stub
> end-volume
>
> volume vol00-locks
>     type features/locks
>     subvolumes vol00-access-control
> end-volume
>
> volume vol00-upcall
>     type features/upcall
>     option cache-invalidation off
>     subvolumes vol00-locks
> end-volume
>
> volume vol00-io-threads
>     type performance/io-threads
>     option thread-count 32
>     subvolumes vol00-upcall
> end-volume
>
> volume vol00-marker
>     type features/marker
>     option inode-quota off
>     option quota off
>     option gsync-force-xtime off
>     option xtime off
>     option timestamp-file /var/lib/glusterd/vols/vol00/marker.tstamp
>     option volume-uuid 292b8701-d394-48ee-a224-b5a20ca7ce0f
>     subvolumes vol00-io-threads
> end-volume
>
> volume vol00-barrier
>     type features/barrier
>     option barrier-timeout 120
>     option barrier disable
>     subvolumes vol00-marker
> end-volume
>
> volume vol00-index
>     type features/index
>     option index-base /mnt/glusterfs/bricks/1/.glusterfs/indices
>     subvolumes vol00-barrier
> end-volume
>
> volume vol00-quota
>     type features/quota
>     option deem-statfs off
>     option timeout 0
>     option server-quota off
>     option volume-uuid vol00
>     subvolumes vol00-index
> end-volume
>
> volume vol00-worm
>     type features/worm
>     option worm off
>     subvolumes vol00-quota
> end-volume
>
> volume vol00-read-only
>     type features/read-only
>     option read-only off
>     subvolumes vol00-worm
> end-volume
>
> volume /mnt/glusterfs/bricks/1
>     type debug/io-stats
>     option count-fop-hits off
>     option latency-measurement off
>     subvolumes vol00-read-only
> end-volume
>
> volume vol00-server
>     type protocol/server
>     option event-threads 6
>     option rpc-auth-allow-insecure on
>     option auth.addr./mnt/glusterfs/bricks/1.allow *
>     option auth.login.dc3d05ba-40ce-47ee-8f4c-a729917784dc.password 58c2072b-8d1c-4921-9270-bf4b477c4126
>     option auth.login./mnt/glusterfs/bricks/1.allow dc3d05ba-40ce-47ee-8f4c-a729917784dc
>     option transport-type tcp
>     subvolumes /mnt/glusterfs/bricks/1
> end-volume
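>
> (The volume-set commands referred to above were along these lines - exact
> option names quoted from memory, so please double-check before relying on
> them:)
>
>     gluster volume set vol00 performance.io-thread-count 32
>     gluster volume set vol00 client.event-threads 6
>     gluster volume set vol00 server.event-threads 6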