[Gluster-devel] Need advice re some major issues with glusterfind
Sincock, John [FLCPTY]
J.Sincock at fugro.com
Thu Oct 22 03:11:15 UTC 2015
Please see below.
From: Vijaikumar Mallikarjuna [mailto:vmallika at redhat.com]
Sent: Wednesday, 21 October 2015 6:37 PM
To: Sincock, John [FLCPTY]
Cc: gluster-devel at gluster.org
Subject: Re: [Gluster-devel] Need advice re some major issues with glusterfind
On Wed, Oct 21, 2015 at 5:53 AM, Sincock, John [FLCPTY] <J.Sincock at fugro.com> wrote:
Hi Everybody,
We have recently upgraded our 220 TB gluster to 3.7.4 and have been trying to use the new glusterfind feature, but we have been having some serious problems with it. Overall, glusterfind looks very promising, so I don't want to offend anyone by raising these issues.
If these issues can be resolved or worked around, glusterfind will be a great feature. So I would really appreciate any information or advice:
1) What can be done about the vast number of tiny changelogs? We often see 5+ small, 89-byte changelog files per minute on EACH brick, and larger files when the brick is busier. After a few weeks of generating these changelogs we have more than 10,000-12,000 of them on most bricks. This makes glusterfind runs very, very slow, especially on a node with a lot of bricks, and it looks unsustainable. Why are these files so small, why are there so many of them, and how are they supposed to be managed in the long run? The sheer number of them looks certain to hurt performance over time.
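A quick way to get a feel for how fast they accumulate on a brick is something like the following (a rough sketch, using the changelog path from the volfile further down):
  find /mnt/glusterfs/bricks/1/.glusterfs/changelogs -maxdepth 1 -type f | wc -l
  ls -lh /mnt/glusterfs/bricks/1/.glusterfs/changelogs | head -20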
2) The pgfid extended attribute is wreaking havoc with our backup scheme - when gluster adds this xattr to a file it changes the file's ctime, which we were using to determine which files need to be archived. A warning should be added to the release notes & upgrade notes so people can plan for this if required.
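For anyone wanting to check their own bricks, something like this should show whether a file has been labelled yet and what its ctime now is (run against the brick path, not the mount; the file path here is just an example):
  getfattr -d -m trusted.pgfid -e hex /mnt/glusterfs/bricks/1/path/to/some/file
  stat -c '%z  %n' /mnt/glusterfs/bricks/1/path/to/some/file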
Also, we ran a rebalance immediately after the 3.7.4 upgrade, and it took about 5 days to complete, which looks like a major speed improvement over the old, more serial rebalance algorithm, so that's good. I was hoping the rebalance would also have the side-effect of labelling every file with the pgfid attribute by the time it completed, or failing that, that building an mlocate database across our entire gluster would do so (since that should access every file, unless it gets the information it needs from directory inodes alone). But ctimes are still being modified, and I think this can only be caused by files still being labelled with pgfids.
How can we force gluster to get this pgfid labelling over and done with for all files that are already on the volume? We can't have gluster continuing to add pgfids in bursts here and there, e.g. whenever a file is read for the first time since the upgrade - we need to get it done in one pass. For now we have turned off pgfid creation on the volume until we can force gluster to do it all in one go.
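(For reference, the option we toggled - assuming I have the name right - is storage.build-pgfid:
  gluster volume set vol00 storage.build-pgfid off
and we will set it back to on before running the one-off crawl.)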
Hi John,
Was quota turned on or off before or after performing the rebalance? If the pgfid is missing, it can be healed by running 'find <mount_point> | xargs stat' - every file will be looked up once and the pgfid healing will happen.
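If there are filenames with spaces, the null-terminated form is safer, and the stat output itself can be discarded since only the lookup matters:
  find <mount_point> -print0 | xargs -0 stat > /dev/null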
Also, could you please provide all the volfiles under '/var/lib/glusterd/vols/<volname>/*.vol'?
Thanks,
Vijay
Hi Vijay
Quota has never been turned on in our gluster, so it can't be any quota-related xattrs resetting our ctimes - I'm pretty sure it must be the pgfids still being added.
Thanks for the tip re using stat - if that triggers the pgfid build on each file, I will run it when I have a chance. We'll have to get our archiving of data back up to date, re-enable the pgfid build option, and then run the stat crawl over a weekend or something, as it will take a while.
I'm still quite concerned about the number of changelogs being generated. Do you know if there are any plans to change the way changelogs are generated so there aren't so many of them, and to process them more efficiently? I think this will be vital to improving glusterfind performance, as an enormous number of these small changelogs is currently being generated on each of our bricks.
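In the meantime, is the changelog rollover interval the only knob on our side to make the files fewer and bigger? i.e. something like the following (option name from memory, and I believe the default interval is 15 seconds):
  gluster volume set vol00 changelog.rollover-time 300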
Below is the volfile for one brick; the others are all equivalent. We haven't tweaked the volume options much, besides increasing the io thread count to 32 and the client/server event threads to 6, since we have a lot of files on our gluster (around 30 million, many of them small and some large to very large). The volume-set commands we used are sketched after the volfile.
[root@g-unit-1 sbin]# cat /var/lib/glusterd/vols/vol00/vol00.g-unit-1.mnt-glusterfs-bricks-1.vol
volume vol00-posix
type storage/posix
option update-link-count-parent off
option volume-id 292b8701-d394-48ee-a224-b5a20ca7ce0f
option directory /mnt/glusterfs/bricks/1
end-volume
volume vol00-trash
type features/trash
option trash-internal-op off
option brick-path /mnt/glusterfs/bricks/1
option trash-dir .trashcan
subvolumes vol00-posix
end-volume
volume vol00-changetimerecorder
type features/changetimerecorder
option record-counters off
option ctr-enabled off
option record-entry on
option ctr_inode_heal_expire_period 300
option ctr_hardlink_heal_expire_period 300
option ctr_link_consistency off
option record-exit off
option db-path /mnt/glusterfs/bricks/1/.glusterfs/
option db-name 1.db
option hot-brick off
option db-type sqlite3
subvolumes vol00-trash
end-volume
volume vol00-changelog
type features/changelog
option capture-del-path on
option changelog-barrier-timeout 120
option changelog on
option changelog-dir /mnt/glusterfs/bricks/1/.glusterfs/changelogs
option changelog-brick /mnt/glusterfs/bricks/1
subvolumes vol00-changetimerecorder
end-volume
volume vol00-bitrot-stub
type features/bitrot-stub
option export /mnt/glusterfs/bricks/1
subvolumes vol00-changelog
end-volume
volume vol00-access-control
type features/access-control
subvolumes vol00-bitrot-stub
end-volume
volume vol00-locks
type features/locks
subvolumes vol00-access-control
end-volume
volume vol00-upcall
type features/upcall
option cache-invalidation off
subvolumes vol00-locks
end-volume
volume vol00-io-threads
type performance/io-threads
option thread-count 32
subvolumes vol00-upcall
end-volume
volume vol00-marker
type features/marker
option inode-quota off
option quota off
option gsync-force-xtime off
option xtime off
option timestamp-file /var/lib/glusterd/vols/vol00/marker.tstamp
option volume-uuid 292b8701-d394-48ee-a224-b5a20ca7ce0f
subvolumes vol00-io-threads
end-volume
volume vol00-barrier
type features/barrier
option barrier-timeout 120
option barrier disable
subvolumes vol00-marker
end-volume
volume vol00-index
type features/index
option index-base /mnt/glusterfs/bricks/1/.glusterfs/indices
subvolumes vol00-barrier
end-volume
volume vol00-quota
type features/quota
option deem-statfs off
option timeout 0
option server-quota off
option volume-uuid vol00
subvolumes vol00-index
end-volume
volume vol00-worm
type features/worm
option worm off
subvolumes vol00-quota
end-volume
volume vol00-read-only
type features/read-only
option read-only off
subvolumes vol00-worm
end-volume
volume /mnt/glusterfs/bricks/1
type debug/io-stats
option count-fop-hits off
option latency-measurement off
subvolumes vol00-read-only
end-volume
volume vol00-server
type protocol/server
option event-threads 6
option rpc-auth-allow-insecure on
option auth.addr./mnt/glusterfs/bricks/1.allow *
option auth.login.dc3d05ba-40ce-47ee-8f4c-a729917784dc.password 58c2072b-8d1c-4921-9270-bf4b477c4126
option auth.login./mnt/glusterfs/bricks/1.allow dc3d05ba-40ce-47ee-8f4c-a729917784dc
option transport-type tcp
subvolumes /mnt/glusterfs/bricks/1
end-volume
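For completeness, the tuning mentioned above was applied with volume-set commands roughly like these (from memory, so the exact option names may need double-checking):
  gluster volume set vol00 performance.io-thread-count 32
  gluster volume set vol00 client.event-threads 6
  gluster volume set vol00 server.event-threads 6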