[Gluster-users] Shard storage suggestions

Gandalf Corvotempesta gandalf.corvotempesta at gmail.com
Mon Jul 18 10:33:10 UTC 2016


2016-07-18 12:25 GMT+02:00 Krutika Dhananjay <kdhananj at redhat.com>:
> Hi,
>
> The suggestion you gave was in fact considered at the time of writing shard
> translator.
> Here are some of the considerations for sticking with a single directory as
> opposed to a two-tier classification of shards based on the initial chars of
> the uuid string:
> i) Even for a 4TB disk with the smallest possible shard size of 4MB, there
> will only be a max of 1048576 entries
>  under /.shard in the worst case - a number far less than the max number of
> inodes that are supported by most backend file systems.

That is with just a single file.
What about thousands of huge sharded files? In a petabyte-scale cluster, having
thousands of huge files should be considered normal.

> iii) Resolving shards from the original file name as given by the
> application to the corresponding shard within a single directory (/.shard in
> the existing case) would mean, looking up the parent dir /.shard first
> followed by lookup on the actual shard that is to be operated on. But having
> a two-tier sub-directory structure means that we not only have to resolve
> (or look-up) /.shard first, but also the directories '/.shard/d2',
> '/.shard/d2/18', and '/.shard/d2/18/d218cd1c-4bd9-40d7-9810-86b3f7932509'
> before finally looking up the shard, which is a lot of network operations.
> Yes, these are all one-time operations and the results can be cached in the
> inode table, but still, on account of having to have dynamic gfids (as
> opposed to just /.shard, which has a fixed gfid -
> be318638-e8a0-4c6d-977d-7a937aa84806), it is no longer trivial to resolve the
> name of the shard to its gfid, or the parent name to the parent gfid, even in
> memory.

What about just a single level?
/.shard/d218cd1c-4bd9-40d7-9810-86b3f7932509/d218cd1c-4bd9-40d7-9810-86b3f7932509.1
?

You have the GFID, so there is no need to crawl multiple levels: you can go
directly to the proper path.
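
To make this concrete, here is a rough sketch in Python of how the shard path
could be derived in both layouts (the helper names and the exact on-disk
naming are just my assumption for illustration, not actual shard translator
code):

    # Assumed naming, for illustration only.
    def shard_path_current(gfid, index):
        # current flat layout: every shard of every sharded file
        # lives directly under /.shard
        return "/.shard/%s.%d" % (gfid, index)

    def shard_path_proposed(gfid, index):
        # proposed layout: one sub-directory per sharded file, keyed by
        # the base file's GFID, so the path is still derived directly
        # from the GFID with no multi-level crawling
        return "/.shard/%s/%s.%d" % (gfid, gfid, index)

    gfid = "d218cd1c-4bd9-40d7-9810-86b3f7932509"
    print(shard_path_current(gfid, 1))
    # /.shard/d218cd1c-4bd9-40d7-9810-86b3f7932509.1
    print(shard_path_proposed(gfid, 1))
    # /.shard/d218cd1c-4bd9-40d7-9810-86b3f7932509/d218cd1c-4bd9-40d7-9810-86b3f7932509.1

Compared to the current layout this adds only one extra directory lookup per
sharded file (which can be cached), instead of the several extra lookups of
the two-tier scheme.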

With this solution, you have 1,048,576 entries for a 4TB sharded file
with a 4MB shard size.
With the current implementation, you have 1,048,576 entries for each sharded
file. If I have 100 4TB files, I'll end up
with 1,048,576 * 100 = 104,857,600 files in a single directory.
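
Back-of-the-envelope, worst case where every shard is allocated (Python, just
to spell out the numbers):

    TB = 1024 ** 4
    MB = 1024 ** 2

    file_size  = 4 * TB
    shard_size = 4 * MB

    shards_per_file = file_size // shard_size
    print(shards_per_file)          # 1048576 shards for one 4TB file

    # current flat layout: all shards of all files share /.shard
    print(100 * shards_per_file)    # 104857600 entries in one directory

    # proposed per-file layout: still 1048576 entries per sub-directory,
    # plus only 100 directory entries directly under /.shard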

> Are you unhappy with the performance? What's your typical VM image size,
> shard block size and the capacity of individual bricks?

No, I'm just thinking about this optimization.

