[Gluster-users] Atomic file updates

Thu Feb 13 01:38:34 UTC 2014

Hi Jeff - many thanks for your explanation.

On 02/13/2014 11:19 AM, Jeff Darcy wrote:
> 
> Creating a file with one name and then renaming it to another *might*
> cause creation of linkfiles, but I think concerns about linkfiles are
> often overblown.  The one extra call to create a linkfile isn't much
> compared to those for creating the file, writing into it, and then
> renaming it even if the rename is local to one brick.  What really
> matters is the performance of the entire sequence, with or without the
> linkfile.

It's not the time overhead of creating a link file that I'm worried
about - it's making sure that I don't end up with millions of orphaned
link files, or link files pointing at other link files. I think the next
part of your message avoids this problem.

> That said, there's also a trick you can use to avoid creation of a
> linkfile.  Other tools, such as rsync and our own object interface,
> use the same write-then-rename idiom.  To serve them, there's an
> option called extra-hash-regex that can be used to place files on the
> "right" brick according to their final name even though they're created
> with another.  Unfortunately, specifying that option via the command line
> doesn't seem to work (it creates a malformed volfile) so you have to
> mount a bit differently.  For example:
> 
>    glusterfs --volfile-server=a_server --volfile-id=a_volume \
>    --xlator-option a_volume-dht.extra_hash_regex='(.*+)tmp' \
>    /a/mountpoint
> 
> The important part is that second line.  That causes any file with a
> "tmp" suffix to be hashed and placed as though only the part in the
> first parenthesized part of the regex (i.e. without the "tmp") was
> there.  Therefore, creating "xxxtmp" and then renaming it to "xxx" is
> the same as just creating "xxx" in the first place as far as linkfiles
> etc. are concerned.  Note that the excluded part can be anything that
> a regex can match, including a unique random number.  If I recall,
> rsync uses temp files something like this:
> 
>    fubar = .fubar.NNNNNN (where NNNNNN is a random number)
> 
> I know this probably seems a little voodoo-ish, but with a little bit
> of experimentation to find the right regex you should be able to avoid
> those dreaded linkfiles altogether.
> 

I think I mostly understand this. Assuming I implement the volume on 4
servers with 1 brick each and use replica 2, each file will be stored on
2 nodes. Web server clients mount the volume using the syntax you showed
above, and then when I need to update a file I should:

--write--> file1.xml.tmp --rename--> file1.xml
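
In code, the sequence I have in mind would look roughly like the sketch
below. It assumes a Python client; atomic_update and the ".tmp" suffix are
illustrative names of my own, and the fsync before the rename is an
assumption about the durability wanted, not something Gluster requires.

```python
import os


def atomic_update(final_path: str, data: bytes, suffix: str = ".tmp") -> None:
    """Write data to final_path + suffix, then rename it over final_path."""
    # The temporary name is the final name plus a fixed suffix (hypothetical
    # convention), so a suitable extra_hash_regex could map it to final_path.
    tmp_path = final_path + suffix
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # flush file contents before the rename
    # os.replace overwrites an existing destination; on POSIX this is a
    # rename(2) call, so readers see either the old or the new file.
    os.replace(tmp_path, final_path)
```

A call such as atomic_update("/a/mountpoint/file1.xml", b"...") would then
write file1.xml.tmp and rename it to file1.xml.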

extra_hash_regex will cause file1.xml.tmp to be created on the 2 bricks
that file1.xml will end up on, and therefore the rename is atomic and a
link file isn't created. The main difference from what I'm doing now
seems to be that the first part of the temporary file name needs to be
identical to the final file name, instead of being a unique random name.

Is this correct?
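
To check which part of a temporary name a candidate regex would keep, I
put together the rough sketch below. It is only an approximation: Python's
re module stands in for whatever regex matching Gluster does internally,
hashed_name is a hypothetical helper and not a Gluster function, and the
patterns are written as '(.*)tmp' without the extra '+' because Python
treats '*+' differently. The rsync-style pattern is likewise just my guess
at something that would fit the .fubar.NNNNNN example.

```python
import re


def hashed_name(name: str, pattern: str) -> str:
    """Return the first captured group if the pattern matches, else name.

    This mimics the idea of hashing on the first parenthesized part of
    the regex; it is not Gluster's actual implementation.
    """
    m = re.search(pattern, name)
    if m is None or m.group(1) is None:
        return name
    return m.group(1)


print(hashed_name("xxxtmp", r"(.*)tmp"))            # xxx
print(hashed_name("file1.xml.tmp", r"(.*)tmp"))     # file1.xml.  (dot kept)
print(hashed_name("file1.xml.tmp", r"(.*)\.tmp"))   # file1.xml
print(hashed_name(".fubar.123456", r"^\.(.+)\.[^.]+$"))  # fubar
```

If this approximation holds, a ".tmp" suffix would need the dot included
in the excluded part of the regex, otherwise the hashed name would be
"file1.xml." and not "file1.xml".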

BTW I'll be running this on CentOS 6.5 servers and it looks like the
repo has glusterfs-3.4.0.57rhs. Is this version new enough for this?

Tom