[Gluster-users] SQLite3 on 3 node cluster FS?

Raghavendra Gowdappa rgowdapp at redhat.com
Mon Mar 5 16:26:49 UTC 2018


On Mon, Mar 5, 2018 at 8:21 PM, Paul Anderson <pha at umich.edu> wrote:

> Hi,
>
> tl;dr summary of below: flock() works, but what does it take to make
> sync()/fsync() work in a 3 node GFS cluster?
>
> I am under the impression that POSIX flock, POSIX
> fcntl(F_SETLK/F_GETLK,...), and POSIX read/write/sync/fsync are all
> supported in cluster operations, such that in theory, SQLite3 should
> be able to atomically lock the file (or a subset of its pages), modify
> the pages, flush them to gluster, then release the lock, and thus
> satisfy the ACID properties that SQLite3 aims to provide on a local
> filesystem.
>
> In a test we wrote that fires off 10 simple concurrent SQL insert,
> read, update loops, we discovered that we at least need to use flock()
> around the SQLite3 db connection open/update/close to protect it.
>
> However, that is not enough - although testing shows that flock()
> works as advertised across gluster-mounted files, sync/fsync don't
> appear to, so we end up with corruption in the SQLite3 file (pragma
> integrity_check will generally show a bunch of problems after a short
> test).
>
> Is what we're trying to do achievable? We're testing using the Docker
> container gluster/gluster-centos as the three servers, with a PHP test
> running inside php-cli containers using filesystem mounts. If we mount
> the gluster FS via sapk/plugin-gluster into the php-cli containers
> using Docker, we sometimes seem to have better success, but I haven't
> figured out why yet.
>
> I did see that I needed to set the server volume parameter
> performance.flush-behind to off, otherwise it seems that flushes won't
> block as SQLite3 needs them to.
>

If you are relying on fsync, this shouldn't matter, as fsync makes sure the
data is synced to disk.
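
For reference, here is a minimal sketch of the access pattern as I understand
it from your description (your test is in PHP; this is a rough Python
equivalent, and the paths and schema are invented for illustration): each
writer takes an exclusive flock() on a side-car lock file, opens the database,
commits (at which point SQLite itself fsyncs with the default synchronous
setting), and only then releases the lock.

    import fcntl
    import sqlite3

    DB_PATH = "/mnt/glustervol/test.db"   # hypothetical gluster mount
    LOCK_PATH = DB_PATH + ".lock"         # hypothetical side-car lock file

    def locked_insert(value):
        with open(LOCK_PATH, "w") as lockf:
            # Exclusive advisory lock, shared by all clients of the mount.
            fcntl.flock(lockf, fcntl.LOCK_EX)
            try:
                conn = sqlite3.connect(DB_PATH)
                conn.execute("CREATE TABLE IF NOT EXISTS t "
                             "(k INTEGER PRIMARY KEY, v TEXT)")
                conn.execute("INSERT INTO t (v) VALUES (?)", (value,))
                conn.commit()  # SQLite fsync()s its journal and db file here
                conn.close()
            finally:
                fcntl.flock(lockf, fcntl.LOCK_UN)

If the corruption shows up even with this strict serialization, that points
more at caching/metadata behaviour on the mount than at locking.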


> Does anyone have any suggestions? Any words of wisdom would be much
> appreciated.
>

Can you experiment with turning the various performance xlators on and off?
Based on earlier issues, it's likely that there is stale metadata causing the
problem (not necessarily improper fsync behaviour). I would suggest turning
off all performance xlators. You can refer to [1] for a related discussion.
In theory the only perf xlator relevant for fsync is write-behind, and I am
not aware of any issues where fsync is not working. Does the glusterfs log
file have any messages complaining about writes or fsyncs failing? Does your
application use O_DIRECT? If yes, please note that you need to turn on the
performance.strict-o-direct option for write-behind to honour O_DIRECT.
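
For example (assuming a volume named gv0; substitute your own volume name),
turning the performance xlators off would look something like:

    gluster volume set gv0 performance.quick-read off
    gluster volume set gv0 performance.io-cache off
    gluster volume set gv0 performance.read-ahead off
    gluster volume set gv0 performance.stat-prefetch off
    gluster volume set gv0 performance.open-behind off
    gluster volume set gv0 performance.write-behind off

    # only needed if the application opens files with O_DIRECT
    gluster volume set gv0 performance.strict-o-direct on

Once the corruption no longer reproduces, the xlators can be re-enabled one at
a time to narrow down which one is responsible.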

Also, is it possible to identify the nature of the corruption - data or
metadata? A more detailed explanation will help us RCA the issue.
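
One quick way to narrow this down (paths below are placeholders) is to compare
what each client mount and each brick sees for the same file, since stale
metadata would show up as diverging sizes or timestamps:

    # from each client mount
    stat /mnt/glustervol/test.db

    # directly on each brick backend
    stat /data/brick1/gv0/test.db
    md5sum /data/brick1/gv0/test.db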

Also, is your application running on a single mount or from multiple mounts?
Can you collect an strace of your application (strace -ff -T -p <pid> -o
<file>)? If possible, can you also collect a fuse dump by passing the
--dump-fuse option while mounting glusterfs?
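
For example, something like the following (the pid, server, volume name and
paths are placeholders):

    # one strace output file per thread/child, with per-syscall timings
    strace -ff -T -p <pid> -o /tmp/app.strace

    # mount through the glusterfs binary so the fuse traffic is dumped
    glusterfs --volfile-server=<server> --volfile-id=<volname> \
        --dump-fuse=/tmp/fuse.dump /mnt/glustervol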

[1]
http://lists.gluster.org/pipermail/gluster-users/2018-February/033503.html


> Thanks,
>
> Paul
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>