[Gluster-users] SQLite3 on 3 node cluster FS?
Raghavendra Gowdappa
rgowdapp at redhat.com
Tue Mar 6 03:40:38 UTC 2018
Adding csaba
On Tue, Mar 6, 2018 at 9:09 AM, Raghavendra Gowdappa <rgowdapp at redhat.com>
wrote:
> +Csaba.
>
> On Tue, Mar 6, 2018 at 2:52 AM, Paul Anderson <pha at umich.edu> wrote:
>
>> Raghavendra,
>>
>> Thanks very much for your reply.
>>
>> I fixed our data corruption problem by disabling the volume option
>> performance.write-behind, as you suggested, and simultaneously
>> disabling caching in my client-side mount command.
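[Editorial sketch: the fix described above might look like the following; the volume name "myvol", server name, and mount point are illustrative assumptions, not taken from this thread.]

```shell
# Disable write-behind on the volume ("myvol" is a hypothetical name)
gluster volume set myvol performance.write-behind off

# Remount the client with FUSE-side metadata caching disabled:
# attribute-timeout=0 and entry-timeout=0 turn off attribute/entry caching
mount -t glusterfs \
  -o attribute-timeout=0,entry-timeout=0 \
  server1:/myvol /mnt/glustervol
```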
>>
>
> Good to know it worked. Can you give us the output of
> # gluster volume info
>
> We would like to debug the problem in write-behind. Some questions:
>
> 1. What version of Glusterfs are you using?
> 2. Were you able to figure out whether it's stale data or metadata that is
> causing the issue?
>
> There have been patches merged in write-behind in the recent past, and one
> in the works, which address metadata consistency. We would like to
> understand whether you've run into any of the already identified issues.
>
> regards,
> Raghavendra
>
>>
>> In very modest testing, the flock() case appears to work well -
>> previously it would corrupt the db within a few transactions.
>>
>> Testing using SQLite3's built-in locks (fcntl range locks) is better,
>> but has some behavioral issues (probably just requiring query retries
>> when the file is locked). I'll research this more, although the test
>> case is not critical to our use case.
>>
>> There are no signs of O_DIRECT use in the sqlite3 code that I can see.
>>
>> I intend to set up tests that run much longer than a few minutes, to
>> see if there are any longer term issues. Also, I want to experiment
>> with data durability by killing various gluster server nodes during
>> the tests.
>>
>> If anyone would like our test scripts, I can either tar them up and
>> email them or put them in github - either is fine with me. (they rely
>> on current builds of docker and docker-compose)
>>
>> Thanks again!!
>>
>> Paul
>>
>> On Mon, Mar 5, 2018 at 11:26 AM, Raghavendra Gowdappa
>> <rgowdapp at redhat.com> wrote:
>> >
>> >
>> > On Mon, Mar 5, 2018 at 8:21 PM, Paul Anderson <pha at umich.edu> wrote:
>> >>
>> >> Hi,
>> >>
>> >> tl;dr summary of below: flock() works, but what does it take to make
>> >> sync()/fsync() work in a 3 node GFS cluster?
>> >>
>> >> I am under the impression that POSIX flock, POSIX
>> >> fcntl(F_SETLK/F_GETLK,...), and POSIX read/write/sync/fsync are all
>> >> supported in cluster operations, such that in theory, SQLite3 should
>> >> be able to atomically lock the file (or a subset of pages), modify
>> >> pages, flush the pages to gluster, then release the lock, and thus
>> >> satisfy the ACID properties that SQLite3 appears to achieve on a
>> >> local filesystem.
>> >>
>> >> In a test we wrote that fires off 10 simple concurrent SQL insert,
>> >> read, and update loops, we discovered that we at least need to use
>> >> flock() around the SQLite3 db connection open/update/close to
>> >> protect it.
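[Editorial sketch: a minimal version of that pattern, in Python rather than the PHP used in the actual test, with an illustrative lock-file convention that is an assumption, not taken from the original scripts.]

```python
import fcntl
import sqlite3

def insert_row(db_path, value):
    """Hold an exclusive flock() around the whole SQLite
    open/update/close cycle, serializing writers across clients.
    The <db>.lock sidecar file is a hypothetical convention."""
    with open(db_path + ".lock", "w") as lockf:
        fcntl.flock(lockf, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            conn = sqlite3.connect(db_path)
            with conn:  # commit on success, roll back on exception
                conn.execute("CREATE TABLE IF NOT EXISTS t (v INTEGER)")
                conn.execute("INSERT INTO t (v) VALUES (?)", (value,))
            conn.close()
        finally:
            fcntl.flock(lockf, fcntl.LOCK_UN)  # also released on close
```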
>> >>
>> >> However, that is not enough - although from testing it looks like
>> >> flock() works as advertised across gluster-mounted files, sync/fsync
>> >> don't appear to, so we end up with corruption in the SQLite3 file
>> >> (pragma integrity_check generally shows a bunch of problems after
>> >> a short test).
>> >>
>> >> Is what we're trying to do achievable? We're testing using the docker
>> >> container gluster/gluster-centos as the three servers, with a PHP test
>> >> running in php-cli using filesystem mounts. If we mount the gluster FS
>> >> via sapk/plugin-gluster into the php-cli containers using docker, we
>> >> seem to have better success sometimes, but I haven't figured out why
>> >> yet.
>> >>
>> >> I did see that I needed to set the server volume parameter
>> >> 'performance.flush-behind off', otherwise it seems that flushes won't
>> >> block as would be needed by SQLite3.
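[Editorial sketch: that setting, with a hypothetical volume name, would be applied and verified like this.]

```shell
# Make flush (close) wait for outstanding writes instead of
# returning early ("myvol" is an illustrative volume name)
gluster volume set myvol performance.flush-behind off
gluster volume get myvol performance.flush-behind   # verify the setting
```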
>> >
>> >
>> > If you are relying on fsync this shouldn't matter, as fsync makes sure
>> > data is synced to disk.
>> >
>> >>
>> >> Does anyone have any suggestions? Any words of wisdom would be much
>> >> appreciated.
>> >
>> >
>> > Can you experiment with turning on/off various performance xlators?
>> > Based on earlier issues, it's likely that there is stale metadata which
>> > might be causing the issue (not necessarily improper fsync behavior). I
>> > would suggest turning off all performance xlators. You can refer to [1]
>> > for a related discussion. In theory the only perf xlator relevant for
>> > fsync is write-behind, and I am not aware of any issues where fsync is
>> > not working. Does the glusterfs log file have any messages complaining
>> > about writes or fsync failing? Does your application use O_DIRECT? If
>> > yes, please note that you need to turn the option
>> > performance.strict-o-direct on for write-behind to honour O_DIRECT.
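[Editorial sketch: if O_DIRECT were in use, the suggested option would be set as follows; the volume name is an assumption.]

```shell
# Have write-behind honour O_DIRECT opens ("myvol" is hypothetical)
gluster volume set myvol performance.strict-o-direct on
```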
>> >
>> > Also, is it possible to identify the nature of the corruption - data or
>> > metadata? A more detailed explanation will help us RCA the issue.
>> >
>> > Also, is your application running on a single mount or from multiple
>> > mounts? Can you collect an strace of your application (strace -ff -T -p
>> > <pid> -o <file>)? If possible, can you also collect a fuse-dump using
>> > the option --dump-fuse while mounting glusterfs?
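[Editorial sketch: one way those diagnostics might be collected; output paths and the server/volume names are illustrative, and <pid> is the application's process id, left as in the original.]

```shell
# Trace all threads and children of the application, with per-syscall
# timings (-T), one output file per traced process (-ff)
strace -ff -T -p <pid> -o /tmp/app.strace

# Mount the volume with a FUSE traffic dump for later analysis
glusterfs --volfile-server=server1 --volfile-id=myvol \
  --dump-fuse=/tmp/fuse.dump /mnt/glustervol
```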
>> >
>> > [1] http://lists.gluster.org/pipermail/gluster-users/2018-February/033503.html
>> >
>> >>
>> >> Thanks,
>> >>
>> >> Paul
>> >> _______________________________________________
>> >> Gluster-users mailing list
>> >> Gluster-users at gluster.org
>> >> http://lists.gluster.org/mailman/listinfo/gluster-users
>> >
>> >
>>
>
>