[Gluster-users] New 3.12.7 possible split-brain on replica 3

Ravishankar N ravishankar at redhat.com
Wed May 23 07:25:01 UTC 2018



On 05/23/2018 12:47 PM, mabi wrote:
> Hello,
>
> I just wanted to ask whether you have had time to look into this bug I am encountering, and whether there is anything else I can do.
>
> For now, in order to get rid of these 3 unsynched files, shall I use the same method that was suggested to me in this thread?
Sorry Mabi,  I haven't had a chance to dig deeper into this. The 
workaround of resetting xattrs should be fine though.
Thanks,
Ravi
>
> Thanks,
> Mabi
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>
> On May 17, 2018 11:07 PM, mabi <mabi at protonmail.ch> wrote:
>
>>
>> Hi Ravi,
>>
>> Please find below the answers to your questions:
>>
>> 1) I have never touched the cluster.quorum-type option. Currently it is set as follows for this volume:
>>
>>    Option                Value
>>    ------                -----
>>    cluster.quorum-type   none
>>
>> 2) The .shareKey files are not supposed to be empty. They should be 512 bytes in size and contain binary data (PGP secret sub-key). I cannot say why this specific one is only 0 bytes, nor whether that is the fault of the software (Nextcloud) or of GlusterFS. I can only say that I have another file server, a simple NFS server with another Nextcloud installation, and there I have never seen any 0-byte .shareKey files being created.
>>
>> 3) It seems to be quite random. I am not the person who uses the Nextcloud software, so I can't say what it was doing at that specific time, but I guess it was uploading files or moving files around. Basically, I use GlusterFS to store the files/data of the Nextcloud web application, where I have it mounted using a FUSE mount (mount -t glusterfs).
>>
>> Regarding the logs, I have attached the mount log file from the client, and below are the relevant log entries from the brick log files of all 3 nodes. Let me know if you need any other log files. Also, if you know of any "log file sanitizer" tool that can replace sensitive file names in log files with random names, I would like to use it, as right now I have to do that manually.
>>
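A log sanitizer like the one requested above could be sketched in Python as follows. `LogPathSanitizer`, its `prefixes`/`keep` parameters and the `nameN` alias scheme are hypothetical and do not correspond to an existing tool. The sketch assumes sensitive names appear either in paths under a known brick or volume prefix or in `(<parent-gfid>/<name>)` entries like the RMDIR lines below. File names anywhere else in a message are left untouched.

```python
import re
from itertools import count

# Canonical gfid/UUID text form, e.g. f065a5e7-ac06-445f-add0-83acf8ce4155.
UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"


class LogPathSanitizer:
    """Hypothetical helper: swap sensitive path components for stable aliases.

    The same real name always maps to the same alias, so related log lines
    (from any node) stay comparable after sanitizing.
    """

    def __init__(self, prefixes, keep=()):
        self.keep = set(keep)          # structural names that are safe to show
        self.aliases = {}              # real component -> alias
        self._ids = count(1)
        self._uuid = re.compile(UUID)
        # Longest prefix first, so a brick path wins over a shorter prefix.
        alts = "|".join(re.escape(p) for p in sorted(prefixes, key=len, reverse=True))
        # A path under a known prefix; components end at whitespace, "/", "]" or ")".
        self._path = re.compile(rf"(?P<prefix>{alts})(?P<rest>(?:/[^\s/\])]+)*)")
        # "(<parent-gfid>/<name>)" entries as in RMDIR lines; names may contain spaces.
        self._entry = re.compile(rf"\((?P<gfid>{UUID})/(?P<name>[^)]+)\)")

    def alias(self, component):
        # Keep empty parts, whitelisted names and gfids; alias everything else.
        if not component or component in self.keep or self._uuid.fullmatch(component):
            return component
        if component not in self.aliases:
            stem, dot, ext = component.rpartition(".")
            suffix = f".{ext}" if stem and dot else ""  # preserve the file extension
            self.aliases[component] = f"name{next(self._ids)}{suffix}"
        return self.aliases[component]

    def sanitize(self, line):
        line = self._path.sub(
            lambda m: m["prefix"] + "/".join(self.alias(c) for c in m["rest"].split("/")),
            line,
        )
        return self._entry.sub(lambda m: f"({m['gfid']}/{self.alias(m['name'])})", line)
```

Run every node's log through the same instance, for example with prefixes `/data/myvol-private/brick`, `/srv/glusterfs/myvol-private/brick` and `/cloud`. A given directory then gets the same alias in the client log and in all three brick logs.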
>> NODE 1 brick log:
>>
>> [2018-05-15 06:54:20.176679] E [MSGID: 113015] [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE [No such file or directory]
>>
>> NODE 2 brick log:
>>
>> [2018-05-15 06:54:20.176415] E [MSGID: 113015] [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE [No such file or directory]
>>
>> NODE 3 (arbiter) brick log:
>>
>> [2018-05-15 06:54:19.898981] W [MSGID: 113103] [posix.c:285:posix_lookup] 0-myvol-private-posix: Found stale gfid handle /srv/glusterfs/myvol-private/brick/.glusterfs/f0/65/f065a5e7-ac06-445f-add0-83acf8ce4155, removing it. [Stale file handle]
>>
>> [2018-05-15 06:54:20.056196] W [MSGID: 113103] [posix.c:285:posix_lookup] 0-myvol-private-posix: Found stale gfid handle /srv/glusterfs/myvol-private/brick/.glusterfs/8f/a1/8fa15dbd-cd5c-4900-b889-0fe7fce46a13, removing it. [Stale file handle]
>>
>> [2018-05-15 06:54:20.172823] I [MSGID: 115056] [server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 14740125: RMDIR /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE (f065a5e7-ac06-445f-add0-83acf8ce4155/OC_DEFAULT_MODULE), client: nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0, error-xlator: myvol-private-posix [Directory not empty]
>>
>> [2018-05-15 06:54:20.190911] I [MSGID: 115056] [server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 14740141: RMDIR /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir (72a1613e-2ac0-48bd-8ace-f2f723f3796c/2016.03.15 AVB_Photovoltaik-Versicherung 2013.pdf), client: nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0, error-xlator: myvol-private-posix [Directory not empty]
>>
>> Best regards,
>>
>> Mabi
>>
>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>
>> On May 17, 2018 7:00 AM, Ravishankar N ravishankar at redhat.com wrote:
>>
>>> Hi mabi,
>>>
>>> Some questions:
>>>
>>> - Did you by any chance change the cluster.quorum-type option from the default values?
>>>
>>> - Is filename.shareKey supposed to be an empty file? Looks like the file was fallocated with the keep-size option but never written to. (On the 2 data bricks, stat output shows Size = 0 but non-zero Blocks, and yet a 'regular empty file'.)
>>>
>>> - Do you have some sort of reproducer/steps that you perform when the issue occurs? Please also share the logs from all 3 nodes and the client(s).
>>>
>>> Thanks,
>>>
>>> Ravi
>>>
>>> On 05/15/2018 05:26 PM, mabi wrote:
>>>
>>>> Thank you Ravi for your fast answer. As requested, you will find below the "stat" and "getfattr" output for one of the files and its parent directory from all three nodes of my cluster.
>>>>
>>>> NODE 1:
>>>>
>>>> File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey’
>>>> Size: 0 Blocks: 38 IO Block: 131072 regular empty file
>>>> Device: 23h/35d Inode: 744413 Links: 2
>>>> Access: (0644/-rw-r--r--) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
>>>> Access: 2018-05-15 08:54:20.296048887 +0200
>>>> Modify: 2018-05-15 08:54:20.296048887 +0200
>>>> Change: 2018-05-15 08:54:20.340048505 +0200
>>>> Birth: -
>>>>
>>>> File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/’
>>>> Size: 8 Blocks: 74 IO Block: 131072 directory
>>>> Device: 23h/35d Inode: 744410 Links: 2
>>>> Access: (0755/drwxr-xr-x) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
>>>> Access: 2018-04-25 09:41:24.276780766 +0200
>>>> Modify: 2018-05-15 08:54:20.392048056 +0200
>>>> Change: 2018-05-15 08:54:20.392048056 +0200
>>>> Birth: -
>>>>
>>>> # file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
>>>> trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63
>>>> trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579
>>>> trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x00000000000000000000000000000001
>>>> trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x00000001
>>>>
>>>> # file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
>>>> trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13
>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>> trusted.glusterfs.quota.dirty=0x3000
>>>> trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x000000000000060000000000000000060000000000000001
>>>> trusted.glusterfs.quota.size.1=0x000000000000060000000000000000060000000000000001
>>>>
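The trusted.gfid2path.* value above is hex-encoded text. Decoded, the bytes read as "<parent gfid>/<basename>". This sketch assumes that layout and uses it to link filename.shareKey to its parent directory. The decoded parent gfid matches the directory's trusted.gfid and the name of the trusted.pgfid.* xattr. `gfid_to_str` and `decode_gfid2path` are illustrative helpers, not Gluster APIs.

```python
import uuid


def gfid_to_str(hex_value: str) -> str:
    """Format a 16-byte trusted.gfid value (getfattr hex) as a canonical UUID."""
    return str(uuid.UUID(hex=hex_value.removeprefix("0x")))


def decode_gfid2path(hex_value: str) -> tuple[str, str]:
    """Split a trusted.gfid2path.* value into (parent gfid, basename).

    Assumes the value is the UTF-8 text "<parent-gfid>/<basename>".
    """
    text = bytes.fromhex(hex_value.removeprefix("0x")).decode("utf-8")
    parent, _, basename = text.partition("/")
    return parent, basename


# Values copied from the NODE 1 output above.
file_gfid2path = (
    "0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f"
    "6e6361646d696e6973747261746f722e73686172654b6579"
)
dir_gfid = "0x8fa15dbdcd5c4900b8890fe7fce46a13"

parent, basename = decode_gfid2path(file_gfid2path)
print(parent == gfid_to_str(dir_gfid))  # True: the recorded parent is OC_DEFAULT_MODULE/
```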
>>>> NODE 2:
>>>>
>>>> File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey’
>>>> Size: 0 Blocks: 38 IO Block: 131072 regular empty file
>>>> Device: 24h/36d Inode: 5428150 Links: 2
>>>> Access: (0644/-rw-r--r--) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
>>>> Access: 2018-05-15 08:54:20.294280254 +0200
>>>> Modify: 2018-05-15 08:54:20.294280254 +0200
>>>> Change: 2018-05-15 08:54:20.338279576 +0200
>>>> Birth: -
>>>>
>>>> File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/’
>>>> Size: 8 Blocks: 74 IO Block: 131072 directory
>>>> Device: 24h/36d Inode: 5428147 Links: 2
>>>> Access: (0755/drwxr-xr-x) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
>>>> Access: 2018-04-25 09:41:24.276780766 +0200
>>>> Modify: 2018-05-15 08:54:20.394278717 +0200
>>>> Change: 2018-05-15 08:54:20.394278717 +0200
>>>> Birth: -
>>>>
>>>> # file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
>>>> trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63
>>>> trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579
>>>> trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x00000000000000000000000000000001
>>>> trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x00000001
>>>>
>>>> # file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
>>>> trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13
>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>> trusted.glusterfs.quota.dirty=0x3000
>>>> trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x000000000000060000000000000000060000000000000001
>>>> trusted.glusterfs.quota.size.1=0x000000000000060000000000000000060000000000000001
>>>>
>>>> NODE 3 (arbiter):
>>>>
>>>> File: /srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
>>>> Size: 0 Blocks: 8 IO Block: 4096 regular empty file
>>>> Device: ca11h/51729d Inode: 271434295 Links: 2
>>>> Access: (0644/-rw-r--r--) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
>>>> Access: 2018-04-25 09:41:24.322527555 +0200
>>>> Modify: 2018-04-25 09:41:24.322527555 +0200
>>>> Change: 2018-05-15 08:54:20.343667380 +0200
>>>> Birth: -
>>>>
>>>> File: /srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
>>>> Size: 8192 Blocks: 24 IO Block: 4096 directory
>>>> Device: ca11h/51729d Inode: 271434288 Links: 2
>>>> Access: (0755/drwxr-xr-x) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
>>>> Access: 2018-04-25 09:41:24.276780766 +0200
>>>> Modify: 2018-05-15 08:54:20.391667997 +0200
>>>> Change: 2018-05-15 08:54:20.395668048 +0200
>>>> Birth: -
>>>>
>>>> # file: srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
>>>> trusted.afr.dirty=0x000000000000000000000000
>>>> trusted.afr.myvolume-private-client-0=0x000000010000000000000000
>>>> trusted.afr.myvolume-private-client-1=0x000000010000000000000000
>>>> trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63
>>>> trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579
>>>> trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x00000000000000000000000000000001
>>>> trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x00000001
>>>>
>>>> # file: srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
>>>> trusted.afr.myvolume-private-client-0=0x000000000000000000000000
>>>> trusted.afr.myvolume-private-client-1=0x000000000000000000000000
>>>> trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13
>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>> trusted.glusterfs.quota.dirty=0x3000
>>>> trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x000000000000000000000000000000060000000000000001
>>>> trusted.glusterfs.quota.size.1=0x000000000000000000000000000000060000000000000001
>>>>
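On NODE 3 (arbiter), the file carries non-zero trusted.afr.* changelog xattrs, and the two data bricks carry no trusted.afr.* xattrs at all. My understanding of the AFR on-disk format is as follows. Each value is 12 bytes holding three big-endian 32-bit counters of pending data, metadata and entry operations. A non-zero counter in trusted.afr.<volume>-client-N on one brick means that brick "blames" brick N for operations it missed. The Python sketch below decodes the values copied from the arbiter output. `decode_afr_changelog` and `blamed_bricks` are illustrative names, and the field order is that assumed layout, not something stated in this thread.

```python
import struct

# Assumed counter order inside an AFR changelog value.
FIELDS = ("data", "metadata", "entry")


def decode_afr_changelog(hex_value: str) -> dict[str, int]:
    """Decode a trusted.afr.* value as printed by `getfattr -e hex`."""
    raw = bytes.fromhex(hex_value.removeprefix("0x"))
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    # Three big-endian (network byte order) unsigned 32-bit counters.
    return dict(zip(FIELDS, struct.unpack(">III", raw)))


def blamed_bricks(xattrs: dict[str, str]) -> dict[str, dict[str, int]]:
    """Return the per-client changelogs on one brick that have pending counts."""
    blamed = {}
    for name, value in xattrs.items():
        # Assumed: trusted.afr.dirty is this brick's own in-progress marker,
        # not blame of another brick, so it is skipped.
        if name.startswith("trusted.afr.") and name != "trusted.afr.dirty":
            counts = decode_afr_changelog(value)
            if any(counts.values()):
                blamed[name] = counts
    return blamed


# trusted.afr.* xattrs of filename.shareKey on NODE 3 (arbiter), from above.
arbiter = {
    "trusted.afr.dirty": "0x000000000000000000000000",
    "trusted.afr.myvolume-private-client-0": "0x000000010000000000000000",
    "trusted.afr.myvolume-private-client-1": "0x000000010000000000000000",
}

for name, counts in blamed_bricks(arbiter).items():
    print(name, counts)
# trusted.afr.myvolume-private-client-0 {'data': 1, 'metadata': 0, 'entry': 0}
# trusted.afr.myvolume-private-client-1 {'data': 1, 'metadata': 0, 'entry': 0}
```

Read this way, the arbiter records one pending data operation against both data bricks. An arbiter brick holds no file data, so it presumably cannot act as the heal source here. That would fit the file staying unsynched until the xattrs are reset by hand.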
>>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>>>
>>>> On May 15, 2018 10:52 AM, Ravishankar N ravishankar at redhat.com wrote:
>>>>
>>>>> On 05/15/2018 12:38 PM, mabi wrote:
>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> Last Friday I upgraded my replica 3 GlusterFS cluster (and clients) from 3.12.7 to 3.12.9 in order to fix this bug, but unfortunately I still have exactly the same problem as initially posted in this thread.
>>>>>>
>>>>>> It looks like this bug is not resolved, as I just got 3 unsynched files on my arbiter node again, just like before upgrading. This problem started when I upgraded to 3.12.7...
>>>>>>
>>>>> Could you provide the stat and 'getfattr -d -m . -e hex brick/path/to/file' outputs of one of these files and also the corresponding parent directory from all 3 bricks?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Ravi
>>>>>
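If the collection across bricks is to be scripted, the getfattr call requested above can be approximated in Python. `read_xattrs` and `format_like_getfattr` are hypothetical helper names. The sketch uses only `os.listxattr`/`os.getxattr` (Linux only) and sorts attribute names for stable output, which getfattr itself may not do. Reading trusted.* attributes needs root, just as it does for getfattr.

```python
import os


def read_xattrs(path: str) -> dict[str, bytes]:
    """Read every extended attribute of `path` without following symlinks.

    Linux only; as with `getfattr -m .`, trusted.* names are visible only to root.
    """
    names = os.listxattr(path, follow_symlinks=False)
    return {name: os.getxattr(path, name, follow_symlinks=False) for name in names}


def format_like_getfattr(path: str, attrs: dict[str, bytes]) -> str:
    """Render attrs roughly as `getfattr -d -m . -e hex PATH` prints them."""
    # getfattr drops the leading "/" of absolute paths in its "# file:" header.
    lines = [f"# file: {path.lstrip('/')}"]
    lines += [f"{name}=0x{value.hex()}" for name, value in sorted(attrs.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    import sys

    # Usage: python3 dump_xattrs.py /path/on/brick/file /path/on/brick/dir/
    for p in sys.argv[1:]:
        print(format_like_getfattr(p, read_xattrs(p)))
```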
>>>>>> Thank you very much in advance for your advice.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Mabi
>>>>>>
>>>>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>>>>>
>>>>>> On April 9, 2018 2:31 PM, Ravishankar N ravishankar at redhat.com wrote:
>>>>>>
>>>>>>> On 04/09/2018 05:54 PM, Dmitry Melekhov wrote:
>>>>>>>
>>>>>>>> 09.04.2018 16:18, Ravishankar N wrote:
>>>>>>>>
>>>>>>>>> On 04/09/2018 05:40 PM, mabi wrote:
>>>>>>>>>
>>>>>>>>>> Thanks again, that worked and I now have no more unsynched files.
>>>>>>>>>>
>>>>>>>>>> You mentioned that this bug has been fixed in 3.13. Would it be possible to backport it to 3.12? I am asking because 3.13 is not a long-term release, and as such I would rather not have to upgrade to 3.13.
>>>>>>>>>
>>>>>>>>> I don't think there will be another 3.12 release.
>>>>>>>>
>>>>>>>> Why not? It is LTS, right?
>>>>>>>
>>>>>>> My bad. Just checked the schedule [1], and you are right. It is LTM.
>>>>>>>
>>>>>>> [1] https://www.gluster.org/release-schedule/
>>>>>>>> Gluster-users mailing list
>>>>>>>>
>>>>>>>> Gluster-users at gluster.org
>>>>>>>>
>>>>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>


