[Gluster-users] New 3.12.7 possible split-brain on replica 3

Ravishankar N ravishankar at redhat.com
Thu May 17 05:00:58 UTC 2018


Hi mabi,
Some questions:
-Did you by any chance change the cluster.quorum-type option from its 
default value?
-Is filename.shareKey supposed to be an empty file? Looks like the file 
was fallocated with the keep-size option but never written to. (On the 2 
data bricks, stat output shows Size = 0 but non-zero Blocks, and yet a 
'regular empty file'.)
-Do you have some sort of a reproducer, or steps that you perform when the 
issue occurs? Please also share the logs from all 3 nodes and the client(s).
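
For anyone reading the getfattr output quoted further down: each 
trusted.afr.* value is a 12-byte changelog made of three big-endian 32-bit 
pending counters (data, metadata, entry). The snippet below is only a 
minimal sketch for decoding them by hand; decode_afr is an illustrative 
helper name, not a Gluster tool, and the values are copied from the 
arbiter's output for filename.shareKey.

```python
import struct


def decode_afr(hex_value):
    """Split a trusted.afr.* value into its three pending counters."""
    text = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    # Three unsigned 32-bit big-endian integers: data, metadata, entry.
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


# Values taken from the arbiter brick's getfattr output in this thread.
arbiter_xattrs = {
    "trusted.afr.dirty": "0x000000000000000000000000",
    "trusted.afr.myvolume-private-client-0": "0x000000010000000000000000",
    "trusted.afr.myvolume-private-client-1": "0x000000010000000000000000",
}

for name, value in arbiter_xattrs.items():
    print(name, decode_afr(value))
```

Read this way, the arbiter records one pending data operation against each 
of client-0 and client-1 (presumably the two data bricks), while the data 
bricks themselves show no trusted.afr.* xattrs on the file.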
Thanks,
Ravi

On 05/15/2018 05:26 PM, mabi wrote:
> Thank you Ravi for your fast answer. As requested you will find below the "stat" and "getfattr" of one of the files and its parent directory from all three nodes of my cluster.
>
> NODE 1:
>
>    File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey’
>    Size: 0         	Blocks: 38         IO Block: 131072 regular empty file
> Device: 23h/35d	Inode: 744413      Links: 2
> Access: (0644/-rw-r--r--)  Uid: (20936/ UNKNOWN)   Gid: (20936/ UNKNOWN)
> Access: 2018-05-15 08:54:20.296048887 +0200
> Modify: 2018-05-15 08:54:20.296048887 +0200
> Change: 2018-05-15 08:54:20.340048505 +0200
>   Birth: -
>
>    File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/’
>    Size: 8         	Blocks: 74         IO Block: 131072 directory
> Device: 23h/35d	Inode: 744410      Links: 2
> Access: (0755/drwxr-xr-x)  Uid: (20936/ UNKNOWN)   Gid: (20936/ UNKNOWN)
> Access: 2018-04-25 09:41:24.276780766 +0200
> Modify: 2018-05-15 08:54:20.392048056 +0200
> Change: 2018-05-15 08:54:20.392048056 +0200
>   Birth: -
>
> # file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
> trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63
> trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579
> trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x00000000000000000000000000000001
> trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x00000001
>
> # file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
> trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
> trusted.glusterfs.quota.dirty=0x3000
> trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x000000000000060000000000000000060000000000000001
> trusted.glusterfs.quota.size.1=0x000000000000060000000000000000060000000000000001
>
>
> NODE 2:
>
>    File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey’
>    Size: 0         	Blocks: 38         IO Block: 131072 regular empty file
> Device: 24h/36d	Inode: 5428150     Links: 2
> Access: (0644/-rw-r--r--)  Uid: (20936/ UNKNOWN)   Gid: (20936/ UNKNOWN)
> Access: 2018-05-15 08:54:20.294280254 +0200
> Modify: 2018-05-15 08:54:20.294280254 +0200
> Change: 2018-05-15 08:54:20.338279576 +0200
>   Birth: -
>
>    File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/’
>    Size: 8         	Blocks: 74         IO Block: 131072 directory
> Device: 24h/36d	Inode: 5428147     Links: 2
> Access: (0755/drwxr-xr-x)  Uid: (20936/ UNKNOWN)   Gid: (20936/ UNKNOWN)
> Access: 2018-04-25 09:41:24.276780766 +0200
> Modify: 2018-05-15 08:54:20.394278717 +0200
> Change: 2018-05-15 08:54:20.394278717 +0200
>   Birth: -
>
> # file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
> trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63
> trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579
> trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x00000000000000000000000000000001
> trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x00000001
>
> # file: data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
> trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
> trusted.glusterfs.quota.dirty=0x3000
> trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x000000000000060000000000000000060000000000000001
> trusted.glusterfs.quota.size.1=0x000000000000060000000000000000060000000000000001
>
>
> NODE 3 (arbiter):
>
>    File: /srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
>    Size: 0         	Blocks: 8          IO Block: 4096   regular empty file
> Device: ca11h/51729d	Inode: 271434295   Links: 2
> Access: (0644/-rw-r--r--)  Uid: (20936/ UNKNOWN)   Gid: (20936/ UNKNOWN)
> Access: 2018-04-25 09:41:24.322527555 +0200
> Modify: 2018-04-25 09:41:24.322527555 +0200
> Change: 2018-05-15 08:54:20.343667380 +0200
>   Birth: -
>
>    File: /srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
>    Size: 8192      	Blocks: 24         IO Block: 4096   directory
> Device: ca11h/51729d	Inode: 271434288   Links: 2
> Access: (0755/drwxr-xr-x)  Uid: (20936/ UNKNOWN)   Gid: (20936/ UNKNOWN)
> Access: 2018-04-25 09:41:24.276780766 +0200
> Modify: 2018-05-15 08:54:20.391667997 +0200
> Change: 2018-05-15 08:54:20.395668048 +0200
>   Birth: -
>
> # file: srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.myvolume-private-client-0=0x000000010000000000000000
> trusted.afr.myvolume-private-client-1=0x000000010000000000000000
> trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63
> trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579
> trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x00000000000000000000000000000001
> trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x00000001
>
> # file: srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
> trusted.afr.myvolume-private-client-0=0x000000000000000000000000
> trusted.afr.myvolume-private-client-1=0x000000000000000000000000
> trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
> trusted.glusterfs.quota.dirty=0x3000
> trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x000000000000000000000000000000060000000000000001
> trusted.glusterfs.quota.size.1=0x000000000000000000000000000000060000000000000001
>
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>
> On May 15, 2018 10:52 AM, Ravishankar N <ravishankar at redhat.com> wrote:
>
>> On 05/15/2018 12:38 PM, mabi wrote:
>>
>>> Dear all,
>>>
>>> Last Friday I upgraded my replica 3 GlusterFS cluster (and clients) from 3.12.7 to 3.12.9 in order to fix this bug, but unfortunately I still have exactly the same problem as initially posted in this thread.
>>>
>>> It looks like this bug is not resolved: I just got 3 unsynced files on my arbiter node, exactly as I did before upgrading. This problem started when I upgraded to 3.12.7...
>> Could you provide the stat and 'getfattr -d -m . -e hex
>> brick/path/to/file' outputs of one of these files and also the
>> corresponding parent directory from all 3 bricks?
>>
>> Thanks,
>>
>> Ravi
>>
>>> Thank you very much in advance for your advice.
>>>
>>> Best regards,
>>>
>>> Mabi
>>>
>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>>
>>> On April 9, 2018 2:31 PM, Ravishankar N <ravishankar at redhat.com> wrote:
>>>
>>>> On 04/09/2018 05:54 PM, Dmitry Melekhov wrote:
>>>>
>>>>> 09.04.2018 16:18, Ravishankar N wrote:
>>>>>
>>>>>> On 04/09/2018 05:40 PM, mabi wrote:
>>>>>>
>>>>>>> Again thanks, that worked and I now have no more unsynced files.
>>>>>>>
>>>>>>> You mentioned that this bug has been fixed in 3.13. Would it be
>>>>>>> possible to backport it to 3.12? I am asking because 3.13 is not a
>>>>>>> long-term release and as such I would not like to have to upgrade to
>>>>>>> 3.13.
>>>>>>
>>>>>> I don't think there will be another 3.12 release.
>>>>>
>>>>> Why not? It is LTS, right?
>>>>
>>>> My bad. Just checked the schedule [1], and you are right. It is LTM.
>>>> [1] https://www.gluster.org/release-schedule/
>>>>
>>>>> Gluster-users mailing list
>>>>>
>>>>> Gluster-users at gluster.org
>>>>>
>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>


