[Gluster-users] GlusterFS Split Brain issue

Ankur Pandey ankur.p at ambab.com
Wed Sep 23 05:26:10 UTC 2015


Hi Krutika,

Thanks for the reply. However, I am afraid it's now too late for us: I
have already replaced the GlusterFS server and copied my data onto the
new bricks. It is again working flawlessly, just as it did before. I
still have the old server and snapshots, though, so I will try to
implement your solution there and let you know how it goes.



Regards
Ankur Pandey
+91 9702 831 855

On Tue, Sep 22, 2015 at 4:54 PM, Krutika Dhananjay <kdhananj at redhat.com>
wrote:

> Hi Ankur,
>
> It looks like some of the files/directories are in gfid split-brain.
> From the logs that you attached, here is the list of gfids of directories
> in gfid split-brain, based on the message id for gfid split-brain log
> message (108008):
>
> [kdhananjay at dhcp35-215 logs]$ grep -iarT '108008' * | awk '{print $13}' |
> cut -f1 -d'/' | sort | uniq
> <16d8005d-3ae2-4c72-9097-2aedd458b5e0
> <3539c175-d694-409d-949f-f9a3e18df17b
> <3fd13508-b29e-4d52-8c9c-14ccd2f24b9f
> <6b1e5a5a-bb65-46c1-a7c3-0526847beece
> <971b5249-92fb-4166-b1a0-33b7efcc39a8
> <b582f326-c8ee-4b04-aba0-d37cb0a6f89a
> <cc9d0e49-c9ab-4dab-bca4-1c06c8a7a4e3
>
> There are 7 such directories.
>
> Also, there are 457 entries in gfid split-brain:
> [kdhananjay at dhcp35-215 logs]$ grep -iarT '108008' glustershd.log | awk
> '{print $13}' | sort | uniq | wc -l
> 457
>
> You will need to do the following to get things back to the normal state:
>
> 1) For each gfid in the list of the 7 directories in split-brain, get the
> list of files in split-brain.
> For example, for 16d8005d-3ae2-4c72-9097-2aedd458b5e0, the command would
> be `grep -iarT '108008' * | grep 16d8005d-3ae2-4c72-9097-2aedd458b5e0`
> (omitting the repeating messages, of course). A sketch that automates
> this for all seven directories follows the example message below.
> You would get messages of the following kind:
> glustershd.log    :[2015-09-10 01:44:05.512589] E [MSGID: 108008]
> [afr-self-heal-entry.c:253:afr_selfheal_detect_gfid_and_type_mismatch]
> 0-repl-vol-replicate-0: Gfid mismatch detected for
> <16d8005d-3ae2-4c72-9097-2aedd458b5e0/100000075944.jpg>,
> d9f15b28-9c9c-4f31-ba3c-543a5331cb9d on repl-vol-client-1 and
> 583295f0-1ec4-4783-9b35-1e18b8b4f92c on repl-vol-client-0. Skipping
> conservative merge on the file.
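>
> A minimal sketch of how step 1 could be automated for all seven
> directories (assuming bash, that the log files are in the current
> directory, and that the field positions match the sample message above)
> might look like this:
>
> # For each parent gfid in split-brain, list the unique <gfid/entry>
> # tokens from the 108008 (gfid mismatch) messages; the gfid list is
> # taken from the grep output above.
> for gfid in 16d8005d-3ae2-4c72-9097-2aedd458b5e0 \
>             3539c175-d694-409d-949f-f9a3e18df17b \
>             3fd13508-b29e-4d52-8c9c-14ccd2f24b9f \
>             6b1e5a5a-bb65-46c1-a7c3-0526847beece \
>             971b5249-92fb-4166-b1a0-33b7efcc39a8 \
>             b582f326-c8ee-4b04-aba0-d37cb0a6f89a \
>             cc9d0e49-c9ab-4dab-bca4-1c06c8a7a4e3; do
>     echo "=== parent gfid: $gfid ==="
>     grep -iarT '108008' * | grep "$gfid" | awk '{print $13}' | sort | uniq
> done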
>
> 2) Examine the two copies (one per replica) of each such file, choose one
> copy and delete the copy from the other replica.
> In the above example, the parent is 16d8005d-3ae2-4c72-9097-2aedd458b5e0
> and the entry is '100000075944.jpg'.
> So you can examine the two different copies at
> <brick-path>/.glusterfs/16/d8/16d8005d-3ae2-4c72-9097-2aedd458b5e0/100000075944.jpg
> to decide which one you want to keep.
> Once you have decided which copy to keep, you need to delete the bad
> copy and its hard link. This assumes all of the entries in gfid
> split-brain are regular files, which is what I gathered from the logs
> since they were all .jpg files.
> You can get the absolute path of the entry by noting down the inode
> number of the gfid link on the bad brick and then searching for that
> inode number under the same brick.
> In this example, the gfid link would be
> <bad-brick-path>/.glusterfs/16/d8/16d8005d-3ae2-4c72-9097-2aedd458b5e0/100000075944.jpg.
> So you would need to get its inode number (by running stat on it) and
> then run 'find <bad-brick-path> -inum <inode number of the gfid link>'
> to get its absolute path.
> Once you have both paths, unlink them both. If any other hard links
> exist on the bad brick, delete them as well. A sketch of these per-file
> steps is below.
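>
> As an illustration only (assuming bash and GNU stat/find; the brick path
> below is a placeholder, and nothing should be removed until you have
> verified which copy you want to keep), the per-file steps could look
> like this:
>
> # Placeholder: substitute the actual path of the brick holding the bad copy.
> BRICK=/path/to/bad-brick
>
> # The bad copy, reached through the parent directory's gfid symlink.
> BAD="$BRICK/.glusterfs/16/d8/16d8005d-3ae2-4c72-9097-2aedd458b5e0/100000075944.jpg"
>
> # Note down the inode number of the bad copy ('stat -c %i' prints only
> # the inode number).
> INODE=$(stat -c '%i' "$BAD")
>
> # List every path on the bad brick that shares this inode: the absolute
> # path under the brick, the file's own gfid hard link under .glusterfs,
> # and any other hard links.
> find "$BRICK" -inum "$INODE"
>
> # Only after confirming the output above, remove all of the listed
> # paths, for example:
> # find "$BRICK" -inum "$INODE" -delete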
>
> There are about 457 files where you need to repeat this exercise.
>
> Once you are done, you can execute 'gluster volume heal <VOL>'. This
> will heal the good copies back to the bricks from which the bad copies
> were deleted.
> After the heal is complete, 'gluster volume heal <VOL> info split-brain'
> should no longer show any entries. Example commands follow below.
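>
> For reference (assuming the volume is named repl-vol, going by the
> "repl-vol-client-*" names in the log messages; substitute your actual
> volume name), the commands would be along these lines:
>
> # Trigger healing of the entries that still need to be healed.
> gluster volume heal repl-vol
>
> # Afterwards, verify that no entries remain in split-brain.
> gluster volume heal repl-vol info split-brain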
>
> As for the performance problem, it is possible that it was caused by the
> self-heal daemon periodically trying, in vain, to heal the files in gfid
> split-brain; it should most likely go away once the split-brain is
> resolved.
>
> As an aside, it is not clear why so many files ran into gfid split-brain.
> You might want to check if the network link between the clients and the
> servers was fine.
>
> Hope that helps. Let me know if you need more clarification.
> -Krutika
> ------------------------------
>
> *From: *"Ankur Pandey" <ankur.p at ambab.com>
> *To: *gluster-users at gluster.org, pkarampu at redhat.com
> *Cc: *"Dhaval Kamani" <dhaval at ambab.com>
> *Sent: *Saturday, September 12, 2015 12:51:31 PM
> *Subject: *[Gluster-users] GlusterFS Split Brain issue
>
> Hi Team GlusterFS,
>
> With reference to the question on Server Fault:
>
> http://serverfault.com/questions/721067/glusterfs-split-brain-issue
>
> At Pranith's request, I am sending you the logs. Please tell me if you
> need anything else.
>
> Attaching logs for 2 master servers.
>
> Regards
> Ankur Pandey
> +91 9702 831 855
>
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
>
>