[Gluster-users] RAID-1 over network scenario - incredible problems

Ondrej Jombik nepto at platon.sk
Wed Apr 1 04:21:31 UTC 2009

I'm trying to configure a GlusterFS setup with two replicating servers,
for now without any clients. It worked well so far, but after I rebooted
the second server I started running into trouble...
(note: the first server was not rebooted)

1. are all changes made on the non-rebooted server while the second
    server was rebooting lost? they are not replicated once the rebooted
    server is online again... is there an OFFICIAL way to achieve this?
    Does it keep some binary log of write operations that were not
    performed?

2. on the rebooted server I tried to configure glusterfs to fetch the
    missing files; here is my basic configuration (afr part only):

     volume afr
       type cluster/afr
       subvolumes local remote

4. on the second server this shows only the files that were there before
    the reboot; files created during the reboot are missing, although
    they still exist on the non-rebooted server

5. option read-subvolume remote
    This actually does nothing. Is this implemented?
    I was expecting all data to be read from the remote volume.

6. option favorite-child remote
    This does nothing as well, but at least it prints a warning into the
    log files. However, what the warning says does not actually happen.
    I tried to access all files on the remote/local device/mountpoint
    (4 ways), no change at all.

7. if I define "subvolumes remote" (i.e. kick local out of subvolumes)
    then I finally get the right file contents, but only at the
    mountpoint, not on the actual device; I need to get the files onto
    the actual device (local disk) of the rebooted server

8. and finally I deleted all the files from the device of the rebooted
    server, hoping that replication would do the rest; and voilà,
    they are replicated, so all files created during the reboot are
    there, but they are all filled with zeros!
    (and no, this is not that known XFS bug; this is on EXT3)
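
For reference, the variants from items 5 to 7 would sit in the afr volume
roughly like this. This is only a sketch: "local" and "remote" are the
subvolume names from the fragment above, the rest of the volfile is
omitted, and only one variant is meant to be active at a time:

     volume afr
       type cluster/afr
       # item 5: intended to make reads come from the remote subvolume
       # option read-subvolume remote
       # item 6: intended to make the remote copy win on a conflict
       option favorite-child remote
       subvolumes local remote
       # item 7: replace the line above with "subvolumes remote"
     end-volume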

I know this all sounds pretty incredible and looks like a horror story,
but I have read tons of documentation and I still cannot figure it out.
I hope the problem is between keyboard and chair and not in the
software itself.

I'm only trying to have RAID-1 over network with automatic recovery
after reboot/outage. Is this that complicated??

(I need to mention that I have not started with clients yet; I'm
expecting even bigger troubles there)

I would much appreciate any kind of help (even just confirming this
behaviour would help me a lot)

Thank you


