[Gluster-users] Recovering a broken distributed volume

Brian Candler B.Candler at pobox.com
Wed Jul 11 14:07:42 UTC 2012


On Wed, Jul 11, 2012 at 11:27:58AM +0100, Brian Candler wrote:
> (1) The "single1" volume is empty, which I expected since it's a brand new
> empty directory, but I cannot create files in it.
> 
> root@dev-storage1:~# touch /gluster/single1/test
> touch: cannot touch `/gluster/single1/test': Read-only file system

Sorry, this was my own problem: it turns out a few more drives had failed, and
the underlying brick filesystem went read-only.  Unbelievably, that makes 7
Seagate drives failed out of an array of 12!

Anyway, after rebuilding the array with the remaining 5 working disks, the
"single" volume came up fine.  The distributed volume also healed itself after
I ran 'ls' on it a few times:

root@dev-storage1:~# ls /gluster/fast
...
root@dev-storage1:~# ls /gluster/fast
images  iso
root@dev-storage1:~# ls /gluster/fast/images/
root@dev-storage1:~# ls /gluster/fast/iso
linuxmint-11-gnome-dvd-64bit.iso
root@dev-storage1:~# ls /gluster/fast/images/
lucidtest
root@dev-storage1:~# ls /gluster/fast/images/lucidtest/
tmpaJqTD9.qcow2
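For anyone else in this situation: the 'ls a few times' trick works because a
lookup on each path makes the replicate translator compare the copies and
repair the missing one.  A more thorough variant walks the whole tree and
stats every entry.  A sketch (MOUNT here is a hypothetical mount point, not
one of my real volumes):

```shell
#!/bin/sh
# Sketch only: trigger self-heal by forcing a lookup on every file
# under a glusterfs mount.  MOUNT is a hypothetical example path.
MOUNT=${MOUNT:-/tmp/heal-demo}
mkdir -p "$MOUNT"
touch "$MOUNT/example"   # stand-in for real volume contents

# stat'ing each entry through the mount makes the replicate
# translator check (and if needed, repair) each file's copies:
find "$MOUNT" -exec stat --format='%n %s bytes' {} \;
```

On GlusterFS 3.3 glustershd should pick files up on its own eventually; the
find/stat walk just forces the issue sooner.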

I can see only one other strange thing: the newly created replica appears to
have made a sparse copy of a file which wasn't sparse on the original.

On the original working side of the replicated volume:

root@dev-storage2:~# ls -l /disk/storage2/safe/images/lucidtest/
total 756108
-rw-r--r-- 2 root root 774307840 Jul 11 14:55 tmpaJqTD9.qcow2
root@dev-storage2:~# du -k /disk/storage2/safe/images/lucidtest/
756116	/disk/storage2/safe/images/lucidtest/

On the newly-created side, which glustershd rebuilt automatically:

root@dev-storage1:~# ls -l /disk/storage1/safe/images/lucidtest/
total 422728
-rw-r--r-- 2 root root 774307840 Jul 11 14:55 tmpaJqTD9.qcow2
root@dev-storage1:~# du -k /disk/storage1/safe/images/lucidtest/
422736	/disk/storage1/safe/images/lucidtest/

Is this intentional?  Does glustershd notice runs of zeros and create a
sparse file on the target?

(This may or may not be desirable; for performance, for example, you might
want a VM image to be fully preallocated.)
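For the record, the du/ls discrepancy above is exactly the signature of a
sparse file, and it's easy to reproduce and undo locally.  A sketch using
example paths (not my real bricks):

```shell
# Example path only; reproduces the sparse-file signature seen above
# and shows one way to preallocate the blocks afterwards.
f=/tmp/sparse-demo.img

truncate -s 10M "$f"            # logical size 10 MiB, no blocks written
du -k "$f"                      # allocated size: (nearly) 0 KiB
du -k --apparent-size "$f"      # logical size: 10240 KiB

# To fully preallocate the image (predictable write performance),
# allocate real blocks for the whole range:
fallocate -l 10M "$f"
du -k "$f"                      # allocated size now ~10240 KiB
```

So if glustershd really does skip runs of zeros, re-running fallocate (or
`qemu-img` with preallocation) on the healed copy would restore the original
layout.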

Regards,

Brian.


