[Gluster-devel] glusterfsd crash with bizarre (dangerous?) results...
Anand Avati
avati at zresearch.com
Fri Apr 4 10:37:15 UTC 2008
Daniel,
What is the tla revision of your software? (seen from glusterfs --version)
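
For example, run on the client and on both storage nodes so we can check that they all match (assuming the binary is in $PATH; the exact output depends on the build):

[dfsA]# glusterfs --version
[dfsC]# glusterfs --version
[dfsD]# glusterfs --version
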
avati
2008/4/4, Daniel Maher <dma+gluster at witbe.net>:
>
>
> Hi all,
>
> While running a series of FFSB tests against my newly created Gluster
> cluster, I caused glusterfsd to crash on one of the two storage nodes.
> The relevant lines from the log file are pastebin'd here:
> http://pastebin.ca/970831
>
>
> Even more troubling is that when I restarted glusterfsd, the node
> did /not/ self-heal (see the aside after the listings below):
>
> The mountpoint on the client:
> [dfsA]# du -s /opt/gfs-mount/
> 2685304 /opt/gfs-mount/
>
> The DS on the node which did not fail:
> [dfsC]# du -s /opt/gfs-ds/
> 2685328 /opt/gfs-ds/
>
> The DS on the node which failed, ~5 minutes after restarting
> glusterfsd:
> [dfsD]# du -s /opt/gfs-ds/
> 27092 /opt/gfs-ds/
>
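> Aside: if I understand the AFR docs correctly, self-heal is only
> triggered when a file is looked up or opened through the mountpoint,
> so walking the whole mount from the client ought to force it; a
> minimal sketch, assuming the paths above (I have not verified that
> this is enough on this release):
>
> [dfsA]# find /opt/gfs-mount -type f -exec head -c1 '{}' \; > /dev/null
>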
>
> Even MORE troubling: I restarted glusterfsd on the node which did not
> fail, to see if that would help, and it produced even more bizarre
> results:
>
> The mountpoint on the client:
> [dfsA]# du -s /opt/gfs-mount/
> 17520 /opt/gfs-mount/
>
> The DS on the node which did not fail:
> [dfsC]# du -s /opt/gfs-ds/
> 2685328 /opt/gfs-ds/
>
> The DS on the node which failed:
> [dfsD]# du -s /opt/gfs-ds/
> 27092 /opt/gfs-ds/
>
>
> A simple visual inspection shows that the files and directories are
> clearly different between the client and the two nodes (a more
> systematic comparison is sketched after the listings). For example:
>
> (Client)
> [dfsA]# ls fillfile*
> fillfile0 fillfile11 fillfile14 fillfile2 fillfile5 fillfile8
> fillfile1 fillfile12 fillfile15 fillfile3 fillfile6 fillfile9
> fillfile10 fillfile13 fillfile16 fillfile4 fillfile7
> [dfsA]# ls -l fillfile?
> -rwx------ 1 root root 65536 2008-04-04 09:42 fillfile0
> -rwx------ 1 root root 131072 2008-04-04 09:42 fillfile1
> -rwx------ 1 root root 131072 2008-04-04 09:42 fillfile2
> -rwx------ 1 root root 65536 2008-04-04 09:42 fillfile3
> -rwx------ 1 root root 65536 2008-04-04 09:42 fillfile4
> -rwx------ 1 root root 65536 2008-04-04 09:42 fillfile5
> -rwx------ 1 root root 0 2008-04-04 09:42 fillfile6
> -rwx------ 1 root root 0 2008-04-04 09:42 fillfile7
> -rwx------ 1 root root 196608 2008-04-04 09:42 fillfile8
> -rwx------ 1 root root 0 2008-04-04 09:42 fillfile9
>
> (Node that didn't fail)
> [dfsC]# ls fillfile*
> fillfile0 fillfile13 fillfile18 fillfile22 fillfile4 fillfile9
> fillfile1 fillfile14 fillfile19 fillfile23 fillfile5
> fillfile10 fillfile15 fillfile2 fillfile24 fillfile6
> fillfile11 fillfile16 fillfile20 fillfile25 fillfile7
> fillfile12 fillfile17 fillfile21 fillfile3 fillfile8
> [dfsC]# ls -l fillfile?
> -rwx------ 1 root root 65536 2008-04-04 09:42 fillfile0
> -rwx------ 1 root root 131072 2008-04-04 09:42 fillfile1
> -rwx------ 1 root root 131072 2008-04-04 09:42 fillfile2
> -rwx------ 1 root root 65536 2008-04-04 09:42 fillfile3
> -rwx------ 1 root root 65536 2008-04-04 09:42 fillfile4
> -rwx------ 1 root root 65536 2008-04-04 09:42 fillfile5
> -rwx------ 1 root root 0 2008-04-04 09:42 fillfile6
> -rwx------ 1 root root 0 2008-04-04 09:42 fillfile7
> -rwx------ 1 root root 196608 2008-04-04 09:42 fillfile8
> -rwx------ 1 root root 0 2008-04-04 09:42 fillfile9
>
> (Node that failed)
> [dfsD]# ls fillfile*
> fillfile0 fillfile11 fillfile14 fillfile2 fillfile5 fillfile8
> fillfile1 fillfile12 fillfile15 fillfile3 fillfile6 fillfile9
> fillfile10 fillfile13 fillfile16 fillfile4 fillfile7
> [dfsD]# ls -l fillfile?
> -rwx------ 1 root root 65536 2008-04-04 09:08 fillfile0
> -rwx------ 1 root root 131072 2008-04-04 09:08 fillfile1
> -rwx------ 1 root root 4160139 2008-04-04 09:08 fillfile2
> -rwx------ 1 root root 327680 2008-04-04 09:08 fillfile3
> -rwx------ 1 root root 262144 2008-04-04 09:08 fillfile4
> -rwx------ 1 root root 65536 2008-04-04 09:08 fillfile5
> -rwx------ 1 root root 1196446 2008-04-04 09:08 fillfile6
> -rwx------ 1 root root 131072 2008-04-04 09:08 fillfile7
> -rwx------ 1 root root 3634506 2008-04-04 09:08 fillfile8
> -rwx------ 1 root root 131072 2008-04-04 09:08 fillfile9
>
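> To make that comparison less eyeball-driven, something along these
> lines (my own sketch, not from the Gluster docs; the /tmp/sums.*
> names are just for illustration, and plain xargs is fine since the
> file names have no spaces) would show exactly which copies diverge:
>
> [dfsA]# cd /opt/gfs-mount && find . -type f | sort | xargs md5sum > /tmp/sums.client
> [dfsC]# cd /opt/gfs-ds && find . -type f | sort | xargs md5sum > /tmp/sums.dfsC
> [dfsD]# cd /opt/gfs-ds && find . -type f | sort | xargs md5sum > /tmp/sums.dfsD
>
> and then diff the three lists on one machine. Note that reading
> through the mountpoint may itself trigger self-heal, so the two
> backend lists are probably the more telling ones.
>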
>
> What the heck is going on here? Three wildly different results; that's
> really not a good thing. These results also seem "permanent": after
> waiting a good 10 minutes (and running the same du command a few more
> times), the results are the same...
>
>
> Finally, I edited "fillfile6" (0 bytes on dfsA and dfsC, 1196446
> bytes on dfsD) via the mountpoint on dfsA, and the changes were
> immediately reflected on the storage nodes. Clearly the AFR translator
> is operational /now/, but the enormous discrepancy is not a good thing,
> to say the least.
>
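> One more data point that might help: as far as I can tell, AFR keeps
> its bookkeeping in extended attributes on the backend files, so
> dumping those for one of the files that still differs (e.g.
> fillfile8) on both nodes might show which copy it considers current.
> A sketch (needs getfattr from the attr package; which trusted.*
> attributes, if any, turn out to be relevant is my guess, so adjust
> as needed):
>
> [dfsC]# getfattr -d -m . -e hex /opt/gfs-ds/fillfile8
> [dfsD]# getfattr -d -m . -e hex /opt/gfs-ds/fillfile8
>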
>
>
>
> --
> Daniel Maher <dma AT witbe.net>
>
>
--
If I traveled to the end of the rainbow
As Dame Fortune did intend,
Murphy would be there to tell me
The pot's at the other end.