[Gluster-devel] Buggy writebehind translators !!!

Anand Avati avati at zresearch.com
Sun Jun 24 13:52:55 UTC 2007


Teo,
 can you please send the logs from glusterfs? I suspect that traces of
the old installation have been left behind on your system. Try manually
deleting the old files (rm -rf ${prefix}/{lib,sbin}/glusterfs*) and then
run make install again. Also, please send the logs of both client and
server running with -LDEBUG.
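
To narrow this down independently of PostgreSQL, a minimal sketch along
these lines might help. It is only an illustration and not part of
glusterfs: it writes index-stamped blocks to a file, reads them back, and
reports any size or content mismatch. The function names, the 8192-byte
block size (PostgreSQL's default) and the path in the usage note are
assumptions, and it does not reproduce PostgreSQL's exact I/O pattern.

```python
import os

# Assumption: 8192 bytes is PostgreSQL's default block size.
BLOCK_SIZE = 8192


def make_block(index, size=BLOCK_SIZE):
    """Build a block whose bytes depend on its index, so misplaced data shows up."""
    stamp = b"block-%012d|" % index
    reps = size // len(stamp) + 1
    return (stamp * reps)[:size]


def write_blocks(path, count, size=BLOCK_SIZE):
    """Write `count` blocks sequentially, one write() call per block."""
    with open(path, "wb") as f:
        for i in range(count):
            f.write(make_block(i, size))
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push buffered data to the filesystem


def verify_blocks(path, count, size=BLOCK_SIZE):
    """Return a list of problems: a wrong file size or blocks that do not match."""
    problems = []
    actual = os.path.getsize(path)
    if actual != count * size:
        problems.append("size %d, expected %d" % (actual, count * size))
    with open(path, "rb") as f:
        for i in range(count):
            if f.read(size) != make_block(i, size):
                problems.append("block %d differs" % i)
    return problems
```

Something like write_blocks("/mnt/glusterfs/probe.dat", 4000) followed by
verify_blocks on the same path (the path is hypothetical) could be run
with and without the write-behind translator loaded; an empty list means
every block read back intact.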

thanks!
avati

2007/6/24, Constantin Teodorescu <teo at flex.ro>:
>
> Finally, I succeeded in compiling the latest patched version of glusterfs.
>
> I succeeded in configuring and compiling the latest patched sources
> fetched with tla, though I had a problem with an older automake on my
> servers. The same archive was autogen.sh-ed and configured just fine
> on the client computer with CentOS 5.0, so I picked up the whole tree
> and ran ./configure on the final destination server, and everything
> went OK.
>
> I started the servers (3), mounted the client locally, and tried the
> PostgreSQL database on the mounted disk again.
>
> No more "Transport endpoint is not connected" errors, BUT the database
> cannot complete a simple import into a table, complaining about some
> "unexpected data beyond EOF in a file block":
>
> COPY animal (id, stare_animal_fk, rasa_fk, sex, data_inregistrare,
> data_nasterii, prima_exploatatie_fk, primul_proprietar_fk, cod_anarz,
> data_trecere_granita, tara_origine_fk, cod_crotalie_non_eu,
> crotalie_mama, observatii, data_upload, versiune) FROM stdin;
> ERROR:  unexpected data beyond EOF in block 2480 of relation "animal"
> HINT:  This has been seen to occur with buggy kernels; consider updating
> your system.
>
> I tried 5 times and got exactly the same error! I suspect some data
> corruption when assembling data file blocks.
>
> The client volume is configured with the AFR, READAHEAD and WRITEBEHIND
> translators like this:
>
> volume afr
>   type cluster/afr
>   subvolumes client1 client2 client3
>   option replicate *:3
> end-volume
>
> volume writebehind
>    type performance/write-behind
>    option aggregate-size 131072 # aggregate block size in bytes
>    subvolumes afr
> end-volume
>
> volume readahead
>    type performance/read-ahead
>    option page-size 131072 ### size in bytes
>    option page-count 16 ### page-size x page-count is the amount of
>    ### read-ahead data per file
>    subvolumes writebehind
> end-volume
>
> I suspected that readahead and writebehind might have some problems, so
> I commented them out, leaving the afr volume alone, non-optimized.
> The operations were obviously slower, but everything went OK: I tried
> multiple reads and updates, and everything works now.
>
> Then I tried to test just the writebehind translator in order to point
> exactly to the buggy code.
> I reintroduced just the writebehind translator and everything seemed to
> work: I imported the table and did 9 full updates on over 700,000
> records. But when I tried to vacuum the table ... dang, another error:
>
> glu=# update animal set observatii='ok1';
> UPDATE 713268
> glu=# update animal set observatii='ok2';
> UPDATE 713268
> .............
> glu=# update animal set observatii='ok8';
> UPDATE 713268
> glu=# update animal set observatii='ok9';
> UPDATE 713268
> glu=# vacuum full analyze;
> ERROR:  could not read block 69998 of relation
> 531069804/531069805/531069806: File descriptor in bad state
> glu=# vacuum full analyze;
> ERROR:  could not read block 69998 of relation
> 531069804/531069805/531069806: File descriptor in bad state
>
> I dropped the database, removed the files, cleaned everything, and
> started over with fresh, empty volumes. I created the database again,
> imported the table (OK), updated once, then vacuum -> ERROR:
> glu=# update animal set observatii='ok1';
> UPDATE 713268
> glu=# vacuum full analyze;
> ERROR:  could not read block 478 of relation
> 531783093/531783094/531783095: File descriptor in bad state
>
> I removed the writebehind translator, activated readahead, and ran the
> same tests all over again -> EVERYTHING IS OK!
> So the write-behind translator should be revised.
> How can I help you pinpoint the bug?
>
> --
> Constantin Teodorescu
> Braila
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>



-- 
Anand V. Avati


