[Gluster-devel] error while reading from an open file

Brian Hirt bhirt at mobygames.com
Wed Sep 2 02:25:45 UTC 2009


Vijay,

I haven't heard back from anyone yet. I have some more information  
about one of the problems.

I have a program that write()s to a file, keeping the file open.
While this program is writing, I restart the nodes one by one.  After
the nodes have been restarted, no new data is written to the file.
However, the program doing the write() still gets the correct number
of bytes returned by the system call and behaves as if everything is
working when it clearly isn't.
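
For anyone who wants to reproduce this, here is a minimal sketch of the kind of writer I mean. It is not my actual program: the function name, the line format, and the optional fsync() call are assumptions made for illustration. The fsync() is there because it is one place where a deferred write error can be reported on POSIX systems; whether glusterfs reports anything there in this scenario is something I have not verified.

```python
import os
import time


def write_periodically(path, count, interval=1.0, sync=True):
    """Append `count` numbered lines to `path`, keeping one descriptor open.

    Returns the total number of bytes that write() claimed to have written.
    Any OSError raised by write() or fsync() is left to propagate to the caller.
    """
    reported = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for i in range(count):
            line = f"{i} {time.time():.3f}\n".encode()
            # write() may succeed here even if the data never reaches a server.
            reported += os.write(fd, line)
            if sync:
                # Assumption: forcing a flush gives a lost write a chance to
                # surface as an error instead of being silently dropped.
                os.fsync(fd)
            time.sleep(interval)
    finally:
        os.close(fd)
    return reported
```

Run it against a path on the gluster mount (for example /gluster/m/test) with a large count, restart the servers one by one while it runs, and then compare the returned byte count with the size of the file in the export directory.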

Meanwhile, if I tail this same file on another client while I reboot  
the nodes, I eventually get "tail: /gluster/m/test: File descriptor in  
bad state"
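
A rough stand-in for the reader side, in case it helps to see the errno rather than tail's message. This is only a sketch: the function name and polling scheme are made up for illustration, and tail itself may hit the error from a different system call than read(). On Linux, "File descriptor in bad state" is the message text for EBADFD.

```python
import os
import time


def follow(path, polls, interval=1.0):
    """Poll an open file for new data, roughly like `tail -f`.

    Returns (data_read, errno_or_None). Stops early and returns the errno
    if read() fails on the already-open descriptor.
    """
    chunks = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for _ in range(polls):
            try:
                data = os.read(fd, 65536)
            except OSError as e:
                # Report the failure instead of hiding it.
                return b"".join(chunks), e.errno
            if data:
                chunks.append(data)
            else:
                # Nothing new yet; wait before polling again.
                time.sleep(interval)
    finally:
        os.close(fd)
    return b"".join(chunks), None
```

Comparing the returned errno against errno.EBADFD shows whether the reader saw the same failure that tail reported.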

At some point gluster realizes it can't deal with this file and  
reports back file descriptor in bad state to the reader, but continues  
to happily report success to the program doing the writes.

The first part of this problem (open files not surviving gluster  
restarts) seems like a pretty major design flaw that needs to be  
fixed.   The second part (gluster not reporting the error to the  
writer when gluster chokes) is a critical problem that needs to be  
fixed.    However, it seems that there isn't much interest in fixing  
these types of things.  I've spent some time reading back in the mail  
archives and there seems to be a pattern of instability and silence on  
the part of the developers.   This really isn't the way to make your  
project a success and get advocates of your software.

I want to help identify issues and provide information to help get
things fixed, but I feel like I'm talking to deaf ears.

Please advise on how I can help with these issues.

--brian

On Aug 31, 2009, at 12:58 PM, Brian Hirt wrote:

> Vijay,
>
> Yes, I am using the same distributed-replicate scenario.
>
> The file in the export directory does contain the correct
> information, but somewhere along the line something being
> communicated to the operating system by gluster must be wrong.  I
> say this because the client trying to read from an open file is not
> getting the proper data returned from the system calls, which seems
> to point to a bug in glusterfs.
>
> I've also run into something that might be related but seems much
> more serious.  A program writing to a glusterfs file will fail when
> you restart the gluster servers.  You can recreate the problem by:
>
> 	1) have a program open a file on a glusterfs, write data to a file  
> periodically
> 	2) while the file is being written to, one by one restart all the  
> gluster servers, waiting for the previous server to come back online
>
> At all points in time, three of the four gluster servers are up and  
> running, however the program trying to write data to the file  
> fails.  This is a huge issue for any program that keeps a file open  
> for writing for more than a second or two.
>
> As for the temporary files created by rsync, I'm willing to believe
> they are benign in this particular situation.  However, something
> seems wrong with the idea that gluster would expect to have a file, try
> to lstat it, only to find it's not there.  Shouldn't gluster know
> where the files it maintains are?  It really feels like a race
> condition that will be triggered in other situations where it's not
> so benign.
>
> Thanks for any help you can provide.
>
> --brian
>
> On Aug 30, 2009, at 10:05 AM, Vijay Bellur wrote:
>
>> Brian Hirt wrote:
>>>
>>> I'm running into some problems where one process is writing a log  
>>> file to a and another is reading from it.  The process reading the  
>>> file is not behaving as expected.
>> I am assuming you are using the distributed-replicate scenario that  
>> you mentioned in the previous mail. Can you please confirm if the  
>> file in the export
>> directory contains data that  you did not intend to create?
>>
>>> I'm also continuing to get hundreds of the errors I mentioned in  
>>> that message with rsync.
>>>
>>> [2009-08-28 10:21:20] E [posix.c:1155:posix_chmod] posix: lstat  
>>> on /gluster/exports/redacted/.1218486082-01.jpg.nkOkw9 failed: No  
>>> such file or directory
>>
>> These are usually to do with temporary files created during a  
>> rsync. These error messages would be benign in nature unless you  
>> notice a discrepancy between the original and rsync'd directories.
>>
>> Regards,
>> Vijay
>>
>
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>





