[Gluster-devel] a bug when read files in a symbol-link directory

Vijay Bellur vijay at gluster.com
Mon Sep 7 05:10:26 UTC 2009


Hi He,

Can you please re-create the problem with -L DEBUG and post both the 
client and server side logs?

Thanks,
Vijay


He Xiaobin wrote:
>
> I use glusterfs in a cluster system (configured as:
> dht->afr->client->server->iothreads->locks->posix). After several days
> of running it is stable, but performance is poor (slower than NFS
> exported from a single server). More importantly, I have run into a
> bug in the last few days. This is really an emergency, so I need your
> help!
>
> What is the bug? In this system I use mvapich+blcr for task
> checkpoint and restore. I don't know how mvapich works internally, but
> I am sure it uses glusterfs in my case. When checkpointing a task on
> glusterfs, it creates one ckpt file for each process of the task. All
> the ckpt files are placed in a directory called 1, and a symbolic link
> called 0 is created pointing to directory 1. Here is an example:
> fortest is the username, .ckpt is the ckpt file directory for this
> user, 1972 is the task id, 0 is the symbolic link, and bt.C.64-19.ckpt
> is the ckpt file of the task's 19th process.
> [fortest@gfsclient02 1972]$ pwd
> /mnt/glusterfs/.ckpt/1972
> [fortest@gfsclient02 1972]$ ll
> total 132
> lrwxrwxrwx 1 fortest fortest    31 Sep  4 17:09 0 -> /mnt/glusterfs/fortest/.ckpt/1972/1
> drwx------ 2 fortest fortest 65536 Sep  4 20:06 1
> [fortest@gfsclient02 1972]$ ls 1/
> bt.C.64-0.ckpt   bt.C.64-21.ckpt  bt.C.64-33.ckpt  bt.C.64-45.ckpt  bt.C.64-57.ckpt
> bt.C.64-10.ckpt  bt.C.64-22.ckpt  bt.C.64-34.ckpt  bt.C.64-46.ckpt  bt.C.64-58.ckpt
> bt.C.64-11.ckpt  bt.C.64-23.ckpt  bt.C.64-35.ckpt  bt.C.64-47.ckpt  bt.C.64-59.ckpt
> bt.C.64-12.ckpt  bt.C.64-24.ckpt  bt.C.64-36.ckpt  bt.C.64-48.ckpt  bt.C.64-5.ckpt
> bt.C.64-13.ckpt  bt.C.64-25.ckpt  bt.C.64-37.ckpt  bt.C.64-49.ckpt  bt.C.64-60.ckpt
> bt.C.64-14.ckpt  bt.C.64-26.ckpt  bt.C.64-38.ckpt  bt.C.64-4.ckpt   bt.C.64-61.ckpt
> bt.C.64-15.ckpt  bt.C.64-27.ckpt  bt.C.64-39.ckpt  bt.C.64-50.ckpt  bt.C.64-62.ckpt
> bt.C.64-16.ckpt  bt.C.64-28.ckpt  bt.C.64-3.ckpt   bt.C.64-51.ckpt  bt.C.64-63.ckpt
> bt.C.64-17.ckpt  bt.C.64-29.ckpt  bt.C.64-40.ckpt  bt.C.64-52.ckpt  bt.C.64-6.ckpt
> bt.C.64-18.ckpt  bt.C.64-2.ckpt   bt.C.64-41.ckpt  bt.C.64-53.ckpt  bt.C.64-7.ckpt
> bt.C.64-19.ckpt  bt.C.64-30.ckpt  bt.C.64-42.ckpt  bt.C.64-54.ckpt  bt.C.64-8.ckpt
> bt.C.64-1.ckpt   bt.C.64-31.ckpt  bt.C.64-43.ckpt  bt.C.64-55.ckpt  bt.C.64-9.ckpt
> bt.C.64-20.ckpt  bt.C.64-32.ckpt  bt.C.64-44.ckpt  bt.C.64-56.ckpt
>   
> When the task needs to be restored, mvapich reads the ckpt files
> through 0 (the symbolic link) and restores the task. All of this works
> smoothly on NFS, but on glusterfs it outputs the following messages.
> Sometimes the restore still completes in the end, while other times it
> fails with almost the same messages. I have verified that the files
> mvapich reports as missing are indeed there. Another useful hint: the
> fewer gluster clients the task runs on, the less often the bug occurs
> during restore. Starting glusterfs without direct-io did not help
> either.
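
The access pattern described above (one ckpt file per process in
directory 1, all read back concurrently through the symbolic link 0)
could be approximated with a small stand-alone script, which may help
when re-creating the problem for logging. This is only a sketch under
assumptions: the file contents are dummies, the readers are threads on
a single client rather than MPI ranks on several clients, and CKPT_BASE
is a hypothetical environment variable used here to point the test at a
directory on the glusterfs mount. It is not how mvapich or blcr
actually read the files.

```python
import errno
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def make_layout(base, nfiles=64):
    """Create base/1 with dummy ckpt files and the symlink base/0 -> base/1."""
    target = os.path.join(base, "1")
    os.makedirs(target, mode=0o700)
    for rank in range(nfiles):
        with open(os.path.join(target, f"bt.C.64-{rank}.ckpt"), "wb") as f:
            f.write(b"x" * 4096)  # dummy payload, not a real checkpoint
    link = os.path.join(base, "0")
    os.symlink(target, link)  # absolute target, as in the listing above
    return link


def read_via_link(path):
    """Read one file completely; return (path, 0) or (path, errno)."""
    try:
        with open(path, "rb") as f:
            f.read()
        return path, 0
    except OSError as exc:
        return path, exc.errno


def restore_all(link, nfiles=64, workers=8):
    """Read every ckpt file through the symlink concurrently.

    Returns the paths whose open/read failed with ENOENT.
    """
    paths = [os.path.join(link, f"bt.C.64-{r}.ckpt") for r in range(nfiles)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(read_via_link, paths))
    return [p for p, err in results if err == errno.ENOENT]


if __name__ == "__main__":
    # CKPT_BASE is hypothetical: set it to a directory on the glusterfs
    # mount. If unset, the system temp directory is used instead.
    base = tempfile.mkdtemp(dir=os.environ.get("CKPT_BASE"))
    missing = restore_all(make_layout(base))
    print(f"{len(missing)} of 64 reads failed with ENOENT")
    for p in missing:
        print("  ", p)
```

Since the report says the failure is more frequent with more clients,
running the read phase from several client nodes at once against the
same directory is probably closer to the real case than one node alone.
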
>
> OUTPUT OF THE TASK WHEN RESTORE:
>
> 19: Restart: path /mnt/glusterfs/fortest/.ckpt/1972/0/bt.C.64-19.ckpt: No such file or directory
> 20: Restart: path /mnt/glusterfs/fortest/.ckpt/1972/0/bt.C.64-20.ckpt: No such file or directory
> srun: error: gfsclient10: task[19-20]: Exited with exit code 1
> 21: Restart: path /mnt/glusterfs/fortest/.ckpt/1972/0/bt.C.64-21.ckpt: No such file or directory
> 18: Restart: path /mnt/glusterfs/fortest/.ckpt/1972/0/bt.C.64-18.ckpt: No such file or directory
> srun: error: gfsclient10: task21: Exited with exit code 1
> srun: error: cn010: task18: Exited with exit code 1
> 17: Restart: path /mnt/glusterfs/fortest/.ckpt/1972/0/bt.C.64-17.ckpt: No such file or directory
> srun: error: gfsclient10: task17: Exited with exit code 1
> 23: Restart: path /mnt/glusterfs/fortest/.ckpt/1972/0/bt.C.64-23.ckpt: No such file or directory
> 22: Restart: path /mnt/glusterfs/fortest/.ckpt/1972/0/bt.C.64-22.ckpt: No such file or directory
> srun: error: gfsclient10: task23: Exited with exit code 1
> srun: error: cn010: task[16,22]: Exited with exit code 1
> 16: Restart: path /mnt/glusterfs/fortest/.ckpt/1972/0/bt.C.64-16.ckpt: No such file or directory
>
>
> I used "debug/trace" and started glusterfs with "-L DEBUG", and got
> the following logs when the ckpt files could not be found:
>
> [2009-09-04 17:12:35] N [trace.c:1290:trace_readlink] tr0: 174536: (loc {path=/fortest/.ckpt/1972/0, ino=1380450540}, size=4096)
> [2009-09-04 17:12:35] N [trace.c:484:trace_readlink_cbk] tr0: 174536: (op_ret=31, op_errno=0, buf=/mnt/glusterfs/fortest/.ckpt/1972/1)
> [2009-09-04 17:12:35] E [fuse-bridge.c:987:fuse_readlink_cbk] glusterfs-fuse: 174536: /fortest/.ckpt/1972/0 => /mnt/glusterfs/fortest/.ckpt/1972/1 @ 1252055555
> [2009-09-04 17:12:35] N [trace.c:1245:trace_lookup] tr0: 174537: (loc {path=/fortest/.ckpt/1972/1, ino=0})
> [2009-09-04 17:12:35] N [trace.c:513:trace_lookup_cbk] tr0: 174508: (op_ret=0, ino=0, *buf {st_dev=2065, st_ino=7068450884, st_mode=40700, st_nlink=2, st_uid=1001, st_gid=1001, st_rdev=0, st_size=65536, st_blksize=4096, st_blocks=256})
> [2009-09-04 17:12:35] E [fuse-bridge.c:255:fuse_loc_fill] glusterfs-fuse: inode_path failed for 8003256399/bt.C.64-22.ckpt @ 1252055555
> [2009-09-04 17:12:35] W [fuse-bridge.c:436:fuse_lookup] glusterfs-fuse: 174539: LOOKUP 8003256399/bt.C.64-22.ckpt (fuse_loc_fill() failed)
> [2009-09-04 17:12:35] N [trace.c:513:trace_lookup_cbk] tr0: 174522: (op_ret=0, ino=0, *buf {st_dev=2065, st_ino=7068450884, st_mode=40700, st_nlink=2, st_uid=1001, st_gid=1001, st_rdev=0, st_size=65536, st_blksize=4096, st_blocks=256})
> [2009-09-04 17:12:35] E [fuse-bridge.c:255:fuse_loc_fill] glusterfs-fuse: inode_path failed for 8003256399/bt.C.64-16.ckpt @ 1252055555
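
When the full client log is collected, a short script along these lines
might help summarize it: it groups the "inode_path failed" entries by
the parent inode number they mention and collects the st_ino values
returned by trace_lookup_cbk, so the two can be compared. The sample
text is taken from the excerpt above with wrapped lines joined; the
regular expressions are assumptions based only on that excerpt and may
need adjusting for other entries. Whether a mismatch between the two
sets of numbers is significant is not something this sketch establishes.

```python
import re

# Patterns inferred from the quoted excerpt only (an assumption).
FAILED = re.compile(r"inode_path failed for\s+(\d+)/(\S+)")
ST_INO = re.compile(r"trace_lookup_cbk\].*\bst_ino=(\d+)")

SAMPLE = "\n".join([
    "[2009-09-04 17:12:35] N [trace.c:513:trace_lookup_cbk] tr0: 174508: "
    "(op_ret=0, ino=0, *buf {st_dev=2065, st_ino=7068450884, "
    "st_mode=40700, st_nlink=2, st_uid=1001, st_gid=1001, st_rdev=0, "
    "st_size=65536, st_blksize=4096, st_blocks=256})",
    "[2009-09-04 17:12:35] E [fuse-bridge.c:255:fuse_loc_fill] "
    "glusterfs-fuse: inode_path failed for 8003256399/bt.C.64-22.ckpt "
    "@ 1252055555",
    "[2009-09-04 17:12:35] E [fuse-bridge.c:255:fuse_loc_fill] "
    "glusterfs-fuse: inode_path failed for 8003256399/bt.C.64-16.ckpt "
    "@ 1252055555",
])


def summarize(text):
    """Return (failures, lookup_inos) for a log given as one entry per line.

    failures    maps parent inode number -> list of file names that
                fuse_loc_fill could not resolve under it.
    lookup_inos is the set of st_ino values seen in trace_lookup_cbk.
    """
    failures = {}
    lookup_inos = set()
    for line in text.splitlines():
        m = FAILED.search(line)
        if m:
            failures.setdefault(int(m.group(1)), []).append(m.group(2))
            continue
        m = ST_INO.search(line)
        if m:
            lookup_inos.add(int(m.group(1)))
    return failures, lookup_inos


if __name__ == "__main__":
    failures, lookup_inos = summarize(SAMPLE)
    for parent, names in failures.items():
        seen = "seen" if parent in lookup_inos else "not seen"
        print(f"parent {parent} ({seen} in lookup_cbk): {len(names)} failures")
        for name in names:
            print("   ", name)
```

The script expects each log entry on a single line, as glusterfs writes
it to the log file; text copied from a wrapped terminal or mail would
need its lines rejoined first.
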