[Gluster-users] Slow open on Server or Client

Jürgen Winkler juergen.winkler at xidras.com
Thu Sep 16 16:05:07 UTC 2010


  Hi,

I have traced what happens when our webserver needs a long time (more 
than 1 second) to deliver a picture via glusterfs. In the straces below, 
the value in <...> at the end of each call is the time that call took.
Here is what happens on the servers:

gfs-server 1:

A request was sent from the glusterfs-client at 11:09:31.XXXXX


4583 11:09:36.299385 <... futex resumed> ) = 0 <7.740114>
4570 11:09:36.299404 epoll_wait(3, <unfinished ...>
4583 11:09:36.299416 lstat("/gluster-storage/macm01//videos02/6839/preview/490ab55407a36.jpg", {st_mode=S_IFREG|0777, st_size=5116, ...}) = 0 <0.000155>
4583 11:09:36.299628 lgetxattr("/gluster-storage/macm01//videos02/6839/preview/490ab55407a36.jpg", "trusted.posix.gen", "Lj3\xe7\x00q\xeb\xe1", 8) = 8 <0.000009>
4583 11:09:36.299680 open("/gluster-storage/macm01//videos02/6839/preview/490ab55407a36.jpg", O_RDONLY) = 56 <0.000070>
4583 11:09:36.299789 read(56, "\377\330\377\340\0\20JFIF\0\1\1\0\0\1\0\1\0\0\377\376\0>CREATOR:"..., 5116) = 5116 <0.016504>
4583 11:09:36.316340 close(56) = 0 <0.000006>
4583 11:09:36.316371 lgetxattr("/gluster-storage/macm01//videos02/6839/preview/490ab55407a36.jpg", "trusted.afr.distribute1", 0x0, 0) = 12 <0.000008>
4583 11:09:36.316409 lgetxattr("/gluster-storage/macm01//videos02/6839/preview/490ab55407a36.jpg", "trusted.afr.distribute1", "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", 12) = 12 <0.000007>
4583 11:09:36.316447 lgetxattr("/gluster-storage/macm01//videos02/6839/preview/490ab55407a36.jpg", "trusted.afr.distribute2", 0x0, 0) = 12 <0.000035>
4583 11:09:36.316528 lgetxattr("/gluster-storage/macm01//videos02/6839/preview/490ab55407a36.jpg", "trusted.afr.distribute2", "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", 12) = 12 <0.000142>
4583 11:09:36.316712 lgetxattr("/gluster-storage/macm01//videos02/6839/preview/490ab55407a36.jpg", "glusterfs.inodelk-count", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported) <0.000008>
4583 11:09:36.316752 lgetxattr("/gluster-storage/macm01//videos02/6839/preview/490ab55407a36.jpg", "glusterfs.entrylk-count", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported) <0.000006>
4583 11:09:36.316787 lgetxattr("/gluster-storage/macm01//videos02/6839/preview/490ab55407a36.jpg", "trusted.glusterfs.dht", 0x0, 0) = -1 ENODATA (No data available) <0.000033>
4583 11:09:36.316850 lgetxattr("/gluster-storage/macm01//videos02/6839/preview/490ab55407a36.jpg", "trusted.glusterfs.dht.linkto", 0x0, 0) = -1 ENODATA (No data available) <0.000028>



What could be holding that futex for 7.7 seconds? And why is no other 
thread handling this situation in the meantime?
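
As far as I understand it, glibc builds pthread mutexes and condition 
variables on top of futex(2), so a long "<... futex resumed>" means the 
thread sat blocked on a userspace lock or condvar for that long; it does 
not mean futex itself is slow. Here is a small test program (my own 
illustration, not glusterfsd code) that reproduces the same pattern 
under strace -f -tt -T:

/* Two threads fighting over one pthread mutex.  Under strace -f -tt -T
 * the blocked thread shows futex(..., FUTEX_WAIT_PRIVATE, ...) and a
 * long "<... futex resumed> ) = 0 <6.x>", just like PID 4583 above. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *holder(void *arg)
{
    pthread_mutex_lock(&lock);
    sleep(7);                        /* hold the lock, like our 7.7s gap */
    pthread_mutex_unlock(&lock);     /* -> futex(..., FUTEX_WAKE_PRIVATE, 1) */
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, holder, NULL);
    sleep(1);                        /* let the other thread grab the lock */
    pthread_mutex_lock(&lock);       /* -> futex(..., FUTEX_WAIT_PRIVATE, ...),
                                        blocks here for ~6 seconds */
    puts("got the lock");
    pthread_mutex_unlock(&lock);
    pthread_join(t, NULL);
    return 0;
}

So the interesting question remains which code path in glusterfsd held 
that lock for 7.7 seconds.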

gfs-server 2:

According to the straces, the same share on the second server does 
nothing for several seconds. Why?

21089 11:09:28.582044 readv(16, [{":O\0\0\0\2\"\0\0\0\0*", 12}], 1) = 12 <0.000011>
21089 11:09:28.583087 readv(16, <unfinished ...>
21090 11:09:33.739852 <... nanosleep resumed> NULL) = 0 <5.228398>
21089 11:09:33.739884 <... readv resumed> [{"\0\0\0\0\3\361s\261\0\0\0\1\0\0\0\33\0\0\1\266\0\0@\226\0\0\377\376\0\0\377\375"..., 546}], 1) = 546 <0.000067>
21090 11:09:33.739956 nanosleep({1, 0}, <unfinished ...>
21089 11:09:33.739975 futex(0x615188, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000008>
21103 11:09:33.740006 <... futex resumed> ) = 0 <5.251553>
21089 11:09:33.740019 epoll_wait(3, {{EPOLLIN, {u32=16, u64=38654705680}}, {EPOLLIN, {u32=19, u64=51539607571}}, {EPOLLIN, {u32=27, u64=81604378651}}, {EPOLLIN, {u32=24, u64=73014444056}}, {EPOLLIN, {u32=13, u64=103079215117}}}, 257, 4294967295) = 5 <0.000007>
21103 11:09:33.740057 lstat("/gluster-storage/macm01//videos08/content/galleries/20634/deleted/ur3efhj5fnvs5g8n4ln1bkdms2_14_1232038384", <unfinished ...>
21089 11:09:33.740081 readv(16, [{":O\0\0\0\0l\0\0\0\0*", 12}], 1) = 12 <0.000006>
21103 11:09:33.740112 <... lstat resumed> {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 <0.000035>
21089 11:09:33.740140 readv(16, [{"\0\0\0\0\3\361s\262\0\0\0\2\0\0\0\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 108}], 1) = 108 <0.000007>
21103 11:09:33.740184 lgetxattr("/gluster-storage/macm01//videos08/content/galleries/20634/deleted/ur3efhj5fnvs5g8n4ln1bkdms2_14_1232038384", "trusted.posix.gen" <unfinished ...>
21089 11:09:33.740210 writev(16, [{":O\0\0\0\0l\0\0\0\0*", 12}, {"\0\0\0\0\3\361s\262\0\0\0\5\0\0\0\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 108}], 2 <unfinished ...>
21103 11:09:33.740254 <... lgetxattr resumed> , "LR\xdc\xdd\x00\xa4\x91\x80", 8) = 8 <0.000047>
21089 11:09:33.740273 <... writev resumed> ) = 120 <0.000042>
21103 11:09:33.740292 lgetxattr("/gluster-storage/macm01//videos08/content/galleries/20634/deleted/ur3efhj5fnvs5g8n4ln1bkdms2_14_1232038384", "glusterfs.content" <unfinished ...>
21089 11:09:33.740318 readv(19, [{":O\0\0\0\0\204\0\0\0\0*", 12}], 1) = 12 <0.000008>
21103 11:09:33.740351 <... lgetxattr resumed> , 0x0, 0) = -1 EOPNOTSUPP (Operation not supported) <0.000036>
21089 11:09:33.740368 readv(19, <unfinished ...>
21103 11:09:33.740384 lgetxattr("/gluster-storage/macm01//videos08/content/galleries/20634/deleted/ur3efhj5fnvs5g8n4ln1bkdms2_14_1232038384", "trusted.afr.distribute1" <unfinished ...>
21090 11:09:35.502257 <... nanosleep resumed> NULL) = 0 <1.762285>
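
My guess from this trace: thread 21089 reads a 12-byte record header 
from the socket (fd 16) at 11:09:28.58 and then sits in the second readv 
until 11:09:33.74, so for those 5 seconds the server may simply be 
blocked waiting for the body of the message to arrive. If that is what 
happens, the pattern looks roughly like this (a sketch of header-then-body 
reading, not the actual gluster source; the 12 and 546 byte sizes are 
taken from the trace above):

/* If the 546-byte body is slow to arrive, the reader blocks in read()
 * exactly like the 5-second "<... readv resumed>" on fd 16 above. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Read exactly len bytes; blocks for as long as the peer sends nothing. */
static int read_full(int fd, void *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = read(fd, (char *)buf + done, len - done);
        if (n <= 0)
            return -1;                    /* error or EOF */
        done += n;
    }
    return 0;
}

int main(void)
{
    int p[2];
    char hdr[12], body[546];              /* sizes taken from the trace */

    pipe(p);
    if (fork() == 0) {                    /* "client": header now, body later */
        memset(body, 0, sizeof(body));
        write(p[1], "0123456789ab", sizeof(hdr));
        sleep(5);                         /* simulate the 5-second stall */
        write(p[1], body, sizeof(body));
        _exit(0);
    }
    read_full(p[0], hdr, sizeof(hdr));    /* returns immediately */
    read_full(p[0], body, sizeof(body));  /* blocks ~5s, like server 2 */
    puts("body arrived");
    return 0;
}

If that reading is right, the question becomes why the rest of the 
message took 5 seconds to arrive, though the 1-second nanosleep that 
took 5.2 seconds could also mean the whole process was stalled.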




Later, at 11:09:44, when another request comes in for the same file, the 
servers look up the file (lstat and lgetxattr) and everything works 
smoothly.
But on the first attempt, the servers behave very differently!


And here is what the client says:

37512 11:09:31.027420 open("/SAN/contentmanager/content_videos02/6839/preview/490ab55407a36.jpg", O_RDONLY|O_NONBLOCK) = 78 <5.290506>
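
To catch these slow opens without strace, I use a small probe like this 
against the mount (my own helper, nothing gluster-specific; the path and 
the O_RDONLY|O_NONBLOCK flags are copied from the webserver's trace):

/* Time a single open()+close() on the FUSE mount. */
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] :
        "/SAN/contentmanager/content_videos02/6839/preview/490ab55407a36.jpg";
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    int fd = open(path, O_RDONLY | O_NONBLOCK);  /* same flags as the webserver */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (fd < 0) { perror("open"); return 1; }
    close(fd);

    printf("open() took %.3f ms\n",
           (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6);
    return 0;
}

Running it twice in a row should show the difference described above: 
the first open takes seconds, the second returns in milliseconds.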



Thx




