[Gluster-devel] cp taking 100% cpu and never terminating

Mon May 12 11:32:20 UTC 2008

Heh, yes, sorry. On the server side I'm seeing errors like:
2008-05-11 17:02:22 E [posix.c:1982:posix_setdents] system-ns: Error creating file /mnt/gluster/system-ns/scripts/drbl/drblupdateusr.sh with mode (0100755)
2008-05-11 17:02:22 E [posix.c:1982:posix_setdents] system-ns: Error creating file /mnt/gluster/system-ns/scripts/drbl/drblrebu.swp with mode (0100644)
2008-05-11 17:02:22 E [posix.c:1982:posix_setdents] system-ns: Error creating file /mnt/gluster/system-ns/scripts/drbl/getexefiles.sh with mode (0100755)
2008-05-11 17:39:33 E [posix.c:1990:posix_setdents] system-ns: error creating symlink /mnt/gluster/system-ns/usr/lib64/perl5/5.8.2/x86_64-linux-thread-multi/CORE/libperl.so
2008-05-11 17:39:44 E [posix.c:1990:posix_setdents] system-ns: error creating symlink /mnt/gluster/system-ns/usr/lib64/perl5/5.8.1/x86_64-linux-thread-multi/CORE/libperl.so
2008-05-11 18:48:32 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (192.168.1.204:1013)
2008-05-11 18:48:32 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (192.168.1.204:1015)
The times don't correspond to the errors on the client. This is from the 
storage brick "system1" mentioned in the client logs below.
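
For anyone who wants to line the two logs up by time, here is a minimal
Python sketch. It is not part of GlusterFS. It assumes the entry header
format seen in this thread (timestamp, one-letter level, bracketed
file:line:function) and treats any line without a timestamp as a
continuation of the previous entry, since pasted logs are often wrapped.
The names parse_log and nearby, and the 30-second window, are hypothetical
choices made for illustration.

```python
import re
from datetime import datetime, timedelta

# Header of a log entry as it appears in this thread, e.g.
# "2008-05-11 17:02:22 E [posix.c:1982:posix_setdents] system-ns: Error ..."
LOG_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ([A-Z]) \[([^\]]+)\]\s*(.*)$"
)


def parse_log(text):
    """Return a list of (timestamp, level, source, message) tuples.

    Lines that do not start with a timestamp are appended to the message
    of the previous entry (assumption: they are wrapped continuations).
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        m = LOG_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            entries.append([ts, m.group(2), m.group(3), m.group(4)])
        elif entries:
            entries[-1][3] = (entries[-1][3] + " " + line).strip()
    return [tuple(e) for e in entries]


def nearby(client, server, window_seconds=30):
    """Pair each client entry with the server entries within the window."""
    window = timedelta(seconds=window_seconds)
    return [
        (c, [s for s in server if abs(s[0] - c[0]) <= window])
        for c in client
    ]


# Usage with short excerpts copied from this thread.
client_log = """\
2008-05-11 17:02:32 E [client-protocol.c:1238:client_flush]
system1: : returning EBADFD
"""
server_log = """\
2008-05-11 17:02:22 E [posix.c:1982:posix_setdents] system-ns: Error
creating file /mnt/gluster/system-ns/scripts/drbl/getexefiles.sh with
mode (0100755)
2008-05-11 17:39:33 E [posix.c:1990:posix_setdents] system-ns: error
creating symlink
/mnt/gluster/system-ns/usr/lib64/perl5/5.8.2/x86_64-linux-thread-multi/CORE/libperl.so
"""

for c, matches in nearby(parse_log(client_log), parse_log(server_log)):
    print(c[0], c[2], "->", len(matches), "server entries in window")
```

The window is arbitrary and the sketch does not account for clock skew
between client and server, so a pairing only suggests where to look.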

Thanks!
-Mickey Mazarick

Raghavendra G wrote:
> Hi Mickey,
> Is it possible to provide server side logs?
>
> regards,
>
> On Mon, May 12, 2008 at 1:43 AM, Mickey Mazarick 
> <mic at digitaltadpole.com <mailto:mic at digitaltadpole.com>> wrote:
>
>     Something odd is happening when I run a shell script with cp
>     commands in it. This happens infrequently but I have to reboot the
>     system to get my processor back. I'm never tarring or copying more
>     than 50 megs of data.
>
>     It either hangs on a command like:
>     cp --reply=yes /usr/src/linux-${kernver}/.config /tftpboot/node_root/boot/config-${kernver}
>     or
>     tar cf - etc | gzip > /tftpboot/node_root/drbl_ssi/template_etc.tgz
>
>     When I run top I see:
>      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>     1603 root      20   0 54160 1616  508 R  100  0.0  33:02.72 cp
>     (100% CPU time)
>
>     I'm unable to kill that process in any way, but I can kill the
>     shell script that spawned it. The cp process keeps running.
>
>     I see the following errors on the client:
>     2008-05-11 17:02:32 E [client-protocol.c:1238:client_flush] system1: : returning EBADFD
>     2008-05-11 17:02:32 E [afr.c:2623:afr_flush_cbk] afr1: (path=/scripts/gluster/afrheal.sh child=system1) op_ret=-1 op_errno=77
>     2008-05-11 17:02:32 W [client-protocol.c:1296:client_close] system1: no valid fd found, returning
>     2008-05-11 17:02:32 W [client-protocol.c:1296:client_close] system-ns1: no valid fd found, returning
>
>     My client and server specs are identical to:
>     http://www.gluster.org/docs/index.php/Simple_High_Availability_Storage_with_GlusterFS_1.3
>
>     This happens equally over ib-verbs and tcp transports.
>
> -- 
> Raghavendra G
>
> A centipede was happy quite, until a toad in fun,
> Said, "Pray, which leg comes after which?",
> This raised his doubts to such a pitch,
> He fell flat into the ditch,
> Not knowing how to run.
> -Anonymous 
