[Gluster-users] Many logs (errors?) on client → memory problem

Yannick Perret yannick.perret at liris.cnrs.fr
Fri Jun 10 11:39:56 UTC 2016


On 10/06/2016 13:08, Mohammed Rafi K C wrote:
>
>
> On 06/10/2016 02:41 PM, Yannick Perret wrote:
>> I got no feedback on that, but I think I found the problem:
>> the glusterfs client's memory grows until no memory is available and
>> then it crashes.
>
> If you can take a statedump (kill -SIGUSR1 $client_pid) and send it
> across, I can take a look to see where it consumes so much memory.
> Since you said it is not reproducible on the latest Debian system, and
> if it is not important, that is fine with me.
>
Thanks for your feedback.
As this is clearly a transitional installation for me (moving local disks
from our last old machine to network storage before a big
migration/upgrade), and as it works fine (and faster, by the way) on
up-to-date machines, I don't need to investigate further: mounting the
gluster volume over NFS is fine − even if using gluster directly would
have been better − and will do the job until this old machine dies.
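
For the record, in case someone hits the same memory issue later and wants
to follow Rafi's statedump suggestion, my understanding (a sketch, not
something I tested on this machine) is that it goes roughly like this, with
the dump directory depending on how glusterfs was built (/var/run/gluster
being the usual default):

  # find the PID of the glusterfs client process for this mountpoint
  CLIENT_PID=$(pgrep -f 'glusterfs.*futur-home')
  # ask the client to write a statedump
  kill -SIGUSR1 "$CLIENT_PID"
  # the dump should appear as a glusterdump.<pid>.* file
  ls -l /var/run/gluster/

As for the NFS fallback: it is just the built-in gluster NFS server (NFSv3
over TCP), so on the old machine the mount should be something like:

  # NFSv3 mount of the gluster volume, served by the gluster NFS server
  mount -t nfs -o vers=3,proto=tcp sto1.mydomain:/HOME-LIRIS /futur-home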
>
>>
>> I performed the same operations on another machine without being
>> able to reproduce the problem.
>> The machine with the problem is an old one (Debian, 3.2.50 kernel,
>> 32-bit), whereas the other machine is an up-to-date 64-bit Debian.
>>
>> To give some stats: the glusterfs process on the client started at less
>> than 810220 KB of resident size and ended at 3055336 KB (3 GB!) when it
>> crashed again. The volume was mounted only on this machine, and used by
>> only one process (a 'cp -Rp').
>>
>> Running the same from a recent machine gives far more stable memory
>> usage (43364 KB of resident size, with only small increases).
>> Of course I'm using the same glusterfs version (compiled from sources
>> on both machines).
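
As a side note, those numbers are the resident set sizes in KB of the
glusterfs client process; watching them over time is as simple as something
like the loop below ($CLIENT_PID standing for the PID of the glusterfs
process of the mountpoint):

  # print the client's resident size (in KB) once per minute
  while sleep 60; do ps -o rss= -p "$CLIENT_PID"; done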
>>
>> As I can't upgrade this old machine due to version compatibility with
>> old software − at least until we replace that old software − I will
>> therefore use an NFS mountpoint from the gluster servers.
>>
>> In any case, I still get very verbose logs on the recent machine for
>> each directory creation:
>> [2016-06-10 08:35:12.965438] I 
>> [dht-selfheal.c:1065:dht_selfheal_layout_new_directory] 
>> 0-HOME-LIRIS-dht: chunk size = 0xffffffff / 2064114 = 0x820
>> [2016-06-10 08:35:12.965473] I 
>> [dht-selfheal.c:1103:dht_selfheal_layout_new_directory] 
>> 0-HOME-LIRIS-dht: assigning range size 0xffe76e40 to 
>> HOME-LIRIS-replicate-0
>> [2016-06-10 08:35:12.966987] I [MSGID: 109036] 
>> [dht-common.c:6296:dht_log_new_layout_for_dir_selfheal] 
>> 0-HOME-LIRIS-dht: Setting layout of /log_apache_error with 
>> [Subvol_name: HOME-LIRIS-replicate-0, Err: -1 , Start: 0 , Stop: 
>> 4294967295 ],
>
> This is an INFO level message about the layout of a directory. The
> gluster fuse client prints it when it sets the layout on a directory.
> These messages can be safely ignored.
>
>
>>
>> I switched clients to the WARNING log level (gluster volume set
>> HOME-LIRIS diagnostics.client-sys-log-level WARNING), which is fine
>> for me.
>> But maybe WARNING should be the default log level, at least for
>> clients, no? In production, getting 3 lines per created directory is
>> useless, and anyone who wants to analyze a problem will switch to
>> INFO or DEBUG.
>
> I see many users panic about this log message. I agree, we have
> to do something about these log entries.
Yes, at first I wondered whether these were real "problems" (i.e. Err: -1),
in particular because you often read logs when you have problems :). Then I
saw it was "INFO" data.
But even with that in mind, writing 500+ bytes to the logs for each
directory creation is a bit too much in a production context (log size,
sorting the useful data out of all these entries…).
Many tools default to a "warning/error" mode, which could also be the case
for gluster, at least on the client side.
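
For what it's worth, the knobs I know of are listed below. I only used the
sys-log one above, so take the other two as my understanding of the options
rather than a verified recipe:

  # verbosity of the client mountpoint log file
  gluster volume set HOME-LIRIS diagnostics.client-log-level WARNING
  # verbosity of what the client sends to syslog (the one I used above)
  gluster volume set HOME-LIRIS diagnostics.client-sys-log-level WARNING
  # or per mount, at mount time
  mount -t glusterfs -o log-level=WARNING sto1.mydomain:/HOME-LIRIS /futur-home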

Thanks.

Regards,
--
Y.

>
>>
>> Regards,
>> --
>> Y.
>>
>>
>>
>> On 08/06/2016 17:35, Yannick Perret wrote:
>>> Hello,
>>>
>>> I have a replica 2 volume managed on 2 identical servers, using gluster
>>> version 3.6.7. Here is the volume info:
>>> Volume Name: HOME-LIRIS
>>> Type: Replicate
>>> Volume ID: 47b4b856-371b-4b8c-8baa-2b7c32d7bb23
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: sto1.mydomain:/glusterfs/home-liris/data
>>> Brick2: sto2.mydomain:/glusterfs/home-liris/data
>>>
>>> It is mounted on a (single) client with mount -t glusterfs 
>>> sto1.mydomain:/HOME-LIRIS /futur-home/
>>>
>>> I started to copy a directory (~550 GB, ~660 directories with many
>>> files) into it. The copy was done using 'cp -Rp'.
>>>
>>> It seems to work fine but I get *many* log entries in the 
>>> corresponding mountpoint logs:
>>> [2016-06-07 14:01:27.587300] I 
>>> [dht-selfheal.c:1065:dht_selfheal_layout_new_directory] 
>>> 0-HOME-LIRIS-dht: chunk size = 0xffffffff / 2064114 = 0x820
>>> [2016-06-07 14:01:27.587338] I 
>>> [dht-selfheal.c:1103:dht_selfheal_layout_new_directory] 
>>> 0-HOME-LIRIS-dht: assigning range size 0xffe76e40 to 
>>> HOME-LIRIS-replicate-0
>>> [2016-06-07 14:01:27.588436] I [MSGID: 109036] 
>>> [dht-common.c:6296:dht_log_new_layout_for_dir_selfheal] 
>>> 0-HOME-LIRIS-dht: Setting layout of /olfamine with [Subvol_name: 
>>> HOME-LIRIS-replicate-0, Err: -1 , Start: 0 , Stop: 4294967295 ],
>>>
>>> This is repeated for many files (124088 exactly). Is this normal? If
>>> yes, I'm using the default settings on the client and I find it a bit
>>> verbose. If not, can someone tell me what the problem is here?
>>>
>>> Moreover at the end of the log file I have:
>>> [2016-06-08 04:42:58.210617] A [MSGID: 0] 
>>> [mem-pool.c:110:__gf_calloc] : no memory available for size (14651) 
>>> [call stack follows]
>>> [2016-06-08 04:42:58.219060] A [MSGID: 0] 
>>> [mem-pool.c:134:__gf_malloc] : no memory available for size (21026) 
>>> [call stack follows]
>>> pending frames:
>>> frame : type(1) op(CREATE)
>>> frame : type(1) op(CREATE)
>>> frame : type(1) op(LOOKUP)
>>> frame : type(0) op(0)
>>> patchset: git://git.gluster.com/glusterfs.git
>>> signal received: 11
>>> time of crash:
>>> 2016-06-08 04:42:58
>>> configuration details:
>>> argp 1
>>> backtrace 1
>>> dlfcn 1
>>> libpthread 1
>>> llistxattr 1
>>> setfsid 1
>>> spinlock 1
>>> epoll.h 1
>>> xattr.h 1
>>> st_atim.tv_nsec 1
>>> package-string: glusterfs 3.6.7
>>>
>>> Which clearly doesn't seem right.
>>> Not all the data were copied (the copy logs show, logically enough, a
>>> list of "transport endpoint is not connected" errors (or similar − the
>>> messages were translated into my language)).
>>>
>>> I re-mounted the volume and created a directory with 'mkdir TOTO',
>>> and got a similar set of messages:
>>> [2016-06-08 15:32:23.692936] I 
>>> [dht-selfheal.c:1065:dht_selfheal_layout_new_directory] 
>>> 0-HOME-LIRIS-dht: chunk size = 0xffffffff / 2064114 = 0x820
>>> [2016-06-08 15:32:23.692982] I 
>>> [dht-selfheal.c:1103:dht_selfheal_layout_new_directory] 
>>> 0-HOME-LIRIS-dht: assigning range size 0xffe76e40 to 
>>> HOME-LIRIS-replicate-0
>>> [2016-06-08 15:32:23.694144] I [MSGID: 109036] 
>>> [dht-common.c:6296:dht_log_new_layout_for_dir_selfheal] 
>>> 0-HOME-LIRIS-dht: Setting layout of /TOTO with [Subvol_name: 
>>> HOME-LIRIS-replicate-0, Err: -1 , Start: 0 , Stop: 4294967295 ],
>>> but I don't get such messages for files.
>>>
>>> If it can help: the volume is ~2 TB and the content is far below that,
>>> and both bricks are ext4 (both the same size).
>>>
>>>
>>> Any help would be appreciated.
>>>
>>> Regards,
>>> -- 
>>> Y.
>>>
>>>
>>>
>>>
>>
>>
>>
>
