[Bugs] [Bug 1217927] New: [Backup]: Crash observed when multiple sessions were created for the same volume

bugzilla at redhat.com bugzilla at redhat.com
Sun May 3 04:55:52 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1217927

            Bug ID: 1217927
           Summary: [Backup]: Crash observed when multiple sessions were
                    created for the same volume
           Product: GlusterFS
           Version: 3.7.0
         Component: glusterfind
          Assignee: bugs at gluster.org
          Reporter: avishwan at redhat.com
        QA Contact: bugs at gluster.org
                CC: avishwan at redhat.com, bugs at gluster.org,
                    gluster-bugs at redhat.com, khiremat at redhat.com,
                    nsathyan at redhat.com, sanandpa at redhat.com
        Depends On: 1209843
            Blocks: 1186580 (qe_tracker_everglades)



+++ This bug was initially created as a clone of Bug #1209843 +++

Description of problem:

Had a distribute volume 'nash' across 4 bricks in 2 nodes (2bricks in each
node). Did a glusterfind create session and tried to create multiple sessions
for the same volume. A crash was observed while creating the 20th session. The
location /var/lib/glusterd/glusterfind/ shows the mention of all 20 sessions,
but the log of the 20th session does not seem to be complete. 

Version-Release number of selected component (if applicable):
Glusterfs 3.7 nightly glusterfs-3.7dev-0.910.git17827de.el6.x86_64


Steps to Reproduce:
1. Create a gluster volume
2. Try to create >20 sessions for the same volume using 'glusterfind create'
command

Actual results:

Step2 results in a crash. 

Expected results:

Step2 should either be successful in creating multiple sessions, or should fail
gracefully stating the maximum allowed sessions for a single volume

Additional info:


[root at dhcp43-140 ~]# glusterfind create --force sess*nash nash 
[root at dhcp43-140 ~]# 
[root at dhcp43-140 ~]# vi /var/lib/glusterd/glusterfind
glusterfind/                glusterfind.secret.pem     
glusterfind.secret.pem.pub  
[root at dhcp43-140 ~]# cat /var/lib/glusterd/glusterfind.secret.pem
-----BEGIN RSA PRIVATE KEY-----
MIIEoQIBAAKCAQEA2918osxpjW/GwdY2hdAbLBLqofdggFjAWWPsbaTfMDgqif6R
b6kUipg4htxz820i8oryHo7g8qe/MoO3pyFM+Av+iZKpGcA47SlmuBB5ho/X37aO
UDJ7JVlUUsGssgd0CrH3Hm++HQUlf4PoGGX997w4UeyI9oEpmVmnkgVkO0G0ayMF
zaCVLl8BlMOb3uzOoUIBHZIoMYKFFpuCi0k9I6C4DUVEyYo1LgwwIcopOUdwXX5p
AeRwFmeLdEtfq3Fomh/B/aTIt9IRdjPvBGZXmxCPFfHMnWINCslTYdd4CrUfB2Jz
WPGRlEvjzLAjPlcEQCGeTHDgDqFbOwFOJ0ELQQIBIwKCAQAlsPDLc3/PGnnYFhf5
rqWShuZk6JQzQmod899jQNXNwHxSK6PnQY5+KLk7sMNrjw1OJnKmJx8/iniMURgr
R4mJm6f6U6f1yS5UisEm3kC3+2bV5Mfwfa618g50ljrcsNIQdkee0VPKdekrzXg+
smYNNjWKYxAqQgchmlc9moY2CvIosV+9aJ0fjCnW/hz5Q3ltgHZa8YEHoZgHxnPQ
vCK+RAa8RyzpzeF1onZqQc/ZcjDv5VChZZ+KGlyF+Du5P2QErQRzxVj3au72hnP1
Bqo1Hwf2DUNTZ5L8JjK1Ft1NgM0htKhfchn4EoqjRQCorzEkb72/xg5b9DPK4yrE
Y7ozAoGBAO1IQ564nNMT4mviVcTNeaJqp1f7YyViI9mE46eqeg3oKyYkYiMFvHrc
i90atBCKnHyZxAlcX6mXaeCPaNDidyQ7vIZf9QAzip9SxZJn0rrQ9JioGY3AAN9k
2N9SOXKlOtoL8KEIAIVPc6vV905iSxdoPwgha1AB3yD65Tejfx6DAoGBAO01f2NE
IYLidaGho4NjaZKNLUwPrjzNhV9kXOH0hbnlk2qemUZv8hcjTKBF64bk5iuhq16i
s/kuuTnXZTJlsyeYcfb6W0GLrQAwCLgx7LvpkZ1SiuDML6mRvExEmKUhYEMMKVF5
gh9zFrno+6ZZY04uAW33AjxJsbbLREe00YPrAoGAFFamtdVPRUrY5K0ACY30e6K9
4vhKU6lTh6xOBw6cv10LA0TyeAfOU6yBBFK++hqCcRR+hHWhzLU09f2p5gTIYjEB
h9xW1CGsy9PlDIyOZ8jEgh0JgS227pOdj3wTjXvgeRb3XkKDs6e5dSD38MaYuN0M
t43V/4shwP+P/XRp+00CgYEA36d4G8PlFQFnmGUzxQX101Ho4VCc93/yx6fFShJv
dMKDrat6kt6TzKxA7uLWwQO0ckf5WTpR6vGKEfb5A6G3i7uepwlOuiSbzPoW2Y4o
WWcxhbQ5zKNC4bVLI1aejRDXGqURA60USYnE+G30lYd640/yuCNogfxXGhAbzpSL
B1kCgYBxWaHoaHQ/peupGu5+dd+wN00qj5VNE18sQWEkL1dIAwSD5tL0zrBY8Btw
sN9/GlDvHXtiT92Yo+j90IB0tScm7p+itPKDTnkUT52A9jzmf+e1utCTykTwI0cg
XYS7DzEsU3Pzz59UuImONPaW/CMd8zZZJbCD3zT6Po4PJ+jlJA==
-----END RSA PRIVATE KEY-----
[root at dhcp43-140 ~]# cat /var/lib/glusterd/glusterfind.secret.pem.pub 
ssh-rsa
AAAAB3NzaC1yc2EAAAABIwAAAQEA2918osxpjW/GwdY2hdAbLBLqofdggFjAWWPsbaTfMDgqif6Rb6kUipg4htxz820i8oryHo7g8qe/MoO3pyFM+Av+iZKpGcA47SlmuBB5ho/X37aOUDJ7JVlUUsGssgd0CrH3Hm++HQUlf4PoGGX997w4UeyI9oEpmVmnkgVkO0G0ayMFzaCVLl8BlMOb3uzOoUIBHZIoMYKFFpuCi0k9I6C4DUVEyYo1LgwwIcopOUdwXX5pAeRwFmeLdEtfq3Fomh/B/aTIt9IRdjPvBGZXmxCPFfHMnWINCslTYdd4CrUfB2JzWPGRlEvjzLAjPlcEQCGeTHDgDqFbOwFOJ0ELQQ==
root at dhcp43-140.lab.eng.blr.redhat.com
[root at dhcp43-140 ~]# 
[root at dhcp43-140 ~]# glusterfind pre sess*nash nash /tmp/outt
Generated output file /tmp/outt
[root at dhcp43-140 ~]# cat /tmp/outt
44_55
b*c
+dir/
file1
.trashcan/
[root at dhcp43-140 ~]# glusterfind post sess*nash nash
[root at dhcp43-140 ~]# cat /var/lib/glusterd/glusterfind/sess
sess/      sess*nash/ 
[root at dhcp43-140 ~]# cat /var/lib/glusterd/glusterfind/sess\*nash/nash/status 
1428506824[root at dhcp43-140 ~]# 
[root at dhcp43-140 ~]# 
[root at dhcp43-140 ~]# 
[root at dhcp43-140 ~]# 
[root at dhcp43-140 ~]# glusterfind create 
usage: glusterfind create [-h] [--debug] [--force] session volume
glusterfind create: error: too few arguments
[root at dhcp43-140 ~]# glusterfind create sess2 --debug nash 
[root at dhcp43-140 ~]# vi /var/lib/glusterd/
bitd/                       glusterfind.secret.pem      hooks/                 
    quotad/
geo-replication/            glusterfind.secret.pem.pub  nfs/                   
    scrub/
glusterd.info               glustershd/                 options                
    snaps/
glusterfind/                groups/                     peers/                 
    vols/
[root at dhcp43-140 ~]# vi /var/lib/glusterd/glusterfind
glusterfind/                glusterfind.secret.pem     
glusterfind.secret.pem.pub  
[root at dhcp43-140 ~]# vi /var/lib/glusterd/glusterfind/^C
[root at dhcp43-140 ~]# 
[root at dhcp43-140 ~]# 
[root at dhcp43-140 ~]# glusterfind list
SESSION                   VOLUME                    SESSION TIME             
---------------------------------------------------------------------------
sess                      ozone                     2015-03-31 21:32:49      
sess2                     nash                      2015-04-08 21:09:49      
sess*nash                 nash                      2015-04-08 20:57:04      
[root at dhcp43-140 ~]# for i in {1..100}; do
> glusterfind create sess^$i nash;
> done
^CTraceback (most recent call last):
  File "/usr/bin/glusterfind", line 17, in <module>
    main()
  File "/usr/libexec/glusterfs/glusterfind/main.py", line 480, in main
    globals()["mode_" + args.mode](session_dir, args)
  File "/usr/libexec/glusterfs/glusterfind/main.py", line 323, in mode_create
    ssh_setup()
  File "/usr/libexec/glusterfs/glusterfind/main.py", line 288, in ssh_setup
    execute(cmd, exit_msg="Failed to distribute ssh keys", logger=logger)
  File "/usr/libexec/glusterfs/glusterfind/utils.py", line 127, in execute
    (out, err) = p.communicate()
  File "/usr/lib64/python2.6/subprocess.py", line 732, in communicate
    stdout, stderr = self._communicate(input, endtime)
  File "/usr/lib64/python2.6/subprocess.py", line 1316, in _communicate
    stdout, stderr = self._communicate_with_poll(input, endtime)
  File "/usr/lib64/python2.6/subprocess.py", line 1388, in
_communicate_with_poll
    ready = poller.poll(self._remaining_time(endtime))
KeyboardInterrupt

^CTraceback (most recent call last):
  File "/usr/bin/glusterfind", line 17, in <module>
    main()
  File "/usr/libexec/glusterfs/glusterfind/main.py", line 480, in main
    globals()["mode_" + args.mode](session_dir, args)
  File "/usr/libexec/glusterfs/glusterfind/main.py", line 323, in mode_create
    ssh_setup()
  File "/usr/libexec/glusterfs/glusterfind/main.py", line 301, in ssh_setup
    logger=logger)
  File "/usr/libexec/glusterfs/glusterfind/utils.py", line 127, in execute
    (out, err) = p.communicate()
  File "/usr/lib64/python2.6/subprocess.py", line 732, in communicate
    stdout, stderr = self._communicate(input, endtime)
  File "/usr/lib64/python2.6/subprocess.py", line 1316, in _communicate
    stdout, stderr = self._communicate_with_poll(input, endtime)
  File "/usr/lib64/python2.6/subprocess.py", line 1388, in
_communicate_with_poll
    ready = poller.poll(self._remaining_time(endtime))
KeyboardInterrupt
^CTraceback (most recent call last):
  File "/usr/bin/glusterfind", line 17, in <module>
    main()
  File "/usr/libexec/glusterfs/glusterfind/main.py", line 480, in main
    globals()["mode_" + args.mode](session_dir, args)
  File "/usr/libexec/glusterfs/glusterfind/main.py", line 323, in mode_create
    ssh_setup()
  File "/usr/libexec/glusterfs/glusterfind/main.py", line 301, in ssh_setup
    logger=logger)
  File "/usr/libexec/glusterfs/glusterfind/utils.py", line 127, in execute
    (out, err) = p.communicate()
  File "/usr/lib64/python2.6/subprocess.py", line 732, in communicate
    stdout, stderr = self._communicate(input, endtime)
  File "/usr/lib64/python2.6/subprocess.py", line 1316, in _communicate
    stdout, stderr = self._communicate_with_poll(input, endtime)
  File "/usr/lib64/python2.6/subprocess.py", line 1388, in
_communicate_with_poll
    ready = poller.poll(self._remaining_time(endtime))
KeyboardInterrupt
^Z
[1]+  Stopped                 glusterfind create sess^$i nash
[root at dhcp43-140 ~]# glusterfind list
SESSION                   VOLUME                    SESSION TIME             
---------------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/bin/glusterfind", line 17, in <module>
    main()
  File "/usr/libexec/glusterfs/glusterfind/main.py", line 480, in main
    globals()["mode_" + args.mode](session_dir, args)
  File "/usr/libexec/glusterfs/glusterfind/main.py", line 454, in mode_list
    human_time(last_processed).ljust(25)))
  File "/usr/libexec/glusterfs/glusterfind/utils.py", line 54, in human_time
    return datetime.fromtimestamp(float(ts)).strftime("%Y-%m-%d %H:%M:%S")
TypeError: float() argument must be a string or a number
[root at dhcp43-140 ~]# vi /var/log/glusterfs/glusterfind/
cli.log    sess^1/    sess^11/   sess^13/   sess^15/   sess^17/   sess2/    
sess^3/    sess^5/    sess^7/    sess^9/    session2/
sess/      sess^10/   sess^12/   sess^14/   sess^16/   sess^18/   sess^2/   
sess^4/    sess^6/    sess^8/    session1/  sess*nash/
[root at dhcp43-140 ~]# vi /var/log/glusterfs/glusterfind/cli.log 
[root at dhcp43-140 ~]# vi /var/log/glusterfs/glusterfind/sess^18/nash/cli.log 
[root at dhcp43-140 ~]# vi /var/log/glusterfs/glusterfind/sess^12/nash/cli.log 
[root at dhcp43-140 ~]# vi /var/log/glusterfs/glusterfind/sess^14/nash/cli.log 
[root at dhcp43-140 ~]# vi /var/log/glusterfs/glusterfind/sess^17/nash/cli.log 
[root at dhcp43-140 ~]# vi /var/log/glusterfs/glusterfind/sess^18/nash/cli.log 
[root at dhcp43-140 ~]# gluster v i nash

Volume Name: nash
Type: Distribute
Volume ID: d5c4752d-a37f-44fb-8e22-11f5cc427bea
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: 10.70.43.140:/rhs/thinbrick1/nash/dd
Brick2: 10.70.43.140:/rhs/thinbrick2/nash/dd
Brick3: 10.70.42.75:/rhs/thinbrick1/nash/dd
Brick4: 10.70.42.75:/rhs/thinbrick2/nash/dd
Options Reconfigured:
changelog.changelog: on
storage.build-pgfid: on
[root at dhcp43-140 ~]# 
[root at dhcp43-140 ~]# rpm -qa | grep gluster
glusterfs-fuse-3.7dev-0.910.git17827de.el6.x86_64
glusterfs-devel-3.7dev-0.910.git17827de.el6.x86_64
glusterfs-rdma-3.7dev-0.910.git17827de.el6.x86_64
glusterfs-libs-3.7dev-0.910.git17827de.el6.x86_64
glusterfs-api-3.7dev-0.910.git17827de.el6.x86_64
glusterfs-3.7dev-0.910.git17827de.el6.x86_64
glusterfs-extra-xlators-3.7dev-0.910.git17827de.el6.x86_64
glusterfs-api-devel-3.7dev-0.910.git17827de.el6.x86_64
glusterfs-resource-agents-3.7dev-0.910.git17827de.el6.noarch
glusterfs-debuginfo-3.7dev-0.910.git17827de.el6.x86_64
glusterfs-cli-3.7dev-0.910.git17827de.el6.x86_64
glusterfs-server-3.7dev-0.910.git17827de.el6.x86_64
glusterfs-geo-replication-3.7dev-0.910.git17827de.el6.x86_64
[root at dhcp43-140 ~]#

--- Additional comment from Aravinda VK on 2015-04-08 06:43:05 EDT ---

I see KeyboardInterrupt during session creation, Looks like create in for loop
is interpreted. Due to this partial session is created. That is why
`glusterfind list` is failing.

--- Additional comment from Sweta Anandpara on 2015-04-08 07:38:41 EDT ---

I reran the command, and all the 100 sessions were created. But why is it that
'glusterfind list' lists only 3 (random) sessions and not all 100 of them?

[root at dhcp43-140 ~]# glusterfind list
SESSION                   VOLUME                    SESSION TIME             
---------------------------------------------------------------------------
sess^88                   nash                      2015-04-08 22:40:43      
sess^24                   nash                      2015-04-08 22:38:26      
sess^32                   nash                      2015-04-08 22:38:43      
Traceback (most recent call last):
  File "/usr/bin/glusterfind", line 17, in <module>
    main()
  File "/usr/libexec/glusterfs/glusterfind/main.py", line 480, in main
    globals()["mode_" + args.mode](session_dir, args)
  File "/usr/libexec/glusterfs/glusterfind/main.py", line 454, in mode_list
    human_time(last_processed).ljust(25)))
  File "/usr/libexec/glusterfs/glusterfind/utils.py", line 54, in human_time
    return datetime.fromtimestamp(float(ts)).strftime("%Y-%m-%d %H:%M:%S")
TypeError: float() argument must be a string or a number
[root at dhcp43-140 ~]# 

How do I get the output back to normal after a failed 'glusterfind list'?

--- Additional comment from Aravinda VK on 2015-04-08 07:41:44 EDT ---

(In reply to Sweta Anandpara from comment #2)
> I reran the command, and all the 100 sessions were created. But why is it
> that 'glusterfind list' lists only 3 (random) sessions and not all 100 of
> them?
> 
> [root at dhcp43-140 ~]# glusterfind list
> SESSION                   VOLUME                    SESSION TIME             
> ---------------------------------------------------------------------------
> sess^88                   nash                      2015-04-08 22:40:43      
> sess^24                   nash                      2015-04-08 22:38:26      
> sess^32                   nash                      2015-04-08 22:38:43      
> Traceback (most recent call last):
>   File "/usr/bin/glusterfind", line 17, in <module>
>     main()
>   File "/usr/libexec/glusterfs/glusterfind/main.py", line 480, in main
>     globals()["mode_" + args.mode](session_dir, args)
>   File "/usr/libexec/glusterfs/glusterfind/main.py", line 454, in mode_list
>     human_time(last_processed).ljust(25)))
>   File "/usr/libexec/glusterfs/glusterfind/utils.py", line 54, in human_time
>     return datetime.fromtimestamp(float(ts)).strftime("%Y-%m-%d %H:%M:%S")
> TypeError: float() argument must be a string or a number
> [root at dhcp43-140 ~]# 
> 
> How do I get the output back to normal after a failed 'glusterfind list'?

For some session session file doesn't exist or invalid value.

tree /var/lib/glusterd/glusterfind

to see any session directory without status file. I will fix the list command
not to fail for invalid value.

--- Additional comment from Anand Avati on 2015-04-08 08:15:11 EDT ---

REVIEW: http://review.gluster.org/10164 (tools/glusterfind: Do not fail to list
if Session is invalid) posted (#1) for review on master by Aravinda VK
(avishwan at redhat.com)

--- Additional comment from Aravinda VK on 2015-04-08 08:17:36 EDT ---

Sent patch to prevent crash in `glusterfind list`
http://review.gluster.org/10164

--- Additional comment from Nagaprasad Sathyanarayana on 2015-04-09 02:36:53
EDT ---

Not clear about this specific use case.  Was a session create followed by close
before opening the new session? If not, is glusterfind expected to have keep
multiple open sessions?

--- Additional comment from Aravinda VK on 2015-04-09 02:39:39 EDT ---

(In reply to Nagaprasad Sathyanarayana from comment #6)
> Not clear about this specific use case.  Was a session create followed by
> close before opening the new session? If not, is glusterfind expected to
> have keep multiple open sessions?

Sessions are per consumers. For example, Backup utility and Antivirus scanner.
Both consumer can be scheduled to run in different timings, Using same session
will not make sence.

--- Additional comment from Anand Avati on 2015-04-09 06:32:54 EDT ---

REVIEW: http://review.gluster.org/10164 (tools/glusterfind: Do not fail to list
if Session is invalid) posted (#2) for review on master by Aravinda VK
(avishwan at redhat.com)

--- Additional comment from Anand Avati on 2015-04-13 08:14:43 EDT ---

REVIEW: http://review.gluster.org/10164 (tools/glusterfind: Do not fail to list
if Session is invalid) posted (#3) for review on master by Aravinda VK
(avishwan at redhat.com)

--- Additional comment from Anand Avati on 2015-04-23 06:32:55 EDT ---

REVIEW: http://review.gluster.org/10164 (tools/glusterfind: Do not fail to list
if Session is invalid) posted (#4) for review on master by Aravinda VK
(avishwan at redhat.com)

--- Additional comment from Anand Avati on 2015-04-23 09:56:09 EDT ---

COMMIT: http://review.gluster.org/10164 committed in master by Vijay Bellur
(vbellur at redhat.com) 
------
commit 0fc310accb6bf86e42e09e0876b51b8860075fda
Author: Aravinda VK <avishwan at redhat.com>
Date:   Wed Apr 8 17:39:56 2015 +0530

    tools/glusterfind: Do not fail to list if Session is invalid

    If session is invalid, Show the session time as "Session Corrupted"

    BUG: 1209843
    Change-Id: I10f7fb79f6a3a2cabab3f21bce86f2a2f424e8b6
    Signed-off-by: Aravinda VK <avishwan at redhat.com>
    Reviewed-on: http://review.gluster.org/10164
    Reviewed-by: Kotresh HR <khiremat at redhat.com>
    Tested-by: NetBSD Build System


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1186580
[Bug 1186580] QE tracker bug for Everglades
https://bugzilla.redhat.com/show_bug.cgi?id=1209843
[Bug 1209843] [Backup]: Crash observed when multiple sessions were created
for the same volume
-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are on the CC list for the bug.
You are the assignee for the bug.


More information about the Bugs mailing list