[Bugs] [Bug 1361517] New: Bricks didn't become online after reboot. [Disk Full ]

bugzilla at redhat.com
Fri Jul 29 09:20:20 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1361517

            Bug ID: 1361517
           Summary: Bricks didn't become online after reboot. [Disk Full ]
           Product: Red Hat Gluster Storage
           Version: 3.1
         Component: posix
          Keywords: Triaged
          Severity: high
          Assignee: pkarampu at redhat.com
          Reporter: pkarampu at redhat.com
        QA Contact: storage-qa-internal at redhat.com
                CC: aspandey at redhat.com, bugs at gluster.org,
                    ksandha at redhat.com, pkarampu at redhat.com,
                    ravishankar at redhat.com, rhs-bugs at redhat.com
        Depends On: 1333341



+++ This bug was initially created as a clone of Bug #1333341 +++

Description of problem:
Rebooted the node hosting brick2 and started renaming the files while brick1
was full. brick2 did not come online after the reboot, and errors were seen
in the brick log:
"Creation of unlink directory failed"

sosreport kept at
rhsqe-repo.lab.eng.blr.redhat.com:/var/www/html/sosreports/<bugid>

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a replica 3 volume and mount it on a client using FUSE.
2. Create files using:
for (( i=1; i <= 50; i++ ))
do
    dd if=/dev/zero of=file$i count=1000 bs=5M status=progress
done
3. After the creation is done, reboot the node hosting the second brick.
4. Start renaming the files from file$i to test$i (see the rename sketch
after these steps).
5. When the node comes back up, the second brick fails to start with the
errors shown in the log excerpt below.
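The rename pass in step 4 can be scripted the same way as the create loop in
step 2; a minimal sketch (assuming the FUSE mount point is the current
working directory, and the target names test$i are arbitrary):

for (( i=1; i <= 50; i++ ))
do
    # rename each file created in step 2
    mv file$i test$i
done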

[2016-05-05 14:37:45.826772] E [MSGID: 113096] [posix.c:6443:posix_create_unlink_dir] 0-arbiter-posix: Creating directory /rhs/brick1/arbiter/.glusterfs/unlink failed [No space left on device]
[2016-05-05 14:37:45.826856] E [MSGID: 113096] [posix.c:6866:init] 0-arbiter-posix: Creation of unlink directory failed
[2016-05-05 14:37:45.826880] E [MSGID: 101019] [xlator.c:433:xlator_init] 0-arbiter-posix: Initialization of volume 'arbiter-posix' failed, review your volfile again
[2016-05-05 14:37:45.826925] E [graph.c:322:glusterfs_graph_init] 0-arbiter-posix: initializing translator failed
[2016-05-05 14:37:45.826943] E [graph.c:661:glusterfs_graph_activate] 0-graph: init failed
[2016-05-05 14:37:45.828349] W [glusterfsd.c:1251:cleanup_and_exit] (-->/usr/sbin/glusterfsd(mgmt_getspec_cbk+0x331) [0x7f6ba63797d1] -->/usr/sbin/glusterfsd(glusterfs_process_volfp+0x120) [0x7f6ba6374150] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x69) [0x7f6ba6373739] ) 0-: received signum (0), shutting down
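
The trace shows the failure is at posix translator init: creating the
.glusterfs/unlink directory on the full brick returns ENOSPC, xlator_init
aborts, and the brick process shuts down. The condition can be confirmed on
the affected node with standard tools (paths taken from the log above):

# check free space and free inodes on the brick filesystem
df -h /rhs/brick1
df -i /rhs/brick1
# the unlink directory the posix translator failed to create
ls -ld /rhs/brick1/arbiter/.glusterfs/unlink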



Actual results:
[root@dhcp43-167 arbiter]# gluster volume status
Status of volume: arbiter
Gluster process                                                 TCP Port  RDMA Port  Online  Pid
--------------------------------------------------------------------------------------------------
Brick dhcp42-58.lab.eng.blr.redhat.com:/rhs/brick1/arbiter      N/A       N/A        N       N/A
Brick dhcp43-142.lab.eng.blr.redhat.com:/rhs/brick1/arbiter     49157     0          Y       2120
Brick dhcp43-167.lab.eng.blr.redhat.com:/rhs/brick1/arbiter     49156     0          Y       2094
NFS Server on localhost                                         2049      0          Y       2679
Self-heal Daemon on localhost                                   N/A       N/A        Y       3172
NFS Server on dhcp42-58.lab.eng.blr.redhat.com                  2049      0          Y       2195
Self-heal Daemon on dhcp42-58.lab.eng.blr.redhat.com            N/A       N/A        Y       2816
NFS Server on dhcp43-142.lab.eng.blr.redhat.com                 2049      0          Y       3072
Self-heal Daemon on dhcp43-142.lab.eng.blr.redhat.com           N/A       N/A        Y       3160

Task Status of Volume arbiter
--------------------------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: arbiternfs
Gluster process                                                 TCP Port  RDMA Port  Online  Pid
--------------------------------------------------------------------------------------------------
Brick dhcp42-58.lab.eng.blr.redhat.com:/rhs/brick2/arbiternfs   N/A       N/A        N       N/A
Brick dhcp43-142.lab.eng.blr.redhat.com:/rhs/brick2/arbiternfs  49158     0          Y       2128
Brick dhcp43-167.lab.eng.blr.redhat.com:/rhs/brick2/arbiternfs  49157     0          Y       2109
NFS Server on localhost                                         2049      0          Y       2679
Self-heal Daemon on localhost                                   N/A       N/A        Y       3172
NFS Server on dhcp42-58.lab.eng.blr.redhat.com                  2049      0          Y       2195
Self-heal Daemon on dhcp42-58.lab.eng.blr.redhat.com            N/A       N/A        Y       2816
NFS Server on dhcp43-142.lab.eng.blr.redhat.com                 2049      0          Y       3072
Self-heal Daemon on dhcp43-142.lab.eng.blr.redhat.com           N/A       N/A        Y       3160

Task Status of Volume arbiternfs
--------------------------------------------------------------------------------------------------
There are no active volume tasks

*************************************************************
[root@dhcp43-142 arbiter]# gluster volume heal arbiternfs info
Brick dhcp42-58.lab.eng.blr.redhat.com:/rhs/brick2/arbiternfs
Status: Transport endpoint is not connected
Number of entries: -

Brick dhcp43-142.lab.eng.blr.redhat.com:/rhs/brick2/arbiternfs
/file4 
/file5 
Status: Connected
Number of entries: 2

Brick dhcp43-167.lab.eng.blr.redhat.com:/rhs/brick2/arbiternfs
/file4 
/file5 
Status: Connected
Number of entries: 2

[root@dhcp43-142 arbiter]# gluster volume heal arbiter info
Brick dhcp42-58.lab.eng.blr.redhat.com:/rhs/brick1/arbiter
Status: Transport endpoint is not connected
Number of entries: -

Brick dhcp43-142.lab.eng.blr.redhat.com:/rhs/brick1/arbiter
/ - Possibly undergoing heal

Status: Connected
Number of entries: 1

Brick dhcp43-167.lab.eng.blr.redhat.com:/rhs/brick1/arbiter
/ 
Status: Connected
Number of entries: 1

[root@dhcp43-142 arbiter]#

Expected results:
The brick should come back online after the reboot, and the files should be
renamed successfully.

Additional info:
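One possible recovery path (a sketch, assuming the ENOSPC failure above is
the only thing blocking brick init): free space on the brick filesystem,
then force-start the volume so that only the offline brick processes are
restarted.

# free space on the full brick filesystem first (e.g. remove scratch
# files through the mount point or grow the underlying LV), then:
gluster volume start arbiter force   # restarts only the bricks that are down
gluster volume status arbiter        # the brick should now show Online = Y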


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1333341
[Bug 1333341] Bricks didn't become online after reboot. [Disk Full ]