[Gluster-users] files on gluster brick that have '---------T' designation.

Shishir Gowda sgowda at redhat.com
Tue Aug 21 04:40:52 UTC 2012


Hi Harry,

If your intention was only to mitigate the load, then please use the rebalance command instead.

You could stop the current remove-brick operation and then issue a rebalance.

The difference is:
Rebalance: redistributes the data across all nodes, and all nodes participate in the migration.
Remove-brick: only the node that hosts the brick migrates the data, and that brick holds no data at the end of the operation.
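For example (a rough sketch, using the volume and brick names that appear later in this thread):

$ gluster volume remove-brick gli pbs3ib:/bducgl stop    # stop the in-progress remove-brick
$ gluster volume rebalance gli start                     # redistribute data across all bricks
$ gluster volume rebalance gli status                    # check per-node progress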

With regards,
Shishir

----- Original Message -----
From: "Harry Mangalam" <hjmangalam at gmail.com>
To: "Shishir Gowda" <sgowda at redhat.com>
Cc: "gluster-users" <gluster-users at gluster.org>
Sent: Tuesday, August 21, 2012 10:01:07 AM
Subject: Re: [Gluster-users] files on gluster brick that have '---------T' designation.

All the bricks are 3.3, and all of them were brought up by starting glusterd on each server and then peer-probing, etc. 


The initial reason for starting this fix-layout/rebalance/remove-brick was a CPU overload on the pbs3 brick (load of >30 on a 4-CPU server) that was dramatically decreasing performance. I killed glusterd, restarted it, and checked with the 'gluster peer status' command that it had re-established its connections, which implied that all 4 peers were connected. I didn't find out until later, using the 'gluster volume status <VOLUME> detail' command, that this was incorrect. 
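For the record, the two checks were roughly the following (volume name filled in; 'detail' reports per-brick state rather than just peer membership):

$ gluster peer status                  # peer membership and connection state only
$ gluster volume status gli detail     # per-brick online status, disk usage, etc.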


So this peer has been hashed and thrashed somewhat (and miraculously is still serving files), but in the process it has gone out of balance with the other peers. 


It sounds like you're saying that this: 



Node        Rebalanced-files          size    scanned   failures       status
---------   ----------------  ------------    -------   --------  -----------
localhost                  0             0     137702      21406      stopped
pbs2ib                     0             0     168991       6921      stopped
pbs3ib                724683  594890945282    4402804          0  in progress
pbs4ib                     0             0     169081       7923      stopped


implies that the other peers are not participating in the remove-brick? 
The change in storage across the servers implies that they are participating, just very slowly. 


On the other hand, the last of the errors stopped 2 days ago (there are no more errors in the last 350 MB of the rebalance logs), which also implies that the rest of the files are being migrated, just very slowly. 


At any rate, if you've diagnosed the problem, what is the solution? A cluster-wide glusterd restart to sync the uuids? Or is there another way to re-identify them to each other? 


Best, 
Harry 






On Mon, Aug 20, 2012 at 9:06 PM, Shishir Gowda < sgowda at redhat.com > wrote: 


Hi Harry, 

Are all the bricks from 3.3? Or did you start any of the bricks manually (not through gluster volume commands)? Remove-brick/rebalance processes are started across all nodes (one per node) of the volume. 
We use the node UUID to distribute work across the nodes, so migration is handled by all the nodes to which the data belongs. In your case, errors are being reported that the node UUID is not available. 
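If it helps, you can cross-check each node's UUID with something like this (a rough sketch, assuming the default glusterd state directory):

$ gluster peer status                    # lists the UUIDs of the other peers as seen from this node
$ cat /var/lib/glusterd/glusterd.info    # the local node's own UUID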

With regards, 
Shishir 

----- Original Message ----- 
From: "Harry Mangalam" < hjmangalam at gmail.com > 
To: "Shishir Gowda" < sgowda at redhat.com > 
Cc: "gluster-users" < gluster-users at gluster.org > 
Sent: Tuesday, August 21, 2012 8:37:06 AM 
Subject: Re: [Gluster-users] files on gluster brick that have '---------T' designation. 


Hi Shishir, 


Here's the 'df -h' of the appropriate filesystem on all 4 of the gluster servers. 
It has equilibrated a bit since the original post - pbs3 has decreased from 73% and the others have increased from about 29%, but it is still slow. 


pbs1:/dev/sdb 6.4T 2.0T 4.4T 32% /bducgl 
pbs2:/dev/md0 8.2T 2.9T 5.4T 35% /bducgl 
pbs3:/dev/md127 8.2T 5.3T 3.0T 65% /bducgl 
pbs4:/dev/sda 6.4T 2.2T 4.3T 34% /bducgl 


The 'errors-only' extract of the log (since the remove-brick was started) is here: 
< http://moo.nac.uci.edu/~hjm/gluster/remove-brick_errors.log.gz > (2707 lines) 
and the last 100 lines of the active log (gli-rebalance.log) is here: 
< http://pastie.org/4559913 > 


Thanks for your help. 
Harry 

On Mon, Aug 20, 2012 at 7:42 PM, Shishir Gowda < sgowda at redhat.com > wrote: 


Hi Harry, 

That is correct, the files won't be seen on the client. 
Can you provide the output of these: 
1. df of all exports 
2. The remove-brick/rebalance log (<volname>-rebalance.log); if it is large, just the failure messages and the tail of the file. 
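Something like the following should gather both (a rough sketch, assuming the default log directory and the 'gli' volume name from this thread):

$ df -h /bducgl                                       # run on each server that exports a brick
$ grep ' E ' /var/log/glusterfs/gli-rebalance.log     # error-level messages only
$ tail -n 100 /var/log/glusterfs/gli-rebalance.log    # tail of the active rebalance log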

With regards, 
Shishir 

----- Original Message ----- 
From: "Harry Mangalam" < hjmangalam at gmail.com > 
To: "Shishir Gowda" < sgowda at redhat.com > 
Cc: "gluster-users" < gluster-users at gluster.org > 
Sent: Tuesday, August 21, 2012 8:00:42 AM 
Subject: Re: [Gluster-users] files on gluster brick that have '---------T' designation. 

Hi Shishir, 


Thanks for your attention. 



Hmm - your explanation makes some sense, but those 'T' files don't show up in the client view of the dir - only in the brick view. Is that valid? 


I'm using 3.3 on 4 Ubuntu 12.04 servers over DDR IPoIB, and the command to initiate the remove-brick was: 


$ gluster volume remove-brick gli pbs3ib:/bducgl start 


and the current status is: 



$ gluster volume remove-brick gli pbs3ib:/bducgl status 
Node        Rebalanced-files          size    scanned   failures       status
---------   ----------------  ------------    -------   --------  -----------
localhost                  0             0     137702      21406      stopped
pbs2ib                     0             0     168991       6921      stopped
pbs3ib                724683  594890945282    4402804          0  in progress
pbs4ib                     0             0     169081       7923      stopped


(the failures were the same as those seen when I tried the rebalance command previously). 


Best 
harry 


On Mon, Aug 20, 2012 at 7:09 PM, Shishir Gowda < sgowda at redhat.com > wrote: 


Hi Harry, 

These are valid files in volumes configured with the glusterfs-dht xlator. They are known as link files, which DHT uses to maintain an entry on the hashed subvolume when the actual data resides on a non-hashed subvolume (renames can lead to these). The cleanup of these files will be taken care of by running rebalance. 
Can you please provide the gluster version you are using, and the remove-brick command you used? 
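As a rough check, a link file can be recognized by its sticky-bit-only mode and a DHT 'linkto' xattr naming the subvolume that holds the real data; for example (a sketch, using a path from the listing below):

$ ls -l /bducgl/alamng/Research/Yuki/newF20/runF20_2513/data/backward_sm1003
$ getfattr -n trusted.glusterfs.dht.linkto -e text \
      /bducgl/alamng/Research/Yuki/newF20/runF20_2513/data/backward_sm1003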

With regards, 
Shishir 

----- Original Message ----- 
From: "Harry Mangalam" < hjmangalam at gmail.com > 
To: "gluster-users" < gluster-users at gluster.org > 
Sent: Tuesday, August 21, 2012 5:01:05 AM 
Subject: [Gluster-users] files on gluster brick that have '---------T' designation. 


I have a working but unbalanced gluster config where one brick has about 2X the usage of the 3 others. I started a remove-brick to force a resolution of this problem (thanks to JD for the help!), but it's going very slowly, about 2.2 MB/s over DDR IPoIB, or ~2.3 files/s. In investigating the problem, I may have found a partial explanation: I have found hundreds of thousands (maybe millions) of zero-length files on the problem brick that do not exist in the client view and that carry the designation '---------T' in 'ls -l'. 


ie: 



/bducgl/alamng/Research/Yuki/newF20/runF20_2513/data: 
total 0 
---------T 2 root root 0 2012-08-04 11:23 backward_sm1003 
---------T 2 root root 0 2012-08-04 11:23 backward_sm1007 
---------T 2 root root 0 2012-08-04 11:23 backward_sm1029 
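
If it is useful, a rough count of these on the brick can be had with something like the following (a sketch; -perm 1000 matches the sticky-bit-only mode shown above):

$ find /bducgl -type f -perm 1000 -size 0 | wc -l    # zero-length, sticky-bit-only files on this brick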


I suspect that these are responsible for the enormous expansion of the storage space on this brick and for the very slow speed of the 'remove-brick' operation. 


Does this sound possible? Can I delete these files directly on the brick to resolve the imbalance? If not, is there a better way to process them to correct the imbalance? 

-- 
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine 
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487 
415 South Circle View Dr, Irvine, CA, 92697 [shipping] 
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps) 


_______________________________________________ 
Gluster-users mailing list 
Gluster-users at gluster.org 
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users 




-- 
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine 
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487 
415 South Circle View Dr, Irvine, CA, 92697 [shipping] 
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps) 





-- 
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine 
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487 
415 South Circle View Dr, Irvine, CA, 92697 [shipping] 
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps) 





-- 
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine 
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487 
415 South Circle View Dr, Irvine, CA, 92697 [shipping] 
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps) 




More information about the Gluster-users mailing list