[Gluster-users] geo-replication 3.6.7 - no trusted.gfid on some slave nodes - stale file handle

Dietmar Putz putz at 3qmedien.net
Tue Dec 22 10:47:13 UTC 2015


Hi Saravana,

thanks for your reply...
All gluster nodes are running Ubuntu 14.04 and use AppArmor. Even though
it runs without any profiles configured, I have unloaded the module to
rule out any influence.
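
For reference, roughly what I did to rule it out (Ubuntu 14.04; the
teardown action of the apparmor init script and aa-status are what I
relied on, please correct me if there is a better way):

aa-status                    # shows the currently loaded AppArmor profiles
service apparmor teardown    # unloads all profiles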

I have stopped and deleted geo-replication once more and ran
slave-upgrade.sh again, this time on gluster-wien-07; geo-replication
has not been started again yet.
The result is the same as before, and the problem is more widespread
than I first realized...
I have checked all 567 directories in the root of each brick for a
trusted.gfid.
Only on subvolume aut-wien-01-replicate-0 does every directory have a
trusted.gfid assigned; on subvolumes ~replicate-1 and ~replicate-2 only
186 and 206 of the 567 directories, respectively, have one.
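
For reference, this is roughly how I counted the directories without a
trusted.gfid on each brick (run as root in the brick root
/gluster-export; just a quick shell loop, adapt as needed):

# list (and count) top-level brick directories without a trusted.gfid
cd /gluster-export
for d in */ ; do
  getfattr -n trusted.gfid -e hex "$d" 2>/dev/null | grep -q trusted.gfid \
    || echo "no trusted.gfid: $d"
done | wc -l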

For example, take the directory /gluster-export/1050, which already
showed up in the geo-replication logs before...
The screenlog of slave-upgrade.sh shows a 'failed' for the setxattr on
1050, although this folder exists and contains data / folders on every
subvolume.

[ 09:50:43 ] - root@gluster-wien-07 /usr/share/glusterfs/scripts $grep 1050 screenlog.0 | head -3
setxattr on ./1050="d4815ee4-3348-4105-9136-d0219d956ed8" failed (No such file or directory)
setxattr on 1050/recordings="6056c887-99bc-4fcc-bf39-8ea2478bb780" failed (No such file or directory)
setxattr on 1050/recordings/REC_22_3619210_63112.mp4="63d127a3-a387-4cb6-bb4b-792dc422ebbf" failed (No such file or directory)
[ 09:50:53 ] - root@gluster-wien-07 /usr/share/glusterfs/scripts $

[ 10:11:01 ] - root@gluster-wien-07 /gluster-export $getfattr -m . -d -e hex 1050
# file: 1050
trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x5638bfb5000379c0
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9

[ 10:11:10 ] - root@gluster-wien-07 /gluster-export $ls -li | grep 1050
  17179869881 drwxr-xr-x 72  1009 admin   4096 Dec  2 21:34 1050
[ 10:11:21 ] - root@gluster-wien-07 /gluster-export $du -hs 1050
877G    1050
[ 10:11:29 ] - root@gluster-wien-07 /gluster-export $

As far as I understand, folder 1050 and the many other affected folders
should have a unique trusted.gfid assigned, just like on all master
nodes and on subvolume aut-wien-01-replicate-0.
Does it make sense to start geo-replication again, or does this issue
need to be fixed before another attempt...?
...and if so, does anybody know how to fix the missing trusted.gfid?
Just rerunning slave-upgrade.sh did not help.
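
The only idea I have so far would be to set the missing xattr by hand on
both bricks of the affected replica pairs, with the gfid taken from
/tmp/master_gfid_file.txt, e.g. for folder 1050:

# untested idea -- set the gfid directly on the brick directory,
# on both bricks of the replica pair:
setfattr -n trusted.gfid -v 0xd4815ee4334841059136d0219d956ed8 /gluster-export/1050

But this is untested and I don't know whether it is safe or supported
(I suspect the matching entry under the brick's .glusterfs directory
would have to exist as well), so I have not tried it.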

Any help is appreciated.

best regards
dietmar



For reference, the subvolume layout of the slave volume (taken from the
volfile):

volume aut-wien-01-client-0 remote-host gluster-wien-02-int
volume aut-wien-01-client-1 remote-host gluster-wien-03-int
volume aut-wien-01-client-2 remote-host gluster-wien-04-int
volume aut-wien-01-client-3 remote-host gluster-wien-05-int
volume aut-wien-01-client-4 remote-host gluster-wien-06-int
volume aut-wien-01-client-5 remote-host gluster-wien-07-int
volume aut-wien-01-replicate-0  subvolumes aut-wien-01-client-0 aut-wien-01-client-1
volume aut-wien-01-replicate-1  subvolumes aut-wien-01-client-2 aut-wien-01-client-3
volume aut-wien-01-replicate-2  subvolumes aut-wien-01-client-4 aut-wien-01-client-5
volume glustershd
     type debug/io-stats
     subvolumes aut-wien-01-replicate-0 aut-wien-01-replicate-1 aut-wien-01-replicate-2
end-volume


On 21.12.2015 at 08:08, Saravanakumar Arumugam wrote:
> Hi,
> Replies inline..
>
> Thanks,
> Saravana
>
> On 12/18/2015 10:02 PM, Dietmar Putz wrote:
>> Hello again...
>>
>> After having some big trouble with an XFS issue in kernels 3.13.0-x
>> and 3.19.0-39, which we 'solved' by downgrading to 3.8.4
>> (http://comments.gmane.org/gmane.comp.file-systems.xfs.general/71629),
>> we decided to start a new geo-replication attempt from scratch...
>> We have deleted the former geo-replication session and created a new
>> one as described in:
>> http://www.gluster.org/community/documentation/index.php/Upgrade_to_3.6
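>> (Roughly the commands used for that, for reference -- master volume
>> ger-ber-01 and slave gluster-wien-02::aut-wien-01 as in our setup; the
>> create/start steps follow the upgrade document, please double-check
>> against it:)
>>
>> gluster volume geo-replication ger-ber-01 gluster-wien-02::aut-wien-01 stop
>> gluster volume geo-replication ger-ber-01 gluster-wien-02::aut-wien-01 delete
>> gluster volume geo-replication ger-ber-01 gluster-wien-02::aut-wien-01 create push-pem force
>> gluster volume geo-replication ger-ber-01 gluster-wien-02::aut-wien-01 start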
>>
>> Master and slave are distributed-replicated volumes running on
>> gluster 3.6.7 / Ubuntu 14.04.
>> The setup worked as described, but unfortunately geo-replication is
>> not syncing any files and remains in the status shown below.
>>
>> In the ~geo-replication-slaves/...gluster.log I can find messages
>> like the following on all slave nodes:
>>
>> [2015-12-16 15:06:46.837748] W [dht-layout.c:180:dht_layout_search] 
>> 0-aut-wien-01-dht: no subvolume for hash (value) = 1448787070
>> [2015-12-16 15:06:46.837789] W [fuse-bridge.c:1261:fuse_err_cbk] 
>> 0-glusterfs-fuse: 74203: SETXATTR() 
>> /.gfid/d4815ee4-3348-4105-9136-d0219d956ed8 => -1 (No such file or 
>> directory)
>> [2015-12-16 15:06:47.090212] I 
>> [dht-layout.c:663:dht_layout_normalize] 0-aut-wien-01-dht: Found 
>> anomalies in (null) (gfid = d4815ee4-3348-4105-9136-d0219d956ed8). 
>> Holes=1 overlaps=0
>>
>> [2015-12-16 20:25:55.327874] W [fuse-bridge.c:1967:fuse_create_cbk] 
>> 0-glusterfs-fuse: 199968: /.gfid/603de79d-8d41-44bd-845e-3727cf64a617 
>> => -1 (Operation not permitted)
>> [2015-12-16 20:25:55.617016] W [fuse-bridge.c:1967:fuse_create_cbk] 
>> 0-glusterfs-fuse: 199971: /.gfid/8622fb7d-8909-42de-adb5-c67ed6f006c0 
>> => -1 (Operation not permitted)
> Please check whether SELinux is enabled on both master and slave... I
> remember seeing such errors when SELinux is enabled.
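> A quick check on each node, assuming the usual SELinux userland tools
> are installed:
>
> getenforce    # prints Enforcing / Permissive / Disabled
> sestatus      # more detailed status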
>
>>
>> This is found only on gluster-wien-03-int, which is in 'Hybrid Crawl':
>> [2015-12-16 17:17:07.219939] W [fuse-bridge.c:1261:fuse_err_cbk] 
>> 0-glusterfs-fuse: 123841: SETXATTR() 
>> /.gfid/00000000-0000-0000-0000-000000000001 => -1 (File exists)
>> [2015-12-16 17:17:07.220658] W 
>> [client-rpc-fops.c:306:client3_3_mkdir_cbk] 0-aut-wien-01-client-3: 
>> remote operation failed: File exists. Path: /2301
>> [2015-12-16 17:17:07.220702] W 
>> [client-rpc-fops.c:306:client3_3_mkdir_cbk] 0-aut-wien-01-client-2: 
>> remote operation failed: File exists. Path: /2301
>>
> Some errors like "file exists" can be ignored.
>>
>> But first of all I would like to have a look at this message, which
>> appears about 6000 times on gluster-wien-05-int and ~07-int, the two
>> nodes that are in 'History Crawl':
>> [2015-12-16 13:03:25.658359] W [fuse-bridge.c:483:fuse_entry_cbk] 
>> 0-glusterfs-fuse: 119569: LOOKUP() 
>> /.gfid/d4815ee4-3348-4105-9136-d0219d956ed8/.dstXXXfDyaP9 => -1 
>> (Stale file handle)
>>
>> As the mapping 1050="d4815ee4-3348-4105-9136-d0219d956ed8" shows, the
>> gfid d4815ee4-3348-4105-9136-d0219d956ed8 belongs to the folder 1050
>> in the brick directory.
>>
>> Every brick in the master volume looks like this one...:
>> Host : gluster-ger-ber-12-int
>> # file: gluster-export/1050
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.afr.ger-ber-01-client-0=0x000000000000000000000000
>> trusted.afr.ger-ber-01-client-1=0x000000000000000000000000
>> trusted.afr.ger-ber-01-client-2=0x000000000000000000000000
>> trusted.afr.ger-ber-01-client-3=0x000000000000000000000000
>> trusted.gfid=0xd4815ee4334841059136d0219d956ed8
>> trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.1c31dc4d-7ee3-423b-8577-c7b0ce2e356a.stime=0x56606290000c7e4e
>> trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x567428e000042116
>> trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9
>>
>> On the slave volume only the bricks of wien-02 and wien-03 have the
>> same trusted.gfid:
>> Host : gluster-wien-03
>> # file: gluster-export/1050
>> trusted.afr.aut-wien-01-client-0=0x000000000000000000000000
>> trusted.afr.aut-wien-01-client-1=0x000000000000000000000000
>> trusted.gfid=0xd4815ee4334841059136d0219d956ed8
>> trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x5638bfb5000379c0
>> trusted.glusterfs.dht=0x00000001000000000000000055555554
>>
>> None of the nodes in 'History Crawl' have this trusted.gfid assigned:
>> Host : gluster-wien-05
>> # file: gluster-export/1050
>> trusted.afr.aut-wien-01-client-2=0x000000000000000000000000
>> trusted.afr.aut-wien-01-client-3=0x000000000000000000000000
>> trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x5638bfb5000379c0
>> trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff
>>
>> I'm not sure whether this is normal, or whether that trusted.gfid
>> should have been assigned on all slave nodes by the slave-upgrade.sh
>> script.
>
> As per the doc, it applies the gfids on all slave nodes.
>
>> 'bash slave-upgrade.sh localhost:<aut-wien-01> /tmp/master_gfid_file.txt
>> $PWD/gsync-sync-gfid' was run on wien-02, which has passwordless login
>> to every other slave node.
>> As I could see in the process list, slave-upgrade.sh was running on
>> each slave node and, as far as I can remember, starts with a
>> 'rm -rf ~/.glusterfs/...'.
>> So the mentioned gfid should have been removed by slave-upgrade.sh,
>> but should the trusted.gfid also be re-assigned by the script?
>> ...I'm confused:
>> Is the 'Stale file handle' message caused by the missing trusted.gfid
>> for /gluster-export/1050/ on the nodes where the message appears?
>> And does it make sense to stop geo-replication and start the
>> slave-upgrade.sh script on the affected nodes, without having access
>> to the other nodes, in order to fix this?
>>
>> Currently I'm not sure whether the 'stale file handle' messages are
>> what prevents us from getting geo-replication running, but I guess the
>> best way is to try to get it running step by step...
>> Any help is appreciated.
>>
>> best regards
>> dietmar
>>
>>
>>
>> [ 14:45:42 ] - root@gluster-ger-ber-07 /var/log/glusterfs/geo-replication/ger-ber-01 $gluster volume geo-replication ger-ber-01 gluster-wien-02::aut-wien-01 status detail
>>
>> MASTER NODE           MASTER VOL    MASTER BRICK       SLAVE                               STATUS     CHECKPOINT STATUS    CRAWL STATUS     FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> gluster-ger-ber-07    ger-ber-01    /gluster-export    gluster-wien-07-int::aut-wien-01    Active     N/A                  History Crawl    -6500          0                0                5                  6500
>> gluster-ger-ber-12    ger-ber-01    /gluster-export    gluster-wien-06-int::aut-wien-01    Passive    N/A                  N/A              0              0                0                0                  0
>> gluster-ger-ber-11    ger-ber-01    /gluster-export    gluster-wien-03-int::aut-wien-01    Active     N/A                  Hybrid Crawl     0              8191             0                0                  0
>> gluster-ger-ber-09    ger-ber-01    /gluster-export    gluster-wien-05-int::aut-wien-01    Active     N/A                  History Crawl    -5792          0                0                0                  5793
>> gluster-ger-ber-10    ger-ber-01    /gluster-export    gluster-wien-02-int::aut-wien-01    Passive    N/A                  N/A              0              0                0                0                  0
>> gluster-ger-ber-08    ger-ber-01    /gluster-export    gluster-wien-04-int::aut-wien-01    Passive    N/A                  N/A              0              0                0                0                  0
>> [ 14:45:46 ] - root@gluster-ger-ber-07 /var/log/glusterfs/geo-replication/ger-ber-01 $
>
>>
>


