[Gluster-users] geo-replication 3.6.7 - no trusted.gfid on some slave nodes - stale file handle

Dietmar Putz putz at 3qmedien.net
Tue Dec 22 13:30:25 UTC 2015


one correction...
after running slave-upgrade.sh on gluster-wien-07, the folder 1050, for 
example, has a trusted.gfid assigned only on subvolume replicate-0, but 
contrary to what I stated in the last mail, it is a totally wrong gfid 
and does not appear in the master_gfid_file.txt.
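
For reference, a folder's on-brick gfid can be compared with the master 
list roughly like this (a minimal sketch; the hex-to-uuid conversion 
assumes GNU sed, and the paths are the ones from this setup):

# read trusted.gfid from the brick and strip the leading "0x"
gfid_hex=$(getfattr -h -n trusted.gfid -e hex /gluster-export/1050 2>/dev/null \
    | awk -F= '/^trusted.gfid/ {print substr($2,3)}')
# regroup the 32 hex digits as a dashed uuid (8-4-4-4-12)
gfid_uuid=$(echo "$gfid_hex" | sed -r 's/^(.{8})(.{4})(.{4})(.{4})(.{12})$/\1-\2-\3-\4-\5/')
# no grep output means the brick carries a gfid the master never recorded
grep "$gfid_uuid" /tmp/master_gfid_file.txt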


[ 13:05:07 ] - root at gluster-wien-02 /usr/share/glusterfs/scripts 
$getfattr -m . -d -e hex /gluster-export/1050
getfattr: Removing leading '/' from absolute path names
# file: gluster-export/1050
trusted.afr.aut-wien-01-client-0=0x000000000000000000000000
trusted.afr.aut-wien-01-client-1=0x000000000000000000000000
trusted.gfid=0x564d2217600b4e9c9ab5b34c53b1841c
trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x5638bfb5000379c0
trusted.glusterfs.dht=0x00000001000000000000000055555554

[ 13:11:31 ] - root at gluster-wien-02 /usr/share/glusterfs/scripts $
[ 12:59:40 ] - root at gluster-wien-07  /usr/share/glusterfs/scripts $grep 
564d2217-600b-4e9c-9ab5-b34c53b1841c /tmp/master_gfid_file.txt
[ 13:12:30 ] - root at gluster-wien-07  /usr/share/glusterfs/scripts $grep 
d4815ee4-3348-4105-9136-d0219d956ed8 /tmp/master_gfid_file.txt
d4815ee4-3348-4105-9136-d0219d956ed8 1050="d4815ee4-3348-4105-9136-d0219d956ed8"
[ 13:12:36 ] - root at gluster-wien-07  /usr/share/glusterfs/scripts $

this confuses me: slave-upgrade.sh removes everything in ~/.glusterfs, 
does a setfattr -x on everything in the brick directory, and then 
apparently assigns a random gfid?
When I ran slave-upgrade.sh on gluster-wien-02, the trusted.gfid was 
missing on four nodes, but at least on the remaining two nodes the gfid 
for 1050 was the same as on the master volume.
I'll try it again on wien-02..
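
Btw, counting the directories without a trusted.gfid can be done with a 
small loop like this on every node (just a sketch, assuming the brick 
root is /gluster-export everywhere):

for d in /gluster-export/*/ ; do
    # getfattr exits non-zero when the attribute is not present
    if ! getfattr -h -n trusted.gfid "$d" >/dev/null 2>&1 ; then
        echo "missing trusted.gfid: $d"
    fi
done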

best regards
dietmar



On 22.12.2015 at 11:47, Dietmar Putz wrote:
> Hi Saravana,
>
> thanks for your reply...
> all gluster nodes run ubuntu 14.04 with AppArmor. Even though it runs 
> without any configuration, I have unloaded the module to rule out any 
> influence.
>
> i have stopped and deleted geo-replication one more time and started 
> slave-upgrade.sh again, this time on gluster-wien-07; geo-replication 
> has not been started again yet.
> the result is the same as before, and more comprehensive than what I 
> first identified:
> i checked all directories in the root of each brick for a 
> trusted.gfid (567 directories).
> only on subvolume aut-wien-01-replicate-0 does every directory have a 
> trusted.gfid assigned.
> on subvolumes ~replicate-1 and ~replicate-2, only 186 resp. 206 of the 
> 567 directories have a trusted.gfid assigned.
>
> take for example the directory /gluster-export/1050, which showed up in 
> the geo-replication logs before...
> the screenlog of slave-upgrade.sh shows a 'failed' for the setxattr on 
> 1050, yet this folder exists and contains data / folders on every subvolume.
>
> [ 09:50:43 ] - root at gluster-wien-07 /usr/share/glusterfs/scripts $grep 
> 1050 screenlog.0 | head -3
> setxattr on ./1050="d4815ee4-3348-4105-9136-d0219d956ed8" failed (No 
> such file or directory)
> setxattr on 1050/recordings="6056c887-99bc-4fcc-bf39-8ea2478bb780" 
> failed (No such file or directory)
> setxattr on 
> 1050/recordings/REC_22_3619210_63112.mp4="63d127a3-a387-4cb6-bb4b-792dc422ebbf" 
> failed (No such file or directory)
> [ 09:50:53 ] - root at gluster-wien-07 /usr/share/glusterfs/scripts $
>
> [ 10:11:01 ] - root at gluster-wien-07  /gluster-export $getfattr -m . -d 
> -e hex 1050
> # file: 1050
> trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x5638bfb5000379c0
> trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9
>
> [ 10:11:10 ] - root at gluster-wien-07  /gluster-export $ls -li | grep 1050
>  17179869881 drwxr-xr-x 72  1009 admin   4096 Dec  2 21:34 1050
> [ 10:11:21 ] - root at gluster-wien-07  /gluster-export $du -hs 1050
> 877G    1050
> [ 10:11:29 ] - root at gluster-wien-07  /gluster-export $
>
> as far as i understood, folder 1050 and many other folders should have 
> a unique trusted.gfid assigned, just as on all master nodes resp. on 
> subvolume aut-wien-01-replicate-0.
> does it make sense to start geo-replication again, or does this issue 
> need to be fixed before starting another attempt...?
> ...and if yes, does anybody know how to fix the missing trusted.gfid? 
> just restarting slave-upgrade.sh did not help.
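>
> Since wien-02 has passwordless ssh to the other slave nodes, a loop 
> like the following quickly shows which gfid each node carries for one 
> folder (a sketch; hostnames as in the vol file below):
>
> for h in gluster-wien-0{2..7}-int ; do
>     echo -n "$h: "
>     ssh "$h" "getfattr -h -n trusted.gfid -e hex /gluster-export/1050 \
>         2>/dev/null | grep trusted.gfid= || echo missing"
> done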
>
> any help is appreciated.
>
> best regards
> dietmar
>
>
>
> volume aut-wien-01-client-0 remote-host gluster-wien-02-int
> volume aut-wien-01-client-1 remote-host gluster-wien-03-int
> volume aut-wien-01-client-2 remote-host gluster-wien-04-int
> volume aut-wien-01-client-3 remote-host gluster-wien-05-int
> volume aut-wien-01-client-4 remote-host gluster-wien-06-int
> volume aut-wien-01-client-5 remote-host gluster-wien-07-int
> volume aut-wien-01-replicate-0  subvolumes aut-wien-01-client-0 aut-wien-01-client-1
> volume aut-wien-01-replicate-1  subvolumes aut-wien-01-client-2 aut-wien-01-client-3
> volume aut-wien-01-replicate-2  subvolumes aut-wien-01-client-4 aut-wien-01-client-5
> volume glustershd
>     type debug/io-stats
>     subvolumes aut-wien-01-replicate-0 aut-wien-01-replicate-1 aut-wien-01-replicate-2
> end-volume
>
>
> On 21.12.2015 at 08:08, Saravanakumar Arumugam wrote:
>> Hi,
>> Replies inline..
>>
>> Thanks,
>> Saravana
>>
>> On 12/18/2015 10:02 PM, Dietmar Putz wrote:
>>> Hello again...
>>>
>>> after having some big trouble with an xfs issue in kernels 3.13.0-x 
>>> and 3.19.0-39, which has been 'solved' by downgrading to 3.8.4 
>>> (http://comments.gmane.org/gmane.comp.file-systems.xfs.general/71629),
>>> we decided to start a new geo-replication attempt from scratch...
>>> we have deleted the former geo-replication session and started a new 
>>> one as described in:
>>> http://www.gluster.org/community/documentation/index.php/Upgrade_to_3.6
>>>
>>> master and slave are distributed replicated volumes running gluster 
>>> 3.6.7 on ubuntu 14.04.
>>> the setup worked as described, but unfortunately geo-replication isn't 
>>> syncing files and remains in the status shown below.
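>>>
>>> For reference, the delete/create/start cycle was roughly the 
>>> following, as in the upgrade doc (sketched from memory, so the 
>>> options may differ in detail):
>>>
>>> gluster volume geo-replication ger-ber-01 gluster-wien-02::aut-wien-01 delete
>>> gluster volume geo-replication ger-ber-01 gluster-wien-02::aut-wien-01 create push-pem force
>>> gluster volume geo-replication ger-ber-01 gluster-wien-02::aut-wien-01 start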
>>>
>>> in the ~geo-replication-slaves/...gluster.log I can find messages 
>>> like the following on all slave nodes:
>>>
>>> [2015-12-16 15:06:46.837748] W [dht-layout.c:180:dht_layout_search] 
>>> 0-aut-wien-01-dht: no subvolume for hash (value) = 1448787070
>>> [2015-12-16 15:06:46.837789] W [fuse-bridge.c:1261:fuse_err_cbk] 
>>> 0-glusterfs-fuse: 74203: SETXATTR() 
>>> /.gfid/d4815ee4-3348-4105-9136-d0219d956ed8 => -1 (No such file or 
>>> directory)
>>> [2015-12-16 15:06:47.090212] I 
>>> [dht-layout.c:663:dht_layout_normalize] 0-aut-wien-01-dht: Found 
>>> anomalies in (null) (gfid = d4815ee4-3348-4105-9136-d0219d956ed8). 
>>> Holes=1 overlaps=0
>>>
>>> [2015-12-16 20:25:55.327874] W [fuse-bridge.c:1967:fuse_create_cbk] 
>>> 0-glusterfs-fuse: 199968: 
>>> /.gfid/603de79d-8d41-44bd-845e-3727cf64a617 => -1 (Operation not 
>>> permitted)
>>> [2015-12-16 20:25:55.617016] W [fuse-bridge.c:1967:fuse_create_cbk] 
>>> 0-glusterfs-fuse: 199971: 
>>> /.gfid/8622fb7d-8909-42de-adb5-c67ed6f006c0 => -1 (Operation not 
>>> permitted)
>> Please check whether selinux is enabled on both Master and Slave. I 
>> remember seeing such errors when selinux is enabled.
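>>
>> A quick way to check this on each node (a sketch; getenforce comes 
>> with the selinux user-space tools and aa-status with the apparmor 
>> package, so either command may simply be absent):
>>
>> getenforce 2>/dev/null || echo "no selinux tooling installed"
>> aa-status 2>/dev/null | head -3    # AppArmor equivalent on Ubuntu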
>>
>>>
>>> this is found only on gluster-wien-03-int, which is in 'Hybrid Crawl':
>>> [2015-12-16 17:17:07.219939] W [fuse-bridge.c:1261:fuse_err_cbk] 
>>> 0-glusterfs-fuse: 123841: SETXATTR() 
>>> /.gfid/00000000-0000-0000-0000-000000000001 => -1 (File exists)
>>> [2015-12-16 17:17:07.220658] W 
>>> [client-rpc-fops.c:306:client3_3_mkdir_cbk] 0-aut-wien-01-client-3: 
>>> remote operation failed: File exists. Path: /2301
>>> [2015-12-16 17:17:07.220702] W 
>>> [client-rpc-fops.c:306:client3_3_mkdir_cbk] 0-aut-wien-01-client-2: 
>>> remote operation failed: File exists. Path: /2301
>>>
>> Some errors like "file exists" can be ignored.
>>>
>>> But first of all I would like to have a look at this message, found 
>>> about 6000 times on gluster-wien-05-int and ~07-int, which are in 
>>> 'History Crawl':
>>> [2015-12-16 13:03:25.658359] W [fuse-bridge.c:483:fuse_entry_cbk] 
>>> 0-glusterfs-fuse: 119569: LOOKUP() 
>>> /.gfid/d4815ee4-3348-4105-9136-d0219d956ed8/.dstXXXfDyaP9 => -1 
>>> (Stale file handle)
>>>
>>> The gfid d4815ee4-3348-4105-9136-d0219d956ed8 
>>> (1050="d4815ee4-3348-4105-9136-d0219d956ed8" in master_gfid_file.txt) 
>>> belongs, as shown, to the folder 1050 in the brick directory.
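>>>
>>> On a brick whose .glusterfs tree is intact, a directory gfid can also 
>>> be resolved the other way round, since directory gfids are stored 
>>> there as symlinks to the real path (a sketch with the paths from this 
>>> setup):
>>>
>>> ls -l /gluster-export/.glusterfs/d4/81/d4815ee4-3348-4105-9136-d0219d956ed8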
>>>
>>> every brick in the master volume looks like this one:
>>> Host : gluster-ger-ber-12-int
>>> # file: gluster-export/1050
>>> trusted.afr.dirty=0x000000000000000000000000
>>> trusted.afr.ger-ber-01-client-0=0x000000000000000000000000
>>> trusted.afr.ger-ber-01-client-1=0x000000000000000000000000
>>> trusted.afr.ger-ber-01-client-2=0x000000000000000000000000
>>> trusted.afr.ger-ber-01-client-3=0x000000000000000000000000
>>> trusted.gfid=0xd4815ee4334841059136d0219d956ed8
>>> trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.1c31dc4d-7ee3-423b-8577-c7b0ce2e356a.stime=0x56606290000c7e4e
>>> trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x567428e000042116
>>> trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9
>>>
>>> on the slave volume only the bricks of wien-02 and wien-03 have the 
>>> same trusted.gfid:
>>> Host : gluster-wien-03
>>> # file: gluster-export/1050
>>> trusted.afr.aut-wien-01-client-0=0x000000000000000000000000
>>> trusted.afr.aut-wien-01-client-1=0x000000000000000000000000
>>> trusted.gfid=0xd4815ee4334841059136d0219d956ed8
>>> trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x5638bfb5000379c0
>>> trusted.glusterfs.dht=0x00000001000000000000000055555554
>>>
>>> none of the nodes in 'History Crawl' have this trusted.gfid assigned:
>>> Host : gluster-wien-05
>>> # file: gluster-export/1050
>>> trusted.afr.aut-wien-01-client-2=0x000000000000000000000000
>>> trusted.afr.aut-wien-01-client-3=0x000000000000000000000000
>>> trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x5638bfb5000379c0
>>> trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff
>>>
>>> I'm not sure whether this is normal, or whether that trusted.gfid 
>>> should have been assigned on all slave nodes by the slave-upgrade.sh 
>>> script.
>>
>> As per the doc, it applies gfid on all slave nodes.
>>
>>> bash slave-upgrade.sh localhost:<aut-wien-01> 
>>> /tmp/master_gfid_file.txt $PWD/gsync-sync-gfid was run on wien-02, 
>>> which has passwordless login to every other slave node.
>>> As I could see in the process list, slave-upgrade.sh was running on 
>>> each slave node, and as far as I remember it starts with a 'rm -rf 
>>> ~/.glusterfs/...'
>>> So the mentioned gfid should have been removed by slave-upgrade.sh, 
>>> but should the trusted.gfid also be re-assigned by the script?
>>> ...I'm confused.
>>> Is the 'Stale file handle' message caused by the missing trusted.gfid 
>>> for /gluster-export/1050/ on the nodes where the message appears?
>>> Does it make sense to stop geo-rep and to start the slave-upgrade.sh 
>>> script on the affected nodes, without access to the other nodes, to 
>>> fix this?
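>>>
>>> If single directories have to be fixed by hand, I would expect 
>>> something like the following to re-apply the gfid directly on a brick 
>>> (only a sketch, not verified here; the brick processes should be 
>>> stopped, and the identical value has to be set on every brick of the 
>>> volume):
>>>
>>> setfattr -n trusted.gfid -v 0xd4815ee4334841059136d0219d956ed8 /gluster-export/1050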
>>>
>>> currently I'm not sure whether the 'stale file handle' messages are 
>>> what prevents us from getting geo-replication running, but I guess 
>>> the best way is to try to get it running step by step...
>>> any help is appreciated.
>>>
>>> best regards
>>> dietmar
>>>
>>>
>>>
>>> [ 14:45:42 ] - root at gluster-ger-ber-07 
>>> /var/log/glusterfs/geo-replication/ger-ber-01 $gluster volume 
>>> geo-replication ger-ber-01 gluster-wien-02::aut-wien-01 status detail
>>>
>>> MASTER NODE           MASTER VOL    MASTER BRICK       SLAVE                               STATUS     CHECKPOINT STATUS    CRAWL STATUS     FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
>>> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> gluster-ger-ber-07    ger-ber-01    /gluster-export    gluster-wien-07-int::aut-wien-01    Active     N/A                  History Crawl    -6500          0                0                5                  6500
>>> gluster-ger-ber-12    ger-ber-01    /gluster-export    gluster-wien-06-int::aut-wien-01    Passive    N/A                  N/A              0              0                0                0                  0
>>> gluster-ger-ber-11    ger-ber-01    /gluster-export    gluster-wien-03-int::aut-wien-01    Active     N/A                  Hybrid Crawl     0              8191             0                0                  0
>>> gluster-ger-ber-09    ger-ber-01    /gluster-export    gluster-wien-05-int::aut-wien-01    Active     N/A                  History Crawl    -5792          0                0                0                  5793
>>> gluster-ger-ber-10    ger-ber-01    /gluster-export    gluster-wien-02-int::aut-wien-01    Passive    N/A                  N/A              0              0                0                0                  0
>>> gluster-ger-ber-08    ger-ber-01    /gluster-export    gluster-wien-04-int::aut-wien-01    Passive    N/A                  N/A              0              0                0                0                  0
>>> [ 14:45:46 ] - root at gluster-ger-ber-07 
>>> /var/log/glusterfs/geo-replication/ger-ber-01 $