[Gluster-users] Missing files on one of the bricks

Frederic Harmignies frederic.harmignies at elementai.com
Thu Nov 16 15:13:35 UTC 2017


Hello, we are using glusterfs 3.10.3.

We currently have a 'gluster volume heal data01 full' running; the crawl is
still in progress.
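
Crawl status so far, as reported by the heal statistics command (assuming
the usual invocation for this volume):

# gluster volume heal data01 statistics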

Starting time of crawl: Tue Nov 14 15:58:35 2017

Crawl is in progress
Type of crawl: FULL
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 0

getfattr output for both files:

# getfattr -d -m . -e hex
/mnt/AIDATA/data//ishmaelb/experiments/omie/omieali/cifar10/donsker_grad_reg_ali_dcgan_stat_dcgan_ac_True/omieali_cifar10_zdim_100_enc_dcgan_dec_dcgan_stat_dcgan_posterior_propagated_enc_beta1.0_dec_beta_1.0_info_metric_donsker_varadhan_info_lam_0.334726025306_222219-23_10_17/data/data_gen_iter_86000.pkl
getfattr: Removing leading '/' from absolute path names
# file:
mnt/AIDATA/data//ishmaelb/experiments/omie/omieali/cifar10/donsker_grad_reg_ali_dcgan_stat_dcgan_ac_True/omieali_cifar10_zdim_100_enc_dcgan_dec_dcgan_stat_dcgan_posterior_propagated_enc_beta1.0_dec_beta_1.0_info_metric_donsker_varadhan_info_lam_0.334726025306_222219-23_10_17/data/data_gen_iter_86000.pkl
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.data01-client-0=0x000000000000000100000000
trusted.gfid=0x7e8513f4d4e24e66b0ba2dbe4c803c54

# getfattr -d -m . -e hex
/mnt/AIDATA/data/home/allac/experiments/171023_105655_mini_imagenet_projection_size_mixing_depth_num_filters_filter_size_block_depth_Explore\
architecture\ capacity/Explore\ architecture\
capacity\(projection_size\=32\;mixing_depth\=0\;num_filters\=64\;filter_size\=3\;block_depth\=3\)/model.ckpt-70001.data-00000-of-00001.tempstate1629411508065733704
getfattr: Removing leading '/' from absolute path names
# file:
mnt/AIDATA/data/home/allac/experiments/171023_105655_mini_imagenet_projection_size_mixing_depth_num_filters_filter_size_block_depth_Explore
architecture capacity/Explore architecture
capacity(projection_size=32;mixing_depth=0;num_filters=64;filter_size=3;block_depth=3)/model.ckpt-70001.data-00000-of-00001.tempstate1629411508065733704
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.data01-client-0=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005979d278000af1e7
trusted.gfid=0x9612ecd2106d42f295ebfef495c1d8ab
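
For reference, the trusted.afr.<vol>-client-<n> value packs three
big-endian 32-bit counters (pending data, metadata and entry operations
blaming that brick), so assuming the standard AFR changelog layout the two
values above decode as:

# trusted.afr.data01-client-0 = 0x 00000000 00000001 00000000
#   -> data=0, metadata=1, entry=0 (one pending metadata op, first file)
# trusted.afr.data01-client-0 = 0x 00000000 00000000 00000000
#   -> data=0, metadata=0, entry=0 (nothing pending, second file)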


# gluster volume heal data01
Launching heal operation to perform index self heal on volume data01 has
been successful
Use heal info commands to check status
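
For example (the same info commands quoted further down in this thread):

# gluster volume heal data01 info
# gluster volume heal data01 info split-brain
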
# cat /var/log/glusterfs/glustershd.log
[2017-11-12 08:39:01.907287] I [glusterfsd-mgmt.c:1789:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
[2017-11-15 08:18:02.084766] I [MSGID: 100011]
[glusterfsd.c:1414:reincarnate] 0-glusterfsd: Fetching the volume file from
server...
[2017-11-15 08:18:02.085718] I [glusterfsd-mgmt.c:1789:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
[2017-11-15 19:13:42.005307] W [MSGID: 114031]
[client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-data01-client-0: remote
operation failed. Path: <gfid:7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54>
(7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54) [No such file or directory]
The message "W [MSGID: 114031]
[client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-data01-client-0: remote
operation failed. Path: <gfid:7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54>
(7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54) [No such file or directory]"
repeated 5 times between [2017-11-15 19:13:42.005307] and [2017-11-15
19:13:42.166579]
[2017-11-15 19:23:43.041956] W [MSGID: 114031]
[client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-data01-client-0: remote
operation failed. Path: <gfid:7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54>
(7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54) [No such file or directory]
The message "W [MSGID: 114031]
[client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-data01-client-0: remote
operation failed. Path: <gfid:7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54>
(7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54) [No such file or directory]"
repeated 5 times between [2017-11-15 19:23:43.041956] and [2017-11-15
19:23:43.235831]
[2017-11-15 19:30:22.726808] W [MSGID: 114031]
[client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-data01-client-0: remote
operation failed. Path: <gfid:7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54>
(7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54) [No such file or directory]
The message "W [MSGID: 114031]
[client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-data01-client-0: remote
operation failed. Path: <gfid:7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54>
(7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54) [No such file or directory]"
repeated 4 times between [2017-11-15 19:30:22.726808] and [2017-11-15
19:30:22.827631]
[2017-11-16 15:04:34.102010] I [MSGID: 108026]
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
0-data01-replicate-0: performing metadata selfheal on
9612ecd2-106d-42f2-95eb-fef495c1d8ab
[2017-11-16 15:04:34.186781] I [MSGID: 108026]
[afr-self-heal-common.c:1255:afr_log_selfheal] 0-data01-replicate-0:
Completed metadata selfheal on 9612ecd2-106d-42f2-95eb-fef495c1d8ab.
sources=[1]  sinks=0
[2017-11-16 15:04:38.776070] I [MSGID: 108026]
[afr-self-heal-common.c:1255:afr_log_selfheal] 0-data01-replicate-0:
Completed data selfheal on 7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54.
sources=[1]  sinks=0
[2017-11-16 15:04:38.811744] I [MSGID: 108026]
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
0-data01-replicate-0: performing metadata selfheal on
7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54
[2017-11-16 15:04:38.867474] I [MSGID: 108026]
[afr-self-heal-common.c:1255:afr_log_selfheal] 0-data01-replicate-0:
Completed metadata selfheal on 7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54.
sources=[1]  sinks=0
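
The <gfid:...> entries in those warnings can be mapped back to a filename
on the brick through the .glusterfs hardlink (standard brick layout,
regular files only; brick path taken from the volume info quoted below):

# run on the brick that still has the file
GFID=7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54
find /mnt/AIDATA/data -samefile "/mnt/AIDATA/data/.glusterfs/7e/85/$GFID" \
    -not -path '*/.glusterfs/*'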




On Thu, Nov 16, 2017 at 7:14 AM, Ravishankar N <ravishankar at redhat.com>
wrote:

>
>
> On 11/16/2017 04:12 PM, Nithya Balachandran wrote:
>
>
>
> On 15 November 2017 at 19:57, Frederic Harmignies <frederic.harmignies@
> elementai.com> wrote:
>
>> Hello, we have two files that are missing from one of the bricks. No idea
>> how to fix this.
>>
>> Details:
>>
>> # gluster volume info
>>
>> Volume Name: data01
>> Type: Replicate
>> Volume ID: 39b4479c-31f0-4696-9435-5454e4f8d310
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: 192.168.186.11:/mnt/AIDATA/data
>> Brick2: 192.168.186.12:/mnt/AIDATA/data
>> Options Reconfigured:
>> performance.cache-refresh-timeout: 30
>> client.event-threads: 16
>> server.event-threads: 32
>> performance.readdir-ahead: off
>> performance.io-thread-count: 32
>> performance.cache-size: 32GB
>> transport.address-family: inet
>> nfs.disable: on
>> features.trash: off
>> features.trash-max-filesize: 500MB
>>
>> # gluster volume heal data01 info
>> Brick 192.168.186.11:/mnt/AIDATA/data
>> Status: Connected
>> Number of entries: 0
>>
>> Brick 192.168.186.12:/mnt/AIDATA/data
>> <gfid:7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54>
>> <gfid:9612ecd2-106d-42f2-95eb-fef495c1d8ab>
>> Status: Connected
>> Number of entries: 2
>>
>> # gluster volume heal data01 info split-brain
>> Brick 192.168.186.11:/mnt/AIDATA/data
>> Status: Connected
>> Number of entries in split-brain: 0
>>
>> Brick 192.168.186.12:/mnt/AIDATA/data
>> Status: Connected
>> Number of entries in split-brain: 0
>>
>>
>> Both files are missing from the folder on Brick1, and their gfid files
>> are also missing from the .glusterfs directory on that same brick.
>> Brick2 has both files and their gfid files in .glusterfs.
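>>
>> For example, the gfid backpointers can be checked directly on each brick
>> (assuming the standard .glusterfs layout: the first two hex pairs of the
>> gfid as subdirectories):
>>
>> # stat /mnt/AIDATA/data/.glusterfs/7e/85/7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54
>> # stat /mnt/AIDATA/data/.glusterfs/96/12/9612ecd2-106d-42f2-95eb-fef495c1d8ab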
>>
>> We already tried:
>>
>> # gluster volume heal data01 full
>> Running stat and ls -l on both files from a mounted client to try to
>> trigger a heal.
>>
>> Would a rebalance fix this? Any guidance would be greatly appreciated!
>>
>
> A rebalance would not help here as this is a replicate volume. Ravi, any
> idea what could be going wrong here?
>
> No, an explicit lookup should have healed the file on the missing brick,
> unless the lookup did not hit AFR and was served from the caching
> translators. Frederic, what version of gluster are you running? Can you
> launch 'gluster volume heal data01' and check the glustershd logs for
> possible warnings? Use the DEBUG client-log-level if you have to. Also,
> instead of stat, try a getfattr on the file from the mount.
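>
> A concrete sketch of those steps (volume name from this thread; the
> getfattr path is whatever the file resolves to on your FUSE mount):
>
> # gluster volume set data01 diagnostics.client-log-level DEBUG
> # gluster volume heal data01
> # getfattr -d -m . -e hex /path/on/fuse/mount/to/file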
>
> -Ravi
>
>
> Regards,
> Nithya
>
>>
>> Thank you in advance!
>>
>> --
>>
>> *Frederic Harmignies*
>> *High Performance Computer Administrator*
>>
>> www.elementai.com
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>
>


-- 

*Frederic Harmignies*
*High Performance Computer Administrator*

www.elementai.com