[Gluster-users] Missing files on one of the bricks

Frederic Harmignies frederic.harmignies at elementai.com
Thu Nov 16 18:13:21 UTC 2017


Hello, it looks like the full heal fixed the problem, I was just impatient :)

[2017-11-16 15:04:34.102010] I [MSGID: 108026]
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
0-data01-replicate-0: performing metadata selfheal on
9612ecd2-106d-42f2-95eb-fef495c1d8ab
[2017-11-16 15:04:34.186781] I [MSGID: 108026]
[afr-self-heal-common.c:1255:afr_log_selfheal] 0-data01-replicate-0:
Completed metadata selfheal on 9612ecd2-106d-42f2-95eb-fef495c1d8ab.
sources=[1]  sinks=0
[2017-11-16 15:04:38.776070] I [MSGID: 108026]
[afr-self-heal-common.c:1255:afr_log_selfheal] 0-data01-replicate-0:
Completed data selfheal on 7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54.
sources=[1]  sinks=0
[2017-11-16 15:04:38.811744] I [MSGID: 108026]
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
0-data01-replicate-0: performing metadata selfheal on
7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54
[2017-11-16 15:04:38.867474] I [MSGID: 108026]
[afr-self-heal-common.c:1255:afr_log_selfheal] 0-data01-replicate-0:
Completed metadata selfheal on 7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54.
sources=[1]  sinks=0

# gluster volume heal data01 info
Brick 192.168.186.11:/mnt/AIDATA/data
Status: Connected
Number of entries: 0

Brick 192.168.186.12:/mnt/AIDATA/data
Status: Connected
Number of entries: 0
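
(For the record, a quick way to double-check on the bricks themselves is to
stat the gfid hard links under .glusterfs on each node. A minimal sketch, with
the brick path and gfids taken from earlier in this thread:

# stat /mnt/AIDATA/data/.glusterfs/7e/85/7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54
# stat /mnt/AIDATA/data/.glusterfs/96/12/9612ecd2-106d-42f2-95eb-fef495c1d8ab

Both entries should now exist on both nodes after the heal.)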

Thank you for your fast response!

On Thu, Nov 16, 2017 at 10:13 AM, Frederic Harmignies <
frederic.harmignies at elementai.com> wrote:

> Hello, we are using glusterfs 3.10.3.
>
> We currently have a full heal (gluster volume heal data01 full) running; the
> crawl is still in progress.
>
> Starting time of crawl: Tue Nov 14 15:58:35 2017
>
> Crawl is in progress
> Type of crawl: FULL
> No. of entries healed: 0
> No. of entries in split-brain: 0
> No. of heal failed entries: 0
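>
> (For reference, the crawl status above should be re-checkable at any time
> with:
>
> # gluster volume heal data01 statistics
> # gluster volume heal data01 statistics heal-count
>
> the second command showing just the count of entries still pending heal per
> brick.)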
>
> getfattr from both files:
>
> # getfattr -d -m . -e hex /mnt/AIDATA/data//ishmaelb/experiments/omie/omieali/cifar10/donsker_grad_reg_ali_dcgan_stat_dcgan_ac_True/omieali_cifar10_zdim_100_enc_dcgan_dec_dcgan_stat_dcgan_posterior_propagated_enc_beta1.0_dec_beta_1.0_info_metric_donsker_varadhan_info_lam_0.334726025306_222219-23_10_17/data/data_gen_iter_86000.pkl
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/AIDATA/data//ishmaelb/experiments/omie/omieali/cifar10/donsker_grad_reg_ali_dcgan_stat_dcgan_ac_True/omieali_cifar10_zdim_100_enc_dcgan_dec_dcgan_stat_dcgan_posterior_propagated_enc_beta1.0_dec_beta_1.0_info_metric_donsker_varadhan_info_lam_0.334726025306_222219-23_10_17/data/data_gen_iter_86000.pkl
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.data01-client-0=0x000000000000000100000000
> trusted.gfid=0x7e8513f4d4e24e66b0ba2dbe4c803c54
>
> # getfattr -d -m . -e hex /mnt/AIDATA/data/home/allac/experiments/171023_105655_mini_imagenet_projection_size_mixing_depth_num_filters_filter_size_block_depth_Explore\ architecture\ capacity/Explore\ architecture\ capacity\(projection_size\=32\;mixing_depth\=0\;num_filters\=64\;filter_size\=3\;block_depth\=3\)/model.ckpt-70001.data-00000-of-00001.tempstate1629411508065733704
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/AIDATA/data/home/allac/experiments/171023_105655_mini_imagenet_projection_size_mixing_depth_num_filters_filter_size_block_depth_Explore architecture capacity/Explore architecture capacity(projection_size=32;mixing_depth=0;num_filters=64;filter_size=3;block_depth=3)/model.ckpt-70001.data-00000-of-00001.tempstate1629411508065733704
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.data01-client-0=0x000000000000000000000000
> trusted.bit-rot.version=0x02000000000000005979d278000af1e7
> trusted.gfid=0x9612ecd2106d42f295ebfef495c1d8ab
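>
> (For reference, and assuming the usual AFR changelog layout of three 32-bit
> counters for data/metadata/entry operations, the non-zero xattr on the first
> file splits up as:
>
> trusted.afr.data01-client-0 = 0x 00000000 00000001 00000000
>                                  (data)   (metadata) (entry)
>
> i.e. one pending metadata operation blamed on brick data01-client-0, while
> the second file carries an all-zero changelog.)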
>
>
> # gluster volume heal data01
> Launching heal operation to perform index self heal on volume data01 has
> been successful
> Use heal info commands to check status
> # cat /var/log/glusterfs/glustershd.log
> [2017-11-12 08:39:01.907287] I [glusterfsd-mgmt.c:1789:mgmt_getspec_cbk]
> 0-glusterfs: No change in volfile, continuing
> [2017-11-15 08:18:02.084766] I [MSGID: 100011] [glusterfsd.c:1414:reincarnate]
> 0-glusterfsd: Fetching the volume file from server...
> [2017-11-15 08:18:02.085718] I [glusterfsd-mgmt.c:1789:mgmt_getspec_cbk]
> 0-glusterfs: No change in volfile, continuing
> [2017-11-15 19:13:42.005307] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk]
> 0-data01-client-0: remote operation failed. Path:
> <gfid:7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54> (7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54)
> [No such file or directory]
> The message "W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk]
> 0-data01-client-0: remote operation failed. Path:
> <gfid:7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54> (7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54)
> [No such file or directory]" repeated 5 times between [2017-11-15
> 19:13:42.005307] and [2017-11-15 19:13:42.166579]
> [2017-11-15 19:23:43.041956] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk]
> 0-data01-client-0: remote operation failed. Path:
> <gfid:7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54> (7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54)
> [No such file or directory]
> The message "W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk]
> 0-data01-client-0: remote operation failed. Path:
> <gfid:7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54> (7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54)
> [No such file or directory]" repeated 5 times between [2017-11-15
> 19:23:43.041956] and [2017-11-15 19:23:43.235831]
> [2017-11-15 19:30:22.726808] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk]
> 0-data01-client-0: remote operation failed. Path:
> <gfid:7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54> (7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54)
> [No such file or directory]
> The message "W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk]
> 0-data01-client-0: remote operation failed. Path:
> <gfid:7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54> (7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54)
> [No such file or directory]" repeated 4 times between [2017-11-15
> 19:30:22.726808] and [2017-11-15 19:30:22.827631]
> [2017-11-16 15:04:34.102010] I [MSGID: 108026]
> [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
> 0-data01-replicate-0: performing metadata selfheal on
> 9612ecd2-106d-42f2-95eb-fef495c1d8ab
> [2017-11-16 15:04:34.186781] I [MSGID: 108026]
> [afr-self-heal-common.c:1255:afr_log_selfheal] 0-data01-replicate-0:
> Completed metadata selfheal on 9612ecd2-106d-42f2-95eb-fef495c1d8ab.
> sources=[1]  sinks=0
> [2017-11-16 15:04:38.776070] I [MSGID: 108026]
> [afr-self-heal-common.c:1255:afr_log_selfheal] 0-data01-replicate-0:
> Completed data selfheal on 7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54.
> sources=[1]  sinks=0
> [2017-11-16 15:04:38.811744] I [MSGID: 108026]
> [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
> 0-data01-replicate-0: performing metadata selfheal on
> 7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54
> [2017-11-16 15:04:38.867474] I [MSGID: 108026]
> [afr-self-heal-common.c:1255:afr_log_selfheal] 0-data01-replicate-0:
> Completed metadata selfheal on 7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54.
> sources=[1]  sinks=0
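>
> (Side note in case it helps anyone searching the archives: the gfids in the
> warnings above can be mapped back to real paths on the brick that still has
> the data, since every regular file keeps a hard link under
> <brick>/.glusterfs/<aa>/<bb>/<full-gfid>. A minimal sketch, run on the good
> brick and using the brick path from the volume info below:
>
> # b=/mnt/AIDATA/data
> # find "$b" -samefile "$b/.glusterfs/7e/85/7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54" -not -path "$b/.glusterfs/*"
>
> which should print the file's real path on that brick.)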
>
>
>
>
> On Thu, Nov 16, 2017 at 7:14 AM, Ravishankar N <ravishankar at redhat.com>
> wrote:
>
>>
>>
>> On 11/16/2017 04:12 PM, Nithya Balachandran wrote:
>>
>>
>>
>> On 15 November 2017 at 19:57, Frederic Harmignies <
>> frederic.harmignies at elementai.com> wrote:
>>
>>> Hello, we have two files that are missing from one of the bricks. No idea
>>> how to fix this.
>>>
>>> Details:
>>>
>>> # gluster volume info
>>>
>>> Volume Name: data01
>>> Type: Replicate
>>> Volume ID: 39b4479c-31f0-4696-9435-5454e4f8d310
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: 192.168.186.11:/mnt/AIDATA/data
>>> Brick2: 192.168.186.12:/mnt/AIDATA/data
>>> Options Reconfigured:
>>> performance.cache-refresh-timeout: 30
>>> client.event-threads: 16
>>> server.event-threads: 32
>>> performance.readdir-ahead: off
>>> performance.io-thread-count: 32
>>> performance.cache-size: 32GB
>>> transport.address-family: inet
>>> nfs.disable: on
>>> features.trash: off
>>> features.trash-max-filesize: 500MB
>>>
>>> # gluster volume heal data01 info
>>> Brick 192.168.186.11:/mnt/AIDATA/data
>>> Status: Connected
>>> Number of entries: 0
>>>
>>> Brick 192.168.186.12:/mnt/AIDATA/data
>>> <gfid:7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54>
>>> <gfid:9612ecd2-106d-42f2-95eb-fef495c1d8ab>
>>> Status: Connected
>>> Number of entries: 2
>>>
>>> # gluster volume heal data01 info split-brain
>>> Brick 192.168.186.11:/mnt/AIDATA/data
>>> Status: Connected
>>> Number of entries in split-brain: 0
>>>
>>> Brick 192.168.186.12:/mnt/AIDATA/data
>>> Status: Connected
>>> Number of entries in split-brain: 0
>>>
>>>
>>> Both files are missing from the folder on Brick1, and the corresponding gfid
>>> files are also missing from the .glusterfs directory on that same brick.
>>> Brick2 has both files and their gfid files under .glusterfs.
>>>
>>> We already tried:
>>>
>>>  # gluster volume heal data01 full
>>> Running a stat and ls -l on both files from a mounted client to try and
>>> trigger a heal
>>>
>>> Would a re-balance fix this? Any guidance would be greatly appreciated!
>>>
>>
>> A rebalance would not help here as this is a replicate volume. Ravi, any
>> idea what could be going wrong here?
>>
>> No, an explicit lookup should have healed the file on the missing brick,
>> unless the lookup did not hit AFR and was served from the caching translators.
>> Frederic, what version of gluster are you running? Can you launch
>> 'gluster volume heal data01' and check the glustershd logs for possible
>> warnings? Use the DEBUG client-log-level if you have to. Also, instead of
>> stat, try a getfattr on the file from the mount.
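>>
>> Something along these lines should do, assuming a FUSE mount at /mnt/data01
>> (adjust the mount point and path to yours):
>>
>> # gluster volume set data01 diagnostics.client-log-level DEBUG
>> # getfattr -n glusterfs.gfid.string /mnt/data01/path/to/missing/file
>> # gluster volume set data01 diagnostics.client-log-level INFO
>>
>> The named lookup from the mount should make AFR notice that the file is
>> missing on one brick and queue it for heal.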
>>
>> -Ravi
>>
>>
>> Regards,
>> Nithya
>>
>>>
>>> Thank you in advance!
>>>
>>> --
>>>
>>> *Frederic Harmignies*
>>> *High Performance Computer Administrator*
>>>
>>> www.elementai.com
>>>
>>>
>>
>>
>>
>
>
> --
>
> *Frederic Harmignies*
> *High Performance Computer Administrator*
>
> www.elementai.com
>



-- 

*Frederic Harmignies*
*High Performance Computer Administrator*

www.elementai.com