[Gluster-devel] Input/output error when files in .shard folder are deleted

Krutika Dhananjay kdhananj at redhat.com
Wed Oct 26 12:09:20 UTC 2016


Do you also have the brick logs? It looks like the bricks are returning
EINVAL on lookup, which AFR then converts into EIO. Sharding is merely
passing the same error code upwards.
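
To make that sequence easier to see in a long mount log, a small script along
these lines could tally warnings and errors by the function that logged them
and the trailing error string. This is only a rough sketch: the regular
expressions and the name tally_errors are illustrative assumptions, and it
expects each log entry on a single line (as in the log file itself, not as
wrapped in this mail).

```python
import re
from collections import Counter

# Assumed layout of one unwrapped entry:
# [timestamp] LEVEL [MSGID: n] [file:line:function] xlator: message
ENTRY = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"(?:\[MSGID: \d+\] )?"
    r"\[(?P<src>[^:\]]+):\d+:(?P<func>[^\]]+)\] "
    r"(?P<xlator>\S+): (?P<msg>.*)$"
)
# The error text is taken from a trailing [...] or (...) group, if any.
ERRSTR = re.compile(r"[\[(]([^\[\]()]+)[\])]\s*$")


def tally_errors(lines):
    """Count W/E entries keyed by (logging function, error string)."""
    counts = Counter()
    for line in lines:
        m = ENTRY.match(line.strip())
        if not m or m["level"] not in ("W", "E"):
            continue  # skip informational or unparseable lines
        err = ERRSTR.search(m["msg"])
        key = (m["func"], err.group(1).strip() if err else "")
        counts[key] += 1
    return counts
```

Run over the mount log, each failed READ should show up as a group of
client3_3_lookup_cbk entries with "Invalid argument" followed by one
shard_common_lookup_shards_cbk entry with "Input/output error".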

-Krutika

On Wed, Oct 26, 2016 at 6:38 AM, qingwei wei <tchengwee at gmail.com> wrote:

> Hi,
>
> Please see the client log below.
>
> [2016-10-24 10:29:51.111603] I [fuse-bridge.c:5171:fuse_graph_setup]
> 0-fuse: switched to graph 0
> [2016-10-24 10:29:51.111662] I [MSGID: 114035]
> [client-handshake.c:193:client_set_lk_version_cbk]
> 0-testHeal-client-2: Server lk version = 1
> [2016-10-24 10:29:51.112371] I [fuse-bridge.c:4083:fuse_init]
> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22
> kernel 7.22
> [2016-10-24 10:29:51.113563] I [MSGID: 108031]
> [afr-common.c:2071:afr_local_discovery_cbk] 0-testHeal-replicate-0:
> selecting local read_child testHeal-client-2
> [2016-10-24 10:29:51.113604] I [MSGID: 108031]
> [afr-common.c:2071:afr_local_discovery_cbk] 0-testHeal-replicate-0:
> selecting local read_child testHeal-client-0
> [2016-10-24 10:29:51.113630] I [MSGID: 108031]
> [afr-common.c:2071:afr_local_discovery_cbk] 0-testHeal-replicate-0:
> selecting local read_child testHeal-client-1
> [2016-10-24 10:29:54.016802] W [MSGID: 108001]
> [afr-transaction.c:789:afr_handle_quorum] 0-testHeal-replicate-0:
> /.shard/9061198a-eb7e-45a2-93fb-eb396d1b2727.1: Failing MKNOD as
> quorum is not met
> [2016-10-24 10:29:54.019330] W [MSGID: 114031]
> [client-rpc-fops.c:2981:client3_3_lookup_cbk] 0-testHeal-client-0:
> remote operation failed. Path: (null)
> (00000000-0000-0000-0000-000000000000) [Invalid argument]
> [2016-10-24 10:29:54.019343] W [MSGID: 114031]
> [client-rpc-fops.c:2981:client3_3_lookup_cbk] 0-testHeal-client-2:
> remote operation failed. Path: (null)
> (00000000-0000-0000-0000-000000000000) [Invalid argument]
> [2016-10-24 10:29:54.019373] W [MSGID: 114031]
> [client-rpc-fops.c:2981:client3_3_lookup_cbk] 0-testHeal-client-1:
> remote operation failed. Path: (null)
> (00000000-0000-0000-0000-000000000000) [Invalid argument]
> [2016-10-24 10:29:54.019854] E [MSGID: 133010]
> [shard.c:1582:shard_common_lookup_shards_cbk] 0-testHeal-shard: Lookup
> on shard 1 failed. Base file gfid =
> 9061198a-eb7e-45a2-93fb-eb396d1b2727 [Input/output error]
> [2016-10-24 10:29:54.020886] W [fuse-bridge.c:2227:fuse_readv_cbk]
> 0-glusterfs-fuse: 135: READ => -1
> gfid=9061198a-eb7e-45a2-93fb-eb396d1b2727 fd=0x7f70c80d12dc
> (Input/output error)
> [2016-10-24 10:29:54.118264] W [MSGID: 114031]
> [client-rpc-fops.c:2981:client3_3_lookup_cbk] 0-testHeal-client-0:
> remote operation failed. Path: (null)
> (00000000-0000-0000-0000-000000000000) [Invalid argument]
> [2016-10-24 10:29:54.118308] W [MSGID: 114031]
> [client-rpc-fops.c:2981:client3_3_lookup_cbk] 0-testHeal-client-2:
> remote operation failed. Path: (null)
> (00000000-0000-0000-0000-000000000000) [Invalid argument]
> [2016-10-24 10:29:54.118329] W [MSGID: 114031]
> [client-rpc-fops.c:2981:client3_3_lookup_cbk] 0-testHeal-client-1:
> remote operation failed. Path: (null)
> (00000000-0000-0000-0000-000000000000) [Invalid argument]
> [2016-10-24 10:29:54.118751] E [MSGID: 133010]
> [shard.c:1582:shard_common_lookup_shards_cbk] 0-testHeal-shard: Lookup
> on shard 1 failed. Base file gfid =
> 9061198a-eb7e-45a2-93fb-eb396d1b2727 [Input/output error]
> [2016-10-24 10:29:54.118787] W [fuse-bridge.c:2227:fuse_readv_cbk]
> 0-glusterfs-fuse: 137: READ => -1
> gfid=9061198a-eb7e-45a2-93fb-eb396d1b2727 fd=0x7f70c80d12dc
> (Input/output error)
> [2016-10-24 10:29:54.119330] W [MSGID: 114031]
> [client-rpc-fops.c:2981:client3_3_lookup_cbk] 0-testHeal-client-1:
> remote operation failed. Path: (null)
> (00000000-0000-0000-0000-000000000000) [Invalid argument]
> [2016-10-24 10:29:54.119338] W [MSGID: 114031]
> [client-rpc-fops.c:2981:client3_3_lookup_cbk] 0-testHeal-client-0:
> remote operation failed. Path: (null)
> (00000000-0000-0000-0000-000000000000) [Invalid argument]
> [2016-10-24 10:29:54.119368] W [MSGID: 114031]
> [client-rpc-fops.c:2981:client3_3_lookup_cbk] 0-testHeal-client-2:
> remote operation failed. Path: (null)
> (00000000-0000-0000-0000-000000000000) [Invalid argument]
> [2016-10-24 10:29:54.119674] E [MSGID: 133010]
> [shard.c:1582:shard_common_lookup_shards_cbk] 0-testHeal-shard: Lookup
> on shard 1 failed. Base file gfid =
> 9061198a-eb7e-45a2-93fb-eb396d1b2727 [Input/output error]
> [2016-10-24 10:29:54.119715] W [fuse-bridge.c:2227:fuse_readv_cbk]
> 0-glusterfs-fuse: 138: READ => -1
> gfid=9061198a-eb7e-45a2-93fb-eb396d1b2727 fd=0x7f70c80d12dc
> (Input/output error)
> [2016-10-24 10:36:13.140414] W [MSGID: 114031]
> [client-rpc-fops.c:2981:client3_3_lookup_cbk] 0-testHeal-client-0:
> remote operation failed. Path: (null)
> (00000000-0000-0000-0000-000000000000) [Invalid argument]
> [2016-10-24 10:36:13.140451] W [MSGID: 114031]
> [client-rpc-fops.c:2981:client3_3_lookup_cbk] 0-testHeal-client-2:
> remote operation failed. Path: (null)
> (00000000-0000-0000-0000-000000000000) [Invalid argument]
> [2016-10-24 10:36:13.140461] W [MSGID: 114031]
> [client-rpc-fops.c:2981:client3_3_lookup_cbk] 0-testHeal-client-1:
> remote operation failed. Path: (null)
> (00000000-0000-0000-0000-000000000000) [Invalid argument]
> [2016-10-24 10:36:13.140956] E [MSGID: 133010]
> [shard.c:1582:shard_common_lookup_shards_cbk] 0-testHeal-shard: Lookup
> on shard 1 failed. Base file gfid =
> 9061198a-eb7e-45a2-93fb-eb396d1b2727 [Input/output error]
> [2016-10-24 10:36:13.140995] W [fuse-bridge.c:2227:fuse_readv_cbk]
> 0-glusterfs-fuse: 145: READ => -1
> gfid=9061198a-eb7e-45a2-93fb-eb396d1b2727 fd=0x7f70c80d12dc
> (Input/output error)
> [2016-10-25 03:22:01.220025] I [MSGID: 100011]
> [glusterfsd.c:1323:reincarnate] 0-glusterfsd: Fetching the volume file
> from server...
> [2016-10-25 03:22:01.220938] I
> [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk] 0-glusterfs: No change in
> volfile, continuing
>
> I have also attached the log to this email.
>
> Thanks.
>
> Cwtan
>
>
> On Wed, Oct 26, 2016 at 12:30 AM, Krutika Dhananjay <kdhananj at redhat.com>
> wrote:
> > Tried it locally on my setup. Worked fine.
> >
> > Could you please attach the mount logs?
> >
> > -Krutika
> >
> > On Tue, Oct 25, 2016 at 6:55 PM, Pranith Kumar Karampuri
> > <pkarampu at redhat.com> wrote:
> >>
> >> +Krutika
> >>
> >> On Mon, Oct 24, 2016 at 4:10 PM, qingwei wei <tchengwee at gmail.com>
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I am currently running a simple gluster setup using one server node
> >>> with multiple disks. I noticed that if I delete all the .shard
> >>> files on one replica's backend brick, my application (dd) reports an
> >>> Input/output error even though I have 3 replicas.
> >>>
> >>> My gluster version is 3.7.16
> >>>
> >>> gluster volume info
> >>>
> >>> Volume Name: testHeal
> >>> Type: Replicate
> >>> Volume ID: 26d16d7f-bc4f-44a6-a18b-eab780d80851
> >>> Status: Started
> >>> Number of Bricks: 1 x 3 = 3
> >>> Transport-type: tcp
> >>> Bricks:
> >>> Brick1: 192.168.123.4:/mnt/sdb_mssd/testHeal2
> >>> Brick2: 192.168.123.4:/mnt/sde_mssd/testHeal2
> >>> Brick3: 192.168.123.4:/mnt/sdd_mssd/testHeal2
> >>> Options Reconfigured:
> >>> cluster.self-heal-daemon: on
> >>> features.shard-block-size: 16MB
> >>> features.shard: on
> >>> performance.readdir-ahead: on
> >>>
> >>> dd error
> >>>
> >>> [root at fujitsu05 .shard]# dd of=/home/test if=/mnt/fuseMount/ddTest
> >>> bs=16M count=20 oflag=direct
> >>> dd: error reading ‘/mnt/fuseMount/ddTest’: Input/output error
> >>> 1+0 records in
> >>> 1+0 records out
> >>> 16777216 bytes (17 MB) copied, 0.111038 s, 151 MB/s
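
The dd output lines up with the shard layout: the first 16 MiB record is
served from the base file, and the second record starts exactly at the
boundary of the first shard, which is the one the client log reports as
failing. A minimal sketch of that arithmetic follows. It assumes that
features.shard-block-size: 16MB means 16 MiB (the same size as dd's bs=16M)
and that block 0 is the base file while block N lives at
/.shard/<base-gfid>.N; the function name shard_path is made up for
illustration.

```python
# Assumption: "16MB" in the volume options means 16 MiB.
SHARD_BLOCK_SIZE = 16 * 1024 * 1024


def shard_path(base_gfid, offset, block_size=SHARD_BLOCK_SIZE):
    """Return the shard path that backs a byte offset, or None for block 0."""
    index = offset // block_size
    if index == 0:
        # The first block is stored in the base file itself.
        return None
    return f"/.shard/{base_gfid}.{index}"


gfid = "9061198a-eb7e-45a2-93fb-eb396d1b2727"
# First dd record: offset 0, served from the base file.
first = shard_path(gfid, 0)
# Second dd record: offset 16777216, the first byte of shard 1.
second = shard_path(gfid, 16 * 1024 * 1024)
```

Under these assumptions, second is
/.shard/9061198a-eb7e-45a2-93fb-eb396d1b2727.1, the same path that appears in
the "Failing MKNOD as quorum is not met" message.
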
> >>>
> >>> In the .shard folder where I deleted all the shard files, I can see
> >>> that one shard file has been recreated:
> >>>
> >>> getfattr -d -e hex -m.  9061198a-eb7e-45a2-93fb-eb396d1b2727.1
> >>> # file: 9061198a-eb7e-45a2-93fb-eb396d1b2727.1
> >>> trusted.afr.testHeal-client-0=0x000000010000000100000000
> >>> trusted.afr.testHeal-client-2=0x000000010000000100000000
> >>> trusted.gfid=0x41b653f7daa14627b1f91f9e8554ddde
> >>>
> >>> However, its gfid is not the same as on the other replicas:
> >>>
> >>> getfattr -d -e hex -m.  9061198a-eb7e-45a2-93fb-eb396d1b2727.1
> >>> # file: 9061198a-eb7e-45a2-93fb-eb396d1b2727.1
> >>> trusted.afr.dirty=0x000000000000000000000000
> >>> trusted.afr.testHeal-client-1=0x000000000000000000000000
> >>> trusted.bit-rot.version=0x0300000000000000580dde99000e5e5d
> >>> trusted.gfid=0x9ee5c5eed7964a6cb9ac1a1419de5a40
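
The hex values above are easier to compare once decoded. The sketch below is
one way to do that, assuming the usual AFR changelog layout of three
big-endian 32-bit counters (pending data, metadata and entry operations) and
that trusted.gfid is a raw 16-byte UUID. The helper names are hypothetical.

```python
import struct
import uuid


def _strip_0x(hex_value):
    """Drop the 0x prefix that getfattr -e hex prints."""
    return hex_value[2:] if hex_value.startswith("0x") else hex_value


def decode_afr_pending(hex_value):
    """Split a trusted.afr.* value into its three pending counters."""
    raw = bytes.fromhex(_strip_0x(hex_value))
    if len(raw) != 12:
        raise ValueError("expected a 12-byte AFR changelog value")
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


def gfid_to_uuid(hex_value):
    """Render a trusted.gfid value in the dashed form used in the logs."""
    return str(uuid.UUID(hex=_strip_0x(hex_value)))


recreated = gfid_to_uuid("0x41b653f7daa14627b1f91f9e8554ddde")
original = gfid_to_uuid("0x9ee5c5eed7964a6cb9ac1a1419de5a40")
pending = decode_afr_pending("0x000000010000000100000000")
```

With the values from this mail, the recreated copy decodes to one pending data
and one pending metadata operation against both client-0 and client-2, and the
two gfids are different UUIDs, so the three bricks no longer agree on the
identity of shard 1.
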
> >>>
> >>> Is this considered a bug?
> >>>
> >>> Regards,
> >>>
> >>> Cwtan
> >>> _______________________________________________
> >>> Gluster-devel mailing list
> >>> Gluster-devel at gluster.org
> >>> http://www.gluster.org/mailman/listinfo/gluster-devel
> >>
> >>
> >>
> >>
> >> --
> >> Pranith
> >
> >
>