[Gluster-users] Replicated striped data loss

Krutika Dhananjay kdhananj at redhat.com
Tue Mar 15 08:04:47 UTC 2016


OK but what if you use it with replication? Do you still see the error? I
think not.
Could you give it a try and tell me what you find?

-Krutika

On Tue, Mar 15, 2016 at 1:23 PM, Mahdi Adnan <mahdi.adnan at earthlinktele.com>
wrote:

> Hi,
>
> I have created the following volume;
>
> Volume Name: v
> Type: Distribute
> Volume ID: 90de6430-7f83-4eda-a98f-ad1fabcf1043
> Status: Started
> Number of Bricks: 3
> Transport-type: tcp
> Bricks:
> Brick1: gfs001:/bricks/b001/v
> Brick2: gfs001:/bricks/b002/v
> Brick3: gfs001:/bricks/b003/v
> Options Reconfigured:
> features.shard-block-size: 128MB
> features.shard: enable
> cluster.server-quorum-type: server
> cluster.quorum-type: auto
> network.remote-dio: enable
> cluster.eager-lock: enable
> performance.stat-prefetch: off
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> performance.readdir-ahead: on
>
> After mounting it in ESXi and trying to clone a VM to it, I got the
> same error.
>
>
> Respectfully
> *Mahdi A. Mahdi*
>
>
> On 03/15/2016 10:44 AM, Krutika Dhananjay wrote:
>
> Hi,
>
> Do not use sharding and stripe together in the same volume, because:
> a) It is not recommended and there is no point in using both. Using
> sharding alone on your volume should work fine.
> b) Nobody has tested that combination.
> c) Like Niels said, the stripe feature is virtually deprecated.
>
> I would suggest that you create an n x 3 volume, where n is the number of
> distribute subvolumes you prefer, enable the group virt options on it, enable
> sharding on it, set a shard-block-size you feel is appropriate, and then just
> start off with VM image creation etc.
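> A minimal sketch of that layout (hypothetical host and brick names, adjust to
> your environment):
>
> # 2 x 3 distributed-replicate volume across three servers
> gluster volume create vms replica 3 \
>     gfs001:/bricks/b001/vms gfs002:/bricks/b001/vms gfs003:/bricks/b001/vms \
>     gfs001:/bricks/b002/vms gfs002:/bricks/b002/vms gfs003:/bricks/b002/vms
> # apply the virt group options and enable sharding
> gluster volume set vms group virt
> gluster volume set vms features.shard on
> gluster volume set vms features.shard-block-size 64MB
> gluster volume start vms
>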
> If you run into any issues even after you do this, let us know and we'll
> help you out.
>
> -Krutika
>
> On Tue, Mar 15, 2016 at 1:07 PM, Mahdi Adnan <
> mahdi.adnan at earthlinktele.com> wrote:
>
>> Thanks Krutika,
>>
>> I have deleted the volume and created a new one.
>> I found that it may be an issue with NFS itself; I created a new
>> striped volume, enabled sharding, and mounted it via glusterfs, and it
>> worked just fine. If I mount it with NFS it fails and gives me the same
>> errors.
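>>
>> For reference, a minimal sketch of the two mount paths being compared here
>> (hypothetical volume name and mount points):
>>
>> # native FUSE mount, which worked
>> mount -t glusterfs gfs001:/testv /mnt/testv-fuse
>> # Gluster's built-in NFSv3 server, which failed
>> mount -t nfs -o vers=3 gfs001:/testv /mnt/testv-nfs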
>>
>> Respectfully
>> *Mahdi A. Mahdi*
>>
>> On 03/15/2016 06:24 AM, Krutika Dhananjay wrote:
>>
>> Hi,
>>
>> So could you share the xattrs associated with the file at
>> <BRICK_PATH>/.glusterfs/c3/e8/c3e88cc1-7e0a-4d46-9685-2d12131a5e1c
>>
>> Here's what you need to execute:
>>
>> # getfattr -d -m . -e hex /mnt/b1/v/.glusterfs/c3/e8/c3e88cc1-7e0a-4d46-9685-2d12131a5e1c
>> on the first node, and
>>
>> # getfattr -d -m . -e hex /mnt/b2/v/.glusterfs/c3/e8/c3e88cc1-7e0a-4d46-9685-2d12131a5e1c
>> on the second.
>>
>>
>> Also, it is normally advised to use a replica 3 volume as opposed to a
>> replica 2 volume to guard against split-brains.
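>>
>> A minimal sketch of growing an existing replica 2 volume to replica 3
>> (hypothetical volume and brick names; the new brick should sit on a third
>> node):
>>
>> gluster volume add-brick testv replica 3 gfs003:/mnt/b1/v
>> gluster volume heal testv full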
>>
>> -Krutika
>>
>> On Mon, Mar 14, 2016 at 3:17 PM, Mahdi Adnan <mahdi.adnan at earthlinktele.com> wrote:
>>
>>> Sorry for the serial posting, but I got new logs that might help.
>>>
>>> These messages appear during the migration;
>>>
>>> /var/log/glusterfs/nfs.log
>>>
>>>
>>> [2016-03-14 09:45:04.573765] I [MSGID: 109036]
>>> [dht-common.c:8043:dht_log_new_layout_for_dir_selfheal] 0-testv-dht:
>>> Setting layout of /New Virtual Machine_1 with [Subvol_name: testv-stripe-0,
>>> Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ],
>>> [2016-03-14 09:45:04.957499] E
>>> [shard.c:369:shard_modify_size_and_block_count]
>>> (-->/usr/lib64/glusterfs/3.7.8/xlator/cluster/distribute.so(dht_file_setattr_cbk+0x14f)
>>> [0x7f27a13c067f]
>>> -->/usr/lib64/glusterfs/3.7.8/xlator/features/shard.so(shard_common_setattr_cbk+0xcc)
>>> [0x7f27a116681c]
>>> -->/usr/lib64/glusterfs/3.7.8/xlator/features/shard.so(shard_modify_size_and_block_count+0xdd)
>>> [0x7f27a116584d] ) 0-testv-shard: Failed to get
>>> trusted.glusterfs.shard.file-size for c3e88cc1-7e0a-4d46-9685-2d12131a5e1c
>>> [2016-03-14 09:45:04.957577] W [MSGID: 112199]
>>> [nfs3-helpers.c:3418:nfs3_log_common_res] 0-nfs-nfsv3: /New Virtual
>>> Machine_1/New Virtual Machine-flat.vmdk => (XID: 3fec5a26, SETATTR: NFS:
>>> 22(Invalid argument for operation), POSIX: 22(Invalid argument)) [Invalid
>>> argument]
>>> [2016-03-14 09:45:05.079657] E [MSGID: 112069]
>>> [nfs3.c:3649:nfs3_rmdir_resume] 0-nfs-nfsv3: No such file or directory: (
>>> 192.168.221.52:826) testv : 00000000-0000-0000-0000-000000000001
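>>>
>>> A quick way to check whether the xattr that error complains about is
>>> actually present on the brick is a targeted getfattr (sketch only; the
>>> brick-side path here is hypothetical, substitute the real file path):
>>>
>>> getfattr -n trusted.glusterfs.shard.file-size -e hex \
>>>     "/mnt/b1/v/New Virtual Machine_1/New Virtual Machine-flat.vmdk"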
>>>
>>>
>>>
>>> Respectfully
>>>
>>>
>>> *Mahdi A. Mahdi*
>>> On 03/14/2016 11:14 AM, Mahdi Adnan wrote:
>>>
>>> So I have deployed a new server, a Cisco UCS C220M4, and created a new
>>> volume;
>>>
>>> Volume Name: testv
>>> Type: Stripe
>>> Volume ID: 55cdac79-fe87-4f1f-90c0-15c9100fe00b
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: 10.70.0.250:/mnt/b1/v
>>> Brick2: 10.70.0.250:/mnt/b2/v
>>> Options Reconfigured:
>>> nfs.disable: off
>>> features.shard-block-size: 64MB
>>> features.shard: enable
>>> cluster.server-quorum-type: server
>>> cluster.quorum-type: auto
>>> network.remote-dio: enable
>>> cluster.eager-lock: enable
>>> performance.stat-prefetch: off
>>> performance.io-cache: off
>>> performance.read-ahead: off
>>> performance.quick-read: off
>>> performance.readdir-ahead: off
>>>
>>> Same error.
>>>
>>> Can anyone share with me the info of a working striped volume?
>>>
>>> On 03/14/2016 09:02 AM, Mahdi Adnan wrote:
>>>
>>> I have a pool of two bricks on the same server;
>>>
>>> Volume Name: k
>>> Type: Stripe
>>> Volume ID: 1e9281ce-2a8b-44e8-a0c6-e3ebf7416b2b
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: gfs001:/bricks/t1/k
>>> Brick2: gfs001:/bricks/t2/k
>>> Options Reconfigured:
>>> features.shard-block-size: 64MB
>>> features.shard: on
>>> cluster.server-quorum-type: server
>>> cluster.quorum-type: auto
>>> network.remote-dio: enable
>>> cluster.eager-lock: enable
>>> performance.stat-prefetch: off
>>> performance.io-cache: off
>>> performance.read-ahead: off
>>> performance.quick-read: off
>>> performance.readdir-ahead: off
>>>
>>> Same issue ...
>>> glusterfs 3.7.8 built on Mar 10 2016 20:20:45.
>>>
>>>
>>> Respectfully
>>> *Mahdi A. Mahdi*
>>>
>>> Systems Administrator
>>> IT. Department
>>> Earthlink Telecommunications <https://www.facebook.com/earthlinktele>
>>>
>>> Cell: 07903316180
>>> Work: 3352
>>> Skype: mahdi.adnan at outlook.com
>>> On 03/14/2016 08:11 AM, Niels de Vos wrote:
>>>
>>> On Mon, Mar 14, 2016 at 08:12:27AM +0530, Krutika Dhananjay wrote:
>>>
>>> It would be better to use sharding over stripe for your vm use case. It
>>> offers better distribution and utilisation of bricks and better heal
>>> performance.
>>> And it is well tested.
>>>
>>> Basically the "striping" feature is deprecated, "sharding" is its
>>> improved replacement. I expect to see "striping" completely dropped in
>>> the next major release.
>>>
>>> Niels
>>>
>>>
>>>
>>> A couple of things to note before you do that:
>>> 1. Most of the bug fixes in sharding have gone into 3.7.8, so it is advised
>>> that you use 3.7.8 or above.
>>> 2. When you enable sharding on a volume, already existing files in the
>>> volume do not get sharded; only files created after sharding is enabled
>>> will be.
>>>     If you do want to shard the existing files, then you would need to cp
>>> them to a temporary name within the volume and then rename them back to the
>>> original file name, as sketched below.
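>>>
>>> A minimal sketch of that copy-and-rename step, run on a client mount of the
>>> volume (hypothetical mount point and file name):
>>>
>>> cp /mnt/vms/vm1-flat.vmdk /mnt/vms/vm1-flat.vmdk.tmp
>>> mv /mnt/vms/vm1-flat.vmdk.tmp /mnt/vms/vm1-flat.vmdk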
>>>
>>> HTH,
>>> Krutika
>>>
>>> On Sun, Mar 13, 2016 at 11:49 PM, Mahdi Adnan <mahdi.adnan at earthlinktele.com> wrote:
>>>
>>> I couldn't find anything related to cache in the HBAs.
>>> What logs are useful in my case? I see only the brick logs, which contain
>>> nothing during the failure.
>>>
>>> ###
>>> [2016-03-13 18:05:19.728614] E [MSGID: 113022] [posix.c:1232:posix_mknod]
>>> 0-vmware-posix: mknod on
>>> /bricks/b003/vmware/.shard/17d75e20-16f1-405e-9fa5-99ee7b1bd7f1.511 failed
>>> [File exists]
>>> [2016-03-13 18:07:23.337086] E [MSGID: 113022] [posix.c:1232:posix_mknod]
>>> 0-vmware-posix: mknod on
>>> /bricks/b003/vmware/.shard/eef2d538-8eee-4e58-bc88-fbf7dc03b263.4095 failed
>>> [File exists]
>>> [2016-03-13 18:07:55.027600] W [trash.c:1922:trash_rmdir] 0-vmware-trash:
>>> rmdir issued on /.trashcan/, which is not permitted
>>> [2016-03-13 18:07:55.027635] I [MSGID: 115056]
>>> [server-rpc-fops.c:459:server_rmdir_cbk] 0-vmware-server: 41987: RMDIR
>>> /.trashcan/internal_op (00000000-0000-0000-0000-000000000005/internal_op)
>>> ==> (Operation not permitted) [Operation not permitted]
>>> [2016-03-13 18:11:34.353441] I [login.c:81:gf_auth] 0-auth/login: allowed
>>> user names: c0c72c37-477a-49a5-a305-3372c1c2f2b4
>>> [2016-03-13 18:11:34.353463] I [MSGID: 115029]
>>> [server-handshake.c:612:server_setvolume] 0-vmware-server: accepted client
>>> from gfs002-2727-2016/03/13-20:17:43:613597-vmware-client-4-0-0 (version:
>>> 3.7.8)
>>> [2016-03-13 18:11:34.591139] I [login.c:81:gf_auth] 0-auth/login: allowed
>>> user names: c0c72c37-477a-49a5-a305-3372c1c2f2b4
>>> [2016-03-13 18:11:34.591173] I [MSGID: 115029]
>>> [server-handshake.c:612:server_setvolume] 0-vmware-server: accepted client
>>> from gfs002-2719-2016/03/13-20:17:42:609388-vmware-client-4-0-0 (version:
>>> 3.7.8)
>>> ###
>>>
>>> ESXi just keeps telling me "Cannot clone T: The virtual disk is either
>>> corrupted or not a supported format.
>>> error
>>> 3/13/2016 9:06:20 PM
>>> Clone virtual machine
>>> T
>>> VCENTER.LOCAL\Administrator
>>> "
>>>
>>> My setup is two servers with a floating IP controlled by CTDB, and my ESXi
>>> server mounts the NFS export via the floating IP.
>>>
>>>
>>>
>>>
>>>
>>> On 03/13/2016 08:40 PM, pkoelle wrote:
>>>
>>>
>>> On 13.03.2016 at 18:22, David Gossage wrote:
>>>
>>>
>>> On Sun, Mar 13, 2016 at 11:07 AM, Mahdi Adnan <mahdi.adnan at earthlinktele.com> wrote:
>>>
>>>
>>> My HBAs are LSISAS1068E, and the filesystem is XFS.
>>>
>>> I tried EXT4 and it did not help.
>>> I have created a striped volume on one server with two bricks, same issue.
>>> I also tried a replicated volume with just sharding enabled, same issue;
>>> as soon as I disable sharding it works just fine. Neither sharding nor
>>> striping works for me.
>>> I did follow up on some of the threads in the mailing list and tried some
>>> of the fixes that worked for others; none worked for me. :(
>>>
>>>
>>>
>>> Is it possible the LSI has write-cache enabled?
>>>
>>>
>>> Why is that relevant? Even the backing filesystem has no idea whether there
>>> is a RAID or write cache or whatever. There are blocks and sync(), end of
>>> story.
>>> If you lose power and screw up your recovery, or do funky stuff with SAS
>>> multipathing, that might be an issue with a controller cache. AFAIK that's
>>> not what we are talking about.
>>>
>>> I'm afraid that unless the OP has some logs from the server, a
>>> reproducible test case, or a backtrace from the client or server, this
>>> isn't getting us anywhere.
>>>
>>> cheers
>>> Paul
>>>
>>>
>>>
>>> On 03/13/2016 06:54 PM, David Gossage wrote:
>>>
>>> On Sun, Mar 13, 2016 at 8:16 AM, Mahdi Adnan <mahdi.adnan at earthlinktele.com> wrote:
>>>
>>> Okay, so I have enabled sharding in my test volume and it did not help.
>>>
>>> Stupidly enough, I have enabled it in a production volume
>>> (Distributed-Replicate) and it corrupted half of my VMs.
>>> I have updated Gluster to the latest version and nothing seems to have
>>> changed in my situation.
>>> Below is the info of my volume;
>>>
>>>
>>>
>>> I was pointing at the settings in that email as an example for fixing
>>> corruption. I wouldn't recommend enabling sharding if you haven't gotten
>>> the base working yet on that cluster. What HBAs are you using, and what is
>>> the filesystem layout for the bricks?
>>>
>>>
>>> Number of Bricks: 3 x 2 = 6
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: gfs001:/bricks/b001/vmware
>>> Brick2: gfs002:/bricks/b004/vmware
>>> Brick3: gfs001:/bricks/b002/vmware
>>> Brick4: gfs002:/bricks/b005/vmware
>>> Brick5: gfs001:/bricks/b003/vmware
>>> Brick6: gfs002:/bricks/b006/vmware
>>> Options Reconfigured:
>>> performance.strict-write-ordering: on
>>> cluster.server-quorum-type: server
>>> cluster.quorum-type: auto
>>> network.remote-dio: enable
>>> performance.stat-prefetch: disable
>>> performance.io-cache: off
>>> performance.read-ahead: off
>>> performance.quick-read: off
>>> cluster.eager-lock: enable
>>> features.shard-block-size: 16MB
>>> features.shard: on
>>> performance.readdir-ahead: off
>>>
>>>
>>> On 03/12/2016 08:11 PM, David Gossage wrote:
>>>
>>>
>>> On Sat, Mar 12, 2016 at 10:21 AM, Mahdi Adnan <mahdi.adnan at earthlinktele.com> wrote:
>>>
>>> Both servers have HBAs, no RAID, and I can set up a replicated or
>>> dispersed volume without any issues.
>>> Logs are clean, and when I tried to migrate a VM and got the error,
>>> nothing showed up in the logs.
>>> I tried mounting the volume on my laptop and it mounted fine, but if I
>>> use dd to create a data file it just hangs and I can't cancel it, and I
>>> can't unmount it or anything; I just have to reboot.
>>> The same servers have another volume on other bricks in a distributed
>>> replica setup, which works fine.
>>> I have even tried the same setup in a virtual environment (created two
>>> VMs, installed Gluster, and created a replicated striped volume) and
>>> again the same thing, data corruption.
>>>
>>>
>>>
>>> I'd look through the mailing list archives for a topic called "Shard in
>>> Production", I think. The shard portion may not be relevant, but it does
>>> discuss certain settings that had to be applied to avoid corruption with
>>> VMs. You may also want to try disabling performance.readdir-ahead; a
>>> sketch follows below.
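>>>
>>> A minimal sketch of toggling that option (hypothetical volume name,
>>> substitute your own):
>>>
>>> gluster volume set myvol performance.readdir-ahead off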
>>>
>>>
>>>
>>>
>>> On 03/12/2016 07:02 PM, David Gossage wrote:
>>>
>>>
>>>
>>> On Sat, Mar 12, 2016 at 9:51 AM, Mahdi Adnan <mahdi.adnan at earthlinktele.com> wrote:
>>>
>>> Thanks David,
>>>
>>> My settings were all defaults; I had just created the pool and started it.
>>> I have applied the settings you recommended and it seems to be the
>>> same issue;
>>>
>>> Type: Striped-Replicate
>>> Volume ID: 44adfd8c-2ed1-4aa5-b256-d12b64f7fc14
>>> Status: Started
>>> Number of Bricks: 1 x 2 x 2 = 4
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: gfs001:/bricks/t1/s
>>> Brick2: gfs002:/bricks/t1/s
>>> Brick3: gfs001:/bricks/t2/s
>>> Brick4: gfs002:/bricks/t2/s
>>> Options Reconfigured:
>>> performance.stat-prefetch: off
>>> network.remote-dio: on
>>> cluster.eager-lock: enable
>>> performance.io-cache: off
>>> performance.read-ahead: off
>>> performance.quick-read: off
>>> performance.readdir-ahead: on
>>>
>>>
>>>
>>> Is there a RAID controller perhaps doing any caching?
>>>
>>> In the Gluster logs, are any errors being reported during the migration
>>> process?
>>> Since they aren't in use yet, have you tested making just mirrored bricks
>>> using different pairings of servers, two at a time, to see if the problem
>>> follows a certain machine or network port?
>>>
>>>
>>>
>>>
>>>
>>> On 03/12/2016 03:25 PM, David Gossage wrote:
>>>
>>>
>>>
>>> On Sat, Mar 12, 2016 at 1:55 AM, Mahdi Adnan <mahdi.adnan at earthlinktele.com> wrote:
>>>
>>> Dears,
>>>
>>> I have created a replicated striped volume with two bricks and two
>>> servers, but I can't use it because when I mount it in ESXi and try to
>>> migrate a VM to it, the data gets corrupted.
>>> Does anyone have any idea why this is happening?
>>>
>>> Dell 2950 x2
>>> Seagate 15k 600GB
>>> CentOS 7.2
>>> Gluster 3.7.8
>>>
>>> Appreciate your help.
>>>
>>>
>>>
>>> Most reports of this I have seen end up being settings related. Post your
>>> gluster volume info. Below is what I have seen as the most commonly
>>> recommended settings; a sketch of applying them follows the list.
>>> I'd hazard a guess you may have some of the read-ahead cache or prefetch
>>> options on.
>>>
>>> quick-read=off
>>> read-ahead=off
>>> io-cache=off
>>> stat-prefetch=off
>>> eager-lock=enable
>>> remote-dio=on
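>>>
>>> A sketch of how those shorthand names map onto the standard Gluster volume
>>> options (hypothetical volume name "myvol"):
>>>
>>> gluster volume set myvol performance.quick-read off
>>> gluster volume set myvol performance.read-ahead off
>>> gluster volume set myvol performance.io-cache off
>>> gluster volume set myvol performance.stat-prefetch off
>>> gluster volume set myvol cluster.eager-lock enable
>>> gluster volume set myvol network.remote-dio on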
>>>
>>>
>>>
>>> Mahdi Adnan
>>> System Admin
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>
>>
>>
>>
>
>

