[Gluster-users] Replicated striped data loss

Mon Mar 14 06:02:40 UTC 2016

I have a pool of two bricks on the same server:

Volume Name: k
Type: Stripe
Volume ID: 1e9281ce-2a8b-44e8-a0c6-e3ebf7416b2b
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gfs001:/bricks/t1/k
Brick2: gfs001:/bricks/t2/k
Options Reconfigured:
features.shard-block-size: 64MB
features.shard: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.readdir-ahead: off

Same issue with this volume.
glusterfs 3.7.8 built on Mar 10 2016 20:20:45.
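
For reference, a minimal sketch of how the reconfigured options listed above could be applied to another test volume. It only builds `gluster volume set <volume> <option> <value>` command lines and prints them; nothing is executed. The volume name `k` and the option values are copied from the output above; the helper name `volume_set_commands` is made up for illustration.

```python
import shlex

# Options copied from the "Options Reconfigured" list above, in the same order.
OPTIONS = {
    "features.shard-block-size": "64MB",
    "features.shard": "on",
    "cluster.server-quorum-type": "server",
    "cluster.quorum-type": "auto",
    "network.remote-dio": "enable",
    "cluster.eager-lock": "enable",
    "performance.stat-prefetch": "off",
    "performance.io-cache": "off",
    "performance.read-ahead": "off",
    "performance.quick-read": "off",
    "performance.readdir-ahead": "off",
}


def volume_set_commands(volume, options):
    """Return one `gluster volume set` command line per option (hypothetical helper)."""
    return [
        shlex.join(["gluster", "volume", "set", volume, key, value])
        for key, value in options.items()
    ]


if __name__ == "__main__":
    # Print the commands for review; run them by hand on a Gluster server.
    for line in volume_set_commands("k", OPTIONS):
        print(line)
```

Printing the commands instead of running them keeps the sketch safe to try on a machine without Gluster installed.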

Respectfully,
Mahdi A. Mahdi

Systems Administrator
IT Department
Earthlink Telecommunications <https://www.facebook.com/earthlinktele>

Cell: 07903316180
Work: 3352
Skype: mahdi.adnan at outlook.com

On 03/14/2016 08:11 AM, Niels de Vos wrote:
> On Mon, Mar 14, 2016 at 08:12:27AM +0530, Krutika Dhananjay wrote:
>> It would be better to use sharding over stripe for your vm use case. It
>> offers better distribution and utilisation of bricks and better heal
>> performance.
>> And it is well tested.
> Basically the "striping" feature is deprecated, "sharding" is its
> improved replacement. I expect to see "striping" completely dropped in
> the next major release.
>
> Niels
>
>
>> Couple of things to note before you do that:
>> 1. Most of the bug fixes in sharding have gone into 3.7.8. So it is advised
>> that you use 3.7.8 or above.
>> 2. When you enable sharding on a volume, already existing files in the
>> volume do not get sharded. Only the files that are newly created from the
>> time sharding is enabled will.
>>      If you do want to shard the existing files, then you would need to cp
>> them to a temp name within the volume, and then rename them back to the
>> original file name.
>>
>> HTH,
>> Krutika
>>
>> On Sun, Mar 13, 2016 at 11:49 PM, Mahdi Adnan <mahdi.adnan at earthlinktele.com> wrote:
>>> I couldn't find anything related to cache in the HBAs.
>>> Which logs are useful in my case? I only see the brick logs, and they
>>> contain nothing from the time of the failure.
>>>
>>> ###
>>> [2016-03-13 18:05:19.728614] E [MSGID: 113022] [posix.c:1232:posix_mknod]
>>> 0-vmware-posix: mknod on
>>> /bricks/b003/vmware/.shard/17d75e20-16f1-405e-9fa5-99ee7b1bd7f1.511 failed
>>> [File exists]
>>> [2016-03-13 18:07:23.337086] E [MSGID: 113022] [posix.c:1232:posix_mknod]
>>> 0-vmware-posix: mknod on
>>> /bricks/b003/vmware/.shard/eef2d538-8eee-4e58-bc88-fbf7dc03b263.4095 failed
>>> [File exists]
>>> [2016-03-13 18:07:55.027600] W [trash.c:1922:trash_rmdir] 0-vmware-trash:
>>> rmdir issued on /.trashcan/, which is not permitted
>>> [2016-03-13 18:07:55.027635] I [MSGID: 115056]
>>> [server-rpc-fops.c:459:server_rmdir_cbk] 0-vmware-server: 41987: RMDIR
>>> /.trashcan/internal_op (00000000-0000-0000-0000-000000000005/internal_op)
>>> ==> (Operation not permitted) [Operation not permitted]
>>> [2016-03-13 18:11:34.353441] I [login.c:81:gf_auth] 0-auth/login: allowed
>>> user names: c0c72c37-477a-49a5-a305-3372c1c2f2b4
>>> [2016-03-13 18:11:34.353463] I [MSGID: 115029]
>>> [server-handshake.c:612:server_setvolume] 0-vmware-server: accepted client
>>> from gfs002-2727-2016/03/13-20:17:43:613597-vmware-client-4-0-0 (version:
>>> 3.7.8)
>>> [2016-03-13 18:11:34.591139] I [login.c:81:gf_auth] 0-auth/login: allowed
>>> user names: c0c72c37-477a-49a5-a305-3372c1c2f2b4
>>> [2016-03-13 18:11:34.591173] I [MSGID: 115029]
>>> [server-handshake.c:612:server_setvolume] 0-vmware-server: accepted client
>>> from gfs002-2719-2016/03/13-20:17:42:609388-vmware-client-4-0-0 (version:
>>> 3.7.8)
>>> ###
>>>
>>> ESXi just keeps telling me "Cannot clone T: The virtual disk is either
>>> corrupted or not a supported format.
>>> error
>>> 3/13/2016 9:06:20 PM
>>> Clone virtual machine
>>> T
>>> VCENTER.LOCAL\Administrator
>>> "
>>>
>>> My setup is two servers with a floating IP controlled by CTDB, and my ESXi
>>> server mounts the NFS export via the floating IP.
>>>
>>>
>>>
>>>
>>>
>>> On 03/13/2016 08:40 PM, pkoelle wrote:
>>>
>>>> On 13.03.2016 at 18:22, David Gossage wrote:
>>>>
>>>>> On Sun, Mar 13, 2016 at 11:07 AM, Mahdi Adnan <mahdi.adnan at earthlinktele.com> wrote:
>>>>>
>>>>>> My HBAs are LSISAS1068E, and the filesystem is XFS.
>>>>>> I tried EXT4 and it did not help.
>>>>>> I created a striped volume on one server with two bricks: same issue.
>>>>>> I also tried a replicated volume with just sharding enabled: same issue.
>>>>>> As soon as I disable sharding it works just fine. Neither sharding nor
>>>>>> striping works for me.
>>>>>> I followed up on some of the threads in the mailing list and tried some of
>>>>>> the fixes that worked for others; none worked for me. :(
>>>>>>
>>>>>>
>>>>> Is it possible the LSI has write-cache enabled?
>>>>>
>>>> Why is that relevant? Even the backing filesystem has no idea if there is
>>>> a RAID or write cache or whatever. There are blocks and sync(), end of
>>>> story.
>>>> If you lose power and screw up your recovery OR do funky stuff with SAS
>>>> multipathing, that might be an issue with a controller cache. AFAIK that's
>>>> not what we are talking about.
>>>>
>>>> I'm afraid that unless the OP has some logs from the server, a
>>>> reproducible test case, or a backtrace from client or server, this isn't
>>>> getting us anywhere.
>>>>
>>>> cheers
>>>> Paul
>>>>
>>>>
>>>>>
>>>>>
>>>>> On 03/13/2016 06:54 PM, David Gossage wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, Mar 13, 2016 at 8:16 AM, Mahdi Adnan <
>>>>>> mahdi.adnan at earthlinktele.com> wrote:
>>>>>>
>>>>>>> Okay, so I enabled sharding on my test volume and it did not help.
>>>>>>> Stupidly enough, I also enabled it on a production "Distributed-Replicate"
>>>>>>> volume, and it corrupted half of my VMs.
>>>>>>> I have updated Gluster to the latest version and nothing seems to have
>>>>>>> changed in my situation.
>>>>>>> Below is the info for my volume:
>>>>>>>
>>>>>>>
>>>>>> I was pointing at the settings in that email as an example of fixing
>>>>>> corruption. I wouldn't recommend enabling sharding if you haven't gotten the
>>>>>> base working yet on that cluster. What HBAs are you using, and what is the
>>>>>> filesystem layout for the bricks?
>>>>>>
>>>>>>
>>>>>>> Number of Bricks: 3 x 2 = 6
>>>>>>> Transport-type: tcp
>>>>>>> Bricks:
>>>>>>> Brick1: gfs001:/bricks/b001/vmware
>>>>>>> Brick2: gfs002:/bricks/b004/vmware
>>>>>>> Brick3: gfs001:/bricks/b002/vmware
>>>>>>> Brick4: gfs002:/bricks/b005/vmware
>>>>>>> Brick5: gfs001:/bricks/b003/vmware
>>>>>>> Brick6: gfs002:/bricks/b006/vmware
>>>>>>> Options Reconfigured:
>>>>>>> performance.strict-write-ordering: on
>>>>>>> cluster.server-quorum-type: server
>>>>>>> cluster.quorum-type: auto
>>>>>>> network.remote-dio: enable
>>>>>>> performance.stat-prefetch: disable
>>>>>>> performance.io-cache: off
>>>>>>> performance.read-ahead: off
>>>>>>> performance.quick-read: off
>>>>>>> cluster.eager-lock: enable
>>>>>>> features.shard-block-size: 16MB
>>>>>>> features.shard: on
>>>>>>> performance.readdir-ahead: off
>>>>>>>
>>>>>>>
>>>>>>> On 03/12/2016 08:11 PM, David Gossage wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Mar 12, 2016 at 10:21 AM, Mahdi Adnan <mahdi.adnan at earthlinktele.com> wrote:
>>>>>>>
>>>>>>>> Both servers have HBAs, no RAID, and I can set up a replicated or
>>>>>>>> dispersed volume without any issues.
>>>>>>>> The logs are clean; when I tried to migrate a VM and got the error,
>>>>>>>> nothing showed up in the logs.
>>>>>>>> I tried mounting the volume on my laptop and it mounted fine, but if I
>>>>>>>> use dd to create a data file it just hangs and I can't cancel it. I
>>>>>>>> can't unmount it or anything; I just have to reboot.
>>>>>>>> The same servers have another volume on other bricks, a distributed
>>>>>>>> replicated one, and it works fine.
>>>>>>>> I have even tried the same setup in a virtual environment (created two
>>>>>>>> VMs, installed Gluster, and created a replicated striped volume) and
>>>>>>>> again the same thing: data corruption.
>>>>>>>>
>>>>>>>>
>>>>>>> I'd look through the mail archives for a topic called "Shard in
>>>>>>> Production", I think. The shard portion may not be relevant, but it does
>>>>>>> discuss certain settings that had to be applied to avoid corruption with
>>>>>>> VMs. You may also want to try disabling performance.readdir-ahead.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On 03/12/2016 07:02 PM, David Gossage wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Mar 12, 2016 at 9:51 AM, Mahdi Adnan <mahdi.adnan at earthlinktele.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks David,
>>>>>>>>> My settings are all defaults; I have just created the pool and started
>>>>>>>>> it.
>>>>>>>>> I have applied the settings you recommended and it seems to be the
>>>>>>>>> same issue:
>>>>>>>>>
>>>>>>>>> Type: Striped-Replicate
>>>>>>>>> Volume ID: 44adfd8c-2ed1-4aa5-b256-d12b64f7fc14
>>>>>>>>> Status: Started
>>>>>>>>> Number of Bricks: 1 x 2 x 2 = 4
>>>>>>>>> Transport-type: tcp
>>>>>>>>> Bricks:
>>>>>>>>> Brick1: gfs001:/bricks/t1/s
>>>>>>>>> Brick2: gfs002:/bricks/t1/s
>>>>>>>>> Brick3: gfs001:/bricks/t2/s
>>>>>>>>> Brick4: gfs002:/bricks/t2/s
>>>>>>>>> Options Reconfigured:
>>>>>>>>> performance.stat-prefetch: off
>>>>>>>>> network.remote-dio: on
>>>>>>>>> cluster.eager-lock: enable
>>>>>>>>> performance.io-cache: off
>>>>>>>>> performance.read-ahead: off
>>>>>>>>> performance.quick-read: off
>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Is there a RAID controller perhaps doing any caching?
>>>>>>>>
>>>>>>>> Are any errors being reported in the Gluster logs during the migration
>>>>>>>> process?
>>>>>>>> Since they aren't in use yet, have you tested making just mirrored bricks
>>>>>>>> using different pairings of servers, two at a time, to see if the problem
>>>>>>>> follows a certain machine or network port?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 03/12/2016 03:25 PM, David Gossage wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, Mar 12, 2016 at 1:55 AM, Mahdi Adnan <mahdi.adnan at earthlinktele.com> wrote:
>>>>>>>>>
>>>>>>>>>> Dears,
>>>>>>>>>> I have created a replicated striped volume with two bricks and two
>>>>>>>>>> servers, but I can't use it because when I mount it in ESXi and try to
>>>>>>>>>> migrate a VM to it, the data gets corrupted.
>>>>>>>>>> Does anyone have any idea why this is happening?
>>>>>>>>>>
>>>>>>>>>> Dell 2950 x2
>>>>>>>>>> Seagate 15k 600GB
>>>>>>>>>> CentOS 7.2
>>>>>>>>>> Gluster 3.7.8
>>>>>>>>>>
>>>>>>>>>> Appreciate your help.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Most reports of this I have seen end up being settings related. Post
>>>>>>>>> gluster volume info. Below is what I have seen as the most commonly
>>>>>>>>> recommended settings.
>>>>>>>>> I'd hazard a guess you may have some of the read-ahead cache or prefetch
>>>>>>>>> on.
>>>>>>>>>
>>>>>>>>> quick-read=off
>>>>>>>>> read-ahead=off
>>>>>>>>> io-cache=off
>>>>>>>>> stat-prefetch=off
>>>>>>>>> eager-lock=enable
>>>>>>>>> remote-dio=on
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Mahdi Adnan
>>>>>>>>>> System Admin
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>

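Krutika's second note in the quoted thread (copy an existing file to a temporary name within the volume, then rename it back, so that it gets sharded) can be sketched roughly as below. This is only an illustration under assumptions not stated in the thread: the path is on a client mount of the volume (not directly on a brick), sharding is already enabled, and the file is not in use (for a VM image, the VM is powered off). The function name and the temporary suffix are made up.

```python
import os
import shutil


def reshard_in_place(path, suffix=".reshard-tmp"):
    """Copy `path` to a temporary name in the same directory, then rename it back.

    Hypothetical helper: the copy is a newly created file, so on a volume with
    sharding enabled it would be written as shards; the rename then puts it
    back under the original name.
    """
    tmp = path + suffix
    if os.path.exists(tmp):
        raise FileExistsError(tmp)
    try:
        # copy2 copies the data plus mode and timestamps (not ownership).
        shutil.copy2(path, tmp)
    except Exception:
        # Do not leave a partial copy behind if the copy fails.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    # Rename the copy over the original file name.
    os.replace(tmp, path)
    return path
```

The volume needs enough free space for a second copy of the file while the copy runs, and it is worth testing on a scratch file before touching a real disk image.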