[Gluster-users] Replicated striped data loss

Mahdi Adnan mahdi.adnan at earthlinktele.com
Mon Mar 14 08:14:31 UTC 2016


So I have deployed a new server (Cisco UCS C220M4) and created a new volume:

Volume Name: testv
Type: Stripe
Volume ID: 55cdac79-fe87-4f1f-90c0-15c9100fe00b
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.70.0.250:/mnt/b1/v
Brick2: 10.70.0.250:/mnt/b2/v
Options Reconfigured:
nfs.disable: off
features.shard-block-size: 64MB
features.shard: enable
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.readdir-ahead: off

Same error.

Can anyone share the info of a working striped volume?
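For reference, a minimal sketch of how a two-brick replicated volume with sharding could be created instead of a striped one, per the advice further down in this thread; the volume name, hostnames and brick paths are placeholders:

# create and start a 2-brick replica volume across two servers
gluster volume create vmvol replica 2 gfs001:/bricks/b1/vmvol gfs002:/bricks/b1/vmvol
gluster volume start vmvol

# enable sharding before any VM images are written
gluster volume set vmvol features.shard on
gluster volume set vmvol features.shard-block-size 64MB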

On 03/14/2016 09:02 AM, Mahdi Adnan wrote:
> I have a volume with two bricks on the same server:
>
> Volume Name: k
> Type: Stripe
> Volume ID: 1e9281ce-2a8b-44e8-a0c6-e3ebf7416b2b
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: gfs001:/bricks/t1/k
> Brick2: gfs001:/bricks/t2/k
> Options Reconfigured:
> features.shard-block-size: 64MB
> features.shard: on
> cluster.server-quorum-type: server
> cluster.quorum-type: auto
> network.remote-dio: enable
> cluster.eager-lock: enable
> performance.stat-prefetch: off
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> performance.readdir-ahead: off
>
> Same issue.
> glusterfs 3.7.8 built on Mar 10 2016 20:20:45.
>
>
> Respectfully,
> Mahdi A. Mahdi
>
> Systems Administrator
> IT. Department
> Earthlink Telecommunications <https://www.facebook.com/earthlinktele>
>
> Cell: 07903316180
> Work: 3352
> Skype: mahdi.adnan at outlook.com
> On 03/14/2016 08:11 AM, Niels de Vos wrote:
>> On Mon, Mar 14, 2016 at 08:12:27AM +0530, Krutika Dhananjay wrote:
>>> It would be better to use sharding over stripe for your vm use case. It
>>> offers better distribution and utilisation of bricks and better heal
>>> performance.
>>> And it is well tested.
>> Basically, the "striping" feature is deprecated; "sharding" is its
>> improved replacement. I expect to see "striping" completely dropped in
>> the next major release.
>>
>> Niels
>>
>>
>>> Couple of things to note before you do that:
>>> 1. Most of the bug fixes in sharding have gone into 3.7.8. So it is advised
>>> that you use 3.7.8 or above.
>>> 2. When you enable sharding on a volume, already existing files in the
>>> volume do not get sharded. Only the files that are newly created from the
>>> time sharding is enabled will.
>>>      If you do want to shard the existing files, then you would need to cp
>>> them to a temp name within the volume, and then rename them back to the
>>> original file name.
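>>> A minimal sketch of that copy-and-rename step, assuming the volume is
>>> FUSE-mounted at /mnt/vol (placeholder path) and the file is not in use
>>> while it is copied:
>>>
>>> cd /mnt/vol
>>> # the copy is written as a new file, so it gets sharded
>>> cp --sparse=always vm-disk.img vm-disk.img.tmp
>>> # then rename it back over the original name
>>> mv vm-disk.img.tmp vm-disk.img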
>>>
>>> HTH,
>>> Krutika
>>>
>>> On Sun, Mar 13, 2016 at 11:49 PM, Mahdi Adnan <mahdi.adnan at earthlinktele.com
>>>> wrote:
>>>> I couldn't find anything related to cache in the HBAs.
>>>> What logs are useful in my case? I see only the brick logs, which contain
>>>> nothing during the failure.
>>>>
>>>> ###
>>>> [2016-03-13 18:05:19.728614] E [MSGID: 113022] [posix.c:1232:posix_mknod]
>>>> 0-vmware-posix: mknod on
>>>> /bricks/b003/vmware/.shard/17d75e20-16f1-405e-9fa5-99ee7b1bd7f1.511 failed
>>>> [File exists]
>>>> [2016-03-13 18:07:23.337086] E [MSGID: 113022] [posix.c:1232:posix_mknod]
>>>> 0-vmware-posix: mknod on
>>>> /bricks/b003/vmware/.shard/eef2d538-8eee-4e58-bc88-fbf7dc03b263.4095 failed
>>>> [File exists]
>>>> [2016-03-13 18:07:55.027600] W [trash.c:1922:trash_rmdir] 0-vmware-trash:
>>>> rmdir issued on /.trashcan/, which is not permitted
>>>> [2016-03-13 18:07:55.027635] I [MSGID: 115056]
>>>> [server-rpc-fops.c:459:server_rmdir_cbk] 0-vmware-server: 41987: RMDIR
>>>> /.trashcan/internal_op (00000000-0000-0000-0000-000000000005/internal_op)
>>>> ==> (Operation not permitted) [Operation not permitted]
>>>> [2016-03-13 18:11:34.353441] I [login.c:81:gf_auth] 0-auth/login: allowed
>>>> user names: c0c72c37-477a-49a5-a305-3372c1c2f2b4
>>>> [2016-03-13 18:11:34.353463] I [MSGID: 115029]
>>>> [server-handshake.c:612:server_setvolume] 0-vmware-server: accepted client
>>>> from gfs002-2727-2016/03/13-20:17:43:613597-vmware-client-4-0-0 (version:
>>>> 3.7.8)
>>>> [2016-03-13 18:11:34.591139] I [login.c:81:gf_auth] 0-auth/login: allowed
>>>> user names: c0c72c37-477a-49a5-a305-3372c1c2f2b4
>>>> [2016-03-13 18:11:34.591173] I [MSGID: 115029]
>>>> [server-handshake.c:612:server_setvolume] 0-vmware-server: accepted client
>>>> from gfs002-2719-2016/03/13-20:17:42:609388-vmware-client-4-0-0 (version:
>>>> 3.7.8)
>>>> ###
>>>>
>>>> ESXi just keeps telling me "Cannot clone T: The virtual disk is either
>>>> corrupted or not a supported format.
>>>> error
>>>> 3/13/2016 9:06:20 PM
>>>> Clone virtual machine
>>>> T
>>>> VCENTER.LOCAL\Administrator
>>>> "
>>>>
>>>> My setup is two servers with a floating IP controlled by CTDB, and my ESXi
>>>> host mounts the NFS export via the floating IP.
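>>>> As a rough illustration of the CTDB side of such a setup (the addresses
>>>> and interface name below are placeholders; the paths are the standard
>>>> CTDB config files):
>>>>
>>>> # /etc/ctdb/nodes -- private IPs of the two gluster/NFS servers
>>>> 10.70.0.250
>>>> 10.70.0.251
>>>>
>>>> # /etc/ctdb/public_addresses -- the floating IP that ESXi mounts NFS through
>>>> 10.70.0.100/24 eth0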
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 03/13/2016 08:40 PM, pkoelle wrote:
>>>>
>>>>> Am 13.03.2016 um 18:22 schrieb David Gossage:
>>>>>
>>>>>> On Sun, Mar 13, 2016 at 11:07 AM, Mahdi Adnan <
>>>>>> mahdi.adnan at earthlinktele.com
>>>>>>
>>>>>>> wrote:
>>>>>>>
>>>>>>> My HBAs are LSISAS1068E, and the filesystem is XFS.
>>>>>>> I tried EXT4 and it did not help.
>>>>>>> I have created a striped volume on one server with two bricks; same issue.
>>>>>>> And I tried a replicated volume with just sharding enabled; same issue.
>>>>>>> As soon as I disable sharding it works just fine. Neither sharding nor
>>>>>>> striping works for me.
>>>>>>> I did follow up on some threads in the mailing list and tried some of
>>>>>>> the fixes that worked for others; none worked for me. :(
>>>>>>>
>>>>>>>
>>>>>> Is it possible the LSI has write-cache enabled?
>>>>>>
>>>>> Why is that relevant? Even the backing filesystem has no idea if there is
>>>>> a RAID or write cache or whatever. There are blocks and sync(), end of
>>>>> story.
>>>>> If you lose power and screw up your recovery, OR do funky stuff with SAS
>>>>> multipathing, that might be an issue with a controller cache. AFAIK that's
>>>>> not what we are talking about.
>>>>>
>>>>> I'm afraid that unless the OP has some logs from the server, a
>>>>> reproducible test case, or a backtrace from the client or server, this
>>>>> isn't getting us anywhere.
>>>>>
>>>>> cheers
>>>>> Paul
>>>>>
>>>>>
>>>>>>
>>>>>> On 03/13/2016 06:54 PM, David Gossage wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Mar 13, 2016 at 8:16 AM, Mahdi Adnan <
>>>>>>> mahdi.adnan at earthlinktele.com> wrote:
>>>>>>>
>>>>>>>> Okay, so I have enabled sharding on my test volume and it did not help;
>>>>>>>> stupidly enough, I enabled it on a production volume
>>>>>>>> (Distributed-Replicate) and it corrupted half of my VMs.
>>>>>>>> I have updated Gluster to the latest and nothing seems to have changed
>>>>>>>> in my situation.
>>>>>>>> Below is the info of my volume:
>>>>>>>>
>>>>>>>>
>>>>>>> I was pointing at the settings in that email as an example for fixing
>>>>>>> corruption. I wouldn't recommend enabling sharding if you haven't gotten
>>>>>>> the base working yet on that cluster. What HBAs are you using, and what
>>>>>>> is the layout of the filesystem for the bricks?
>>>>>>>
>>>>>>>
>>>>>>>> Number of Bricks: 3 x 2 = 6
>>>>>>>> Transport-type: tcp
>>>>>>>> Bricks:
>>>>>>>> Brick1: gfs001:/bricks/b001/vmware
>>>>>>>> Brick2: gfs002:/bricks/b004/vmware
>>>>>>>> Brick3: gfs001:/bricks/b002/vmware
>>>>>>>> Brick4: gfs002:/bricks/b005/vmware
>>>>>>>> Brick5: gfs001:/bricks/b003/vmware
>>>>>>>> Brick6: gfs002:/bricks/b006/vmware
>>>>>>>> Options Reconfigured:
>>>>>>>> performance.strict-write-ordering: on
>>>>>>>> cluster.server-quorum-type: server
>>>>>>>> cluster.quorum-type: auto
>>>>>>>> network.remote-dio: enable
>>>>>>>> performance.stat-prefetch: disable
>>>>>>>> performance.io-cache: off
>>>>>>>> performance.read-ahead: off
>>>>>>>> performance.quick-read: off
>>>>>>>> cluster.eager-lock: enable
>>>>>>>> features.shard-block-size: 16MB
>>>>>>>> features.shard: on
>>>>>>>> performance.readdir-ahead: off
>>>>>>>>
>>>>>>>>
>>>>>>>> On 03/12/2016 08:11 PM, David Gossage wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Mar 12, 2016 at 10:21 AM, Mahdi Adnan <
>>>>>>>> mahdi.adnan at earthlinktele.com> wrote:
>>>>>>>>
>>>>>>>>> Both servers have HBAs, no RAID, and I can set up a replicated or
>>>>>>>>> dispersed volume without any issues.
>>>>>>>>> The logs are clean, and when I tried to migrate a VM and got the error,
>>>>>>>>> nothing showed up in the logs.
>>>>>>>>> I tried mounting the volume on my laptop and it mounted fine, but if I
>>>>>>>>> use dd to create a data file it just hangs and I can't cancel it, and I
>>>>>>>>> can't unmount it or anything; I just have to reboot.
>>>>>>>>> The same servers have another volume on other bricks in a distributed
>>>>>>>>> replica, and it works fine.
>>>>>>>>> I have even tried the same setup in a virtual environment (created two
>>>>>>>>> VMs, installed Gluster, and created a replicated striped volume) and
>>>>>>>>> again the same thing: data corruption.
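>>>>>>>>> A sketch of the kind of mount-and-dd test described above; the server,
>>>>>>>>> volume name, mount point and size are placeholders, and oflag=direct
>>>>>>>>> keeps the client page cache out of the picture:
>>>>>>>>>
>>>>>>>>> mount -t glusterfs gfs001:/k /mnt/k
>>>>>>>>> dd if=/dev/zero of=/mnt/k/testfile bs=1M count=1024 oflag=direct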
>>>>>>>>>
>>>>>>>>>
>>>>>>>> I'd look through the mail archives for a topic called "Shard in
>>>>>>>> Production", I think. The shard portion may not be relevant, but it does
>>>>>>>> discuss certain settings that had to be applied to avoid corruption with
>>>>>>>> VMs. You may want to try disabling performance.readdir-ahead as well.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 03/12/2016 07:02 PM, David Gossage wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, Mar 12, 2016 at 9:51 AM, Mahdi Adnan <
>>>>>>>>> mahdi.adnan at earthlinktele.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks David.
>>>>>>>>>> My settings are all defaults; I had just created the volume and
>>>>>>>>>> started it.
>>>>>>>>>> I have applied the settings you recommended and it seems to be the
>>>>>>>>>> same issue:
>>>>>>>>>>
>>>>>>>>>> Type: Striped-Replicate
>>>>>>>>>> Volume ID: 44adfd8c-2ed1-4aa5-b256-d12b64f7fc14
>>>>>>>>>> Status: Started
>>>>>>>>>> Number of Bricks: 1 x 2 x 2 = 4
>>>>>>>>>> Transport-type: tcp
>>>>>>>>>> Bricks:
>>>>>>>>>> Brick1: gfs001:/bricks/t1/s
>>>>>>>>>> Brick2: gfs002:/bricks/t1/s
>>>>>>>>>> Brick3: gfs001:/bricks/t2/s
>>>>>>>>>> Brick4: gfs002:/bricks/t2/s
>>>>>>>>>> Options Reconfigured:
>>>>>>>>>> performance.stat-prefetch: off
>>>>>>>>>> network.remote-dio: on
>>>>>>>>>> cluster.eager-lock: enable
>>>>>>>>>> performance.io-cache: off
>>>>>>>>>> performance.read-ahead: off
>>>>>>>>>> performance.quick-read: off
>>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Is there a RAID controller perhaps doing any caching?
>>>>>>>>>
>>>>>>>>> In the gluster logs, are any errors being reported during the migration
>>>>>>>>> process? Since they aren't in use yet, have you tested making just
>>>>>>>>> mirrored bricks using different pairings of servers, two at a time, to
>>>>>>>>> see if the problem follows a certain machine or network port?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 03/12/2016 03:25 PM, David Gossage wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sat, Mar 12, 2016 at 1:55 AM, Mahdi Adnan <
>>>>>>>>>> mahdi.adnan at earthlinktele.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Dear all,
>>>>>>>>>>> I have created a replicated striped volume with two bricks and two
>>>>>>>>>>> servers, but I can't use it because when I mount it in ESXi and try
>>>>>>>>>>> to migrate a VM to it, the data gets corrupted.
>>>>>>>>>>> Does anyone have any idea why this is happening?
>>>>>>>>>>>
>>>>>>>>>>> Dell 2950 x2
>>>>>>>>>>> Seagate 15k 600GB
>>>>>>>>>>> CentOS 7.2
>>>>>>>>>>> Gluster 3.7.8
>>>>>>>>>>>
>>>>>>>>>>> Appreciate your help.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> Most reports of this I have seen end up being settings related. Post
>>>>>>>>>> the output of gluster volume info. Below are the most commonly
>>>>>>>>>> recommended settings I have seen.
>>>>>>>>>> I'd hazard a guess you may have some of the read-ahead cache or
>>>>>>>>>> prefetch options on.
>>>>>>>>>>
>>>>>>>>>> quick-read=off
>>>>>>>>>> read-ahead=off
>>>>>>>>>> io-cache=off
>>>>>>>>>> stat-prefetch=off
>>>>>>>>>> eager-lock=enable
>>>>>>>>>> remote-dio=on
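>>>>>>>>>> A minimal sketch of applying these with the gluster CLI ("myvol" is a
>>>>>>>>>> placeholder for the volume name):
>>>>>>>>>>
>>>>>>>>>> gluster volume set myvol performance.quick-read off
>>>>>>>>>> gluster volume set myvol performance.read-ahead off
>>>>>>>>>> gluster volume set myvol performance.io-cache off
>>>>>>>>>> gluster volume set myvol performance.stat-prefetch off
>>>>>>>>>> gluster volume set myvol cluster.eager-lock enable
>>>>>>>>>> gluster volume set myvol network.remote-dio on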
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Mahdi Adnan
>>>>>>>>>>> System Admin
>>>>>>>>>>>
>>>>>>>>>>>
