[Gluster-users] Significant issues after update to 3.7.5 (Centos 6.7)

Sun Oct 11 04:47:43 UTC 2015

After going through the logs (shared off the list), cmd_history.log
indicates that the commit failed locally, so my earlier suspicion that
the command failed remotely is now ruled out.

As the glusterd log doesn't capture any failure for these transactions, I
suspect we have a failure case in the volume start code flow that we don't
log at INFO level.
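
For anyone scanning their own glusterd log for such failures, here is a
minimal sketch in Python. It assumes each log entry sits on a single line in
the same shape as the error quoted further down in this thread (timestamp,
one-letter level, optional MSGID, source location, domain, message). The
names LOG_LINE, LogEntry, parse_entries and failures are hypothetical helpers
made up for this illustration, not part of GlusterFS.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Pattern modeled on the one glusterd log line quoted in this thread
# (assumption: real entries are unwrapped, one per line).
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+"                  # [2015-10-10 09:40:59.600974]
    r"(?P<level>[A-Z])\s+"                     # E
    r"(?:\[MSGID:\s*(?P<msgid>\d+)\]\s+)?"     # [MSGID: 106123] (optional)
    r"\[(?P<source>[^\]]+)\]\s+"               # [file.c:line:function]
    r"(?P<domain>[^:]+):\s*"                   # 0-management:
    r"(?P<message>.*)$"                        # free-form message text
)


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    msgid: Optional[str]
    source: str
    domain: str
    message: str


def parse_entries(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield a LogEntry for every line that matches LOG_LINE; skip the rest."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            yield LogEntry(
                match["ts"], match["level"], match["msgid"],
                match["source"], match["domain"], match["message"],
            )


def failures(lines: Iterable[str], levels=("E",)) -> list:
    """Return entries whose level letter is in `levels`.

    "E" is the level letter seen in this thread; pass other letters
    explicitly if your log uses them.
    """
    return [entry for entry in parse_entries(lines) if entry.level in levels]


# The error line from this thread, joined back onto a single line.
sample = (
    "[2015-10-10 09:40:59.600974] E [MSGID: 106123] "
    "[glusterd-syncop.c:1404:gd_commit_op_phase] 0-management: "
    "Commit of operation 'Volume Start' failed on localhost"
)

for entry in failures([sample]):
    print(entry.timestamp, entry.source, entry.message)
```

To run it against a real log, pass an open file object to failures() in
place of the one-element list. If the failing code path only logs at DEBUG,
this filter will of course find nothing until DEBUG logging is enabled.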

I'll try to reproduce this and get back to you. However, if you can
recreate the issue while running GlusterD with DEBUG logging enabled, that
could give us some clues. You should also open a bug to track it. The
guidelines for opening a bug are here [1].

[1]
http://www.gluster.org/community/documentation/index.php/Bug_reporting_guidelines

Thanks,
Atin
On 10/11/2015 01:25 AM, Mauro M. wrote:
> Hi Atin,
> 
> Thank you for your reply. The error on the CLI was the same as the one
> posted below. I looked through the logs and found no further clues.
> 
> I have now re-installed the previous version and I disabled further updates.
> I use the packages for CentOS 6 and CentOS 7 from elrepo:
> 
> [glusterfs-epel]
> name=GlusterFS is a clustered file-system capable of scaling to several petabytes.
> baseurl=http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/epel-$releasever/$basearch/
> enabled=0
> skip_if_unavailable=1
> gpgcheck=0
> 
> 
> On Sat, October 10, 2015 17:15, Atin Mukherjee wrote:
>> What happened here is that one of the nodes acked negatively, which led to
>> an inconsistent state, as GlusterD doesn't have a transaction rollback
>> mechanism. This is why subsequent commands on the volume failed.
>>
>> We'd need to see why the other node didn't behave correctly. What error
>> was thrown at the CLI when the volume start failed? Could you attach the
>> glusterd log and cmd_history.log files from both nodes?
>>
>> -Atin
>> On Oct 10, 2015 9:35 PM, "Mauro M." <gluster at ezplanet.net> wrote:
>>>
>>> Hello,
>>>
>>> Today I received the update to 3.7.5, and since the update I have had
>>> serious issues. My cluster has two bricks with replication.
>>>
>>> With both bricks up I could not start the volume, which had been stopped
>>> soon after the update. By taking one of the nodes down I finally managed
>>> to start the volume, but with the following error:
>>>
>>> [2015-10-10 09:40:59.600974] E [MSGID: 106123]
>>> [glusterd-syncop.c:1404:gd_commit_op_phase] 0-management: Commit of
>>> operation 'Volume Start' failed on localhost
>>>
>>> At that point clients could mount the filesystem; however,
>>> # gluster volume status
>>> showed the volume as stopped.
>>>
>>> If I stopped and started the volume again, I got the same problem, but if
>>> I then issued "volume start myvolume" once more, it would show as started!
>>>
>>> With both bricks up and running, however, there is no way to start the
>>> volume once it is stopped. Only if I take one of the bricks down can I
>>> start it with the procedure above.
>>>
>>> I am downgrading to 3.7.4.
>>>
>>> If you have not yet upgraded, BEWARE!