[Gluster-users] Another transaction could be in progress

Kaushal M kshlmster at gmail.com
Tue Mar 18 09:05:13 UTC 2014


On Tue, Mar 18, 2014 at 1:51 PM, Franco Broi <franco.broi at iongeo.com> wrote:
>
> Sorry, didn't think to look in the log file, I can see I have bigger
> problems. Last time I saw this was because I had changed an IP address
> but this time all I did was reboot the server. I've checked all the
> files in vols and everything looks good.
>
> [2014-03-18 08:09:18.117040] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-0
> [2014-03-18 08:09:18.117074] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-1
> [2014-03-18 08:09:18.117087] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-2
> [2014-03-18 08:09:18.117097] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-3
> [2014-03-18 08:09:18.117107] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-4
> [2014-03-18 08:09:18.117117] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-5
> [2014-03-18 08:09:18.117128] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-6
> [2014-03-18 08:09:18.117138] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-7
> [2014-03-18 08:09:18.117148] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-8
> [2014-03-18 08:09:18.117158] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-9
> [2014-03-18 08:09:18.117168] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-10
> [2014-03-18 08:09:18.117178] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-11
> [2014-03-18 08:09:18.117196] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-12
> [2014-03-18 08:09:18.117209] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-13
> [2014-03-18 08:09:18.117219] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-14
> [2014-03-18 08:09:18.117229] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-15
>

These log messages aren't actually errors. They always appear during
glusterd startup, and we need to get rid of them. Could you provide the
logs from all the peers for the time you ran the failing command?
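
(On most installs the glusterd log is etc-glusterfs-glusterd.vol.log
under /var/log/glusterfs; that path is an assumption on my part and can
differ if glusterd was started with a custom --log-file.)

    # Default glusterd log location on typical RPM installs; adjust
    # if your glusterd runs with a custom --log-file:
    less /var/log/glusterfs/etc-glusterfs-glusterd.vol.log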

>
> This is from another server
>
> [root at nas1 bricks]# gluster vol status
> Status of volume: data
> Gluster process                                         Port    Online  Pid
> ------------------------------------------------------------------------------
> Brick nas1-10g:/data1/gvol                              49152   Y       17331
> Brick nas2-10g:/data5/gvol                              49160   Y       3933
> Brick nas1-10g:/data2/gvol                              49153   Y       17340
> Brick nas2-10g:/data6/gvol                              49161   Y       3942
> Brick nas1-10g:/data3/gvol                              49154   Y       17350
> Brick nas2-10g:/data7/gvol                              49162   Y       3951
> Brick nas1-10g:/data4/gvol                              49155   Y       17360
> Brick nas2-10g:/data8/gvol                              49163   Y       3960
> Brick nas3-10g:/data9/gvol                              49156   Y       10076
> Brick nas3-10g:/data10/gvol                             49157   Y       10085
> Brick nas3-10g:/data11/gvol                             49158   Y       10094
> Brick nas3-10g:/data12/gvol                             49159   Y       10108
> Brick nas4-10g:/data13/gvol                             N/A     N       8879
> Brick nas4-10g:/data14/gvol                             N/A     N       8884
> Brick nas4-10g:/data15/gvol                             N/A     N       8888
> Brick nas4-10g:/data16/gvol                             N/A     N       8892
> NFS Server on localhost                                 2049    Y       18725
> NFS Server on nas3-10g                                  2049    Y       11667
> NFS Server on nas2-10g                                  2049    Y       4980
> NFS Server on nas4-10g                                  N/A     N       N/A
>
> There are no active volume tasks
>
>
> Any ideas?
>
>
> On Tue, 2014-03-18 at 12:39 +0530, Kaushal M wrote:
>> The lock is an in-memory structure which isn't persisted, so
>> restarting glusterd should reset it. You could also try clearing the
>> lock by attaching gdb to the glusterd process.
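
(For the record, that would look something like the sketch below. The
unlock call is an assumption on my part; check the glusterd sources for
your version for the actual symbol and signature before trying this on
a live daemon.)

    # Sketch only: the unlock helper named here is hypothetical,
    # verify the real symbol in the glusterd sources first.
    gdb -p $(pidof glusterd)
    (gdb) call glusterd_unlock(owner_uuid)   # illustrative name/argument
    (gdb) detach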
>>
>> Since this is happening to you consistently, there is something else
>> that is wrong. Could you please give more details on your cluster? And
>> the glusterd logs of the misbehaving peer (if possible for all the
>> peers). It would help in tracking it down.
>>
>>
>>
>> On Tue, Mar 18, 2014 at 12:24 PM, Franco Broi <franco.broi at iongeo.com> wrote:
>> >
>> > Restarted the glusterd daemons on all 4 servers, still the same.
>> >
>> > It always fails on the same server, and only on that server; it
>> > always works on the others.
>> >
>> > I had to reboot the server in question this morning, perhaps it's got
>> > itself in a funny state.
>> >
>> > Is the lock something that can be examined? And removed?
>> >
>> > On Tue, 2014-03-18 at 12:08 +0530, Kaushal M wrote:
>> >> This mostly occurs when you run two gluster commands simultaneously.
>> >> Gluster uses a lock on each peer to synchronize commands. Any command
>> >> which needs to operate on multiple peers first acquires this lock on
>> >> every peer, and releases it after the operation is done. If a command
>> >> cannot acquire the lock because another command is holding it, it
>> >> fails with the above error message.
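
(In rough C, the semantics are like the sketch below. This is a
simplified illustration, not the actual glusterd code; all the names
are made up for the example.)

    #include <stdio.h>
    #include <string.h>

    /* Simplified sketch of the per-peer cluster lock semantics described
     * above. All names are invented for illustration; the real glusterd
     * implementation tracks the lock owner by peer UUID. */
    static char lock_owner[64] = "";        /* empty means unlocked */

    int cluster_lock_acquire(const char *owner_uuid)
    {
        if (lock_owner[0] != '\0')
            return -1;      /* lock held by another transaction */
        snprintf(lock_owner, sizeof(lock_owner), "%s", owner_uuid);
        return 0;
    }

    void cluster_lock_release(const char *owner_uuid)
    {
        /* Only the holder releases; if a command dies before this point,
         * the lock goes stale until glusterd is restarted. */
        if (strcmp(lock_owner, owner_uuid) == 0)
            lock_owner[0] = '\0';
    }

    int main(void)
    {
        cluster_lock_acquire("uuid-of-command-1");
        if (cluster_lock_acquire("uuid-of-command-2") != 0)
            printf("Another transaction could be in progress. "
                   "Please try again after sometime.\n");
        cluster_lock_release("uuid-of-command-1");
        return 0;
    }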
>> >>
>> >> It sometimes happens that a command fails to release the lock on
>> >> some peers. When this happens, all further commands which need the
>> >> lock will fail with the same error. In this case your only option is
>> >> to restart glusterd on the peers holding the stale lock. This causes
>> >> no downtime, as the brick processes are not affected by restarting
>> >> glusterd.
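
(For example, on each affected peer; which command applies depends on
the distribution and init system.)

    # SysV-style init (e.g. RHEL/CentOS 6):
    service glusterd restart
    # systemd-based systems:
    systemctl restart glusterd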
>> >>
>> >> In your case, since commands work on the other nodes, most likely
>> >> you are running commands simultaneously, or starting a new command
>> >> before an old one finishes.
>> >>
>> >> ~kaushal
>> >>
>> >> On Tue, Mar 18, 2014 at 11:24 AM, Franco Broi <franco.broi at iongeo.com> wrote:
>> >> >
>> >> > What causes this error? And how do I get rid of it?
>> >> >
>> >> > [root at nas4 ~]# gluster vol status
>> >> > Another transaction could be in progress. Please try again after sometime.
>> >> >
>> >> >
>> >> > Looks normal on any other server.
>> >> >
>> >> > _______________________________________________
>> >> > Gluster-users mailing list
>> >> > Gluster-users at gluster.org
>> >> > http://supercolony.gluster.org/mailman/listinfo/gluster-users
>> >
>> >
>
>
