[Bugs] [Bug 1787294] Improve logging in EC, client and lock xlator

bugzilla at redhat.com bugzilla at redhat.com
Thu Nov 5 11:09:51 UTC 2020


https://bugzilla.redhat.com/show_bug.cgi?id=1787294

Pranav Prakash <prprakas at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ON_QA                       |VERIFIED



--- Comment #9 from Pranav Prakash <prprakas at redhat.com> ---
The following scenario was performed to verify the logging improvement.

1. Create a dispersed volume
2. Mount to 2 or more clients
3. Perfom IO on the clients
4. Kill few bricks
5. Bring the bricks up
6. Verify the logs


[root at dhcp43-237 ~]# gluster v status
Status of volume: testvol_dispersed
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.43.188:/gluster/bricks/brick1/t
estvol_dispersed_brick0                     49152     0          Y       107025
Brick 10.70.43.5:/gluster/bricks/brick1/tes
tvol_dispersed_brick1                       49152     0          Y       67553
Brick 10.70.43.245:/gluster/bricks/brick1/t
estvol_dispersed_brick2                     49152     0          Y       67689
Brick 10.70.41.159:/gluster/bricks/brick1/t
estvol_dispersed_brick3                     49152     0          Y       68209
Brick 10.70.43.237:/gluster/bricks/brick1/t
estvol_dispersed_brick4                     49152     0          Y       67796
Brick 10.70.43.224:/gluster/bricks/brick1/t
estvol_dispersed_brick5                     49152     0          Y       67553
Self-heal Daemon on localhost               N/A       N/A        Y       67813
Self-heal Daemon on 10.70.43.245            N/A       N/A        Y       67706
Self-heal Daemon on 10.70.43.224            N/A       N/A        Y       67571
Self-heal Daemon on 10.70.41.159            N/A       N/A        Y       68226
Self-heal Daemon on 10.70.43.5              N/A       N/A        Y       67570
Self-heal Daemon on dhcp43-188.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       107042

Task Status of Volume testvol_dispersed
------------------------------------------------------------------------------
There are no active volume tasks


Performed IO on clients, brought bricks down on couple of nodes and then up. 

----------
OBSERVATIONS:

The logs, for e.g 'remote option failed' log in the previous versions, are now
providing additional information, thus improving the logging. 
Below is one eg. 

```
[2020-11-03 05:35:32.171859] W [MSGID: 114031]
[client-rpc-fops_v2.c:2635:client4_0_lookup_cbk] 0-testvol_dispersed-client-2:
remote operation failed. Path: /user1/testfile1.txt
(00000000-0000-0000-0000-000000000000) [Transport endpoint is not connected]

[2020-11-03 05:37:34.346233] W [MSGID: 114031]
[client-rpc-fops_v2.c:2116:client4_0_create_cbk] 3-testvol_dispersed-client-2:
remote operation failed. Path: /file114 [Input/output error]

```

Now additional information about which operation and why it is failed can be
identified from the logs.

Similarly,
```
[2020-11-03 05:37:34.737417] W [MSGID: 122053]
[ec-common.c:329:ec_check_status] 3-testvol_dispersed-disperse-0: Operation
failed on 1 of 6 subvolumes.(up=111111, mask=111111, remaining=000000,
good=111011, bad=000100, FOP : 'CREATE' failed on '/file116' with gfid
00000000-0000-0000-0000-000000000000)
[2020-11-03 05:37:34.741320] W [MSGID: 122053]
[ec-common.c:329:ec_check_status] 3-testvol_dispersed-disperse-0: Operation
failed on 1 of 6 subvolumes.(up=111111, mask=111011, remaining=000000,
good=111011, bad=000100, FOP : 'FLUSH' failed on gfid
09c1992b-3190-45b0-9ab8-b39e27e12e32)
[2020-11-03 05:37:34.761148] W [MSGID: 122053]
[ec-common.c:329:ec_check_status] 3-testvol_dispersed-disperse-0: Operation
failed on 1 of 6 subvolumes.(up=111111, mask=111011, remaining=000000,
good=111011, bad=000100, FOP : 'WRITE' failed on gfid
09c1992b-3190-45b0-9ab8-b39e27e12e32)

[2020-11-03 05:38:00.132336] W [MSGID: 122053]
[ec-common.c:329:ec_check_status] 3-testvol_dispersed-disperse-0: Operation
failed on 1 of 6 subvolumes.(up=101111, mask=101111, remaining=000000,
good=101011, bad=000100, FOP : 'LOOKUP' failed on '/user38' with gfid
585b50e9-303c-47f4-a67c-75db7fa73878)

2020-11-03 05:38:02.168378] W [MSGID: 122033] [ec-common.c:1914:ec_locked]
3-testvol_dispersed-disperse-0: Failed to complete preop lock [Stale file
handle]


```


The above observations are verified in the following glusterfs version : 
```
glusterfs-6.0-46.el8rhgs.x86_64

```
------------------------------------
Marking it as Verifed.


-- 
You are receiving this mail because:
You are on the CC list for the bug.


More information about the Bugs mailing list