[Bugs] [Bug 1787294] Improve logging in EC, client and lock xlator
bugzilla at redhat.com
Thu Nov 5 11:09:51 UTC 2020
https://bugzilla.redhat.com/show_bug.cgi?id=1787294
Pranav Prakash <prprakas at redhat.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ON_QA |VERIFIED
--- Comment #9 from Pranav Prakash <prprakas at redhat.com> ---
The following scenario was performed to verify the logging improvement.
1. Create a dispersed volume
2. Mount it on 2 or more clients
3. Perform IO on the clients
4. Kill a few bricks
5. Bring the bricks up
6. Verify the logs
[root@dhcp43-237 ~]# gluster v status
Status of volume: testvol_dispersed
Gluster process                                                     TCP Port  RDMA Port  Online  Pid
-------------------------------------------------------------------------------------------------------
Brick 10.70.43.188:/gluster/bricks/brick1/testvol_dispersed_brick0  49152     0          Y       107025
Brick 10.70.43.5:/gluster/bricks/brick1/testvol_dispersed_brick1    49152     0          Y       67553
Brick 10.70.43.245:/gluster/bricks/brick1/testvol_dispersed_brick2  49152     0          Y       67689
Brick 10.70.41.159:/gluster/bricks/brick1/testvol_dispersed_brick3  49152     0          Y       68209
Brick 10.70.43.237:/gluster/bricks/brick1/testvol_dispersed_brick4  49152     0          Y       67796
Brick 10.70.43.224:/gluster/bricks/brick1/testvol_dispersed_brick5  49152     0          Y       67553
Self-heal Daemon on localhost                                       N/A       N/A        Y       67813
Self-heal Daemon on 10.70.43.245                                    N/A       N/A        Y       67706
Self-heal Daemon on 10.70.43.224                                    N/A       N/A        Y       67571
Self-heal Daemon on 10.70.41.159                                    N/A       N/A        Y       68226
Self-heal Daemon on 10.70.43.5                                      N/A       N/A        Y       67570
Self-heal Daemon on dhcp43-188.lab.eng.blr.redhat.com               N/A       N/A        Y       107042
Task Status of Volume testvol_dispersed
-------------------------------------------------------------------------------------------------------
There are no active volume tasks
Performed IO on the clients, then brought bricks down on a couple of nodes and back up.
----------
OBSERVATIONS:
Log messages such as the 'remote operation failed' warning now carry more
information than they did in previous versions.
One example:
```
[2020-11-03 05:35:32.171859] W [MSGID: 114031]
[client-rpc-fops_v2.c:2635:client4_0_lookup_cbk] 0-testvol_dispersed-client-2:
remote operation failed. Path: /user1/testfile1.txt
(00000000-0000-0000-0000-000000000000) [Transport endpoint is not connected]
[2020-11-03 05:37:34.346233] W [MSGID: 114031]
[client-rpc-fops_v2.c:2116:client4_0_create_cbk] 3-testvol_dispersed-client-2:
remote operation failed. Path: /file114 [Input/output error]
```
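To check such warnings across a whole client log, the fields can be pulled out with a regular expression. The following is a minimal sketch, not part of the fix itself: the pattern is inferred only from the two entries quoted above, the function name `parse_client_warning` is hypothetical, and paths containing spaces are not handled.

```python
import re

# Pattern inferred from the two MSGID 114031 entries quoted above (an assumption,
# not taken from the glusterfs sources). The gfid in parentheses is optional.
CLIENT_WARNING = re.compile(
    r"\[(?P<timestamp>[^\]]+)\] W \[MSGID: 114031\] "
    r"\[(?P<source>[^:\]]+):(?P<line>\d+):(?P<callback>\w+)\] "
    r"(?P<subvolume>\S+): remote operation failed\. "
    r"Path: (?P<path>\S+)"
    r"(?: \((?P<gfid>[0-9a-f-]{36})\))?"
    r" \[(?P<error>[^\]]+)\]"
)


def parse_client_warning(entry: str):
    """Return the fields of one client warning as a dict, or None if it does not match."""
    # The mail wrapped each entry over several lines; collapse all whitespace first.
    joined = " ".join(entry.split())
    match = CLIENT_WARNING.search(joined)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = """[2020-11-03 05:37:34.346233] W [MSGID: 114031]
    [client-rpc-fops_v2.c:2116:client4_0_create_cbk] 3-testvol_dispersed-client-2:
    remote operation failed. Path: /file114 [Input/output error]"""
    fields = parse_client_warning(sample)
    print(fields["callback"], fields["path"], fields["error"])
```

The callback name identifies the operation, and the bracketed text at the end gives the reason for the failure.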
The logs now show which operation failed and why.
Similarly,
```
[2020-11-03 05:37:34.737417] W [MSGID: 122053]
[ec-common.c:329:ec_check_status] 3-testvol_dispersed-disperse-0: Operation
failed on 1 of 6 subvolumes.(up=111111, mask=111111, remaining=000000,
good=111011, bad=000100, FOP : 'CREATE' failed on '/file116' with gfid
00000000-0000-0000-0000-000000000000)
[2020-11-03 05:37:34.741320] W [MSGID: 122053]
[ec-common.c:329:ec_check_status] 3-testvol_dispersed-disperse-0: Operation
failed on 1 of 6 subvolumes.(up=111111, mask=111011, remaining=000000,
good=111011, bad=000100, FOP : 'FLUSH' failed on gfid
09c1992b-3190-45b0-9ab8-b39e27e12e32)
[2020-11-03 05:37:34.761148] W [MSGID: 122053]
[ec-common.c:329:ec_check_status] 3-testvol_dispersed-disperse-0: Operation
failed on 1 of 6 subvolumes.(up=111111, mask=111011, remaining=000000,
good=111011, bad=000100, FOP : 'WRITE' failed on gfid
09c1992b-3190-45b0-9ab8-b39e27e12e32)
[2020-11-03 05:38:00.132336] W [MSGID: 122053]
[ec-common.c:329:ec_check_status] 3-testvol_dispersed-disperse-0: Operation
failed on 1 of 6 subvolumes.(up=101111, mask=101111, remaining=000000,
good=101011, bad=000100, FOP : 'LOOKUP' failed on '/user38' with gfid
585b50e9-303c-47f4-a67c-75db7fa73878)
[2020-11-03 05:38:02.168378] W [MSGID: 122033] [ec-common.c:1914:ec_locked]
3-testvol_dispersed-disperse-0: Failed to complete preop lock [Stale file
handle]
```
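The bitmaps in the `ec_check_status` messages can be decoded to see which subvolume failed. The sketch below rests on two assumptions: the regular expression is inferred only from the entries quoted above, and the rightmost character of each bitmap is taken to be subvolume 0. The second assumption is consistent with `bad=000100` appearing alongside the `client-2` warnings earlier in this comment, but it is not confirmed here. The names `parse_ec_status` and `set_bits` are hypothetical.

```python
import re

# Pattern inferred from the MSGID 122053 entries quoted above (an assumption).
# The quoted path is optional: FLUSH and WRITE entries report only the gfid.
EC_STATUS = re.compile(
    r"Operation failed on (?P<failed>\d+) of (?P<total>\d+) subvolumes\.\s*"
    r"\(up=(?P<up>[01]+), mask=(?P<mask>[01]+), remaining=(?P<remaining>[01]+), "
    r"good=(?P<good>[01]+), bad=(?P<bad>[01]+), "
    r"FOP : '(?P<fop>\w+)' failed on "
    r"(?:'(?P<path>[^']*)' with )?gfid (?P<gfid>[0-9a-f-]{36})\)"
)


def set_bits(bitmap: str):
    """Indices of '1' bits, counting from the rightmost character (assumed subvolume 0)."""
    return [i for i, bit in enumerate(reversed(bitmap)) if bit == "1"]


def parse_ec_status(entry: str):
    """Return the fields of one ec_check_status warning, or None if it does not match."""
    # Undo the mail's line wrapping before matching.
    joined = " ".join(entry.split())
    match = EC_STATUS.search(joined)
    if match is None:
        return None
    fields = match.groupdict()
    up = set(set_bits(fields["up"]))
    fields["bad_subvolumes"] = set_bits(fields["bad"])
    fields["down_subvolumes"] = [i for i in range(len(fields["up"])) if i not in up]
    return fields


if __name__ == "__main__":
    sample = """[2020-11-03 05:38:00.132336] W [MSGID: 122053]
    [ec-common.c:329:ec_check_status] 3-testvol_dispersed-disperse-0: Operation
    failed on 1 of 6 subvolumes.(up=101111, mask=101111, remaining=000000,
    good=101011, bad=000100, FOP : 'LOOKUP' failed on '/user38' with gfid
    585b50e9-303c-47f4-a67c-75db7fa73878)"""
    info = parse_ec_status(sample)
    print(info["fop"], info["path"], info["bad_subvolumes"], info["down_subvolumes"])
```

Under the stated bit-order assumption, the LOOKUP entry decodes to one failed subvolume (index 2) and one subvolume that was down at the time (index 4).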
The above observations were verified on the following glusterfs version:
```
glusterfs-6.0-46.el8rhgs.x86_64
```
------------------------------------
Marking it as Verified.