[Bugs] [Bug 1392181] New: [GSS]"gluster vol status all clients --xml" gets malformed at times, causes gstatus to fail
bugzilla at redhat.com
Sat Nov 5 18:08:48 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1392181
Bug ID: 1392181
Summary: [GSS]"gluster vol status all clients --xml" gets
malformed at times, causes gstatus to fail
Product: GlusterFS
Version: 3.7.15
Component: cli
Severity: medium
Assignee: bugs at gluster.org
Reporter: giuseppe.ragusa at hotmail.com
CC: amukherj at redhat.com, ashah at redhat.com,
bugs at gluster.org, rcyriac at redhat.com,
rhinduja at redhat.com, rhs-bugs at redhat.com,
rnalakka at redhat.com, sbairagy at redhat.com
+++ This bug was initially created as a clone of Bug #1359619 +++
Description of problem:
Sometimes the gstatus command gives the traceback below instead of the proper
output. The issue is that glusterd is giving malformed XML output to the
gstatus script.
# gstatus
Traceback (most recent call last):
File "/usr/bin/gstatus", line 221, in <module>
main()
File "/usr/bin/gstatus", line 135, in main
cluster.update_state(self_heal_backlog)
File "/usr/lib/python2.7/site-packages/gstatus/libgluster/cluster.py", line
638, in update_state
self.calc_connections()
File "/usr/lib/python2.7/site-packages/gstatus/libgluster/cluster.py", line
713, in calc_connections
cmd.run()
File "/usr/lib/python2.7/site-packages/gstatus/libcommand/glustercmd.py",
line 100, in run
xmldoc = ETree.fromstring(''.join(self.stdout))
File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1301, in XML
return parser.close()
File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1654, in close
self._raiseerror(v)
File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1506, in
_raiseerror
raise err
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
Version-Release number of selected component (if applicable):
RHGS 3.1.3 on RHEL 7
How reproducible:
Not always reproducible.
Steps to Reproduce:
1. Install RHGS
2. Install gstatus
3. Run gstatus
Actual results:
gstatus gives traceback at times instead of cluster health status.
Expected results:
glusterd should provide the proper xml output and gstatus should work properly.
Additional info:
--- Additional comment from Atin Mukherjee on 2016-07-25 03:19:25 EDT ---
Given that the customer hasn't captured the XML output when the issue was hit
and it's not reproducible, this will be hard to debug. We'll do a code
walkthrough of the XML generation part to see if anything is buggy here.
--- Additional comment from Atin Mukherjee on 2016-07-26 00:45:49 EDT ---
I see the following error entries in the CLI log.
[2016-07-21 05:58:45.598277] E [cli-rpc-ops.c:7744:gf_cli_status_cbk] 0-cli:
Error outputting to xml
[2016-07-21 05:58:45.598341] E [cli-rpc-ops.c:8036:gf_cli_status_volume_all]
0-cli: status all failed
This indicates that some dict_get_* call failed while dumping the XML output.
The CLI doesn't log these failures, so it's difficult to figure out which key
was missing. The glusterd log, however, doesn't show any evidence of an error
at the same timestamp. Having said this, IMO there is an issue in the way we
generate the XML output.
<snip>
        /* <volume> */
        ret = xmlTextWriterStartElement (local->writer, (xmlChar *)"volume");
        XML_RET_CHECK_AND_GOTO (ret, out);

        ret = dict_get_str (dict, "volname", &volname);
        if (ret)
                goto out; /* on this error path the open <volume> element is never closed */
        ret = xmlTextWriterWriteFormatElement (local->writer,
                                               (xmlChar *)"volName", "%s",
                                               volname);
        ................
        ................
out:
        gf_log ("cli", GF_LOG_DEBUG, "Returning %d", ret);
        return ret;
</snip>
From this we can clearly see that if dict_get_str fails we don't close the XML
tag. And once some other application (gStatus in this case) tries to parse the
output, this may result in a traceback as well.
--- Additional comment from Atin Mukherjee on 2016-09-01 08:20:07 EDT ---
Riyas,
Can you also file a bug on the gstatus component for handling exceptions?
gStatus also needs to handle these exceptions and should not crash.
Thanks,
Atin
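For illustration, a defensive wrapper along these lines would let gstatus
report a clean error instead of dying with an unhandled ParseError. This is a
minimal sketch, not the actual gstatus code; the function name parse_cli_xml
is made up here:

<snip>
import sys
import xml.etree.ElementTree as ETree

def parse_cli_xml(raw_output):
    """Parse gluster CLI --xml output defensively.

    Returns the root Element on success, or None when the output is empty
    or malformed, so the caller can print a diagnostic instead of crashing.
    """
    if not raw_output.strip():
        return None
    try:
        return ETree.fromstring(raw_output)
    except ETree.ParseError:
        return None

# Example: simulate the empty output seen in this bug.
root = parse_cli_xml('')
if root is None:
    sys.stderr.write('gluster returned no/invalid XML; cannot determine '
                     'cluster state\n')
</snip>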
--- Additional comment from Atin Mukherjee on 2016-09-01 13:40:28 EDT ---
Riyas,
Unfortunately this will not help either: after a detailed code walkthrough I
see that the XML output gets dumped to stdout only at xmlTextWriterEndElement,
which doesn't get called in this case, as highlighted in comment 1.
Looking at case 01674704, I see it mentions the command getting timed out,
which is quite possible if the volume has a large list of connected clients.
But I am not able to map the exit code 90 here; in my setup the CLI doesn't
show anything if the command fails, but echo $? shows 1 instead of 90.
I've identified a fix on the CLI side to write the XML output and will be
sending a patch by tomorrow. However, I'll continue to analyse why we ended up
with exit code 90.
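As a side note on the exit-status question, a caller can capture the CLI's
return code directly instead of relying on the shell's $?. A small
illustrative Python sketch (assuming the gluster binary is in PATH; this is
not gstatus code):

<snip>
import subprocess

# Run the command from this bug report and inspect its exit status before
# attempting to parse anything.
cmd = ['gluster', 'volume', 'status', 'all', 'clients', '--xml']
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()

if proc.returncode != 0:
    # The exact non-zero value (1, 90, ...) depends on the failure mode
    # discussed above; either way, stdout should not be fed to an XML parser.
    print('gluster CLI failed with exit code %d' % proc.returncode)
else:
    print('gluster CLI succeeded; %d bytes of XML on stdout' % len(out))
</snip>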
--- Additional comment from Atin Mukherjee on 2016-09-06 00:34:06 EDT ---
http://review.gluster.org/15384 posted upstream for review.
--- Additional comment from Atin Mukherjee on 2016-09-19 09:03:51 EDT ---
Upstream mainline : http://review.gluster.org/15384
Upstream 3.8 : http://review.gluster.org/15428
And the fix is available in rhgs-3.2.0 as part of rebase to GlusterFS 3.8.4.
--- Additional comment from Anil Shah on 2016-10-19 07:32:28 EDT ---
The gluster volume status all clients --xml command gives output in XML format.
[root@rhs-client46 ~]# gluster volume status all clients --xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cliOutput>
  <opRet>0</opRet>
  <opErrno>0</opErrno>
  <opErrstr/>
  <volStatus>
    <volumes>
      <volume>
        <volName>arvol</volName>
        <nodeCount>7</nodeCount>
        <node>
          <hostname>10.70.36.70</hostname>
          <path>/rhs/brick1/b1</path>
          <peerid>02cff39e-b4c9-435d-85f0-8b9fe3b33193</peerid>
          <status>1</status>
          <port>49152</port>
          <ports>
            <tcp>49152</tcp>
            <rdma>N/A</rdma>
          </ports>
          <pid>25364</pid>
          <clientsStatus>
            <clientCount>10</clientCount>
            <client>
              <hostname>10.70.44.7:1015</hostname>
              <bytesRead>9678485624</bytesRead>
              <bytesWrite>23700708</bytesWrite>
            </client>
            <client>
              <hostname>10.70.36.70:1023</hostname>
              <bytesRead>18708</bytesRead>
              <bytesWrite>16012</bytesWrite>
            </client>
            <client>
              <hostname>10.70.44.7:1021</hostname>
              <bytesRead>1032</bytesRead>
              <bytesWrite>644</bytesWrite>
            </client>
            <client>
              <hostname>10.70.44.7:984</hostname>
              <bytesRead>25708</bytesRead>
              <bytesWrite>22956</bytesWrite>
            </client>
            <client>
              <hostname>10.70.36.46:1021</hostname>
              <bytesRead>1040</bytesRead>
              <bytesWrite>644</bytesWrite>
            </client>
            <client>
              <hostname>10.70.36.70:994</hostname>
              <bytesRead>32308</bytesRead>
              <bytesWrite>28108</bytesWrite>
            </client>
            <client>
              <hostname>10.70.36.46:991</hostname>
              <bytesRead>21680</bytesRead>
              <bytesWrite>19412</bytesWrite>
            </client>
            <client>
              <hostname>10.70.36.70:974</hostname>
              <bytesRead>49564</bytesRead>
              <bytesWrite>44564</bytesWrite>
            </client>
            <client>
              <hostname>10.70.36.71:1014</hostname>
              <bytesRead>1040</bytesRead>
              <bytesWrite>644</bytesWrite>
            </client>
            <client>
              <hostname>10.70.36.71:985</hostname>
              <bytesRead>46260</bytesRead>
              <bytesWrite>41644</bytesWrite>
            </client>
          </clientsStatus>
gstatus command output
======================================
[root@rhs-client46 ~]# gstatus -a
Product: RHGS Server v3.1Update3 Capacity: 3.90 TiB(raw bricks)
Status: HEALTHY 38.00 GiB(raw used)
Glusterfs: 3.8.4 1.40 TiB(usable from volumes)
OverCommit: No Snapshots: 9
Nodes : 4/ 4 Volumes: 2 Up
Self Heal : 4/ 2 0 Up(Degraded)
Bricks : 8/ 8 0 Up(Partial)
Connections : 4/ 68 0 Down
Volume Information
arvol UP - 6/6 bricks up - Distributed-Replicate
Capacity: (1% used) 13.00 GiB/1.20 TiB (used/total)
Snapshots: 9
Self Heal: 6/ 6
Tasks Active: None
Protocols: glusterfs:on NFS:off SMB:on
Gluster Connectivty: 4 hosts, 60 tcp connections
testvol UP - 2/2 bricks up - Replicate
Capacity: (0% used) 33.00 MiB/199.00 GiB (used/total)
Snapshots: 0
Self Heal: 2/ 2
Tasks Active: None
Protocols: glusterfs:on NFS:off SMB:on
Gluster Connectivty: 4 hosts, 8 tcp connections
Status Messages
- Cluster is HEALTHY, all_bricks checks successful
Bug verified on build glusterfs-3.8.4-2.el7rhgs.x86_64
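For reference, the calc_connections() step in the original traceback consumes
exactly this kind of <clientsStatus> data. A minimal sketch of totalling the
per-node client counts (using a hypothetical, trimmed-down sample document;
not the real gstatus implementation):

<snip>
import xml.etree.ElementTree as ETree

# Hypothetical, trimmed-down version of the "status all clients --xml"
# output shown above; the real document has one <node> per brick.
SAMPLE = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cliOutput>
  <opRet>0</opRet>
  <volStatus>
    <volumes>
      <volume>
        <volName>arvol</volName>
        <node>
          <hostname>10.70.36.70</hostname>
          <clientsStatus>
            <clientCount>10</clientCount>
          </clientsStatus>
        </node>
      </volume>
    </volumes>
  </volStatus>
</cliOutput>"""

root = ETree.fromstring(SAMPLE)
total = sum(int(c.text) for c in root.iter('clientCount'))
print('total client connections: %d' % total)   # -> 10
</snip>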
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.