XML Parser Tutorial

The XML parser parses the XML output of a REMOTE operation.

The XML parser is slightly trickier. It uses the YAML syntax (see here). One important thing to remember is that YAML is very sensitive to indentation. Do NOT use tabs. Instead, use exactly 4 spaces per indentation level. Any other number of spaces may cause the script to fail.

The XML parser supports emitting tags as well as metrics, so it can be used for both interrogation and monitoring. Here is a sample structure:

Input for this example:
<response><result><total>17</total><max>512</max></result></response> 

# The XML parser has operators, which are used to define behavior or certain actions. 
# Operators always start with "_". Think of them as functions.

# The _vars operator is used to define variables that are used in the script. You can write any kind of script without these, but they help
# make the script clearer.
_vars:
    root: /response/result   # We are setting the root variable to contain the beginning of an xpath
# This section must exist in monitoring scripts and must not exist in interrogation scripts
_metrics:
    -     # This is a bit tricky, each metric must be in its own section under "_metrics". Each metric has a "-" representing it.
        _value.double:            # This is the equivalent of writeDoubleMetric in AWK. It says "this metric is a double metric".
            _text: ${root}/total  # This is the value of the metric. The "_text" operator extracts the text at the given xpath.
        _tags:                    # Each metric has tags
            "im.name":            # The "im.name" tag is the name of the metric (matches the list of reserved metrics)
                _constant: "arp-total-entries"       # "_constant" means - "take the string as is and use it"
    -     # And here is another metric.
        _value.double:
            _text: ${root}/max
        _tags:
            "im.name":
                _constant: "arp-limit"
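
To check what the two "_text" paths select, the following Python sketch evaluates equivalent paths against the sample input with the standard library's xml.etree.ElementTree. It is only an illustration under assumptions, not how the XML parser itself is implemented: the names SAMPLE, ROOT_PATH and metrics are made up for this example, and ElementTree paths are written relative to the root element, so the leading "/response" is dropped.

```python
import xml.etree.ElementTree as ET

SAMPLE = "<response><result><total>17</total><max>512</max></result></response>"

# ElementTree paths are relative to the root element (<response>),
# so "/response/result" from the script becomes "result" here.
ROOT_PATH = "result"

response = ET.fromstring(SAMPLE)

# Roughly what "_text: ${root}/total" and "_text: ${root}/max" extract,
# keyed by the "im.name" tag of each metric.
metrics = {
    "arp-total-entries": float(response.findtext(ROOT_PATH + "/total")),
    "arp-limit": float(response.findtext(ROOT_PATH + "/max")),
}

print(metrics)
```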

Below are useful example scripts:

Interrogation script sample
For this input:
<response status="success"><result><system><model>PA-VM</model><serial>007200003165</serial><sw-version>6.1.2</sw-version></system></result></response>

We extract two elements from the XML as tags, and set two other tags to constant values:

_tags:
    "os.name":
        _constant: "panos"
    "os.version":
        _text: "/response/result/system/sw-version"
    "vendor":
        _constant: "paloaltonetworks"
    "model":
        _text: "/response/result/system/model"

Monitoring script using the _groups operator to collect multiple DOUBLE items
For this input:
<response status="success"><result>
  <drive_status>
    <disk_id_1>1</disk_id_1>
    <disk_id_2>0</disk_id_2>
  </drive_status>
</result></response>

# We want to collect the status of the drives:
_vars:
    root: /response/result
_metrics:
    -
        # The "_groups" operator tells the parser that there are multiple matching elements and that a metric
        # should be created for each of them. The output of this script will be two double metrics of the
        # same type, one for each disk.
        _groups:
            ${root}/drive_status/*:    # Notice this xpath - we are iterating over all of "drive_status"'s child elements
                _value.double:
                    _text: node()      # The "node()" function takes the content of the node itself (instead of a sub-node or element)
                _tags:
                    name:
                        _name: .       # The "_name" operator pulls the name of the current element (disk_id_1 and disk_id_2 in this case)
                    "im.name":
                        _constant: "hardware-element-status"
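
The iteration that "_groups" performs can be pictured with a short Python sketch. This is an illustration using the standard library's xml.etree.ElementTree, not the parser's actual implementation; the metrics list and its (tags, value) layout are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<response status="success"><result>
  <drive_status>
    <disk_id_1>1</disk_id_1>
    <disk_id_2>0</disk_id_2>
  </drive_status>
</result></response>"""

response = ET.fromstring(SAMPLE)

# One (tags, value) pair per child element of <drive_status>.
metrics = []
for element in response.findall("result/drive_status/*"):
    tags = {
        "im.name": "hardware-element-status",  # like "_constant"
        "name": element.tag,                   # like "_name: ."
    }
    value = float(element.text)                # like "_text: node()"
    metrics.append((tags, value))

for tags, value in metrics:
    print(tags["name"], value)
```

Each matched element yields its own metric, which is why the script above emits two double metrics that differ only in the "name" tag.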

 

Sometimes, you may need to manipulate the XML content. For example, what if the disk status in the above _groups sample was not provided as 1 or 0 in the XML? What if it was a string representing a status code?

To deal with this we have introduced the _transform operator. This allows you to write AWK code which can transform the XML content. The _transform section supports all of the AWK helper functions, except "write*" functions. So let's expand the above _groups example and involve _transform. Note that the input XML is slightly different.

Monitoring script using the _groups operator and _transform block to collect multiple DOUBLE items (and transform content)
The following input XML shows textual status values:
<response status="success"><result>
  <drive_status>
    <disk_id_1>Present</disk_id_1>
    <disk_id_2>Missing</disk_id_2>
  </drive_status>
</result></response>

So we need to parse the Present and Missing statuses into 1 and 0 respectively.


_vars:
    root: /response/result
_metrics:
    -
        _groups:
            ${root}/drive_status/*:
                _temp:                 # This is required for _transform operations. You basically save the XML data in temporary variables.
                    status:            # This variable will later be referred to as "${temp.status}" in the _transform block.
                        _text: node()
                _tags:
                    name:
                        _name: .
                    "im.name":
                        _constant: "hardware-element-status"
                    "live-config":
                        _constant: "true"
                    "display-name":
                        _constant: "Hardware Elements State"
                    "im.dstype.displayType":
                        _constant: "state"
                    "im.identity-tags":
                        _constant: "name"
        _transform:                    # Ready to transform some data! Note the indentation of this block - it is directly under the metric ("-") and parallel to "_groups". This transform block will be run for each occurrence of a group item (matched by ${root}/drive_status/*)
            _value.double: |           # See this weird pipe ("|")? This tells the system that an AWK block is about to follow
                {
                    if (trim("${temp.status}") == "Present") { print "1.0" } else { print "0.0" }     # The trim() function is from the AWK helper functions
                }
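
The mapping done by the AWK block can be restated as a tiny Python function. This is a sketch for checking the logic only; the function name status_to_double is hypothetical, and Python's str.strip() stands in for the trim() helper on the assumption that trim() removes surrounding whitespace.

```python
def status_to_double(status: str) -> str:
    """Return "1.0" for "Present" (ignoring surrounding whitespace), "0.0" otherwise."""
    return "1.0" if status.strip() == "Present" else "0.0"


# The two disks from the sample input above.
results = {
    name: status_to_double(text)
    for name, text in [("disk_id_1", "Present"), ("disk_id_2", "Missing")]
}

print(results)
```

Note that any status other than "Present" maps to 0.0, not just "Missing".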
 

When trying to emit complex metrics, things are generally similar. Here are a couple of examples:

Monitoring script with a simple complex metric
For this XML:
<response><system><timezone>London/UK</timezone></system></response>

_metrics:
    -
        _value.complex:      # Compare this with "_value.double"
            value:           # This is slightly different compared to double. In complex, we have key-value pairs.
                _text: "/response/system/timezone"
        _tags:
            "im.name":
                _constant: "timezone"

The above parsing script will generate a complex value which looks like this:
{"value": "London/UK"}

Complex metric array with transform (advanced!!)
For this XML:
<response status="success">
    <result>
        <admins>
            <entry>
                <admin>indeni</admin>
                <from>10.10.1.1</from>
                <type>Web</type>
                <session-start>11/03 01:14:51</session-start>
                <idle-for>00:00:00s</idle-for>
            </entry>
            <entry>
                <admin>admin</admin>
                <from>10.10.1.3</from>
                <type>CLI</type>
                <session-start>11/03 01:13:32</session-start>
                <idle-for>00:01:31s</idle-for>
            </entry>
        </admins>
    </result>
</response>

We are trying to extract who is logged in and apply a transform to the input.

_vars:
    root: /response/result
_metrics:
    -
        _groups:
            ${root}/admins/entry:
                _temp:
                    idlefor:              # We will need to parse the idlefor into seconds
                        _text: "idle-for"
                _tags:
                    "im.name":
                        _constant: "logged-in-users"
                _value.complex:
                    "from": 
                        _text: "from"
                    "username":
                        _text: "admin"
        _transform:                       
            _value.complex:               # Note, "_value.complex" shows up twice here. The parser will merge the two together into a single complex metric entry.
                "idle": |
                    {
                        str = "${temp.idlefor}"
                        sub(/s/, "", str)
                        split(str, timearr, ":")
                        print timearr[1] * 3600 + timearr[2] * 60 + timearr[3]
                    }
        _value: complex-array             # This is a very special operator, it tells the parser to take all the items found under the "_groups" for this metric and turn it into a JSON array

So the final output looks like this:
[ {"from": "10.10.1.1", "username": "indeni", "idle": "0"}, {"from": "10.10.1.3", "username": "admin", "idle": "91"}]

If we didn't include the "_value: complex-array", the parser would have tried to create two separate metrics, one for each admin. However, the definition of the "logged-in-users" metric requires them all to be in one metric as a JSON array. 
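
To verify the idle-time arithmetic and the shape of the resulting array, here is a Python sketch that reproduces the same steps on a trimmed copy of the sample input (only the three elements the script reads are kept). It is an illustration, not the parser's implementation; the helper name idle_to_seconds is made up for this example.

```python
import json
import xml.etree.ElementTree as ET

SAMPLE = """<response status="success"><result><admins>
  <entry><admin>indeni</admin><from>10.10.1.1</from><idle-for>00:00:00s</idle-for></entry>
  <entry><admin>admin</admin><from>10.10.1.3</from><idle-for>00:01:31s</idle-for></entry>
</admins></result></response>"""


def idle_to_seconds(idle_for: str) -> str:
    # Same steps as the AWK block: drop the first "s", split on ":",
    # then convert hours/minutes/seconds to a total number of seconds.
    hours, minutes, seconds = idle_for.replace("s", "", 1).split(":")
    return str(int(hours) * 3600 + int(minutes) * 60 + int(seconds))


response = ET.fromstring(SAMPLE)

# One dictionary per <entry>, collected into a single list ("complex-array").
entries = [
    {
        "from": entry.findtext("from"),
        "username": entry.findtext("admin"),
        "idle": idle_to_seconds(entry.findtext("idle-for")),
    }
    for entry in response.findall("result/admins/entry")
]

print(json.dumps(entries))
```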

For a full list of operators, see this.

XPath Power

The XML Parser uses XPath v1.0, per this Crowd post: https://goo.gl/69fima

XPath is a well-known path language for XML that lets you select specific elements. There are all kinds of useful things you can do with it, and you can use them in the XML parser too. For example, consider predicates (see http://www.tizag.com/xmlTutorial/xpathpredicate.php). You can use _count to count how many elements match a certain criterion by writing this in the .ind script:

_metrics:
    -
        _groups:
            ${root}/TABLE_interface/ROW_interface:
                _temp:
                    state:
                        _text: "state"
                    admin_state_down_count:
                        _count: "admin_state[text() = 'down']"
                _tags:
                    "im.name":
                        _constant: "network-interface-admin-state" 
                    "name":
                        _text: "interface"
        _transform:
            _value.double: |
                {
                    if ("${temp.state}" == "up") {
                        print "1.0"
                    } else {
                        # Slightly backwards logic, but admin_state does not appear if it is
                        # set to up (so we can't simply try and grab its value).
                        # So we count how many 'down' exist.
                        if ("${temp.admin_state_down_count}" == "1") {print "0.0"} else {print "1.0"}
                    }
                }
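
The counting logic can be tried out with the Python sketch below. The input XML here is invented for illustration (the tutorial does not show the input for this script), and the function name admin_state is hypothetical. The predicate is applied in plain Python instead of XPath, so the sketch does not depend on which predicates a given XPath library supports.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: <admin_state> only appears when it is "down".
SAMPLE = """<TABLE_interface>
  <ROW_interface><interface>Ethernet1/1</interface><state>up</state></ROW_interface>
  <ROW_interface><interface>Ethernet1/2</interface><state>down</state><admin_state>down</admin_state></ROW_interface>
  <ROW_interface><interface>Ethernet1/3</interface><state>down</state></ROW_interface>
</TABLE_interface>"""


def admin_state(row: ET.Element) -> float:
    if row.findtext("state") == "up":
        return 1.0
    # Equivalent of: _count: "admin_state[text() = 'down']"
    down_count = sum(1 for e in row.findall("admin_state") if e.text == "down")
    return 0.0 if down_count == 1 else 1.0


table = ET.fromstring(SAMPLE)
states = {row.findtext("interface"): admin_state(row) for row in table.findall("ROW_interface")}

print(states)
```

Ethernet1/3 shows why the count is needed: the link is down, but no admin_state element exists, so the interface is still reported as administratively up.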

Accessing XPath Ancestors

Sometimes a piece of data sits at a higher level of the XML (in an ancestor element) but is required to complete the metric information.

Other options can be found here.

XML XPath Ancestor
For the following XML data section, we would like to get the interface name from within the Lane table/row:
        <TABLE_interface>
         <ROW_interface>
          <interface>Ethernet1/1</interface>
          <sfp>present</sfp>
          <type>10Gbase-SR</type>
          <name>CISCO-JDSU      </name>
          <partnum>PLRXPL-SC-S43-CS</partnum>
          <rev>1   </rev>
          <serialnum>JUR2003G2ZJ     </serialnum>
          <nom_bitrate>10300</nom_bitrate>
          <len_50>82</len_50>
          <len_625>26</len_625>
          <len_50_OM3>300</len_50_OM3>
          <ciscoid>3</ciscoid>
          <ciscoid_1>4</ciscoid_1>
          <TABLE_lane>
           <ROW_lane>
			<...>
            <tx_pwr>-2.24</tx_pwr>
            <tx_pwr_flag> </tx_pwr_flag>
            <tx_pwr_alrm_hi>1.69</tx_pwr_alrm_hi>
            <tx_pwr_alrm_lo>-11.30</tx_pwr_alrm_lo>
            <tx_pwr_warn_hi>-1.30</tx_pwr_warn_hi>
            <tx_pwr_warn_lo>-7.30</tx_pwr_warn_lo>
			<...>
           </ROW_lane>
          </TABLE_lane>
         </ROW_interface>

  


_metrics:
    -
        _groups:
            # Tx Power state: 0.0 if Tx Power is outside the warning thresholds, 1.0 if it is within them
            ${root}/TABLE_interface/ROW_interface/TABLE_lane/ROW_lane[not(lane_number)]:
                _temp:
                    tx_pwr:
                        _text: "tx_pwr"
                    low:
                        _text: "tx_pwr_warn_lo"
                    high:
                        _text: "tx_pwr_warn_hi"
                    interface:
                        _text: ancestor::node()/interface
                _tags:
                    "im.name":
                        _constant: "hardware-element-status"
                    "live-config":
                        _constant: "true"
                    "display-name":
                        _constant: "Optics"
                    "im.dstype.displayType":
                        _constant: "state"
                    "im.identity-tags":
                        _constant: "name"
        _transform:
            _value.double: |
                {
                    if ((${temp.tx_pwr} < ${temp.low}) || (${temp.tx_pwr} > ${temp.high})) {
                        print "0.0"
                    } else {
                        print "1.0"
                    }
                }
            _tags:
                "name": |
                    {
                        print "Optic Tx Power State ${temp.interface}"
                    }
                "interface": |
                    {
                        print "${temp.interface}"
                    }
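
For readers who want to see the ancestor lookup in action outside the parser, here is a Python sketch on a trimmed copy of the XML above. It is an assumption-laden illustration: the standard library's xml.etree.ElementTree has no "ancestor::" axis, so the sketch builds a child-to-parent map and walks upwards instead. The names parent_of and ancestor_text are made up for this example, and the "[not(lane_number)]" predicate is left out for brevity.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<TABLE_interface>
 <ROW_interface>
  <interface>Ethernet1/1</interface>
  <TABLE_lane>
   <ROW_lane>
    <tx_pwr>-2.24</tx_pwr>
    <tx_pwr_warn_hi>-1.30</tx_pwr_warn_hi>
    <tx_pwr_warn_lo>-7.30</tx_pwr_warn_lo>
   </ROW_lane>
  </TABLE_lane>
 </ROW_interface>
</TABLE_interface>"""

table = ET.fromstring(SAMPLE)

# ElementTree elements do not know their parent, so build a lookup table.
parent_of = {child: parent for parent in table.iter() for child in parent}


def ancestor_text(element, tag):
    """Walk up from element; return the text of the nearest ancestor's child named tag."""
    node = parent_of.get(element)
    while node is not None:
        text = node.findtext(tag)
        if text is not None:
            return text
        node = parent_of.get(node)
    return None


results = []
for lane in table.findall("ROW_interface/TABLE_lane/ROW_lane"):
    tx_pwr = float(lane.findtext("tx_pwr"))
    low = float(lane.findtext("tx_pwr_warn_lo"))
    high = float(lane.findtext("tx_pwr_warn_hi"))
    interface = ancestor_text(lane, "interface")
    results.append({
        "name": "Optic Tx Power State " + interface,
        "interface": interface,
        "value": "0.0" if (tx_pwr < low or tx_pwr > high) else "1.0",
    })

print(results)
```

The walk returns the nearest matching ancestor, which gives the same result as the XPath expression here because only one ancestor has an "interface" child.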


Important Note: Missing paths

If a certain path doesn't exist, the parser will skip that metric (because it will consider the path as a failure). For example, consider this:

_metrics:
    -
        _value.double:
            _text: somepath/somesubelement

If "somesubelement" doesn't exist, then _text won't evaluate and the specific metric will be dropped.

But what if you still want to create the metric? The way to do that is by using _count (see the list of operators). You can "_count" the somesubelement and if you get 0 you know it's not there. For example:

_metrics:
    -
        _temp:
            subelementcount:
                _count: somepath/somesubelement
        _transform:
            _value.double: |
                {
                    if ("${temp.subelementcount}" == "0") { print "0" } else { print "1" }
                }
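
The difference between reading a missing path and counting it can be seen in this Python sketch. It uses the standard library's xml.etree.ElementTree on a made-up document in which "somesubelement" is absent; it illustrates the idea only and says nothing about how the parser is built internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <somepath> exists, <somesubelement> does not.
doc = ET.fromstring("<root><somepath><other>1</other></somepath></root>")

# Reading the text of a missing path gives nothing to build a metric from ...
text = doc.findtext("somepath/somesubelement")

# ... but counting the matches always succeeds and simply yields 0.
count = len(doc.findall("somepath/somesubelement"))
value = "0" if count == 0 else "1"

print(text, count, value)
```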

Testing an XPath against your input

There are a few XPath testers out there that you can use to test an XPath against an input file. One of the better ones is codebeautify.org:
http://codebeautify.org/Xpath-Tester

Removing namespaces

Before testing, make sure to remove all xmlns attributes from your input file. Otherwise, elements in a default namespace will not match the unprefixed names in your XPath, which causes headaches.

Example

Original input:

<rpc-reply xmlns:junos="http://xml.juniper.net/junos/12.1X46/junos">
	<interface-information xmlns="http://xml.juniper.net/junos/12.1X46/junos-interface" junos:style="normal">

Revised input:

<rpc-reply>
	<interface-information junos:style="normal">
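
If the input is large, the xmlns attributes can be removed with a short script instead of by hand. The Python sketch below is one possible way to do it, assuming attribute values are double-quoted as in the example above; the function name strip_xmlns is made up for this illustration.

```python
import re

ORIGINAL = (
    '<rpc-reply xmlns:junos="http://xml.juniper.net/junos/12.1X46/junos">\n'
    '\t<interface-information xmlns="http://xml.juniper.net/junos/12.1X46/junos-interface" junos:style="normal">'
)


def strip_xmlns(xml_text: str) -> str:
    # Remove xmlns="..." and xmlns:prefix="..." declarations,
    # including the whitespace character in front of them.
    return re.sub(r'\sxmlns(:\w+)?="[^"]*"', "", xml_text)


revised = strip_xmlns(ORIGINAL)
print(revised)
```

Prefixed attributes such as junos:style are left in place, matching the revised input above. Some stricter XPath testers may reject a prefix whose declaration has been removed, in which case the prefixed attributes need to be removed as well.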

Good practices

It is recommended to put the sections in this order if possible:

  • _tags
  • _temp (if available)
  • _transform (if available)
  • _value (last)

The reasons are:

  • _tags comes first because it gives the reader a clue about what they are about to read.
  • _temp is used in the _transform section and should therefore precede it, so that values are defined before they are used.
  • _transform comes after _temp because of the previous bullet.
  • _value comes last, because all of the logic above leads up to it.

Within _tags, "im.name" should come first.