Troubleshooting guide

https://indeni.atlassian.net/wiki/spaces/IKP/pages/879329653

Step

If not OK

Step

If not OK

Alert is not being created

Validate that the rule appears in the Knowledge Explorer

  • Validate that the rule loaded properly by the server

  • Check for errors in
    /usr/share/indeni/logs/rules/general.log

Enable extensive log output

vi /usr/share/indeni-collector/conf/logback.xml

Change to ALLOW

   <turboFilter class="ch.qos.logback.classic.turbo.MarkerFilter">
        <Marker>parsing-output</Marker>
        <OnMatch>DENY</OnMatch>
    </turboFilter>

Review /usr/share/indeni-collector/logs/collector.log

Validate the IND is compiled properly by the server

/usr/share/indeni-collector/logs/collector.log

Each IND file is being compiled by the server, which adds these lines to the log

On success:
INFO [2020-04-20 08:54:05,554] com.indeni.commands.knowledge.KnowledgeBase: Compiling file '/usr/share/indeni-knowledge/stable/ind/parsers/src/crossvendor/linux/console-idle-time/console-idle-time.ind.yaml'
INFO [2020-04-20 08:54:05,562] com.indeni.commands.knowledge.KnowledgeBase: Compiled 'linux-console_idle_time' command from '/usr/share/indeni-knowledge/stable/ind/parsers/src/crossvendor/linux/console-idle-time/console-idle-time.ind.yaml'

On failure:
INFO [2020-04-20 08:54:05,505] com.indeni.commands.knowledge.KnowledgeBase: Compiling file '/usr/share/indeni-knowledge/stable/ind/parsers/src/crossvendor/linux/netstat-rn/netstat-rn.ind.yaml'
ERROR [2020-04-20 08:54:05,508] com.indeni.commands.knowledge.KnowledgeBase: Failed to compile a command from '/usr/share/indeni-knowledge/stable/ind/parsers/src/crossvendor/linux/netstat-rn/netstat-rn.ind.yaml'

Validate that the server actually executes the command against the specific device

/usr/share/indeni-collector/logs/devices/device_ip.log

Check whether there was any failure parsing the output

/usr/share/indeni-services/logs/parser.log Search your parser file inside

Validate that the alert is triggering an issue

Check the metric

  • Validate that metric is in time series DB

  • Validate that the metric has a value that should trigger alert according to the rule

  • Change the monitoring_interval to 1 minute and restart all services

Check the script using https://indeni.atlassian.net/wiki/spaces/IKP/pages/59342857 , from the server.

Simulate on a real device the scenario which should trigger it

Change the threshold of the alert, inside the UI (“knowledge explorer” tab)

Workflow is not working

Validate that workflow appears in Knowledge Explorer

If not, check that workflow

  • Defined in $IKWORKFLOWS/workflow_catalog.yaml

  • Appears in its corresponding directory

Validate that the associated rule triggered an issue

  • Review the rule and understand

    • The logic that creates and issue

    • Related metric(s) (can be more than 1)

  • Review the metric(s) actual values in DB

  • If metric value is not the expected value

    • Review monitored device configuration

    • Review IND implementation

Issue exists, but workflow does not appear

  • Test workflow execution using iktool wf

  • Check for failure root cause using Robo3T

  • Troubleshoot workflow using Python

Unit Test not being created or run(command-runner)

The unit tests is not being created/run as well as no error message given the test exits.

  • Ensure the unit tests are run inside the parent folder of the .ind.yaml file

  • The format of the command-runner test needs to be as follows:
    # command-runner  test create ../track_total-connectionpeer-vs-per-chassis/track-total-connections-per-vs-per-chassis.ind.yaml basictest  /Userssiimonwarulkar/repos/indeni-knowledge/parsers/test/checkpoint/asg/track_total-connection-err-vs-per-chassis/input_0_

    # command-runner  test run ../track_total-connection-per-vs-per-chassis/track-total-connections-per-vs-per-chassis.ind.yaml

 

Command-runner fails with the error:

File "/usr/local/lib/python3.6/dist-packages/parser_service/public/parser_api.py", line 42, in parse module = __import__(module_name)ModuleNotFoundError: No module named <some module name>

The command runner expect the parser file to be without the word: parser.
The name convention should be: <module-name>.py
Example: throughput-alert.py

Command-runner fails with the error:
2020-05-11 13:23:10,336 ERROR -- failed to parse results of command '<Some-metric-name>'
Expected String as JsString, but got ["Value"]

Will occur if you are using: “self.write_complex_metric_string” , as the metrics should be strings.
Need to modify the parser file, and make sure the values enter the DB as strings.