Testing on a live server

Testing/Debugging .ind scripts

Adding IND Scripts to an Indeni Server

IND scripts are deployed in /usr/share/indeni-knowledge/stable/ind/parsers/src/.
To add your own script to the server, copy it into /usr/share/indeni-knowledge/overwrite/ind/parsers/src/.

For example, if you are fixing a bug in parsers/src/checkpoint/firewall/detecting-cluster.ind and want to test it on your local indeni server, copy the script to /usr/share/indeni-knowledge/overwrite/ind/parsers/src/checkpoint/firewall/detecting-cluster.ind on that server and restart the indeni server (imanage 3).
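The path mapping in the example above can be sketched in a few lines of Python. This is only an illustration: the helper name overwrite_path_for is hypothetical, and it assumes the two trees differ only in the stable versus overwrite directory component.

```python
from pathlib import PurePosixPath

# Root of the knowledge tree, as given in the text above.
KNOWLEDGE_ROOT = PurePosixPath("/usr/share/indeni-knowledge")


def overwrite_path_for(stable_path: str) -> str:
    """Map a file under .../stable/ to the same relative location under .../overwrite/.

    Raises ValueError if the path is not inside the stable tree.
    """
    relative = PurePosixPath(stable_path).relative_to(KNOWLEDGE_ROOT / "stable")
    return str(KNOWLEDGE_ROOT / "overwrite" / relative)


if __name__ == "__main__":
    print(overwrite_path_for(
        "/usr/share/indeni-knowledge/stable/ind/parsers/src/"
        "checkpoint/firewall/detecting-cluster.ind"
    ))
```

Keeping the relative path identical in both trees makes it easier to see later which stable script a given overwrite file was meant to replace.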

If you want, you can overwrite the entire script hierarchy. You might do this if you want to code directly on the server, but keep in mind that this creates a 'dirty' server environment.
If you update your knowledge version and forget to clean up your overwrite directory, things can get very confusing.

Be aware that when new scripts are placed in the /usr/share/indeni-knowledge/overwrite/ind/parsers/ folder, they only overwrite scripts that have the same "command name" in the META section of the script. The scripts they overwrite are the ones in /usr/share/indeni-knowledge/stable/ind/parsers/src/.

This can cause issues when renaming *.ind script files, or, in general, when using the overwrite directory. You may end up with two different *.ind scripts (different file names), both with the same command name (in the META section). In this case, the script in the /stable/ind/parsers/src/ folder and the script in the /overwrite/ind/parsers/ folder would both execute and write data for the same metric, which is generally not what you want.
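One way to catch this situation before restarting the server is to scan both trees for command names that appear under different relative paths. The sketch below is not an Indeni tool. It assumes the command name sits on a line of the form "name: <command-name>" and takes the first such line in each file; the function names are made up for illustration.

```python
import os
import re
from collections import defaultdict

# Assumption: the META section holds the command name on a "name: ..." line.
NAME_LINE = re.compile(r"^[ \t]*name:[ \t]*(\S+)", re.MULTILINE)


def command_name(ind_file):
    """Return the first 'name:' value found in an .ind file, or None."""
    with open(ind_file, encoding="utf-8", errors="replace") as handle:
        match = NAME_LINE.search(handle.read())
    return match.group(1) if match else None


def index_by_name(root):
    """Map command name -> sorted list of .ind paths (relative to root)."""
    index = defaultdict(list)
    for folder, _dirs, files in os.walk(root):
        for filename in files:
            if filename.endswith(".ind"):
                full = os.path.join(folder, filename)
                name = command_name(full)
                if name:
                    index[name].append(os.path.relpath(full, root))
    return {name: sorted(paths) for name, paths in index.items()}


def find_name_clashes(stable_root, overwrite_root):
    """Return names declared in both trees whose relative file paths differ."""
    stable = index_by_name(stable_root)
    overwrite = index_by_name(overwrite_root)
    clashes = {}
    for name in stable.keys() & overwrite.keys():
        if stable[name] != overwrite[name]:
            clashes[name] = (stable[name], overwrite[name])
    return clashes


if __name__ == "__main__":
    found = find_name_clashes(
        "/usr/share/indeni-knowledge/stable/ind/parsers/src",
        "/usr/share/indeni-knowledge/overwrite/ind/parsers/src",
    )
    for name, (stable_paths, overwrite_paths) in sorted(found.items()):
        print(name, "stable:", stable_paths, "overwrite:", overwrite_paths)
```

An empty result only means no renamed duplicates were detected under these assumptions; the collector log remains the authoritative source for which file a command was compiled from.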

Checking metric value in the time series database

You got an alert, but it doesn't make sense? How did it happen? What were the metric's values when it was generated?
You can access the database and pull double metrics yourself. SSH into the Indeni server and run the command below (the last part of the line pipes the response through python -m json.tool to pretty-print the JSON for readability):

curl -G -k -u "admin:admin123!" "https://localhost:9009/api/v1/metrics" --data-urlencode "query=(im.name==config-unsaved and device-id=='f8dccd39-fc7f-4e41-aa03-81965c9c9fde')" | python -m json.tool
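If you run this kind of query often, the query expression from the curl command above can be assembled programmatically. The following is a minimal sketch; the function name metric_query_url is hypothetical, and the base URL and query syntax are simply copied from the curl example rather than taken from any API reference.

```python
from urllib.parse import urlencode


def metric_query_url(metric_name, device_id,
                     base="https://localhost:9009/api/v1/metrics"):
    """Build a URL carrying the same query expression as the curl example."""
    query = f"(im.name=={metric_name} and device-id=='{device_id}')"
    # urlencode percent-encodes the expression, much like --data-urlencode does.
    return f"{base}?{urlencode({'query': query})}"


if __name__ == "__main__":
    print(metric_query_url("config-unsaved",
                           "f8dccd39-fc7f-4e41-aa03-81965c9c9fde"))
```

The resulting URL can be passed to curl -k -u in place of the -G / --data-urlencode pair.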

Notes:

  • Currently, these queries only work for double metrics; there's no way of querying complex metrics, unless they are tagged as live-config (in which case the last value of the three will appear in the device information).
  • Replace

    • config-unsaved with the double-metric you'd like to fetch.

    • device-id with the device ID. To find your device id, run the following command first (it'll dump the list of devices from the database):

      psql -c "select id,name,ip_address from device;"

      The output of a metric query looks like this:

      [
        {
          "type": "ts",
          "tags": {
            "im.dstype": "gauge",                  # i.m. --> "Indeni Metric" Data Storage Type
            "im.dstype.displaytype": "boolean",
            "im.name": "config-unsaved",           # Indeni Metric Name
            "im.step": "300",                      # Monitoring interval (from script META) in seconds
            "device-id": "a158058a-afa4-48f7-bbbe-a789ddc82ed7",
            "live-config": "true",
            "display-name": "Configuration Unsaved?"
          },
          "points": [null,null,null,null,null,null,null,null,null,null,null,1.0,null,null,null,null,0.0,null,null,null,null,0.0,null,null,null,null,null,0.0,null,null,null,null,0.0,null,null,null,null,0.0,null,null,null,null,0.0,null,null,null,null,0.0,null,null,null,null,0.0,null,null,null,null,0.0,null,null,null],
          "start": 1484481240000,
          "step": 60000
        }
      ]

      This shows you the metric's tags, as well as the points of the time series themselves. Ignore the nulls. The series shows data collected in 1-minute intervals, according to the "step": 60000 field (60,000 milliseconds). The script's monitoring interval is shown in the "im.step": "300" field, in seconds (so 5 minutes).
      That's why roughly every fifth point is non-null, with about four nulls in between (counting from when the script started running; before that, the entire series is null).
      In the above, you'll notice that earlier in the series the metric was 1.0, then later 0.0, and it stayed 0.0 for the rest of the period.
      The points are sorted oldest-first, so the first entry in the list is the oldest one. With the command above, the oldest entry is roughly an hour old.
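To read such a series more comfortably, the non-null points can be paired with timestamps. The sketch below assumes that "start" is the epoch time of the first point in milliseconds and that each following point is "step" milliseconds later. That reading is consistent with the description above but is not confirmed by any API documentation. The function name decode_points is made up for illustration.

```python
from datetime import datetime, timezone


def decode_points(series):
    """Return (UTC timestamp, value) pairs for the non-null points of one series."""
    start_ms = series["start"]
    step_ms = series["step"]
    samples = []
    for index, value in enumerate(series["points"]):
        if value is None:
            continue  # no sample was written for this minute
        when = datetime.fromtimestamp(
            (start_ms + index * step_ms) / 1000, tz=timezone.utc
        )
        samples.append((when, value))
    return samples


if __name__ == "__main__":
    # Shortened stand-in for the series shown above.
    example = {
        "start": 1484481240000,
        "step": 60000,
        "points": [None, None, 1.0, None, 0.0],
    }
    for when, value in decode_points(example):
        print(when.isoformat(), value)
```

To use it on real output, load the response with json.load and call decode_points on each element of the returned list.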

Clearing metrics from the database

Some of the metrics are written to an in-memory DB; in this case you can just restart the indeni-server and the metrics will be deleted. To restart the server, from your Indeni server CLI, you can run imanage 3.

Other metrics are written to a filesystem DB at /usr/share/indeni/db/trends/.

The sub-directories there are organized by device UUID; deleting a sub-directory deletes the metrics stored in it.

Deleting alerts from a device

When an alert is acknowledged it will not re-appear even if the issue is ongoing.
The issue needs to get resolved and then re-appear for the alert to trigger again.

If you would like to re-trigger an alert, you can delete the alert (or all alerts) from a device, and start your test over again.

To delete a device's alerts, you need to use the command-line on the indeni server and connect to the Postgres database there:

indeni@ind-local:~$ psql indeni        <-- start the postgres shell against the indeni db
indeni=> select id, name, ip_address from device;     <-- list the devices connected to the indeni server (to find your device id)
                  id                  |           name           |   ip_address   
--------------------------------------+--------------------------+----------------
 9b5f66e5-3e10-4363-9973-c3dc0478bf9c | chkp-lab-CP-MGMT1-2      | 192.168.194.31
 c25acab5-9d15-4e6c-9cab-a33049534a72 | chkp-lab-CP-GW1          | 192.168.194.36
...
indeni=> select id from device where name = 'chkp-lab-CP-GW1';   <-- another command to find the device id by device name
                  id                  
--------------------------------------
 c25acab5-9d15-4e6c-9cab-a33049534a72

indeni=> delete from alert where device_id = 'c25acab5-9d15-4e6c-9cab-a33049534a72';   <-- delete ALL alerts for this device (using YOUR device id, not this one :)
indeni=> select alert_id, headline from alert where device_id = 'c25acab5-9d15-4e6c-9cab-a33049534a72';   <-- list alerts for a given device
 alert_id |                headline                
----------+----------------------------------------
      207 | Clock set incorrectly
      228 | Device not responding
...
indeni=> delete from alert where alert_id = 207;   <-- delete an alert with a specific id
indeni=> \q    <-- quit the psql shell

# A few other useful commands:
indeni=> \dt+                          <-- list the tables in the db
                                      List of relations
 Schema |                  Name                  | Type  | Owner  |    Size    | Description 
--------+----------------------------------------+-------+--------+------------+-------------
 public | alert                                  | table | indeni | 72 kB      | 
...
 public | device                                 | table | indeni | 16 kB      | 
...
indeni=> \d+ device      <-- list the columns in the "device" table

How to find the interrogation tags for a given device

Query the in-memory db using a REST API call

curl -G -k -u "<user>:<pwrd>" https://localhost:9009/api/v1/devices/<your-device-id>

Find your device id, at the indeni CLI

psql -c "select id,name,ip_address from device;"

e.g.,

curl -G -k -u "admin:admin123!" https://localhost:9009/api/v1/devices/b884a3ff-4747-4583-9eec-e6dab6825851

Searching Indeni Server Logs

First, just keep in mind that Indeni engineering can change the log file names, structure, and log data at any time. This is just a general guide.

  • You can use this command to grep the logs for info about the script you are debugging:
    find /usr/share/indeni* -name '*.log' -exec grep -H --color <your-script-command-name> {} +

<your-script-command-name> comes from the 'name:' field in the #! META section of your script (not the .ind file name).

Here are some of the kinds of info you can find in the logs. In these examples I'm searching for an IND script named "hawkeye-test-script".

  • /indeni-collector/logs/devices/: script was scheduled to run
    /usr/share/indeni-collector/logs/devices/192.168.194.42.log:INFO [2018-09-17 16:33:17,339] com.indeni.collector.actors.SchedulerActor$$anonfun$scheduling$1: Command 'hawkeye-test-script' scheduled to run in 0 seconds
  • /indeni-collector/logs/devices/: script ran and returned these metric names (doesn't show actual values)
    /usr/share/indeni-collector/logs/devices/192.168.194.42.log:INFO [2018-09-17 16:38:20,537] com.indeni.commands.execution.CommandProcessing: Command hawkeye-test-script ran as monitoring and returned 2 metrics: hawkeye-test-double (1), hawkeye-test-object (1)
  • /indeni-collector/logs/commands-monitoring.log: timer statistics for the script (count, min, max, mean, rates)
    INFO [2018-09-17 16:37:52,053] commands_monitoring: type=TIMER, name=lab-CP-GW4-2(192.168.194.42)-hawkeye-test-script, count=0, min=0.0, max=0.0, mean=0.0, stddev=0.0, median=0.0, p75=0.0, p95=0.0, p98=0.0, p99=0.0, p999=0.0, mean_rate=0.0, m1=0.0, m5=0.0, m15=0.0, rate_unit=events/second, duration_unit=milliseconds
  • /indeni-collector/logs/collector.log: did the script compile? if so, where from?
    INFO [2018-09-17 16:32:51,964] com.indeni.commands.knowledge.KnowledgeBase: Compiled 'hawkeye-test-script' command from '/usr/share/indeni-knowledge/overwrite/ind/parsers/src/checkpoint/firewall/simple-test-script.ind'
  • /indeni-collector/logs/collector.log: list of all command names loaded in the server
    INFO [2018-09-17 16:32:52,008] com.indeni.commands.knowledge.KnowledgeBase: Monitoring commands names = Set(cpvsx-vpn-check-tunnels-vsx, chkp-fw-tab-stats-novsx, radware-hwFanStatus...
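When grep returns many lines, it can help to pull the metric names out of the "ran as monitoring" entries automatically. The sketch below is based only on the single sample line shown above. The log format may change at any time, and the reading of the number in parentheses as a per-metric count is an assumption.

```python
import re

# Patterns derived from the one sample log line above; adjust if the format differs.
RAN_LINE = re.compile(
    r"Command (?P<command>\S+) ran as monitoring and returned \d+ metrics?: "
    r"(?P<metrics>.*)$"
)
METRIC_ITEM = re.compile(r"(?P<name>\S+) \((?P<count>\d+)\)")


def metrics_returned(log_line):
    """Return (command name, {metric name: count}), or None for other lines."""
    match = RAN_LINE.search(log_line)
    if match is None:
        return None
    counts = {
        item.group("name"): int(item.group("count"))
        for item in METRIC_ITEM.finditer(match.group("metrics"))
    }
    return match.group("command"), counts


if __name__ == "__main__":
    sample = (
        "INFO [2018-09-17 16:38:20,537] "
        "com.indeni.commands.execution.CommandProcessing: "
        "Command hawkeye-test-script ran as monitoring and returned 2 metrics: "
        "hawkeye-test-double (1), hawkeye-test-object (1)"
    )
    print(metrics_returned(sample))
```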

How/When IND Scripts Run

Not all scripts run at once. In fact, most of the scripts run sequentially (not concurrently), for two main reasons:

  • There is a limited number of connections to the device.
  • We do not want to overload the device with Indeni commands.

The order in which the monitoring commands run is completely arbitrary.

It is possible for a command to "starve" the connection to a device by taking too long to run, and thus prevent other commands from running. We currently have only a very basic mechanism to protect against that, and we are definitely thinking about how to tackle this better in the future.
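The effect of one slow command on the others can be illustrated with a toy model. This is not how the Indeni scheduler is implemented; it simply assumes a single connection on which commands run strictly one after another, with made-up durations in seconds.

```python
def finish_times(durations):
    """Return the completion time of each command when run back to back."""
    clock = 0.0
    finished = []
    for seconds in durations:
        clock += seconds  # the next command cannot start before this one ends
        finished.append(clock)
    return finished


if __name__ == "__main__":
    # Hypothetical durations: a quick command, a very slow one, another quick one.
    print(finish_times([2, 120, 3]))  # [2.0, 122.0, 125.0]
```

In this model the third command needs only 3 seconds of work but cannot finish until 125 seconds have passed, which is the kind of starvation described above.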

Testing/Debugging rules

The rules are not visible in the indeni server like the collector scripts are. Instead you should follow these steps:

Adding a rule

In this example we will use a rule called FortinetHaMonitorLinkStatusRule.yaml.
The rule_friendly_name is also called "FortinetHaMonitorLinkStatusRule"

  1. Copy the rule to /usr/share/indeni-knowledge/overwrite/rules, using the same folder structure as it has under /usr/share/indeni-knowledge/stable/rules.
  2. Restart indeni (imanage 3).
  3. To validate that the rule was properly loaded, check /usr/share/indeni/logs/rules/general.log.
  4. Run tail -f /usr/share/indeni/logs/rules/FortinetHaMonitorLinkStatusRule.log, where the log filename is the "rule_name". This follows the log file and shows whether the rule is running. The first line should be:
     INFO [2019-11-04 18:51:54,963] com.indeni.server.rules.manager.factory.FileSystemRuleFactory: Successfully loaded rule from class 'com.indeni.server.rules.library.templatebased.fortinet.FortinetHaMonitorLinkStatusRule'
     That line and its timestamp show that the rule loaded successfully. If you are overwriting an existing rule, you can change some info in the rule (for example, rule_friendly_name) to verify in the UI that YOUR version loaded.

An example where the rule to be loaded is called FortinetHaMonitorLinkStatusRule.scala

I placed it in /usr/share/indeni-knowledge/overwrite/rules/templatebased/fortinet

To see when a rule is run

You can watch a rule as it is run and also see if it triggers an alert. You can do this by tailing the log for the rule you want to watch. To do this, use the following command, where the rule name (e.g., FortinetHaMonitorLinkStatusRule) is the rule_name of the rule in the actual yaml file, not the filename. In newer rules, the rule name and the filename are the same, but this is not the case for older rules, so double-check this part!

Check if a rule is triggered
indeni@indeni-server: tail -f /usr/share/indeni/logs/rules/FortinetHaMonitorLinkStatusRule.log

This will follow the log file for this rule and give you updates as the rule is run:

Example output
INFO  [2019-02-05 21:07:45,748] com.indeni.server.rules.manager.jobs.RuleJob: Evaluation started
INFO  [2019-02-05 21:07:45,751] com.indeni.server.rules.manager.jobs.RuleJob: Finished evaluation, handling alerts
INFO  [2019-02-05 21:07:45,774] com.indeni.server.rules.manager.alerts.Alerter: Alert created: 5e9a1bf3-e210-4968-8bae-11fe47173573 - ERROR - FG-v6-cluster (10.11.93.12) - Firewall cluster monitor interface problem
INFO  [2019-02-05 21:07:45,780] com.indeni.server.rules.manager.jobs.RuleJob: Done
INFO  [2019-02-05 21:08:45,748] com.indeni.server.rules.manager.jobs.RuleJob: Evaluation started
INFO  [2019-02-05 21:08:45,752] com.indeni.server.rules.manager.jobs.RuleJob: Finished evaluation, handling alerts
INFO  [2019-02-05 21:08:45,752] com.indeni.server.rules.manager.jobs.RuleJob: Done
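For longer test runs, the rule log can be summarized per evaluation cycle instead of being read line by line. The sketch below relies only on the message texts visible in the example output ("Evaluation started", "Alert created: ..."), which may differ between Indeni versions. The function name summarize_evaluations is made up for illustration.

```python
import re

# "INFO  [timestamp] logger.class: message", as in the example output above.
LOG_LINE = re.compile(r"^INFO\s+\[(?P<ts>[^\]]+)\]\s+\S+:\s+(?P<msg>.*)$")
ALERT_PREFIX = "Alert created:"


def summarize_evaluations(lines):
    """Return one (start timestamp, [alert descriptions]) tuple per evaluation."""
    cycles = []
    current = None
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines in an unexpected format
        timestamp, message = match.group("ts"), match.group("msg")
        if message == "Evaluation started":
            current = (timestamp, [])
            cycles.append(current)
        elif message.startswith(ALERT_PREFIX) and current is not None:
            current[1].append(message[len(ALERT_PREFIX):].strip())
    return cycles


if __name__ == "__main__":
    path = "/usr/share/indeni/logs/rules/FortinetHaMonitorLinkStatusRule.log"
    with open(path, encoding="utf-8", errors="replace") as handle:
        for started, alerts in summarize_evaluations(handle):
            print(started, "alerts:", alerts if alerts else "none")
```

Applied to the example output, the first cycle would list one alert and the second cycle none.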