IND scripts are deployed in /usr/share/indeni-knowledge/stable/ind/parsers/src/.
To add your own script to the server, copy it into /usr/share/indeni-knowledge/overwrite/ind/parsers/src/.
For example, if you are fixing a bug in parsers/src/checkpoint/firewall/detecting-cluster.ind and want to test it on your local indeni server, copy it to /usr/share/indeni-knowledge/overwrite/ind/parsers/src/checkpoint/firewall/detecting-cluster.ind on that server, then restart the indeni server (imanage 3).
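The path mapping in the example above can be sketched in Python. The helper name overwrite_path_for is hypothetical (it is not part of indeni), and the sketch only computes the destination path; copying the file and restarting with imanage 3 remain manual steps.

```python
from pathlib import PurePosixPath

# Directory names taken from the text above.
STABLE_ROOT = PurePosixPath("/usr/share/indeni-knowledge/stable/ind/parsers/src")
OVERWRITE_ROOT = PurePosixPath("/usr/share/indeni-knowledge/overwrite/ind/parsers/src")


def overwrite_path_for(script):
    """Return where a script should be copied inside the overwrite tree.

    Hypothetical helper. Accepts an absolute path under the stable tree,
    a repository-style path starting with "parsers/src/", or a path that
    is already relative to parsers/src.
    """
    path = PurePosixPath(script)
    if path.is_absolute():
        # Raises ValueError if the path is not under the stable tree.
        relative = path.relative_to(STABLE_ROOT)
    elif path.parts[:2] == ("parsers", "src"):
        relative = PurePosixPath(*path.parts[2:])
    else:
        relative = path
    return OVERWRITE_ROOT / relative


print(overwrite_path_for("parsers/src/checkpoint/firewall/detecting-cluster.ind"))
```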
If you want, you can overwrite the entire script hierarchy. You might do this if you want to code directly on the server, but keep in mind that this creates a 'dirty' server environment.
If you update your knowledge version and forget to clean up your overwrite directory, the old copies there still take the place of the updated scripts, which can get very confusing.
Be aware that scripts placed in the /usr/share/indeni-knowledge/overwrite/ind/parsers/ folder only overwrite scripts that have the same "command name" in the META section of the script. The scripts they overwrite are the ones in /usr/share/indeni-knowledge/stable/ind/parsers/src/. This can cause issues when renaming *.ind script files or, more generally, when using the overwrite directory: you may end up with two *.ind scripts that have different file names but the same command name (in the META section). In that case, the script in the /stable/ind/parsers/src/ folder and the script in the /overwrite/ind/parsers/ folder would both execute and write data for the same metric, which is generally not what you want.
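One way to catch the situation described above before restarting the server is to scan both trees for command names that appear under different file paths. The following is a minimal sketch, not an indeni tool: the function names are hypothetical, and it assumes the command name is declared on a line of its own starting with "name:" in the META section.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: the META section declares the command name as "name: <value>"
# at the start of a line. The first such line in the file is used.
NAME_RE = re.compile(r"^name:[ \t]*(\S+)", re.MULTILINE)


def command_name(ind_text):
    """Return the first 'name:' value in the script text, or None."""
    match = NAME_RE.search(ind_text)
    return match.group(1) if match else None


def index_by_name(root):
    """Map each command name to the set of .ind paths (relative to root) declaring it."""
    root = Path(root)
    index = defaultdict(set)
    for path in root.rglob("*.ind"):
        name = command_name(path.read_text(errors="replace"))
        if name:
            index[name].add(path.relative_to(root))
    return index


def conflicting_scripts(stable_root, overwrite_root):
    """List (command name, stable path, overwrite path) where the paths differ.

    A script at the same relative path in both trees is the intended
    overwrite case and is not reported.
    """
    stable = index_by_name(stable_root)
    overwrite = index_by_name(overwrite_root)
    conflicts = []
    for name, overwrite_paths in overwrite.items():
        for overwrite_path in overwrite_paths:
            for stable_path in stable.get(name, ()):
                if stable_path != overwrite_path:
                    conflicts.append((name, stable_path, overwrite_path))
    return sorted(conflicts)
```

On the server this would be called as conflicting_scripts("/usr/share/indeni-knowledge/stable/ind/parsers/src", "/usr/share/indeni-knowledge/overwrite/ind/parsers/src"); an empty list means no renamed duplicates were found under the stated assumption.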
You got an alert, but it doesn't make sense? How did it happen? What were the metric's values when it was generated?
You can access the database and pull double metrics yourself. SSH into the Indeni server and run the following (the last part of the line pipes the output through python -m json.tool to pretty-print the JSON for readability):
curl -G -k -u "admin:admin123!" "https://localhost:9009/api/v1/metrics" --data-urlencode "query=(im.name==config-unsaved and device-id=='f8dccd39-fc7f-4e41-aa03-81965c9c9fde')" | python -m json.tool
Notes:
In the metric query, replace:
config-unsaved with the double metric you'd like to fetch.
The device-id value with your device ID. To find your device ID, run the following command first (it dumps the list of devices from the database):
psql -c "select id,name,ip_address from device;"
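If you script these lookups, the two substitutions described in the notes can be wrapped in a small helper that builds the query URL. This is a sketch: metric_query_url is a hypothetical name, and the endpoint and query syntax are copied from the curl example rather than from any API reference.

```python
from urllib.parse import urlencode

# Endpoint as used in the curl example; adjust host and port for your server.
METRICS_URL = "https://localhost:9009/api/v1/metrics"


def metric_query_url(metric_name, device_id, base_url=METRICS_URL):
    """Build the metrics URL for one double metric on one device (hypothetical helper)."""
    query = f"(im.name=={metric_name} and device-id=='{device_id}')"
    # urlencode percent-encodes the query value for use in the URL.
    return f"{base_url}?{urlencode({'query': query})}"


print(metric_query_url("config-unsaved", "f8dccd39-fc7f-4e41-aa03-81965c9c9fde"))
```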
The output of a metric query looks like this:
[
  {
    "type": "ts",
    "tags": {
      "im.dstype": "gauge",                    # i.m. --> "Indeni Metric" Data Storage Type
      "im.dstype.displaytype": "boolean",
      "im.name": "config-unsaved",             # Indeni Metric Name
      "im.step": "300",                        # Monitoring interval (from script META) in seconds
      "device-id": "a158058a-afa4-48f7-bbbe-a789ddc82ed7",
      "live-config": "true",
      "display-name": "Configuration Unsaved?"
    },
    "points": [null,null,null,null,null,null,null,null,null,null,null,1.0,null,null,null,null,0.0,null,null,null,null,0.0,null,null,null,null,null,0.0,null,null,null,null,0.0,null,null,null,null,0.0,null,null,null,null,0.0,null,null,null,null,0.0,null,null,null,null,0.0,null,null,null,null,0.0,null,null,null],
    "start": 1484481240000,
    "step": 60000
  }
]
This shows you the metric's tags as well as the points themselves, as a "time series". Ignore the nulls. The series shows data in 1-minute slots, according to the "step": 60000 field (60,000 milliseconds). The script's monitoring interval is shown in the "im.step": "300" field, in seconds (so 5 minutes).
That's why roughly every fifth entry is non-null, with about four nulls in between (counting from when the script started running; to start with, the entire series is null).
In the above, you'll notice that earlier in the series, the metric was 1.0, then later 0.0, and stayed 0.0 for the rest of the period.
The points are sorted by oldest-first, so the first entry in the list is the oldest one. If you've used the command above, it means the oldest is roughly an hour old.
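To line the points up with wall-clock time, each non-null entry can be paired with a timestamp computed from "start" and "step". The sketch below assumes that "start" is a Unix epoch time in milliseconds (the output does not state this explicitly); non_null_points is a hypothetical helper name.

```python
from datetime import datetime, timezone


def non_null_points(series):
    """Return (UTC timestamp, value) pairs for the non-null points of one series.

    Assumption: "start" is a Unix epoch time in milliseconds and "step" is
    the slot width in milliseconds, as in the sample output.
    """
    start_ms = series["start"]
    step_ms = series["step"]
    samples = []
    for index, value in enumerate(series["points"]):
        if value is None:
            continue  # empty slot: the script did not report in this minute
        when = datetime.fromtimestamp((start_ms + index * step_ms) / 1000, tz=timezone.utc)
        samples.append((when, value))
    return samples


# Shortened version of the sample series: 11 nulls, 1.0, 4 nulls, 0.0.
example = {
    "start": 1484481240000,
    "step": 60000,
    "points": [None] * 11 + [1.0] + [None] * 4 + [0.0],
}
for when, value in non_null_points(example):
    print(when.isoformat(), value)
```

With the full response, you would call non_null_points on each element of the JSON list returned by the metrics query.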
Some of the metrics are written to an in-memory DB; in this case you can just restart the indeni-server and the metrics will be deleted. To restart the server, from your Indeni server CLI, you can run imanage 3.
Other metrics are written to a filesystem DB at /usr/share/indeni/db/trends/.
Deleting sub-directories here (organized by device UUID) deletes the metrics stored in them.
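Because deleting the wrong directory is hard to undo, a cautious version of this cleanup might validate the device ID and default to a dry run. The sketch below assumes each sub-directory is named exactly after the device UUID in lowercase, hyphenated form; delete_device_trends is a hypothetical helper, not an indeni command.

```python
import shutil
import uuid
from pathlib import Path

# Location taken from the text above.
TRENDS_ROOT = Path("/usr/share/indeni/db/trends")


def delete_device_trends(device_id, trends_root=TRENDS_ROOT, dry_run=True):
    """Delete the trends directory of one device and return its path.

    Returns None if no directory exists for the device. With dry_run=True
    (the default) nothing is deleted; the path is only reported.
    Assumption: the directory name is the device UUID, lowercase with hyphens.
    """
    # uuid.UUID raises ValueError for malformed IDs, so a typo such as "" or
    # ".." can never resolve to the trends root or a parent directory.
    device_dir = Path(trends_root) / str(uuid.UUID(device_id))
    if not device_dir.is_dir():
        return None
    if not dry_run:
        shutil.rmtree(device_dir)
    return device_dir
```

Running it once with the default dry_run=True and checking the returned path before repeating with dry_run=False keeps the destructive step explicit.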
When an alert is acknowledged it will not re-appear even if the issue is ongoing.
The issue needs to get resolved and then re-appear for the alert to trigger again.
If you would like to re-trigger an alert, you can delete the alert (or all alerts) from a device, and start your test over again.
To delete a device's alerts, you need to use the command-line on the indeni server and connect to the Postgres database there:
indeni@ind-local:~$ psql indeni    <-- start the postgres shell against the indeni db
indeni=> select id, name, ip_address from device;    <-- list the devices connected to the indeni server (to find your device id)
                  id                  |           name           |   ip_address
--------------------------------------+--------------------------+----------------
 9b5f66e5-3e10-4363-9973-c3dc0478bf9c | chkp-lab-CP-MGMT1-2      | 192.168.194.31
 c25acab5-9d15-4e6c-9cab-a33049534a72 | chkp-lab-CP-GW1          | 192.168.194.36
...
indeni=> select id from device where name = 'chkp-lab-CP-GW1';    <-- another command to find the device id by device name
                  id
--------------------------------------
 c25acab5-9d15-4e6c-9cab-a33049534a72
indeni=> delete from alert where device_id = 'c25acab5-9d15-4e6c-9cab-a33049534a72';    <-- delete ALL alerts for this device (using YOUR device id, not this one :)
indeni=> select alert_id, headline from alert where device_id = 'c25acab5-9d15-4e6c-9cab-a33049534a72';    <-- list alerts for a given device
 alert_id |                headline
----------+----------------------------------------
      207 | Clock set incorrectly
      228 | Device not responding
...
indeni=> delete from alert where alert_id = 207;    <-- delete an alert with a specific id
indeni=> \q    <-- quit the psql shell

# A few other useful commands:
indeni=> \dt+    <-- list the tables in the db
                                    List of relations
 Schema |                  Name                  | Type  | Owner  |    Size    | Description
--------+----------------------------------------+-------+--------+------------+-------------
 public | alert                                  | table | indeni | 72 kB      |
...
 public | device                                 | table | indeni | 16 kB      |
...
indeni=> \d+ device    <-- list the columns in the "device" table
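When you repeat this during testing, it can help to generate the delete command rather than retype it. The sketch below only builds the psql argument list and does not run it; delete_alerts_command is a hypothetical helper, and the table and column names are the ones shown in the session above.

```python
import shlex
import uuid


def delete_alerts_command(device_id, database="indeni"):
    """Build a psql command that deletes ALL alerts of one device (hypothetical helper).

    The device ID is parsed as a UUID first, so arbitrary text can never end
    up inside the SQL statement; malformed input raises ValueError.
    """
    device_uuid = uuid.UUID(device_id)
    sql = f"delete from alert where device_id = '{device_uuid}';"
    return ["psql", "-d", database, "-c", sql]


cmd = delete_alerts_command("c25acab5-9d15-4e6c-9cab-a33049534a72")
# Print a copy-and-paste form; pass cmd to subprocess.run to execute it instead.
print(shlex.join(cmd))
```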
Query the in-memory db using a REST API call
curl -G -k -u "<user>:<pwrd>" https://localhost:9009/api/v1/devices/<your-device-id>
Find your device id, at the indeni CLI
psql -c "select id,name,ip_address from device;"
e.g.,
curl -G -k -u "admin:admin123!" https://localhost:9009/api/v1/devices/b884a3ff-4747-4583-9eec-e6dab6825851
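The same call can be made from Python with the standard library. This is a sketch under assumptions: the function names are hypothetical, the response is assumed to be JSON, and certificate verification is disabled to mirror curl's -k flag, which is only reasonable against a lab server.

```python
import base64
import json
import ssl
import urllib.request

# Base URL as used in the curl examples.
API_ROOT = "https://localhost:9009/api/v1"


def device_request(device_id, user, password, api_root=API_ROOT):
    """Build a GET request for one device using HTTP basic auth (hypothetical helper)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{api_root}/devices/{device_id}",
        headers={"Authorization": f"Basic {token}"},
    )


def fetch_device(device_id, user, password):
    """Send the request and decode the body, assuming the server returns JSON."""
    # Equivalent of curl -k: skip certificate and hostname verification.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    request = device_request(device_id, user, password)
    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)
```

On the indeni server, fetch_device("<your-device-id>", "<user>", "<pwrd>") would return the decoded response for that device.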
First, just keep in mind that Indeni engineering can change the log file names, structure, and log data at any time. This is just a general guide.
find /usr/share/indeni* -name '*.log' -exec grep -H --color <your-script-command-name> {} +
<your-script-command-name> comes from the 'name:' field in the #! META section of your script (not the .ind file name).
Here are some of the kinds of information you can find in the logs. In this example, I'm searching for an IND script named "hawkeye-test-script".
/usr/share/indeni-collector/logs/devices/192.168.194.42.log:INFO [2018-09-17 16:33:17,339] com.indeni.collector.actors.SchedulerActor$$anonfun$scheduling$1: Command 'hawkeye-test-script' scheduled to run in 0 seconds
/usr/share/indeni-collector/logs/devices/192.168.194.42.log:INFO [2018-09-17 16:38:20,537] com.indeni.commands.execution.CommandProcessing: Command hawkeye-test-script ran as monitoring and returned 2 metrics: hawkeye-test-double (1), hawkeye-test-object (1)
INFO [2018-09-17 16:37:52,053] commands_monitoring: type=TIMER, name=lab-CP-GW4-2(192.168.194.42)-hawkeye-test-script, count=0, min=0.0, max=0.0, mean=0.0, stddev=0.0, median=0.0, p75=0.0, p95=0.0, p98=0.0, p99=0.0, p999=0.0, mean_rate=0.0, m1=0.0, m5=0.0, m15=0.0, rate_unit=events/second, duration_unit=milliseconds
INFO [2018-09-17 16:32:51,964] com.indeni.commands.knowledge.KnowledgeBase: Compiled 'hawkeye-test-script' command from '/usr/share/indeni-knowledge/overwrite/ind/parsers/src/checkpoint/firewall/simple-test-script.ind'
INFO [2018-09-17 16:32:52,008] com.indeni.commands.knowledge.KnowledgeBase: Monitoring commands names = Set(cpvsx-vpn-check-tunnels-vsx, chkp-fw-tab-stats-novsx, radware-hwFanStatus...
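If you want to check these results programmatically, the "ran as ... and returned N metrics" lines can be parsed with a regular expression. The pattern below is derived only from the sample line above, and, as noted earlier, the log format can change at any time; parse_result_line is a hypothetical helper.

```python
import re

# Pattern derived from the sample log line; this is not a documented format.
RESULT_RE = re.compile(
    r"\[(?P<timestamp>[^\]]+)\] .*?: Command (?P<command>\S+) ran as (?P<mode>\w+) "
    r"and returned (?P<count>\d+) metrics: (?P<metrics>.*)$"
)
# Each reported metric looks like "metric-name (1)".
METRIC_RE = re.compile(r"([\w.-]+) \((\d+)\)")


def parse_result_line(line):
    """Parse one command-result log line into a dict, or return None if it does not match."""
    match = RESULT_RE.search(line)
    if not match:
        return None
    return {
        "timestamp": match["timestamp"],
        "command": match["command"],
        "mode": match["mode"],
        "count": int(match["count"]),
        # Metric name -> the number printed in parentheses after it.
        "metrics": {name: int(n) for name, n in METRIC_RE.findall(match["metrics"])},
    }


sample = (
    "/usr/share/indeni-collector/logs/devices/192.168.194.42.log:INFO "
    "[2018-09-17 16:38:20,537] com.indeni.commands.execution.CommandProcessing: "
    "Command hawkeye-test-script ran as monitoring and returned 2 metrics: "
    "hawkeye-test-double (1), hawkeye-test-object (1)"
)
print(parse_result_line(sample))
```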
Not all scripts run at once. In fact, most of the scripts run sequentially (not concurrently), and that is for two main reasons:
It is indeed possible for a command to "starve" the connection to a device by taking too long to run, and thus prevent other commands from running. We currently have only a very basic mechanism to protect against that, and we are definitely thinking about how to tackle this better in the future.
The rules are not visible in the indeni server like the collector scripts are. Instead you should follow these steps:
In this example we will use a rule called FortinetHaMonitorLinkStatusRule.yaml.
The rule_friendly_name is also called "FortinetHaMonitorLinkStatusRule"
An example where the rule to be loaded is called FortinetHaMonitorLinkStatusRule.scala
I placed it in /usr/share/indeni-knowledge/overwrite/rules/templatebased/fortinet
You can watch a rule as it is run and also see if it triggers an alert. You can do this by tailing the log for the rule you want to watch. To do this, use the following command, where the rule name (e.g., FortinetHaMonitorLinkStatusRule) is the rule_name of the rule in the actual yaml file, not the filename. In newer rules, the rule name and the filename are the same, but this is not the case for older rules, so double-check this part!
indeni@indeni-server: tail -f /usr/share/indeni/logs/rules/FortinetHaMonitorLinkStatusRule.log
This will follow the log file for this rule and give you updates as the rule is run:
INFO [2019-02-05 21:07:45,748] com.indeni.server.rules.manager.jobs.RuleJob: Evaluation started
INFO [2019-02-05 21:07:45,751] com.indeni.server.rules.manager.jobs.RuleJob: Finished evaluation, handling alerts
INFO [2019-02-05 21:07:45,774] com.indeni.server.rules.manager.alerts.Alerter: Alert created: 5e9a1bf3-e210-4968-8bae-11fe47173573 - ERROR - FG-v6-cluster (10.11.93.12) - Firewall cluster monitor interface problem
INFO [2019-02-05 21:07:45,780] com.indeni.server.rules.manager.jobs.RuleJob: Done
INFO [2019-02-05 21:08:45,748] com.indeni.server.rules.manager.jobs.RuleJob: Evaluation started
INFO [2019-02-05 21:08:45,752] com.indeni.server.rules.manager.jobs.RuleJob: Finished evaluation, handling alerts
INFO [2019-02-05 21:08:45,752] com.indeni.server.rules.manager.jobs.RuleJob: Done
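To pick out only the evaluations that raised an alert, the "Alert created" lines can be filtered from the rule log. The pattern in this sketch is inferred from the single sample line above and may not match other indeni versions; alerts_created is a hypothetical helper.

```python
import re

# Pattern inferred from the sample "Alert created" line; not a documented format.
ALERT_RE = re.compile(
    r"\[(?P<timestamp>[^\]]+)\] \S+: Alert created: (?P<alert_id>[0-9a-f-]{36}) - "
    r"(?P<severity>\w+) - (?P<device>.+?) \((?P<ip>[^)]+)\) - (?P<headline>.*)$"
)


def alerts_created(lines):
    """Yield one dict per 'Alert created' line found in an iterable of log lines."""
    for line in lines:
        match = ALERT_RE.search(line)
        if match:
            yield match.groupdict()


sample_log = [
    "INFO [2019-02-05 21:07:45,751] com.indeni.server.rules.manager.jobs.RuleJob: "
    "Finished evaluation, handling alerts",
    "INFO [2019-02-05 21:07:45,774] com.indeni.server.rules.manager.alerts.Alerter: "
    "Alert created: 5e9a1bf3-e210-4968-8bae-11fe47173573 - ERROR - FG-v6-cluster "
    "(10.11.93.12) - Firewall cluster monitor interface problem",
]
for alert in alerts_created(sample_log):
    print(alert["timestamp"], alert["severity"], alert["device"], alert["headline"])
```

Since a file object is an iterable of lines, passing an open rule log to alerts_created works the same way as the list used here.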