Thanks to Ido Raday for this amazing troubleshooting guide!

Introduction

This is an interactive guide for debugging the auto triage process.

There are several requirements:

For any questions, please reach out to the server team.

Was the issue created?

The Auto triage process is triggered by the corresponding issue.

Before debugging the auto triage, we first need to make sure that a new issue was created.

We also need to keep the alert_id for future steps.


Does the issue have Auto triage details?

You can identify an issue with auto triage by the symbol next to the headline.

The issue was not created!

If the issue was not created it can be due to several reasons:

Make sure to find the reason, and continue to the next step only after fixing it.

Verify the Auto triage process result

Every issue should trigger an event in the Auto triage service.

The auto triage will run the corresponding playbook (if available) and will send the result back to the server.

This step checks the result the server received from the playbook by querying the automation_job table in psql (\x turns on expanded display):

\x
select * from automation_job where alert_id = 'ALERT_ID';
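The result found in the automation_job row decides which section of this guide to read next. A minimal sketch of that decision, assuming the result strings match the names used in the section headings of this guide (the column that holds them and their letter case are not stated here, so the comparison ignores case). Treating "no row found" as "the process did not run" and sending unknown values to the Server team are also assumptions:

```python
from typing import Optional

# Result names as they appear in the section headings of this guide.
# How the automation_job table actually stores them is an assumption.
KNOWN_RESULTS = [
    "success",
    "Not_Available",
    "FILE_IS_MISSING",
    "Unable_To_Get_Device_Data",
    "INSUFFICIENT_DATA_ACQUIRED",
    "PLAYBOOK_RUN_FAILED",
]


def next_step(result: Optional[str]) -> str:
    """Name the part of this guide to read next for a job result.

    Pass None when the query returned no row for the alert_id.
    """
    if result is None:
        # Assumption: no row means the auto triage process did not run.
        return "Is Auto Triage feature enabled?"
    wanted = result.strip().lower()
    for known in KNOWN_RESULTS:
        if wanted == known.lower():
            return f"Section for result {known}"
    # Assumption: anything unrecognized goes to the Server team.
    return "Open ticket for Server team"
```

For example, next_step(None) points to the feature-flag check, and next_step("PLAYBOOK_RUN_FAILED") points to the section for that error code.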


Is Auto Triage feature enabled?

It seems like the auto triage process did not run.

There are two ways to enable the Auto triage feature:

Enable the feature flag



Auto triage process result was a success

It seems like the auto triage process worked on the server-side.

If the issue does not show the alert, it may be due to a UI issue. In that case, please open a ticket with the application team.


Auto triage process result was Not_Available

It seems like the auto triage process did not find a playbook to run for the issue.

Search the triage process log for the alert_id, for example:

grep -A20 0c7968cd-8df9-4272-852a-ebd3bea2b130 /usr/share/indeni-services/logs/automation.log

Find the log block related to the alert_id, for example:

2019-10-05 19:53:38,263 - INFO - automation_registration.py - New automation request, alert_id: 0c7968cd-8df9-4272-852a-ebd3bea2b130, device_id: 270d7888-ede5-419d-b968-ab45c8a08c07, rule_name: DeviceMonitoringSuspended, vendor_name: paloaltonetworks
2019-10-05 19:53:38,264 - INFO - playbook_catalog.py - Get playbook for rule: DeviceMonitoringSuspended  vendor: paloaltonetworks
2019-10-05 19:53:38,264 - INFO - playbook_catalog.py - playbook for rule: DeviceMonitoringSuspended  vendor: paloaltonetworks is None
2019-10-05 19:53:38,264 - INFO - automation_registration.py - New job created, job_id: d8dad379-c130-4135-9c1a-35ce52fd201d, alert_id: 0c7968cd-8df9-4272-852a-ebd3bea2b130, device_id: 270d7888-ede5-419d-b968-ab45c8a08c07, playbook_file: None
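The same check can be scripted. The sketch below assumes the log lines follow the "timestamp - level - module - message" layout of the example above, with "key: value" pairs inside the message. The function name find_job is hypothetical and only for illustration:

```python
import re
from typing import Dict, Iterable, Optional

# Matches "key: value" pairs such as "alert_id: 0c79..." inside a log message.
FIELD_RE = re.compile(r"(\w+): ([^,\s]+)")


def find_job(lines: Iterable[str], alert_id: str) -> Optional[Dict[str, str]]:
    """Return the fields of the first "New job created" line for alert_id."""
    for line in lines:
        # Expected layout: timestamp - level - module - message
        parts = line.rstrip("\n").split(" - ", 3)
        if len(parts) != 4:
            continue  # e.g. a traceback or continuation line
        message = parts[3]
        if not message.startswith("New job created"):
            continue
        fields = dict(FIELD_RE.findall(message))
        if fields.get("alert_id") == alert_id:
            return fields
    return None


# Two lines copied from the example log block in this guide.
SAMPLE = [
    "2019-10-05 19:53:38,263 - INFO - automation_registration.py - New automation request, alert_id: 0c7968cd-8df9-4272-852a-ebd3bea2b130, device_id: 270d7888-ede5-419d-b968-ab45c8a08c07, rule_name: DeviceMonitoringSuspended, vendor_name: paloaltonetworks",
    "2019-10-05 19:53:38,264 - INFO - automation_registration.py - New job created, job_id: d8dad379-c130-4135-9c1a-35ce52fd201d, alert_id: 0c7968cd-8df9-4272-852a-ebd3bea2b130, device_id: 270d7888-ede5-419d-b968-ab45c8a08c07, playbook_file: None",
]

job = find_job(SAMPLE, "0c7968cd-8df9-4272-852a-ebd3bea2b130")
if job is not None and job["playbook_file"] == "None":
    print("No playbook was found for this issue; job_id:", job["job_id"])
```

To run it against the real log, pass an open file handle for automation.log instead of SAMPLE.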

If the data is not as expected, go to “Open ticket for Server team”.

Auto triage process failed with error code FILE_IS_MISSING

The error FILE_IS_MISSING indicates that the playbook file is missing.

Auto triage process failed with error code Unable_To_Get_Device_Data

The error Unable_To_Get_Device_Data indicates that the Auto triage service did not manage to get the credentials from the server. This can happen when, for example, indeni only needs SSH credentials for interrogation but the playbook requires HTTP credentials.

Auto triage process failed with error code INSUFFICIENT_DATA_ACQUIRED

The error INSUFFICIENT_DATA_ACQUIRED indicates that the Auto triage service did not manage to extract enough data for the server. This error will appear when:

This happens when the playbook exits unexpectedly without throwing an exception.

 

Auto triage process failed with error code PLAYBOOK_RUN_FAILED

The error PLAYBOOK_RUN_FAILED indicates that the Ansible process failed while running the playbook.

Reading the Ansible logs

The Ansible logs are in /usr/share/indeni-services/logs/ansible
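When the failure is recent, the newest files in that directory are usually the place to start. A minimal sketch for listing them, assuming nothing about how the files are named or nested (it simply sorts every regular file under the directory by modification time). The helper name newest_logs is hypothetical:

```python
from pathlib import Path
from typing import List

# Directory named in this guide.
ANSIBLE_LOG_DIR = Path("/usr/share/indeni-services/logs/ansible")


def newest_logs(log_dir: Path, limit: int = 5) -> List[Path]:
    """Return up to `limit` regular files under log_dir, newest first."""
    if not log_dir.is_dir():
        return []
    files = [p for p in log_dir.rglob("*") if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


if __name__ == "__main__":
    for path in newest_logs(ANSIBLE_LOG_DIR):
        print(path)
```

Compare the modification times with the time of the issue to pick the log that belongs to the failed run.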

Open ticket for Server team

When opening a ticket please include:

Also, make sure to describe in detail all the steps you took and all the data you collected.