Keep’s incident management system lets teams handle, track, and resolve operational incidents from detection through resolution, with the aim of reducing downtime and keeping collaboration efficient.

(1) Incident Severity

Displays the severity of the incident, helping teams prioritize and focus on the most critical issues.

(2) Incident Name

The unique name or identifier of the incident for easy reference and tracking.

(3) Incident Summary (+ AI Summary)

A brief overview of the incident, optionally enhanced with AI-generated summaries to provide deeper insights.

(4) Link Similar Incidents

Connects related incidents for better visibility into recurring or interconnected issues.

(5) Involved Services

Lists the services affected by the incident, allowing teams to understand the scope of the impact.

(6) Affected Environments

Specifies the environments (e.g., production, staging) impacted by the incident.

(7) Run Workflow

Quickly initiate workflows to address the incident, such as creating tickets, notifying teams, or executing remediation steps.

(8) Edit Incident

Lets you modify incident details, such as severity, name, or involved services, to keep the information up to date.

(9) Incident Status

Indicates the current status of the incident (e.g., open, resolved, acknowledged).

(10) Incident Last Seen At

Records the most recent time the incident was observed, providing context for its activity.

(11) Incident Started At

Indicates when the incident was first detected, helping establish timelines for resolution.

(12) Incident Assignee

Displays the individual or team responsible for resolving the incident, promoting accountability.

(13) Incident Group By Value

Groups incidents based on a specific attribute, such as service, environment, or severity, for better organization.

(14) Incident Alerts

Lists all alerts linked to the incident, offering a complete view of its underlying causes.

(15) Incident Activity

Tracks all activities and updates related to the incident, enabling detailed audits and reviews.

(16) Incident Timeline

Provides a chronological view of the incident’s lifecycle, including updates, actions, and status changes.

(17) Incident Topology

Visualizes the relationships between affected components, services, and infrastructure in a topology map.

(18) Incident Workflows

Lists workflows associated with the incident, showing actions taken or available options for resolution.

(19) Incident Chat with AI (Incident Copilot)

Engage with AI-powered chat for guidance, insights, or recommended actions related to the incident.

(20) Incident Alert List

Displays a detailed list of alerts contributing to the incident, with metadata for each alert.

(21) Incident Alert Link

Provides quick access to the original monitoring tool for a specific alert.

(22) Incident Alert Status

Shows the current status of each alert, such as acknowledged, resolved, or firing.

(23) Incident Correlation Type

Indicates how the incident was correlated: manually, via AI, or by rule-based logic.

(24) Unlink Alert from Incident

Enables unlinking specific alerts from the incident if they are found to be unrelated.
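
To make the relationships between these fields concrete, here is a minimal sketch of how an incident and its linked alerts could be modeled. It is an illustration only and not Keep's actual schema or API: the class names (`Incident`, `Alert`, `CorrelationType`), the field names, and the `unlink_alert` helper are all hypothetical, and deriving "last seen" from the newest linked alert is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List, Optional


class CorrelationType(Enum):
    """How the incident was correlated (hypothetical values)."""
    MANUAL = "manual"
    AI = "ai"
    RULE = "rule"


@dataclass
class Alert:
    alert_id: str
    status: str                 # e.g. "firing", "acknowledged", "resolved"
    last_received: datetime
    url: Optional[str] = None   # link back to the original monitoring tool


@dataclass
class Incident:
    name: str
    severity: str
    status: str                 # e.g. "open", "acknowledged", "resolved"
    started_at: datetime
    correlation_type: CorrelationType
    services: List[str] = field(default_factory=list)
    environments: List[str] = field(default_factory=list)
    assignee: Optional[str] = None
    alerts: List[Alert] = field(default_factory=list)

    @property
    def last_seen_at(self) -> datetime:
        # Assumption: "last seen" is the newest linked alert timestamp,
        # falling back to the start time when no alerts are linked.
        return max((a.last_received for a in self.alerts), default=self.started_at)

    def unlink_alert(self, alert_id: str) -> bool:
        # Drop an alert judged unrelated; True if an alert was removed.
        before = len(self.alerts)
        self.alerts = [a for a in self.alerts if a.alert_id != alert_id]
        return len(self.alerts) < before
```

In this sketch, unlinking an alert also changes the derived "last seen" value, which is one reason to treat that timestamp as computed rather than stored.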