Health check in Cayosoft Guardian
The Health Check feature in Cayosoft Guardian provides an automated way to assess the health of critical system components, such as system and archive databases, AD Connectors, managed domains, managed tenants, domain controllers, and connection credentials. This helps ensure your environment is functioning as expected and provides early warnings for potential issues.
By default, the Health Check job runs every hour and produces one record per component, which feeds into both the Health Check report and the Cayosoft Guardian alerting pipeline.
Prerequisites
Before relying on Health Check results, confirm that the following requirements are met in the environment being evaluated:
Network ports - The Cayosoft Guardian server, and any AD Connector that proxies the call, must be able to reach each evaluated component. Learn more: Required Ports for Cayosoft Guardian.
Credentials - Every managed domain, managed tenant, and AD Connector must have a valid credential or service account assigned. Connection credentials that are not linked to a connected system are flagged as a warning.
Privileges - The credential used against a domain controller must have the Manage auditing and security log user right for the Group Policy and audit policy portion of the Check Managed Domains step to succeed. Without this user right, the step reports Health_AuditPolicySkipped with Windows error 0x00000522.
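As a quick way to sanity-check the network prerequisite before relying on a run, the following sketch probes TCP ports on a target host. It is not part of Cayosoft Guardian. The port list (389 for LDAP, 5985 and 5986 for WinRM) is an assumption for illustration, and the Required Ports for Cayosoft Guardian article remains the authoritative list.

```python
import socket
from typing import Callable, Dict

# Assumed ports, for illustration only: LDAP and WinRM over HTTP/HTTPS.
ASSUMED_PORTS: Dict[str, int] = {
    "LDAP": 389,
    "WinRM HTTP": 5985,
    "WinRM HTTPS": 5986,
}


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_ports(
    host: str,
    ports: Dict[str, int],
    probe: Callable[[str, int], bool] = tcp_probe,
) -> Dict[str, bool]:
    """Probe every named port on the host and report reachability per name."""
    return {name: probe(host, port) for name, port in ports.items()}
```

Calling check_ports("dc01.example.com", ASSUMED_PORTS) from the Guardian server, where the host name is a placeholder, returns a dictionary of port names mapped to True or False. An open TCP port only shows that the path is not blocked; it does not prove that the credential or the WinRM configuration is valid.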
Running a health check
You can manually trigger a health check or configure it to run automatically on a schedule.
Go to Settings > Service Settings.
Select Health Check Settings.
Click Run Health Check.
In the System Job - Health Check, navigate to the Execution History tab to see the results. These results provide detailed insights into each evaluated component.
What each workflow step checks
The Health Check job is a workflow made of several steps. Each step targets one component type and produces one or more records for each object of that type. Every record carries a status such as Success, Warning, Info, or Error.
Check System Databases - Validates the Cayosoft Guardian product database and history database. For each database, the step opens a master SQL connection to confirm that the SQL Server instance is reachable, verifies that the database owner and schema are intact, and evaluates configured size thresholds so that you are warned before the database fills up. On service startup, it also surfaces any deferred initializer failure.
Check Archive Databases - Validates every archive database used to store Active Directory and Microsoft Entra change history and recovery data. If an archive has not yet been registered with the storage locator, the step attempts to register it. It then verifies the SQL connection and runs the same threshold check as system databases. A failure in this step means that recovery and change-history queries against that archive will fail.
Check AD Connectors - Verifies that each on-premises AD Connector, or worker node, is online. The Guardian service asks the node locator for the connector's last known state and reports success only when the node's heartbeat is current and the node is enabled. Administratively disabled connectors are reported as a warning. Enabled but unreachable connectors are reported as an error, such as ADConnector_Heartbeat_ADConnector_Offline.
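The decision rule described for Check AD Connectors can be sketched as a small classification function. This is an illustration only, not Guardian's implementation: the ConnectorState type, the classify_connector function, and the five-minute heartbeat window are all hypothetical, because the actual staleness threshold is not documented here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical staleness window; the real value is not stated in this article.
HEARTBEAT_MAX_AGE = timedelta(minutes=5)


@dataclass
class ConnectorState:
    name: str
    enabled: bool
    last_heartbeat: Optional[datetime]  # None means no heartbeat was ever seen


def classify_connector(state: ConnectorState, now: datetime) -> str:
    """Map a connector's last known state to a health record status."""
    if not state.enabled:
        # Administratively disabled connectors are reported as a warning.
        return "Warning"
    heartbeat_is_current = (
        state.last_heartbeat is not None
        and now - state.last_heartbeat <= HEARTBEAT_MAX_AGE
    )
    if heartbeat_is_current:
        return "Success"
    # Enabled but unreachable, for example
    # ADConnector_Heartbeat_ADConnector_Offline.
    return "Error"
```

The disabled check comes first so that a connector taken offline on purpose never escalates to an error, which matches the behavior described above.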
Check Managed Domains - Performs an end-to-end validation of every managed domain. For each domain, the step validates the configured domain credential by binding to a connected domain controller, runs an embedded PowerShell script over WinRM to enumerate Group Policy objects and confirm the audit policy backup capability, and inspects the running Event Collection and Active Directory Data Collection jobs to confirm that none of them have exceeded their configured execution-time threshold. A stuck collection job is the usual cause of Health_JobExecutionTimeFailed.
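The execution-time part of Check Managed Domains amounts to comparing each running job's elapsed time with its configured threshold. The sketch below shows that comparison under stated assumptions: the jobs_over_threshold function and its dictionary inputs are hypothetical, and the threshold values are whatever is configured for each job.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def jobs_over_threshold(
    started_at: Dict[str, datetime],
    thresholds: Dict[str, timedelta],
    now: datetime,
) -> List[str]:
    """Return the names of running jobs whose elapsed time exceeds their threshold."""
    overdue = []
    for job_name, start_time in started_at.items():
        limit = thresholds.get(job_name)
        # Jobs without a configured threshold are not evaluated in this sketch.
        if limit is not None and now - start_time > limit:
            overdue.append(job_name)
    return overdue
```

Each name returned would correspond to a record such as Health_JobExecutionTimeFailed in the report.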
Check Managed Tenants - For every Microsoft 365 or Microsoft Entra ID tenant under management, this step enumerates all credentials attached to the tenant and validates each one against Microsoft Graph for the protected system types in scope, such as Microsoft Entra users and groups, Exchange Online, and Intune. Tenants with no credentials assigned are reported as an error, such as Health_NoManagedTenantCredentials. Credentials whose permissions no longer satisfy the protected system types surface the underlying Microsoft Graph error.
Check Domain Controllers - For every domain controller in every non-excluded managed partition, this step performs the following sub-checks:
LDAP bind - Attempts an LDAP connection to the domain controller using either the partition's configured credential or the Cayosoft Guardian service account. The step reports AD_LdapConnectionSuccess or AD_LdapConnectionFailed.
WinRM round-trip - Opens an encrypted WinRM session and runs a one-line PowerShell command that returns the remote $env:COMPUTERNAME. The step compares the returned hostname to the expected domain controller name and reports AD_WinRMConnectionSucceeded, AD_WinRMConnectionFailed, or AD_DCNameIsDifferent if WinRM resolved to a different host than expected.
Security event log sizing - Runs an embedded PowerShell script over WinRM that reads the Security event log's current size, configured maximum size, and recent event rate. If the configured maximum size is too small to retain approximately 48 hours of events at the current write rate, an informational record is emitted with a recommended size and a link to the Initiator Discovery article. This enables Cayosoft Guardian's initiator attribution feature.
If WinRM is unavailable, only the LDAP sub-check can succeed. The WinRM and event log sub-checks fail, and Guardian features that depend on them, such as initiator attribution, domain controller name verification, event collection sizing guidance, and GPO rollback validation, are unavailable.
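The 48-hour rule in the Security event log sizing sub-check comes down to simple arithmetic. The sketch below is one plausible way to express it, not the embedded script itself: the function names, and the idea of estimating the write rate as events per hour multiplied by an average event size, are assumptions.

```python
import math
from typing import Optional

RETENTION_HOURS = 48  # approximate retention target named in the sub-check


def required_log_bytes(
    events_per_hour: float,
    avg_event_bytes: float,
    retention_hours: int = RETENTION_HOURS,
) -> int:
    """Estimate the log size needed to retain retention_hours of events."""
    return math.ceil(events_per_hour * avg_event_bytes * retention_hours)


def recommended_size(
    max_size_bytes: int,
    events_per_hour: float,
    avg_event_bytes: float,
) -> Optional[int]:
    """Return a recommended size in bytes, or None if the log is large enough."""
    needed = required_log_bytes(events_per_hour, avg_event_bytes)
    return None if max_size_bytes >= needed else needed
```

For example, at 36,000 events per hour and an assumed average of 500 bytes per event, 48 hours of history needs 864,000,000 bytes. A log capped at 128 MiB (134,217,728 bytes) would therefore trigger the informational record, while a 1 GiB log would not.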
Check Connection Credentials - Validates every connection credential defined in Cayosoft Guardian. The step warns if a credential is not linked to any connected system, refreshes the credential cache, and inspects the credential payload. Empty passwords produce Creds_PasswordMissing. Empty refresh tokens produce Creds_TokenMissing. For Active Directory service account credentials, the step also performs a real logon impersonation against the local Cayosoft Guardian host to confirm that the account can still authenticate.
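The payload inspection in Check Connection Credentials can be pictured as a few independent tests per credential. The following sketch is illustrative only: the Credential type, its kind field, and the UNLINKED label are hypothetical, while Creds_PasswordMissing and Creds_TokenMissing are the record codes named above. The logon impersonation test is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Credential:
    name: str
    kind: str  # hypothetical discriminator: "password" or "token"
    password: Optional[str] = None
    refresh_token: Optional[str] = None
    linked_systems: List[str] = field(default_factory=list)


def inspect_credential(cred: Credential) -> List[str]:
    """Return the findings for one credential; an empty list means no issues."""
    findings = []
    if not cred.linked_systems:
        # Placeholder label; the article does not name the actual warning code.
        findings.append("UNLINKED")
    if cred.kind == "password" and not cred.password:
        findings.append("Creds_PasswordMissing")
    if cred.kind == "token" and not cred.refresh_token:
        findings.append("Creds_TokenMissing")
    return findings
```

Treating None and the empty string the same way keeps the check tolerant of however an empty secret happens to be stored.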
Report Results - Aggregates the records produced by the previous steps into the Health Check report. This is the data source consumed by the Attach health check reports when sending alerts option.
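To make the aggregation in Report Results concrete, the sketch below counts records per status and derives the worst status for a run. The tuple layout of a record and the severity ranking are assumptions for illustration; the article does not describe the report's internal format.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Assumed ranking from least to most severe.
SEVERITY_ORDER = ["Success", "Info", "Warning", "Error"]

# A record here is (workflow step, object name, status); a hypothetical shape.
Record = Tuple[str, str, str]


def summarize(records: List[Record]) -> Dict[str, object]:
    """Count records per status and report the most severe status seen."""
    counts = Counter(status for _step, _obj, status in records)
    worst: Optional[str] = None
    if counts:
        # Raises ValueError if a status outside SEVERITY_ORDER appears.
        worst = max(counts, key=SEVERITY_ORDER.index)
    return {
        "counts": dict(counts),
        "worst": worst,
        "has_errors": counts.get("Error", 0) > 0,
    }
```

The has_errors flag mirrors the condition under which alerts are sent, which is when the health check detects errors.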
Sending alerts with health check results
You can configure alerts to be sent automatically when the health check detects errors:
Use the Alerting options field to specify the alert behavior.
Enable Attach health check reports when sending alerts to include a full report in the email.
Exclude specific components from generating alerts.
How to disable the health check for a specific component
To skip health check evaluation for specific components, such as domain controllers:
Go to Settings > Service Settings > Health Check Settings.
Click the link for the System Job - Health Check under Health check job.
In the job configuration pane, on the General tab, review the list of Workflow steps.
Locate the component you want to disable, for example, Check Domain Controllers.
Use the toggle next to the component to disable it. Once toggled off, that component is not evaluated during health check runs.
Click Save to apply the changes.
NOTE: Disabling a component means that it is not checked for health issues. Use this option only if you intentionally want to exclude it from routine monitoring, for example, during maintenance or when the system is offline by design.
Limitations and warnings
WinRM dependency. The Check Domain Controllers WinRM sub-checks and the Check Managed Domains Group Policy and audit policy sub-check require encrypted WinRM, either HTTP 5985 with message encryption or HTTPS 5986. If WinRM is blocked enterprise-wide, expect these steps to report errors even when LDAP connectivity is healthy. Guardian features impacted include initiator attribution, domain controller name verification, event log sizing guidance, and GPO rollback validation.
Forest Recovery scope. Domain controller health checks are used both during normal hourly Health Check runs for Change Monitoring and during Forest Recovery operations. The "Forest Recovery Agent communication" and "Domain controller health checks during recovery operations" items referenced in the Security Guide apply only when the Guardian Forest Recovery module is licensed and in use.
Transient errors are retried automatically. Each component step retries common transient failures, such as LdapServerUnavailableException, LdapTimeoutException, HeartBeatException, OnPremAgentIsNotAvailableException, and NodeUnavailableException, before reporting an error. Brief network interruptions do not generate noise.
Excluded objects. Disabled partitions, disabled tenants, disabled credentials, and any object placed on the exclusion list are skipped silently. Use the exclusion list rather than disabling an entire workflow step when you only need to suppress a single object.
Startup vs. regular mode. Some steps behave differently at service startup. For example, Check Managed Domains reports a worker-unavailable condition as informational rather than as an error during startup to avoid false alarms on boot.
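The automatic retry of transient failures described in this section follows a familiar pattern, sketched below. This is a generic illustration rather than Guardian's code: TransientError stands in for the exception types listed above, and the attempt count and delay are hypothetical values.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Stand-in for failures such as LdapTimeoutException or HeartBeatException."""


def run_with_retry(
    check: Callable[[], T],
    attempts: int = 3,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run a check, retrying transient failures before reporting an error."""
    last_error: Optional[TransientError] = None
    for attempt in range(1, attempts + 1):
        try:
            return check()
        except TransientError as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay_seconds)  # wait before the next attempt
    # All attempts failed; surface the last transient error to the caller.
    assert last_error is not None
    raise last_error
```

Only the transient exception type is caught, so a permanent failure such as an invalid credential is reported on the first attempt instead of being retried.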