Increased Error Rates and Latencies in the ZenML Pro Dashboard and Workspace Operations

Incident Report for ZenML Public Services

Resolved

On October 28, 2025, from approximately 11:30 PM GMT+1 into the early hours of the following morning, ZenML Pro experienced degraded performance that affected dashboard availability. Some pages loaded slowly or intermittently returned errors, which typically cleared on retry. The issue was traced to an AWS service disruption that caused intermittent IAM authentication failures and ECS task launch errors in the US-EAST-1 region (see https://health.aws.amazon.com/health/status?eventID=arn:aws:health:us-east-1::event/MULTIPLE_SERVICES/AWS_MULTIPLE_SERVICES_OPERATIONAL_ISSUE/AWS_MULTIPLE_SERVICES_OPERATIONAL_ISSUE_30422_580368C1278). Normal operation resumed after AWS resolved the underlying incident at around 7:00 AM GMT+1 on October 29, 2025.
Posted Oct 28, 2025 - 22:30 UTC