Advancing Azure Virtual Machine availability monitoring with Project Flash
Flash, as the project is internally known, is a collection of efforts across Azure Engineering that aims to evolve Azure’s virtual machine (VM) availability monitoring ecosystem into a centralized, holistic, and intelligible solution customers can rely on to meet their specific observability needs. Today, we’re excited to announce the completion of the project’s first two milestones: the preview of VM availability data in Azure Resource Graph, and the private preview of a VM availability metric in Azure Monitor.
What is Project Flash?
Project Flash derives its name from our commitment to building robust and rapid ways to monitor VM availability as comprehensively as possible, a key prerequisite for efficient application performance. It’s our mission to ensure you can:
- Consume accurate and actionable data on VM availability disruptions (for example, VM reboots and restarts, application freezes due to network driver updates, and 30-second host OS updates), along with precise failure details (for example, platform versus user-initiated, reboot versus freeze, planned versus unplanned).
- Analyze and alert on trends in VM availability for quick debugging and month-over-month reporting.
- Periodically monitor data at scale and build custom dashboards to stay updated on the latest availability states of all resources.
- Receive automated root cause analyses (RCAs) detailing impacted VMs, downtime cause and duration, resulting fixes, and similar details, all to enable targeted investigations and post-mortem analyses.
- Receive instantaneous notifications on critical changes in VM availability to quickly trigger remediation actions and prevent end-user impact.
- Dynamically tailor and automate platform recovery policies, based on ever-changing workload sensitivities and failover needs.
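To make the failure details and month-over-month reporting goals above more concrete, here is a minimal Python sketch of how disruption records might be modeled and summarized on the consuming side. Everything in it is an assumption for illustration: the `Disruption` record, its field names, the `downtime_by_month` helper, and the sample events are hypothetical and do not reflect the actual schema exposed by Azure Resource Graph or Azure Monitor.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Initiator(Enum):
    PLATFORM = "platform"
    USER = "user"


class Kind(Enum):
    REBOOT = "reboot"
    FREEZE = "freeze"


@dataclass(frozen=True)
class Disruption:
    """Hypothetical record for one VM availability disruption (not an Azure schema)."""
    vm_id: str
    start: datetime
    duration_seconds: float
    initiator: Initiator  # platform versus user-initiated
    kind: Kind            # reboot versus freeze
    planned: bool         # planned versus unplanned


def downtime_by_month(events):
    """Sum platform-initiated downtime in seconds per (year, month)."""
    totals = defaultdict(float)
    for event in events:
        if event.initiator is Initiator.PLATFORM:
            totals[(event.start.year, event.start.month)] += event.duration_seconds
    return dict(totals)


# Made-up sample data, for illustration only.
events = [
    Disruption("vm-a", datetime(2022, 1, 5, 10, 0), 30.0, Initiator.PLATFORM, Kind.FREEZE, True),
    Disruption("vm-a", datetime(2022, 1, 20, 3, 15), 240.0, Initiator.PLATFORM, Kind.REBOOT, False),
    Disruption("vm-b", datetime(2022, 2, 2, 8, 30), 90.0, Initiator.USER, Kind.REBOOT, True),
    Disruption("vm-b", datetime(2022, 2, 11, 22, 45), 30.0, Initiator.PLATFORM, Kind.FREEZE, True),
]

monthly = downtime_by_month(events)
for (year, month), seconds in sorted(monthly.items()):
    print(f"{year}-{month:02d}: {seconds:.0f}s of platform-initiated downtime")
```

Keeping the initiator, kind, and planned flag as separate fields means the same records can be filtered along any of those dimensions, for example to report only unplanned reboots.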