Exploring fault tolerance, high availability, savepoint management, and observability when running Flink on Kubernetes clusters.