An Austrian financial service provider with over 1.2 million customers wanted to future-proof its IT landscape. To that end,
a state-of-the-art Cloud Native platform was to be built as the target infrastructure for numerous
microservices. Modern DevOps standards,
automation concepts, observability practices, and security best practices were to be implemented consistently.
The underlying infrastructure consisted of a VMware vSphere cluster with an integrated storage cluster.
The vSphere compute cluster ran the virtual machines, while the storage cluster provided both virtual disks and
backing disks for Kubernetes persistent volumes.
The platform was to be built entirely on this vSphere environment and include all components necessary for
production operation.
Based on the least-privilege access model, Ansible and Packer were used to provision the base clusters on which
Rancher operates as the central Kubernetes management tool. From these base clusters, multiple downstream clusters
were then automatically created, providing different environments (Development, Staging, Production). Nearly the entire infrastructure – from
clusters to network and platform components to central services – was managed declaratively with Terraform.
For secure credential and identity management, HashiCorp Vault and Keycloak were
implemented and configured using Infrastructure as Code. All tools were connected to Keycloak as the central identity provider, with existing
users and groups from Active Directory integrated into Keycloak via federation.
In designing the storage concept, high availability and secure data storage were prioritized from the start.
The database clusters (MariaDB) were implemented in a highly available setup within Kubernetes. Container images and other
build artifacts were managed via Sonatype Nexus, which served both as a container registry and as a central caching system. Dynamic
provisioning of persistent storage was implemented using the vSphere CSI driver, while MinIO served as
highly available, S3-compatible object storage for backups.
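As an illustration of dynamic provisioning, the following Python sketch builds the kind of PersistentVolumeClaim an application might submit; a CSI driver bound to the referenced StorageClass then creates the backing disk on demand. The claim name, namespace, size, and the StorageClass name `vsphere-csi` are hypothetical placeholders, not values from this project.

```python
import json


def build_pvc(name: str, namespace: str, size: str, storage_class: str) -> dict:
    """Return a PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # ReadWriteOnce: the volume is mounted read-write by a single node.
            "accessModes": ["ReadWriteOnce"],
            # The StorageClass decides which provisioner creates the volume.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Hypothetical example values (assumptions, not taken from the project).
    pvc = build_pvc("mariadb-data", "databases", "20Gi", "vsphere-csi")
    print(json.dumps(pvc, indent=2))
```

Because Kubernetes accepts JSON manifests as well as YAML, the printed output could be piped into `kubectl apply -f -`.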
With Velero and Veeam, a two-tier backup concept was implemented, covering both the VM layer and the Kubernetes layer.
Backups were stored on the highly available S3 storage (MinIO), which was in turn backed up by Veeam using its
S3 connector and then archived for long-term storage. Together with the fully automated infrastructure setup,
this enabled comprehensive disaster recovery.
For the specific requirements of the Cloud Native environment, a comprehensive monitoring and alerting system was implemented.
Prometheus continuously collects metrics from the infrastructure and applications, Grafana visualizes these in
meaningful dashboards, and Alertmanager sends warning messages and error notifications to Slack and
Webex channels. Additionally, a highly available Elasticsearch cluster was set up, used by various microservices
for logging, audit trails, and search functionality. Kibana serves as the central interface for analyzing and
visualizing log data. Rancher Monitoring complements this solution, providing an integrated overview of
cluster health.
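The alert flow can be pictured as label-based routing: each alert carries labels, and the first route whose matchers all agree with those labels determines the receiving channel. The Python sketch below is a simplified model of that idea, not Alertmanager's actual implementation. The receiver names and the split of severities between Slack and Webex are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    receiver: str
    match: dict = field(default_factory=dict)


def pick_receiver(labels: dict, routes: list, default: str) -> str:
    """Return the receiver of the first route whose matchers all equal the alert's labels."""
    for route in routes:
        if all(labels.get(key) == value for key, value in route.match.items()):
            return route.receiver
    # No route matched: fall back to the default receiver.
    return default


# Hypothetical routing table (assumption): critical alerts go to Webex, warnings to Slack.
ROUTES = [
    Route("webex-oncall", {"severity": "critical"}),
    Route("slack-platform", {"severity": "warning"}),
]

if __name__ == "__main__":
    alert = {"alertname": "NodeDown", "severity": "critical"}
    print(pick_receiver(alert, ROUTES, default="slack-default"))
```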
Through the consistent use of GitOps, the infrastructure was not only documented transparently but also implemented in a maintainable way.
The GitOps operator ArgoCD provides developers with a central overview of all installed applications
and enables auditable deployment. The release management concept was designed so that release bundles can be
rolled out via GitOps. Special emphasis was placed on a comprehensive concept that fully automates both feature deployments and
hotfixes in the infrastructure and individual software components.
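The core idea behind GitOps is a reconciliation step: the state declared in Git is compared with the state observed in the cluster, and the difference determines what has to be created, updated, or removed. The Python sketch below shows that comparison in its simplest form, assuming each application is identified by name and described only by a version string. The application names and versions are invented for illustration, and a real operator compares full manifests rather than version strings.

```python
def plan_sync(desired: dict, live: dict) -> dict:
    """Compare desired state (Git) with live state (cluster) and list the required actions."""
    return {
        # Declared in Git but missing in the cluster.
        "create": sorted(desired.keys() - live.keys()),
        # Present in both, but the live version differs from the declared one.
        "update": sorted(k for k in desired.keys() & live.keys() if desired[k] != live[k]),
        # Running in the cluster but no longer declared in Git.
        "delete": sorted(live.keys() - desired.keys()),
    }


if __name__ == "__main__":
    # Hypothetical example data (assumption).
    desired = {"payments-api": "1.4.1", "customer-portal": "2.0.0"}
    live = {"payments-api": "1.4.0", "legacy-batch": "0.9.3"}
    print(plan_sync(desired, live))
```

A hotfix follows the same path as a feature release in this model: the version in Git changes, and the next comparison reports the application under `update`.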
The establishment of the Cloud Native platform enabled a fundamentally modernized developer workflow, eliminating
numerous manual process steps. The first deployments of various applications by the financial service provider
showed that both lead times and developer experience improved significantly.
Thanks to its dynamic design, the environment lets developers independently size and tune
resources such as CPU, RAM, and storage for their applications.
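In Kubernetes, such sizing is typically expressed as resource requests, with CPU given in cores or millicores (`500m`) and memory in bytes with binary suffixes (`256Mi`). The Python sketch below converts these notations and sums the requests of several containers, which can help when checking how much capacity an application asks for. It is a minimal sketch that handles only the notations shown; decimal memory suffixes such as `M` or `G` are deliberately not covered, and the container values are hypothetical.

```python
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '500m' or '1.5' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity with a binary suffix (or plain bytes) to bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return round(float(quantity[: -len(suffix)]) * factor)
    # No known suffix: treat the value as plain bytes (raises ValueError otherwise).
    return int(quantity)


def total_requests(containers: list) -> tuple:
    """Sum CPU (millicores) and memory (bytes) requests over all containers."""
    cpu = sum(cpu_to_millicores(c["cpu"]) for c in containers)
    memory = sum(memory_to_bytes(c["memory"]) for c in containers)
    return cpu, memory


if __name__ == "__main__":
    # Hypothetical container requests (assumption).
    containers = [
        {"cpu": "250m", "memory": "256Mi"},
        {"cpu": "1", "memory": "1Gi"},
    ]
    print(total_requests(containers))
```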
The consistently declarative Infrastructure as Code approach saves significant time in
maintenance and further development. In many cases, infrastructure updates can be rolled out through simple changes in the
Git repository.
The new platform thus forms a developer-friendly Cloud Native environment that can also be
administered resource-efficiently. It is built according to current security standards and operational concepts
and can scale to accommodate future migrations and deployments.