Akka 24.05 - fully replicated entities across regions, edge, and clouds
In Akka 24.05 we further developed event replication for active-active entities to support edge connectivity. This post describes how active-active entities can meet low-latency and high-availability demands for multi-region, multi-cloud, and cloud-edge topologies.
Let’s say you have a cluster on the US west coast but there are end users in Europe. The geographical round-trip latency would be at least 150 milliseconds, which is not particularly low by today's standards. Additionally, a network or service disruption affecting the centralized service may cause costly outages for the European business. How can we deploy this application to the US and Europe and still share the data between the locations?
With Akka, each entity has an identity. For each entity identity, there is a single instance that holds the system of record and processes the requests for that entity. If there is only one instance of an entity, can it be in the US and Europe at the same time, and avoid the 150 milliseconds of latency? Actually, yes.
With Akka, there can be more than one instance of the same entity and it can have replicas in other locations. Those replicas are not a read-only cache that can only be updated at one location. They are truly autonomous entities that can handle both the read and write operations of the entity. This is called active-active because the entity replicas can handle concurrent updates at multiple places at the same time. As an example, there can be one entity in US west, serving nearby end users with low millisecond response time. A replica of this entity can be in Europe, with low response time for European end users.
Another outcome of this architecture is better resilience and higher availability. If there is an outage at one location the traffic can be routed to another location with replicas of the same entities.
The changes to the entity are captured as events, which are replicated to corresponding entity replicas at other locations. Since we want low latency updates, the updates are sent to the other replicas asynchronously, which also means that the replicas don’t have to be connected to each other all the time. If they lose network connectivity for a while, the events will be replicated later, when they are connected again. Akka will automatically take care of this reliable event replication mechanism.
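The catch-up behavior described above can be sketched in plain Java. This is a minimal illustration, not the Akka API: `EventLog`, `ReplicaConsumer`, and the sequence-number offset are hypothetical names chosen for the example, and a real implementation would also handle persistence, ordering across several origins, and failures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration only; this is not the Akka API.
final class EventLog {
    private final List<String> events = new ArrayList<>();

    // Appending is purely local, so a write never waits for another location.
    long append(String event) {
        events.add(event);
        return events.size(); // 1-based sequence number of the appended event
    }

    // Returns, in order, the events with a sequence number greater than fromSeqNr.
    List<String> eventsAfter(long fromSeqNr) {
        return new ArrayList<>(events.subList((int) fromSeqNr, events.size()));
    }
}

final class ReplicaConsumer {
    private long lastSeenSeqNr = 0; // how far this replica has consumed the remote log
    private final List<String> applied = new ArrayList<>();

    // Called whenever the connection is available; pulls only what was missed.
    void catchUp(EventLog remote) {
        for (String event : remote.eventsAfter(lastSeenSeqNr)) {
            applied.add(event);
            lastSeenSeqNr++;
        }
    }

    List<String> applied() {
        return applied;
    }
}
```

Because the consumer remembers its offset, calling `catchUp` after a period of disconnection delivers the missed events once and in order, and calling it again with nothing new is a no-op.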
The application logic for updating the entity state from the events must be designed to handle concurrent, independent updates from other replicas. The end goal is that the state eventually becomes the same at all replicas once they have received all events. How to handle conflicting updates with Conflict Free Replicated Data Types (CRDTs) or application logic is covered in an earlier Active-Active blog post. It also shows what an event-sourced replicated entity looks like in code.
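As one small illustration of such application logic, here is a sketch of a last-writer-wins rule in plain Java. It is only one of several possible conflict-resolution strategies, and it is not the Akka API: `EmailChanged` and `ProfileState` are hypothetical names, and the example assumes each event carries a timestamp and the name of the replica where it originated.

```java
// Hypothetical event and state types for illustration only; not the Akka API.
final class EmailChanged {
    final String email;
    final long timestamp;       // time assigned at the origin replica (assumption)
    final String originReplica; // for example "us-west" or "eu-central"

    EmailChanged(String email, long timestamp, String originReplica) {
        this.email = email;
        this.timestamp = timestamp;
        this.originReplica = originReplica;
    }
}

final class ProfileState {
    private String email = "";
    private long timestamp = Long.MIN_VALUE;
    private String originReplica = "";

    // Last-writer-wins: keep the event with the highest timestamp.
    // Equal timestamps are resolved by comparing replica names, so every
    // replica picks the same winner regardless of arrival order.
    void apply(EmailChanged event) {
        boolean newer = event.timestamp > timestamp
                || (event.timestamp == timestamp
                    && event.originReplica.compareTo(originReplica) > 0);
        if (newer) {
            email = event.email;
            timestamp = event.timestamp;
            originReplica = event.originReplica;
        }
    }

    String email() {
        return email;
    }
}
```

If two replicas apply the same pair of concurrent `EmailChanged` events in opposite order, both end up with the same email. The trade-off is that one of the concurrent updates is discarded, which is why data types that merge updates, such as CRDT sets and counters, are often a better fit.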
Akka active-active entities are not limited to replication between cloud regions. They can also be used to build applications spanning multiple clouds, hybrid clouds, or edge Point-of-Presence (PoP).
In Akka 24.05 we have added support for active-active entities at the edge, where it is assumed that the connection between edge and cloud must be established from the edge.
With millions of entities, replicated to hundreds of locations, wouldn’t that result in high network traffic for the replication and redundant storage? Exactly! At large scale this would be a major problem. The solution is that you can control where the entities should be replicated by defining filters. For example, you might have a cloud backbone of 3 regions where all entities are replicated to all regions. Then you can have hundreds of edge PoPs, and an entity is replicated to one of those PoPs, in addition to the cloud, because the main interaction with the entity is from that location. Filters are dynamic and can be changed at runtime without redeploying your app, giving you powerful control over your application topology. With this, an entity can move between PoPs or reside at a few, but not all, PoPs at the same time. You can express any topology you may need, at runtime.
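The idea of a filter that decides where an entity is replicated can be sketched as follows. This is a conceptual model in plain Java, not the Akka filter API: `ReplicationFilter`, `setEdgeLocations`, and the location names used with it are all hypothetical. It assumes every entity is replicated to all cloud regions and, in addition, to whichever edge PoPs are currently selected for it.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of a replication filter; not the Akka API.
final class ReplicationFilter {
    private final Set<String> cloudRegions;
    // Edge PoPs currently selected per entity id; can be changed at runtime.
    private final Map<String, Set<String>> edgeByEntity = new ConcurrentHashMap<>();

    ReplicationFilter(Set<String> cloudRegions) {
        this.cloudRegions = new TreeSet<>(cloudRegions);
    }

    // Replaces the edge PoPs for one entity, for example when it moves between PoPs.
    void setEdgeLocations(String entityId, Set<String> pops) {
        edgeByEntity.put(entityId, new TreeSet<>(pops));
    }

    // All locations the entity should be replicated to: the cloud backbone
    // plus the selected edge PoPs, if any.
    Set<String> targets(String entityId) {
        Set<String> result = new TreeSet<>(cloudRegions);
        result.addAll(edgeByEntity.getOrDefault(entityId, Collections.<String>emptySet()));
        return result;
    }

    boolean shouldReplicate(String entityId, String location) {
        return targets(entityId).contains(location);
    }
}
```

Moving an entity from one PoP to another is then a single call to `setEdgeLocations` with the new PoP, while the cloud regions keep receiving all events throughout.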
To sum it up, Akka’s active-active entities give you:
- resilience to tolerate failures in one location and still be operational, including multi-cloud redundancy
- the possibility to serve requests from a location near the user to provide better responsiveness, even from an edge PoP
- support for updates to an entity from several locations at the same time (active-active)
- a way to build active-passive, or hot-standby entities
- load balancing over many servers
You can learn more about Akka active-active from cloud to edge in the documentation.