A CERN success story
Akka helps CERN keep groundbreaking physics experiments running smoothly
The need
CERN operates seven large particle accelerators, including the Large Hadron Collider (LHC). With a circumference of 27 kilometers, the LHC is by far the largest machine ever constructed; it contains more than 10,000 superconducting magnets that accelerate and steer beams of subatomic particles. To keep the LHC and the other accelerators working reliably for the scientific community, CERN gathers, stores, and analyzes more than 2.5 million signals (3.5 TB of data daily) from the operational devices that make up the Accelerator Control System.
Vito Baggiolini, leader of the CPA Section in the CERN Accelerator Controls Group, says: “We mainly collect information on how the hundreds of sub-systems and tens of thousands of devices in the accelerators are working, and on the characteristics of the particle beam, such as its shape and intensity. The data we store helps accelerator experts to ensure that the whole complex of accelerators is functioning reliably and to the expected standards at all times.”
In the early 2000s, CERN built its own highly available, resilient, and robust data acquisition and storage system: CALS. Based on a high-performance relational database, the system had scaled well since its first deployment in 2003, but as the need grew for more complex and longer-running analyses, it became clear that a new approach was required.
The challenge
In 2017, CERN set out to create its next-generation replacement: NXCALS. This would use Spark on a Hadoop cluster for distributed execution of advanced analytical algorithms.
The key technical requirements for the data acquisition and ingestion system within NXCALS were: exceptional fault tolerance and resilience, so that process failures could be handled without data loss or other service disruption; the ability to balance data subscriptions evenly across processes to avoid overload; and the ability to scale quickly and easily by adding new system resources as new sources of data appear.
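The second and third requirements pull in the same direction: subscriptions must spread evenly across processes, yet adding a process should not reshuffle everything. A common technique with both properties is consistent hashing. The sketch below is purely illustrative, not CERN's implementation; the `SubscriptionRing` class and all names in it are hypothetical.

```python
import bisect
import hashlib


class SubscriptionRing:
    """Illustrative consistent-hash ring that assigns device-data
    subscriptions to worker processes. Each worker appears at many
    points ("virtual nodes") on the ring so load spreads evenly, and
    adding a worker moves only a fraction of the subscriptions."""

    def __init__(self, workers, vnodes=100):
        self._ring = []  # sorted list of (hash, worker) pairs
        self._vnodes = vnodes
        for w in workers:
            self.add_worker(w)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_worker(self, worker: str) -> None:
        # Place `vnodes` points for this worker on the ring.
        for i in range(self._vnodes):
            self._ring.append((self._hash(f"{worker}#{i}"), worker))
        self._ring.sort()

    def worker_for(self, device_id: str) -> str:
        # A subscription is owned by the first ring point at or after
        # its hash, wrapping around at the end of the ring.
        h = self._hash(device_id)
        keys = [k for k, _ in self._ring]
        idx = bisect.bisect(keys, h) % len(self._ring)
        return self._ring[idx][1]


ring = SubscriptionRing(["worker-1", "worker-2", "worker-3"])
assignments = {f"device-{n}": ring.worker_for(f"device-{n}")
               for n in range(1000)}
```

Because ownership is a pure function of the device identifier and the current worker set, any process can compute the same assignment independently, with no central coordinator to fail.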
Marcin Sobieszek, senior computing engineer within the CERN Accelerator Controls Group, says: “We aimed to implement a highly distributed application in which workload would be spread horizontally among nodes. We wanted the ability to dynamically add resources to the process cluster, along with failure detection and notification capabilities. For this, the ability to persist the state of this distributed system on external storage was vital.”
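The last point in the quote, persisting the state of the distributed system on external storage, is commonly done with event sourcing: every state change is appended to a durable journal, and a restarted process rebuilds its state by replaying the journal. The sketch below illustrates that recovery pattern only; it is not the NXCALS code, and `EventJournal`, `ClusterState`, and the event names are hypothetical (an in-memory list stands in for real external storage).

```python
import json


class EventJournal:
    """Illustrative append-only journal. In a real deployment this
    would be external storage such as a database or replicated log."""

    def __init__(self):
        self._events = []

    def append(self, event: dict) -> None:
        self._events.append(json.dumps(event))

    def replay(self):
        return [json.loads(e) for e in self._events]


class ClusterState:
    """Rebuilds the distributed system's state by replaying the
    journal, so a restarted process recovers where a failed one
    left off."""

    def __init__(self, journal: EventJournal):
        self.journal = journal
        self.nodes = set()        # nodes currently in the cluster
        self.subscriptions = {}   # device id -> owning node
        for event in journal.replay():
            self._apply(event)

    def _apply(self, event: dict) -> None:
        if event["type"] == "node-joined":
            self.nodes.add(event["node"])
        elif event["type"] == "node-left":
            self.nodes.discard(event["node"])
        elif event["type"] == "assigned":
            self.subscriptions[event["device"]] = event["node"]

    def record(self, event: dict) -> None:
        self.journal.append(event)  # persist first...
        self._apply(event)          # ...then update in-memory state


journal = EventJournal()
state = ClusterState(journal)
state.record({"type": "node-joined", "node": "n1"})
state.record({"type": "assigned", "device": "d42", "node": "n1"})

# Simulate a process failure: a fresh instance recovers from the journal.
recovered = ClusterState(journal)
```

Persisting the event before applying it means a crash between the two steps loses nothing: the journal is always at least as up to date as any process's in-memory view.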