ESnet’s network is operated as a high-availability network 24 hours per day, 365 days per year, and significant resources are devoted to ensuring its continuous and secure availability. Our staff continue to enhance the technical systems that support both our internal teams and our external production services, so that our performance and reliability stay ahead of demand. In 2024, ESnet’s operational achievements included (but were not limited to) nearly quadrupling our total trans-Atlantic bandwidth; improving our tools for gathering, analyzing, and studying network performance data; and further automating the setup of devices on the production network. The capacity increase was planned so that the network can comfortably absorb traffic growth, and we continue to plan for the larger increases we anticipate in the coming years as AI tools are adopted.
Strengthening Site and Data Center Resiliency
While ESnet is required to deliver “three 9s” (99.9% availability), we routinely exceed that standard. In 2024, ESnet reached a 100% availability milestone, driven in large part by the Site Resiliency Program (SRP). The program’s importance continues to grow as DOE science becomes more distributed, collaborative, and reliant on cloud-based resources.
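To put the “three 9s” requirement in concrete terms, the short sketch below converts an availability target into a yearly downtime budget. It assumes availability is simply uptime divided by total time over a 365-day (8,760-hour) year; ESnet’s actual measurement methodology is not described here, and the function names are illustrative.

```python
# Assumes availability = uptime / total time over a 365-day (8,760-hour) year.
HOURS_PER_YEAR = 365 * 24

def downtime_budget_hours(availability: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Maximum downtime allowed over the period at the given availability (0.0 to 1.0)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0.0 and 1.0")
    return (1.0 - availability) * period_hours

def measured_availability(outage_hours: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Fraction of the period during which the service was up."""
    return (period_hours - outage_hours) / period_hours

# "Three 9s" leaves a budget of roughly 8.76 hours of downtime per year.
print(f"three 9s budget: {downtime_budget_hours(0.999):.2f} hours/year")
# A year with zero outage hours is 100% available.
print(f"zero-outage year: {measured_availability(0.0):.1%}")
```

Under these assumptions, meeting the requirement still permits almost nine hours of outage per year, while a 100% year means no measured downtime at all.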
The SRP seeks to align each site’s network resilience with its mission needs, supporting highly available, high-performance connectivity. In 2024, three sites improved their resiliency scores, and progress continued on major multi-year upgrades, including the complex Oak Ridge area connectivity project. Many of these efforts will carry forward into 2025.
Beyond site connectivity, ESnet advanced operational resiliency through our High Availability Location (HAL) initiative. For over a decade, ESnet has maintained redundant data centers at Berkeley Lab in California and Brookhaven National Laboratory in New York. However, the cross-country separation introduced latency and required manual intervention during some outages.

The HAL project addresses this by adding a San Jose facility that enables automated, real-time synchronization with Berkeley and dramatically reduces recovery times. In 2024, ESnet deployed secure overlay networks between the two sites and created a “stretched” VMware cluster, allowing services to migrate seamlessly between data centers with minimal downtime.
In 2025, ESnet will begin transitioning production services into this environment, improving availability, reducing operational complexity, and strengthening the resilience of critical systems that support DOE’s science mission.
perfSONAR Expands Global Reach and Capabilities
perfSONAR (the performance Service-Oriented Network monitoring ARchitecture) is a global platform for measuring and monitoring end-to-end network performance. ESnet is a core partner in the perfSONAR collaboration alongside Internet2, Indiana University, GÉANT, the University of Michigan, and Brazil’s RNP, each of which contributes staff, computing, and network resources. Today, perfSONAR is deployed at more than 2,000 locations worldwide. In 2024, major new deployments supported critical science projects:
- The Vera Rubin Observatory used perfSONAR nodes to identify and resolve packet loss issues.
- USDA’s Agricultural Research Service added 65 nodes to its SCInet research network.
- The Square Kilometre Array began prototyping a large-scale deployment for its radio telescope sites in Australia and South Africa.
The release of perfSONAR 5.1.0 introduced a redesigned, more flexible user interface, enhanced host instrumentation, and integration of multi-threaded iperf3 (an ESnet innovation) to support 100 Gbps network interfaces. Community engagement remained strong, with the partnership team holding quarterly office hours, running well-attended workshops at TNC24 in France and Internet2’s Technology Exchange in Boston, and offering hands-on training in deploying and customizing perfSONAR.
perfSONAR continues to prove its value for high-demand science communities such as the LHC collaborations, where it underpins custom alerting systems and helps resolve performance issues. In 2024, the perfSONAR community helped troubleshoot problems ranging from packet loss to quality-of-service gaps so that research networks worldwide could operate at peak performance.

Visualizing and Mapping the Network in Real Time
The my.es.net portal displays live traffic graphs, giving ESnet’s stakeholders a real-time view of their data flows from sites to facilities. In 2024, the Measurement & Analysis (M&A) team delivered 12 portal releases (two minor releases and ten patches), introducing OpenTelemetry-based logging, an operating-system migration from CentOS to Ubuntu, API updates, and real-time analytics support for the LHC Data Challenge. Looking ahead, ESnet plans to enhance my.es.net with federated identity login and Grafana-based custom visualizations, enabling users to see project- and institution-specific data.
Beyond the portal, ESnet advanced its network mapping and visualization capabilities through two key tools:
- ESnet Network Map Panel – Renders network topologies from JSON data, overlays traffic information, and works as a Grafana plugin, React component, or standalone HTML object.
- Terranova – Generates JSON topology data from diverse sources via a flexible plugin architecture.

In 2024, the Network Map Panel had 19 releases, adding multi-source topology loading, support for up to 20 data layers, and an end-to-end test harness. It was also published as a public Grafana plugin, now adopted by partners including GÉANT. Terranova saw eight releases, adding new data translation backends, Google Sheets import tools, and testing frameworks. These open-source tools are now widely used across the global research and education networking community, with ESnet showcasing them at SC24 and other venues.
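The actual JSON formats used by the Network Map Panel and Terranova are not described here, so the following is only a sketch of the general idea under an assumed minimal schema. Each hypothetical “plugin” returns a partial topology (`nodes`, `edges`, and an illustrative `traffic_gbps` field, none of which are taken from the real tools), and a merge step combines them into one JSON document that a map renderer could consume.

```python
import json
from typing import Callable, Dict, List

# Hypothetical schema: {"nodes": [...], "edges": [...]}, keyed by each item's "name".
Topology = Dict[str, List[dict]]
TopologySource = Callable[[], Topology]  # hypothetical plugin signature

def inventory_source() -> Topology:
    """Stand-in for a plugin that reads devices and links from an inventory system."""
    return {
        "nodes": [{"name": "SITE-A"}, {"name": "SITE-B"}],
        "edges": [{"name": "A-B", "endpoints": ["SITE-A", "SITE-B"]}],
    }

def traffic_source() -> Topology:
    """Stand-in for a plugin that attaches (made-up) traffic values to existing edges."""
    return {"edges": [{"name": "A-B", "traffic_gbps": 42.0}]}

def merge_topology(sources: List[TopologySource]) -> Topology:
    """Merge partial topologies; later sources add or overwrite fields on same-named items."""
    nodes: Dict[str, dict] = {}
    edges: Dict[str, dict] = {}
    for source in sources:
        partial = source()
        for node in partial.get("nodes", []):
            nodes.setdefault(node["name"], {}).update(node)
        for edge in partial.get("edges", []):
            edges.setdefault(edge["name"], {}).update(edge)
    return {"nodes": list(nodes.values()), "edges": list(edges.values())}

topology = merge_topology([inventory_source, traffic_source])
print(json.dumps(topology, indent=2))
```

Keeping each data source behind a common function signature means a new backend can be added without touching the merge or rendering steps, which is the main benefit a plugin architecture of this kind aims for.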
Fine-Tuning Network Orchestration and Automation
In a high-performance science network such as ESnet’s, orchestration and automation are critical for speed, reliability, and scalability. Automating complex network and service configurations reduces human error, accelerates service deployments, and ensures consistent, predictable performance.
ESnet Database (ESDB) and Kubernetes Development: In 2024, the Orchestration & Core Data (OCD) and Platform Engineering teams streamlined deployment by migrating from Ansible to Kubernetes Helm charts, moving closer to Continuous Integration/Continuous Deployment. Developers also began migrating to Okteto, enabling standardized Kubernetes-based development environments, “hot-reload” capabilities, and cloud-hosted dev resources — saving hundreds of hours in debugging and reducing hardware costs.
Service Orchestration and Automation: The new Device Base Config service now allows engineers to create, test, and deploy standardized “day one” device configurations in minutes instead of weeks, improving reliability and paving the way for future orchestration enhancements. The Late Protocol Transitions (LPT) project automated the migration of all remaining customers to dedicated virtual routing and forwarding (VRF) tables, reducing both cybersecurity risk and service disruption.
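The internals of the Device Base Config service are not described here. As a rough illustration of the template-driven approach that makes standardized “day one” configurations fast to produce, the sketch below validates a few per-device inputs and renders them into a single shared template. The template text, the vendor-neutral syntax, and the function name `render_base_config` are all hypothetical; the addresses come from documentation ranges.

```python
import ipaddress
from string import Template

# Hypothetical, vendor-neutral "day one" template; not an actual ESnet template.
BASE_TEMPLATE = Template(
    "hostname $hostname\n"
    "interface loopback0\n"
    " ip address $loopback/32\n"
    "ntp server $ntp_server\n"
)

def render_base_config(hostname: str, loopback: str, ntp_server: str) -> str:
    """Validate inputs, then render the standardized base config for one device."""
    if not hostname or " " in hostname:
        raise ValueError(f"invalid hostname: {hostname!r}")
    ipaddress.IPv4Address(loopback)   # raises ValueError on a malformed IPv4 address
    ipaddress.ip_address(ntp_server)  # raises ValueError on a malformed address
    # substitute() raises KeyError if any placeholder is left unfilled
    return BASE_TEMPLATE.substitute(hostname=hostname, loopback=loopback, ntp_server=ntp_server)

config = render_base_config("rtr-example-1", "192.0.2.1", "198.51.100.10")
print(config)
```

Because every device starts from the same template and inputs are checked before rendering, errors surface at generation time rather than on the device.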
Enterprise User Interface: The NextGen ES Enterprise (ESE) GUI modernized workflows for ESDB and OSCARS, raising the interface’s Lighthouse performance score from 72 to 97 and its accessibility score from 72 to 100. New features include enhanced search, record history, relational lists, and circuit visualizations. The new OSCARS LSP Builder lets users create optimized Layer 2 VPN paths with custom constraints directly from the interface.
Topology and Discovery Services: The new Discovery Service consolidates network data collection into a single, scalable, Kubernetes-based system. It normalizes device configurations, provides real-time updates via an asynchronous message bus, and maintains historical records for trend analysis and potential AI-driven optimization. This reduces redundant polling, improves data consistency, and gives applications a coherent, up-to-date view of the network.
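The pattern described above (collect once, normalize, publish updates on a bus, and let subscribers keep both a current view and a history) can be sketched in a few lines. This is a simplified illustration, not the Discovery Service’s implementation: an in-process `asyncio.Queue` stands in for the asynchronous message bus, and the raw field names (`hostname`, `sysName`, `ifaces`) and class names are hypothetical.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class DeviceState:
    hostname: str
    interfaces: Dict[str, str]  # interface name -> normalized operational status

def normalize(raw: dict) -> DeviceState:
    """Map differing raw field names and casing onto one common schema (hypothetical fields)."""
    hostname = raw.get("hostname") or raw.get("sysName")
    interfaces = {name.lower(): status.lower() for name, status in raw.get("ifaces", {}).items()}
    return DeviceState(hostname=hostname, interfaces=interfaces)

@dataclass
class TopologyView:
    current: Dict[str, DeviceState] = field(default_factory=dict)        # latest state per device
    history: List[Tuple[int, DeviceState]] = field(default_factory=list)  # ordered record for trend analysis

async def collector(bus: asyncio.Queue, raw_records: List[dict]) -> None:
    """Poll once per device, normalize, and publish; consumers never poll devices themselves."""
    for raw in raw_records:
        await bus.put(normalize(raw))
    await bus.put(None)  # sentinel: no more updates

async def subscriber(bus: asyncio.Queue, view: TopologyView) -> None:
    """Apply each update to the current view and append it to the history."""
    seq = 0
    while (state := await bus.get()) is not None:
        view.current[state.hostname] = state
        view.history.append((seq, state))
        seq += 1

async def main() -> TopologyView:
    bus: asyncio.Queue = asyncio.Queue()
    view = TopologyView()
    raw = [
        {"hostname": "rtr1", "ifaces": {"Et1": "UP"}},
        {"sysName": "rtr2", "ifaces": {"et1": "up"}},
        {"hostname": "rtr1", "ifaces": {"Et1": "DOWN"}},
    ]
    await asyncio.gather(collector(bus, raw), subscriber(bus, view))
    return view

view = asyncio.run(main())
print(view.current["rtr1"])
```

After the run, the current view holds only the latest state for each device, while the history keeps all three updates in order.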
In 2024, these orchestration and automation advances significantly improved ESnet’s operational efficiency, reliability, and ability to adapt quickly to evolving science requirements.
Reducing Vulnerability and Tightening Up Security
In preparation for ESnet’s first DOE Enterprise Assessments (EA-60) audit, scheduled for mid-2025, the Security team overhauled documentation, policies, and risk management processes. This included developing a formal Risk Assessment Self-Assessment (RASA), updating the enclave security plan, and producing hundreds of pages of documentation aligned with Berkeley Lab standards.
Significant progress was made on Zero Trust initiatives, particularly in Identity and Access Management (IAM). ESnet developed code and processes to transition from Feitian tokens to YubiKeys, laying the groundwork for NIST AAL3 Single Sign-On (SSO) with phishing-resistant WebAuthn. Most web services migrated to the new SSO in 2024, and a Cloudflare pilot began to protect externally facing applications and reduce VPN dependency.
The Security team launched a new vulnerability management program, replacing InsightVM with Nessus, adding IPv6 scanning, and creating robust tracking workflows. ESnet deployed Endpoint Detection & Response (EDR) via CrowdStrike to laptops, workstations, and servers, and upgraded all firewalls as part of a Zero Trust–based data center refresh supporting IPv6-only networking and automated micro-segmentation.
Application security was strengthened through ESnet’s first penetration tests of key internal applications (ESDB and SCRAM), which found no critical issues. Routing security was improved with Resource Public Key Infrastructure (RPKI) support for customers, and Splunk’s Security Orchestration, Automation, and Response (SOAR) tools were implemented to enhance alerting and analysis.
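The specifics of ESnet’s RPKI deployment (validators, routing policy) are not described here. What RPKI support enables is route origin validation, the standard procedure defined in RFC 6811: a BGP route is “valid” if some Route Origin Authorization (ROA) covers its prefix, permits its prefix length, and names its origin AS; “invalid” if ROAs cover the prefix but none matches; and “not-found” if no ROA covers it. The sketch below implements that classification. The `Roa` class and `validate_origin` function are hypothetical names, and the prefixes and AS numbers are documentation and private-use values.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Union

IPNetwork = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

@dataclass(frozen=True)
class Roa:
    prefix: IPNetwork  # prefix the ROA authorizes
    max_length: int    # longest prefix length the ROA permits
    origin_asn: int    # AS authorized to originate the prefix

def validate_origin(route: str, origin_asn: int, roas: Iterable[Roa]) -> str:
    """Classify a route as 'valid', 'invalid', or 'not-found' following RFC 6811."""
    prefix = ipaddress.ip_network(route)
    covered = False
    for roa in roas:
        # Only ROAs of the same address family whose prefix contains the route are relevant.
        if roa.prefix.version != prefix.version or not prefix.subnet_of(roa.prefix):
            continue
        covered = True
        # An AS 0 ROA never validates a route (RFC 6483).
        if prefix.prefixlen <= roa.max_length and roa.origin_asn == origin_asn and roa.origin_asn != 0:
            return "valid"
    return "invalid" if covered else "not-found"

roas = [Roa(ipaddress.ip_network("192.0.2.0/24"), 24, 64500)]
print(validate_origin("192.0.2.0/24", 64500, roas))    # authorized origin and length
print(validate_origin("192.0.2.128/25", 64500, roas))  # more specific than max_length allows
```

Dropping or deprioritizing “invalid” routes is what protects against many accidental or malicious origin hijacks.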
These 2024 efforts significantly hardened ESnet’s security posture, streamlined identity management, and positioned the organization for a successful 2025 audit.