{"id":6692,"date":"2025-10-01T05:09:12","date_gmt":"2025-10-01T05:09:12","guid":{"rendered":"https:\/\/annualreports.es.net\/?page_id=6692"},"modified":"2025-10-15T04:30:30","modified_gmt":"2025-10-15T04:30:30","slug":"2024-operational-innovations","status":"publish","type":"page","link":"https:\/\/annualreports.es.net\/index.php\/2024-operational-innovations\/","title":{"rendered":"2024 Operational Innovations"},"content":{"rendered":"\n<div class=\"wp-block-obb-width-block organic-block obb-width clearfix\" style=\"max-width:2400px\"><div class=\"obb-width-content\">\n<div class=\"wp-block-group has-black-color has-text-color has-link-color wp-elements-3fdd099d3a4910bfa8b57ff0ccdaf8b3 is-layout-constrained wp-block-group-is-layout-constrained\">\n<pre class=\"wp-block-preformatted has-white-color has-custom-blue-800-background-color has-text-color has-background has-link-color has-medium-large-font-size wp-elements-2bba38dd6d3367b3d1c657f8c4e03aaa\" style=\"border-style:none;border-width:0px;border-radius:0px;margin-top:0;margin-bottom:0\">ESnet\u2019s network is operated as a high-availability network 24 hours per day, 365 days per year, and significant resources are devoted to ensuring its continuous and secure availability. Our staff continues to work on enhancing the technical systems that benefit both our internal team and external production services, to ensure our performance and reliability continue to stay ahead of demand.&nbsp;\n\nIn 2024, ESnet\u2019s operational achievements included (but were not limited to) <a href=\"https:\/\/annualreports.es.net\/index.php\/highlight-trans-atlantic-milestone\/\" data-type=\"page\" data-id=\"6661\">nearly quadrupling our total trans-Atlantic bandwidth<\/a>; improving our tools for gathering, analyzing, and studying network performance data; and further automating the setup of devices on the production network. 
The added capacity was planned to comfortably absorb growth in network traffic, and we continue to plan for the larger increases anticipated in the coming years as AI tools are adopted.<\/pre>\n\n\n\n<h3 class=\"wp-block-heading has-source-sans-3-font-family\" id=\"srp\"><strong>Strengthening Site and Data Center Resiliency<\/strong><\/h3>\n\n\n\n<div class=\"wp-block-group is-nowrap is-layout-flex wp-container-core-group-is-layout-6c531013 wp-block-group-is-layout-flex\">\n<p class=\"has-source-sans-3-font-family\">While ESnet is required to deliver \u201cthree 9s\u201d (99.9% availability), we routinely exceed that standard. In fact, in 2024 <a href=\"https:\/\/annualreports.es.net\/index.php\/highlight-100-availability\/\" data-type=\"page\" data-id=\"6678\">ESnet hit a 100% Availability milestone<\/a>, which was driven in large part by the Site Resiliency Program (SRP). The program\u2019s importance continues to grow as DOE science becomes more distributed, collaborative, and reliant on cloud-based resources.&nbsp;<br><br>The SRP seeks to align each site&#8217;s network resilience with its mission needs, supporting highly available, high-performance connectivity. In 2024, three sites improved their resiliency scores, and progress continued on major multi-year upgrades, including the complex Oak Ridge area connectivity project. Many efforts will carry forward into 2025.<br><br>Beyond site connectivity, ESnet advanced operational resiliency through our High Availability Location (HAL) initiative. For over a decade, ESnet has maintained redundant data centers at Berkeley Lab in California and Brookhaven National Laboratory in New York. 
However, the cross-country separation incurred latency and required manual intervention during some outages.<\/p>\n\n\n\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;69d6cd62a1b2c&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"69d6cd62a1b2c\" class=\"wp-block-image size-large wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"996\" height=\"1024\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/annualreports.es.net\/wp-content\/uploads\/2025\/10\/operations-High-touch-anne-racks_1298_1200px-996x1024.jpg\" alt=\"Person adjusting hardware on a rack\" class=\"wp-image-6693\" style=\"object-fit:cover\" srcset=\"https:\/\/annualreports.es.net\/wp-content\/uploads\/2025\/10\/operations-High-touch-anne-racks_1298_1200px-996x1024.jpg 996w, https:\/\/annualreports.es.net\/wp-content\/uploads\/2025\/10\/operations-High-touch-anne-racks_1298_1200px-292x300.jpg 292w, https:\/\/annualreports.es.net\/wp-content\/uploads\/2025\/10\/operations-High-touch-anne-racks_1298_1200px-768x790.jpg 768w, https:\/\/annualreports.es.net\/wp-content\/uploads\/2025\/10\/operations-High-touch-anne-racks_1298_1200px.jpg 1200w\" sizes=\"auto, (max-width: 996px) 100vw, 996px\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\taria-label=\"Enlarge\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.imageButtonRight\"\n\t\t\tdata-wp-style--top=\"state.imageButtonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewBox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 
.5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><figcaption class=\"wp-element-caption\">An ESnet Data and Facilities team member replaces an EJFAT load balancer.\n<\/figcaption><\/figure>\n<\/div>\n\n\n\n<p>The HAL project addresses this by adding a San Jose facility that enables automated, real-time synchronization with Berkeley and dramatically reduces recovery times. In 2024, ESnet deployed secure overlay networks between the two sites and created a \u201cstretched\u201d VMware cluster, allowing services to migrate seamlessly between data centers with minimal downtime.<\/p>\n\n\n\n<p>In 2025, ESnet will begin transitioning production services into this environment, improving availability, reducing operational complexity, and strengthening the resilience of critical systems that support DOE\u2019s science mission.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"perfsonar\"><strong>perfSONAR Expands Global Reach and Capabilities<\/strong><\/h3>\n\n\n\n<p><a href=\"https:\/\/www.perfsonar.net\/\" data-type=\"link\" data-id=\"https:\/\/www.perfsonar.net\/\" target=\"_blank\" rel=\"noreferrer noopener\">perfSONAR<\/a> \u2014 the <strong>p<\/strong>erformance <strong>S<\/strong>ervice-<strong>O<\/strong>riented <strong>N<\/strong>etwork monitoring <strong>AR<\/strong>chitecture \u2014 is a global platform for measuring and monitoring end-to-end network performance. ESnet is a core partner in the perfSONAR collaboration, with Internet2, Indiana University, G\u00c9ANT, the University of Michigan, and Brazil\u2019s RNP. Each contributes staff, computing, and network resources. Today, perfSONAR is deployed at over 2,000 locations worldwide. 
In 2024, major new deployments supported critical science projects:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The <strong>Vera Rubin Observatory<\/strong> used perfSONAR nodes to identify and resolve packet loss issues.<\/li>\n\n\n\n<li><strong>USDA\u2019s Agricultural Research Service<\/strong> added 65 nodes to its SCInet research network.<\/li>\n\n\n\n<li><strong>The Square Kilometre Array<\/strong> began prototyping a large-scale deployment for its radio telescope sites in Australia and South Africa.<\/li>\n<\/ul>\n\n\n\n<p>The release of perfSONAR 5.1.0 introduced a redesigned, more flexible user interface, enhanced host instrumentation, and integration of multi-threaded iperf3 (an ESnet innovation) to support 100 Gbps network interfaces. Community engagement remained strong, with the partnership team holding quarterly office hours, running well-attended workshops at TNC24 in France and Internet2\u2019s Technology Exchange in Boston, and offering hands-on training in deploying and customizing perfSONAR.<\/p>\n\n\n\n<p class=\"has-source-sans-3-font-family\">perfSONAR continues to prove its value for high-demand science communities such as the LHC projects, where it supports custom alerting systems and helps resolve performance issues. 
In 2024, the perfSONAR community helped troubleshoot problems ranging from packet loss to quality-of-service gaps, ensuring that research networks worldwide operate at peak performance.<\/p>\n\n\n\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;69d6cd62a28b9&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"69d6cd62a28b9\" class=\"wp-block-image aligncenter size-large wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"506\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/annualreports.es.net\/wp-content\/uploads\/2025\/10\/operations_LHCONE-psdash_map-1024x506.png\" alt=\"A dashboard\" class=\"wp-image-6694\" style=\"object-fit:cover\" srcset=\"https:\/\/annualreports.es.net\/wp-content\/uploads\/2025\/10\/operations_LHCONE-psdash_map-1024x506.png 1024w, https:\/\/annualreports.es.net\/wp-content\/uploads\/2025\/10\/operations_LHCONE-psdash_map-300x148.png 300w, https:\/\/annualreports.es.net\/wp-content\/uploads\/2025\/10\/operations_LHCONE-psdash_map-768x379.png 768w, https:\/\/annualreports.es.net\/wp-content\/uploads\/2025\/10\/operations_LHCONE-psdash_map-1536x758.png 1536w, https:\/\/annualreports.es.net\/wp-content\/uploads\/2025\/10\/operations_LHCONE-psdash_map.png 1999w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\taria-label=\"Enlarge\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.imageButtonRight\"\n\t\t\tdata-wp-style--top=\"state.imageButtonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" 
viewBox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><figcaption class=\"wp-element-caption\">The LHCONE group developed an application using new features of perfSONAR to generate alerts.<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading has-source-sans-3-font-family\" id=\"mapping\"><strong>Visualizing and Mapping the Network in Real Time<\/strong><\/h3>\n\n\n\n<p class=\"has-source-sans-3-font-family\">The my.es.net portal gives ESnet\u2019s stakeholders real-time graphs of their data flows from sites to facilities. In 2024, the Measurement &amp; Analysis (M&amp;A) team delivered 12 portal releases (two minor, ten patches), introducing OpenTelemetry-based logging, OS upgrades from CentOS to Ubuntu, API updates, and real-time analytics support for the LHC Data Challenge. 
Looking ahead, ESnet plans to enhance my.es.net with federated identity login and Grafana-based custom visualizations, enabling users to see project- and institution-specific data.&nbsp;<\/p>\n\n\n\n<p>Beyond the portal, ESnet advanced its network mapping and visualization capabilities through two key tools:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>ESnet Network Map Panel<\/strong> \u2013 Renders network topologies from JSON data, overlays traffic information, and works as a Grafana plugin, React component, or standalone HTML object.<\/li>\n\n\n\n<li><strong>Terranova<\/strong> \u2013 Generates JSON topology data from diverse sources via a flexible plugin architecture.<\/li>\n<\/ul>\n\n\n\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;69d6cd62a2bc4&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"69d6cd62a2bc4\" class=\"wp-block-image size-large wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"462\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/annualreports.es.net\/wp-content\/uploads\/2025\/10\/appliedrd_terranova-1024x462.png\" alt=\"\" class=\"wp-image-6695\" style=\"object-fit:cover\" srcset=\"https:\/\/annualreports.es.net\/wp-content\/uploads\/2025\/10\/appliedrd_terranova-1024x462.png 1024w, https:\/\/annualreports.es.net\/wp-content\/uploads\/2025\/10\/appliedrd_terranova-300x135.png 300w, https:\/\/annualreports.es.net\/wp-content\/uploads\/2025\/10\/appliedrd_terranova-768x346.png 768w, https:\/\/annualreports.es.net\/wp-content\/uploads\/2025\/10\/appliedrd_terranova-1536x692.png 1536w, https:\/\/annualreports.es.net\/wp-content\/uploads\/2025\/10\/appliedrd_terranova.png 1999w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" 
\/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\taria-label=\"Enlarge\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.imageButtonRight\"\n\t\t\tdata-wp-style--top=\"state.imageButtonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewBox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><figcaption class=\"wp-element-caption\">The Terranova User Interface, shown developing an ESnet Topology visualization.<\/figcaption><\/figure>\n\n\n\n<p class=\"has-source-sans-3-font-family\">In 2024, the Network Map Panel had 19 releases, adding multi-source topology loading, support for up to 20 data layers, and an end-to-end test harness. It was also published as a public Grafana plugin, now adopted by partners including G\u00c9ANT. Terranova saw eight releases, adding new data translation backends, Google Sheets import tools, and testing frameworks. These open-source tools are now widely used across the global research and education networking community, with ESnet showcasing them at SC24 and other venues.<\/p>\n\n\n\n<h3 class=\"wp-block-heading has-source-sans-3-font-family\" id=\"orch\"><strong>Fine-Tuning Network Orchestration and Automation<\/strong><\/h3>\n\n\n\n<p>In a high-performance science network such as ESnet\u2019s, orchestration and automation are critical for speed, reliability, and scalability. 
Automating complex network and service configurations reduces human error, accelerates service deployments, and ensures consistent, predictable performance.<\/p>\n\n\n\n<p><strong>ESnet Database (ESDB) and Kubernetes Development: <\/strong>In 2024, the Orchestration &amp; Core Data (OCD) and Platform Engineering teams streamlined deployment by migrating from Ansible to Kubernetes Helm charts, moving closer to Continuous Integration\/Continuous Deployment. Developers also began migrating to Okteto, enabling standardized Kubernetes-based development environments, \u201chot-reload\u201d capabilities, and cloud-hosted dev resources, saving hundreds of hours in debugging and reducing hardware costs.<\/p>\n\n\n\n<p><strong>Service Orchestration and Automation: <\/strong>The new Device Base Config service now allows engineers to create, test, and deploy standardized \u201cday one\u201d device configurations in minutes instead of weeks, improving reliability and paving the way for future orchestration enhancements. The Late Protocol Transitions (LPT) project automated the migration of all remaining customers to dedicated virtual routing and forwarding (VRF) tables, reducing cybersecurity risk and service disruption.<\/p>\n\n\n\n<p><strong>Enterprise User Interface: <\/strong>The NextGen ES Enterprise (ESE) GUI modernized workflows for ESDB and OSCARS, improving performance (the Lighthouse score rose from 72 to 97) and accessibility (from 72 to 100). New features include enhanced search, record history, relational lists, and circuit visualizations. The new OSCARS LSP Builder lets users create optimized Layer 2 VPN paths with custom constraints directly from the interface.<\/p>\n\n\n\n<p><strong>Topology and Discovery Services: <\/strong>The new Discovery Service consolidates network data collection into a single, scalable, Kubernetes-based system. 
It normalizes device configurations, provides real-time updates via an asynchronous message bus, and maintains historical records for trend analysis and potential AI-driven optimization. This reduces redundant polling, improves data consistency, and gives applications a coherent, up-to-date view of the network.<\/p>\n\n\n\n<p>In 2024, these orchestration and automation advances significantly improved ESnet\u2019s operational efficiency, reliability, and ability to adapt quickly to evolving science requirements.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"sec\"><strong>Reducing Vulnerability and Tightening Up Security<\/strong><\/h3>\n\n\n\n<p>In preparation for ESnet\u2019s first DOE Enterprise Assessments (EA-60) audit, scheduled for mid-2025, the Security team overhauled documentation, policies, and risk management processes. This included developing a formal Risk Assessment Self-Assessment (RASA), updating the enclave security plan, and producing hundreds of pages of documentation aligned with Berkeley Lab standards.<\/p>\n\n\n\n<p>Significant progress was made on Zero Trust initiatives, particularly in Identity and Access Management (IAM). ESnet developed code and processes to transition from Feitian tokens to YubiKeys, enabling future NIST AAL3 Single Sign-On with phishing-resistant WebAuthn. Most web services migrated to the new SSO in 2024, and a Cloudflare pilot began to protect externally facing applications and reduce VPN dependency.<\/p>\n\n\n\n<p>The Security team launched a new vulnerability management program, replacing InsightVM with Nessus, adding IPv6 scanning, and creating robust tracking workflows. 
ESnet deployed Endpoint Detection &amp; Response (EDR) via CrowdStrike to laptops, workstations, and servers, and upgraded all firewalls as part of a Zero Trust\u2013based data center refresh supporting IPv6-only networking and automated micro-segmentation.<\/p>\n\n\n\n<p>Application security was strengthened through ESnet\u2019s first penetration tests of key internal applications (ESDB and SCRAM), which found no critical issues. Routing security was improved with Resource Public Key Infrastructure (RPKI) support for customers, and Splunk\u2019s Security Orchestration, Automation, and Response (SOAR) tools were implemented to enhance alerting and analysis.<\/p>\n\n\n\n<p>These 2024 efforts significantly hardened ESnet\u2019s security posture, streamlined identity management, and positioned the organization for a successful 2025 audit.<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\">\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\"><\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow 
wp-block-column-is-layout-flow\"><\/div>\n<\/div>\n<\/div>\n<\/div><\/div>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":5,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-6692","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/annualreports.es.net\/index.php\/wp-json\/wp\/v2\/pages\/6692","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/annualreports.es.net\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/annualreports.es.net\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/annualreports.es.net\/index.php\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/annualreports.es.net\/index.php\/wp-json\/wp\/v2\/comments?post=6692"}],"version-history":[{"count":9,"href":"https:\/\/annualreports.es.net\/index.php\/wp-json\/wp\/v2\/pages\/6692\/revisions"}],"predecessor-version":[{"id":6839,"href":"https:\/\/annualreports.es.net\/index.php\/wp-json\/wp\/v2\/pages\/6692\/revisions\/6839"}],"wp:attachment":[{"href":"https:\/\/annualreports.es.net\/index.php\/wp-json\/wp\/v2\/media?parent=6692"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}