Company Description
Arista Networks is an industry leader in data-driven, client-to-cloud networking for large data center, campus and routing environments. Arista is a well-established and profitable company with over $8 billion in revenue. Arista’s award-winning platforms, with Ethernet speeds of up to 800 gigabits per second, redefine scalability, agility, and resilience. Arista is a founding member of the Ultra Ethernet Consortium. We have shipped over 20 million cloud networking ports worldwide with CloudVision and EOS, an advanced network operating system. Arista is committed to open standards, and its products are available worldwide directly and through partners.
At Arista, we value the diversity of thought and perspectives each employee brings. We believe fostering an inclusive environment where individuals from various backgrounds and experiences feel welcome is essential for driving creativity and innovation.
Our commitment to excellence has earned us several prestigious awards, such as the Great Place to Work Survey for Best Engineering Team and Best Company for Diversity, Compensation, and Work-Life Balance. At Arista, we take pride in our track record of success and strive to maintain the highest quality and performance standards in everything we do.
Job Description
**Who You'll Work For**
We are seeking an experienced and analytically-minded Site Reliability Engineer to join our organisation on a permanent, remote basis from Ireland. In this role, you will be instrumental in building, deploying, and operating critical production systems with a steadfast commitment to scalability, reliability, observability, and security. You will work collaboratively with cross-functional teams to ensure our infrastructure remains resilient, efficient, and future-ready. This is an excellent opportunity for a detail-oriented professional who thrives in a dynamic environment and is passionate about solving complex infrastructure challenges.
**What You'll Do**
- Design, build, and deploy production systems with a focus on scalability, reliability, observability, and performance, ensuring systems meet stringent security standards
- Develop and maintain comprehensive automation solutions to eliminate toil and streamline operational efficiency across production environments
- Proactively monitor production systems, establish intelligent alerting strategies, and implement automated incident response mechanisms to minimise downtime
- Create and maintain detailed incident response runbooks; conduct thorough postmortem analyses following incidents to identify root causes and prevent recurrence
- Collaborate with software engineering teams to identify and resolve infrastructural bottlenecks, designing innovative solutions that enhance product deployment workflows
- Manage and optimise monitoring infrastructure using industry-standard tools, ensuring comprehensive visibility across all systems
- Plan, communicate, and execute maintenance windows on production systems with minimal disruption to service availability
- Triage platform and infrastructural issues with decisiveness and analytical rigour; engage with third-party vendors and support teams as required
- Deploy new systems and updates in a staged, risk-managed manner, ensuring safe and incremental rollouts
- Survey and adopt best practices in infrastructure and platform management to maintain secure, scalable, and fault-tolerant systems
- Study the design and implementation details of open-source systems to enhance troubleshooting capabilities and accelerate issue resolution
- Work transparently with stakeholders to communicate system status, planned maintenance, and infrastructure improvements
Qualifications
**Essential Requirements:**
- Bachelor's degree in Computer Science, Engineering, or equivalent professional experience (5+ years in a related infrastructure or systems role)
- Proficiency in one or more programming languages: Go, Python, or bash shell scripting, with the ability to implement medium-complexity automation workflows
- Strong knowledge of Linux or UNIX from both administration and debugging perspectives
- Hands-on experience operating software systems, infrastructure, and complex applications at scale in production environments
- Demonstrated expertise in infrastructure-as-code principles and practices
- Strong problem-solving and software troubleshooting skills with a methodical, analytical approach
- Experience with server provisioning, particularly from storage and networking perspectives
- Proven ability to work collaboratively within cross-functional teams and communicate technical concepts clearly
- Experience with incident response, postmortem analysis, and continuous improvement methodologies

**Desirable Skills and Experience:**
- Experience with container orchestration platforms, particularly Kubernetes
- Hands-on experience with Docker and virtualisation technologies
- Proficiency in managing monitoring stacks, including Prometheus and Grafana
- Experience with CI/CD systems such as GitLab tools or Spinnaker
- Knowledge of infrastructure-as-code frameworks, particularly Terraform
- Experience managing databases such as PostgreSQL or equivalent relational database management systems
- Experience with artifact repositories and Docker registries
- Familiarity with cloud platforms (Google Cloud Platform, Amazon Web Services, or Microsoft Azure)
- Understanding of distributed systems architecture and principles
- Experience with performance tuning and system optimisation
- Knowledge of security best practices in infrastructure and systems design
- On-call support experience and comfort with incident response responsibilities
Additional Information
Arista stands out as an engineering-centric company. Our leaders, including founders and engineering managers, are all engineers who understand sound software engineering principles and the importance of doing things right.
We hire globally into our diverse team. At Arista, engineers have complete ownership of their projects. Our management structure is flat and streamlined, and software engineering is led by those who understand it best. We prioritize the development and utilization of test automation tools.
Our engineers have access to every part of the company, providing opportunities to work across various domains. Arista is headquartered in Santa Clara, California, with development offices in Australia, Canada, India, Ireland, and the US. We consider all our R&D centers equal in stature.
Join us to shape the future of networking and be part of a culture that values invention, quality, respect, and fun.