EY Jobs

EY Site Reliability Engineer in Buenos Aires, Argentina

Site Reliability Engineer

Core Business Services

Requisition # BUE002DQ

Post Date Oct 24, 2020

Site Reliability Engineers (SREs) at EY fill the mission-critical role of ensuring that our complex systems are healthy, monitored, automated, and designed to scale. You will use your background as an engineering generalist to work closely with our development teams, from the early stages of design all the way through identifying and resolving production issues. The ideal candidate will be passionate about an operations role that requires deep knowledge of both the application and the product, and will believe that automation is a key component of operating large-scale systems.

Our SRE team solves incredibly difficult problems using the best tools available for the job and is rapidly extending its use of new technologies. Team members spend as much of their time working on systems as they do writing code. You'll be tasked with all manner of work: building operational tooling, automating operational workflows, performing architecture and design reviews, investigating system failures and complex outages, improving our monitoring infrastructure, defining service level objectives and agreements for EY products and flows, and much more.

We’re a collaborative team that genuinely enjoys working together to make EY a better working world. We are looking for engineers who understand that simplicity and reliability are properties of a system that must be carefully weighed in every decision.

EY has a positive, diverse, and supportive culture—we look for people who are curious, inventive, and work to be a little better every single day. In our work together we aim to be smart, humble, hardworking and, above all, collaborative.


  • Gain deep knowledge of our complex applications.

  • Serve as a primary point of responsibility for the overall health, performance, and capacity of one or more of our technology products.

  • Strong experience with Azure or AWS (design, SDKs, best practices).

  • Familiarity with the design principles of monitoring and alerting systems.

  • Design, implement, and maintain robust monitoring and alerting to improve performance and reliability.

  • Experience implementing industry-standard security best practices.

  • Experience with automation, configuration management, and developing infrastructure as code.

  • Use engineering best practices: deliver high-quality production code, use automated testing, and build reusable components.

  • Develop tools to improve our ability to rapidly deploy and effectively monitor custom applications in a large-scale Windows and Linux environment.

  • Work closely with development teams to ensure that platforms are designed with "operability" in mind.

  • Function well in a fast-paced, rapidly changing environment.

  • Participate in the operations on-call rotation, triaging and addressing production issues.


  • B.S. or higher in Computer Science or another technical discipline, or related practical experience.

  • Programming skills in .NET and PowerShell, or in Python, Ruby, Java/Scala, or C.

  • Database and big data knowledge is a plus.

  • Microsoft Azure or AWS certifications.

  • 5+ years of experience in a large-scale Microsoft and Linux operations role.

  • Experience in designing, analyzing, and troubleshooting large-scale distributed systems.

  • Ability to debug production issues across services and levels of the stack.

  • Experience with one or more orchestration and deployment tools, such as Azure Resource Manager (ARM), Terraform, or Ansible.

  • Familiarity with Git or other source control systems.

  • Experience with TFS or Visual Studio Team Services (VSTS).

  • Experience using tools to create and manage CI (continuous integration) and CD (continuous delivery) pipelines.

  • Experience working with public clouds (Microsoft Azure is a plus).

  • PowerShell or Python experience, specifically for systems automation.

  • Experience with RESTful and WebSocket APIs.

  • Working knowledge of the TCP/IP stack, internet routing, and load balancing.

  • Experience with monitoring and alerting technologies such as Prometheus, Sensu, Nagios, Kafka, Wavefront, BigPanda, Datadog, and PagerDuty.

  • Optional: Experience designing, implementing, and deploying Docker, Kubernetes, and serverless workloads (Azure Functions or AWS Lambda).

  • Previous experience working with geographically distributed coworkers.

  • Strong interpersonal communication skills (including listening, speaking, and writing) and the ability to work well in a diverse, team-focused environment with other SREs, engineers, product managers, and others.

  • Creative thinker and strong problem solver with meticulous attention to detail.

  • Highly organized, creative, motivated, and passionate about achieving results.