Site Reliability Engineer

San Mateo, CA 94403

Post Date: 08/08/2018
Job ID: 8437
Industry: IT Operations
Pay Rate: Not Specified

We are looking for Service Reliability Engineers to join a team responsible for scaling and optimizing the reliability, availability, and performance of infrastructure and platform services, and for partnering with developers to build highly available and performant services. You will work with amazing developer teams on the design, provisioning, integration, configuration, monitoring, and incident response of large-scale distributed applications and platform services to deliver kickass SaaS.

As the Service Reliability Engineer, you will:
* Champion service reliability and prevention
* Service restoration
* Prevention
* Service design and implementation
* Operations Engineering

This role requires an SRE who is able to understand and communicate the characteristics of their service stack, such as:
* Degradation and behavior under load of the services and their dependencies
* End-to-end tuning needs, optimizing resource utilization, as load patterns fluctuate
* Instrumentation and metrics that clearly describe the service behaviors
* Scaling requirements and patterns
* Resiliency and recoverability, ensuring that backup / restore and disaster recovery capabilities are implemented, tested and maintained

What you will need to be successful:
* Scripting Languages: You demonstrate competence in shell scripting and high-level programming languages such as Bash, Python, and Ruby. We use Python extensively.
* 4+ years of experience running large-scale, customer-facing web services with a solid understanding of:
  * REST APIs
  * Load balancing technologies, including L7 routing, DNS, and CDN
  * Networking and TCP/IP
  * Server hardware configuration
  * Monitoring and instrumentation - we own ensuring that critical instrumentation and alerting are in place
  * Standard Internet services, such as DNS, HTTP, etc.
  * Cloud computing patterns
  * Configuration management using Puppet, Chef, Ansible, or similar
  * Infrastructure security and compliance
* Experience with AWS services like EC2, ELB, ElastiCache, DynamoDB, SQS, SNS, RDS, S3
* Container and container management technologies, such as Docker and Kubernetes
* Databases and big data stores
* Defining and documenting technical architecture of complex and highly scalable products
* Familiarity with ITIL-based incident, problem, and change management
* Proactive: self-motivated, customer-focused, organized, and a good communicator
