Our client is looking for a self-motivated SENIOR SITE RELIABILITY ENGINEER to join the growing team. You will be embedded in the Technical Operations organization, and also work closely with Engineering Development teams, Product Architecture, and Program Management.
As a Site Reliability Engineer, you are passionate about seamless uptime. You delight in building tools to automate routine tasks and constantly seek new ways to improve system performance. If you also want to join a fast-growing startup, work with friendly people, and play with some cool tech, you've found the right place!
What you'll be doing:
Our core technology is Java on Linux using open source technology throughout the stack. The Java engine runs and stores all data in RAM for super high performance while staying safe with transaction logs and auto recovery. The office is Macs with a few Windows holdouts. You decide which works best for you.
Utilize your skills in automation, replication and scaling to manage the production cloud in our worldwide data centers
Write scripts in Ruby, Python, Perl, etc. to build custom tools for automation, replication and scaling
Build tools to monitor and provide metrics on our systems
Perform Linux system administration (DNS, NFS, RPM, Apache, RAID, etc.)
Extend the existing automation we have in place and make things even easier to use
Support Product Development Teams
Lead release deployments and participate in revising software design to scale and prevent failures
Ensure compliance with various best practices.
Adhere to compliance standards in the development and operations spaces as guided by security.
Participate in on-call rotation
Work in a customer-facing production environment
More about you:
B.S. in Computer Science and 3+ years relevant experience OR 10+ years equivalent experience supporting production platforms using the following skills:
Automation using tools such as Chef and Rundeck
Programming in any of the following: Ruby, Python, Perl, Bash
Multi-data-center management, replication, and scaling
Middleware software such as HAProxy, Consul, Terraform or equivalent architectures
Java applications including JVM performance and tuning
Metrics and monitoring - writing custom tools and familiarity with open source options
Linux administration - DNS, NFS, RPM, Apache, RAID, etc.