Splunk Cloud Infrastructure Site Reliability Engineer - San Francisco, California

As a Site Reliability Engineer, you will be part of a team that works closely with our application development team to provide highly available services to customers who use Splunk as a service on cloud infrastructure. You will provide input on and execute security and deployment practices, scaling, and metrics, and handle general day-to-day server management.

*Responsibilities:*

    • Monitor all the things. We’re Splunk, and we love the data.

    • Design, scale out, and maintain our AWS cloud-based infrastructure

    • Design and write code to develop and maintain the systems that power Splunk cloud services hosted in the public cloud

    • Develop scripts and applications to automate system deployment, scaling, and infrastructure

    • Implement horizontally scaled-out systems that support thousands of concurrent Splunk users

    • Build deployment, management and monitoring systems using “infrastructure as code” tools and techniques

    • Implement zero-downtime production pushes. Provide fanatical production support for applications and infrastructure

*Requirements:*

    • AWS (EC2, S3)

    • Linux system administration skills; Ubuntu Linux a plus

    • Knowledge of monitoring tools (e.g., CloudWatch, Splunk) and configuration management tools (e.g., Ansible, Chef, Puppet)

    • Hosted and cloud-based service experience

    • Shell scripting and high-level language expertise

    • We like Ruby and Python a lot.

    • Previous experience in a fast-paced agile development environment using tools such as Git, Jira, and Jenkins

*Education:*

  • Bachelor’s degree in Computer Science or relevant experience.