Principal Site Reliability Engineer I & II

Remote   |   Full Time

What you will be doing

As an SRE, you will be accountable for delivering top-notch application and service availability. You will work as a subject matter expert on all aspects of our application performance, reliability, and monitoring, and collaborate with the multidisciplinary teams that design, build, deploy, maintain, and support our applications.

  • Partners with the Engineering & AWS team to ensure software complies with Security, SLA, and performance requirements.
  • Proactively seeks ways to enhance application and service reliability.
  • Remediates production issues in line with incident response plans, identifies preventive measures, and drives their implementation.
  • Performs design and code reviews and partakes in software development activities.
  • Routinely optimizes infrastructure workloads for better performance and efficiency, maintaining all aspects and disciplines of the well-architected framework.
  • Strategizes and achieves Site Reliability Objectives to deliver world-class services to customers.
  • Designs and builds mitigation plans for various disaster scenarios. Applies chaos engineering to test solutions under real-world scenarios.
  • Debugs and provides robust solutions for complex problems. Adopts industry best practices and solves operational problems through extensive, innovative automation.
  • Builds and improves production monitoring and management capabilities for maximum observability.
  • Governs various database systems, and reviews and validates database models for efficiency.
  • The technology stack for this role includes AWS, Docker, Jenkins, DynamoDB, Redshift, Java, Python, and Splunk.


What you should have

  • Minimum of 7 years of experience in SRE or similar roles, with strong exposure to application development paradigms.
  • Ability to thrive in a fast-paced growing environment with a desire to learn new skills and implement new technologies.
  • Deep expertise in AWS cloud services to build and maintain a secure, reliable, world-class SaaS product.
  • Experience in the following areas: cloud and database architecture, ETL, business intelligence, backup strategy, high availability, and disaster recovery solutions.
  • Experience in Python/Java and a strong understanding of Unix/Linux operating systems and networking.
  • Ability to understand and decode project requirements and recommend best practices as the project evolves.
  • Demonstrated experience in managing application architecture and cloud infrastructure at scale.
  • Hands-on skills in managing monitoring and APM tools like CloudWatch, Splunk, and Datadog for maximum observability.
  • Excellent communication and documentation skills to convey the vision clearly and precisely.

Chargebee might be the opportunity you’re looking for

  • If you’re interested in how subscription businesses can get more efficient.
  • If you’re hungry to give and receive feedback, fully understanding that challenging perspectives are the only way that you can grow.
  • If you can bring empathy to problem solving.
If this sounds interesting but you’re not sure you'll tick all the boxes, apply anyway! There’s tons of room to grow at Chargebee.

Let’s talk

Apply with your résumé to get the conversation started.
