Job Description
Description: Lead Cloud Engineer & SRE (Cloud Reliability Maestro)
Do you thrive on keeping complex systems stable, secure, and scalable? Are you ready to bring your Site Reliability Engineering (SRE) and Cloud Ops skills to the forefront, ensuring everything runs like a well-oiled machine? Do you love diving into logs, metrics, and on-the-fly troubleshooting—while still translating big-picture technical concepts for non-tech teams? If so, read on…
I’m the Head of Infrastructure & SRE, and I’m looking for a Lead Cloud Engineer & SRE to own the reliability, availability, and performance of our systems. This role blends hands-on engineering, cross-functional collaboration, and thought leadership—minus the formal management responsibilities. You’ll work closely with Product, Engineering, and Support to keep our stack humming, from AWS and Docker containers to security best practices. If this sounds like you, you could be the peanut butter to our jelly! Read on and apply!
What We’re Looking For
In general, someone who:
- Is passionate about serving others.
- Takes pride in being an SRE/Cloud expert, finding fulfillment in seeing our systems perform flawlessly.
- Is comfortable being part of a team that thrives on healthy conflict. People with thin skin need not apply. No, seriously.
- Passionately cares about our clients by helping them be more successful. Our clients are fleet managers, parts clerks, and automotive technicians who maintain everything from squad cars to school buses—so everyone comes home safely at the end of the day.
- Thinks of themselves less, while not thinking less of themselves. They’re other-centric, compassionate, and self-assured.
- Is willing to lift boxes, clean floors, and hold doors if that’s what it takes to get something done, because no job is beneath them.
- Takes ownership and initiative in their role. They identify how to make processes, platforms, and systems better without waiting to be told.
- Loves to read, learn, grow, and stretch themselves. Bonus points for each book they’ve read by Patrick Lencioni!
Specifically for This Job, Someone Who:
- Has extensive hands-on Cloud Operations (AWS) and SRE experience (think monitoring, observability, resilience engineering).
- Understands and implements best practices for designing high-availability, secure, and scalable systems.
- Collaborates closely with cross-functional teams (Product, Engineering, Support) to plan deployments, optimize performance, and ensure smooth releases.
- Analyzes and resolves production incidents quickly, using data-driven approaches and post-mortems to prevent recurrence.
- Drives continuous improvement in system architecture, CI/CD pipelines, and infrastructure automation.
- Translates complex technical issues into clear, actionable insights for senior leadership.
- Organizes effectively, juggling multiple priorities while maintaining a high level of reliability and resilience across the environment.
- Keeps an eye on costs, optimizing AWS and resource usage without sacrificing quality.
Key Responsibilities
- Architect & Maintain: Build and refine cloud infrastructure on AWS, employing Infrastructure-as-Code (Terraform, CloudFormation) and container technologies like Docker.
- Site Reliability Engineering: Champion observability, proactive monitoring, and incident management to ensure minimal downtime and rapid recovery.
- Security & Compliance: Collaborate with InfoSec to maintain secure configurations, guardrails, and best practices.
- Performance Tuning: Identify and fix bottlenecks at any layer, from the network to the application, to keep our systems snappy.
- Incident Response: Lead incident triage and root cause analysis, driving post-incident follow-ups to prevent future issues.
- Technical Collaboration: Work with developers, QA, and product owners to define environment requirements, automate deployments, and optimize workflows.
- Thought Leadership: Provide expert advice on new technologies, tools, and methods to continually improve our cloud and SRE approaches.
Key Results Areas (aka the Job Outcomes)
- High Availability: Systems stay up and running consistently, with minimal unplanned downtime.
- Optimized Performance & Scalability: Apps scale seamlessly under load, meeting or exceeding SLAs.
- Efficient Incident Management: Rapid response and resolution of issues, with a culture of learning from every incident.
- Improved Team Collaboration: Product and engineering teams have clear, actionable insights and best practices to build stable services.
- Security & Cost Efficiency: Secure configurations and resource management keep our clients’ data safe while controlling cloud spend.
Qualifications
OK, the “boring” HR part that’s necessary:
- 5+ years of hands-on experience in Cloud Ops (AWS) and Site Reliability Engineering.
- Proficiency with monitoring & observability tools (e.g., Prometheus, Grafana, Datadog, Splunk), Docker, and DevOps processes (CI/CD).
- Understanding of Infrastructure-as-Code (Terraform, CloudFormation) and scripting (Python, Bash, or equivalent).
- Bachelor’s Degree not required but preferred, especially in computer science or a related field.
The Bottom Line
You’ve made it this far. Congratulations! We are looking for an ideal team player with an almost frightening intensity around customer service and a passion for serving others. Total compensation for the role is between $120k and $160k. This is a full-time, hybrid position (in office M/W/F) in Glendale, AZ, working side by side with the engineering, product, and operations teams. If all of this is checking off the items on your list, we’d love to hear from you!
About Us
RTA was established in 1979 and has a reputation for providing the best customer service in the market. Our purpose is to help fleets succeed. We pride ourselves on creating a caring, family-oriented atmosphere for both staff and clients, and love that our work makes a positive impact on the lives we touch. Our clients carry kids in school buses, first responders in emergency vehicles, patients in ambulances, food and medical supplies in trucks, and people just taking the bus or train to work. We do meaningful work, and we want our clients to have the best tools available to them.
Our office spaces are open, spacious, and colorful, with plenty of natural light. We come together often as a company to enjoy freshly baked desserts or awesome lunches and genuinely enjoy each other’s company. We offer some pretty unique perks and benefits, as well as all the standard ones. We’re happy to talk through all the options!
Coming from Scottsdale? You’ll enjoy waving at the traffic going the other way while never having to stare at the blinding sun. It only takes about 25 minutes from downtown Scottsdale in the mornings. We are located close to Arrowhead Mall, with quick access to the 101 from multiple directions.
If all of this sounds like you, and your type of company, then click apply! Seriously, we’ve asked you four times now, and you’re still reading—bonus points for being thorough! Let’s see if you’re the Pepper to our Potts when it comes to keeping our cloud and systems rock-solid.
Requirements: