Summary
Responsibilities:
You will spend 50% of your time on production support, including handling user tickets, incidents, and problem management.
You will identify and create automation to eliminate manual day-to-day support activities, and scope and create automation for the deployment, management, and visibility of our services.
Automate to drive efficiency by designing an autonomous system.
Manage service reliability by managing risk.
Define service level indicators (SLIs), objectives (SLOs), and agreements (SLAs).
Implement best practices for building successful monitoring and alerting systems
You will use your expertise to tune our systems and push them beyond their normal limits.
You will work closely with engineering / development teams to design, build, and maintain systems, and advise them on product selection, schema design, and query tuning.
You will troubleshoot issues across the entire stack: hardware, software, application, and network.
You will mentor other SREs on standard methodology for everything from monitoring to troubleshooting complex code and database issues.
Represent the SRE organization in design reviews and operational readiness exercises for new and existing services.
Participate in the on-call rotation and in periodic conference calls with specialists in other time zones.
Required Technical Skills:
Bachelor’s Degree / background in Computer Science
Experience in software development; automation-related experience is particularly valued. Scripting languages such as Bash, Python, and Ruby, or compiled languages such as C, C#, Java, Scala, and Go, are most relevant, but others are acceptable. One higher-level language is desired.
Hands-on experience using enterprise tools such as AppDynamics, Grafana, Splunk, and Dynatrace
Three-tier support experience with databases such as IBM DB2, Sybase, MongoDB, Greenplum, and KDB
Professional ownership of issues
Deep understanding of operating-system-level concepts such as processes, memory allocation, and the network stack; an understanding of how applications are affected by them, and the ability to debug the resulting issues.
Practical experience running large-scale online systems is an advantage.
Awareness of, and ability to reason about, modern software and systems architectures, including load balancing, queueing, caching, distributed systems failure modes, microservices, and cloud.
Desired Skills:
Knowledge of the messaging layer: MQ / CPS / XML
Knowledge of SFTP / Comet
ServiceNow
Prior experience in a developer / support role in a large-scale financial firm