Swvl is a revolutionary idea born from passion, loyalty, and the persistence to face every challenge that comes our way. It started with an observation that turned into a realization: there are too many cars on the streets, wasting our limited resources of time, space and money.
Our main goal is not just to make commuting easier, but to strive for solutions, encourage young people to contribute to innovation, and inspire change.
In 2 years, Swvl started operating in 6 cities across 3 countries: Cairo & Alexandria in Egypt, Nairobi in Kenya, and Lahore, Karachi & Islamabad in Pakistan.
We are looking for an SRE Engineer to help us grow and maintain the system that improves the experience of hundreds of thousands of users every day. SRE Engineer is a key role at Swvl; it’s more than setting up CI tools or managing servers on the cloud. SRE Engineers at Swvl write code (tooling to support the dev team, or even code in the core microservices), debug and identify production issues at many levels (code/frameworks, microservices/datastores, containers/OS, clustering platform, servers, cloud providers), scale the fast infrastructure, own the software development process, and make sure it stays healthy and that newcomers are aware of it. If you feel like riding the roller coaster, have a sense of ownership, are willing to share knowledge, learn and grow, and are team-centric and focused on end value rather than self-achievement, this could be your golden opportunity.
Responsibilities and Duties
- Develop and integrate tools/scripts to automate the process of development/deployment.
- Spread SRE culture and continuously enhance the process of software development.
- Implement automation tools and frameworks (CI/CD pipelines).
- Work hands-on with code (for example, write and push hotfixes to production in urgent cases)
- Integrate/configure tools for system and inter-microservices monitoring and alerting (mostly over Kubernetes)
- Develop and deploy solutions to optimize the cost of the infrastructure and external services (e.g. set up caching data stores and change the code to integrate them)
- Ensure the required security level for the infrastructure, datastores and the different environments: production/staging/development (ACLs, VPNs, authorization, etc.)
- Maintain the infrastructure on the cloud (AWS) by allocating new resources and setting up new platforms/clusters with the proper configurations
- Handle critical production issues around the clock and prepare incident reports
- Perform root cause analysis for production issues
- Design procedures for system troubleshooting and maintenance
- Work closely with and support the dev team on infrastructure and architecture decisions, debugging production issues, deploying new services, and setting up and allocating new cloud resources
- Maintain our datastores: monitor the load, design and implement backup and restore plans, and handle scaling and clustering (sharding/replication)
- Contribute to both infrastructure architecture and microservices design
Requirements
- 5-8 years of solid experience across the software development life cycle (including work with agile teams)
- Software Engineering background
- Excellent system design skills
- Familiarity with different open source web development languages/frameworks and how they’re deployed (e.g. NodeJS, Python, Go, Scala, etc.)
- Strong knowledge of cloud providers (mainly AWS)
- Solid experience with Unix-like operating systems
- Experience with Linux containers, container orchestration platforms (Docker, Kubernetes), and related tools and technologies (Helm and kops are a plus)
- Admin experience with databases including MySQL, MongoDB & Elasticsearch.
- Experience with CI/CD principles, architecture and operations.
- Experience with instrumentation for monitoring and logging the health and availability of services.
- Deep understanding of standard networking protocols and components such as HTTP, DNS, TCP/IP, the OSI Model, networking and load balancing.
- Familiarity with deployment and management systems such as Puppet, Ansible, Packer, Terraform, etc.
- Familiarity with messaging systems like RabbitMQ and Kafka is a must
- Sense of ownership
- Good communication skills