Site Reliability Engineer

Full Time

Job Description

Thought Machine's mission is bold: to properly and permanently rid the world's banks of legacy technology. To achieve this, we have developed the foundations of modern banking and built core and payments technology that runs natively in the cloud. What we are attempting is hard, so we need great people working together to build great technology.

We have grown rapidly in the past few years, expanding our team to more than 500 people across offices in London, New York, Singapore, Sydney and Melbourne. We have raised more than $500m in funding and are now valued at $2.7bn. Our investors include Molten Ventures, Eurazeo, Intesa Sanpaolo, Temasek, Nyca Partners, JPMorgan Chase, Standard Chartered, and more.

We have created a culture enabling our team to produce the best work in the industry, ensuring we have fun along the way. We're regularly cited as having a fantastic workplace culture and have been recognised by Sifted magazine as having one of the highest Glassdoor ratings for a UK fintech company and the most generous employee share package in the industry. We've been named AltFi's B2B Fintech of the Year, placed in the FinTech50, and in the IDC list of top 100 Fintechs.

Site Reliability Engineers at Thought Machine take responsibility for deploying our software into production. In addition to traditional DevOps responsibilities, you will focus on writing and maintaining software that automates our deployment processes.

This role will be based in Singapore.

DUTIES

Supporting the engineering team in building highly fault-tolerant, scalable applications.
Developing tools to ensure our services can scale and are highly available. We aim to automate our ops tasks, adopting open source tools or developing bespoke tools as required.
Being part of the 24x7 on-call rota, helping to support and maintain production systems.
Providing day-to-day development support and monitoring production server and network environments by developing and deploying logging and monitoring tools.
Developing applications to increase code quality throughout our codebase.
Supporting disaster recovery, backup, redundancy and capacity planning activities.
Working with external users/clients on a variety of projects, ensuring their success in running our core product, Vault.

Requirements

Essential

Strong background in Linux/Unix administration, e.g. Ubuntu, Debian
Strong background in Go or Python
Strong background in one of the following: database administration, Kafka, observability tools (such as Prometheus or Zipkin) or infrastructure automation
Experience with AWS or GCP
Experience with or knowledge of container orchestration tools, e.g. Kubernetes

Desirable

Experience in supporting production systems
Experience with automation/configuration management, e.g. Terraform, Puppet, Chef, Ansible
Client engagement experience as an SRE for high-traffic, mission-critical systems
Ability to explain technical concepts to technical and non-technical stakeholders

Benefits

Highly competitive salary
Bonus incentive
Healthcare
25 days of holiday plus public holidays
SGD 1,500 per year flexible spend benefit
All the latest tech you need
A talented and experienced team as your colleagues
An environment where we encourage learning and progress

Thought Machine is committed to making a measurable positive impact on people's everyday lives. We are an equal opportunity employer and value diversity at our company. We actively hire for cultural growth. We welcome people of all ages and backgrounds, and we value people who take a journey unique to them. We provide everyone with equal access to professional development. You are encouraged to apply even if your experience doesn't precisely match the job description.
