Overview

As a Site Reliability Engineer at Beamly you will work closely with the Heads of Engineering and the product teams. You will focus on maintaining the reliability and availability of our production services, spending at least 50% of your time on continually improving our monitoring and alerting and on automating away manual operational work. You will improve our on-call and incident handling processes, manage post-mortems, and participate in the on-call rota along with all other engineers.

You will be joining an engineering team with a culture of continuous deployment and automation, in which every engineer is responsible for managing their own code in production. Indeed, the on-call engineer in any given week will be working with you directly.

You will come from a software or systems engineering background and have a passion for pre-empting issues, automating away pain, and finding root causes. You’ll find and fix things before they even become problems.

Our tech stack includes:

  • Microservices written in Scala/Play and Node.js running in Ubuntu on EC2
  • Apache Kafka and Spark
  • HiveMQ
  • MySQL/Postgres
  • Redis
  • Elasticsearch / Kibana / Flume
  • GlusterFS
  • WordPress and Drupal
  • Ansible and Consul
  • Lots of AWS services

Responsibilities:

  • Manage day-to-day operational responsibilities and provide direction to the on-call engineer, including:
      • Pre-emptive investigation and resolution of system alerts before they cause an outage
      • Handling of incidents
      • Incident post-mortem write-ups
      • Upgrades and maintenance of common systems
      • Role playing outage scenarios
      • Managing and updating system documentation, including runbooks
      • Production readiness reviews
  • Continuous improvement of our availability and reliability
  • Availability and reliability metrics gathering and reporting – ultimately leading to the implementation of error budgeting for product teams
  • Improve our monitoring and automated first-level resolution capabilities
  • Eliminate manual operational burden through obsessive automation
  • Take ownership of critical/cross-cutting production systems used by multiple product teams
  • Provide consultancy to product teams on production capacity planning and performance analysis

Requirements:

  • 5+ years working as a systems or software engineer
  • Demonstrable coding experience in at least one systems/scripting language (preferably Python or Golang) and one application language (preferably Scala, JavaScript or Java)
  • Good knowledge of Unix and networking fundamentals, including how to apply the standard tools for diagnosing problems
  • End-to-end understanding of modern web architectures, from CDN to persistence layers, and of how to monitor and measure the availability and reliability of web applications effectively
  • Experience of working in cloud environments, especially Amazon Web Services
  • Experience of using configuration management tools, especially Ansible
  • A fast learner, capable of working with minimal supervision

It’s a plus if you also have:

  • Experience of operating in an SRE role at a previous company
  • Knowledge of, or experience in, designing, implementing and troubleshooting distributed systems
  • A systematic approach to problem solving

What’s it like to work at Beamly?

Working at Beamly means continuing the progress of an indispensable platform, built on world-class technology by a world-class team. You’ll join the Engineering & Product team in our vibrant, central Covent Garden office. We practise test-driven development and continuous integration, hold weekly ‘Tech Share’ lunches, Friday ‘Show & Tell’ sessions and regular hack days, and set time aside for continuous improvement and learning. We are keen to see our engineers flourish and will dedicate time to helping you improve the breadth and depth of your professional skill set.

As a Beamly employee, you’ll also benefit from:

  • a high spec MacBook Pro with your choice of peripherals;
  • complimentary fruit, snacks, bagels and drinks;
  • private medical insurance;
  • life assurance (4 x base salary);
  • 25 days’ holiday per annum;
  • additional holiday on your birthday;
  • childcare vouchers.

Only applicants who have the right to work in the UK will be considered. Candidate sponsorship is unfortunately unavailable.
