Database Reliability Engineer
Responsible for keeping database systems and services running smoothly 24/7/365. This role blends database engineering, administration, and software development to apply sound engineering principles, operational discipline, and automation to our data systems.
Reporting and Working Relationships
Reports to the department senior manager or director. Works with a high degree of latitude.
ESSENTIAL DUTIES AND RESPONSIBILITIES:
- 50% OPERATIONAL SUPPORT: Helps define and execute a comprehensive data reliability and observability strategy. Functions as the primary support specialist for data pipelines, databases, deployments, CI/CD, high availability, access & privileges, data warehouses, storage & archival. Provides expertise in data performance tuning and optimization, and recommends best practices. Supports deliverables by modeling data, writing and optimizing queries, analyzing logs and access patterns, addressing security and compliance requirements, managing replication topologies, and maintaining alerts and monitoring.
- 35% PROBLEM SOLVING: Works to resolve service interruptions and participates in post-event root cause analysis. Quickly identifies trends and issues and recommends possible solutions. Automates incident resolutions to increase data reliability over time. Brings together technical, procedural, and financial data to reduce toil and increase data efficiency.
- 15% INTERNAL INTERACTIONS: Interacts with personnel at all levels, including senior business leaders. Works constructively and collaboratively with a variety of individuals and groups, both in person and virtually, and builds and maintains effective relationships. Coaches peers on data reliability practices.
- Bachelor’s degree in a technology-related field, or 2-3 years of equivalent experience supporting cloud data systems
- Solid understanding of and experience with cloud technology (AWS, GCP)
- Understanding of complex database systems and data pipelines (MSSQL, MySQL, etc.)
- Ability to interpret data security requirements and apply them to data systems
- Proven track record of optimizing data systems (query and execution plans)
- Experience managing technical systems using infrastructure as code tools (IAM, Terraform, Ansible)
- Ability to triage, execute root cause analysis, and be decisive under pressure
- Desire to reduce technical inefficiencies and a commitment to reducing toil
- Familiarity with SRE concepts, including SLOs, SLIs, and burn rates, is a plus