AWS Disaster Recovery/BCM Specialist

IT - Application & Software Development
Toronto, ON
Permanent
Oct 06, 2022

 

Our client

One of the world's best pension investment funds, undergoing a major technology transformation and modernization.

What's in it for you

Play a large part in driving the Disaster Recovery program, providing the most innovative, efficient, and secure solutions. Learn new technologies and tools as the area rapidly expands.

 

Responsibilities:

 
  • Work as a member of an Agile cross-functional infrastructure team while collaborating with various customers and stakeholders from the business and IT teams.
  • Play a lead role in delivering backup and DR solutions on AWS for new and existing features, supporting and collaborating with our application developers/Business Analysts and other IT Infrastructure and Governance teams. This includes leadership and orchestration across multiple teams.
  • Plan and improve systems and applications in terms of reliability, scalability, recoverability, resiliency, and supportability.
  • Lead and participate in disaster recovery planning, design, and testing; write wiki articles; provide support during system outage/outbreak incidents and disaster scenarios.
  • Assess, adapt and evolve operational strategies and design new solutions for DR/BCM for application/process resilience.
  • Partner with DR/BCM leads in the Enterprise and Governance & Risk teams.
  • Enhance operational incident response management and perform incident response in alignment with IT standards and policy.
  • Educate the team, IT partners, and business stakeholders on DR and business continuity processes and best practices, and build good working relationships.
  • Interface and collaborate with internal business partners and vendors to gather requirements, design and implement solutions, manage technical operations, and triage and fix operational issues.
  • Support other development teams and educate them about platform/application resilience best practices.
  • Support and advocate for cloud cost management.
  • Enhance proactive monitoring and support of cloud infrastructure, backup and recovery process, services and network.
  • Design and implement automation solutions for backup, restore, and disaster recovery in the AWS cloud; develop CI/CD pipelines to maximize efficiency from an Infrastructure as Code (IaC) standpoint.
  • Perform system administration analysis; troubleshoot and resolve complex production issues spanning multiple systems and technologies.
  • Work closely with IT and Development teams to answer technical questions or resolve issues within system administration, data protection and system resilience.
 
Qualifications:
 
  • 7+ years’ experience in IT Infrastructure Operations and/or Software Development in progressively more senior roles.
  • College or university degree in the field of Computer Science and Information Technology.
  • 5+ years’ experience in setting up general application backup, recovery, and operations (including VM-hosted applications and database backup/recovery).
  • In-depth knowledge and expertise in designing, building, implementing, and maintaining complex and automated DR/BCM solutions on cloud infrastructure.
  • Knowledge of AWS services, recovery, backup, resiliency, etc.
  • Knowledge of service and hosting solutions, such as in a private/public cloud, using IaaS, PaaS, and SaaS platforms and their integrations.
  • Knowledge of Disaster Recovery technologies such as data replication, Commvault, AWS Backup, Data Lifecycle Manager (DLM), AWS Resilience Hub, AWS Elastic Disaster Recovery, CloudEndure, etc.
  • Knowledge of the various techniques for meeting Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO) in the cloud, based on application criticality.
  • Knowledge of database consistency and duplication, MS SQL, Oracle, and cloud-based SQL (Azure SQL, or the AWS RDS SQL and Oracle engines).
  • Knowledge of configuration management and automation tools such as Ansible, Terraform, Puppet, or cloud-based native services for provisioning and recovery of cloud workloads.
  • General understanding of cloud networking principles, vulnerability management controls, and identity services (Active Directory/Azure AD, Centrify/Ping, ADFS, etc.)
  • Hands-on operations experience with cloud platforms (AWS), or the ability to learn them within a short period of time.
  • Expertise in Windows and Linux operating systems
  • Strong analytical and troubleshooting abilities to resolve issues
  • Experience leading delivery on major initiatives.
  • Some or growing experience with DevOps automation tools such as Terraform, Microsoft Azure DevOps (Repos, Pipelines, and branching strategy), and AWS services (CodeCommit, CodeDeploy, etc.), and with cloud-based serverless or microservice-based architectures (AWS, Azure, or GCP).
