Automating Azure Site Recovery with PowerShell

In a recent consulting engagement, I’ve needed to perform a large-scale migration of a company’s virtual machine (VM) fleet from an On-premise datacenter to Microsoft Azure. Thinking about what that actually means – We’re picking up many compute workloads that are (in most cases) essential for day to day business operation and re-homing them to a new slice of a Microsoft-managed datacenter. After coming out the other end and completing the project, I thought I would shed some light on the tools that I used and developed to make the vision a reality.

In this particular engagement the customer is a large enterprise with a VMware environment servicing 300+ VM’s. When we consider the business value behind each of these compute workloads, it quickly becomes apparent that selecting the right tooling and approach is vital to deliver a successful outcome whilst causing as little disruption to the business as possible.

Enter Azure Site Recovery

Azure Site Recovery (ASR) is Microsoft’s Disaster Recovery as a Service solution which can replicate workloads running on physical and virtual machines from one location to a secondary location. As a disaster recovery platform, it’s possible for workloads to failover and successfully failback in a disaster scenario. ASR can also be used to migrate workloads to Azure by completing the failover component without failing back.

Why Should We Automate Azure Site Recovery?

I like to automate things like this because a computer following a process that someone writes will always perform it the same way. We can expect what the output will look like. In this case that means a like for like VM that looks and feels like it did in its previous life, before being migrated. When we introduce an operator into the mix we also introduce the human element. Things like resource names and groups, VM specs, disk settings, network location and ip addresses all need to be configured for each VM migration.

To have success running migrations at scale, it is important to use known, well-tested, repeatable processes. For me, that means figuring out the best way to use a tool then automating it so that you (or anyone else) can use it the right way, everytime, easily. 

How Can We Automate Azure Site Recovery?

I use PowerShell as an automation tool on top of ASR for a couple of reasons. The main reason being that Microsoft provide and maintain a set of PowerShell modules for interacting with Azure resources, including ASR. This is known as the Az module – See our previous post on the Azure PowerShell Az module for a deeper explanation.  PowerShell can also run almost anywhere thanks to PowerShell Core, a cross-platform edition of PowerShell that runs on Windows, macOS and Linux.

Armed with PowerShell and the Az module, we can get cracking on with the fun stuff – bashing out some lines of code. My approach and methodology here usually involve a fair bit of back and forth, playing with the commands that are available to me and learning the best ways to drive them. Importantly, you don’t want to do this with live data, setting up an isolated sandpit with dummy data will go a long way in allowing you to upskill knowledge around the tools while making sure your production systems remain untouched.

Once I’ve got a handle on the commands that are needed and how they fit together, I make a MVP (minimum viable product) script. The idea here is to demonstrate that its possible for the tooling to work (it’s not pretty but it works). To paint a picture, one of my MVP scripts will usually have a bunch of variables at the start declaring all the info that is required, things like VM name, source location, target location etc. From here, I usually design the script to be ran line by line, this is mostly for simplicity sake, complexity can come later, right now it just needs to be as simple as possible. At this stage, we can demonstrate our capability to perform a migration with PowerShell. A quick example of this is setting up a replication job, preceding this line, I do a series of get statements to build up all the variables seen in the command bellow.

$replicationJob = New-AzRecoveryServicesAsrReplicationProtectedItem -VMwareToAzure -ProtectableItem $vm -Name (New-Guid).Guid -ProtectionContainerMapping $replicationPolicy -ProcessServer $ProcessServer -Account $Account -RecoveryResourceGroupId $ResourceGroup.ResourceId -logStorageAccountId $LogStorageAccount.Id -RecoveryAzureNetworkId $vnet.Id -RecoveryAzureSubnetName $failoverSubnetName

From here, I like to put some lipstick on it and make it feel like a more polished product. Personally, I like to use a series of questions and prompts to generate the variables I described in the last paragraph. I also add status checks and operator prompts to continue. An example of this could be when performing a failover, once the operator confirms he is ready to begin, the command executes the failover, then continuously checks the failover job status until it has completed, once completed, tell the user running the script that its complete. Here is an example of a status check that I wrote for checking the progress of a failover job.

do {
    Clear-Host
    Write-Host "======== Monitoring Failover ========"
    Write-Host "This will refresh every 5 seconds."
    try {
        $failoverJob = Get-ASRJob -Name $failoverJob.Name
    }
    catch {
        Write-Host -ForegroundColor Red "ERROR - Unable to get status of Failover job"
        Write-Host -ForegroundColor Red "ERROR - " + $_
        log "ERROR" "Unable to get status of Failover job"
        log "ERROR" $_
        exit 
    }
    Write-Host "Failover status for $($VMName.FriendlyName) is $($failoverJob.State)"
    sleep 5;
} while (($failoverJob.State -eq "InProgress") -or ($failoverJob.State -eq "NotStarted"))

Once you get this far, the sky is the limit. Like most things, it can evolve over time. I like to add error handling and logging so we can elegantly handle a failure and have an audit trail of operations. I take this approach with most of the processes I automate, I think it’s important to start small and work up from there.


The OVF package is invalid and cannot be deployed – In the trenches with the AWS Discovery Connector

I was working with a customer recently who had trouble deploying the AWS Discovery Connector to their VMware environment. AWS offer this appliance as an OVA file. For those who aren’t aware, OVA (Open Virtualisation Archive) is an open standard used to describe virtual infrastructure to be deployed on a hypervisor of your choice. Typically speaking, these files are hashed with an algorithm to ensure that the contents of the files are not changed or modified in transit (prior to being deployed within your own environment.)

At the time of writing, AWS currently offer this appliance hashed in two flavours… MD5 or SHA256. All sounds quite reasonable right?

  • Download the OVA with a hash of your choice
  • Deploy to VMware.
  • Profit???

Wrong! I was surprised to receive an email from my customer stating that their deployment had failed (see below.)

There’s a small clue here…

The Solution

My immediate response was to fire up google and do some reading. Surely someone had blogged about this before? After all…. I am no VMware expert. I finally arrived at the VMware knowledge base, where I began sifting through supported ciphers for ESX/ESXi and vCenter. The findings were quite interesting, you can find them summarised below:

  • If your VMware cluster consists of hosts which run ESX/ESXi 4.1 or less (hopefully no one) – MD5 is supported
  • If your VMware cluster consists of hosts which run ESX/ESXi 5.x or 6.0 – SHA1 is supported
  • If your VMware cluster consists of hosts which run ESX/ESXi 6.5 or greater – SHA256 is supported

In the particular environment I was working in, the customer had multiple environments with a mix of 5.5 and 6.0 physical hosts. As I was short on time, I had no real way of telling if the MD5 hashed image would deploy on a newer environment. I also don’t have a VMware development environment to test this approach on (by design.)

After a few more minutes of googling, I was rewarded with another VMware knowledge base article. VMware provide a small utility called “OVFTool.” This applications sole purpose in life is to convert OVA files (you guessed it) ensuring that they are hashed with supported cipher of your choice. In my scenario, the file was re-written using the supported SHA1 cipher. All of this was triggered from a windows command line by executing:

ovftool.exe –shaAlgorithm=SHA1 <source image.ova> <destination image.ova>

After this I was able to successfully deploy the AWS Discovery Connector OVA as expected using my freshly minted image.

You can grab a copy of the tool – here

You can read more about VMware supported ciphers – here

Finally, I should call out that this solution is not specific to deploying the AWS Discovery Connector. Consider this approach if you are experiencing similar symptoms deploying another OVA based appliance in your VMware environment.