Azure Application Insights – No Client Source IP Address

Working with one of our customers this week who is implementing Azure API Management alongside their web applications. We are funnelling all the request logs into an Application Insights service to manage visibility of the end-to-end transaction data. We noticed that all the client GET requests had ‘0.0.0.0’ as the Client IP address.

Request Property      Value
Client IP address     0.0.0.0

I have since learned that Microsoft obfuscates this data as it’s ingested into Application Insights as a privacy measure. As this was a corporate application, anonymity wasn’t needed, and the development team wanted to understand whether a request was made from inside the corporate network or from an unknown internet address.

A good habit to get into is to first do a quick review of the latest API version for ‘Microsoft.Insights/components’, which does show a boolean property named DisableIpMasking.

{
  "name": "string",
  "type": "Microsoft.Insights/components",
  "apiVersion": "2020-02-02-preview",
  "location": "string",
  "tags": {},
  "kind": "string",
  "properties": {
    "Application_Type": "string",
    "Flow_Type": "Bluefield",
    "Request_Source": "rest",
    "HockeyAppId": "string",
    "SamplingPercentage": "number",
    "DisableIpMasking": "boolean",
    "ImmediatePurgeDataOn30Days": "boolean",
    "WorkspaceResourceId": "string",
    "publicNetworkAccessForIngestion": "string",
    "publicNetworkAccessForQuery": "string"
  }
}

Reviewing the property values for the ApplicationInsightsComponentProperties object, DisableIpMasking gave the following short but sweet answer.

Name               Type      Required   Value
DisableIpMasking   boolean   No         Disable IP masking.

Yeah I reckon that is worth a shot!

Update ApplicationInsightsComponentProperties value DisableIpMasking

As this value only seems to be exposed through the API, we have to either push a new incremental ARM template through the sausage maker or perform an API request directly. An API request seems like the quicker method, but doing this in a script with authentication and the correct structure takes time. I have a nice trick for updating or adding a value to an object when either of those feels like overkill.

  1. Navigate to the Azure Resource Explorer
  2. Find the Application Insights Resource Group
  3. Select Providers > Microsoft.Insights
  4. Select Components > ‘<Application Insights Name>’

You will be shown the JSON definition of your Application Insights Object. You can tell this by the line:

"type": "microsoft.insights/components"

To know you’re in the right place, look under properties: there will be many values, and we should see Application_Type, InstrumentationKey, ConnectionString and Retention, but DisableIpMasking will be missing. So it’s as simple as adding it.

  1. Up the top of the page toggle the blue switch to ‘Read/Write’ from ‘Read Only’.
  2. Select ‘Edit‘.
  3. Remember to add a ‘,’ to the previous last line (in my case “HockeyAppToken“) before adding your new property, shown below.
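The property to add is a single line (a boolean value, per the schema above):

"DisableIpMasking": true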

The final step is to use the PUT button to update the object, which in turn has authenticated you to the API using your existing login token, constructed the JSON object and sent a ‘PUT’ request to the API endpoint at ‘management.azure.com/subscriptions/<subscriptionId>/resourceGroups/<rgName>/providers/microsoft.insights/components/<resourceName>?api-version=2015-05-01‘. Much simpler than writing a PowerShell or Bash script, what a clever little tool it is.
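If you’d rather script the same call after all, a rough equivalent with the Az PowerShell module is only a few lines. A sketch only: it assumes you’ve already run Connect-AzAccount, and the subscription, resource group and resource names are the same placeholders as in the URL above.

# Read the current component definition, add the property, and PUT it back
$path = "/subscriptions/<subscriptionId>/resourceGroups/<rgName>/providers/microsoft.insights/components/<resourceName>?api-version=2015-05-01"
$component = (Invoke-AzRestMethod -Path $path -Method GET).Content | ConvertFrom-Json
$component.properties | Add-Member -NotePropertyName DisableIpMasking -NotePropertyValue $true -Force
Invoke-AzRestMethod -Path $path -Method PUT -Payload ($component | ConvertTo-Json -Depth 10)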

The result will be that new requests in Application Insights will have the source NAT IP address. Unfortunately, all previous requests will remain scrubbed as ‘0.0.0.0’.

Closing thoughts

This is a great way to tweak services while working out whether it’s the correct knob to turn in the Azure service. But while it’s quick, it isn’t documented. If you have a repository of deployment ARM templates, make sure you go back and amend the deployment JSON. The day will come when it gets re-deployed, and it won’t come out of the sausage maker the same. The finger will get pointed back at that Azure administrator who doesn’t follow good DevOps practices.
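If you do amend your template, the resource might end up looking something like this (a sketch based on the schema above; the parameter name and the Application_Type value are placeholders for your own):

{
  "name": "[parameters('appInsightsName')]",
  "type": "Microsoft.Insights/components",
  "apiVersion": "2020-02-02-preview",
  "location": "[resourceGroup().location]",
  "kind": "web",
  "properties": {
    "Application_Type": "web",
    "DisableIpMasking": true
  }
}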


Upgrading Megaport Cloud Routers

Recently I had the pleasure of upgrading a Megaport Cloud Router (MCR) from version 1 to the new version 2. Version 2 of the MCR sits on a whole new code base, so a side-by-side migration is required. In this blog I’ll show you how we went about the process; this could also be used when migrating MCRs in general, or any cloud connectivity for that matter.

The aim is to create the smallest outage possible for the customer’s on-premises connectivity to the cloud datacenter. In a fault-tolerant environment this is usually done by having multiple routes advertised to the on-premises routers via a dynamic routing protocol. In my example BGP is used throughout the environment, and standard end-to-end route propagation takes only a few minutes without intervention.

In my example I will be moving an Azure Express Route. The key to moving the Express Route is that there are a primary and a secondary BGP session (Green) for fault tolerance. I’ll move the secondary connection (2) from my active MCR to another MCRv2 in a staged approach to maintain connectivity for as long as possible. Once each Express Route peering session is connected to its own MCR, I will move the Megaport physical connection (Blue) from the MCR1 to the MCR2.

Create the new MCR

Create a new MCR in the correct datacenter location.

New MCR in NextDC M1

Add a connection to the cloud provider, using the existing service key out of the Express Route Virtual Circuit panel in your Azure subscription.

Add an Express Route with Service Key

We can see above that we have a secondary connection available. (This was completed ahead of starting this blog).

Finalise your connection, and after you select ‘order’ the designer view will deploy the Express Route connection for you.

Secondary Express Route Peering Session deployed to MCR2

Check the Connection

Give it all of an itch and a scratch and the BGP peers of Microsoft and the MCR will light up.

BGP Session status

Head over to Azure and we can check the ARP records to see the secondary peering endpoints now populating.

Here is the primary that is still online with the existing MCR that is connected to the on-premises network.

Azure Express Route ARP records – Primary

We should now have some records for the secondary connection that is between the Azure Express Route Gateway and the MCR2. Select ‘show secondary’; in the interface rows, the ‘On-Prem’ entry is the MCR Express Route peering IP.

Azure Express Route ARP records – Secondary

All is looking good from a layer 2 (ARP) and a layer 3 (BGP – below) perspective. From this point, if we go and look at the route tables, we would see that the primary BGP peering session has all the on-premises routes and Azure VNET routes. The secondary route table will only have the Azure routes and the peering /30 routes.

Azure Express Route route table – Secondary

If we go and check our Express Route Virtual Circuits, we can validate that the peering IPs used in each session match.

Azure Express Route – Peering Overview

Delete the connection between the MCR and the on-premises router

Now we want to swing our on-premises router connectivity away from the cross connect between the MCR1 (1) and the physical port (2). Back in the designer view we have all of the required routers and connection objects in view. I’ve also underlined the button to ‘delete’ the virtual cross connect (VXC) between the on-premises router and my MCR1. Note – in our deployment this is where the outage will start; we will lose connectivity between Azure and the on-premises router, as I’ve not used VLAN tagging on the physical port in this example.

Delete the Virtual Cross Connect

Add a connection between the MCR2 and the on-premises router

Quickly, go and create a connection between the MCR2 and your Megaport “Port” (a.k.a. the Physical Port).

Attach MCR to Virtual Cross Connect

Make sure it’s your physical port, not your other MCR 🙂

Select the Physical Port attachment

I’m using the exact same peering subnet for my new MCR2, so as long as I include my correct /30 subnet, the peering relationship with the on-premises router will come back willingly in a matter of seconds.

MCR to Physical Port details

Review what you have done in the designer view. You won’t have set anything into motion until you click ‘order’. So do it! From the below view you can see the following summary:

  1. Old MCR with a single Express Route Connection
  2. New MCR with single Express Route Connection
  3. Physical Port with the new connection to the MCR2

Switch the physical port from old to new MCR

Once you click order, you barely have time to scratch yourself again before the status moves from deploying (little red Megaport rocket icons) to deployed. Hurray!

Physical port to MCR connection – status deploying

Once that has all come up green, the rest of the work is done in your edge routers – the edge routers being your on-premises physical edge router connected to Megaport, and your Express Route Virtual Circuits/Gateway. Do a few checks to make sure you have established end-to-end connectivity. Here are some ideas:

  • Edge Router – Review the BGP Status of the MCR.
    • show ip bgp neighbors
    • Check the MCR neighbor exists and the BGP session is established.
    • Remote AS of the MCR by default is 133937.
    • Remember the IP address of the neighbor
  • Edge Router – Review received routes from MCR.
    • show ip bgp neighbors x.x.x.x received-routes
    • You’ll no doubt see routes from your VNET with an AS path of your MCR + Microsoft, e.g. 133937 12076. (Microsoft uses AS 12076 for Azure public, Azure private and Microsoft peering.)
  • Azure Portal – Review the ARP records and route tables.
    • The secondary connection should show all your received routes to the Express Route Gateway from the on-premises router.
  • Branch Site Router – Go check what has been advertised down to your client sites. A good old traceroute will show the IP addresses of the MCR in the hops.

Express Route secondary connection with on-premises routes received for MCR peer IP

Finishing Up

You’re pretty much done at this stage. The software-defined network engine has done its job; you are now in control of your own destiny with on-premises route propagation.

How good is Megaport! We love networking! Especially when it’s fast, scalable and consistent.


We love Megaport

Let us take the stress out of public cloud connectivity. Get in touch with us to understand the benefits of using a service like Megaport Cloud Router.


ARM Template Role Assignment Learnings

ARM templates are one of those things where the learning curve can be considered steep, but once you get there they make your life so much easier that you’re glad you did it. If you’re like me, Google is your friend, and whenever you hit an issue with your latest template you resort to searching error messages and hope someone else has not only found the solution but also been kind enough to write it up. That latter point is where this post comes in: when doing ARM template role assignments there are a couple of gotchas that I often forget, and when Google doesn’t have any “I’m Feeling Lucky” results, it’s time I try to be a nice person!

Let’s set the scene: doing permissions in the template rather than after the deployment allows you to use incremental updates, and stops people doing “clicky clicky” changes in the environment. You know those people: “I’ll just do it quickly…”, “I’ll fix it later, honest”. And hell, we’ve all done it, but let’s try to be better than that.

So permissions are all about scope: you can assign at the Resource Group level or on the resource itself. The approach for each is actually different, and this is explained perfectly here. There is every chance you’ve stumbled across this post because of this error:

"error": {
"code": "InvalidCreateRoleAssignmentRequest",
"message": "The request to create role assignment '{guid}' is not valid. Role assignment scope '/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/Microsoft.Insights/components/{resourceGroupName}' must match the scope specified on the URI  '/subscriptions/{resourceGroupName}/resourcegroups/{resourceGroupName}'."
  }

After reading the Microsoft documentation you’d think the scope property would help you, but instead I found it was easier to follow the approach the kind people at Stack Overflow explained. So let’s look at some examples:

Parameters & Variables:

    "parameters": {        
"runbookAutomationOperators": {
            "value": [
                "a195af43-xxxx-49fd-xxxx-c1e0de11b118",
                "5deb670c-xxxx-4642-xxxx-de5290266bad",
                "2541d966-xxxx-4d1d-xxxx-8ecc6c2e8a39"
            ]
        }
}
"variables": {
        "automationOperatorId": "d3881f73-407a-4167-8283-e981cbba0404",
        "readerId": "acdd72a7-3385-48ef-bd42-f606fba81ae7"
    },

Notes:

  • I’ve put the accounts in an array so I can loop through them and assign each one
  • I’ve put the built-in Azure roles as variables as they are referenced more than once (you can find the IDs by running Get-AzRoleDefinition, as shown below)
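For example, looking up the two role definition IDs used above (assuming you’re connected with the Az module):

Get-AzRoleDefinition | Where-Object { $_.Name -in 'Reader', 'Automation Operator' } | Select-Object Name, Id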

Resource Group:

        {
            "type": "Microsoft.Authorization/roleAssignments",
            "apiVersion": "2020-04-01-preview",
            "name": "[guid(resourceGroup().id, resourceGroup().name, variables('readerId'), parameters('runbookAutomationOperators')[copyIndex()])]",
            "copy": {
                "name": "resourceGroupReader",
                "count": "[length(parameters('runbookAutomationOperators'))]"
            },
            "dependsOn": [],
            "properties": {
                "roleDefinitionId": "[concat('/subscriptions/', subscription().subscriptionId, '/providers/Microsoft.Authorization/roleDefinitions/', variables('readerId'))]",
                "principalId": "[parameters('runbookAutomationOperators')[copyIndex()]]"
            }
        }

Notes:

  • The role assignment is at the top level; as we’re doing a resource group deployment, it’s the RG where the rights will be written
  • The name has to be a unique guid, so I’ve included the actual account ID in the string used to build the guid. Doing so ensures that if you have multiple assignments (which you will) they are unique, as it’s combining the Resource Group and the account being assigned to the RG
  • I’ve done a copy because I want to run it for the length of the array, in this case 3

Resource:

        {
            "type": "Microsoft.Automation/automationAccounts/providers/roleAssignments",
            "apiVersion": "2020-04-01-preview",
            "name": "[concat(parameters('automationAccountName'), '/Microsoft.Authorization/', guid(resourceGroup().id, resourceId('Microsoft.Automation/automationAccounts', parameters('automationAccountName')), variables('automationOperatorId'), parameters('runbookAutomationOperators')[copyIndex()]))]",
            "copy": {
                "name": "runbookAutomationOperators",
                "count": "[length(parameters('runbookAutomationOperators'))]"
            },
            "dependsOn": [
                "[parameters('automationAccountName')]"
            ],
            "properties": {
                "roleDefinitionId": "[concat('/subscriptions/', subscription().subscriptionId, '/providers/Microsoft.Authorization/roleDefinitions/', variables('automationOperatorId'))]",
                "principalId": "[parameters('runbookAutomationOperators')[copyIndex()]]"
            }
        },

Notes:

  • This example is for an Automation account, and that is key, because you actually specify the resource type in the type (something you can easily change for other resources, e.g. Microsoft.Storage/storageAccounts)
  • Similar to the RG, you need a unique guid, so I’ve included the resource itself as well as the account we are allocating, so it once again is unique
  • Again we’ve done a copy so it’s run three times, allocating the users to this Operator Role

So I talked about the name having to be a guid in both examples, but it’s a point I’d like to talk about some more. Firstly, if it’s not, you’ll get this error:

The role assignment ID must be a GUID

I mean, you can’t blame MS for this error, it is pretty clear. But how do you make a guid? Well, I thought, ARM templates have guid functions, so I jumped over to the MS doco. But on reading this I definitely overthought these two lines of text:

  • guid() – The returned value isn’t a random string, but rather the result of a hash function on the parameters. The returned value is 36 characters long. It isn’t globally unique. To create a new GUID that isn’t based on that hash value of the parameters, use the newGuid function.
  • newGuid() – Returns a value in the format of a globally unique identifier. This function can only be used in the default value for a parameter.

And if you’ve not had enough coffee you might think: I just want a bloody unique guid, and I don’t want to mess around with default values, especially as we’re doing a copy loop, so we need some salt to make it different. But, just to state the obvious, this behaviour actually makes sense. You want to create a string that will be the same every single time, because you want to be able to run this incrementally. If it was unique, your IAM role assignment page would be a shambles, as every time you ran your pipeline it would come up with a lovely new guid! So the guid function is your friend; just ensure you include all the attributes needed to make the base string unique. E.g. if you’re doing an Automation account, you want both that resource and the account you are assigning in the base string. If you don’t include the resource, then it could clash with another role assignment of that user in the RG, and if you don’t include the user, well, then it’s going to be the same for all users being granted access to that account.

While I appreciate this may be obvious, I do hope it is useful, as I really didn’t think the vendor doco was particularly clear. If nothing else it’ll save Future Dave from working this out yet again, because if he had a memory he’d be dangerous…


Assign Teams phone numbers using Microsoft Forms, Logic Apps and Azure Automation

Sometimes provisioning users into Office 365 services requires custom settings to be applied with PowerShell. This can present a problem when the teams responsible for managing the ongoing process have varying levels of understanding. How do you provide a front-end user interface for your custom code without the operators needing to know PowerShell?

This is the case for Microsoft Teams. The Microsoft Phone System ‘Direct Routing’ feature lets you connect your telephony gateway (SBC) to Microsoft Phone System. With this capability you can configure on-premises telephone numbers with the Microsoft Teams client. A subtle difference when using Direct Routing for your PSTN connectivity over Microsoft Calling (Telstra Calling in AU) is the inability to assign phone numbers to users via the Teams Admin Portal. The only way to assign the phone number is through a PowerShell cmdlet with the ‘OnPremLineURI‘ parameter:

Set-CsUser -Identity $UPN -EnterpriseVoiceEnabled $true -HostedVoiceMail $true -OnPremLineURI $lineURI

So herein lies my problem. Let’s fix it.

Components

  • Microsoft Forms – The front-end UI with the required input fields.
  • Logic App – The glue that manages the process.
  • Azure Runbook – Where my code lives to perform the steps against Office 365 APIs.

Microsoft Forms

This is a pretty basic form. I just need enough information as inputs to execute my PowerShell. The great thing about Microsoft Forms is that access to it is authenticated, and because it’s built into Office 365, that’s all handled by Azure Active Directory.

Mobile Preview of the Form

Note: Unfortunately the simplicity of this form is also its shortcoming. I would love to be able to do some validation on the input strings before the form is submitted, especially on the phone number format and length.

Create the Logic App

Open a new Blank Template in the Logic App Designer, search for Microsoft Forms, and use the option ‘When a new response is submitted‘.

Start by getting the form data into the Logic App.

Assign all of the form inputs as variables in your Logic App to then be passed to our Runbook.

Azure Runbook

Create a Runbook and make sure you have defined the parameters (highlighted in lines 1-5). The Logic App will reference these automatically for you when working in the designer.

Note: All the settings we need are part of the Skype for Business PowerShell module, which isn’t available in the Azure Automation Gallery. If you install the Microsoft Teams module version 1.1.6 you will have the ability to execute New-CsOnlineSession and pull down all the cmdlets into the PS session. At the time of writing I don’t know of a way of using a managed identity or client secret for New-CsOnlineSession, so it’s just a standard user account with bypass MFA (yuck).

Param (
[Parameter (Mandatory = $true)][string]$upn,
[Parameter (Mandatory = $true)][string]$lineURI,
[Parameter (Mandatory = $true)][string]$dialPlan
)

$debug = $false

import-module MicrosoftTeams


if($debug -like $true){
    Write-Output "Connecting to Skype Online..."
}
$creds = Get-AutomationPSCredential -Name "SkypeCreds"
try{
    $sfboSession = New-CsOnlineSession -Credential $creds -OverrideAdminDomain "domain.onmicrosoft.com"
}
Catch{
    $errOutput = [PSCustomObject]@{
        status = "failed" 
        error = $_.Exception.Message
        step = "Connecting to Skype Online"
        cmdlet = "New-CsOnlineSession"
    }
    Write-Output ( $errOutput | ConvertTo-Json)
    exit
}
if($debug -like $true){
    Write-Output "Importing PS Session..."
}
try{
    Import-PSSession $sfboSession -AllowClobber
}
Catch{
    $errOutput = [PSCustomObject]@{
        status = "failed" 
        error = $_.Exception.Message
        step = "Importing PS Session"
        cmdlet = "Import-PSSession"
    }
    Write-Output ( $errOutput | ConvertTo-Json)
    exit
}
if($debug -like $true){
    Write-Output "Processing line: $($upn) "
}
    # Correct user: resolve the supplied identity to the user's SIP-derived UPN
    # (handles a display name being passed instead of a UPN)
    if ($upn -notlike "*@*") {
        $sip = (Get-CsOnlineUser -Identity $upn).SipAddress
        $upn = $sip -replace '^sip:', ''
    }
    # Correct number: normalise to an E.164 lineURI with a tel: prefix
    if ($lineURI -notlike "tel:*") {
        if ($lineURI.Length -eq 12) {
            # e.g. +61255501234 - already has the leading '+'
            $lineURI = "tel:" + $lineURI
        }
        elseif ($lineURI.Length -eq 11) {
            # e.g. 61255501234 - add the leading '+'
            $lineURI = "tel:+" + $lineURI
        }
    }
if($debug -like $true){
    Write-Output "  INFO: Using values - $($upn) with $($lineURI)" 
    Write-Output "  INFO: Attempting to remove Skype for Business Online settings: VoiceRoutingPolicy" 
}    
    try{
        Grant-CsVoiceRoutingPolicy -PolicyName $NULL -Identity $upn
    }
    Catch{
        $errOutput = [PSCustomObject]@{
            status = "failed" 
            error = $_.Exception.Message
            step = "VoiceRoutingPolicy"
            cmdlet = "Grant-CsVoiceRoutingPolicy"
        }
        Write-Output ( $errOutput | ConvertTo-Json)
        exit
    }
if($debug -like $true){
    Write-Output "  INFO: Attempting to remove Skype for Business Online settings: UserPstnSettings" 
}    
    try{
        Set-CsUserPstnSettings -Identity $upn -AllowInternationalCalls $false -HybridPSTNSite $null | out-null
    }
    Catch{
        $errOutput = [PSCustomObject]@{
            status = "failed" 
            error = $_.Exception.Message
            step = "UserPstnSettings"
            cmdlet = "Set-CsUserPstnSettings"
        }
        Write-Output ( $errOutput | ConvertTo-Json)
        exit
    }
    # https://docs.microsoft.com/en-us/powershell/module/skype/grant-csteamsupgradepolicy?view=skype-ps
if($debug -like $true){    
    Write-Output "  INFO: Attempting to grant Teams settings: user to UpgradeToTeams (TeamsOnly)." #Upgrades the user to Teams and prevents chat, calling, and meeting scheduling in Skype for Business
}    
    try{
        Grant-CsTeamsUpgradePolicy -PolicyName UpgradeToTeams -Identity $upn
    }
    Catch{
        $errOutput = [PSCustomObject]@{
            status = "failed" 
            error = $_.Exception.Message
            step = "UpgradeToTeams"
            cmdlet = "Grant-CsTeamsUpgradePolicy"
        }
        Write-Output ( $errOutput | ConvertTo-Json)
        exit
    }
if($debug -like $true){
    Write-Output "  INFO: Attempting to set Teams settings: Enabling Telephony Features &amp; Configure Phone Number"
}
    try{
        Set-CsUser -Identity $UPN -EnterpriseVoiceEnabled $true -HostedVoiceMail $true -OnPremLineURI $lineURI
    }
    Catch{
        $errOutput = [PSCustomObject]@{
            status = "failed" 
            error = $_.Exception.Message
            step = "SetUser"
            cmdlet = "Set-CsUser"
        }
        Write-Output ( $errOutput | ConvertTo-Json)
        exit
    }
if($debug -like $true){
    Write-Output "  INFO: Attempting to grant Teams settings: TeamsCallingPolicy" #Policies designate which users are able to use calling functionality within teams and determine the interoperability state with Skype for Business
}
    try{
        Grant-CsTeamsCallingPolicy -PolicyName Tag:AllowCalling -Identity $upn
    }
    Catch{
        $errOutput = [PSCustomObject]@{
            status = "failed" 
            error = $_.Exception.Message
            step = "TeamsCallingPolicy"
            cmdlet = "Grant-CsTeamsCallingPolicy"
        }
        Write-Output ( $errOutput | ConvertTo-Json)
        exit
    }
if($debug -like $true){
    Write-Output "  INFO: Attempting to grant Teams settings: Assign the Online Voice Routing Policy"
}
    try{
        Grant-CsOnlineVoiceRoutingPolicy -Identity $upn -PolicyName Australia
    }
    Catch{
        $errOutput = [PSCustomObject]@{
            status = "failed" 
            error = $_.Exception.Message
            step = "VoiceRoutingPolicy"
            cmdlet = "Grant-CsOnlineVoiceRoutingPolicy"
        }
        Write-Output ( $errOutput | ConvertTo-Json)
        exit
    }
if($debug -like $true){
    Write-Output "  INFO: Set Dial"
}
    try{
        
        if($dialPlan -eq "National"){
            Grant-CsTenantDialPlan -PolicyName $null -Identity $upn
        }else{
            Grant-CsTenantDialPlan -PolicyName $dialPlan -Identity $upn
        }
        
    }
    Catch{
        $errOutput = [PSCustomObject]@{
            status = "failed" 
            error = $_.Exception.Message
            step = "DialPlan"
            cmdlet = "Get-CsEffectiveTenantDialPlan"
        }
        Write-Output ( $errOutput | ConvertTo-Json)
        exit
    }

    #Completion Output
    $errOutput = [PSCustomObject]@{
        status = "Completed" 
        error = "None"
        step = "endOfJob"
        cmdlet = "None"
    }
    Write-Output ( $errOutput | ConvertTo-Json)
 

Link the Runbook to your Logic App

Now we can update the Logic App with our Runbook information.

Output the details via Email

I found the best way to get consistent, structured results is to have error handling in your Runbook and pass the result back to the Logic App as output JSON with a known schema/structure. A sample output of the JSON can be used to generate a schema, like the example below.

{
    "status":  "failed",
    "error":  "One or more errors occurred.: Unable to find an entry point named \u0027GetPerAdapterInfo\u0027 in DLL \u0027iphlpapi.dll\u0027.",
    "step":  "Connecting to Skype Online",
    "cmdlet":  "New-CsOnlineSession"
}

This enables you to have a sufficient level of diagnostic logging as part of the output. In this case I’m using an email.

The example workflow is below.

Additions

Additional functionality you could include might be:

  • Check for licenses
    • AAD Module in PowerShell, or
    • AAD Group Membership in Logic App
  • License the user via PowerShell or Graph
  • Send the response as a Teams notification or channel message, rather than an email.
  • Email the user on successful completion detailing they have a new phone number.
  • More error handling
  • Smaller, more specific Runbooks that are executed rather than one large script block, allowing for more conditions to be considered per step.

Let’s Talk Teams!

We have years of experience deploying unified communications in the Microsoft stack. Reach out – we have a rapid deployment solution for Teams Direct Routing leveraging the public cloud, and we have tried and tested a number of flavours of SIP providers. Trial or PoC a voice solution with minimal effort leveraging public cloud deployments.

Learn More


Automating Azure Site Recovery with PowerShell

In a recent consulting engagement, I needed to perform a large-scale migration of a company’s virtual machine (VM) fleet from an on-premises datacenter to Microsoft Azure. Thinking about what that actually means: we’re picking up many compute workloads that are (in most cases) essential for day-to-day business operation and re-homing them to a new slice of a Microsoft-managed datacenter. After coming out the other end and completing the project, I thought I would shed some light on the tools that I used and developed to make the vision a reality.

In this particular engagement the customer is a large enterprise with a VMware environment servicing 300+ VMs. When we consider the business value behind each of these compute workloads, it quickly becomes apparent that selecting the right tooling and approach is vital to delivering a successful outcome whilst causing as little disruption to the business as possible.

Enter Azure Site Recovery

Azure Site Recovery (ASR) is Microsoft’s Disaster Recovery as a Service solution which can replicate workloads running on physical and virtual machines from one location to a secondary location. As a disaster recovery platform, it’s possible for workloads to failover and successfully failback in a disaster scenario. ASR can also be used to migrate workloads to Azure by completing the failover component without failing back.

Why Should We Automate Azure Site Recovery?

I like to automate things like this because a computer following a process that someone writes will always perform it the same way; we can predict what the output will look like. In this case that means a like-for-like VM that looks and feels like it did in its previous life, before being migrated. When we introduce an operator into the mix, we also introduce the human element. Things like resource names and groups, VM specs, disk settings, network location and IP addresses all need to be configured for each VM migration.

To have success running migrations at scale, it is important to use known, well-tested, repeatable processes. For me, that means figuring out the best way to use a tool, then automating it so that you (or anyone else) can use it the right way, every time, easily.

How Can We Automate Azure Site Recovery?

I use PowerShell as an automation tool on top of ASR for a couple of reasons. The main reason being that Microsoft provide and maintain a set of PowerShell modules for interacting with Azure resources, including ASR. This is known as the Az module – See our previous post on the Azure PowerShell Az module for a deeper explanation.  PowerShell can also run almost anywhere thanks to PowerShell Core, a cross-platform edition of PowerShell that runs on Windows, macOS and Linux.
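If you’re starting from scratch, getting set up is only a couple of lines (assuming PowerShellGet is available):

# Install the Az module for the current user and sign in to Azure
Install-Module -Name Az -Scope CurrentUser
Connect-AzAccount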

Armed with PowerShell and the Az module, we can get cracking on with the fun stuff – bashing out some lines of code. My approach and methodology here usually involve a fair bit of back and forth, playing with the commands that are available to me and learning the best ways to drive them. Importantly, you don’t want to do this with live data; setting up an isolated sandpit with dummy data will go a long way in allowing you to upskill on the tools while making sure your production systems remain untouched.

Once I’ve got a handle on the commands that are needed and how they fit together, I make an MVP (minimum viable product) script. The idea here is to demonstrate that it’s possible for the tooling to work (it’s not pretty, but it works). To paint a picture, one of my MVP scripts will usually have a bunch of variables at the start declaring all the info that is required: things like VM name, source location, target location etc. From here, I usually design the script to be run line by line; this is mostly for simplicity’s sake, complexity can come later, right now it just needs to be as simple as possible. At this stage, we can demonstrate our capability to perform a migration with PowerShell. A quick example of this is setting up a replication job; preceding this line, I do a series of get statements to build up all the variables seen in the command below.

$replicationJob = New-AzRecoveryServicesAsrReplicationProtectedItem -VMwareToAzure -ProtectableItem $vm -Name (New-Guid).Guid -ProtectionContainerMapping $replicationPolicy -ProcessServer $ProcessServer -Account $Account -RecoveryResourceGroupId $ResourceGroup.ResourceId -logStorageAccountId $LogStorageAccount.Id -RecoveryAzureNetworkId $vnet.Id -RecoveryAzureSubnetName $failoverSubnetName
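For context, those preceding get statements look something like this. A sketch only: the vault, fabric, resource and network names are placeholders you would swap for your own, and it assumes a VMware-to-Azure scenario.

# Set the vault context so the ASR cmdlets know which Recovery Services vault to use
$vault = Get-AzRecoveryServicesVault -ResourceGroupName "MyVaultRG" -Name "MyVault"
Set-AzRecoveryServicesAsrVaultContext -Vault $vault

# Source-side objects: fabric, protection container and the VM to protect
$fabric    = Get-AzRecoveryServicesAsrFabric -FriendlyName "MyConfigServer"
$container = Get-AzRecoveryServicesAsrProtectionContainer -Fabric $fabric
$vm        = Get-AzRecoveryServicesAsrProtectableItem -ProtectionContainer $container -FriendlyName "MyVM"

# Replication policy mapping, plus the process server and run-as account from the fabric
$replicationPolicy = Get-AzRecoveryServicesAsrProtectionContainerMapping -ProtectionContainer $container -Name "MyPolicyMapping"
$ProcessServer     = $fabric.FabricSpecificDetails.ProcessServers[0]
$Account           = $fabric.FabricSpecificDetails.RunAsAccounts[0]

# Target-side objects: resource group, cache storage account, network and subnet
$ResourceGroup      = Get-AzResourceGroup -Name "MyTargetRG"
$LogStorageAccount  = Get-AzStorageAccount -ResourceGroupName "MyTargetRG" -Name "mycachestorage"
$vnet               = Get-AzVirtualNetwork -ResourceGroupName "MyTargetRG" -Name "MyVnet"
$failoverSubnetName = "MySubnet"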

From here, I like to put some lipstick on it and make it feel like a more polished product. Personally, I like to use a series of questions and prompts to generate the variables I described in the last paragraph. I also add status checks and operator prompts to continue. An example of this could be when performing a failover: once the operator confirms they are ready to begin, the command executes the failover, then continuously checks the failover job status until it has completed; once completed, it tells the user running the script that it’s complete. Here is an example of a status check that I wrote for checking the progress of a failover job.

do {
    Clear-Host
    Write-Host "======== Monitoring Failover ========"
    Write-Host "This will refresh every 5 seconds."
    try {
        $failoverJob = Get-ASRJob -Name $failoverJob.Name
    }
    catch {
        Write-Host -ForegroundColor Red "ERROR - Unable to get status of Failover job"
        Write-Host -ForegroundColor Red "ERROR - $_"
        # 'log' is a logging helper function defined elsewhere in the script
        log "ERROR" "Unable to get status of Failover job"
        log "ERROR" $_
        exit
    }
    Write-Host "Failover status for $($VMName.FriendlyName) is $($failoverJob.State)"
    Start-Sleep -Seconds 5
} while (($failoverJob.State -eq "InProgress") -or ($failoverJob.State -eq "NotStarted"))

Once you get this far, the sky is the limit. Like most things, it can evolve over time. I like to add error handling and logging so we can elegantly handle a failure and have an audit trail of operations. I take this approach with most of the processes I automate; I think it’s important to start small and work up from there.