Stuart Moore

Musings of a Data professional

Author: Stuart Moore

Announcing Data Platform Discovery Day

With a number of the large in-person Data Platform conferences and meetups having to cancel or take a hiatus due to the current Covid-19 outbreak, Matt Gordon (b | t) and I decided to do something about it.

So we came up with Data Platform Discovery Day: a 2 day online event aimed at people who are looking for a start in the Data Platform world, or who want to learn some fundamentals about an aspect of it.

Each day will run alongside business hours on a different continent, and will feature ten 50 minute sessions. Each day will have different speakers, and there’s nothing to stop you attending both if you fancy an early morning or a late night.

Day 1 is running during US daytime on 29th April. The first session will begin at 9am Eastern Time, and then a new session will begin at the top of each hour.

Day 2 is running during European daytime on 30th April. The first session will begin at 9am BST (UTC+1), again with a new session starting at the top of each hour.

Registration for the events will start once we’ve finalised the speakers.

And talking of speakers, the Calls for Papers for both events are open. You can submit to both events if you want. Please remember that sessions are 50 minutes, and content should ideally be Level 100 material.

US Data Platform Discovery Day call for papers

European Data Platform Discovery Day call for papers

We’d love to encourage first time speakers as well, so if you’d like to ask any questions then please get in touch with me (twitter) or Matt (twitter).

dbaSecurityScan – A new PS module for SQL Server security

While doing some work on the materials for my SQLBits training day I started thinking about a few of the problems with managing SQL Server permissions.

How easy is it to audit them? If someone asks you, the DBA, exactly who has access to object A, can you tell them? How do people get access to that object: is it via a role, a schema or an explicit permission?

Is that information in a format that’s easy to read or manipulate?

How do you ensure that permissions persist between upgrades? I’ve certainly seen 3rd party upgrades that have reset database level permissions. Do you have a mechanism to check every permission and put them back as they were?

We’re all doing the devops these days. Our database schema is source controlled, and we’re deploying it incrementally in pipelines and testing it. But are we doing that with our database security?

So in the classic open source way, I decided to scratch my own itch by writing something. That something is dbaSecurityScan, a PowerShell module that aims to offer a solution for all of the above.

The core functionality at the moment allows you to export the current security state of your database, based on these 4 models:

  • Role based
  • User based
  • Schema based
  • Object based

You can also return all of them, or a combination of whatever you need.

At the time of writing, getting the security information and testing it is implemented, and you can try it out like this:

If you’ve got any suggestions for features, or want to lend a hand then please head over to dbaSecurityScan and raise an issue or pull request respectively 🙂

Working around sqlcmd on Mac OS issues

So I’m busy working on my new dbaSecurityScan module, busily trying to write all the Pester tests for the AppVeyor pipeline.

Part of the testing for the module involves building a number of test scenarios, each of which needs a database spinning up for it.

On the Windows builds, that’s nice and easy as I just use this in the test script to loop through all the scenarios and run the .sql files:

However, I’m developing on my MacBook Pro as I want this module to be nicely platform agnostic. SQL Server is running nicely in Docker and dbatools can connect to it happily. But I just can’t get sqlcmd to work: lots of TCP Provider: Error code 0x102 and TCP Provider: Error code 0x2AF9 messages no matter what I try. So, being short on time, I thought I’d try something else, and hopefully this will work when I get the Linux build running as well.

The scripts I want to run create databases, users, schemas and a whole lot more, so there’s lots of batch separators (; and GO) in them. Invoke-DbaQuery is a great function, but it doesn’t like working with batch separators (which is not through want of trying, it’s just really tricky). So time to drop back to some raw SMO and run the scripts with ExecuteNonQuery().

I’ve added a variable to my builds that lets me pick whether I want to run via SQLCMD or via SMO. At the minute I’m just using it on my machine, but if I run into problems later, it’s nice to know I can just toggle by looking at which platform it’s running on.
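As a rough sketch of that toggle (not the module’s actual build script), the choice of runner can be kept in one small function. The names Select-ScriptRunner and Invoke-ScenarioScript and their parameters are made up for illustration; Connect-DbaInstance is the dbatools command that returns an SMO Server object, and the ExecuteNonQuery() call on its ConnectionContext is the SMO method mentioned above, which copes with GO separators.

```powershell
# Hypothetical helper: decides how the scenario .sql files get run.
# An explicit override wins, otherwise Windows uses sqlcmd and everything else uses SMO.
function Select-ScriptRunner {
    param(
        [bool]$OnWindows,
        [string]$Override = ''
    )
    if ($Override) { return $Override }
    if ($OnWindows) { return 'SQLCMD' }
    return 'SMO'
}

# Hypothetical wrapper around the two ways of running a script file
function Invoke-ScenarioScript {
    param(
        [string]$SqlInstance,
        [pscredential]$SqlCredential,
        [string]$Path,
        [string]$Runner
    )
    if ($Runner -eq 'SMO') {
        # Connect-DbaInstance (dbatools) returns an SMO Server object
        $server = Connect-DbaInstance -SqlInstance $SqlInstance -SqlCredential $SqlCredential
        $sql = Get-Content -Path $Path -Raw
        # SMO splits the script on GO batch separators before sending it
        $null = $server.ConnectionContext.ExecuteNonQuery($sql)
    } else {
        # -E uses Windows authentication, -b returns a non-zero exit code on error
        & sqlcmd -S $SqlInstance -E -b -i $Path
    }
}

# $IsWindows only exists on PowerShell 6+, and Windows PowerShell 5.1 is always Windows
$onWindows = ($PSVersionTable.PSVersion.Major -lt 6) -or $IsWindows
$runner = Select-ScriptRunner -OnWindows $onWindows
```

Passing -Override 'SMO' forces the SMO route on any platform, which is handy for checking that both paths build the same test databases.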

Resetting conflicting DTC CIDs with PowerShell

I’m currently migrating a lot of SQL Server instances onto newer virtual machines. Quite a few of these instances talk to each other via Linked Server for various historical reasons. And a lot of that chat is done via distributed transactions, which means configuring MS Distributed Transaction Coordinator.

So one of the first things I need to do is check that DTC is working between the 2 boxes. This is dead simple with the official PowerShell module for msdtc. It’s as simple as running Test-Dtc and then working through any errors. So, we just run this:

Test-Dtc -LocalComputerName Source -RemoteComputerName Destination

and as this is a post about fixing something, it won’t surprise you that I’m going to get an error message:

"The OleTx CID on SOURCE and DESTINATION is the same. The CID should be unique to each computer."
At C:\Windows\system32\WindowsPowerShell\v1.0\Modules\MsDtc\TestDtc.psm1:266 char:13
+             throw ([string]::Format($Strings.SameCids, "OleTx", $Loca ...
+             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationStopped: ("The OleTx CID ...each computer.":String) [], RuntimeException
    + FullyQualifiedErrorId : "The OleTx CID on SOURCE and DESTINATION is the same. The CID should be uniqu
   e to each computer."

But in a much more fetching shade of red and yellow, announcing that “The OleTx CID on SOURCE and DESTINATION is the same. The CID should be unique to each computer.”

The cause is really simple to grasp. When the Distributed Transaction Coordinator is installed it registers a GUID to identify it, the theory being that a GUID clash should be a vanishingly rare occurrence.

That is, until someone’s cloning Virtual Machines. So I have a batch of shiny new VMs that all think they’re the same instance of DTC. That’s not so good. The accepted fix used to be to manually remove the Distributed Transaction Coordinator, clean the registry, restart and then reinstall everything. That sounds like a lot of work to me!

The msdtc module makes it very simple to do, so we’re starting off here:

PS C:\Windows\system32> Get-Dtc | Select-Object *


DtcName               : Local
KtmRmEndpointCid      : 72c409a9-9c7b-4d24-9e0c-b946a2e5aa4c
OleTxEndpointCid      : 3eb9ce34-4d2c-48cf-9ebe-d6e888f9b0ca
Status                : Started
UisEndpointCid        : c5f16d32-01c9-4b65-be57-4521fa4bb934
VirtualServerName     : SOURCE
XAEndpointCid         : fcec2fe2-eab2-4277-853a-6ea4d7736430
PSComputerName        :
CimClass              : root/MsDtc:DtcInstance
CimInstanceProperties : {DtcName, KtmRmEndpointCid, OleTxEndpointCid, Status...}
CimSystemProperties   : Microsoft.Management.Infrastructure.CimSystemProperties

While DTC normally installs with a default log path, we’ll just make sure and grab it first. Then it’s just as simple as Uninstall-Dtc and Install-Dtc:

$logPath = (Get-DtcLog).path
Uninstall-Dtc -Confirm:$false
Install-Dtc -LogPath $logPath -StartType AutoStart

And to check it’s worked, let’s query DTC and check:

PS C:\Windows\system32> Get-Dtc | Select-Object *


DtcName               : Local
KtmRmEndpointCid      : 72c409a9-9c7b-4d24-9e0c-b946a2e5aa4c
OleTxEndpointCid      : 3eb9ce34-4d2c-48cf-9ebe-d6e888f9b0ca
Status                : Started
UisEndpointCid        : c5f16d32-01c9-4b65-be57-4521fa4bb934
VirtualServerName     : VMCLSTR-IVANTI
XAEndpointCid         : fcec2fe2-eab2-4277-853a-6ea4d7736430
PSComputerName        :
CimClass              : root/MsDtc:DtcInstance
CimInstanceProperties : {DtcName, KtmRmEndpointCid, OleTxEndpointCid, Status...}
CimSystemProperties   : Microsoft.Management.Infrastructure.CimSystemProperties

And there we have a new unique OleTxEndpointCid and we’re good to go.
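With a whole batch of cloned VMs, it can be quicker to look for clashes across all of them in one go rather than testing pairs. The following is only a sketch: Find-DuplicateDtcCid is a made-up helper name, and it assumes you have already collected one Get-Dtc result per machine (for example via Invoke-Command). The sample objects stand in for that collected output.

```powershell
# Hypothetical helper: given one DTC object per machine, report any endpoint CID
# that is shared by more than one machine.
function Find-DuplicateDtcCid {
    param(
        [Parameter(Mandatory)]
        [object[]]$DtcInstance,
        [string[]]$Property = @('OleTxEndpointCid', 'XAEndpointCid', 'UisEndpointCid', 'KtmRmEndpointCid')
    )
    foreach ($name in $Property) {
        $DtcInstance |
            Group-Object -Property $name |
            Where-Object { $_.Count -gt 1 } |
            ForEach-Object {
                [pscustomobject]@{
                    CidType   = $name
                    Cid       = $_.Name
                    Computers = @($_.Group.VirtualServerName)
                }
            }
    }
}

# Stand-in data shaped like the Get-Dtc output above
$sample = @(
    [pscustomobject]@{ VirtualServerName = 'SOURCE';      OleTxEndpointCid = '3eb9ce34-4d2c-48cf-9ebe-d6e888f9b0ca' }
    [pscustomobject]@{ VirtualServerName = 'DESTINATION'; OleTxEndpointCid = '3eb9ce34-4d2c-48cf-9ebe-d6e888f9b0ca' }
)
$clashes = Find-DuplicateDtcCid -DtcInstance $sample -Property OleTxEndpointCid
```

Any machine that shows up in the Computers list is a candidate for the Uninstall-Dtc and Install-Dtc treatment.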


Using Log or Audit data responsibly

A couple of days ago I walked through getting information about data deletion using the SQL Server Transaction Log. We looked at how you could find when and by whom data had been removed from a SQL Server table.

So you’ve got the when, and you’ve got the who, now how do you proceed? Remember you could be holding someone’s job or professional reputation in your hands.

As a long time DBA I’m well aware that applications are not always the best written things, and instructions don’t always spell out correctly what an option is going to do.

In the case I was writing about, I withheld the name from general knowledge. A single senior manager was made aware that I knew and would be investigating. Why did I do this?

In the case of this particular third party application, there were a few reasons:

  • Deleting data within the app should write data to an internal audit trail, and this hadn’t
  • Users cannot generate their own SQL queries, so this wasn’t an ad hoc query either
  • The single row delete was in its own transaction
  • This particular app is badly written. We’ve seen it fail to correctly write its own ini files on exit
  • It has a habit of crashing out for no reason

So a bit like a lot of applications out there. Now I had some time to investigate I started by going through logs on the client Windows machine and trying to replicate on my own.

Eventually I traced it back to a coding error in the application where they processed an Update as:

  1. Delete old data
  2. Insert new data
  3. Write audit log

None of this happened in an encompassing transaction (from studying the Transaction Log). So when the app crashed just after step 1, the row was simply left missing.

The user who was logged in at the time wouldn’t have been aware of the crash, as the GUI doesn’t display app status. The crash also happened at 16:55, which is pretty much home time, so they wouldn’t have looked at the record again.

Now I had the evidence I could talk to the senior manager and show that we needed to raise this with the supplier, and that the user was blameless, but that we should recommend to users that they double check any updates to be sure.

Releasing the name to a less trusted source without this background information would probably have led to the user being blamed for something that wasn’t really their fault.

So, just because you have the information, it doesn’t mean it’s actionable without adding more context and knowledge around it.

Snipe hunting in the SQL Server Transaction Log

Nice quiet day in the office, busily cracking through the To Do list when suddenly something pops into the ticketing system as a P1. Data has gone ‘missing’ from a corporate system, and there’s nothing in the audit trail about when it went missing or how it went missing. For how, read ‘Whodunnit’!

Now if there’s nothing in the audit trails, and no one is confessing, then that doesn’t leave me with very much to go on. A slight breakthrough is that someone clearly recalls seeing the data at 09:45 on the 3rd of February, but that’s as good as it gets. We’re now on the 6th of February, so that’s a window of just over 3 days in which it could have gone missing.

Now, as any DBA knows, SQL Server comes equipped with a built in ‘audit’ for any operation that modifies data. It’s just awkward and complex to read and search.

The name of that audit is the Transaction Log. SQL Server won’t do any data modification without it being written into the log, as SQL Server doesn’t count the transaction committed until it’s logged, because it needs the log for recovery after a restart.

The transaction log holds a LOT of information. Searching 3 days’ worth of it would be like looking for something in a haystack. And at the moment we don’t even know if that’s a needle, a pin or an apple seed.

An aside about reading Transaction Logs

SQL Server comes with 2 built in, but undocumented, functions: fn_dblog and fn_dump_dblog. Undocumented means Microsoft don’t publish documentation and reserve the right to modify them without notice. But they’re a pretty open secret, and there’s plenty of info out there.

For the purposes of this post, you just need to know this about them:

  • fn_dblog reads the current transaction log
  • fn_dump_dblog reads transaction log backups

If you’re on a SQL version lower than SQL Server 2014 or SQL Server 2012 SP2 then there is a known bug that uses up threads and can cause hanging.

Finding the time of deletion

So the first thing to do is to work out a time window when the data went missing. This was going to be a tedious process, so let’s automate it. If we’re automating a SQL Server task, then dbatools becomes the obvious answer.

The plan is:

  1. Restore the database to the last time the data was seen
  2. Roll forward in 6 hour increments until the data disappears
  3. Roll forward in 30 minute steps from the beginning of the last 6 hour slot until the data disappears
  4. Roll forward from the 30 minute start to the disappearance of the data in 5 minute increments

So we now have a 5 minute window we need to search in the transaction log backup, which is much better than 72 hours.

To speed things up we’ll scan all the backup headers first and save them into an object which we can reuse throughout the process. We’ll also rename the database, and change the filenames on restore. This is because we are restoring the database onto the instance it came from. The reason we have to do this will become clearer later on.

We also gather some extra information about the missing rows, which I’ll explain in a moment.

# Time to start restoring from (3rd February 2020, 09:45)
# Written year first so it isn't misread as 2nd March on a US-culture machine
$startTime = Get-Date '2020-02-03 09:45'

# The time in minutes between restore points (6 hours, 30 minutes, 5 minutes)
$windowSize = @(360, 30, 5)

# Used to track how far through the windows we are
$counterLoop = 0

# Query to test if the object has been deleted during the window
# We also gather some internal SQL Server page information we'll need later
$sqlQuery = 'select db_id(), sys.fn_PhysLocFormatter(%%physloc%%),* from dbo.Objects where ObjectID=''1'''

# Scan all the backup headers first to speed things up
$backups = Get-DbaBackupInformation -Path c:\RestoredBackps -SqlInstance MyInstance

# Set a variable to tell restore whether to start a new restore, or to continue
$restoreContinue = $false

while ($counterLoop -lt $windowSize.Count) {
    $backups | Restore-DbaBackup -SqlInstance MyInstance -DatabaseName RestoreDb -ReplaceDbNameInFile -TrustDbBackupHistory -RestoreTime $startTime.AddMinutes($windowSize[$counterLoop]) -StandbyDirectory c:\Standby -Continue:$restoreContinue

    $sqlResult = Invoke-DbaQuery -SqlInstance MyInstance -Database RestoreDb -Query $sqlQuery

    if (@($sqlResult).Count -eq 0) {
        # If we get no results, we're past when the data has disappeared
        # So, move to the next (smaller) window and start a new restore
        $counterLoop++
        $restoreContinue = $false
    } else {
        # Data is still there, so keep the page information from this restore
        # (it can't be queried once the row has gone) and start the next increment
        $oldResult = $sqlResult
        $startTime = $startTime.AddMinutes($windowSize[$counterLoop])
        $restoreContinue = $true
    }
}
Write-Host "Data disappeared between $startTime and $($startTime.AddMinutes($windowSize[-1]))`n"
Write-Host "Missing data was on the following pages:`n"
$oldResult

Now we know when the data disappeared, we just need to find out how.

Searching the Transaction Logs

SQL Server Transaction logs store a LOT of information, so querying them will return more information than you really want. Also the information in the Transaction Log is for the benefit of SQL Server, so not all of it is easily readable for us humans.

For instance, when we’re looking for a specific missing row the easiest way is to search for transaction log records that modified the physical location the row existed on. While the actual row data and SQL details are in the log, they’re encoded, so getting to them is a lot of work.

We’re looking for a specific row in a specific table, so we need the following 3 bits of information to identify its physical location:

  • Datafile ID
  • Page ID
  • Slot ID

These can be found for a specific row with the following query using the fn_PhysLocFormatter function which handily returns the physical location of a row in a table:

select db_id(),sys.fn_PhysLocFormatter(%%physloc%%) from dbo.Objects where ObjectID='1'

This will give you a result like:

db_id   File:Page:Slot
5       (1:232:58)

The reason we were saving this from the previous iteration of the loop is because it wouldn’t exist once the data had been deleted.

Now all we need to do is to search the transaction log. However, there’s one more small step before we can do that. In the transaction log the FileID and PageID are stored as fixed length hexadecimal values, and we’ve just pulled them out in decimal format.

I convert them with PowerShell as it’s a little easier for me to remember:

PS C:\> '{0:x4}' -f 1
0001
PS C:\> '{0:x8}' -f 336
00000150

Note how the FileID and PageID are padded out to 4 characters and 8 characters respectively. This query will return all the transactions that affected that slot on that page in that file. Depending on how busy that page is that might be a lot.

select * from fn_dblog(NULL,NULL) where [Page ID]='0001:00000150' and [Slot ID]='58'
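If you’re doing this for more than one row, the conversion and the query above can be wrapped up in a small helper. This is just a sketch: ConvertTo-DbLogLocation is a made-up name, and it assumes the input is the '(File:Page:Slot)' text returned by fn_PhysLocFormatter. The example uses page 336 to match the conversion shown above.

```powershell
# Hypothetical helper: turns the '(File:Page:Slot)' text returned by
# sys.fn_PhysLocFormatter into the values used to filter fn_dblog.
function ConvertTo-DbLogLocation {
    param(
        [Parameter(Mandatory)]
        [string]$PhysLoc
    )
    if ($PhysLoc -notmatch '^\s*\((\d+):(\d+):(\d+)\)\s*$') {
        throw "Unexpected physical location format: $PhysLoc"
    }
    $fileId = [int]$Matches[1]
    $pageId = [int]$Matches[2]
    $slotId = [int]$Matches[3]

    [pscustomobject]@{
        # File ID padded to 4 hex digits, Page ID padded to 8 hex digits
        PageId = '{0:x4}:{1:x8}' -f $fileId, $pageId
        SlotId = $slotId
    }
}

$location = ConvertTo-DbLogLocation -PhysLoc '(1:336:58)'
$query = "select * from fn_dblog(NULL,NULL) where [Page ID]='$($location.PageId)' and [Slot ID]='$($location.SlotId)'"
```

The resulting $query string can then be handed to Invoke-DbaQuery against the restored database.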

Finding out who and when

You’ll instantly find out just how much information there is in a transaction log record. So let’s trim the data down to the couple of things we want to know:

  • Who issued the command
  • Exactly when it was run

And also limit it to just delete records:

select [Transaction ID],[Begin Time],[Transaction SID] from fn_dblog(NULL,NULL) where [Page ID]='0001:00000150' and [Slot ID]='58' and Operation='LOP_DELETE_ROWS'

If you’re lucky and this returns a single row, you’ll find that only [Transaction ID] is populated, as the other goodies are recorded at the Transaction wrapper level, not the statement level. So we take the Transaction ID and use that:

select [Transaction ID],[Begin Time],suser_sname([Transaction SID]) from fn_dblog(NULL,NULL) where [Transaction ID]='0000:000003ba'

This returns all the rows for the specified transaction in the transaction log. You’ll have 2 rows for the Transaction wrapper, the BEGIN and the COMMIT, these will be the first and last row.

The rows in between will be the actual deletion records. You might have one or more depending on how much data was in the row, or on whether it was part of a delete that removed multiple rows.

The BEGIN transaction record contains the information you wanted:

  • [Begin Time] gives you the time the delete started
  • [Transaction SID] gives you the Security Identifier of the account that executed the transaction.

We use SUSER_SNAME to convert the SID to a username. This is the reason I mentioned that you want to do this on the same instance as the original database was on: it makes sure the SIDs match up and you get the right name!

Conclusion

As you can see, everything you’d ever want to know about what happened in SQL Server is in the transaction log. But it’s not an easy beast to work with, so it’s much better to put proper auditing in place and save yourself from having to do this.

Now with added CISSP

It’s all been a bit quiet around here with lots on at work and training for a marathon. One piece of news I’ve not mentioned is that I’m now CISSP certified 👍. I sat the exam in December, but it’s taken a while for the paperwork and accreditation to get sorted and for everything to become official.

The main thing I’ve taken away from studying for the certification is that process rules everything. As in the trenches DBAs, we’re more worried about the specifics of SQL Server permissions or whether the Oracle auditing is correct. But why are we doing those things, and are we doing them correctly?

Correctly doesn’t just mean technically correct, we’re almost certainly doing them that way or things will break. What I mean is are we correctly implementing the processes and policies that drive the rest of the business?

For example, everyone loves backups (yeah, I know, this is my favourite thing as well). But how much should you be keeping? Do you really need 3 years of backups? Is that just increasing the amount of data you could lose in a breach? Are the older backups encrypted, and would restoring and encrypting them break the purposes they were kept for?

If you’re never going to use them why are you keeping them? If it’s just for a CYA audit reason, then why not just keep the audit logs? Less chance of leaking PII or Financial data if you’ve just got the bare bones of X did Y on Z.

This alignment with Organisational policy is a core requirement for doing SQL Server security correctly. It will allow you to concentrate on exactly what needs to be done, rather than running around implementing ad-hoc fixes every time a hole appears.

Expect more posts on this topic over the coming months. And if you want a deep dive into SQL Server Security then I’m presenting a full day workshop at SQLBits 2020, SQL Server Security from the Ground Up, on Wednesday 1st April.

SQL Server Security from the ground up at SQLBits 2020

I’m pleased to announce that I will be presenting a full day workshop at SQLBits 2020 on Wednesday 1st April.

The topic for the day is ‘SQL Server Security from the ground up’.

We’ll be looking at what is required to ensure that the data stored in SQL Server is secure, and that your organisation can trust that data in its mission.

This is more than just a technical workshop. We’ll be spending time looking into how you can’t secure data without the organisation buying in to the process. To generate a working security policy you’re going to need approval from the top. If your CEO isn’t willing to enforce security then you’re fighting a losing battle. So you need to know how to present an argument at that level for the appropriate level of security and the resources to implement it.

So topics we’ll be covering will include:

  • Data Ownership
  • Risk Analysis
  • Separation of Duties
  • Policies and Responsibilities
  • Cost of Security
  • What is out of your hands
  • Organisation Education

Don’t worry, there’ll be plenty of technical content as well. We’ll be looking into:

  • Cloud vs On Premise
  • Setting up the operating system, if you have one
  • Setting up SQL Server
  • Permissions
  • Development best practices
  • Encryption
  • and much more

The session is aimed at all levels of SQL DBA, Developer or anyone who has to ensure the security of data. No previous experience is expected. Any technical examples will be provided so you can work with them on your own time, or take them back to show your colleagues.

Until 31st December the price for 2 full training days and 3 days of conference sessions is £999, moving up to £1199, and then £1499 from the 15th of February, so get in quick for a bargain.

If you’ve any questions then please drop me a comment, reply below, or get in touch via Twitter.

Making SQL Agent Jobs Availability Group aware with dbatools

A new system has rocked up at work. To keep the database nice and available across a couple of sites we’ve implemented a SQL Server Availability Group solution.

The setup for Availability Groups is well documented and dbatools has plenty of AG commands to help out and keep things in sync across the replicas.

But our issue was coping with all the 3rd party SQL Server stored procedures that weren’t Availability Group aware.

What do I mean by Availability Group aware? When running on an Availability Group, one SQL Server instance ‘owns’ the database at any point in time, but the SQL Agent jobs have to be replicated across all of the instances in the cluster. So you want to make sure that your SQL Server Agent jobs only do work on the instance that currently owns the Availability Group.

Doing this is pretty simple. Below is a piece of T-SQL that checks if the current SQL Server Instance is the primary instance in the AG. If it isn’t then we exit with an error.

-- Severity 16 is used because severities of 10 or lower are only informational
-- and would not fail the job step
IF (SELECT
        repstate.role_desc
    FROM sys.dm_hadr_availability_replica_states repstate
        INNER JOIN sys.availability_groups ag
            ON repstate.group_id = ag.group_id AND repstate.is_local = 1) != 'Primary'
    BEGIN
       RAISERROR ('Not Primary', 16, 1)
    END

We exit the step with an error so we can make use of a SQL Agent job step’s ‘OnFailure’ option to quietly exit the job.

Why do we want to quietly exit the job? If the job itself finished with an error, your monitoring system would hammer you with lots of alerts about regularly failing jobs (you are monitoring your SQL Agent jobs, aren’t you?).

As we’re going to be using PowerShell to push this around a lot of jobs, let’s throw it into a variable:

$stepsql = "IF (SELECT
        repstate.role_desc
    FROM sys.dm_hadr_availability_replica_states repstate
        INNER JOIN sys.availability_groups ag
            ON repstate.group_id = ag.group_id AND repstate.is_local = 1) != 'Primary'
    BEGIN
       -- Severity 16 so the job step really fails; 10 or lower is only informational
       RAISERROR ('Not Primary', 16, 1)
    END"

Next we’re going to grab all the Agent jobs we want to update. Luckily for me, the company prefixed all of their jobs with a unique stamp, so I just used a filter on the job name:

$jobs = Get-DbaAgentJob -SqlInstance MyInstance | Where-Object {$_.Name -like 'SVC_*'}

To keep things easy to read and save line wrapping, I like to use parameter splatting to keep it clean. So we create a hashtable of values like so:

$jobParameter = @{
    SqlInstance = 'MyInstance'
    StepName = 'AgCheck'
    Database = 'Master'
    Subsystem = 'TransactSql'
    StepId = '1'
    OnFailAction = 'QuitWithSuccess'
    OnSuccessAction = 'GoToNextStep'
    Command = $stepsql
    Insert = $True
}

The Insert switch is new as of 15th October 2019 (I’ve just added it via a Pull Request). When it’s specified, the command will insert the new step at the StepId specified. So in this example, it’s going to be the first step executed, as the steps start from 1.

The Insert switch causes the command to increment the StepID of all subsequent Job steps by 1 so it can fit in. It will also increment the OnFailStep and OnSuccessStep values if the target steps have been moved so the flow isn’t affected.
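To make that renumbering concrete, here is a small simulation using plain objects. It is an illustration of the behaviour described above, not the code inside dbatools: Add-StepAtPosition and the sample steps are made up, and a target step of 0 is used here to mean “no target step”.

```powershell
# Illustration only: plain objects standing in for job steps, not dbatools' code.
# An OnSuccessStep / OnFailStep of 0 means "no target step" in this sketch.
function Add-StepAtPosition {
    param(
        [object[]]$Step,
        [int]$StepId,
        [string]$StepName
    )
    # Any step ID at or after the insert point has to move down by one
    $shift = { param($id) if ($id -ge $StepId) { $id + 1 } else { $id } }

    $new = [pscustomobject]@{ StepId = $StepId; StepName = $StepName; OnSuccessStep = 0; OnFailStep = 0 }
    $moved = foreach ($s in $Step) {
        [pscustomobject]@{
            StepId        = & $shift $s.StepId
            StepName      = $s.StepName
            OnSuccessStep = & $shift $s.OnSuccessStep
            OnFailStep    = & $shift $s.OnFailStep
        }
    }
    @($new) + @($moved) | Sort-Object -Property StepId
}

# A two step job where step 1 jumps to step 2 on success
$steps = @(
    [pscustomobject]@{ StepId = 1; StepName = 'Load';   OnSuccessStep = 2; OnFailStep = 0 }
    [pscustomobject]@{ StepId = 2; StepName = 'Report'; OnSuccessStep = 0; OnFailStep = 0 }
)
$result = Add-StepAtPosition -Step $steps -StepId 1 -StepName 'AgCheck'
```

After the insert, ‘Load’ has become step 2 and its success target has moved from 2 to 3, so it still jumps to ‘Report’.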

In this example we set our OnFailAction to be QuitWithSuccess. As mentioned above, this will stop our monitoring system filling up with alerts.

All that’s left is to loop through all of the jobs in our collection and use New-DbaAgentJobStep to insert it:

Foreach ($job in $jobs) {
    New-DbaAgentJobStep -Job $job @jobParameter
}

To do this across the other Availability Group nodes we have 3 options. We can modify our hashtable to make use of New-DbaAgentJobStep’s ability to target multiple SQL Server instances:

$jobParameter = @{
    SqlInstance = ('MyInstance','MyInstance2','MyInstance3')
    StepName = 'AgCheck'
    Database = 'Master'
    Subsystem = 'TransactSql'
    StepId = '1'
    OnFailAction = 'QuitWithSuccess'
    OnSuccessAction = 'GoToNextStep'
    Command = $stepsql
    Insert = $True
}

Or set up and test on a single node, and then use Sync-DbaAvailabilityGroup. This will sync a wide range of objects around an Availability Group (jobs, logins, credentials, custom errors, and many more). If you only want to synchronise the SQL Server Agent jobs then Copy-DbaAgentJob will do just that.

Hopefully this little change is going to make a few people’s life easier, it’s certainly done that for me.

Prevent mistakes with Azure Resource Locks

Sometimes you have to give people a little more access to an Azure environment than you might like, and then there’s the chance of someone accidentally deleting a resource.

A resource deletion may not sound like too much of a big thing if you’re deploying Infrastructure as Code. Hey, we’ll just terraform apply again and it’ll pop back up.

In theory that’s a great idea, just with one big problem. The new resource isn’t the old resource!

For example, an Azure SQL Database server is a unique resource. If you delete one you lose any backups you’ve taken, as they’re hosted on the server. Spinning up a new one isn’t going to get them back! A phone call to MS Support may, if you’re quick and lucky.

To avoid this you want to be using Azure Resource Locks. Think of these as the Azure version of child proof locks on your kitchen drawers. Yes, they may occasionally mean you’ve got an extra step to get a knife out, but the little one can’t get their hands on it.

Azure Resource Locks

The first thing about Azure Resource Locks is that they apply to everyone and every role. Even if you’ve the Owner role on a Resource Group via RBAC, if there’s an Azure Resource Lock on that Resource Group you’re going to be blocked until you’ve removed the lock.

This is great because it prevents those “oh ****, that was the wrong subscription” moments.

Locks apply downwards from the resource they’re applied to. So if you apply one on a Resource Group then its lock applies to every resource within that Resource Group. Apply it to an Azure SQL Database server, and it will apply to all of the Databases on that server.
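As a sketch of locking at the server level rather than the Resource Group level, New-AzResourceLock can be pointed at a single resource with its ResourceName and ResourceType parameters. The function name New-SqlServerDeleteLock, the lock name it builds and the note text are made up for illustration, and the function needs the Az module and a logged in session when it is actually called.

```powershell
# Hypothetical wrapper: puts a CanNotDelete lock on one Azure SQL Database server,
# which then also covers the databases on that server.
function New-SqlServerDeleteLock {
    param(
        [Parameter(Mandatory)]
        [string]$ResourceGroupName,
        [Parameter(Mandatory)]
        [string]$ServerName
    )
    $lockParameter = @{
        LockName          = '{0}-NotDelete' -f $ServerName
        LockLevel         = 'CanNotDelete'
        LockNotes         = 'Protects the server and the databases on it'
        ResourceGroupName = $ResourceGroupName
        ResourceName      = $ServerName
        ResourceType      = 'Microsoft.Sql/servers'
    }
    # Requires the Az.Resources module and an authenticated Az session
    New-AzResourceLock @lockParameter -Force
}
```

It would be called along the lines of New-SqlServerDeleteLock -ResourceGroupName Lock -ServerName my-sql-server, where my-sql-server is a placeholder server name.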

Azure Resource Lock Types

Resource locks come in 2 flavours

  • CanNotDelete
  • ReadOnly

CanNotDelete does what it says on the tin. Once this lock is applied the resource (and its children) cannot be deleted, even if you use -force.

ReadOnly implements CanNotDelete and also prevents any modification of the locked resource and its children.

Setting Azure Resource Locks

You can set Azure Resource Locks via the Azure Portal, Azure CLI, Azure PowerShell or ARM Templates. Below is how you can set the same CanNotDelete lock on the Lock Resource Group using each of the 4 options:

  • Azure Portal
  • ARM Template

Create a template.json file:

{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {},
    "variables": {},
    "resources": [
        {
            "type": "Microsoft.Authorization/locks",
            "apiVersion": "2015-01-01",
            "name": "LockGroupNotDelete",
            "properties":
            {
                "level": "CanNotDelete",
                "notes": ""
            }
        }
    ],
    "outputs": {}
}

Which you’d deploy with:

New-AzResourceGroupDeployment -ResourceGroupName lock -Name lock -TemplateFile ./template.json

  • Azure CLI:

az lock create --name LockGroupNotDelete --lock-type CanNotDelete --resource-group Lock

  • Azure PowerShell:

New-AzResourceLock -LockName LockGroupNotDelete -LockLevel CanNotDelete -ResourceGroupName Lock

What you’ll see with Azure Resource Locks

So now we’ve seen how to create a resource lock, what are we going to see if we try to delete the Resource Group? Let’s try it, just to prove it works and so we know what to look out for when we bump into a lock we didn’t expect to see.

[Screenshots: the blocked delete attempt in the Azure Portal, Azure CLI and Azure PowerShell]

As you can see, the Resource Locks will stop you deleting the resource, which is nice. The error messages are also nice and informative, so you know the resource is locked and at which scope the lock is placed, which makes it easier to find the lock to remove it. Talking of removing locks:

Removing Azure Resource Locks

You can remove locks with any of the methods you can use to create them, so you’re free to mix and match how you do things.

  • Azure Portal

  • Azure CLI:

az lock delete --name LockGroupNotDelete --resource-group Lock

  • Azure PowerShell:

Remove-AzResourceLock -ResourceGroupName lock -LockName LockGroupNotDelete
