Musings of a Data professional

Stuart Moore

Month: April 2020

Please clean down your SQL Server backup history

From looking around at some of the Issues that get logged for dbatools, it looks like people never clear out their SQL backup history from msdb. This causes slow queries and potentially misleading backup trees to be returned.

In my (not so humble) opinion you really shouldn’t let your backup history fill up with, let’s be honest, useless junk. In this post I’ll go through why I think you should be keeping way less in there than people think you should

It’s not an audit trail

A common reason people give for keeping years worth of backup history records is that they need it to show the auditors. I’ve yet to find an auditor who wants a raw SQL database of database to look at.

The options I’ve used in the past are to use:

  • Export it to Excel or CSV on a regular basis. Easier to hand off to an auditor, compresses well and get’s it out of your database
  • Monitoring system records. You are monitoring your backups? In that case, you should have record of no failed jobs.

Please check what your auditors are actually asking for and why. Unless they’re also asking for verification and restore test results pure backup history is worthless. Because as a number of people have said over the years (in various ways):

An untested backup is just a wish that may not be granted

Me, probably

Will you ever go that far back?

So you’ve got 2 years worth of SQL Server backup history tucked away in msdb. Now, be honest, are you ever going to use it? Really, you’re going to roll a database back 6 months? Why?

That would be a hell of a lot of data to throw away to /dev/null or have to import all over again. Would there ever be a business case for doing that?

And once you’ve restored it, that leads on to…..

Once it’s restored, it’s a new database

You restore a SQL Server database and you start 2 new things:

  • A new Recovery Fork
  • A new LSN chain

All databases are running along a Recovery Fork (identified by a RecoveryForkGUID), even a new one will have one. You perform a restore, and SQL Server will generate a new Recovery Fork under the covers. Much like Ghostbusters, SQL Server won’t let you cross streams

As covered elsewhere on here, SQL Server uses LSNs (Log Sequence Numbers) to track database actions, and as backups are database actions they are also linked to LSNs. You’ll be taking the database back to an LSN in the past, and restarting the count from there. This is why SQL Server uses the recovery fork. Even if you were lucky enough to have backups starting on the same LSN (and it’s the sort of luck where I’d be asking you for the lottery numbers) you still won’t be able to use the backup.

So if all the backups are still in the backup history they’re useless, and potentially very confusing when you come to do a restore

Can you even restore it?

If you’re storing 2 years of old backup history, are you also storing the old backups as well? If you don’t have the backups to restore, again the history is worthless and might as well be deleted

I’m pretty hot on backups, but even I don’t keep things that long. Currently we’ll keep everything for 2 weeks, then we’ll get rid of Transaction Log and Differential backups. After 4 weeks we’ll only keep the Full backup from the last day of the month, and then at the end of the year we’ll only keep the last Full backup

So again, why keep the history if I can’t use it to restore most of the databases it contains?

Do you need that granularity?

As mentioned in the last reason you may end up with just a few backups kicking around that may need restoring for investigations or legal reasons. Do you need the backup history to restore those?

With tools like Restore-DbaDatabase I’d say no. While generating the restore chain from a folder full of backups may take a while, for a single database backup you’re only looking at seconds to scan the file(s) and be restoring. Probably quicker than trying to find the records in a bloated backup history table 😉

So what do I suggest

I’d suggest really looking at why you’re storing all that probably useless backup history.

Personally I consider anything more that a months worth as being too much, you (and your boss) may think that’s a bit tight. But I can’t ever see us needing to roll backup to a point in time over a month ago. Most of the times we do need to, the end of the month will be good enough.

So, to clear down the backup history, schedule sp_delete_backuphistory to run once a week like so:

use msdb
go

declare @oldest DATETIME  = Getdate()-28;
exec sp_delete_backuphistory @oldest_date=@oldest;

This will get rid of ALL backup history over 4 weeks old from ALL databases. This will work on SQL Server from 2005 upwards

Or you can enable the ‘History Cleanup Task’ in your maintenance plans.

Or if you’re using Ola’s great SQL Server maintenance scripts they’ll install a sp_delete_backuphistory job for you

So, go on. Get those deletions automated and make your life easier 🙂

Feedback

Obviously, all of the above is just my opinion and thoughts on best practice I’ve picked up over the years. You may have a need to keep years of backup history, and I’d love to hear what it is.

dbaSecurityScan update

As mentioned before, I’ve started working on dbaSecurityScan, a PowerShell module for tracking changes to a databases security model and bring the database back in to line.

Making the most of not being able to leave the house for very long I’ve been cracking along on getting the basics in place. Unlike a PowerShell module like dbatools, this one doesn’t break down nicely into small lumps so it’s likely that new features will be added in big lumps until we hit a critical mass.

So, I’ve just pushed up a nice big change that shows some of the core features working properly. In the current version we have:

  • Creating a database security config from an existing database for the following security principals
    • Users
    • Objects
    • Roles
    • Schemas
  • Test the database security config against a database
  • Take the test results and generate a list of actions that need to be undertaken to bring the database back inline
  • Run a ‘WhatIf ‘ run of the fixes to generate a list of actions to be undertaken
  • Apply the fixes to bring a database back in line, with a choice of running all actions, just those that add/grant security measures or just those that drop/revoke security measures

Some code to show how those features currently work:

On the to do list for the next couple of weeks of lockdown are:

  • Add fixing functions for roles and objects
  • Add Pipeline support
  • Add better comment based help
  • Allow comparison of config documents

Please get in touch if you’ve got any ideas for features. If you want to lend a hand then I’m happy to take Pull Request over at https://dbasecurityscan.io/.

Please also report any bugs/issues over on github as well.

Data Platform Discovery Day 2020 Agenda available

After sifting through a huge heap of great session submission we finally got it narrowed down to 10 sessions for the US day and the European day.

We’ve got some really top quality speakers and sessions for you. We’ve got introductions to Power BI, Azure Data Factor, Containers, SQL Queries and using Azure Notebooks for Data Science

To have a look for yourself the agendas are here:

Data Platform Discovery Day 2020 – US Agenda

Data Platform Discovery Day 2020 – European Agenda

You can attend one, or both if you don’t mind a very early morning or very late night. Registration is for the entire day, but you can drop in and out for the sessions you want to see.

To register head over to these links:

Data Platform Discovery Day – US Registration

Data Platform Discovery Day – European Registration

More details are available from the event website – Data Platform Discover Day website

Custom Teams Backgrounds on Mac OS

With the current pandemic enforced usage of Teams over chatting at the water machine I wanted a way to hide the background.

My wife’s a nurse, so the home ‘office’ is hers for patient confidentiality atm, which means I’m nomading around the house. This means my background varies a lot, so having something different than an untidy shelf of books, the kitchen counter or a blinding window glare behind me would be nice

There’s plenty of windows hints on having a custom Teams background on Windows, but not so much about having them on a Mac. With a bit of digging it turn out that if you want a custom Teams background on your Mac it’s quite simple after all.

All you need to do is to drop a file into:

~/Library/Application Support/Microsoft/Teams/Backgrounds/Uploads

If you’re not used to *nix paths, the ~ translates to your home directory. So for me on my Macbook Pro with the username stuart, that ends up being

/Users/stuart/Library/Application Support/Microsoft/Teams/Backgrounds/Uploads

Just drop the images you want Teams to use as a custom background into that folder and they’ll show up in the gallery.

It seems Teams isn’t picky about which file format you use for a custom background. I’ve tried png, jpeg and gif, and they all work fine. Unfortunately animated gifs don’t work, which is both a good and a bad thing depending on your personal love of gifs

Announcing Data Platform Discovery Day

With a number of the large in person Data Platform conferences and meetups having to cancel or take a hiatus due to the current Covid-19 outbreak, Matt Gordon (b | t) and myself decided to do something about it.

So we came up with Data Platform Discovery Day. A 2 day online event aimed at those people who are looking for a start in the Data Platform world or are wanting to learn some fundamentals about an aspect of it.

Each day with run alongside business hours across a different continent, and will feature ten 50 minute sessions. Each day will have different speakers, and if you want there’s nothing to stop you attending both if you fancy an early morning or late night

Day 1 is running during US daytime on 29th April. The first session will begin at 9am Eastern Time, and then a new session will being at the top of each hour

Day is running during European daytime on 30th April. The first session will begin at 9am BST (UTC+1), again with a new session starting at the top of each hour.

Registration for the events with start once we’ve finalised the speakers.

And talking of speakers, the Call for Papers for both events are open. You can submit to both events if you want. Please remember that sessions are 50 minutes, and content should ideally be Level 100 material.

US Data Platform Discovery Day call for papers

European Data Platform Discover Day call for papers

We’d love to encourage first time speakers as well, so if you’d like ask any question then please get in touch with me (twitter) or Matt (twitter)

Powered by WordPress & Theme by Anders Norén