Stuart Moore

Musings of a Data professional

Bulk uploading CSVs to Azure SQL Database with dbatools

Like most people we're busy moving ourselves over to Azure, and like a lot of people (even though they won't admit it) we've got years of data stashed away in CSV files. Go on, own up: there's years' worth of department membership stashed away in an HR CSV folder somewhere in your organisation 😉

To get some of this data usable for reporting we're importing it into Azure SQL Database so people can start working their way through it, and we can fix up errors before we push it through into Azure Data Lake for mining. Being a fan of dbatools, it was my first port of call for automating something like this.

Just to make life interesting, I want to add a time of creation field to the data to make tracking trends easier. As this information doesn't actually exist in the CSV columns, I'm going to use LastWriteTime as a proxy for the creation time.

$Files = Get-ChildItem \\server\HR\HandSTraining\Archive -Filter *.Csv
$SqlCredential = Get-Credential

ForEach ($File in $Files | Where-Object {$_.Length -gt 0}) {
    $InputObject = ConvertFrom-Csv -InputObject (Get-Content $File.fullname -raw) -Header UserName, StatusName
    $InputObject | Add-Member -MemberType NoteProperty -Value $File.LastWriteTime -Name DateAdded
    $DataTable = $InputObject | Out-DbaDataTable
    Write-DbaDataTable -SqlInstance superduper.database.windows.net -Database PreventPBI -Table Training -InputObject $DataTable -Schema dbo -SqlCredential $SqlCredential -RegularUser
    Remove-Variable InputObject
}

Working our way through that, we have:

$Files = Get-ChildItem \\server\HR\HandSTraining\Archive -Filter *.Csv
$SqlCredential = Get-Credential

Set up the basics we're going to need throughout: grab all the CSV files off our network share, and grab the credentials we'll connect with. I prefer grabbing credentials with Get-Credential, but if you'd prefer to embed them in the script you can use something along these lines:
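A minimal sketch with placeholder values; in practice you'd want something more secure than a plain-text password sitting in a script:

$password = ConvertTo-SecureString 'YourPasswordHere' -AsPlainText -Force  # placeholder password, illustration only
$SqlCredential = New-Object System.Management.Automation.PSCredential ('YourSqlUser', $password)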


We then ForEach through all the files, having filtered out the empty ones:

    $InputObject = ConvertFrom-Csv -InputObject (Get-Content $File.fullname -raw) -Header UserName, StatusName
    $InputObject | Add-Member -MemberType NoteProperty -Value $File.LastWriteTime -Name DateAdded

Load the file contents into an object with ConvertFrom-Csv. These CSV files don't contain a header row, so I'm using the -Header parameter to force one in. This also helps with Write-DbaDataTable, as I can ensure that the object property names match the SQL column names for the upload.

Then we add a new property to our InputObject. Doing it this way we add it to every 'row' in the object at once. If you want to add multiple new properties, just repeat this for each one, as sketched below.
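For example, a quick sketch of tagging each row with the name of the file it came from as well (the SourceFile property is purely illustrative, it isn't part of the original script):

    $InputObject | Add-Member -MemberType NoteProperty -Value $File.Name -Name SourceFile  # a second property, added the same way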

    $DataTable = $InputObject | Out-DbaDataTable
    Write-DbaDataTable -SqlInstance superduper.database.windows.net -Database PreventPBI -Table Training -InputObject $DataTable -Schema dbo -SqlCredential $SqlCredential -RegularUser

Convert our InputObject into a datatable, which is the format Write-DbaDataTable needs for input.

And then the command that does the loading, Write-DbaDataTable. There are only two things here that you have to do differently when loading to an Azure SQL Database as opposed to a normal SQL Server instance. First, you have to use a SQL Credential, as the underlying DLLs don't work (yet) with the various Integrated Authentication options. Second, you need to use the -RegularUser switch. Normally dbatools will assume you have sysadmin rights on your SQL Server instance, as they're needed for many of its tasks; in an Azure SQL Database you can't have those rights as they don't exist, so without -RegularUser you'll get a nice error message. Just something to look out for; I've tripped myself up in the past when repointing load scripts.

Then we drop InputObject and go round the loop again until we’re finished.

Easy and very quick, and now I can just point PowerBI at it and let the users and analysts work out what they want to do with it.

Changes to SQL Relay session selection process 2018

This year the SQL Relay committee has decided to make a couple of changes to our session selection procedure.

Our aim this year is to be as inclusive and transparent on our process as we can manage. To this end, this post is going to lay out our aims and how we plan to achieve them.

We want everyone to feel they can submit without worrying that their submission will be overlooked due to Gender, Disability, Availability to do the whole tour or any other reason you may be concerned about.

To do this we are moving to a blind selection process. I (Stuart) will be the only member of the committee who can see the list of submissions. At the end of the Call for Sessions I will release only the session abstracts to the rest of the committee members, who will then choose the sessions based only on the abstracts.

Then, should we have openings due to people’s availability we will go through again to fill in any holes in the agenda.

If you have any special requirements, please add them to your submission. They won't be used during the selection process, but will allow us and our venues to ensure we cover them.
The Call for Papers is available here – SQL Relay 2018 CfP

If there's any other reason you feel your submission might be overlooked and you'd like to talk it through, please contact me (comments, the Contact Form or Twitter: @napalmgram). Anything you ask will be treated in confidence.

If you would like some help with, or an independent review of, your abstract then please get in touch with Speaking Mentors, where some of the best SQL session submitters can help you out.

How are SQL Server Log Sequence Numbers (LSNs) generated?

If you've ever dug down in the SQL Server transaction logs or had to build up restore chains, then you'll have come across Log Sequence Numbers (LSNs). Ever wondered why they're so large, why they all look suspiciously similar, why they don't start from 0, and just how SQL Server generates these LSNs? Well, here we're going to take a look at them.

Below we'll go through examples of how to look inside the current transaction log and backed-up transaction logs. This will involve using some DBCC commands and the undocumented fn_dblog and fn_dump_dblog functions. The last two are very handy for digging into SQL Server internals, but be wary about running them on a production system without understanding what's going on: they can leave file handles and processes behind that can impact your system.

So, with that warning out of the way, let's push on.

Read More

dbachecks – SQL Server compliance testing with simple configuration management

If you've not heard yet, the people behind the dbatools PowerShell module (including me) have a new toolset for you: dbachecks. dbachecks uses Pester to let you validate your SQL Server estate in a simple way and generate meaningful graphical reports. The official launch of the module is at SQL Bits 2018.

Out of the box dbachecks uses test values that we've found to be the most appropriate from our years of experience with SQL Server. But these may not be the best values for your particular organisation. For example, we expect to see a Full backup less than 24 hours old for each database; in your case you might only take a Full backup once a week and use differential backups during the week. So we needed a flexible and simple system to let you change the values. dbatools friend Friedrich Weinmann (b | t) has written a great PowerShell framework called PSFramework, which we've integrated into dbachecks to handle the configuration of the tests.

Out of the box dbachecks will load its default config values from the file .\internal\configurations\configuration.ps1 (#! link to GH). To see the current values use Get-DbcConfig:
Get-DbcConfig output
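For example, to pull back just the mail-related settings you can filter on the Name property of the returned config objects (Name, Value and Description are the properties shown in the screenshot and in the JSON export further down):

Get-DbcConfig | Where-Object Name -like 'mail.*'   # list only the mail.* configuration values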

I've just picked the top few rows for the screenshot; as of writing (15/02/2018) there are 98 config options available. We've tried to make the configuration option names obvious, and the values are grouped into these main sections:

  • app – Configuration for the module
  • domain – Configuration for authentication
  • mail – Configuration for sending reports
  • policy – The values for the tests
  • skip – Controls which tests should be excluded

And each has a useful description as well to make it easy to find. In the screenshot you'll notice that I've configured the mail.* options to suit my test environment, so now I can use Send-DbcMailMessage to email my test results without having to specify all the parameters. To set a configuration parameter you use the Set-DbcConfig command:

Set-DbcConfig -Name mail.subject -Value "DbcChecks report"

Set-DbcConfig example

Or perhaps, as mentioned at the start of this post, you don't want to require a Full backup within the last day, but within the last 7 days. You can easily configure that:

Set-DbcConfig -Name policy.backupfullmaxdays -Value 7

And now your tests will be checking that you're never more than 7 days from a Full backup. By default we're also checking that you've taken a Differential backup in the last 25 hours, so you're now good to go!
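With the policy values set, running the relevant tests is just a call to Invoke-DbcCheck. The check name below is an assumption for illustration only; Get-DbcCheck will list what's actually available in your installed version:

Invoke-DbcCheck -SqlInstance sql01 -Check LastBackup   # run the backup checks against an instance; 'LastBackup' is an assumed check name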

And don't worry, you don't need to reset these every time you run the tests. The PSFramework module persists these non-default settings in the Windows registry at:

Computer\HKEY_CURRENT_USER\Software\Microsoft\WindowsPowerShell\PSFramework\Config\Default

So on my system we can see:
DbaChecks registry storage

Now, what happens if you want to distribute these config changes to multiple servers, or share them with colleagues to make sure you're all singing the same tune? We'd strongly recommend you don't modify the .\internal\configurations\configuration.ps1 file directly, as it will be replaced whenever you update the module. To make your life easier (which is the entire point of the module) we've included export and import functionality. To export the configuration to an easy-to-parse JSON file you simply run:

Export-DbcConfig -Path c:\path\of\yourchoice\filename.json

If you omit the path value then by default we will export the results to $script:localapp\config.json. Now you've got a simple JSON file, it's easy to source control (you are using source control, aren't you?) to keep track of changes and make sure they were implemented as expected. If you want to apply that configuration to another install, it's simply a case of running the import command:

Import-DbcConfig -Path c:\path\of\yourchoice\filename.json

and you're done. We support UNC paths, so it's simple to have a central repository and apply the same configuration everywhere. It also works well with just fragments of configuration, so if all you wanted to control were the email settings you could create a JSON file like this:

[
    {
        "Name":  "mail.failurethreshhold",
        "Value":  0,
        "Description":  "Number of errors that must be present to generate an email report"
    },
    {
        "Name":  "mail.from",
        "Value":  "null@stuart-moore.com",
        "Description":  "Email address the email reports should come from"
    },
    {
        "Name":  "mail.smtpserver",
        "Value":  "smtp.stuart-moore.com",
        "Description":  "Store the name of the smtp server to send email reports"
    },
    {
        "Name":  "mail.subject",
        "Value":  "DbcChecks report from stuart",
        "Description":  "Subject line of the email report"
    },
    {
        "Name":  "mail.to",
        "Value":  "SqlReports@stuart-moore.com",
        "Description":  "Email address to send the report to"
    }
]

Then you can import just this snippet to set the configuration for those options. This makes it easy to separate enterprise-level configuration changes from the actual SQL test options. The same technique can be used to 'force' the correct backup testing parameters in all cases, while letting other tests be customised as needed.

Running simple PowerShell commands against multiple servers with a timeout

Many years ago I wrote a post on passing functions into Start-Job (Calling a PowerShell function in a Start-Job script block when it's defined in the same script). Over the years I've had a number of emails from people asking how to use it to fix their own situations. In many of these cases there's been a simpler way to achieve what they wanted to do.

Sometimes a non-obvious solution using PowerShell basics can be much simpler. Patrick posted that he was having problems running a command against a list of servers and wanted to be able to skip those that time out. Using an injected function for this was a bit of overkill; I had a premade solution I use, which relies on Invoke-Command and is simpler:

$session = New-PSSession -ComputerName Server1, server2,server3 -ErrorAction SilentlyContinue
$ServiceName = 'MSSQL`$'
$timeout = 10
$remote = Invoke-Command -Session $session -ScriptBlock {Get-Service -Name *$using:servicename*} -AsJob 
$remote | Wait-Job  -Timeout $timeout
$output = $remote | Receive-Job 
$output | ForEach {"$($_.Name) on $($_.PsComputerName) is $($_.Status)"}

We set up new PSSessions using New-PSSession. I set ErrorAction to SilentlyContinue just in case a host isn't available for some reason (if I was being good I'd try/catch here).

As we're just using standard PowerShell functionality here with Get-Service there's no need to build a new function, we can just call it directly. By calling Invoke-Command against a session pointed at numerous hosts we let PowerShell handle all the connection management and just assume the command will be run against each host. If we were running against a lot of hosts we'd want to look at the -ThrottleLimit parameter to limit the number of concurrent hosts we're hitting, as sketched below. The one little trick here is the using scope modifier, which makes PowerShell pull in the variable defined in our main scope (gory details on scoping here).
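A quick sketch of what the throttled call might look like; the limit of 5 is just an example, the default for Invoke-Command is 32:

$remote = Invoke-Command -Session $session -ScriptBlock {Get-Service -Name *$using:ServiceName*} -AsJob -ThrottleLimit 5   # only touch 5 hosts at a time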

As we called Invoke-Command with the -AsJob switch we can now treat it as a job, so we can use the PowerShell job cmdlets to manage it. The first thing we want to do is skip those jobs that are taking longer than our specified timeout value (in this case 10 seconds). So we pass our Invoke-Command job into Wait-Job with a -Timeout parameter. PowerShell will now keep an eye on each job and drop those exceeding our timeout limit.

Once we've got all the jobs that met our time limit we grab the output using Receive-Job, and then just process it like any other PowerShell object.

As we’ve removed a lot of complexity it’s much easier to revise or reuse the framework at a later date.
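One optional extra, not in the original snippet: tidying up afterwards, so repeated runs don't leave stale jobs or sessions hanging around:

$remote | Remove-Job -Force    # remove the job, including any child jobs still running past the timeout
$session | Remove-PSSession    # close the remote sessions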

T-SQL Tuesday #97 – What I’ll be working to improve in 2018

This month Mala has asked us to post about our Learning Goals for 2018. Now, there's nothing like nailing your colours to the mast so you can come back in a year and see if you actually did something (though let's not talk about my 2017 exercise plans!), so here goes:

Tech Learning Goals

Networking

I read TCP/IP Illustrated years ago, so apart from IPv6 I'm pretty solid on the fundamentals. But the number of ways networks can be manipulated and managed has ballooned, and no matter where your data and applications are going to live in the future it's still going to be essential to know how they can be connected. So I'm definitely pencilling in some time with our Network Infrastructure team to get an overview of what's out there, and what they see as the future. I know I can bingle that sort of stuff, but it's even better to get some professional advice.

Then I'll be hitting Pluralsight and other video resources, and looking forward to building some very different labs to the ones I'm used to. For once the servers and the data won't be the focus.

The aim is to sit in a meeting with our network team and keep up with everything! Though I might draw the line at memorising the Cisco model numbers like they seem to have.

More Automation

I've increasingly automated a lot of my repetitive work tasks, but I need to automate a lot of the routine work that's come in and filled up my 'spare' time. Now, I'm as guilty as the next IT pro of sticking to the easy fix for automation and to the technologies I know. Well, for 2018 I'm going to branch out and use automation to force me to learn some new technologies. So rather than sticking to PowerShell I'll be looking at Python and Go in more depth, looking at more serverless cloud technologies for data transformation and endpoints rather than spinning up more VMs or containers, and trying to let other cloud services take the strain rather than me.

If I get this right and make progress, then the other things on this list will come to pass. Freeing up time to work on the good stuff will be the biggest pay off.

New Personal skill

Improve my Writing

I’m not a natural writer by any stretch of the imagination, my natural inclination is numbers, equations and strange pencil diagrams that only I can decode. This tends to limit me when writing documentation, and blog posts take a disproportionate amount of time to write.

So I'm going to head to the Nottingham library to get some books on writing. And to make sure I put it into practice, I'm going to be keeping a journal from January 1st (don't worry, it won't be online!). Hopefully forcing myself to write every day without an audience will help me with writing for an audience. So I've got 2 weeks to find a nice pen and notebook to try and inspire my inner Samuel Pepys!

New non tech skill

Get better at marketing myself

In the classic British manner I'm happy to hide my lamp under a bushel. So for next year I'm going to try to put myself out there. I'm hoping this will be helped along by improving my writing, as above. I'm going to make more use of asking other tech friends to review session submissions and give useful feedback, and I'll be asking organisers for feedback on my submissions, including when I succeed, so I can build on the good things as well as fixing the bad.

And I'm hoping that the increased blog posting from my better writing skills means I'll have more content to share here, and that the new tech skills give me more interesting things to write about.

The metrics for this one will be the number of events I present at in 2018 and the number of visitors to my blog.

Conclusions

To me that looks like a good solid set of skills to work on. They complement each other, so an improvement in one should help with another one. They can also be worked on at different rates, which stops things getting stressful if something has to be put aside due to other pressures.

pywinrm install fail – error in cryptography setup command: Invalid environment marker: python_version < '3'

I've been playing around with Ansible again for an upcoming project. It involves Windows, DSC and SQL Server as well, so I needed WinRM available to do anything. Since I last played with Ansible, installing pywinrm seems to have got a lot trickier. I followed a number of online guides and kept banging up against this error:

Error

Downloading/unpacking cryptography>=1.3 (from requests-ntlm>=0.3.0->pywinrm)
  Downloading cryptography-2.1.4.tar.gz (441kB): 441kB downloaded
  Running setup.py (path:/tmp/pip_build_root/cryptography/setup.py) egg_info for package cryptography
    error in cryptography setup command: Invalid environment marker: python_version < '3'
    Complete output from command python setup.py egg_info:
    error in cryptography setup command: Invalid environment marker: python_version < '3'

----------------------------------------
Cleaning up...
Command python setup.py egg_info failed with error code 1 in /tmp/pip_build_root/cryptography
Storing debug log for failure in /root/.pip/pip.log

This is on the Ubuntu/trusty64 image I’m using to build my control node with Packer.

After a couple of hours of bingling around I tracked it down to an issue with Python's setuptools package, which then meant an upgrade was needed to pip itself. The set of steps needed in the end is:

Solution

sudo apt-get install libffi-dev libssl-dev -y
sudo apt-get install python-pip -y
sudo -H pip install --upgrade setuptools
sudo -H pip install --upgrade pip
sudo -H pip install pywinrm
sudo -H pip install ansible

And this is now giving me a solid install each time, apart from a couple of deprecation warnings that aren't going to affect my needs for a while.

Hopefully this will save someone else a day running around various github and support sites.

Nottingham and East Midlands PowerShell User Group

Interested in any form of PowerShell usage and based around the East Midlands and Nottingham? Then this could be the group for you. We're looking to cover anything that uses (or can be used with) PowerShell, so topics that are fair game include:

  • AD Management
  • Scripting
  • Source Control
  • DevOps
  • Azure/AWS/Cloud Provider of your choice
  • Exchange management
  • SQL Server
  • Pester testing
  • Continuous Integration
  • .Net Internals
  • Generally anything that would interest someone using PowerShell or give their career a boost!

Nothing is set in stone yet as we want to get some feedback from potential members. There's a date booked for the 8th February (Kick Off meeting), but what happens is up to you.

Would you prefer a traditional usergroup with booked speakers given presentations in a formal setting, or something more informal like a roundtable/whiteboard sessions? Or perhaps half and half?

We’d love to know what you’d like to see or learn. So please either drop a comment below or sign up for the Kick Off Meeting and give us feedback on Meetup.

Complex SQL Server restore scenarios with the dbatools Restore-DbaDatabase pipeline

No matter how hard the dbatools team try, there's always someone who wants to do things we'd never thought of. This is one of the great things about getting feedback direct from a great community. Unfortunately a lot of these ideas are either too niche to implement, or would need a lot of complex code for a single use case.

As part of the Restore-DbaDatabase stack rewrite, I wanted to make it easier for users to get their hands dirty within the Restore stack. Not necessarily by diving into the core code and the world of GitHub Pull Requests, but by manipulating the data flowing through the pipeline using standard PowerShell techniques, all the while letting our code do the heavy lifting.

So, below the fold we'll be looking at some examples of how you can start going to town with your restores.

Read More

Deep copy arrays in PowerShell

Just a quick post for something I had to deal with for another post I'm writing. I wanted to do some work where I'd reuse a base array over a number of passes in a loop, and didn't want any change made in one pass impacting on later iterations. If you've tried copying an array containing arrays or other objects in PowerShell in the usual manner then you'll have come across this problem:

Copying an array of arrays fails in PowerShell

As is common in .NET, the copy is a 'copy by reference', so you don't get a nice shiny new independent array to play with. All that's copied is the reference to the original's place in memory, so any changes to either object affect both, as both variable names are looking at the same piece of memory. This is nice and efficient in terms of storage and speed of copying, but not great for my purposes.
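A minimal sketch of the problem; even the array's Clone() method only takes a shallow copy, so when the array contains other arrays the inner ones are still shared:

$arr1 = @(@(1,2), @(3,4))
$arr2 = $arr1.Clone()      # shallow copy: the inner arrays are still the same objects
$arr2[0][0] = 99
$arr1[0][0]                # returns 99 as well, both variables point at the same inner array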

There are various workarounds kicking around if you're using simple arrays, but they tend to break down when you've got arrays that contain arrays or other PowerShell objects. My method for copying them is a little down and dirty, but it works 95% of the time for what I want. The trick is to serialize the object, and then deserialize it into the new one:

Deep copy a PowerShell array using serialization

And, voila we have the outcome I wanted. Just to make the line of code easier to read, here it is:

$arr3 = [Management.Automation.PSSerializer]::DeSerialize([Management.Automation.PSSerializer]::Serialize($arr2))

Now, I mentioned up front that this works ~95% of the time. The times it doesn't work for me are when the underlying object type doesn't serialize nicely. The most common one I come across is BigInt: it 'deserializes' back in as a non-integer type and then won't play nicely when compared to other 'real' BigInt values, so make sure to check you have the values you think you should.
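If you want to confirm whether a particular type survives the round trip, a quick check along these lines does the job (BigInt used purely as the example type):

$original = [bigint]::Parse('12345678901234567890')
$copy = [Management.Automation.PSSerializer]::Deserialize([Management.Automation.PSSerializer]::Serialize($original))
$copy.GetType().FullName   # compare with $original.GetType().FullName to spot a type that didn't survive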
