Musings of a Data professional

Stuart Moore

Month: April 2013

Simple plot of perfmon data in R

In the last part (here) we set up a simple R install so we could look at analysing and plotting perfmon data in R. In this post we’ll look at creating a very simple plot from a perfmon CSV. In later posts I’ll show some examples of how to clean the data up, how to pull it from a SQL Server repository, how to combine datasets for analysis, and some of the other interesting things R lets you do.

So let’s start off with some perfmon data. Here’s a CSV (R-perfmon) that contains the following counters:

  • Physical Disk C:\ Average Disk Queue Length
  • Physical Disk C:\ Average Disk Read Queue Length
  • Physical Disk C:\ Average Disk Write Queue Length
  • % Processor Time
  • Processes
  • Processor Queue Length

Perfmon was set to capture data every 15 seconds.

Save this somewhere; for the purposes of the scripts I’m using, I’ll assume you’ve put it in the folder c:\R-perfmon.

Fire up your R environment of choice; I’ll be using RStudio. Opening a new instance, I’m greeted by a clean workspace:

On the left-hand side I’ve got the R console where I’ll be entering the commands, and on the right various panes that let me explore the data and graphs I’ve created.

As mentioned before, R is a command line language; it’s also cAse Sensitive. So if you get any strange errors while running through this example it’s probably worth checking exactly what you’ve typed. If you do make a mistake you can use the cursor keys to scroll back through commands, and then edit the mistake.

So the first thing we need to do is to install some packages. Packages are a means of extending R’s capabilities. The two we’re going to install are ggplot2, which is a graphing library, and reshape2, which is a library that allows us to reshape the data (basically a pivot in SQL Server terms). We do this with the following command:

install.packages(c("ggplot2","reshape2"))

You may be asked to pick a CRAN mirror; select the one closest to you and it’ll be fine. Assuming everything goes well, you should be informed that the packages have been installed, and they’ll now be available the next time you use R. To load them into your current session, you use the commands:

library("ggplot2")
library("reshape2")

So that’s all the basic housekeeping out of the way, now let’s load in some perfmon data. R handles data as vectors or dataframes. As we have multiple rows and columns of data we’ll be loading it into a dataframe.

data <- read.table("C:\\R-perfmon\\R-perfmon.csv",sep=",",header=TRUE)

If everything’s worked, you’ll see no response. What we’ve done is to tell R to read the data from our file, telling it we’re using ',' as the separator and that the first row contains the column headers. R uses '<-' as its assignment operator.
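If you want a quick look at what’s been loaded before summarising it, a couple of optional base R commands will do the job (purely a sanity check, not needed for the rest of the walkthrough):

head(data, 3)   # show the first 3 rows
str(data)       # list every column and the type R has given it

head confirms the rows look like your CSV, and str will show that the first column hasn’t been recognised as a datetime yet, which is something we fix further down.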

To prove that we’ve loaded up some data we can ask R to provide a summary:

summary(data)
    X.PDH.CSV.4.0...GMT.Daylight.Time...60.
 04/15/2013 00:00:19.279:   1
 04/15/2013 00:00:34.279:   1
 04/15/2013 00:00:49.275:   1
 04/15/2013 00:01:04.284:   1
 04/15/2013 00:01:19.279:   1
 04/15/2013 00:01:34.275:   1
 (Other)                :5754
 X..testdb1.PhysicalDisk.0.C...Avg..Disk.Queue.Length
 Min.   :0.000854
 1st Qu.:0.008704
 Median :0.015553
 Mean   :0.037395
 3rd Qu.:0.027358
 Max.   :4.780562

 X..testdb1.PhysicalDisk.0.C...Avg..Disk.Read.Queue.Length
 Min.   :0.000000
 1st Qu.:0.000000
 Median :0.000980
 Mean   :0.017626
 3rd Qu.:0.003049
 Max.   :4.742742

 X..testdb1.PhysicalDisk.0.C...Avg..Disk.Write.Queue.Length
 Min.   :0.0008539
 1st Qu.:0.0076752
 Median :0.0133689
 Mean   :0.0197690
 3rd Qu.:0.0219051
 Max.   :2.7119064

 X..testdb1.Processor._Total....Processor.Time X..testdb1.System.Processes
 Min.   :  0.567                               Min.   : 77.0
 1st Qu.:  7.479                               1st Qu.: 82.0
 Median : 25.589                               Median : 85.0
 Mean   : 25.517                               Mean   : 87.1
 3rd Qu.: 38.420                               3rd Qu.: 92.0
 Max.   :100.000                               Max.   :110.0

 X..testdb1.System.Processor.Queue.Length
 Min.   : 0.0000
 1st Qu.: 0.0000
 Median : 0.0000
 Mean   : 0.6523
 3rd Qu.: 0.0000
 Max.   :58.0000

And there we are, some nice raw data, with some interesting statistical information given for free as well. Looking at it we can see that our maximum Disk queue lengths aren’t anything to worry about, and even though our Processor peaks at 100% utilisation, it spends 75% of the day at less than 39% utilisation. And the average queue lengths are nothing to worry about either.
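If you want exact percentiles rather than the quartiles summary prints, quantile will give you them. A minimal sketch; I’m picking the processor time column by position here (column 5 in this CSV, going by the counter list above), so check names(data) if your file is laid out differently:

quantile(data[[5]], probs = c(0.75, 0.95, 0.99))

The 75th percentile is the same figure as the 3rd quartile in the summary; the 95th and 99th make handy extra baselines.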

But let’s get on with the graphing. At the moment R doesn’t know that column 1 contains DateTime information, and the names of the columns are rather less than useful. To fix this we do:

cname<-c("Time","Avg Disk Queue Length","Avg Disk Read Queue Length","Avg Disk Write Queue Length","Total Processor Time%","System Processes","System Process Queue Length")
colnames(data)<-cname
data$Time<-as.POSIXct(data$Time, format='%m/%d/%Y %H:%M:%S')
mdata<-melt(data=data,id.vars="Time")

First we build up an R vector of the column names we’d rather use; "c" is the constructor that lets R know the data that follows is to be interpreted as a vector. Then we pass this vector as an input to the colnames function, which renames our dataframe’s columns for us.

On line 3 we convert the Time column to a datetime format using the as.POSIXct function and passing in a formatting string.
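To see the format string in action on its own, you can convert a single timestamp from the file (the value here is the first one shown in the summary output above; the exact result printed will depend on your timezone):

as.POSIXct("04/15/2013 00:00:19.279", format='%m/%d/%Y %H:%M:%S')
# e.g. "2013-04-15 00:00:19"; the fractional seconds after %S are simply ignored

%m/%d/%Y matches the month/day/year ordering perfmon used in this capture, so adjust the format string if your regional settings write dates differently.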

On line 4 we melt our data. Basically we’re turning our data from this:

Time                 Variable A   Variable B   Variable C
19/04/2013 14:55:15  A1           B2           C9
19/04/2013 14:55:30  A2           B2           C8
19/04/2013 14:55:45  A3           B2           C7

to this:

ID                   Variable     Value
19/04/2013 14:55:15  Variable A   A1
19/04/2013 14:55:30  Variable A   A2
19/04/2013 14:55:45  Variable A   A3
19/04/2013 14:55:15  Variable B   B2
19/04/2013 14:55:30  Variable B   B2
19/04/2013 14:55:45  Variable B   B2
19/04/2013 14:55:15  Variable C   C9
19/04/2013 14:55:30  Variable C   C8
19/04/2013 14:55:45  Variable C   C7
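If you’d like to see melt in action on something smaller first, here’s a minimal sketch using a tiny made-up dataframe (the values are purely illustrative):

wide <- data.frame(Time = c("14:55:15", "14:55:30"),
                   CPU  = c(10, 20),
                   Disk = c(1, 2))
melt(wide, id.vars = "Time")
#       Time variable value
# 1 14:55:15      CPU    10
# 2 14:55:30      CPU    20
# 3 14:55:15     Disk     1
# 4 14:55:30     Disk     2

Every non-id column gets stacked into variable/value pairs, which is exactly what happens to the six counter columns in mdata.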

This allows us to very quickly plot all the variables and their values against time without having to specify each series.

ggplot(data=mdata,aes(x=Time,y=value,colour=variable))+
geom_point(size=.2) +
stat_smooth() +
theme_bw()

This snippet introduces a new R technique. By putting + at the end of a line you let R know that the command is spilling over to the next line, which makes complex commands like this easier to read and edit. Breaking it down line by line:

  1. tells R that we want to use ggplot to draw a graph. The data parameter tells ggplot which dataframe we want to plot. aes lets us pass in aesthetic information; in this case we tell it that we want Time along the x axis, the value of the variable on the y axis, and to group/colour the values by variable.
  2. tells ggplot how large we’d like the data points on the graph.
  3. tells ggplot that we want it to draw a best-fit curve through the data points.
  4. tells ggplot which theme we’d like it to use. This is a simple black and white theme.

Run this and you’ll get:

Perfmon data plotted on graph using R

The “banding” in the data points for the System processes count is due to the few discrete values that the variable takes.

The grey banding around the fitted line is a 95% confidence interval. The wider it is, the greater the variance of values at that point, but in this example it’s fairly tight to the plot so you can assume that it’s a good fit to the data.
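If you’d rather lose the confidence band altogether (a purely cosmetic choice), stat_smooth takes an se argument; a small variation on the snippet above:

ggplot(data=mdata,aes(x=Time,y=value,colour=variable))+
geom_point(size=.2) +
stat_smooth(se = FALSE) +
theme_bw()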

We can see from the plot that the server is ticking over first thing in the morning, then we see an increase in load as people start getting into the office from 07:00 onwards. There appears to be a drop off over lunch as users head away from their clients, and then load picks up through the afternoon. It stays high through the rest of the day, and in this example it ends higher than it started because there’s an overnight import running that evening; if you hadn’t known that, this plot would probably have raised questions and you’d have had to investigate. Looking at the graph it appears that all the load indicators (Processor Time % and System Processes) follow each other nicely; later in this series we’ll look at analysing which one actually leads the other.

You can also plot the graph without all the data points for a cleaner look:

ggplot(data=mdata,aes(x=Time,y=value,colour=variable))+
stat_smooth() +
theme_bw()

R plot of perfmon without points

So putting all that together we have the following script which can just be cut and pasted across into R:

install.packages(c("ggplot2","reshape2"))
library("ggplot2")
library("reshape2")
data <- read.table("C:\\R-perfmon\\R-perfmon.csv",sep=",",header=TRUE)
cname<-c("Time","Avg Disk Queue Length","Avg Disk Read Queue Length","Avg Disk Write Queue Length","Total Processor Time%","System Processes","System Process Queue Length")
colnames(data)<-cname
data$Time<-as.POSIXct(data$Time, format='%m/%d/%Y %H:%M:%S')
mdata<-melt(data=data,id.vars="Time")
ggplot(data=mdata,aes(x=Time,y=value,colour=variable))+
geom_point(size=.2) +
stat_smooth() +
theme_bw()
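One optional extra: if you need to hand the graph to someone else, ggplot2’s ggsave will write the most recent plot out to disk. The file name and dimensions here are just examples:

ggsave("C:\\R-perfmon\\perfmon-plot.png", width = 10, height = 6)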

Plotting/Graphing Perfmon data in R – Part 1

Perfmon is a great tool for getting lots of performance information about Microsoft products. But once you’ve got it, it’s not always the easiest to work with, or to present to management. As a techy you want to be able to use the data to compare systems across time, or even to cross-correlate performance between separate systems.

And management want graphs. Well, we all want graphs. For many of my large systems I now know the shape of graph that plotting CPU against time should give me, or the amount of throughput per hour, and it only takes me a few seconds to spot when something isn’t right rather than poring through reams of data.

Perfmon data plotted on graph using R

There are a number of ways of doing this, but my preferred method is to use a tool called R. R is a free programming language aimed at statistical analysis, which means it’s perfectly suited to working with the sort of data that comes out of perfmon, and its power means that when you want to create baselines or compare days it’s very useful. It’s also very easy to write simple scripts in R, so all the work can be done automatically each night.

It’s also one of the tools of choice for statistical analysis in the Big Data game, so getting familiar with it is a handy CV point for SQL Server DBAs.

Setting R up is fairly straightforward, but I thought I’d cover it here just so the next parts of this series follow on.

The main R application is available from the CRAN (Comprehensive R Archive Network) repository – http://cran.r-project.org/mirrors.html. From there, pick a local mirror, click the Download for Windows link, select the “base” subdirectory, and then “Download R 3.0.0 for Windows” (version correct at the time of writing, 18/04/2013).

You now have an installer; double click on it and accept all the defaults. If you’re running a 64-bit OS then it will by default install both the 32-bit and 64-bit versions. You’ll now have an R folder on your Start Menu, and if you select the R shortcut you’ll end up at the R console:

R console on Windows

R is a command line language out of the box. You can do everything from within the console, but it’s not always the nicest place. So the next piece of software to obtain and install is RStudio, which can be downloaded from here, and installed using the defaults. This is a nicer place to work:

R studio analysing perfmon data
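Before moving on it’s worth a quick sanity check that both the console and the graphics device are working. Nothing perfmon-specific yet, just a couple of throwaway commands:

R.version.string            # prints the installed R version
plot(rnorm(100), type="l")  # draws a simple line chart of random data (in RStudio it appears in the Plots pane)

If a version string comes back and a chart appears, everything is installed correctly.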

You should now have a working R setup. In the next part we’ll load up some perfmon data and do some work with it.

Nottingham and Beeston Pub Recommendations for SQLBits


OK, so I thought it might be handy for SQLBits 2013 attendees to have some guidance to the better pubs in Nottingham and Beeston.

My idea of a better pub is a good wide range of well kept beer, plenty of room, friendly knowledgeable staff, food available, and somewhere I wouldn’t mind spending a night sitting chatting even if I wasn’t drinking their beer.

If you’re staying at the East Midlands Conference Centre, then you are on the Beeston side of town. There are some very good pubs that side, but they are about a 30 minute walk from the conference centre or a 10 minute taxi ride (probably about £6 each way). If you do want to venture out, then the best two are:

The Victoria Hotel

Right behind Beeston train station. Great pub with a wide range of ales, ciders and whiskeys. Also has very good food. Can get quite busy, though has a large covered outside area with heating.

The Crown

Twice winner of the East Midlands CAMRA pub of the year, and pretty deservedly so.

If you’re staying in Nottingham City centre then there’s no end of choice. My personal favourites are:

Kean’s Head

Smallish pub in the Lace Market area, by St Mary’s church. Can get very busy at office closing time, but thins out once they’ve had a pint. Good range of ales on, and a lot of bottles as well. Plenty of European lagers. Good food as well.

Lincolnshire Poacher

Little bit out of town up Mansfield Road, but worth the trip (the Golden Fleece on the way up can be worth stopping for one if you need a breather). Another former CAMRA pub of the year winner (we seem to have quite a few in Nottingham). Always well kept beers, and a huge whisky selection. Another pub with a very good food menu. And it is holding a mini beer festival the weekend of SQLBits so there’ll be even more choice than usual.

The Round House

Built into what remains of the old Nottingham city centre hospital. Unusual in that I’ve never been in a completely circular pub anywhere else. Good range of beers, good tasting platters (3x 1/3 pints plus bar snacks). Food OK, but standard pub options done well rather than anything exciting.

Hand and Heart

A little wander up Derby Road past the cathedral, but worth it for a pub built back into the caves of Nottingham. Always well kept beers with a good range of guests. Restaurant style eating available.

Nottingham CAMRA keeps a good list of other pubs as well. And while SQLBits is on, they’ll be hosting their annual Mild in May event, where a larger number of pubs will be making sure to carry some of the finer examples of this beer style.

If you want any other recommendation, or something close to a particular hotel, just drop me a line and I’m sure I can come up with something for you.

T-SQL Tuesday #41 – How I learnt to love presenting, got bored, and then rekindled the flame

T-SQL Tuesday

Bob Pusateri (@sqlbob) is hosting this month’s T-SQL Tuesday, and the question is about why you love presenting.

This is the story of how I came to love presenting, how we went through a rocky patch, but patched it up in the end.

Years ago I was getting swamped by work. Everywhere I looked there seemed to be another call coming in, another request for something to be done, another bit of housekeeping that was begging to get done, all of it landing on me and swamping me. End result: an unhappy DBA who didn’t feel as though he was developing any new skills or progressing his career.

Then I realised: hang on, I’m part of a team, why aren’t we sharing? So at a rare team meeting I asked around, and it turned out we all felt the same. While we all wrote our mandated documentation, no one really felt they could delve into someone else’s domain. So we decided the best way was to do presentations across the team, cross-training each other into our areas.

This was a revelation. Suddenly people were interested in helping out, and documentation didn’t seem quite so much of a chore as there was a real point to it. Team meetings changed from a dull routine to satisfy management into something to look forward to as a chance to learn something new. Everyone in the team learnt some new skills, and we became a much more efficient and cohesive team. And because we were efficient we had more time to learn new things for ourselves, so everyone was a winner. And I discovered I loved presenting. Seeing the light break as someone grasped a new concept, having to approach things from a different angle, having to break down concepts to their simplest levels to make sure I understood them; all of this was great.

But soon the shine wore off a bit. We’d passed on all the work-related information, and the excitement of presenting had worn off a bit as I knew my audience too well. With every presentation I knew the level to target, and even the best way to address a topic to specific team members. So me and my presenting stumbled around in the doldrums for a bit. I’d occasionally get excited when I learnt something new to me, but the thrill just wasn’t there anymore…

I’d been attending SQL user groups for a year or so, and loved seeing some of the top SQL Server gurus from around the world presenting to small groups and having to deal with such a wide range of audience knowledge and engagement. I mean: only a short slot to present something technical, to a group you’ve never met, when you’ve no idea if there’s going to be an expert in the crowd ready to pick you up on something, where there could be a large section who have no interest in your topic but you still need to win them over; this looked like just my sort of gig.

I’ve now presented at two SQL Server user groups. And each time I got that same feeling from years ago. The flame is back, and I’m constantly thinking about how to improve my presentations, or how I can build one out of things I’m currently working on. I’m also paying more attention to speakers’ presenting skills than I did before, trying to work out if there’s anything I could ‘borrow’ to improve my own. And I dig deeper into topics, because I want to be able to explain each oddity, or be ready for the off-kilter question from the audience.

In fact most of my training for the coming year is based around becoming a better presenter and teacher. I want to do more user group presentations, have submitted to a couple of SQL Saturday events, and want to try to do some of the larger conferences next year. I’m also working on an MCT, and am even considering taking some speaking classes.

So go on, get up on the stage. Presenting, whether to colleagues or a User Group could be just what you need to perk up your career or rekindle your passion for your job.

Migrating perfmon CSV files into SQL Server for analysis

Up to a certain size, and for quick sampling, keeping perfmon data in a CSV file makes life easy: it’s easily pulled into Excel, dumped into a pivot table, and then analysed and distributed.

But once the files get up to a decent size, or you’re working with multiple files, or you want to correlate multiple logs from different servers, then it makes sense to migrate them into SQL Server. Windows comes with a handy utility to do this, relog, which can be used to convert perfmon output to other formats, including SQL Server.

First off you’ll need to set up an ODBC DSN on the machine you’re running the import from. Nothing strange here, but you need to make sure you use the standard SQL Server driver.

Make sure you use the standard SQL Driver when setting up an ODBC connection for relog to load perflog data into SQL Server

If you use the SQL Server Native Client you’re liable to run into this uninformative error:

0.00%Error: A SQL failure occurred. Check the application event log for any errors.

and this equally unhelpful 3402 error in the Application Event log:

The informative 3402 message relog returns if you're using the wrong ODBC driver

Once you’ve got that set up, it’s just the syntax of the relog command to get through. The documentation says you need to use:

-o { output file | DSN!counter_log }

What it doesn’t say is that counter_log is the name of the database you want to record the data into. So assuming you want to migrate your CSV file, c:\perfmon\log.csv, into the database perfmon_data using the DSN perfmon_odbc, you’d want to use:

relog "c:\perfmon\log.csv" -f SQL -o SQL:perfmon_odbc!perfmon_data

Running that you should get a nice whirling % count as your records are loaded up to SQL Server.
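Once the data is in SQL Server you can pull it straight back into R for the kind of plotting covered in the posts above. A rough sketch using the RODBC package; the CounterData and CounterDetails table and column names below are the ones relog creates by default, but do check them against your own perfmon_data database before relying on this:

library(RODBC)   # install.packages("RODBC") if it isn't already installed
ch <- odbcConnect("perfmon_odbc")
cpu <- sqlQuery(ch, "SELECT cd.CounterDateTime, cd.CounterValue
                     FROM CounterData cd
                     JOIN CounterDetails det ON det.CounterID = cd.CounterID
                     WHERE det.CounterName = '% Processor Time'")
odbcClose(ch)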

Learning from more than just the content

While picking up new technical skills is the main reason for attending SQL Server events like User Groups and SQL Saturdays, there’s also another very good reason for attending. Watching some of the presenters just present is of great value.

All the speakers are obviously technically very good, but what really separates them from the other technical people present is their presentation styles. There’s a wide range on display, from the ultra focused business style talking about career enhancement to the enthusiastic geek motivator who’s all about getting you fired up about the newest tech.

But even with the different overall feel, there are still some common points that I aim to always incorporate into my presentations:

  • Be prepared. They make it look pretty effortless as they turn up and go. But that’s down to having rehearsed and practised the material, and knowing they have a backup should anything go wrong.
  • Engage with the audience. There isn’t a feeling of “you’ll sit there and watch and listen for the next 45 minutes”. They try to get the audience to interact with the material and think about how it would work for them.
  • Time management. They have the time built in to answer questions on the fly, but also know where they are in the presentation so they can politely say ‘talk later’ if they’re running behind. And from watching them do the same presentation to multiple audiences, they also know how to extend it if no one’s asking questions.
  • Clear, easy to read materials. Slides aren’t cluttered with logos or fancy backgrounds. And occasionally handouts for frequently referenced info; this is a great idea, and makes it easy to take something away at the end.

So that’s just 4 quick and easy things to incorporate into your own presenting then! But in my opinion they all tie together, and the main thing you can do to bring them out is PRACTICE.

I’m sure my cats know more about SQL Server than any other cats on the planet, but they do insist on sitting there watching me present to no one with slides on the wall of the home office. Luckily my wife will also put up with me doing it, and in fact will often be a source of good feedback. As a non-technical person, if she feels she’s sort of following the plot I’ve got the level right.

And it’s very rare that I’ll give the same presentation twice. Each time I present something I try to get feedback (and really appreciate it being given) and then use that to feed into the next presentation.
