Capturing Disk Space within SQL Server – 2000 to Current

Over the years, there have been many different ways to capture disk space within SQL Server. Some of them were a lot more “user friendly” than others, but in the end we just want to get the job done. Here are the ways I am aware of from over the years, along with some of the advantages and disadvantages of each.

SQL 2000-2005

In the SQL 2000 days, there were not a ton of options for the person who wanted to run something out of the SQL Agent and load the results into SQL Server. You really had just a couple of options:

  • xp_fixeddrives
/*Option 1 - the easy approach that does not give you quite as much information*/
CREATE TABLE dbo.FreeSpace
(
 id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
 ,DriveLetter VARCHAR(5)
 ,FreeSpaceMB INT
 ,DateCreated DATETIME DEFAULT GETDATE()
);
GO

And then you could run a process that would load this data into your tables periodically.

INSERT INTO dbo.FreeSpace(DriveLetter, FreeSpaceMB)
EXEC xp_fixeddrives;
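
To get this running periodically, you can wrap that insert in a SQL Agent job. The sketch below uses the msdb job procedures with a once-a-day schedule; the job name, schedule, and database name are placeholders of my own, not part of the original setup:

/*Minimal SQL Agent job sketch - job name, schedule and database name are assumptions.*/
USE msdb;
GO
EXEC dbo.sp_add_job @job_name = N'Capture Free Space';
EXEC dbo.sp_add_jobstep @job_name = N'Capture Free Space',
 @step_name = N'Load xp_fixeddrives',
 @subsystem = N'TSQL',
 @database_name = N'YourAdminDB', /*wherever dbo.FreeSpace lives*/
 @command = N'INSERT INTO dbo.FreeSpace(DriveLetter, FreeSpaceMB) EXEC xp_fixeddrives;';
EXEC dbo.sp_add_jobschedule @job_name = N'Capture Free Space',
 @name = N'Daily',
 @freq_type = 4, /*daily*/
 @freq_interval = 1,
 @active_start_time = 0; /*midnight*/
EXEC dbo.sp_add_jobserver @job_name = N'Capture Free Space', @server_name = N'(LOCAL)';
GO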

This is not an awful option: it is easy to set up, runs quickly, and requires no real “coding” knowledge beyond basic T-SQL, but it gives you a very limited amount of information. You only see the drives that SQL Server is actually using, and you only see the free space on those drives. In many cases that is good enough, but it is certainly not fantastic.
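
Once the collection has been running for a while, a quick query against the table gives you the most recent reading per drive. This is just a minimal sketch against the dbo.FreeSpace table created above, not part of the original setup:

/*Latest free space reading per drive, from the collection table above*/
SELECT f.DriveLetter,
 f.FreeSpaceMB,
 f.DateCreated
FROM dbo.FreeSpace AS f
WHERE f.id IN (SELECT MAX(f2.id)
  FROM dbo.FreeSpace AS f2
  GROUP BY f2.DriveLetter)
ORDER BY f.DriveLetter;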

  • xp_cmdshell or command prompt options

There are a variety of ways to get the information this way, so I will only go through a very simple one just to give you a feel. It is nice in that you can find out both the total space and the free space, but it is a bit of a pain because of the way you have to roll through the available drive letters. If you need Total Space along with Free Space and you are still running SQL 2000 (which I sincerely hope you are not), then this will get it for you.

First, the setup: create our table to store this data.

/*Option 2 - More information but a big mess.*/
CREATE TABLE dbo.DriveSpace
(
 id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
 ,DriveLetter VARCHAR(5)
 ,FreeSpaceGB INT
 ,TotalSpaceGB INT
 ,DateCreated DATETIME DEFAULT GETDATE()
);
GO

Now we go through and get all of this information. This is a multi-step process, so let me walk you through it before putting in the fairly lengthy script. The first thing it has to do is grab the drives that are on the server. This is pre-PowerShell (remember, I am assuming a SQL 2000 machine), so we have to do it with WMI calls through xp_cmdshell, and standard security rules and permissions apply. After we have grabbed those disks, we then need to go get the space data (free and total) for them. That is also done through xp_cmdshell, specifically via fsutil. Once we have that information, we have to parse through it to pull out the relevant values so that we can finally insert them into our DriveSpace table. Long explanation, so here is the code; I have tried to comment it in a way that separates those parts out as well as possible.

/*Create the table to store the local drives on the server*/
IF OBJECT_ID('tempdb..#drives') IS NOT NULL 
 DROP TABLE #drives;
GO
CREATE TABLE #drives
(
 DriveName VARCHAR(200)
);
GO
/*Get the list of local drives*/
INSERT INTO #drives
EXEC xp_cmdshell 'wmic logicaldisk get description,name | findstr /C:"Local"';
GO
/*Create the table to store the holding data of the drive information. Not well formed.*/
IF OBJECT_ID('tempdb..#diskspace_hold') IS NOT NULL
 DROP TABLE #diskspace_hold;
GO
CREATE TABLE #diskspace_hold
( ColumnOutput VARCHAR(200)
);
GO
/*Same as above, but with drive associated to it.*/
IF OBJECT_ID('tempdb..#diskspace') IS NOT NULL
 DROP TABLE #diskspace;
GO
CREATE TABLE #diskspace
( Drive VARCHAR(5)
 ,ColumnOutput VARCHAR(200)
);

GO
DECLARE @drive VARCHAR(2)
 ,@cmd VARCHAR(2000);
/*First get a distinct list of the drives*/
DECLARE csr CURSOR FOR
SELECT DISTINCT LEFT(LTRIM(REPLACE(DriveName, 'Local Fixed Disk', '')), 2) AS DriveLetter
FROM #drives
WHERE DriveName LIKE 'Local Fixed Disk%'

OPEN csr
FETCH NEXT FROM csr INTO @drive

WHILE @@FETCH_STATUS = 0
BEGIN
/*For each drive grab the disk space (free and total) to load into our table for parsing*/
SET @cmd = 'fsutil volume diskfree ' + @drive
PRINT @cmd
TRUNCATE TABLE #diskspace_hold
INSERT INTO #diskspace_hold
 ( ColumnOutput )
EXEC xp_cmdshell @cmd;

/*Load into our table with the drive associated to it*/
INSERT INTO #diskspace
 ( Drive, ColumnOutput )
SELECT @drive, 
 ColumnOutput 
FROM #diskspace_hold AS dh

FETCH NEXT FROM csr INTO @drive;
END
CLOSE csr;
DEALLOCATE csr;
GO
DECLARE @drive VARCHAR(5),
 @gbfree int,
 @gbtotal int;
/*Now that we have our drives with disk space data, parse it out to a usable integer style format*/
DECLARE csr CURSOR
FOR
 SELECT DISTINCT
 d.Drive
 FROM #diskspace AS d;

OPEN csr;
FETCH NEXT FROM csr INTO @drive;

WHILE @@FETCH_STATUS = 0
BEGIN

 /*Parse out everything after the colon so we do not depend on the exact spacing in the fsutil output*/
 SELECT @gbfree = CAST(LTRIM(RTRIM(SUBSTRING(d.ColumnOutput, CHARINDEX(':', d.ColumnOutput) + 1, 50))) AS BIGINT) / 1073741824
 FROM #diskspace AS d
 WHERE d.ColumnOutput LIKE 'Total # of free bytes%'
 AND d.Drive = @drive;

 SELECT @gbtotal = CAST(LTRIM(RTRIM(SUBSTRING(d.ColumnOutput, CHARINDEX(':', d.ColumnOutput) + 1, 50))) AS BIGINT) / 1073741824
 FROM #diskspace AS d
 WHERE d.ColumnOutput LIKE 'Total # of bytes%'
 AND d.Drive = @drive;
 
 INSERT INTO dbo.DriveSpace
 (
 DriveLetter,
 FreeSpaceGB,
 TotalSpaceGB,
 DateCreated
 )
 SELECT 
 @drive, 
 @gbfree,
 @gbtotal,
 GETDATE();



 FETCH NEXT FROM csr INTO @drive;
END

CLOSE csr;
DEALLOCATE csr;
GO


SQL 2008

To begin with, the methods mentioned above will work for SQL 2008 as well. But SQL 2008 added PowerShell as a SQL Agent step option, which opened things up a bit for those wanting to capture disk space in a much cleaner way than what I have above. A fair number of people have put together scripts to do this, and the one I prefer is the one the Scripting Guy blogs about. Even better, it goes well beyond just capturing disk space; it shows you how to capture data in general with PowerShell. It is an excellent read, and he does it a lot better than I could (especially since it is the method I used with SQL 2008 anyway). Plus it can cross servers, if you want a central repository.
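
If you do go the PowerShell route and want that central repository, the only T-SQL piece you need is a destination table for the script to write into. The table below is a minimal sketch of my own; the name and columns are assumptions, not something taken from the Scripting Guy post:

/*Hypothetical central repository table for a PowerShell collector to write into.
 The name and columns here are my own assumptions.*/
CREATE TABLE dbo.ServerDiskSpace
(
 id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
 ,ServerName VARCHAR(128)
 ,DriveLetter VARCHAR(5)
 ,FreeSpaceGB DECIMAL(18,2)
 ,TotalSpaceGB DECIMAL(18,2)
 ,DateCreated DATETIME DEFAULT GETDATE()
);
GO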


SQL 2008 R2 and Higher

Once we got to SQL 2008 R2, everything got even simpler for those who just want to grab this data and store it locally. I first read about this at SQLPerformance.com in a post by Erin Stellato about proactive monitoring. It just does not get easier than this.

SELECT DISTINCT
 vs.volume_mount_point AS [Drive],
 vs.logical_volume_name AS [Drive Name],
 vs.total_bytes/1024/1024 AS [Drive Size MB],
 vs.available_bytes/1024/1024 AS [Drive Free Space MB]
FROM sys.master_files AS f
CROSS APPLY sys.dm_os_volume_stats(f.database_id, f.file_id) AS vs
ORDER BY vs.volume_mount_point;
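
And if you want history rather than a point-in-time snapshot, the same pattern from the earlier sections still works. Here is a minimal sketch; the table name and column sizes are my own assumptions:

/*A simple history table for the volume stats output. Name and sizes are assumptions.*/
CREATE TABLE dbo.VolumeSpaceHistory
(
 id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
 ,Drive NVARCHAR(512)
 ,DriveName NVARCHAR(512)
 ,DriveSizeMB BIGINT
 ,DriveFreeSpaceMB BIGINT
 ,DateCreated DATETIME DEFAULT GETDATE()
);
GO

INSERT INTO dbo.VolumeSpaceHistory (Drive, DriveName, DriveSizeMB, DriveFreeSpaceMB)
SELECT DISTINCT
 vs.volume_mount_point,
 vs.logical_volume_name,
 vs.total_bytes/1024/1024,
 vs.available_bytes/1024/1024
FROM sys.master_files AS f
CROSS APPLY sys.dm_os_volume_stats(f.database_id, f.file_id) AS vs;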

So that is my list of ways to capture drive information from within SQL Server, from SQL 2000 to the present. The approach has morphed fairly drastically and gotten much easier over the years.


Migrating to Azure SQL Database – System.OutOfMemory

Have you ever gone through the process of importing your newly created dacpac into Azure SQL Database and after many minutes or hours of waiting, you get a failure of type

 Exception of type 'System.OutOfMemoryException' was thrown. (Microsoft.Data.Tools.Schema.Sql) 

Well, I did, and it was a very large pain to debug.  Here is where I ended up.

Initial Setup

To start with, the dacpac was originally created on a SQL Server 2012 RTM instance (no SPs or CUs applied). The dacpac itself seemed fine, because I was able to load it into my own local SQL Server 2014 instance without any problem, but for whatever reason the same would not work when pointing to our Azure SQL DB instance. It is worth noting that the process loaded the dacpac from local storage, not from Azure Storage.

First Attempt

My first thought was to see how things would go if the dacpac came from SQL 2014 instead: create the dacpac from the database on my local instance (which had been created from the original dacpac anyway) and try that. So I did, and it worked. That seemed very nice, but I have absolutely no love for the idea of having to

  • Create dacpac on SQL 2012 instance
  • Install dacpac on SQL 2014 instance
  • Create new dacpac with the same data from SQL 2014 instance
  • Load dacpac to Azure SQL DB

This just did not seem like fun to me, but given the scenario it could be a reasonable workaround (we do not have to do this often). However, when taking this same approach with a different and larger database (around 8 GB, which is small to me in the real world but seemingly large for Azure SQL DB), the same process failed anyway. So this was not foolproof.

Next Attempt

Since the old standard dacpac did not work in all cases, I figured I would try one of the many other solutions that come up when searching for issues of this nature. One of the first was the Azure Migration Wizard on CodePlex, so I thought I would give it a shot. The process itself is pretty self-explanatory once you have downloaded the application, and I selected the following for my parameters:

  • On the opening screen I selected “Analyze/Migrate” and chose the “Database” button. For the Target Server I selected “Azure SQL Database”.
  • Clicking Next brought me to a standard server connection dialog, where I entered the name of my local server and the database I had previously loaded the dacpac into.
  • On the choose objects screen, I chose “Script All Database Objects”.
  • I then had it generate the script, and when it asked for the credentials to my Azure SQL Database I entered them and connected.
  • From here the application did its thing. Several hours later, everything completed without a problem (no out of memory exception with this application, at least not in my experience so far).

This seemed to work great. The application appears to do something similar to a bacpac, inasmuch as it appears to call bcp under the covers, to process the commands in a similar fashion, and to do basically the same kind of work. But it works, and so far it has worked for me in each of my scenarios.

So if you come across this issue, this would be my recommendation in terms of trying something new.  It is easy, free and seems to work.


Unused Indexes and How Much Am I Saving?

Every so often, I get the itch to see which indexes on our production systems have not been hit since the usage stats were last reset. I have a script that I have been using for years, and while there are a bunch of them out there, the one I have always used for my own purposes is the one Pinal Dave posted quite a few years ago. As I was looking through it today, I figured it would be really nice to see how much disk space the unused indexes are taking up, so it seemed worth making some modifications to incorporate that data as well. You will note that I have made some slight modifications to Pinal’s script and added my own little section to grab the index size along with the data. For my purposes, I really only wanted to see those indexes that have had no reads against them at all since the last reboot, because those are the ones I want to dig into further.

As with all scripts, run at your own risk. And the results DO NOT MEAN that you should automatically drop these indexes; they just mean that you may want to look at them as POTENTIAL candidates to drop. Lots of things need to be taken into account before dropping indexes.

USE YourDBHere;
With unused 
as
(
 SELECT
 o.name AS TableName
 , i.name AS IndexName
 , i.index_id AS IndexID
 , dm_ius.user_seeks AS UserSeek
 , dm_ius.user_scans AS UserScans
 , dm_ius.user_lookups AS UserLookups
 , dm_ius.user_seeks + dm_ius.user_scans + dm_ius.user_lookups as reads
 , dm_ius.user_updates AS UserUpdates
 , p.TableRows
 , 'DROP INDEX ' + QUOTENAME(i.name) + ' ON ' + QUOTENAME(s.name) + '.' + QUOTENAME(OBJECT_NAME(dm_ius.OBJECT_ID)) AS 'drop statement'
 FROM sys.dm_db_index_usage_stats dm_ius
 INNER JOIN sys.indexes i 
 ON i.index_id = dm_ius.index_id AND dm_ius.OBJECT_ID = i.OBJECT_ID
 INNER JOIN sys.objects o 
 ON dm_ius.OBJECT_ID = o.OBJECT_ID
 INNER JOIN sys.schemas s 
 ON o.schema_id = s.schema_id
 INNER JOIN (SELECT SUM(p.rows) TableRows, p.index_id, p.OBJECT_ID
 FROM sys.partitions p GROUP BY p.index_id, p.OBJECT_ID) p
 ON p.index_id = dm_ius.index_id AND dm_ius.OBJECT_ID = p.OBJECT_ID
 WHERE OBJECTPROPERTY(dm_ius.OBJECT_ID,'IsUserTable') = 1
 AND dm_ius.database_id = DB_ID()
 AND i.type_desc = 'nonclustered'
 AND i.is_primary_key = 0
 AND i.is_unique_constraint = 0
), 
idxsize as
(
 SELECT i.[name] AS IndexName
 ,object_name(i.object_id) as tablename
 ,SUM(s.[used_page_count]) * 8 / 1048576. AS IndexSizeGB /*pages * 8 KB per page, divided by 1,048,576 KB per GB*/
 FROM sys.dm_db_partition_stats AS s
 INNER JOIN sys.indexes AS i ON s.[object_id] = i.[object_id]
 AND s.[index_id] = i.[index_id]
 GROUP BY object_name(i.object_id), i.[name]
)
Select *
from unused u
 join idxsize i 
 on u.indexname = i.indexname and u.tablename = i.tablename
where u.reads = 0
order by indexsizegb desc;
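
If all you want is a single number for how much space those unused indexes are taking up, a standalone roll-up like the one below will get you there. This is my own addition, not part of Pinal’s script, and the same caveats apply; it only counts nonclustered, non-unique, non-primary-key indexes with zero reads since the last restart:

/*Total space, in GB, used by nonclustered indexes with no reads since the last restart.
 My own addition - these are candidates to investigate, not an automatic drop list.*/
SELECT CAST(SUM(ps.used_page_count) * 8 / 1048576. AS DECIMAL(10,2)) AS UnusedIndexSpaceGB
FROM sys.indexes AS i
INNER JOIN sys.objects AS o
 ON o.[object_id] = i.[object_id]
INNER JOIN sys.dm_db_partition_stats AS ps
 ON ps.[object_id] = i.[object_id]
 AND ps.index_id = i.index_id
LEFT JOIN sys.dm_db_index_usage_stats AS ius
 ON ius.[object_id] = i.[object_id]
 AND ius.index_id = i.index_id
 AND ius.database_id = DB_ID()
WHERE o.[type] = 'U'
 AND i.type_desc = 'NONCLUSTERED'
 AND i.is_primary_key = 0
 AND i.is_unique_constraint = 0
 AND ISNULL(ius.user_seeks, 0) + ISNULL(ius.user_scans, 0) + ISNULL(ius.user_lookups, 0) = 0;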

FreeCon 2015

I was lucky enough to be one of the people selected to attend FreeCon 2015 from PASS Summit, and the experience was a very enlightening one that really challenged me to think about how I should move forward in my career. Brent Ozar and Kendra Little did an amazing job putting it together and gathering a group of people who were all engaged, passionate about SQL Server, and happy to share their experiences to help the group. Because of that, I wanted to share a few of the key things FreeCon did for me.

  1. Encouraged me to consider how I want to brand myself within the SQL community. A huge part of FreeCon was thought-provoking discussion about the types of words that you feel describe you and the types of words that make you stand out. These are things it had never really occurred to me to consider before, and it was challenging to come up with words that made sense for me. Being surrounded by SQL MVPs, some of the more active members of the SQL community, and some of the best minds in the SQL Server world really made that a challenge. And frankly, it was humbling, and a big part of why I chose to go with the humblesql name.
  2. Encouraged me to start blogging. In the group discussions, one of the things pointed out numerous times was that there are already so many fantastic SQL bloggers. However, in that same discussion nearly all of them said that we each have something to offer that community. This was very encouraging to me, and frankly it made sense. I cannot recall ever going to a SQL Server session, or reading a blog from one of the many excellent SQL bloggers, and feeling at the end that I had gotten nothing out of it. There are obviously times when you get the life-changing experience, and there are times when you just pull a single nugget out of a topic you already know very well, but in all cases I feel that if I go in with the right frame of mind I will learn something new. And if that is true, then there is no good reason why I should not blog as well and potentially add a small amount of value for someone else. So here we are.
  3. Time management! The last thing I want to discuss that I learned from FreeCon is how to manage my time wisely. One of the suggestions for those looking to blog, to speak, or to be active in the SQL community (or any community, really) was to make it a priority. Brent gave a number of guidelines for how long it takes to build certain kinds of blog posts, how long it takes to build certain types of sessions, and so on, but the thing that stuck out the most was the concept of making it part of your calendar. If you really do want to blog, set aside time to get it done on a weekly basis. I am still in the process of making this work for myself, and there are struggles, but I feel like in just these first few weeks I have learned a great deal and hopefully am getting better.

For those who get the chance to attend one of these in the future, I highly recommend it.   For those of you that were there and shared this experience, I would love to know what your key takeaways were.

Temp Tables vs Table Variables

If you Google “Temp Tables vs Table Variables” you will come up with a myriad of results, all of which contain excellent information. Some examples are:

SQL Server Planet – gives a fantastic matrix of the various things that can be accomplished through each option

Stack Overflow – Rory and Stack Overflow – Martin Smith – each of which gives you insight into the different advantages and disadvantages of one vs. the other: things like scoping, transactional behavior, indexing, recompilation, locking, and statistics. This is a tremendous amount of information and can really help you learn the internals of the two options.

Code Project – gives a nice little run through of the performance of each type with time to complete for the varying record set lengths

Pretty much anything that you could want to know about temp tables and table variables you can find in one of these links.

What I want to discuss here is why it makes such a big difference on these larger data sets, since each of the above links clearly states the circumstances under which you do not want to use table variables. Below I have a very simple set of queries that demonstrates what I see as the biggest issue with using table variables for larger datasets.

First I will create my table variable and my temp table with the same dataset:


USE AdventureWorks2014;
GO
Declare @tbl table (SalesOrderID int
, CarrierTrackingNumber varchar(40)
, OrderQty int
, ProductId int
, UnitPrice float);

insert into @tbl
select SalesOrderID
, CarrierTrackingNumber
, OrderQty
, ProductID
, UnitPrice
from [Sales].[SalesOrderDetail];

if object_id('tempdb..#t') is not null
drop table #t;
Create table #t
(SalesOrderID int
, CarrierTrackingNumber varchar(40)
, OrderQty int
, ProductId int
, UnitPrice float);

insert into #t
select SalesOrderID
, CarrierTrackingNumber
, OrderQty
, ProductID
, UnitPrice
from [Sales].[SalesOrderDetail];

And from there I will select out the records, doing a simple join from the table variable and the temp table over to a second table, using a column (SalesOrderID) that both contain.


Select t.SalesOrderID, soh.AccountNumber, soh.BillToAddressID, soh.OrderDate
from @tbl t
join Sales.SalesOrderHeader soh on t.SalesOrderID = soh.SalesOrderID;

Select t.SalesOrderID, soh.AccountNumber, soh.BillToAddressID, soh.OrderDate
from #t t
join Sales.SalesOrderHeader soh on t.SalesOrderID = soh.SalesOrderID;

Using SQL Sentry Plan Explorer, we can see that, first off, the estimated cost for the select against the table variable is less than that of the select against the temp table. In my case, this is against a table of roughly 120,000 rows.

[Screenshot: Plan Explorer – estimated cost (% of plan) for the table variable query vs. the temp table query]

Don’t listen to that, because it is not always accurate. The cost is just an estimate, even when you are looking at the actual execution plan.

So, looking at the query plans, we see the following for the table variable:

[Screenshot: execution plan for the table variable query – clustered index seek and nested loops join]

This looks like a pretty good plan, right?   Clustered index seek on the real table, nested loops to bring it together…nothing alarming about this at first glance.

Heck, compare it to the plan for the temp table:

[Screenshot: execution plan for the temp table query – table scan, hash match join, and a missing index warning]

And the comparison almost makes the table variable plan seem better. The temp table plan has a table scan against the real table, a hash match, and even that darn warning (in this case it is just a missing index suggestion, but at first glance you might think this plan is worse).

HOWEVER, this is actually the problem. The estimated cost makes the table variable plan seem better, and that plan even kind of looks better at first glance, but the reality is that it is a MUCH WORSE plan. So why does the optimizer do this?

[Screenshot: estimated number of rows for the table variable – 1]

Because that table variable ALWAYS has an estimated row count of 1 if you do not put optimizer hints on the query. And because of that, the optimizer assumes that doing index seeks instead of scans is a much better approach. In a case like this one, where we have over 100,000 rows, that is most assuredly NOT the case.

That index seek with the nested loops to bring it together ends up doing over 360,000 reads in total, where the table scan with the hash match and the “warning” only does 1,246.

[Screenshot: Plan Explorer showing roughly 360,000 reads for the table variable query vs. 1,246 for the temp table query]
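
If you do not have Plan Explorer handy, you can see the same read numbers with SET STATISTICS IO. The sketch below assumes it is run in the same batch as the setup code above, since the table variable only lives for the duration of that batch:

/*Assumes @tbl and #t from the setup block above are still in scope.
 The logical reads show up on the Messages tab.*/
SET STATISTICS IO ON;

Select t.SalesOrderID, soh.AccountNumber, soh.BillToAddressID, soh.OrderDate
from @tbl t
join Sales.SalesOrderHeader soh on t.SalesOrderID = soh.SalesOrderID;

Select t.SalesOrderID, soh.AccountNumber, soh.BillToAddressID, soh.OrderDate
from #t t
join Sales.SalesOrderHeader soh on t.SalesOrderID = soh.SalesOrderID;

SET STATISTICS IO OFF;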

We can fairly easily remedy this situation by putting a simple OPTION (RECOMPILE) at the end of the query that accesses the table variable, as you can see below. And when that is done, the query plan the optimizer creates ends up being exactly the same as the one for the temp table.


Select t.SalesOrderID, soh.AccountNumber, soh.BillToAddressID, soh.OrderDate
from @tbl t
join Sales.SalesOrderHeader soh on t.SalesOrderID = soh.SalesOrderID
option (recompile);

This is a good solution but it is one that comes at the expense of expecting everyone who writes these queries to know this fact and how to get around it.

Table variables have their place, but that estimated row count of 1 can definitely have an impact on the types of query plans the optimizer will choose. So when you are making that decision, take this into account and choose wisely, or remember that OPTION (RECOMPILE).