Last Wednesday I delivered a webinar for Confio titled “Database Design: Size DOES Matter”. I had a piece of code at the end that evaluated defined datatypes to the current data values in those columns. The idea being that if a column is defined as BIGINT but contains values that are best defined as a SMALLINT the script would recognize this and flag that column as a candidate for being altered. The script also outputs a couple of result sets, one result set for the amount of disk space and one result set for the amount of logical I/O you could potentially save.

I had many requests for that script but I don’t feel it is ready to be shared yet. One person asked me for guidance on how to build their own script and that gave me an idea: I could build a scaled down version of my script and share it here. That way if someone wanted to take my script and add their own code on top of it they could.

So here it is. This script will only evaluate the TINYINT, SMALLINT, INT, and BIGINT datatypes. It is for SQL Server only, and should run against any version from SQL 2005 and later.

Please don’t blindly follow the advice that any script provides. You MUST know a few things about the database design, the business requirements, etc., before you start altering columns. You certainly would NOT want to change from BIGINT to INT and then find out you ran out of values sooner than necessary! I would also recommend that you test thoroughly. In fact, let’s just go with the standard disclaimer:

DISCLAIMERDo not run code you find on the internet in your production environment without testing it first. Do not use this code if your vision becomes blurred. Seek medical attention if this code runs longer than four hours. On rare occasions this code has been known to cause one or more of the following: nausea, headaches, high blood pressure, popcorn cravings, and the impulse to reformat tabs into spaces. If this code causes your servers to smoke, seek shelter. Do not taunt this code. 

/*=============================================
  File: check_datatype_int.sql
 
  Author: Thomas LaRock, http://thomaslarock.com/contact-me/
 
  Summary: This script will verify defined datatypes to the data
           that currently exists. 
 
  Returns: Two (2) results sets. The first is for an estimate of 
           disk space savings. The second is for potential
           logical I/O savings.
 
  Date: September 4th, 2012
 
  SQL Server Versions: SQL2005, SQL2008, SQL2008R2, SQL2012
 
  You may alter this code for your own purposes. You may republish
  altered code as long as you give due credit. 
 
  THIS CODE AND INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY
  OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT
  LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR
  FITNESS FOR A PARTICULAR PURPOSE.
 
=============================================*/
 
--CHANGE TO WHATEVER DATABASE YOU WANT TO EVALUATE
USE [AdventureWorks2012]
GO
 
SET NOCOUNT ON
SET ANSI_WARNINGS OFF
 
IF  EXISTS (SELECT * FROM tempdb.dbo.sysobjects 
WHERE id = OBJECT_ID(N'tempdb..#tmp_ValDT') 
AND type in (N'U'))
DROP TABLE #tmp_ValDT
GO
 
IF  EXISTS (SELECT * FROM tempdb.dbo.sysobjects 
WHERE id = OBJECT_ID(N'tempdb..#tmp_spTmp') 
AND type in (N'U'))
DROP TABLE #tmp_spTmp
GO
 
CREATE TABLE #tmp_ValDT
	(stmt nvarchar(max) NULL,
	min_val bigint NULL,
	max_val bigint NULL,
	rc bigint,
	table_name sysname,
	schema_name sysname,
	column_name sysname,
	TypeName sysname,
	recomm_DT sysname NULL,
	recomm_byte smallint NULL,
	max_length smallint,
	precision tinyint,
	scale tinyint)
GO
 
CREATE TABLE #tmp_spTmp
	(name nvarchar(128) NULL,
	rows char(11) NULL,
	reserved varchar(18) NULL,
	data varchar(18) NULL,
	index_size varchar(18) NULL,
	unused varchar(18))
GO
 
 
INSERT INTO #tmp_ValDT
SELECT '' AS stmt
, NULL -- min value not known yet
, NULL -- max value not known yet
, NULL -- rowcount not known yet
, tbl.name AS table_name
, SCHEMA_NAME(tbl.schema_id) AS schema_name
, col.name AS column_name
, t.name AS TypeName
, NULL -- recommended DT not known yet
, NULL -- recommended DT bytes known yet
, col.max_length
, col.PRECISION
, col.scale
FROM sys.tables AS tbl
INNER JOIN sys.columns col ON tbl.OBJECT_ID = col.OBJECT_ID
INNER JOIN sys.types t ON col.user_type_id = t.user_type_id
--not evaluating user defined datatypes, or MAX types, at this time
WHERE t.is_user_defined = 0
AND t.is_assembly_type = 0
AND col.max_length <> -1
ORDER BY schema_name, table_name;
 
DECLARE @SQL nvarchar(max)
DECLARE DT CURSOR
FOR 
SELECT table_name, schema_name, column_name, TypeName, max_length 
FROM #tmp_ValDT
 
DECLARE @table_name sysname, @schema_name sysname
DECLARE @column_name sysname, @TypeName sysname
DECLARE @min_val nvarchar(max), @max_val nvarchar(max)
DECLARE @max_length nvarchar(max), @rc nvarchar(max), @max_lenOUT nvarchar(max)
OPEN DT
 
FETCH NEXT FROM DT INTO @table_name, @schema_name, @column_name, @TypeName, @max_length
WHILE (@@fetch_status <> -1)
BEGIN
	IF (@@fetch_status <> -2)
	BEGIN
 
		IF @TypeName IN ('tinyint', 'smallint', 'int', 'bigint')
		BEGIN
 
			SET @SQL = 'SELECT @min_val = MIN(['+@column_name+']), @max_val = MAX(['+@column_name+']), @rc = COUNT(*) FROM ['+@schema_name+'].['+@table_name+']'		
			EXEC sp_executesql @SQL, N'@min_val bigint OUTPUT
			, @max_val bigint OUTPUT
			, @rc bigint OUTPUT', @min_val = @min_val OUTPUT, @max_val = @max_val OUTPUT, @rc = @rc OUTPUT
 
			IF CONVERT(BIGINT, @min_val) < -2147483648 OR CONVERT(BIGINT, @max_val) > 2147483648
				BEGIN
					UPDATE #tmp_ValDT
					SET stmt = @SQL, min_val = @min_val, max_val = @max_val, rc = @rc,
						recomm_DT = 'bigint', recomm_byte = 8
					WHERE CURRENT OF DT
				END
			ELSE IF @min_val < -32768 OR @max_val > 32768
				BEGIN
					UPDATE #tmp_ValDT
					SET stmt = @SQL, min_val = @min_val, max_val = @max_val, rc = @rc,
						recomm_DT = 'int', recomm_byte = 4
					WHERE CURRENT OF DT
				END
			ELSE IF @min_val < 0 OR @max_val > 255
				BEGIN
					UPDATE #tmp_ValDT
					SET stmt = @SQL, min_val = @min_val, max_val = @max_val, rc = @rc,
						recomm_DT = 'smallint', recomm_byte = 2
					WHERE CURRENT OF DT
				END
			ELSE IF @min_val >= 0 AND @min_val <= 255 AND @max_val >= 0 AND @max_val <= 255
				BEGIN
					UPDATE #tmp_ValDT
					SET stmt = @SQL, min_val = @min_val, max_val = @max_val, rc = @rc,
						recomm_DT = 'tinyint', recomm_byte = 1
					WHERE CURRENT OF DT
				END
			ELSE IF @min_val IS NULL OR @max_val IS NULL
				BEGIN
					PRINT 'empty tables: ' + @SQL
				END
			ELSE
				BEGIN
					PRINT 'how did i get here int?'
				END
		END
 
	END
	FETCH NEXT FROM DT INTO @table_name, @schema_name, @column_name, @TypeName, @max_length
END
 
CLOSE DT
DEALLOCATE DT
GO
 
INSERT INTO #tmp_spTmp
EXEC sp_MSforeachtable 'EXECUTE sp_spaceused [?];'
 
--summarize the possible space savings
SELECT schema_name + '.' + table_name AS [Tablename]
, column_name AS [ColumnName], TypeName AS [CurrentDT]
, max_length AS [Length], recomm_DT AS [RecommendedDT]
, recomm_byte AS [RecommendedLength]
, CASE WHEN recomm_DT NOT IN ('varchar', 'char')
	THEN ((max_length - recomm_byte) * rc)/(1024.0*1024.0) 
	ELSE (recomm_byte * 2)/(1024.0*1024.0) 
	END AS [Space_Saved_MB]
FROM #tmp_ValDT
WHERE TypeName <> recomm_DT
AND recomm_byte <> 0
ORDER BY [Space_Saved_MB] DESC
 
--summarize the possible logical i/o savings
SELECT vdt.table_name, SUM(vdt.max_length-vdt.recomm_byte) AS [row_savings_bytes] 
, SUM(vdt.max_length-vdt.recomm_byte) * spt.rows / 8192 AS [LIO_savings_pages]
FROM #tmp_ValDT vdt INNER JOIN #tmp_spTmp spt ON vdt.table_name = spt.name
GROUP BY vdt.table_name, spt.rows
ORDER BY 3 DESC

I hope you find this script useful. If you decide to use it, please provide me some feedback on what you think, so I can incorporate that feedback into the full version I hope to make available soon.