04 Sep Does This Datatype Make My Column Look Fat?
Last Wednesday I delivered a webinar for Confio titled “Database Design: Size DOES Matter”. I had a piece of code at the end that evaluated defined datatypes to the current data values in those columns. The idea being that if a column is defined as BIGINT but contains values that are best defined as a SMALLINT the script would recognize this and flag that column as a candidate for being altered. The script also outputs a couple of result sets, one result set for the amount of disk space and one result set for the amount of logical I/O you could potentially save.
I had many requests for that script but I don’t feel it is ready to be shared yet. One person asked me for guidance on how to build their own script and that gave me an idea: I could build a scaled down version of my script and share it here. That way if someone wanted to take my script and add their own code on top of it they could.
So here it is. This script will only evaluate the TINYINT, SMALLINT, INT, and BIGINT datatypes. It is for SQL Server only, and should run against any version from SQL 2005 and later.
Please don’t blindly follow the advice that any script provides. You MUST know a few things about the database design, the business requirements, etc., before you start altering columns. You certainly would NOT want to change from BIGINT to INT and then find out you ran out of values sooner than necessary! I would also recommend that you test thoroughly. In fact, let’s just go with the standard disclaimer:
DISCLAIMER: Do not run code you find on the internet in your production environment without testing it first. Do not use this code if your vision becomes blurred. Seek medical attention if this code runs longer than four hours. On rare occasions this code has been known to cause one or more of the following: nausea, headaches, high blood pressure, popcorn cravings, and the impulse to reformat tabs into spaces. If this code causes your servers to smoke, seek shelter. Do not taunt this code.
/*============================================= File: check_datatype_int.sql Author: Thomas LaRock, http://thomaslarock.com/contact-me/ Summary: This script will verify defined datatypes to the data that currently exists. Returns: Two (2) results sets. The first is for an estimate of disk space savings. The second is for potential logical I/O savings. Date: September 4th, 2012 SQL Server Versions: SQL2005, SQL2008, SQL2008R2, SQL2012 You may alter this code for your own purposes. You may republish altered code as long as you give due credit. THIS CODE AND INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE. =============================================*/ --CHANGE TO WHATEVER DATABASE YOU WANT TO EVALUATE USE [AdventureWorks2012] GO SET NOCOUNT ON SET ANSI_WARNINGS OFF IF EXISTS (SELECT * FROM tempdb.dbo.sysobjects WHERE id = OBJECT_ID(N'tempdb..#tmp_ValDT') AND type in (N'U')) DROP TABLE #tmp_ValDT GO IF EXISTS (SELECT * FROM tempdb.dbo.sysobjects WHERE id = OBJECT_ID(N'tempdb..#tmp_spTmp') AND type in (N'U')) DROP TABLE #tmp_spTmp GO CREATE TABLE #tmp_ValDT (stmt nvarchar(max) NULL, min_val bigint NULL, max_val bigint NULL, rc bigint, table_name sysname, schema_name sysname, column_name sysname, TypeName sysname, recomm_DT sysname NULL, recomm_byte smallint NULL, max_length smallint, precision tinyint, scale tinyint) GO CREATE TABLE #tmp_spTmp (name nvarchar(128) NULL, rows char(11) NULL, reserved varchar(18) NULL, data varchar(18) NULL, index_size varchar(18) NULL, unused varchar(18)) GO INSERT INTO #tmp_ValDT SELECT '' AS stmt , NULL -- min value not known yet , NULL -- max value not known yet , NULL -- rowcount not known yet , tbl.name AS table_name , SCHEMA_NAME(tbl.schema_id) AS schema_name , col.name AS column_name , t.name AS TypeName , NULL -- recommended DT not known yet , NULL -- recommended DT bytes known yet , col.max_length , col.PRECISION , col.scale FROM sys.tables AS tbl INNER JOIN sys.columns col ON tbl.OBJECT_ID = col.OBJECT_ID INNER JOIN sys.types t ON col.user_type_id = t.user_type_id --not evaluating user defined datatypes, or MAX types, at this time WHERE t.is_user_defined = 0 AND t.is_assembly_type = 0 AND col.max_length <> -1 ORDER BY schema_name, table_name; DECLARE @SQL nvarchar(max) DECLARE DT CURSOR FOR SELECT table_name, schema_name, column_name, TypeName, max_length FROM #tmp_ValDT DECLARE @table_name sysname, @schema_name sysname DECLARE @column_name sysname, @TypeName sysname DECLARE @min_val nvarchar(max), @max_val nvarchar(max) DECLARE @max_length nvarchar(max), @rc nvarchar(max), @max_lenOUT nvarchar(max) OPEN DT FETCH NEXT FROM DT INTO @table_name, @schema_name, @column_name, @TypeName, @max_length WHILE (@@fetch_status <> -1) BEGIN IF (@@fetch_status <> -2) BEGIN IF @TypeName IN ('tinyint', 'smallint', 'int', 'bigint') BEGIN SET @SQL = 'SELECT @min_val = MIN(['+@column_name+']), @max_val = MAX(['+@column_name+']), @rc = COUNT(*) FROM ['+@schema_name+'].['+@table_name+']' EXEC sp_executesql @SQL, N'@min_val bigint OUTPUT , @max_val bigint OUTPUT , @rc bigint OUTPUT', @min_val = @min_val OUTPUT, @max_val = @max_val OUTPUT, @rc = @rc OUTPUT IF CONVERT(BIGINT, @min_val) < -2147483648 OR CONVERT(BIGINT, @max_val) > 2147483648 BEGIN UPDATE #tmp_ValDT SET stmt = @SQL, min_val = @min_val, max_val = @max_val, rc = @rc, recomm_DT = 'bigint', recomm_byte = 8 WHERE CURRENT OF DT END ELSE IF @min_val < -32768 OR @max_val > 32768 BEGIN UPDATE #tmp_ValDT SET stmt = @SQL, min_val = @min_val, max_val = @max_val, rc = @rc, recomm_DT = 'int', recomm_byte = 4 WHERE CURRENT OF DT END ELSE IF @min_val < 0 OR @max_val > 255 BEGIN UPDATE #tmp_ValDT SET stmt = @SQL, min_val = @min_val, max_val = @max_val, rc = @rc, recomm_DT = 'smallint', recomm_byte = 2 WHERE CURRENT OF DT END ELSE IF @min_val >= 0 AND @min_val <= 255 AND @max_val >= 0 AND @max_val <= 255 BEGIN UPDATE #tmp_ValDT SET stmt = @SQL, min_val = @min_val, max_val = @max_val, rc = @rc, recomm_DT = 'tinyint', recomm_byte = 1 WHERE CURRENT OF DT END ELSE IF @min_val IS NULL OR @max_val IS NULL BEGIN PRINT 'empty tables: ' + @SQL END ELSE BEGIN PRINT 'how did i get here int?' END END END FETCH NEXT FROM DT INTO @table_name, @schema_name, @column_name, @TypeName, @max_length END CLOSE DT DEALLOCATE DT GO INSERT INTO #tmp_spTmp EXEC sp_MSforeachtable 'EXECUTE sp_spaceused [?];' --summarize the possible space savings SELECT schema_name + '.' + table_name AS [Tablename] , column_name AS [ColumnName], TypeName AS [CurrentDT] , max_length AS [Length], recomm_DT AS [RecommendedDT] , recomm_byte AS [RecommendedLength] , CASE WHEN recomm_DT NOT IN ('varchar', 'char') THEN ((max_length - recomm_byte) * rc)/(1024.0*1024.0) ELSE (recomm_byte * 2)/(1024.0*1024.0) END AS [Space_Saved_MB] FROM #tmp_ValDT WHERE TypeName <> recomm_DT AND recomm_byte <> 0 ORDER BY [Space_Saved_MB] DESC --summarize the possible logical i/o savings SELECT vdt.table_name, SUM(vdt.max_length-vdt.recomm_byte) AS [row_savings_bytes] , SUM(vdt.max_length-vdt.recomm_byte) * spt.rows / 8192 AS [LIO_savings_pages] FROM #tmp_ValDT vdt INNER JOIN #tmp_spTmp spt ON vdt.table_name = spt.name GROUP BY vdt.table_name, spt.rows ORDER BY 3 DESC
I hope you find this script useful. If you decide to use it, please provide me some feedback on what you think, so I can incorporate that feedback into the full version I hope to make available soon.