How To: Right-sizing The Datatypes Currently In The Buffer Pool

Recently I shared a piece of code that would help you to right size the datatypes inside of a database. I’ve received a lot of feedback since that post and it has spurred me to think a little bit deeper on the topic of datatypes and performance.

The script I shared only examined whats are called “exact number” data types. It was meant to serve as an introduction for folks that wanted an example so they could think about building their own, more robust scripts. While I was looking to add in extra logic it suddenly dawned on me that there had to be a smarter way to get the job done.

It could be the case that I decide to point this script at a rather large database in order to see what possible savings I could get by right-sizing the datatypes. But what if that database was rarely used? If so then I would be looking to get the benefits of disk storage, but that is fairly static. I want my script to be smarter and only do the work that is necessary, now.

What I mean is this: I want the script to focus on the pages that are being used in memory right now. 

So how do I find those pages? Great question! As luck would have it we have a way to get to those details by looking inside the sys.dm_os_buffer_descriptors dynamic management view (DMV). For example, if you wanted to know how many pages were currently loaded into memory for each database you could run the following:

SELECT count(*)AS cached_pages_count
    ,CASE database_id 
        WHEN 32767 THEN 'ResourceDb' 
        ELSE db_name(database_id) 
        END AS Database_name
FROM sys.dm_os_buffer_descriptors
GROUP BY db_name(database_id) ,database_id
ORDER BY cached_pages_count DESC;

For our purposes we want to know details at a lower level. A little Google-fu led me to this post by Bill Graziano (@billgraziano | blog). I took that code and modified it slightly to come up with this:

SELECT obj.[name] AS [TableName], 
        i.[name] AS [Objname], 
        i.[type_desc] AS [Type], 
        count(*) AS [Buffered_Page_Count], 
        count(*) * 8192 / (1024 * 1024) as [Buffer_MB] 
FROM sys.dm_os_buffer_descriptors AS bd 
    INNER JOIN 
    (
        SELECT object_name(object_id) AS name 
            ,index_id ,allocation_unit_id, object_id
        FROM sys.allocation_units AS au
            INNER JOIN sys.partitions AS p 
                ON au.container_id = p.hobt_id 
                    AND (au.type = 1 OR au.type = 3)
        UNION ALL
        SELECT object_name(object_id) AS name   
            ,index_id, allocation_unit_id, object_id
        FROM sys.allocation_units AS au
            INNER JOIN sys.partitions AS p 
                ON au.container_id = p.hobt_id 
                    AND au.type = 2
    ) AS obj 
        ON bd.allocation_unit_id = obj.allocation_unit_id
LEFT JOIN sys.indexes i on i.object_id = obj.object_id 
        AND i.index_id = obj.index_id
INNER JOIN sys.objects o on i.object_id = o.object_id
WHERE database_id = db_id()
AND o.type <> 'S'
GROUP BY obj.name, obj.index_id , i.[name],i.[type_desc]
ORDER BY Buffered_Page_Count DESC

Now I can take that code and focus on the exact objects that have data pages currently taking up space inside the buffer pool. I can combine the code above into the previous script and get a version of this code that will look to right-size the datatypes that are most likely to be affecting performance right now.

The code would then look like this:

DISCLAIMERDo not run code you find on the internet in your production environment without testing it first. Do not use this code if your vision becomes blurred. Seek medical attention if this code runs longer than four hours. On rare occasions this code has been known to cause one or more of the following: nausea, headaches, high blood pressure, popcorn cravings, and the impulse to reformat tabs into spaces. If this code causes your servers to smoke, seek shelter. Do not taunt this code. 

/*=============================================
  File: check_datatype_int_memory.sql

  Author: Thomas LaRock, https://thomaslarock.com/contact-me/

  Summary: This script will verify defined datatypes to the data
           that currently exists. 

  Returns: Two (2) results sets. The first is for an estimate of 
           disk space savings. The second is for potential
           logical I/O savings.

  Date: September 11th, 2012

  SQL Server Versions: SQL2005, SQL2008, SQL2008R2, SQL2012

  You may alter this code for your own purposes. You may republish
  altered code as long as you give due credit. 

  THIS CODE AND INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY
  OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT
  LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR
  FITNESS FOR A PARTICULAR PURPOSE.

=============================================*/

--CHANGE TO WHATEVER DATABASE YOU WANT TO EVALUATE
USE [AdventureWorks2012]
GO

SET NOCOUNT ON
SET ANSI_WARNINGS OFF

IF  EXISTS (SELECT * FROM tempdb.dbo.sysobjects 
WHERE id = OBJECT_ID(N'tempdb..#tmp_ValDT') 
AND type in (N'U'))
DROP TABLE #tmp_ValDT
GO

IF  EXISTS (SELECT * FROM tempdb.dbo.sysobjects 
WHERE id = OBJECT_ID(N'tempdb..#tmp_spTmp') 
AND type in (N'U'))
DROP TABLE #tmp_spTmp
GO

IF  EXISTS (SELECT * FROM tempdb.dbo.sysobjects 
WHERE id = OBJECT_ID(N'tempdb..#tmp_bufObj') 
AND type in (N'U'))
DROP TABLE #tmp_bufObj
GO

CREATE TABLE #tmp_ValDT
        (stmt nvarchar(max) NULL,
        min_val bigint NULL,
        max_val bigint NULL,
        rc bigint,
        table_name sysname,
        schema_name sysname,
        column_name sysname,
        TypeName sysname,
        recomm_DT sysname NULL,
        recomm_byte smallint NULL,
        max_length smallint,
        precision tinyint,
        scale tinyint)
GO

CREATE TABLE #tmp_spTmp
        (name nvarchar(128) NULL,
        rows char(11) NULL,
        reserved varchar(18) NULL,
        data varchar(18) NULL,
        index_size varchar(18) NULL,
        unused varchar(18))
GO

CREATE TABLE #tmp_bufObj
        (name nvarchar(128) NULL,
        objname sysname NULL,
        type nvarchar(60) NULL,
        Buffered_Page_Count int NULL,
        Buffer_MB int NULL
        )
GO

--find the tables currently in memory
INSERT INTO #tmp_bufObj
SELECT obj.[name] AS [TableName], --nvarchar(128)
        i.[name] AS [Objname], --sysname
        i.[type_desc] AS [Type], --nvarchar(60)
        COUNT_BIG(*) AS [Buffered_Page_Count], --int
        COUNT_BIG(*) * 8192 / (1024 * 1024) as [Buffer_MB] --int
FROM sys.dm_os_buffer_descriptors AS bd 
    INNER JOIN 
    (
        SELECT object_name(object_id) AS name 
            ,index_id ,allocation_unit_id, object_id
        FROM sys.allocation_units AS au
            INNER JOIN sys.partitions AS p 
                ON au.container_id = p.hobt_id 
                    AND (au.type = 1 OR au.type = 3)
        UNION ALL
        SELECT object_name(object_id) AS name   
            ,index_id, allocation_unit_id, object_id
        FROM sys.allocation_units AS au
            INNER JOIN sys.partitions AS p 
                ON au.container_id = p.hobt_id 
                    AND au.type = 2
    ) AS obj 
        ON bd.allocation_unit_id = obj.allocation_unit_id
LEFT JOIN sys.indexes i on i.object_id = obj.object_id 
        AND i.index_id = obj.index_id
INNER JOIN sys.objects o on i.object_id = o.object_id
WHERE database_id = db_id()
AND o.type <> 'S'
GROUP BY obj.name, obj.index_id , i.[name],i.[type_desc]
ORDER BY Buffered_Page_Count DESC

--build table by joining to what is in memory currently
INSERT INTO #tmp_ValDT
SELECT '' AS stmt
, NULL -- min value not known yet
, NULL -- max value not known yet
, NULL -- rowcount not known yet
, tbl.name AS table_name
, SCHEMA_NAME(tbl.schema_id) AS schema_name
, col.name AS column_name
, t.name AS TypeName
, NULL -- recommended DT not known yet
, NULL -- recommended DT bytes known yet
, col.max_length
, col.PRECISION
, col.scale
FROM sys.tables AS tbl
INNER JOIN sys.columns col ON tbl.OBJECT_ID = col.OBJECT_ID
INNER JOIN sys.types t ON col.user_type_id = t.user_type_id
--not evaluating user defined datatypes, or MAX types, at this time
INNER JOIN #tmp_bufObj bo ON tbl.name = bo.name COLLATE DATABASE_DEFAULT
WHERE t.is_user_defined = 0
AND t.is_assembly_type = 0
AND col.max_length <> -1
ORDER BY schema_name, table_name;

DECLARE @SQL nvarchar(max)
DECLARE DT CURSOR
FOR 
SELECT table_name, schema_name, column_name, TypeName, max_length 
FROM #tmp_ValDT

DECLARE @table_name sysname, @schema_name sysname
DECLARE @column_name sysname, @TypeName sysname
DECLARE @min_val nvarchar(max), @max_val nvarchar(max)
DECLARE @max_length nvarchar(max), @rc nvarchar(max), @max_lenOUT nvarchar(max)
OPEN DT

FETCH NEXT FROM DT INTO @table_name, @schema_name, @column_name, @TypeName, @max_length
WHILE (@@fetch_status <> -1)
BEGIN
        IF (@@fetch_status <> -2)
        BEGIN

                IF @TypeName IN ('tinyint', 'smallint', 'int', 'bigint')
                BEGIN
                
                        SET @SQL = 'SELECT @min_val = MIN(['+@column_name+']), @max_val = MAX(['+@column_name+']), @rc = COUNT(*) FROM ['+@schema_name+'].['+@table_name+']'            
                        EXEC sp_executesql @SQL, N'@min_val bigint OUTPUT
                        , @max_val bigint OUTPUT
                        , @rc bigint OUTPUT', @min_val = @min_val OUTPUT, @max_val = @max_val OUTPUT, @rc = @rc OUTPUT

                        IF CONVERT(BIGINT, @min_val) < -2147483648 OR CONVERT(BIGINT, @max_val) > 2147483648
                                BEGIN
                                        UPDATE #tmp_ValDT
                                        SET stmt = @SQL, min_val = @min_val, max_val = @max_val, rc = @rc,
                                                recomm_DT = 'bigint', recomm_byte = 8
                                        WHERE CURRENT OF DT
                                END
                        ELSE IF @min_val < -32768 OR @max_val > 32768
                                BEGIN
                                        UPDATE #tmp_ValDT
                                        SET stmt = @SQL, min_val = @min_val, max_val = @max_val, rc = @rc,
                                                recomm_DT = 'int', recomm_byte = 4
                                        WHERE CURRENT OF DT
                                END
                        ELSE IF @min_val < 0 OR @max_val > 255
                                BEGIN
                                        UPDATE #tmp_ValDT
                                        SET stmt = @SQL, min_val = @min_val, max_val = @max_val, rc = @rc,
                                                recomm_DT = 'smallint', recomm_byte = 2
                                        WHERE CURRENT OF DT
                                END
                        ELSE IF @min_val >= 0 AND @min_val <= 255 AND @max_val >= 0 AND @max_val <= 255
                                BEGIN
                                        UPDATE #tmp_ValDT
                                        SET stmt = @SQL, min_val = @min_val, max_val = @max_val, rc = @rc,
                                                recomm_DT = 'tinyint', recomm_byte = 1
                                        WHERE CURRENT OF DT
                                END
                        ELSE IF @min_val IS NULL OR @max_val IS NULL
                                BEGIN
                                        PRINT 'empty tables: ' + @SQL
                                END
                        ELSE
                                BEGIN
                                        PRINT 'how did i get here int?'
                                END
                END

        END
        FETCH NEXT FROM DT INTO @table_name, @schema_name, @column_name, @TypeName, @max_length
END

CLOSE DT
DEALLOCATE DT
GO

INSERT INTO #tmp_spTmp
EXEC sp_MSforeachtable 'EXECUTE sp_spaceused [?];'

--summarize the possible space savings
SELECT schema_name + '.' + table_name AS [Tablename]
, column_name AS [ColumnName], TypeName AS [CurrentDT]
, max_length AS [Length], recomm_DT AS [RecommendedDT]
, recomm_byte AS [RecommendedLength]
, CASE WHEN recomm_DT NOT IN ('varchar', 'char')
        THEN ((max_length - recomm_byte) * rc)/(1024.0*1024.0) 
        ELSE (recomm_byte * 2)/(1024.0*1024.0) 
        END AS [Space_Saved_MB]
FROM #tmp_ValDT
WHERE TypeName <> recomm_DT
AND recomm_byte <> 0
ORDER BY [Space_Saved_MB] DESC

--summarize the possible logical i/o savings
SELECT vdt.table_name, SUM(vdt.max_length-vdt.recomm_byte) AS [row_savings_bytes] 
, SUM(vdt.max_length-vdt.recomm_byte) * spt.rows / 8192 AS [LIO_savings_pages]
FROM #tmp_ValDT vdt INNER JOIN #tmp_spTmp spt ON vdt.table_name = spt.name
GROUP BY vdt.table_name, spt.rows
ORDER BY 3 DESC

Think of how a car engine gets tuned. Some things you will do while the engine is running, others you will do when the engine is turned off. The previous version of this script didn’t take into account if the database was being used at all. In all likelihood you will end up saving disk space but you may not see much performance gains unless you know that the data is being used.

The script in this post is trying to help you tune something while it is running. You can look inside the buffer pool at what pages are currently residing in memory (a “hot spot” of activity) and focus your efforts such that they will likely have an immediate impact on disk and reduction in logical I/O.

The end result should be a finely tuned engine!

10 thoughts on “How To: Right-sizing The Datatypes Currently In The Buffer Pool”

  1. Thank you for sharing!
    I have to change to:
    cast(count(*) as bigint) * 8192 / (1024 * 1024) as [Buffer_MB]

    otherwise I get arithmetic overflow.

    Thanks!

    Reply

Leave a Comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.