SQL Server DBA and Azure SQL Insights

How to Manage a Very Large Database in SQL Server

A few years ago, when I started my career as a SQL Server DBA, a 200-500 GB database was considered a very large database. At that time, maintaining and managing those databases was not a tedious task. But over the years the definition of a very large database has changed. Today we’re looking at database sizes in terabytes and petabytes.

With cloud computing, the hardware needed to sustain database growth is now just a click away. Auto-scaling is a blessing to companies that previously had to revisit their budgets whenever the database servers needed more resources.

Because database sizes have grown over time, managing and maintaining them has become painful. By managing and maintaining a database, I mean taking regular backups, performing index maintenance, running integrity checks, and so on.

Most of the time we try to archive old or cold data to keep the database size in check. But sometimes the scope for archiving is very limited. This is especially true in the medical and financial sectors, where old data is still used for various purposes.

Given how quickly databases grow from day to day, I’m going to take you through some database management tasks to give you a better understanding of how you can keep up.

Consider an OLTP database that is active 24x6 and is around 10 TB in size.

Backup Strategies

Taking a daily full backup of a 10 TB database can be very demanding, especially for an OLTP database. Even with good hardware, the backup could take around five to seven hours. Therefore, having a proper backup strategy in place is as important as maintaining availability.

Considering the time and resource cost, it would be wise to take a weekly full backup with daily differential backups and hourly transaction log backups.
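
A weekly full, daily differential, and hourly log schedule could be expressed in T-SQL roughly as follows. This is only a sketch: the database name SalesDB and the X:\Backup\ paths are placeholders, and the COMPRESSION, CHECKSUM, and file-striping choices are my own assumptions rather than part of the strategy itself.

```sql
-- Hypothetical names: SalesDB and the X:\Backup\ paths are placeholders.

-- Weekly: full backup, striped over two files to spread the I/O.
BACKUP DATABASE [SalesDB]
TO DISK = N'X:\Backup\SalesDB_full_1.bak',
   DISK = N'X:\Backup\SalesDB_full_2.bak'
WITH COMPRESSION, CHECKSUM, STATS = 5;

-- Daily: differential backup (changes since the last full backup).
BACKUP DATABASE [SalesDB]
TO DISK = N'X:\Backup\SalesDB_diff.bak'
WITH DIFFERENTIAL, COMPRESSION, CHECKSUM;

-- Hourly: transaction log backup (requires the FULL or BULK_LOGGED recovery model).
BACKUP LOG [SalesDB]
TO DISK = N'X:\Backup\SalesDB_log.trn'
WITH COMPRESSION, CHECKSUM;
```

In practice each statement would run from its own scheduled job, and each backup would normally get a unique, timestamped file name instead of the fixed names used here.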

Using third-party tools to back up the database can be a better option. These tools not only help reduce the time taken to back up the database, but also reduce the size of the backup through compression.

Index Maintenance

Performing index maintenance on a very large database is not the same as performing it on a regular-sized database. REBUILDING large indexes on big tables is very time consuming. It can also cause blocking on the server, which hampers the performance of other applications.

The practical way to maintain indexes on such a huge database is to REORGANIZE them. REBUILD should be reserved for index corruption or for cases where there is an absolute need to rebuild a particular large index.

One important option to consider when creating indexes on such a large database is WITH (SORT_IN_TEMPDB = ON). SORT_IN_TEMPDB stores the intermediate sort results of the index build in tempdb rather than in the destination filegroup; the finished index is then written to its destination filegroup. This often-overlooked option can significantly reduce index build times, particularly when tempdb sits on a different set of disks from the user database.
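
As a minimal illustration of the option, the statement below creates an index with its sort work directed to tempdb. The table, column, and index names are hypothetical, and tempdb must have enough free space to hold the sort.

```sql
-- Hypothetical names: dbo.Orders, OrderDate and IX_Orders_OrderDate are placeholders.
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
ON dbo.Orders (OrderDate)
WITH (SORT_IN_TEMPDB = ON);
```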

Another advantage of reorganizing an index is that we can stop the operation midway if it runs into business hours, and this will not send the index into a ROLLBACK state; the work already completed is kept.
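
A reorganize pass might look like the sketch below: first check how fragmented the indexes on a table are, then reorganize the one that needs it. The object names are placeholders, and using the LIMITED scan mode is my assumption for keeping the check itself cheap on a large table.

```sql
-- Hypothetical names: dbo.Orders and IX_Orders_OrderDate are placeholders.

-- Check fragmentation for every index on one table.
SELECT i.name AS index_name,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Orders'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id = ps.index_id;

-- Reorganize a single index; the operation is always online.
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders REORGANIZE;
```

Running the reorganize one index at a time keeps each unit of work small, which makes it easier to stop cleanly when the maintenance window closes.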

Database Consistency Check (DBCC)

The DBCC CHECKDB command checks the consistency and integrity of a database. It helps us make sure that our databases are in a healthy state, and if corruption occurs, it helps to identify and fix the issue.

Executing DBCC CHECKDB is very resource-intensive. It puts a strain on both memory and disk. Running it on such a large database is risky because it may not finish in the allotted time frame. If we then KILL the session, a check running without repair options does not change any data, but all the work done so far is lost and the check has to start again from the beginning. Hence running a full check on a very large OLTP database is often not a feasible option.

How quickly the command completes depends largely on the memory available and on the type of RAID used to host the tempdb database.

Other options are available, such as running DBCC CHECKTABLE against individual tables or groups of tables on a rotating basis, or running DBCC CHECKDB WITH PHYSICAL_ONLY. The PHYSICAL_ONLY option limits the processing to checking the integrity of the physical structure of the page and record headers, in addition to the consistency of the allocation structures (data and indexes).
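
Both lighter-weight options can be sketched as follows. The database and table names are placeholders, and the NO_INFOMSGS and ALL_ERRORMSGS options are my own addition to keep the output limited to actual errors.

```sql
-- Hypothetical names: SalesDB and dbo.Orders are placeholders.
USE [SalesDB];

-- Physical-structure check only; much lighter than a full CHECKDB.
DBCC CHECKDB (N'SalesDB') WITH PHYSICAL_ONLY, NO_INFOMSGS;

-- Full check of a single table, to be rotated across tables from night to night.
DBCC CHECKTABLE (N'dbo.Orders') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```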

The most recommended option is to have a standby server, restore a backup of the production database on it, and run the DBCC command there. If the consistency checks run clean on the standby database, the production database should be fine too, since it is the source of the standby. If the standby database reports corruption, then DBCC or other corruption tests can be run against the production database.
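
On the standby server, the restore-then-check approach could look like this sketch. Every name and path is a placeholder; in particular, the logical file names SalesDB_Data and SalesDB_Log are assumptions, and the real ones can be listed with RESTORE FILELISTONLY against the backup file.

```sql
-- Run on the standby server. All names and paths are placeholders.
RESTORE DATABASE [SalesDB_Check]
FROM DISK = N'X:\Backup\SalesDB_full.bak'
WITH MOVE N'SalesDB_Data' TO N'D:\Data\SalesDB_Check.mdf',
     MOVE N'SalesDB_Log'  TO N'L:\Log\SalesDB_Check.ldf',
     REPLACE, STATS = 5;

-- Full consistency check against the restored copy, not production.
DBCC CHECKDB (N'SalesDB_Check') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```

A side benefit of this routine is that every run also proves the backup can actually be restored.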

I hope this gives you a better understanding of the complexity and management options for very large databases in SQL Server.

Backup Report via Email using a PowerShell Script

# Paths and mail settings; adjust these for your environment.
$ServerList = "C:\DBA\Servers.csv"
$OutputFile = "C:\DBA\Output.htm"
$emlist = "sqldbanow@gmail.com"
$MailServer = "smtp.sqldbanow.com"

# Style sheet for the report table.
$HTML = '<style type="text/css">
#Header{font-family:"Trebuchet MS", Arial, Helvetica, sans-serif;width:100%;border-collapse:collapse;}
#Header td, #Header th {font-size:14px;border:1px solid #98bf21;padding:3px 7px 2px 7px;}
#Header th {font-size:14px;text-align:left;padding-top:5px;padding-bottom:4px;background-color:#A7C942;color:#fff;}
#Header tr.alt td {color:#000;background-color:#EAF2D3;}
</style>'

# Table header: one column per value reported for each database.
$HTML += "<HTML><BODY><Table border=1 cellpadding=0 cellspacing=0 width=100% id=Header>
<TR>
<TH><B>Database Name</B></TH>
<TH><B>RecoveryModel</B></TH>
<TH><B>Last Full Backup Date</B></TH>
<TH><B>Last Differential Backup Date</B></TH>
<TH><B>Last Log Backup Date</B></TH>
</TR>"

# Load SMO so the backup dates can be read from each instance.
[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SMO') | Out-Null

Import-Csv $ServerList | ForEach-Object {
    $ServerName = $_.ServerName
    $AppName = $_.ApplicationName
    $HTML += "<TR bgColor='#ccff66'><TD colspan=5 align=center>$ServerName - $AppName</TD></TR>"
    $SQLServer = New-Object ('Microsoft.SqlServer.Management.Smo.Server') $ServerName

    foreach ($Database in $SQLServer.Databases)
    {
        # tempdb and model are left out of the report.
        if ($Database.Name -ne 'tempdb' -and $Database.Name -ne 'model')
        {
            # Whole days since the last full backup: RED when more than one day old, otherwise GREEN.
            $DaysSince = ((Get-Date) - $Database.LastBackupDate).Days
            if ($DaysSince -gt 1) { $Color = 'RED' } else { $Color = 'GREEN' }

            # Log backups do not apply to the SIMPLE recovery model.
            if ($Database.RecoveryModel -like "simple") { $LastLog = 'N/A' }
            else { $LastLog = $Database.LastLogBackupDate }

            $HTML += "<TR>
<TD>$($Database.Name)</TD>
<TD>$($Database.RecoveryModel)</TD>
<TD bgcolor='$Color'>$($Database.LastBackupDate)</TD>
<TD>$($Database.LastDifferentialBackupDate)</TD>
<TD>$LastLog</TD>
</TR>"
        }
    }
}

$HTML += "</Table></BODY></HTML>"
$HTML | Out-File $OutputFile

Function sendEmail
{
    param($from, $to, $subject, $smtphost, $htmlFileName)
    # Read the report as one string so it can be used as the message body.
    $body = Get-Content $htmlFileName | Out-String
    $message = New-Object System.Net.Mail.MailMessage $from, $to, $subject, $body
    $message.IsBodyHtml = $true
    $smtp = New-Object Net.Mail.SmtpClient($smtphost)
    $smtp.Send($message)
}

$date = (Get-Date).ToString('MM/dd/yyyy')
sendEmail "sqldbanow@gmail.com" $emlist "SQLDBANOW Test Server Backup Report for - $date" $MailServer $OutputFile

---------------------------------------------

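-- You need to create a CSV file at C:\DBA\Servers.csv that lists the servers to check. It needs a header row with the columns ServerName and ApplicationName, which are the two fields the script reads from each row.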

Finding the Largest Tables in SQL Server for Performance Optimization


When optimizing database performance, one of the key areas to focus on is index design. Often, slow queries are caused by inefficient indexing, especially when dealing with large tables. Identifying the largest tables in your database can help you prioritize index tuning and optimize storage.

Identifying the Top 20 Largest Tables in SQL Server

The following SQL query returns the 20 largest tables in your SQL Server database, based on the amount of space they consume. It calculates both used and allocated space for each table:

-- Pages are 8 KB each, so pages * 8 / 1024 gives the size in MB.
SELECT TOP 20
    SCHEMA_NAME(tab.schema_id) + '.' + tab.name AS [Table],
    CAST(SUM(spc.used_pages * 8) / 1024.00 AS NUMERIC(36, 2)) AS Used_MB,
    CAST(SUM(spc.total_pages * 8) / 1024.00 AS NUMERIC(36, 2)) AS Allocated_MB
FROM sys.tables tab
JOIN sys.indexes ind ON tab.object_id = ind.object_id
JOIN sys.partitions part ON ind.object_id = part.object_id AND ind.index_id = part.index_id
JOIN sys.allocation_units spc ON part.partition_id = spc.container_id
GROUP BY SCHEMA_NAME(tab.schema_id) + '.' + tab.name
ORDER BY SUM(spc.used_pages) DESC;

Review bad indexes

-- Possible bad nonclustered indexes (writes > reads)
-- Take into consideration the complete workload and how long your instance has been running.
-- Do not drop or disable an index unless you are 100% certain that it is no longer substantially beneficial.
SELECT OBJECT_NAME(s.[object_id]) AS [Table Name], i.name AS [Index Name], i.index_id,
       is_disabled, i.is_hypothetical, i.has_filter, i.fill_factor,
       user_updates AS [Total Writes], user_seeks + user_scans + user_lookups AS [Total Reads],
       user_updates - (user_seeks + user_scans + user_lookups) AS [Difference]
FROM sys.dm_db_index_usage_stats AS s WITH (NOLOCK)
INNER JOIN sys.indexes AS i WITH (NOLOCK)
    ON s.[object_id] = i.[object_id]
   AND i.index_id = s.index_id
WHERE OBJECTPROPERTY(s.[object_id],'IsUserTable') = 1
  AND s.database_id = DB_ID()
  AND user_updates > (user_seeks + user_scans + user_lookups)
  AND i.index_id > 1
ORDER BY [Difference] DESC, [Total Writes] DESC, [Total Reads] ASC OPTION (RECOMPILE);
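
Because the counters in sys.dm_db_index_usage_stats are reset whenever the SQL Server service restarts, it helps to know how long the instance has been up before trusting the read and write totals. A small sketch of that check:

```sql
-- How long has this instance been collecting index usage statistics?
SELECT sqlserver_start_time,
       DATEDIFF(DAY, sqlserver_start_time, GETDATE()) AS days_running
FROM sys.dm_os_sys_info;
```

If the instance has only been running for a few days, an index that looks unused may simply not have seen its monthly or quarterly workload yet.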

Improve the index design: Review missing indexes

-- Missing index query
SELECT TOP 25
    dm_mid.database_id AS DatabaseID,
    dm_migs.avg_user_impact*(dm_migs.user_seeks+dm_migs.user_scans) Avg_Estimated_Impact,
    dm_migs.last_user_seek AS Last_User_Seek,
    OBJECT_NAME(dm_mid.OBJECT_ID,dm_mid.database_id) AS [TableName],
    'CREATE INDEX [IX_' + OBJECT_NAME(dm_mid.OBJECT_ID,dm_mid.database_id) + '_'
    + REPLACE(REPLACE(REPLACE(ISNULL(dm_mid.equality_columns,''),', ','_'),'[',''),']','')
    + CASE WHEN dm_mid.equality_columns IS NOT NULL AND dm_mid.inequality_columns IS NOT NULL THEN '_' ELSE '' END
    + REPLACE(REPLACE(REPLACE(ISNULL(dm_mid.inequality_columns,''),', ','_'),'[',''),']','')
    + ']'
    + ' ON ' + dm_mid.statement
    + ' (' + ISNULL (dm_mid.equality_columns,'')
    + CASE WHEN dm_mid.equality_columns IS NOT NULL AND dm_mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END
    + ISNULL (dm_mid.inequality_columns, '')
    + ')'
    + ISNULL (' INCLUDE (' + dm_mid.included_columns + ')', '') AS Create_Statement
FROM sys.dm_db_missing_index_groups dm_mig
INNER JOIN sys.dm_db_missing_index_group_stats dm_migs
    ON dm_migs.group_handle = dm_mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details dm_mid
    ON dm_mig.index_handle = dm_mid.index_handle
WHERE dm_mid.database_ID = DB_ID()
ORDER BY Avg_Estimated_Impact DESC
