SQL SERVER DBA and Azure SQL

TSQL Scripts for Troubleshooting Common Issues with SQL Server

This blog provides TSQL samples for troubleshooting common SQL Server issues as listed below. You can modify the parameters (i.e. database name, table name, keyword name, duration, etc.) based on customized environment settings and requirements. (Test these scripts before implementing in PROD environment. Please be aware of all potential risks for implementing the script in your PROD environment.)

1. Performance Troubleshooting(blocking, high CPU, memory, idle, query execution)

2. Deadlock and Database Object ID mapping

3. HA (Clustered SQL Server)

4. Backup & Transaction Log Related Issue

5. Query Store (QDS)

6. Database Encryption (TDE)

7. Tool (Profiler trace and X-event)

Performance Troubleshooting

=========================

(blocking, high CPU, memory, idle, query execution)

1.List all active sessions and its queries that contain your target table name or specific TSQL structure (input your target key word)

SELECT sqltext.TEXT,

req.session_id,

req.status,

req.command,

req.cpu_time,req.database_id,

req.total_elapsed_time

FROM sys.dm_exec_requests req

CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS sqltext where sqltext.text like '%Your Target Key Word%'

2.List all sleeping user sessions that have been idle for over 15 minutes with detailed queries (You can customize the session’s status and idle time)

SELECT CURRENT_TIMESTAMP as currenttime, datediff(minute,last_batch,GETDATE()) as 'idletime_in_minute' ,sp.status,sp.spid,sp.login_time,sp.program_name,sp.hostprocess,sp.loginame,text FROM sys.sysprocesses sp CROSS APPLY sys.dm_exec_sql_text(sp.sql_handle) AS QT where sp.status = 'sleeping' and datediff(minute,last_batch,GETDATE()) >15 and spid>50

3.List top 10 high CPU queries that currently running in this SQL instance

SELECT s.session_id,r.status,r.blocking_session_id 'Blk by',r.wait_type,wait_resource,r.wait_time / (1000 * 60) 'Wait M',r.cpu_time,r.logical_reads,r.reads,r.writes,r.total_elapsed_time / (1000 * 60) 'Elaps M',Substring(st.TEXT,(r.statement_start_offset / 2) + 1,

((CASE r.statement_end_offset WHEN -1 THEN Datalength(st.TEXT) ELSE r.statement_end_offset END - r.statement_start_offset) / 2) + 1) AS statement_text,

Coalesce(Quotename(Db_name(st.dbid)) + N'.' + Quotename(Object_schema_name(st.objectid, st.dbid)) + N'.' + Quotename(Object_name(st.objectid, st.dbid)), '') AS command_text,r.command,s.login_name,

s.host_name,s.program_name,s.last_request_end_time,s.login_time, r.open_transaction_count FROM sys.dm_exec_sessions AS s JOIN sys.dm_exec_requests AS r ON r.session_id = s.session_id

CROSS APPLY sys.Dm_exec_sql_text(r.sql_handle) AS st WHERE r.session_id != @@SPID ORDER BY r.cpu_time desc

4. List top 10 high memory usage queries that currently running in this SQL instance

SELECT mg.session_id,mg.granted_memory_kb,mg.requested_memory_kb,mg.ideal_memory_kb,mg.request_time,mg.grant_time,mg.query_cost,mg.dop,st.[TEXT],qp.query_plan

FROM sys.dm_exec_query_memory_grants AS mg CROSS APPLY sys.dm_exec_sql_text(mg.plan_handle) AS st

CROSS APPLY sys.dm_exec_query_plan(mg.plan_handle) AS qp ORDER BY mg.required_memory_kb DESC

5. List detailed memory usage for each memory clerk

DBCC MEMORYSTATUS

6. List memory usage for plan cache and its maximum size based on current setting. By removing the WHERE condition, you will get a full list for all memory cache clerks’ information

select name, type, buckets_count

from sys.dm_os_memory_cache_hash_tables

where name IN ( 'SQL Plans' , 'Object Plans' , 'Bound Trees' ,'Extended Stored Procedures')

select name, type, pages_kb, entries_count

from sys.dm_os_memory_cache_counters

where name IN ( 'SQL Plans' , 'Object Plans' , 'Bound Trees' ,'Extended Stored Procedures')

7. List progressive (on-going) execution plan for a specific session (Starting with SQL 2016)

For SQL 2016 and 2017, please first run below TSQL in the query session of which the execution plan you wish to extract later

set statistics profile on

Run below query to extract the on-going execution plan for your target session (input the SPID in the bracket)

SELECT * FROM sys.dm_exec_query_statistics_xml(59);

8.List all block header queries that currently detected in this SQL instance

declare @blocker varchar(100),@sql varchar(100)

print convert(varchar(20), getdate(),120)

select distinct blocked into #blocker from sysprocesses where blocked <> 0

DECLARE blocker CURSOR FOR

select spid from sysprocesses where spid in (select * from #blocker) and blocked = 0

OPEN blocker

FETCH NEXT

FROM blocker

INTO @blocker

WHILE @@FETCH_STATUS = 0

BEGIN

set @sql='DBCC inputbuffer(' + @blocker + ')'

execute (@sql)

set @sql= 'select * from sysprocesses where spid=' + @blocker

execute (@sql)

FETCH NEXT

FROM blocker

INTO @blocker

END

CLOSE blocker

DEALLOCATE blocker

drop table #blocker

9.Kill all sleeping sessions that has been idle over 1 hour (You can customize the idle duration)

DECLARE @user_spid INT

DECLARE CurSPID CURSOR FAST_FORWARD

FOR

SELECT SPID

FROM master.dbo.sysprocesses (NOLOCK)

WHERE spid>50

AND status='sleeping' -- only sleeping threads

AND DATEDIFF(HOUR,last_batch,GETDATE())>=1 -- thread sleeping for 1 hours

AND spid<>@@spid -- ignore current spid

OPEN CurSPID

FETCH NEXT FROM CurSPID INTO @user_spid

WHILE (@@FETCH_STATUS=0)

BEGIN

PRINT 'Killing '+CONVERT(VARCHAR,@user_spid)

EXEC('KILL '+@user_spid)

FETCH NEXT FROM CurSPID INTO @user_spid

END

CLOSE CurSPID

DEALLOCATE CurSPID

10.Kill all block headers. This script will continuously scan every 5 seconds for all block header sessions and kill all block headers

Use master

while 1=1

Begin

declare @blocker varchar(100),@sql varchar(100)

print convert(varchar(20), getdate(),120)

select distinct blocked into #blocker from sysprocesses where blocked <> 0

DECLARE blocker CURSOR FOR

select spid from sysprocesses where spid in (select * from #blocker) and status='sleeping'

OPEN blocker

FETCH NEXT

FROM blocker

INTO @blocker

WHILE @@FETCH_STATUS = 0

BEGIN

set @sql='DBCC inputbuffer(' + @blocker + ')'

execute (@sql)

set @sql= 'kill ' + @blocker

execute (@sql)

FETCH NEXT

FROM blocker

INTO @blocker

END

CLOSE blocker

DEALLOCATE blocker

drop table #blocker

waitfor delay '0:0:05'

End

Deadlock and Database Object ID mapping

===================================

Below are 3 examples to map the issued object based on the ID to its source database, index or schema.

1. Key type wait resource

waitresource=KEY: 6:12345678990 (987654321a9b)

database_id = 6
hobt_id = 12345678990
hash value = (987654321a9b)

We can use below TSQL to check the database name based on DATABASE ID

select db_name(6)

Use below TSQL to check the schema, object, and index details related to this key

USE DatabaseName

SELECT

sc.name as schema_name,

so.name as object_name,

si.name as index_name

FROM sys.partitions AS p

JOIN sys.objects as so on

p.object_id=so.object_id

JOIN sys.indexes as si on

p.index_id=si.index_id and

p.object_id=si.object_id

JOIN sys.schemas AS sc on

so.schema_id=sc.schema_id

WHERE hobt_id = 12345678990

2. Object Wait resource type

waitresource=OBJECT: 6:1234567890:4

database_id = 6
Object ID = 1234567890

We can use below TSQL to check the specific object name based on the waitresource details

select OBJECT_NAME(1234567890,6)

3. Page Wait Resource Type

waitresource=“PAGE: 6:3:70133 ” = Database_Id : FileId : PageNumber

database_id=6
data_file_id = 3
page_number = 70133

HA (Clustered SQL Server)

===================

List the latest error encountered by local AG replica for connection timeout

select r.replica_server_name, r.endpoint_url,

rs.connected_state_desc, rs.last_connect_error_description,

rs.last_connect_error_number, rs.last_connect_error_timestamp

from sys.dm_hadr_availability_replica_states rs join sys.availability_replicas r

on rs.replica_id=r.replica_id

where rs.is_local=1

List the current redo rate, redo queue size, log send rate, log send queue size for current AG replica

SELECT CURRENT_TIMESTAMP as currenttime,drs.last_commit_time,ar.replica_server_name, adc.database_name, ag.name AS ag_name, drs.is_local, drs.is_primary_replica, drs.log_send_queue_size, drs.log_send_rate, drs.redo_queue_size, drs.redo_rate FROM sys.dm_hadr_database_replica_states AS drs INNER JOIN sys.availability_databases_cluster AS adc ON drs.group_id = adc.group_id AND drs.group_database_id = adc.group_database_id INNER JOIN sys.availability_groups AS ag ON ag.group_id = drs.group_id INNER JOIN sys.availability_replicas AS ar ON drs.group_id = ar.group_id AND drs.replica_id = ar.replica_id

Restart endpoint
- Check SQL AG endpoint name

SELECT * FROM sys.endpoints

Restart AG endpoint on this local replica

ALTER ENDPOINT <Your AG Endpoint Name> STATE=STOPPED

ALTER ENDPOINT <Your AG Endpoint Name> STATE=STARTED

Backup & Transaction Log Related Issue

================================

List all backup history and backup file details for the past 7 days (You can modify the date)

SELECT CONVERT(CHAR(100), SERVERPROPERTY('Servername')) AS Server, msdb.dbo.backupset.database_name, msdb.dbo.backupset.backup_start_date, msdb.dbo.backupset.backup_finish_date, msdb.dbo.backupset.expiration_date,

CASE msdb..backupset.type WHEN 'D' THEN 'Database' WHEN 'L' THEN 'Log' END AS backup_type, msdb.dbo.backupset.backup_size, msdb.dbo.backupmediafamily.logical_device_name,

msdb.dbo.backupmediafamily.physical_device_name, msdb.dbo.backupset.name AS backupset_name, msdb.dbo.backupset.description

FROM msdb.dbo.backupmediafamily INNER JOIN msdb.dbo.backupset ON msdb.dbo.backupmediafamily.media_set_id = msdb.dbo.backupset.media_set_id

WHERE (CONVERT(datetime, msdb.dbo.backupset.backup_start_date, 102) >= GETDATE() - 7)

ORDER BY msdb.dbo.backupset.database_name, msdb.dbo.backupset.backup_finish_date

List all databases’ data and log file path and logical name

USE master;

SELECT name 'Logical Name', physical_name 'File Location' FROM sys.master_files;

List each database’ transaction log size usage and space

DBCC SQLPERF(LOGSPACE)

List VLF size and counts for all databases as well as detailed size for your target database

select * from sys.dm_db_log_info(5); /*input here your DB ID to get detailed size for existing VLFs on this database*/

SELECT [name], s.database_id,

COUNT(l.database_id) AS 'VLF Count',

SUM(vlf_size_mb) AS 'VLF Size (MB)',

SUM(CAST(vlf_active AS INT)) AS 'Active VLF',

SUM(vlf_active*vlf_size_mb) AS 'Active VLF Size (MB)',

COUNT(l.database_id)-SUM(CAST(vlf_active AS INT)) AS 'In-active VLF',

SUM(vlf_size_mb)-SUM(vlf_active*vlf_size_mb) AS 'In-active VLF Size (MB)'

FROM sys.databases s

CROSS APPLY sys.dm_db_log_info(s.database_id) l

GROUP BY [name], s.database_id

ORDER BY 'VLF Count' DESC

For the purpose of shrinking transaction log file, check the log_reuse_wait_desc type for all database

select name,database_id,log_reuse_wait, log_reuse_wait_desc from sys.databases

Please refer this link for how to fix each of the log_reuse_wait_desc type when it is not “NOTHING” https://docs.microsoft.com/en-us/sql/relational-databases/system-catalog-views/sys-databases-transac...

Database Corruption Issue

Force repair of database with REPAIR_ALLOW_DATA_LOSS option (data loss expected and sometimes can cause more damage. Please refrain from using this method unless as for a last resort. More details for fixing database corruption, please refer this link https://docs.microsoft.com/en-us/sql/t-sql/database-console-commands/dbcc-checkdb-transact-sql?view=...)

Alter database YourDBName set single_user with rollback immediate;

DBCC CHECKDB('YourDBName', REPAIR_ALLOW_DATA_LOSS);

Alter database YourDBName set multi_user with rollback immediate;

Query Store (QDS)

=================

1.List a database’s query store status

select * from sys.database_query_store_options

2. List all databases that have query store configured

select name as 'DATABASE NAME',

CASE is_query_store_on when 1 then 'ENABLED'

else 'OTHER' END AS 'QUERY STORE STATE'

from sys.databases

where is_query_store_on = 1

order by 1 ;

Database Encryption (TDE)

======================

Check if the DMK exist in master database

USE master

SELECT * FROM sys.symmetric_keys WHERE name = '##MS_DatabaseMasterKey##'

Check if the certificate is created in master database. A certificate is equivalent to an asymmetric key

USE master

select * from sys.certificates

Check if the database is encrypted/progress (encryption_state = 3 encrypted; =2 in progress)

USE master

SELECT db_name(database_id) [TDE Encrypted DB Name], c.name as CertName, encryptor_thumbprint , dek.* FROM sys.dm_database_encryption_keys dek INNER JOIN sys.certificates c on dek.encryptor_thumbprint = c.thumbprint

Check if the DMK in the master database is now encrypted by the SMK (is_master_key_encrypted_by_server = 1)

select is_master_key_encrypted_by_server, * from sys.databases

where database_id = db_id(N'master')

Tool (Profiler trace and X-event)

=======================

Import SQL Profiler trace into SQL Server database tables

USE DatabaseName

select * into Sample from fn_trace_gettable('c:\trace\YourFolderToStoreTheTrace\YourTraceFile.trc',default) where eventclass in (10, 12)

For event class ID, check the list in this link https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-trace-setevent...

Use XELoader to process X-event and import the data into SQL database table for data analysis purpose.

XELoader can be downloaded from this opensource link

https://github.com/suresh-kandoth/XELoader

Use CMD script to load the X-event files into SQL Server database

Sample script:

C:\XELoader\6.2>XELoader.exe -D"C:\Users\yixin\Desktop\xeloader\xevent" -SYourServerName\InstanceName -dYourDBName

Use TSQL query to check aggregated data result

Sample TSQL:

SELECT sum(c_duration) as SUM_DURAION,sum(c_signal_duration) as SUM_SIGNAL_DURATION,c_wait_type from [xel].[wait_info] group by c_wait_type order by SUM_DURAION desc

DISCLAIMER : THIS CODE AND INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

Source

How to Remove “Restricted User” in SQL Server

Use below mentioned code for resolving this issue.

Use [SQLDBANOW]

ALTER DATABASE [SQLDBANOW] SET SINGLE_USER WITH ROLLBACK IMMEDIATE

ALTER DATABASE [SQLDBANOW] SET MULTI_USER

How much memory is each SQL Server database using?

As you know SQL Server stores database data and index pages in memory in an area know as the Buffer Pool.

You can use the DMV sys.dm_os_buffer_descriptors to see how much of the buffer pool memory is being used by which database. This snippet will tell you how many clean and dirty (modified since last checkpoint or read from disk) pages from each database are in the buffer pool. You can modify further.

SELECT

(CASE WHEN ([is_modified] = 1) THEN 'Dirty' ELSE 'Clean' END) AS 'Page State',

(CASE WHEN ([database_id] = 32767) THEN 'Resource Database' ELSE DB_NAME (database_id) END) AS 'Database Name',

COUNT (*) AS 'Page Count'

FROM sys.dm_os_buffer_descriptors

GROUP BY [database_id], [is_modified]

ORDER BY [database_id], [is_modified];

GO

Availability Group issues fixed with Alerts

Availability Group issues fixed with Alerts

Being able to work with the Availability Groups and closely analyzing the SQL service, I am amazed by how all of that works. These SQL servers work fast and quickly without any Glitch. However, the service still requires the monitoring and analysis to ensure they are working in the right form. But don’t you think SQL server would be perfect to solve issues of AG?

Well, the answer to this would be a big yes!

In this post regarding availability group issues, I will mention some of the problems that I came across while working with the Availability Groups. For those issues and problems, what automatic solution can be done.

Problem: 1 Suspended data movement

The first and foremost issues that most people encounter in the Availability Group is the data that is glitching and not moving between the primary and secondary replica. This action or Glitch creates a suspended state for the movement of the data.

Because of the suspended data issue, one would want to restore the data as soon as possible and get the flow going again. But if the data is suspended for a long while, one has to take actions that will require a lot of effort. You might have to remove the database from the Availability Groups, or you will have to get the data or the actions back in sync by attaching the files and restoring the log files. So, why not try the SQL server to resolve this issue and get the data out of the suspended state.

Solution resuming the data

The solution to this data resuming problem is usually very easy. You can Resume the suspended data back into the data movement by issuing the t-SQL statement.

ALTER DATABASE [DB] SET HADR RESUME;

One just has to know that the data movement has been suspended for a particular database. However, time matters a lot to determine and respond to the suspended data movement. One has to understand the drastic major that could be taken to get the data back in sync. The SQL Server will raise an error 35264 in case of data suspension. This sys message is;

AlwaysOn Availability Groups data movement for database ‘%.*ls’ has been suspended for the following reason: “%S_MSG” (Source ID %d; Source string: ‘%.*ls’). One has to Resume the database manually to Resume the data Movement in a particular database. For resuming the database and availability of databases, you can check the SQL online server books.

Automating with SQL server agent alerts for the resumption of data movement

For a particular error that is running the job, you have to create the SQL agent alert. It will help in the automation of the resuming data movement. To discover and determine which data movement has been suspended out of which particular database, we have to hit this system of dynamic management views to ensure that we know a particular database for further resuming the data movement. The DMV sys DM HADR replica database will guide you through and tell you what particular database is suspended and which data movement needs to be resumed. You can create the Alert and the job as follows.

Job script to resume data movement

By creating the job and alert, the availability group will get the Alert as soon as the data is suspended. It will immediately create the t SQL statement for resuming the data and get it out of the suspended state. However, when you have finally created the job, you need to take another step. An alert also needs to set up to trap the error and to run the job of data resuming successfully.

Trapping suspended data movement with an alertConfigure alert to run the job

When the database is in the suspended state, the SQL server will finally trap the error and will help in running the job after the creation of the job and the Alert. But if you are wondering how you can start the data movement and get it flowing again and out of the suspended state, you will have to work on the script below.

Wrapping it all up in a script

The T-SQL script to create the job and alert is:

/******************************************************************************

Create job to resume AG database data movement for suspended databases.

Create alert to catch when data movement has been suspended, and run the job.

*******************************************************************************

MODIFICATION LOG

*******************************************************************************

2018-11-20 WGS Initial creation.

******************************************************************************/

USE [msdb]

BEGIN TRANSACTION;

DECLARE @ReturnCode INT;

SELECT @ReturnCode = 0;DECLARE @jobname sysname = N’AG – Resume Data Movement’;

DECLARE @categoryname sysname = ‘HADR-Availability Group’;

IF EXISTS (SELECT name FROM dbo.sysjobs where name = @jobname)

BEGIN

EXECUTE msdb.dbo.sp_delete_job @job_name = @jobname;

END;

IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=@categoryname AND category_class=1)

BEGIN

EXECUTE @ReturnCode = msdb.dbo.sp_add_category @class=N’JOB’, @type=N’LOCAL’, @name=@categoryname;

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback;

END;

EXECUTE @ReturnCode = msdb.dbo.sp_add_job

@job_name=@jobname,

@enabled=1,

@notify_level_eventlog=0,

@notify_level_email=2,

@notify_level_netsend=2,

@notify_level_page=2,

@delete_level=0,

@description=N’Resume data movement on suspended Availability Group databases.

This job can be run manually, or from an alert’,

@category_name=N’HADR-Availability Group’,

@owner_login_name=N’sa’;

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback;

EXECUTE @ReturnCode = msdb.dbo.sp_add_jobserver @job_name=@jobname, @server_name = N'(local)’

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback;

EXECUTE @ReturnCode = msdb.dbo.sp_add_jobstep

@job_name=@jobname,

@step_name=N’Resume data movement in AG databases’,

@step_id=1,

@cmdexec_success_code=0,

@on_success_action=1,

@on_fail_action=2,

@retry_attempts=0,

@retry_interval=0,

@os_run_priority=0,

@subsystem=N’TSQL’,

@command=N’DECLARE @SQLCMD VARCHAR(1000);

DECLARE cDBSuspended CURSOR FOR

SELECT ”ALTER DATABASE [” + DB_NAME(database_id) + ”] SET HADR RESUME;”

FROM sys.dm_hadr_database_replica_states drs

JOIN sys.availability_replicas ar ON ar.replica_id = drs.replica_id

WHERE ar.replica_server_name = @@SERVERNAME

AND drs.is_suspended = 1;

OPEN cDBSuspended;

FETCH NEXT FROM cDBSuspended INTO @SQLCMD;

WHILE @@FETCH_STATUS = 0

BEGIN

EXECUTE (@SQLCMD);

FETCH NEXT FROM cDBSuspended INTO @SQLCMD;

END;

CLOSE cDBSuspended;

DEALLOCATE cDBSuspended;

‘,

@database_name=N’master’,

@flags=0;

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback;

EXECUTE @ReturnCode = msdb.dbo.sp_update_job

@job_name=@jobname,

@enabled=1,

@start_step_id=1,

@notify_level_eventlog=0,

@notify_level_email=2,

@notify_level_netsend=2,

@notify_level_page=2,

@delete_level=0,

@description=N’Resume data movement on suspended Availability Group databases.’,

@category_name=N’HADR-Availability Group’,

@owner_login_name=N’sa’,

@notify_email_operator_name=N”,

@notify_netsend_operator_name=N”,

@notify_page_operator_name=N”;

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback;

DECLARE @alertname sysname = N’AG Data Movement suspended’;

IF EXISTS (SELECT * FROM msdb.dbo.sysalerts WHERE name = @alertname)

BEGIN

EXECUTE @ReturnCode = msdb.dbo.sp_delete_alert @alertname;

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback;

END;

Alert text:

AlwaysOn Availability Groups data movement for database ‘%.*ls’ has been suspended for the following reason: “%S_MSG” (Source ID %d; Source string: ‘%.*ls’). To resume data movement on the database, you will need to resume the database manually. For information about how to resume an availability database, see SQL Server Books Online.

EXECUTE @ReturnCode = msdb.dbo.sp_add_alert

@name=@alertname,

@message_id=35264,

@severity=0,

@enabled=1,

@delay_between_responses=0,

@include_event_description_in=0,

@job_name = @jobname;

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback;

COMMIT TRANSACTION;

GOTO EndSave

QuitWithRollback:

IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION;

EndSave:

Problem: 2 Failover

One other issue that you will come across while you are working with the availability groups is the failover condition. When some data needs to be updated from the database, the available third party intervenes for showcasing the primary replica. However, the adjustment is required and needed for the cluster witness configuration. There may be a possibility that you will have to turn off the old primary replica. In this case, the SQL shows an error 1480 that interprets as failover of Availability Group. However, it can occur for both the primary replica of new and old databases. You can read the text of the error from sys messages as;

The %S_MSG database “%.*ls” is changing roles from “%ls” to “%ls” because the mirroring session or availability group failed over due to %S_MSG. However, the message that is shown after the error is only for the information of the user, and you cannot take any action on this message.

There are three roles that each replica will go through in the process of failover. When there is an old primary replica, it will change from primary to resolving state and then to do the secondary state. However, the old secondary replica I will change from secondary to resolving state and then to the primary state. You can track all of these roles and Change of these states in the error message.

There are three possible conditions for which the alerts can be shown. The first condition is if the failover occurred. Secondly, if the replica is in the instance of the primary role and the failover has occurred. Whereas the last and the third potential alert can show that the replica is now at the secondary instance. The script that is available below will help in creating the last two potential conditions of the alerts.

Use this event to run a job when the replica becomes primary

USE [msdb]

EXEC msdb.dbo.sp_add_alert @name=N’AG Failover Detected – Now Primary’,

@message_id=1480,

@severity=0,

@enabled=1,

@delay_between_responses=0,

@include_event_description_in=0,

@event_description_keyword=N'”RESOLVING” to “PRIMARY”‘,

@job_id=N’00000000-0000-0000-0000-000000000000′

Use this event to run a job when the replica becomes secondary

USE [msdb]

EXEC msdb.dbo.sp_add_alert @name=N’AG Failover Detected – Now Secondary’,

@message_id=1480,

@severity=0,

@enabled=1,

@delay_between_responses=0,

@include_event_description_in=0,

@event_description_keyword=N'”RESOLVING” to “SECONDARY”‘,

@job_id=N’00000000-0000-0000-0000-000000000000′

When you have successfully added the alerts, it will help in the modification of the error to successfully run a particular job. These alerts will not only tell you about the failover condition but also help you to know the instance of the replica. If you just want to know about the failover, you can conveniently remove the message on the Alert to alert on.

Configuring alert to detect AG failover

Wrap up

In the above article, we have successfully learned how to capture the issues in the system and fix the availability groups using the alerting system of the SQL server agent. We have also learned to respond to these errors and alerts by running the jobs after the creation of alerts and fix these issues using the SQL server.

Reference: Article

SQL Server - Choosing Between AWS EC2 vs AWS RDS

Database Solution - SQL Server Cloud Migration Strategy

The below mentioned table provides a side-by-side comparison of SQL Server features supported on Amazon RDS and Amazon EC2. Use this information to understand the differences between the two services and to choose the best approach for your use case.

SQL SERVER DBA and Azure SQL

Pages

TSQL Scripts for Troubleshooting Common Issues with SQL Server

How to Remove “Restricted User” in SQL Server

How much memory is each SQL Server database using?

Availability Group issues fixed with Alerts

SQL Server - Choosing Between AWS EC2 vs AWS RDS

Report Abuse

My Blog List