SQL SERVER DBA and Azure SQL

Availability Group issues fixed with Alerts

Availability Group issues fixed with Alerts

Being able to work with the Availability Groups and closely analyzing the SQL service, I am amazed by how all of that works. These SQL servers work fast and quickly without any Glitch. However, the service still requires the monitoring and analysis to ensure they are working in the right form. But don’t you think SQL server would be perfect to solve issues of AG?

Well, the answer to this would be a big yes!

In this post regarding availability group issues, I will mention some of the problems that I came across while working with the Availability Groups. For those issues and problems, what automatic solution can be done.

Problem: 1 Suspended data movement

The first and foremost issues that most people encounter in the Availability Group is the data that is glitching and not moving between the primary and secondary replica. This action or Glitch creates a suspended state for the movement of the data.

Because of the suspended data issue, one would want to restore the data as soon as possible and get the flow going again. But if the data is suspended for a long while, one has to take actions that will require a lot of effort. You might have to remove the database from the Availability Groups, or you will have to get the data or the actions back in sync by attaching the files and restoring the log files. So, why not try the SQL server to resolve this issue and get the data out of the suspended state.

Solution resuming the data

The solution to this data resuming problem is usually very easy. You can Resume the suspended data back into the data movement by issuing the t-SQL statement.

ALTER DATABASE [DB] SET HADR RESUME;

One just has to know that the data movement has been suspended for a particular database. However, time matters a lot to determine and respond to the suspended data movement. One has to understand the drastic major that could be taken to get the data back in sync. The SQL Server will raise an error 35264 in case of data suspension. This sys message is;

AlwaysOn Availability Groups data movement for database ‘%.*ls’ has been suspended for the following reason: “%S_MSG” (Source ID %d; Source string: ‘%.*ls’). One has to Resume the database manually to Resume the data Movement in a particular database. For resuming the database and availability of databases, you can check the SQL online server books.

Automating with SQL server agent alerts for the resumption of data movement

For a particular error that is running the job, you have to create the SQL agent alert. It will help in the automation of the resuming data movement. To discover and determine which data movement has been suspended out of which particular database, we have to hit this system of dynamic management views to ensure that we know a particular database for further resuming the data movement. The DMV sys DM HADR replica database will guide you through and tell you what particular database is suspended and which data movement needs to be resumed. You can create the Alert and the job as follows.

Job script to resume data movement

By creating the job and alert, the availability group will get the Alert as soon as the data is suspended. It will immediately create the t SQL statement for resuming the data and get it out of the suspended state. However, when you have finally created the job, you need to take another step. An alert also needs to set up to trap the error and to run the job of data resuming successfully.

Trapping suspended data movement with an alertConfigure alert to run the job

When the database is in the suspended state, the SQL server will finally trap the error and will help in running the job after the creation of the job and the Alert. But if you are wondering how you can start the data movement and get it flowing again and out of the suspended state, you will have to work on the script below.

Wrapping it all up in a script

The T-SQL script to create the job and alert is:

/******************************************************************************

Create job to resume AG database data movement for suspended databases.

Create alert to catch when data movement has been suspended, and run the job.

*******************************************************************************

MODIFICATION LOG

*******************************************************************************

2018-11-20 WGS Initial creation.

******************************************************************************/

USE [msdb]

BEGIN TRANSACTION;

DECLARE @ReturnCode INT;

SELECT @ReturnCode = 0;DECLARE @jobname sysname = N’AG – Resume Data Movement’;

DECLARE @categoryname sysname = ‘HADR-Availability Group’;

IF EXISTS (SELECT name FROM dbo.sysjobs where name = @jobname)

BEGIN

EXECUTE msdb.dbo.sp_delete_job @job_name = @jobname;

END;

IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=@categoryname AND category_class=1)

BEGIN

EXECUTE @ReturnCode = msdb.dbo.sp_add_category @class=N’JOB’, @type=N’LOCAL’, @name=@categoryname;

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback;

END;

EXECUTE @ReturnCode = msdb.dbo.sp_add_job

@job_name=@jobname,

@enabled=1,

@notify_level_eventlog=0,

@notify_level_email=2,

@notify_level_netsend=2,

@notify_level_page=2,

@delete_level=0,

@description=N’Resume data movement on suspended Availability Group databases.

This job can be run manually, or from an alert’,

@category_name=N’HADR-Availability Group’,

@owner_login_name=N’sa’;

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback;

EXECUTE @ReturnCode = msdb.dbo.sp_add_jobserver @job_name=@jobname, @server_name = N'(local)’

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback;

EXECUTE @ReturnCode = msdb.dbo.sp_add_jobstep

@job_name=@jobname,

@step_name=N’Resume data movement in AG databases’,

@step_id=1,

@cmdexec_success_code=0,

@on_success_action=1,

@on_fail_action=2,

@retry_attempts=0,

@retry_interval=0,

@os_run_priority=0,

@subsystem=N’TSQL’,

@command=N’DECLARE @SQLCMD VARCHAR(1000);

DECLARE cDBSuspended CURSOR FOR

SELECT ”ALTER DATABASE [” + DB_NAME(database_id) + ”] SET HADR RESUME;”

FROM sys.dm_hadr_database_replica_states drs

JOIN sys.availability_replicas ar ON ar.replica_id = drs.replica_id

WHERE ar.replica_server_name = @@SERVERNAME

AND drs.is_suspended = 1;

OPEN cDBSuspended;

FETCH NEXT FROM cDBSuspended INTO @SQLCMD;

WHILE @@FETCH_STATUS = 0

BEGIN

EXECUTE (@SQLCMD);

FETCH NEXT FROM cDBSuspended INTO @SQLCMD;

END;

CLOSE cDBSuspended;

DEALLOCATE cDBSuspended;

‘,

@database_name=N’master’,

@flags=0;

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback;

EXECUTE @ReturnCode = msdb.dbo.sp_update_job

@job_name=@jobname,

@enabled=1,

@start_step_id=1,

@notify_level_eventlog=0,

@notify_level_email=2,

@notify_level_netsend=2,

@notify_level_page=2,

@delete_level=0,

@description=N’Resume data movement on suspended Availability Group databases.’,

@category_name=N’HADR-Availability Group’,

@owner_login_name=N’sa’,

@notify_email_operator_name=N”,

@notify_netsend_operator_name=N”,

@notify_page_operator_name=N”;

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback;

DECLARE @alertname sysname = N’AG Data Movement suspended’;

IF EXISTS (SELECT * FROM msdb.dbo.sysalerts WHERE name = @alertname)

BEGIN

EXECUTE @ReturnCode = msdb.dbo.sp_delete_alert @alertname;

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback;

END;

Alert text:

AlwaysOn Availability Groups data movement for database ‘%.*ls’ has been suspended for the following reason: “%S_MSG” (Source ID %d; Source string: ‘%.*ls’). To resume data movement on the database, you will need to resume the database manually. For information about how to resume an availability database, see SQL Server Books Online.

EXECUTE @ReturnCode = msdb.dbo.sp_add_alert

@name=@alertname,

@message_id=35264,

@severity=0,

@enabled=1,

@delay_between_responses=0,

@include_event_description_in=0,

@job_name = @jobname;

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback;

COMMIT TRANSACTION;

GOTO EndSave

QuitWithRollback:

IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION;

EndSave:

Problem: 2 Failover

One other issue that you will come across while you are working with the availability groups is the failover condition. When some data needs to be updated from the database, the available third party intervenes for showcasing the primary replica. However, the adjustment is required and needed for the cluster witness configuration. There may be a possibility that you will have to turn off the old primary replica. In this case, the SQL shows an error 1480 that interprets as failover of Availability Group. However, it can occur for both the primary replica of new and old databases. You can read the text of the error from sys messages as;

The %S_MSG database “%.*ls” is changing roles from “%ls” to “%ls” because the mirroring session or availability group failed over due to %S_MSG. However, the message that is shown after the error is only for the information of the user, and you cannot take any action on this message.

There are three roles that each replica will go through in the process of failover. When there is an old primary replica, it will change from primary to resolving state and then to do the secondary state. However, the old secondary replica I will change from secondary to resolving state and then to the primary state. You can track all of these roles and Change of these states in the error message.

There are three possible conditions for which the alerts can be shown. The first condition is if the failover occurred. Secondly, if the replica is in the instance of the primary role and the failover has occurred. Whereas the last and the third potential alert can show that the replica is now at the secondary instance. The script that is available below will help in creating the last two potential conditions of the alerts.

Use this event to run a job when the replica becomes primary

USE [msdb]

EXEC msdb.dbo.sp_add_alert @name=N’AG Failover Detected – Now Primary’,

@message_id=1480,

@severity=0,

@enabled=1,

@delay_between_responses=0,

@include_event_description_in=0,

@event_description_keyword=N'”RESOLVING” to “PRIMARY”‘,

@job_id=N’00000000-0000-0000-0000-000000000000′

Use this event to run a job when the replica becomes secondary

USE [msdb]

EXEC msdb.dbo.sp_add_alert @name=N’AG Failover Detected – Now Secondary’,

@message_id=1480,

@severity=0,

@enabled=1,

@delay_between_responses=0,

@include_event_description_in=0,

@event_description_keyword=N'”RESOLVING” to “SECONDARY”‘,

@job_id=N’00000000-0000-0000-0000-000000000000′

When you have successfully added the alerts, it will help in the modification of the error to successfully run a particular job. These alerts will not only tell you about the failover condition but also help you to know the instance of the replica. If you just want to know about the failover, you can conveniently remove the message on the Alert to alert on.

Configuring alert to detect AG failover

Wrap up

In the above article, we have successfully learned how to capture the issues in the system and fix the availability groups using the alerting system of the SQL server agent. We have also learned to respond to these errors and alerts by running the jobs after the creation of alerts and fix these issues using the SQL server.

Reference: Article

SQL Server - Choosing Between AWS EC2 vs AWS RDS

Database Solution - SQL Server Cloud Migration Strategy

The below mentioned table provides a side-by-side comparison of SQL Server features supported on Amazon RDS and Amazon EC2. Use this information to understand the differences between the two services and to choose the best approach for your use case.

New and Enhanced Features in SQL Server 2022

SQL Server 2022 new features

The new features in SQL Server 2022 include:

Parameter Sensitive Plan Optimization

Query Store enhancements

Link to Azure SQL Managed Instance

Contained Availability Groups

Azure Synapse Link for SQL

Multi-Write Replication

Azure Active Directory authentication

Azure Purview integration

SQL Server Ledger

AWS S3 storage integration

Page Life Expectancy, Buffer Cache Hit Ratio

Page Life Expectancy (PLE) in SQL Server:

Page Life Expectancy is the number of seconds a page will stay in the buffer pool without references.

If your page stays for more time in the buffer pool your PLE is high which leads to high performance. Every time request comes there are more chances that it may find its data in the cache itself, instead of going to the hard drive to read the data.

PLE measures in seconds.

PLE is one of the performance counters of SQL Server.

Higher the PLE, performance of SQL Server is good.

As per recommendation, the value of PLE counter is around 300 seconds.

You can check this value for your SQL server using below DMV.

SELECT object_name, counter_name, cntr_value from sys.dm_os_performance_counters WHERE [object_name] LIKE '%Buffer Manger%' AND [counter_name] = 'page life expectancy'

Buffer Cache Hit Ratio

Buffer Cache Hit Ratio indicates the percentage of pages found in the buffer cache without having to read from the disk.

The ration is the total number of cache hits divided by the total number of cache lookups over the last few thousand page accesses.

Buffer Cache Hit Ratio is one of the performance counters of SQL server.

You can increase the buffer cache hit ratio by increasing the amount of memory available to SQL Server or by using the buffer pool extension feature.

You can check this value for your SQL server using below DMV.

SELECT object_name, counter_name, cntr_value from sys.dm_os_performance_counters WHERE [object_name] LIKE '%Buffer Manger%' AND [counter_name] = 'Buffer cache hit ratio'

Who is a DBA?

An IT Professional, who is responsible for Installation, Configuration, Upgrading, Administration, Monitoring, Maintenance and Security of Databases in any organization.

Phases of a DBA Career path (not limited to these only)

Junior DBAs
Mid-level DBAs
Senior DBAs
DBA consultants
Manager or director of database administration/information technology
Data Architects
Release Managers
Change Managers

Main Responsibilities

Installing and upgrading the database server tools
Planning future storage requirements for the database systems
Modifying the database structure based on information given by application developers
Enrolling users and maintaining system security
Ensuring compliance with database vendor license agreement
Controlling and monitoring user access to the databases
Monitoring and optimizing the performance of the databases
Planning for backup and recovery of database information
Data Archiving
Backing up and restoring databases
Contacting database vendor for technical support
Generating various reports by querying from database as per need
Participating in Disaster Recovery Tests
Working closely with other teams like Network Operation Teams, Monitoring Teams

High Level Skills that a DBA must have

Top Communication skills
Good knowledge of database theory
Good knowledge of database design, logical or physical
Good knowledge of RDBMS
Knowledge of structured query language (SQL), e.g. Transact-SQL
Understanding of distributed computing architectures
Understanding of the underlying operating system
Understanding of storage technologies, memory management, disk arrays, NAS/SAN, networking
Understanding of routine maintenance, recovery, and handling failover of a Database

As I mentioned, today’s jobs are more demanding. DBAs are now a days performing many roles and when they specialized in those roles, they grow like

Production Support DBAs: These DBAs are focused on the physical aspects of database administration such as DBMS installation, configuration, patching, upgrades, backups, restores, refreshes, performance optimization, maintenance and disaster recovery.

Development DBAs: These DBAs are focused on the logical and development aspects of database administration such as data model design and maintenance, DDL (data definition language) generation, SQL writing and tuning, coding stored procedures, collaborating with developers to help choose the most appropriate DBMS feature/functionality and other pre-production activities.

Application DBAs: These DBAs are usually found in organizations that have purchased 3rd party application software such as ERP (enterprise resource planning) and CRM (customer relationship management) systems.

Hybrid DBAs: As name suggests, these DBAs are usually those who performs almost all the tasks which are mentioned. These DBAs might not be specialized however they are the one who grows like Architects at enterprise level.

SQL SERVER DBA and Azure SQL

Pages

Availability Group issues fixed with Alerts

SQL Server - Choosing Between AWS EC2 vs AWS RDS

New and Enhanced Features in SQL Server 2022

Page Life Expectancy, Buffer Cache Hit Ratio

Who is a DBA?

Report Abuse

My Blog List