Availability Group issues fixed with Alerts

 Availability Group issues fixed with Alerts

Being able to work with the Availability Groups and closely analyzing the SQL service, I am amazed by how all of that works. These SQL servers work fast and quickly without any Glitch. However, the service still requires the monitoring and analysis to ensure they are working in the right form. But don’t you think SQL server would be perfect to solve issues of AG?

Well, the answer to this would be a big yes!

In this post regarding availability group issues, I will mention some of the problems that I came across while working with the Availability Groups. For those issues and problems, what automatic solution can be done.

Problem: 1 Suspended data movement

The first and foremost issues that most people encounter in the Availability Group is the data that is glitching and not moving between the primary and secondary replica. This action or Glitch creates a suspended state for the movement of the data.

Because of the suspended data issue, one would want to restore the data as soon as possible and get the flow going again. But if the data is suspended for a long while, one has to take actions that will require a lot of effort. You might have to remove the database from the Availability Groups, or you will have to get the data or the actions back in sync by attaching the files and restoring the log files. So, why not try the SQL server to resolve this issue and get the data out of the suspended state.

Solution resuming the data

The solution to this data resuming problem is usually very easy. You can Resume the suspended data back into the data movement by issuing the t-SQL statement.

ALTER DATABASE [DB] SET HADR RESUME;

One just has to know that the data movement has been suspended for a particular database. However, time matters a lot to determine and respond to the suspended data movement. One has to understand the drastic major that could be taken to get the data back in sync. The SQL Server will raise an error 35264 in case of data suspension. This sys message is; 

AlwaysOn Availability Groups data movement for database ‘%.*ls’ has been suspended for the following reason: “%S_MSG” (Source ID %d; Source string: ‘%.*ls’). One has to Resume the database manually to Resume the data Movement in a particular database. For resuming the database and availability of databases, you can check the SQL online server books.

Automating with SQL server agent alerts for the resumption of data movement

For a particular error that is running the job, you have to create the SQL agent alert. It will help in the automation of the resuming data movement. To discover and determine which data movement has been suspended out of which particular database, we have to hit this system of dynamic management views to ensure that we know a particular database for further resuming the data movement. The DMV sys DM HADR replica database will guide you through and tell you what particular database is suspended and which data movement needs to be resumed. You can create the Alert and the job as follows.

Job script to resume data movement

By creating the job and alert, the availability group will get the Alert as soon as the data is suspended. It will immediately create the t SQL statement for resuming the data and get it out of the suspended state. However, when you have finally created the job, you need to take another step. An alert also needs to set up to trap the error and to run the job of data resuming successfully.

Trapping suspended data movement with an alertConfigure alert to run the job

When the database is in the suspended state, the SQL server will finally trap the error and will help in running the job after the creation of the job and the Alert. But if you are wondering how you can start the data movement and get it flowing again and out of the suspended state, you will have to work on the script below.

Wrapping it all up in a script

The T-SQL script to create the job and alert is:

/******************************************************************************


Create job to resume AG database data movement for suspended databases.


Create alert to catch when data movement has been suspended, and run the job.


*******************************************************************************


MODIFICATION LOG


*******************************************************************************


2018-11-20 WGS Initial creation.


******************************************************************************/


USE [msdb]


GO


BEGIN TRANSACTION;


DECLARE @ReturnCode INT;


SELECT @ReturnCode = 0;DECLARE @jobname sysname = N’AG – Resume Data Movement’;


DECLARE @categoryname sysname = ‘HADR-Availability Group’;


IF EXISTS (SELECT name FROM dbo.sysjobs where name = @jobname)


BEGIN


EXECUTE msdb.dbo.sp_delete_job @job_name = @jobname;


END;


IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=@categoryname AND category_class=1)


BEGIN


EXECUTE @ReturnCode = msdb.dbo.sp_add_category @class=N’JOB’, @type=N’LOCAL’, @name=@categoryname;


IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback;


END;


EXECUTE @ReturnCode = msdb.dbo.sp_add_job


@job_name=@jobname,


@enabled=1,


@notify_level_eventlog=0,


@notify_level_email=2,


@notify_level_netsend=2,


@notify_level_page=2,


@delete_level=0,


@description=N’Resume data movement on suspended Availability Group databases.


This job can be run manually, or from an alert’,


@category_name=N’HADR-Availability Group’,


@owner_login_name=N’sa’;


IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback;


EXECUTE @ReturnCode = msdb.dbo.sp_add_jobserver @job_name=@jobname, @server_name = N'(local)’


IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback;


EXECUTE @ReturnCode = msdb.dbo.sp_add_jobstep


@job_name=@jobname,


@step_name=N’Resume data movement in AG databases’,


@step_id=1,


@cmdexec_success_code=0,


@on_success_action=1,


@on_fail_action=2,


@retry_attempts=0,


@retry_interval=0,


@os_run_priority=0,


@subsystem=N’TSQL’,


@command=N’DECLARE @SQLCMD VARCHAR(1000);


DECLARE cDBSuspended CURSOR FOR


SELECT ”ALTER DATABASE [” + DB_NAME(database_id) + ”] SET HADR RESUME;”


FROM sys.dm_hadr_database_replica_states drs


JOIN sys.availability_replicas ar ON ar.replica_id = drs.replica_id


WHERE ar.replica_server_name = @@SERVERNAME


AND drs.is_suspended = 1;


OPEN cDBSuspended;


FETCH NEXT FROM cDBSuspended INTO @SQLCMD;


WHILE @@FETCH_STATUS = 0


BEGIN


EXECUTE (@SQLCMD);


FETCH NEXT FROM cDBSuspended INTO @SQLCMD;


END;


CLOSE cDBSuspended;


DEALLOCATE cDBSuspended;


‘,


@database_name=N’master’,


@flags=0;


IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback;


EXECUTE @ReturnCode = msdb.dbo.sp_update_job


@job_name=@jobname,


@enabled=1,


@start_step_id=1,


@notify_level_eventlog=0,


@notify_level_email=2,


@notify_level_netsend=2,


@notify_level_page=2,


@delete_level=0,


@description=N’Resume data movement on suspended Availability Group databases.’,


@category_name=N’HADR-Availability Group’,


@owner_login_name=N’sa’,


@notify_email_operator_name=N”,


@notify_netsend_operator_name=N”,


@notify_page_operator_name=N”;


IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback;


DECLARE @alertname sysname = N’AG Data Movement suspended’;


IF EXISTS (SELECT * FROM msdb.dbo.sysalerts WHERE name = @alertname)


BEGIN


EXECUTE @ReturnCode = msdb.dbo.sp_delete_alert @alertname;


IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback;


END;


/*


Alert text:


AlwaysOn Availability Groups data movement for database ‘%.*ls’ has been suspended for the following reason: “%S_MSG” (Source ID %d; Source string: ‘%.*ls’). To resume data movement on the database, you will need to resume the database manually. For information about how to resume an availability database, see SQL Server Books Online.


*/


EXECUTE @ReturnCode = msdb.dbo.sp_add_alert


@name=@alertname,


@message_id=35264,


@severity=0,


@enabled=1,


@delay_between_responses=0,


@include_event_description_in=0,


@job_name = @jobname;


IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback;


COMMIT TRANSACTION;


GOTO EndSave


QuitWithRollback:


IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION;


EndSave:


GO

Problem: 2 Failover

One other issue that you will come across while you are working with the availability groups is the failover condition. When some data needs to be updated from the database, the available third party intervenes for showcasing the primary replica. However, the adjustment is required and needed for the cluster witness configuration. There may be a possibility that you will have to turn off the old primary replica. In this case, the SQL shows an error 1480 that interprets as failover of Availability Group. However, it can occur for both the primary replica of new and old databases. You can read the text of the error from sys messages as;

The %S_MSG database “%.*ls” is changing roles from “%ls” to “%ls” because the mirroring session or availability group failed over due to %S_MSG. However, the message that is shown after the error is only for the information of the user, and you cannot take any action on this message.

There are three roles that each replica will go through in the process of failover. When there is an old primary replica, it will change from primary to resolving state and then to do the secondary state. However, the old secondary replica I will change from secondary to resolving state and then to the primary state. You can track all of these roles and Change of these states in the error message.

There are three possible conditions for which the alerts can be shown. The first condition is if the failover occurred. Secondly, if the replica is in the instance of the primary role and the failover has occurred. Whereas the last and the third potential alert can show that the replica is now at the secondary instance. The script that is available below will help in creating the last two potential conditions of the alerts.

/*


Use this event to run a job when the replica becomes primary


*/

USE [msdb]

GO

EXEC msdb.dbo.sp_add_alert @name=N’AG Failover Detected – Now Primary’,

@message_id=1480,

@severity=0,

@enabled=1,

@delay_between_responses=0,

@include_event_description_in=0,

@event_description_keyword=N'”RESOLVING” to “PRIMARY”‘,

@job_id=N’00000000-0000-0000-0000-000000000000′

GO

/*

Use this event to run a job when the replica becomes secondary

*/

USE [msdb]

GO

EXEC msdb.dbo.sp_add_alert @name=N’AG Failover Detected – Now Secondary’,

@message_id=1480,

@severity=0,

@enabled=1,

@delay_between_responses=0,

@include_event_description_in=0,

@event_description_keyword=N'”RESOLVING” to “SECONDARY”‘,

@job_id=N’00000000-0000-0000-0000-000000000000′

GO

When you have successfully added the alerts, it will help in the modification of the error to successfully run a particular job. These alerts will not only tell you about the failover condition but also help you to know the instance of the replica. If you just want to know about the failover, you can conveniently remove the message on the Alert to alert on.

Configuring alert to detect AG failover

Wrap up

In the above article, we have successfully learned how to capture the issues in the system and fix the availability groups using the alerting system of the SQL server agent. We have also learned to respond to these errors and alerts by running the jobs after the creation of alerts and fix these issues using the SQL server.

Reference: Article 

SQL Server - Choosing Between AWS EC2 vs AWS RDS

 Database Solution - SQL Server Cloud Migration Strategy

The below mentioned table provides a side-by-side comparison of SQL Server features supported on Amazon RDS and Amazon EC2. Use this information to understand the differences between the two services and to choose the best approach for your use case.




New and Enhanced Features in SQL Server 2022

 SQL Server 2022 new features

The new features in SQL Server 2022 include:

Parameter Sensitive Plan Optimization

Query Store enhancements

Link to Azure SQL Managed Instance

Contained Availability Groups

Azure Synapse Link for SQL

Multi-Write Replication

Azure Active Directory authentication

Azure Purview integration

SQL Server Ledger

AWS S3 storage integration

Page Life Expectancy, Buffer Cache Hit Ratio

 Page Life Expectancy (PLE) in SQL Server:

Page Life Expectancy is the number of seconds a page will stay in the buffer pool without references.

If your page stays for more time in the buffer pool your PLE is high which leads to high performance. Every time request comes there are more chances that it may find its data in the cache itself, instead of going to the hard drive to read the data.

PLE measures in seconds.

PLE is one of the performance counters of SQL Server.

Higher the PLE, performance of SQL Server is good.

As per recommendation, the value of PLE counter is around 300 seconds.

You can check this value for your SQL server using below DMV.

SELECT object_name, counter_name, cntr_value from sys.dm_os_performance_counters WHERE [object_name] LIKE '%Buffer Manger%' AND [counter_name] = 'page life expectancy'

Buffer Cache Hit Ratio

Buffer Cache Hit Ratio indicates the percentage of pages found in the buffer cache without having to read from the disk.

The ration is the total number of cache hits divided by the total number of cache lookups over the last few thousand page accesses.

Buffer Cache Hit Ratio is one of the performance counters of SQL server.

You can increase the buffer cache hit ratio by increasing the amount of memory available to SQL Server or by using the buffer pool extension feature.

You can check this value for your SQL server using below DMV.

SELECT object_name, counter_name, cntr_value from sys.dm_os_performance_counters WHERE [object_name] LIKE '%Buffer Manger%' AND [counter_name] = 'Buffer cache hit ratio'


The Role of a Database Administrator (DBA) in IT

A Database Administrator (DBA) plays a crucial role in managing, configuring, and securing databases within an organization. They handle a range of responsibilities from installation to maintenance, ensuring that the database systems are efficient, secure, and reliable.

Phases of a DBA Career Path

  1. Junior DBAs
  2. Mid-level DBAs
  3. Senior DBAs
  4. DBA Consultants
  5. Manager or Director of Database Administration/Information Technology
  6. Data Architects
  7. Release Managers
  8. Change Managers

Main Responsibilities

  • Installing and Upgrading Database Tools: Setting up and upgrading database server tools to ensure the latest features and security patches are applied.

  • Planning Storage Requirements: Assessing and planning for future storage needs to accommodate database growth.

  • Modifying Database Structure: Making necessary changes to the database structure based on requirements provided by application developers.

  • User Management and Security: Enrolling users, managing system security, and ensuring compliance with database vendor license agreements.

  • Access Control: Monitoring and controlling user access to the databases to maintain data security.

  • Performance Monitoring and Optimization: Regularly monitoring database performance and making optimizations to enhance efficiency.

  • Backup and Recovery Planning: Developing strategies for backup and recovery to ensure data integrity and availability.

  • Data Archiving: Implementing data archiving strategies to manage historical data efficiently.

  • Database Backup and Restoration: Performing regular backups and restoring databases as needed.

  • Vendor Support: Contacting database vendors for technical support and troubleshooting issues.

  • Report Generation: Generating reports by querying the database as required.

  • Disaster Recovery Testing: Participating in disaster recovery tests to prepare for potential database failures.

  • Collaboration: Working closely with other teams such as Network Operations and Monitoring Teams to ensure seamless database operations.

High-Level Skills Required

  • Communication Skills: Excellent communication skills are essential for interacting with various teams and stakeholders.

  • Database Theory: Strong understanding of database theory and principles.

  • Database Design: Knowledge of both logical and physical database design.

  • RDBMS Knowledge: Proficiency in Relational Database Management Systems (RDBMS).

  • SQL Expertise: Skills in SQL, including Transact-SQL, for querying and managing databases.

  • Distributed Computing: Understanding of distributed computing architectures.

  • Operating Systems: Familiarity with the underlying operating system that supports the database.

  • Storage Technologies: Knowledge of storage technologies, including memory management, disk arrays, NAS/SAN, and networking.

  • Maintenance and Recovery: Expertise in routine maintenance, recovery procedures, and handling database failovers.

Specialized DBA Roles

  • Production Support DBAs: Focused on the physical aspects of database management, including installation, configuration, patching, backups, restores, and disaster recovery.

  • Development DBAs: Specialize in the logical and development aspects, such as data model design, SQL writing, performance tuning, and pre-production activities.

  • Application DBAs: Typically work with third-party application software like ERP and CRM systems to manage and optimize databases.

  • Hybrid DBAs: Versatile professionals who perform a wide range of tasks and may evolve into architectural roles within the enterprise.

As the field of database administration continues to evolve, DBAs are expected to adapt and specialize in various aspects of database management, contributing to their growth and career advancement.