SQL Server Optimizations for High Concurrency

SQL Server Optimizations for High Concurrency

Our business needs very robust, low latency, highly available and durable online transactional system which supports high concurrency for about four weeks in a year. It’s almost like Thanksgiving sale where you mark down very popular item (think of iPhone ) by 100 % . We operate on state level ( K-12 ) online assessment market where entire state takes test in spring for few weeks. To operate in this market , you need to have a robust system which can absorb such a sudden spike in transaction volumes.

Infrastructure:
We use SQL Server 2016 on Windows Server Failover cluster. Our storage layer includes RAID 10 SAN Storage and Local SSD for TempDB.

LATCH Waits
During initial load testing, we have noticed high occurrences of LATCH Waits. While investigating the root cause for LATCH waits , we found that few tables being accessed by high number of concurrent sessions thus creating LATCH Waits. Most of our transactions are very small data set inserts and updates. We also encountered Last Page Insertion Contention.

Latches are internal to the SQL engine and are used to provide memory consistency, whereas locks are used by SQL Server to provide logical transactional consistency.

PAGELATCH is used to synchronize short term access to database pages that reside in the Buffer cache, as opposed to a PAGEIOLATCH, which is used to synchronize physical access to pages in disk. These are normal in every system, the problem being when there is contention . In our use case many concurrent sessions accesses a single page, causing waits and hindering the ability to perform these inserts and updates efficiently.

A page in SQL Server is 8KB and can store multiple rows. To increase concurrency and performance, buffer latches are held only for the duration of the physical operation on the page, unlike locks which are held for the duration of the logical transaction.

From the root cause Analysis, it became very clear that we have LATCH Contention on few tables which needed to be alleviated to improve the throughput.

Latch Waits Noticed

Latch Waits Noticed

From the above screenshot, you can see that we had very high LATCH contention for Page 5261488. You can turn on the Trace Flag 3604 to further investigate the Page Contents.

Latch Waits and PFS Page or Not

TempDB Allocation Page Contention

DBCC TRACEON(3604)
CREATE TABLE PageResults (ParentObject sysname,	OBJECT sysname ,Field sysname ,VALUE nvarchar(MAX))
INSERT INTO PageResults (ParentObject,	Object	,Field,	VALUE)
EXEC ('DBCC PAGE(67,1,3987384,3) WITH tableresults')

Root Cause Identification:

Clear WaitStats and Buffer Cache to Initialize

-- Remove all elements from the plan cache for one database
DECLARE @intDBID INT;
SET @intDBID = ( SELECT [dbid]
FROM   master.dbo.sysdatabases
WHERE  name = 'LoadTestDB'
);
-- Flush the procedure cache for one database only
IF @intDBID IS NOT NULL
BEGIN
DBCC FLUSHPROCINDB (@intDBID);
END;
GO
-- Reset wait and latch statistics.
DBCC SQLPERF('sys.dm_os_latch_stats' , CLEAR)
DBCC SQLPERF('sys.dm_os_wait_stats' , CLEAR)

Wait Stats Query:

Declare @ExcludedWaits Table (WaitType sysname not null primary key)
-- Waits that can be ignored
Insert Into @ExcludedWaits
Values ('CLR_SEMAPHORE'),
('SQLTRACE_BUFFER_FLUSH'),
('WAITFOR'),
('REQUEST_FOR_DEADLOCK_SEARCH'),
('XE_TIMER_EVENT'),
('BROKER_TO_FLUSH'),
('BROKER_TASK_STOP'),
('CLR_MANUAL_EVENT'),
('CLR_AUTO_EVENT'),
('FT_IFTS_SCHEDULER_IDLE_WAIT'),
('XE_DISPATCHER_WAIT'),
('XE_DISPATCHER_JOIN'),
('BROKER_RECEIVE_WAITFOR');
Select SessionID = WT.session_id,
WaitDuration_ms = WT.wait_duration_ms,
WaitType = WT.wait_type,
WaitResource = WT.resource_description,
Program = S.program_name,
QueryPlan = CP.query_plan,
SQLText = SUBSTRING(ST.text, (R.statement_start_offset/2)+1,
((Case R.statement_end_offset
When -1 Then DATALENGTH(ST.text)
Else R.statement_end_offset
End - R.statement_start_offset)/2) + 1),
DBName = DB_NAME(R.database_id),
BlocingSessionID = WT.blocking_session_id,
BlockerQueryPlan = CPBlocker.query_plan,
BlockerSQLText = SUBSTRING(STBlocker.text, (RBlocker.statement_start_offset/2)+1,
((Case RBlocker.statement_end_offset
When -1 Then DATALENGTH(STBlocker.text)
Else RBlocker.statement_end_offset
End - RBlocker.statement_start_offset)/2) + 1)
From sys.dm_os_waiting_tasks WT
Inner Join sys.dm_exec_sessions S on WT.session_id = S.session_id
Inner Join sys.dm_exec_requests R on R.session_id = WT.session_id
Outer Apply sys.dm_exec_query_plan (R.plan_handle) CP
Outer Apply sys.dm_exec_sql_text(R.sql_handle) ST
Left Join sys.dm_exec_requests RBlocker on RBlocker.session_id = WT.blocking_session_id
Outer Apply sys.dm_exec_query_plan (RBlocker.plan_handle) CPBlocker
Outer Apply sys.dm_exec_sql_text(RBlocker.sql_handle) STBlocker
Where R.status = 'suspended' -- Waiting on a resource
And S.is_user_process = 1 -- Is a used process
And R.session_id <> @@spid -- Filter out this session
And WT.wait_type Not Like '%sleep%' -- more waits to ignore
And WT.wait_type Not Like '%queue%' -- more waits to ignore
And WT.wait_type Not Like -- more waits to ignore
Case When SERVERPROPERTY('IsHadrEnabled') = 0 Then 'HADR%'
Else 'zzzz' End
And Not Exists (Select 1 From @ExcludedWaits
Where WaitType = WT.wait_type)
ORDER BY WaitDuration_ms DESC
Option(Recompile); -- Don't save query plan in plan cache

Common Latch Waits

;
WITH [Latches] AS
(SELECT
[latch_class],
[wait_time_ms] / 1000.0 AS [WaitS],
[waiting_requests_count] AS [WaitCount],
100.0 * [wait_time_ms] / SUM ([wait_time_ms]) OVER() AS [Percentage],
ROW_NUMBER() OVER(ORDER BY [wait_time_ms] DESC) AS [RowNum]
FROM sys.dm_os_latch_stats
WHERE [latch_class] NOT IN (
N'BUFFER')
AND [wait_time_ms] > 0
)
SELECT
MAX ([W1].[latch_class]) AS [LatchClass],
CAST (MAX ([W1].[WaitS]) AS DECIMAL(14, 2)) AS [Wait_S],
MAX ([W1].[WaitCount]) AS [WaitCount],
CAST (MAX ([W1].[Percentage]) AS DECIMAL(14, 2)) AS [Percentage],
CAST ((MAX ([W1].[WaitS]) / MAX ([W1].[WaitCount])) AS DECIMAL (14, 4)) AS [AvgWait_S]
FROM [Latches] AS [W1]
INNER JOIN [Latches] AS [W2]
ON [W2].[RowNum] <= [W1].[RowNum]
GROUP BY [W1].[RowNum]
HAVING SUM ([W2].[Percentage]) - MAX ([W1].[Percentage]) < 95; -- percentage threshold
GO

Latch Waits Resolution:

Approaches Considered:

  1. In -Memory OLTP
  2. Replacing Identity Integer Column with GUID as leading Column for Index
  3. HASH Partioning with Computed Column

Out of these three options, In-Memory OLTP looked very promising but we didn’t have enough time to implement in-memory OLTP Migration. So we adopted second solution. We replaced Identity Integer Columns with GUID which did increase page splits and index fragmentation but our workload is insert heavy. So we made this trade-off.This is not our preferred resolution but we made this trade off based on available time and resources.

In future, we are planning on migrating these hot tables to In-Memory OLTP. Just replacing INT’s with GUIDs, we did see about 20-30 X Performance improvements.

TechniqueAdvantagesDisadvantages
Non-sequential key/indexAllows the use of other partitioning features, such as archiving data using a sliding window scheme and partition switch functionality.Possible challenges when choosing a key/index to ensure „close enough to‟ uniform distribution of inserts all of the time.

GUID as a leading column can be used to guarantee uniform distribution with the caveat that it can result in excessive page-split operations.

Random inserts across B-Tree can result in too many page-split operations and lead to latch contention on non-leaf pages.
Hash partitioning with computed columnTransparent for inserts.Partitioning cannot be used for intended management features such as archiving data using partition switch options.

Can cause partition elimination issues for queries including individual and range based select/update, and queries that perform a join.

Adding a persisted computed column is an offline operation.
Effort Less Development EffortMore Development Effort

TempDB Allocation Page Contention  

SELECT  session_id ,
wait_type ,
wait_duration_ms ,
blocking_session_id ,
resource_description ,
ResourceType = CASE WHEN CAST(RIGHT(resource_description,
LEN(resource_description)
- CHARINDEX(':',
resource_description,
3)) AS INT) - 1 % 8088 = 0
THEN 'Is PFS Page'
WHEN CAST(RIGHT(resource_description,
LEN(resource_description)
- CHARINDEX(':',
resource_description,
3)) AS INT) - 2
% 511232 = 0 THEN 'Is GAM Page'
WHEN CAST(RIGHT(resource_description,
LEN(resource_description)
- CHARINDEX(':',
resource_description,
3)) AS INT) - 3
% 511232 = 0 THEN 'Is SGAM Page'
ELSE 'Is Not PFS, GAM, or SGAM page'
END
FROM    sys.dm_os_waiting_tasks
WHERE   wait_type LIKE 'PAGE%LATCH_%'
AND resource_description LIKE '2:%'

TempDB PFS Page Contention  Noticed

PFS_Page_Contention

PFS_Page_Contention

TempDB Non Allocation Page Contention

TempDB Non Allocation Page Contention

TempDB Non Allocation Page Contention

PFS Page Contention Alleviation:

From the root cause Analysis, it became very clear that number of TempDB Files we had was not sufficient. We followed the industry standard best practice of 8 Equal Sized TempDB Data Files because we had 8 Cores. We increased our TempDB File Counts by 4 and repeated the load Test Run. At 24 TempDB Files, we didn’t encounter PFS Page Contention and we settled on 24 TempDB Files. This number is lot higher than the best practice recommendation but that’s what worked for us and we derived this through numerous Load Test Runs.

ALTER DATABASE [tempdb] ADD FILE (NAME = N'temp9',FILENAME = N'T:TempDBtempdb_mssql_9.ndf',SIZE = 2097152KB,FILEGROWTH =1048576KB )
GO
ALTER DATABASE [tempdb] ADD FILE (NAME = N'temp10',FILENAME = N'T:TempDBtempdb_mssql_10.ndf',SIZE = 2097152KB,FILEGROWTH =1048576KB )
GO
ALTER DATABASE [tempdb] ADD FILE (NAME = N'temp11',FILENAME = N'T:TempDBtempdb_mssql_11.ndf',SIZE = 2097152KB,FILEGROWTH =1048576KB )
GO
ALTER DATABASE [tempdb] ADD FILE (NAME = N'temp12',FILENAME = N'T:TempDBtempdb_mssql_12.ndf',SIZE = 2097152KB,FILEGROWTH =1048576KB )
GO
ALTER DATABASE [tempdb] ADD FILE (NAME = N'temp13',FILENAME = N'T:TempDBtempdb_mssql_13.ndf',SIZE = 2097152KB,FILEGROWTH =1048576KB )
GO
ALTER DATABASE [tempdb] ADD FILE (NAME = N'temp14',FILENAME = N'T:TempDBtempdb_mssql_14.ndf',SIZE = 2097152KB,FILEGROWTH =1048576KB )
GO
ALTER DATABASE [tempdb] ADD FILE (NAME = N'temp15',FILENAME = N'T:TempDBtempdb_mssql_15.ndf',SIZE = 2097152KB,FILEGROWTH =1048576KB )
GO
ALTER DATABASE [tempdb] ADD FILE (NAME = N'temp16',FILENAME = N'T:TempDBtempdb_mssql_16.ndf',SIZE = 2097152KB,FILEGROWTH =1048576KB )
GO

ThreadPool Waits Observed

We also encountered ThreadPool waits because we had few thousand transactions coming in per second. We increased our Max Worker Thread size. Extreme caution needs to be exercised to modify this configuration because this might lead to unwanted consequences.

Default Values

Max Number of Worker Threads Defaults

Max Number of Worker Threads Defaults

EXEC sp_configure 'max worker threads', 1472 ;
GO
RECONFIGURE;
GO

ACCESS_METHODS_HOBT_VIRTUAL_ROOT Waits Noticed

ACCESS_METHODS_HOBT_VIRTUAL_ROOT

ACCESS_METHODS_HOBT_VIRTUAL_ROOT

This latch class is when a thread is waiting for access to the in-memory metadata entry containing a B-tree’s root page ID. EX access is required to change the root page ID, which typically happens when a B-tree becomes a level deeper during heavy insertions into a new index and the existing root page has to split. Every B-tree traversal has to start at the root page, which requires obtaining this latch in SH mode.

Max Degree of Parallelism and Cost Threshold

We did several run with different Max Degree of Parallelism and Cost Threshold and settled on following values based on our load Test Observation.

EXEC sp_configure 'max degree of parallelism','12'
RECONFIGURE

EXEC sp_configure 'cost threshold for parallelism','30'
RECONFIGURE

Recommended Values by Microsoft

NUMA NodeProcessor ConfigurationRecommended MaxDOP Values
Server with single NUMA node Less than 8 logical processors Keep MAXDOP at or below # of logical processors
Server with single NUMA node Greater than 8 logical processors Keep MAXDOP at 8
Server with multiple NUMA nodes Less than 8 logical processors per NUMA node Keep MAXDOP at or below # of logical processors per NUMA node
Server with multiple NUMA nodes Greater than 8 logical processors per NUMA node Keep MAXDOP at 8

Lock Pages in Memory:

We also enabled lock Pages in Memory

Lock Pages in Memory
Lock Pages in Memory

Trace Flag 1118

In SQL Server 2016, By default this trace flag is on. This trace flag switches allocations in tempdb from single-page at a time for the first 8 pages, to immediately allocate an extent (8 pages). It’s used to help alleviate allocation bitmap contention in tempdb under a heavy load of small temp table creation and deletion. But previous versions, you need to explicitly turn on this trace flag.

Learnings:

We learned a lot during this exercise. Supporting high concurrency ( few thousands transactions per second ) needs very robust system and majority of the standard best practices are not applicable in these scenarios. We ran into Latch Waits, Thread Pool Waits, TempDB Allocation Contention and Write Log Waits and solved one problem at a time after carefully considering the tradeoffs. It took several weeks of coordinated effort from several teams to identify the bottleneck in the system and alleviate those bottlenecks.

References:

TopicReference Resource
Diagnosing and Resolving Latch Contentionhttp://ramblingsofraju.com/sql-server/sql-server-optimizations-for-high-concurrency/attachment/sqlserverlatchcontention-5/
Paul Randall Common Latch Waits and What they Meanhttps://www.sqlskills.com/blogs/paul/most-common-latch-classes-and-what-they-mean/
Paul Randall TempDB Allocation Page Contentionhttp://www.sqlskills.com/blogs/paul/inside-the-storage-engine-gam-sgam-pfs-and-other-allocation-maps/
Max Number of Worker Threads Documentationhttps://docs.microsoft.com/en-us/sql/database-engine/configure-windows/configure-the-max-worker-threads-server-configuration-option
MSDN Blog Article on Heavy Insert Workloadhttps://blogs.msdn.microsoft.com/blogdoezequiel/2013/05/23/pagelatch_ex-waits-and-heavy-inserts/
Enable Lock Pages in Memoryhttps://technet.microsoft.com/en-us/library/ms190730(v=sql.105).aspx
Max Degree of Parallelism Recommendationshttps://support.microsoft.com/en-us/help/2806535/recommendations-and-guidelines-for-the-max-degree-of-parallelism-configuration-option-in-sql-server
Lock Pages in Memory Discussionshttps://www.simple-talk.com/sql/database-administration/great-sql-server-debates-lock-pages-in-memory/
Cost Threshold for Parallelismhttps://technet.microsoft.com/en-us/library/ms188603(v=sql.105).aspx
Paul Randall Blog on Trace Flag 1118http://www.sqlskills.com/blogs/paul/misconceptions-around-tf-1118/

You may also like...

Leave a Reply

Your email address will not be published. Required fields are marked *