Common Sense Relational and Development Patterns
Great articles on this topic from SQL Magazine:
SQL Magazine Best Practices Part One
SQL Magazine Best Practices Part Two
Patterns covered:

- RBAR – Row By Agonizing Row
- Accessing Objects Inconsistently
- Logical vs Physical Deletes
- Ad-hoc SQL vs Stored Procedures
- Stored Procedure Exit Code
- Insert Only vs Update
- Static Data Access
- All-in-One Mess
- Excessive Recompilations and Hint Misuses
- Transactions and Isolation Levels
- Incorrect Data Types
- Handling NULLs
- ANSI vs Unicode, aka CHAR vs NVARCHAR
- Use UNION ALL rather than UNION if possible
- Don't use transactions if not needed
- Files and Filegroups
- Columns: force NOT NULL if possible
- Avoid non-sargable search conditions
- SET NOCOUNT ON
RBAR – If you are a relational database professional, you know RBAR (Row By Agonizing Row) is a performance killer.
Relational algebra provides optimum performance for set-based operations, and RDBMSs are built on top of relational algebra concepts. It makes a world of difference when the application fetches the whole result set and acknowledges it to SQL Server, rather than fetching and processing one row at a time.
Still, I have seen many developers perform row-by-row operations even though they have the option of using a set-based approach. I always wonder why.
Simple Talk Article on RBAR
DataSet-Vs-DataReader Blog
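To make the contrast concrete, here is a minimal sketch against a hypothetical dbo.Orders table; the single set-based UPDATE at the end replaces the entire cursor loop:

```sql
-- RBAR: one statement execution per row
DECLARE @OrderId INT;
DECLARE order_cursor CURSOR FOR
    SELECT OrderId FROM dbo.Orders WHERE Status = 'Pending';
OPEN order_cursor;
FETCH NEXT FROM order_cursor INTO @OrderId;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.Orders SET Status = 'Processed' WHERE OrderId = @OrderId;
    FETCH NEXT FROM order_cursor INTO @OrderId;
END;
CLOSE order_cursor;
DEALLOCATE order_cursor;

-- Set-based: one statement, one plan, dramatically less overhead
UPDATE dbo.Orders SET Status = 'Processed' WHERE Status = 'Pending';
```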
Accessing Objects Inconsistently
Accessing objects in an inconsistent order causes frequent deadlocks for queries that touch overlapping tables, as well as long-term blocking. I have seen many developers forget this simple common-sense best practice, which causes nightmare deadlock scenarios that take a lot of expertise and resources to fix. If you ask why one piece of code accesses Table A and then Table B, while another accesses Table B and then Table A, most of the time it comes down to nobody paying attention to this minor detail.
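A minimal sketch of the classic opposite-order deadlock, using hypothetical tables TableA and TableB:

```sql
-- Session 1: locks a row in TableA, then asks for TableB
BEGIN TRAN;
UPDATE dbo.TableA SET Col1 = 1 WHERE Id = 42;
UPDATE dbo.TableB SET Col1 = 1 WHERE Id = 42;
COMMIT;

-- Session 2 (concurrently): locks TableB first, then asks for TableA
BEGIN TRAN;
UPDATE dbo.TableB SET Col1 = 2 WHERE Id = 42;
UPDATE dbo.TableA SET Col1 = 2 WHERE Id = 42;
COMMIT;

-- Run at the same time, each session waits on a lock the other holds,
-- and SQL Server kills one as the deadlock victim. The fix is simply to
-- agree on a single access order everywhere (always TableA, then TableB).
```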
Logical vs Physical Deletes
I have worked with many databases where physical deletes are performed on important tables. Having a status column and doing a logical delete will save you a ton of headaches. When you weigh the pros and cons of logical vs physical deletes, logical deletes are preferable: you can devise a purging or archiving strategy for inactive records, which is a lot more elegant than physically deleting them.
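A minimal sketch of a logical delete, assuming a hypothetical IsActive flag on a Customer table; the optional filtered index keeps queries over live rows from paying for the dead ones:

```sql
DECLARE @CustomerId INT = 42;

-- Logical delete: flip a status flag instead of removing the row
UPDATE dbo.Customer
SET    IsActive = 0,
       DeactivatedAtUtc = GETUTCDATE()
WHERE  CustomerId = @CustomerId;

-- Optional: index only the active rows
CREATE NONCLUSTERED INDEX IX_Customer_Active
    ON dbo.Customer (LastName, FirstName)
    WHERE IsActive = 1;
```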
Ad-hoc SQL vs Stored Procedures
There are numerous articles on the benefits of stored procedures. The main advantages:
- Code reusability
- Query plan reuse
- Optimizations
- Reduced network bandwidth
While the above aspects provide good performance improvements, the major advantage you get is a safeguard against SQL injection. Almost 30% of all security breaches are attributed to SQL injection.
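To make the injection point concrete, here is a hedged sketch (table and procedure names are hypothetical): the concatenated ad-hoc string is injectable, while the parameterized procedure treats the input purely as data:

```sql
-- Vulnerable ad-hoc SQL: user input is spliced into the statement text
DECLARE @Name NVARCHAR(100) = N''' OR 1=1 --';
DECLARE @Sql  NVARCHAR(MAX) =
    N'SELECT * FROM dbo.Customer WHERE LastName = ''' + @Name + N'''';
EXEC (@Sql);  -- becomes: ... WHERE LastName = '' OR 1=1 --'

-- Safer: a parameterized stored procedure; @LastName is only ever data
CREATE OR ALTER PROCEDURE dbo.GetCustomerByLastName
    @LastName NVARCHAR(100)
AS
BEGIN
    SET NOCOUNT ON;
    SELECT CustomerId, FirstName, LastName
    FROM   dbo.Customer
    WHERE  LastName = @LastName;
END;
```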
Stored Procedure Exit Code
Every stored procedure execution returns an exit code; if the execution is successful, it is 0. It is better to probe the exit code before diving into the result set. This makes a huge difference in code robustness and maintainability.
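A hedged sketch of the pattern (names are illustrative): the procedure RETURNs 0 on success and a non-zero code on failure, and the caller checks the code before trusting any results:

```sql
CREATE OR ALTER PROCEDURE dbo.ArchiveOrder
    @OrderId INT
AS
BEGIN
    SET NOCOUNT ON;
    IF NOT EXISTS (SELECT 1 FROM dbo.Orders WHERE OrderId = @OrderId)
        RETURN 1;  -- non-zero exit code: order not found
    -- ... archive work goes here ...
    RETURN 0;      -- success
END;
GO

DECLARE @rc INT;
EXEC @rc = dbo.ArchiveOrder @OrderId = 42;
IF @rc <> 0
    PRINT 'ArchiveOrder failed with exit code ' + CAST(@rc AS VARCHAR(10));
```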
Insert Only vs Update
If you can get away with insert-only operations and avoid updates altogether, that will save you a ton of headaches. I have seen a ton of deadlock issues with MERGE or UPSERT statements on high-concurrency systems.
Again, a sound archiving strategy can solve the space-related problems that insert-only tables create.
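A hedged sketch of the insert-only style, with a hypothetical price-history table: writers only ever append, and readers take the latest version per key:

```sql
DECLARE @ProductId INT = 7, @NewPrice DECIMAL(10,2) = 19.99;

-- Never UPDATE the price in place; append a new version instead
INSERT INTO dbo.ProductPrice (ProductId, Price, EffectiveFromUtc)
VALUES (@ProductId, @NewPrice, GETUTCDATE());

-- The current price is simply the latest version for the product
SELECT TOP (1) Price
FROM   dbo.ProductPrice
WHERE  ProductId = @ProductId
ORDER BY EffectiveFromUtc DESC;
```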
Static Data Access
Ask the question: “How often does the country list change?”
Based on the history of the world, not that often.
Then why does the application fetch the country list on every single user login? It makes no sense to complain about the database when simple best practices are forgotten.
Making a conscious decision about which data is static vs dynamic, and caching the static data, yields a huge overall application performance improvement.
All-in-One Mess
I have seen stored procedures with 5,000+ lines of code that access numerous tables, do string parsing, and carry a ton of business logic.
When you ask why an RDBMS is being used for string parsing, the answer is not pretty. Usually a lack of good design patterns results in these operations.
In real life I have seen very cumbersome stored procedures where nobody wants to spend time troubleshooting the mess.
Break big queries into separate intermediate queries: those intermediate, simplified queries are often easier for the query optimizer to handle, which results in a higher-quality plan, meaning better performance. Also, don't be afraid to use sub-stored-procedures. Instead of one massive stored procedure, you can have a controller stored procedure that, based on the input parameters, forwards the call to sub-stored-procedures.
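A minimal sketch of that controller pattern, with hypothetical sub-procedures:

```sql
CREATE OR ALTER PROCEDURE dbo.ProcessRequest
    @RequestType VARCHAR(20),
    @RequestId   INT
AS
BEGIN
    SET NOCOUNT ON;
    -- The controller only routes; each sub-procedure stays small
    -- and gets its own cached, well-fitted plan.
    IF @RequestType = 'ORDER'
        EXEC dbo.ProcessOrderRequest @RequestId;
    ELSE IF @RequestType = 'REFUND'
        EXEC dbo.ProcessRefundRequest @RequestId;
    ELSE
        RETURN 1;  -- unknown request type
    RETURN 0;
END;
```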
Excessive Recompilations and Hint Misuses
I have seen cases where the RECOMPILE option is used inside a SQL Server stored procedure and ends up causing excessive recompilation: the data was fairly static, and the query plan generated at each compilation was the same every time. When in doubt, let SQL Server decide the best course of action rather than forcing hints. If you have compelling data to justify a hint, then it might make sense.
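For context, a hedged sketch of the hint in question; OPTION (RECOMPILE) discards the plan on every execution, which is only worth paying for when the best plan genuinely varies between calls:

```sql
-- Hypothetical lookup: recompiled on every execution. Helpful for wildly
-- skewed parameters; pure CPU overhead when the data and plan are static.
DECLARE @CustomerId INT = 42;
SELECT OrderId, Total
FROM   dbo.Orders
WHERE  CustomerId = @CustomerId
OPTION (RECOMPILE);
```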
Transactions and Isolation Levels
Having explicit transactions at a higher isolation level can wreak havoc on your concurrency and cause unwanted effects.
Evaluate how the data is going to be used. Do you really need explicit transactions? Not everything we do needs a transaction; we can often live with eventual consistency and close-to-reality data. Work with your business to identify the data-accuracy needs and pick the isolation level that suits them.
In real life, that information is harder to glean than it sounds.
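As a hedged sketch, a report that can tolerate close-to-reality data might drop from an explicit SERIALIZABLE transaction to plain READ COMMITTED (ideally with row versioning enabled at the database level):

```sql
-- Heavy-handed: an explicit SERIALIZABLE transaction takes range locks
-- and blocks concurrent writers for its entire duration.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN;
SELECT COUNT(*) FROM dbo.Orders WHERE Status = 'Open';
COMMIT;

-- Lighter: plain READ COMMITTED, which with the database option
-- READ_COMMITTED_SNAPSHOT ON reads row versions and doesn't block writers.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT COUNT(*) FROM dbo.Orders WHERE Status = 'Open';
```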
Incorrect Data Types
I have seen this happen ubiquitously; even experienced developers and architects make this mistake frequently.
For example: do you need an INT or a GUID? A clear understanding of the pros and cons of the various data types makes a huge difference in designing optimally performing databases.
Picking a GUID data type causes a lot of page splits, which in turn cause more I/O. If you are already suffering from an I/O bottleneck, this choice can pretty much kill the performance of your system and your queries.
On the other hand, if you have several hundred processes simultaneously writing to a single table, GUIDs help avoid hotspots.
This link provides a very good data-driven analysis of GUID vs INT data types, along with a very good article on hotspot contention:
HOTSPOT Contention issue article
Cost of GUID Article
GUID vs INT Data
| Criteria | GUIDs | Seq. GUIDs | BIGINT | INT |
|---|---|---|---|---|
| Storage | 16 bytes | 16 bytes | 8 bytes | 4 bytes |
| Insert/update performance | Slowest | Comparable, but the index keys are larger; for wider tables this is slower than integer keys | Faster than sequential GUIDs | Fastest |
| Hotspot contention | Very rare | Yes | Yes | Highest, due to the smaller size of RIDs |
| Fragmentation/page splits | High | Minimal | Minimal | Minimal |
| JOIN/SORT performance | Rank 4 (least) | Rank 3 (better than random GUIDs due to less fragmentation) | Rank 2 | Rank 1 (best) |
| Logical reads | Rank 4 (highest) | Rank 3 | Rank 2 | Rank 1 (least) |
| Merging data across servers | Beneficial | Beneficial | Difficult | Difficult |
| Uniqueness | Rare chance of duplicates | Globally unique; virtually no chance of collisions | Limited by the range of BIGINT | Limited by the range of INT |
Having this understanding can make a huge difference in designing and scaling your database.
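A hedged sketch of the trade-off: NEWID() scatters inserts across the index (fragmentation and page splits), while NEWSEQUENTIALID() generates ever-increasing values that append at the end:

```sql
-- Random GUID key: inserts land on random pages of the clustered index
CREATE TABLE dbo.EventRandom (
    EventId UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID() PRIMARY KEY,
    Payload NVARCHAR(200) NULL
);

-- Sequential GUID key: inserts append at the end of the index instead
CREATE TABLE dbo.EventSequential (
    EventId UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID() PRIMARY KEY,
    Payload NVARCHAR(200) NULL
);
```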
DATETIME DATA TYPE USAGE
The built-in SQL Server functions GETDATE() and GETUTCDATE() are typically used for capturing DATETIME values of a row, such as CreatedDateTime and UpdatedDateTime.
GETDATE() returns the server's local DATETIME stamp. When you back up a database in one region and restore it in another, making sense of GETDATE() values is hard: you cannot tell which region a timestamp came from.
Using UTC datetimes saves you a ton of headaches down the line. You probably are not thinking about going international or multi-region right away, but when things get there, these small considerations make a huge difference in being successful.
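A minimal sketch: default the audit columns to GETUTCDATE() (DATETIMEOFFSET with SYSDATETIMEOFFSET() is an alternative if you need to keep the offset):

```sql
CREATE TABLE dbo.Customer (
    CustomerId      INT IDENTITY PRIMARY KEY,
    Name            NVARCHAR(100) NOT NULL,
    CreatedDateTime DATETIME2 NOT NULL DEFAULT GETUTCDATE(),  -- UTC, region-neutral
    UpdatedDateTime DATETIME2 NULL
);
```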
Here is a quick comparison of the different Date and Time Data types,
| Data type | Range | Precision | Storage (bytes) | User-specified precision |
|---|---|---|---|---|
| SMALLDATETIME | 1900-01-01 to 2079-06-06 | 1 minute | 4 | No |
| DATETIME | 1753-01-01 to 9999-12-31 | .00333 seconds | 8 | No |
| DATETIME2 | 0001-01-01 to 9999-12-31 23:59:59.9999999 | 100 ns | 6-8 | Yes |
| DATE | 0001-01-01 to 9999-12-31 | 1 day | 3 | No |
| TIME | 00:00:00.0000000 to 23:59:59.9999999 | 100 ns | 3-5 | Yes |
| DATETIMEOFFSET | 0001-01-01 to 9999-12-31 23:59:59.9999999 | 100 ns | 8-10 | Yes |
Handling NULLs
There are two schools of thought on handling NULLs: some recommend not having any NULL-able columns at all, while others say it is fine to have them. In the world of RDBMS systems, NULL occupies a special place. It is not a value, nor a state such as true or false; it is the absence of any known value. NULL is used explicitly to indicate when a value is unknown, undefined, or does not or cannot exist.
NULL gotchas (both are illustrated in the sketch below):
- Comparing NULLs (the result depends on the ANSI_NULLS setting).
- Concatenating NULL values; this might surprise you if you don't pay attention.
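```sql
-- Comparison gotcha: NULL = NULL evaluates to UNKNOWN under ANSI_NULLS ON
SET ANSI_NULLS ON;
IF NULL = NULL PRINT 'equal' ELSE PRINT 'not proven equal';  -- prints the latter
SELECT CASE WHEN NULL IS NULL THEN 'use IS NULL instead' END;

-- Concatenation gotcha: + with a NULL yields NULL; CONCAT treats it as ''
SELECT 'Hello, ' + NULL;         -- NULL
SELECT CONCAT('Hello, ', NULL);  -- 'Hello, '
```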
ANSI vs UNICODE aka CHAR vs NVARCHAR
I have seen a system that used ANSI data types and had a hard time converting to Unicode compliance; it turned into almost a year-long project.
Ask yourself whether saving space is more important, or having the flexibility to support Unicode characters. You never know when opportunities from non-ANSI languages will come. While taking the ASCII character set as its starting point, the Unicode Standard goes far beyond ASCII's limited ability to encode only the upper- and lowercase letters A through Z. It provides the capacity to encode all characters used for the written languages of the world: more than 1 million characters can be encoded. No escape sequence or control code is required to specify any character in any language.
UNICODE – double byte (NCHAR/NVARCHAR)
ANSI – single byte (CHAR/VARCHAR)
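A hedged sketch of what the conversion pain looks like (assuming a typical Latin collation): non-ANSI characters silently degrade in VARCHAR, and string literals need the N prefix to stay Unicode:

```sql
DECLARE @Ansi    VARCHAR(20)  = N'こんにちは';  -- degrades to '?????'
DECLARE @Unicode NVARCHAR(20) = N'こんにちは';  -- preserved
SELECT @Ansi AS AnsiValue, @Unicode AS UnicodeValue;

-- Without the N prefix, the literal itself is already ANSI-degraded
SELECT 'こんにちは' AS NoPrefix, N'こんにちは' AS WithPrefix;
```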
SQL Server Specific Best Practices
SET NOCOUNT ON
SET NOCOUNT ON prevents the sending of DONE_IN_PROC messages to the client for each statement. Setting NOCOUNT to ON can provide a significant performance boost for procedures that execute many statements, because network traffic is greatly reduced. The setting specified by SET NOCOUNT takes effect at execute or run time, not at parse time.
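Typical placement, sketched with a hypothetical procedure:

```sql
CREATE OR ALTER PROCEDURE dbo.GetOpenOrders
AS
BEGIN
    SET NOCOUNT ON;  -- suppress the per-statement DONE_IN_PROC chatter
    SELECT OrderId, CustomerId, Total
    FROM   dbo.Orders
    WHERE  Status = 'Open';
END;
```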
USE JOINS WHEN POSSIBLE
Using JOINs gives SQL Server the opportunity to make the best choices about indexes. Prefer a JOIN directly over a sub-query.
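A hedged sketch of the rewrite, using the same hypothetical tables as earlier examples:

```sql
-- Sub-query version
SELECT c.CustomerId, c.Name
FROM   dbo.Customer AS c
WHERE  c.CustomerId IN (SELECT o.CustomerId
                        FROM   dbo.Orders AS o
                        WHERE  o.Status = 'Open');

-- JOIN version: a straightforward shape for the optimizer to work with
SELECT DISTINCT c.CustomerId, c.Name
FROM   dbo.Customer AS c
JOIN   dbo.Orders   AS o ON o.CustomerId = c.CustomerId
WHERE  o.Status = 'Open';
```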
USE UNION ALL RATHER THAN UNION IF POSSIBLE
UNION
The UNION command is used to combine related information from two tables. With UNION, only distinct rows are returned.
UNION ALL
The UNION ALL command is equivalent to UNION, except that it returns all rows, duplicates included.
A UNION statement effectively does a DISTINCT on the result set. If you know that all the records returned by your union are unique, use UNION ALL instead; it gives faster results.
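A hedged sketch with hypothetical archive tables: the two result sets cannot overlap, so UNION ALL skips the implicit de-duplication work:

```sql
-- UNION must deduplicate (implicit DISTINCT: extra sort/hash work)
SELECT OrderId FROM dbo.OrdersArchive2023
UNION
SELECT OrderId FROM dbo.OrdersArchive2024;

-- The rows are already unique across the two years, so skip the dedup
SELECT OrderId FROM dbo.OrdersArchive2023
UNION ALL
SELECT OrderId FROM dbo.OrdersArchive2024;
```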
DON'T USE TRANSACTIONS IF NOT NEEDED
A quick checklist of related query and schema tips:

- Operate on small result sets
- Limit the number of columns in the SELECT list
- Use highly selective indexes
- Avoid non-sargable search conditions (see the sketch after this list)
- Use BETWEEN rather than the IN / OR operators
- Avoid arithmetic operators in the WHERE clause
- Avoid functions on columns in the WHERE clause (see the sketch after this list)
- Avoid optimizer hints unless you absolutely know your data (JOIN hints, INDEX hints, FORCEPLAN)
- Columns: force NOT NULL if possible
- Enforce DRI (Declarative Referential Integrity) if possible
- Avoid implicit data type conversions
- Use EXISTS rather than COUNT(*) queries
- Use indexes for aggregate and sort conditions
- Use SET NOCOUNT ON
- Keep transactions short if needed
- Reduce lock overhead
- Reduce logging overhead
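For the sargability items above, a hedged sketch of the rewrite (hypothetical Orders table again):

```sql
-- Non-sargable: the function on the column hides the index
SELECT OrderId FROM dbo.Orders
WHERE  YEAR(OrderDate) = 2024;

-- Sargable: a range predicate lets the optimizer seek the index
SELECT OrderId FROM dbo.Orders
WHERE  OrderDate >= '2024-01-01'
  AND  OrderDate <  '2025-01-01';
```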
FILES AND FILEGROUPS:
We are all guilty of creating a database with one filegroup, one data file, and one log file, because that is SQL Server's default option. This creates numerous problems for scaling and copying, and a maintenance nightmare.
Create at least two filegroups and at least two additional data files. That way, if we want to scale, we can place the filegroups on different spindles, RAID arrays, etc.
It also lends itself nicely to filegroup backups, which can be restored without backing up and restoring the entire database. It is common sense: instead of one big-bang file, split the database into multiple files and filegroups, which provides nimbleness in maintenance and scaling.
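A hedged sketch (database name, paths, and sizes are illustrative) of a layout with a second filegroup and two extra data files:

```sql
CREATE DATABASE Sales
ON PRIMARY
    (NAME = Sales_Primary, FILENAME = 'D:\Data\Sales_Primary.mdf', SIZE = 512MB),
FILEGROUP FG_Data
    (NAME = Sales_Data1, FILENAME = 'E:\Data\Sales_Data1.ndf', SIZE = 1GB),
    (NAME = Sales_Data2, FILENAME = 'F:\Data\Sales_Data2.ndf', SIZE = 1GB)
LOG ON
    (NAME = Sales_Log, FILENAME = 'G:\Log\Sales_Log.ldf', SIZE = 512MB);

-- Make new user tables land on the second filegroup by default
ALTER DATABASE Sales MODIFY FILEGROUP FG_Data DEFAULT;
```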