Delete data in SQL Server
Create Date: August 24, 2020 at 06:12 PM         | Tag: DATABASE ADMINISTRATION         | Author Name: Sun, Charles |
Best Practices For Deleting SQL Server Data
Problem
Deleting data from a SQL Server database is something I really don't like to do as it's always better if it can be done through an application that has been thoroughly tested. However, inevitably there comes a time where some backend data cleanup needs to be done. Below are some best practices I like to follow when deleting data.
Solution
Before we get into the details of this tip I just want to mention that you should ALWAYS HAVE A BACKUP when you are going to be deleting data. I know everyone knows how important backups are, but before you delete anything you probably want to verify you have a good one by restoring it somewhere first. You can never be too careful when it comes to removing data.
With that said, let's setup a couple sample tables for us to run our example queries against. The first table below is the table that contains our actual data and the second one contains a list of records to be deleted from the main table. Here is the T-SQL to create and load these tables.
--Table Setup CREATE TABLE Main (col1 INT, col2 INT); CREATE TABLE ToDelete (col3 INT); DECLARE @val INT SELECT @val=1 WHILE @val < 50000 BEGIN INSERT INTO Main VALUES (@val,round(rand()*100000,0)); IF (@val % 1000) = 0 INSERT INTO ToDelete VALUES (@val); SELECT @val=@val+1; END;
Now let's remove the records from the Main table based on the records in the ToDelete table using the following simple query.
DELETE FROM Main WHERE col1 IN (SELECT col1 From ToDelete); (49999 row(s) affected)
Oops. It looks like a few too many rows were deleted. What happened? If we look closely at my query I put the wrong column name in the subquery by accident and because this column name exists in the primary table I removed everything. Let's look at a few different ways that we could have avoided this issue.
For our first solution had I used table aliases for each table this would not have happened as the query would error out as follows:
DELETE FROM Main WHERE col1 IN (SELECT td.col1 FROM ToDelete td); Msg 207, Level 16, State 1, Line 2 Invalid column name 'col1'.
In this case I could then fix the query to use the correct column name and it would remove the records that were supposed to be removed in the first place.
DELETE FROM Main WHERE col1 IN (SELECT td.col3 FROM ToDelete td) (49 row(s) affected)
Another solution to this issue and one I use quite often is to always check how many records are going to be deleted before running the delete. If we created and ran the script that follows when we needed to delete our records we would have noticed that it was going to delete too many records when the SELECT returned 49999. I always comment out the DELETE part of the script so it does not get executed by accident.
SELECT COUNT(1) --DELETE FROM Main WHERE col1 IN (SELECT col1 From ToDelete); (No column name) 49999
Once I see that it has returned too many records I can fix my query and then run them successfully as shown below.
SELECT COUNT(1) --DELETE FROM Main WHERE col1 IN (SELECT col3 From ToDelete); -- SELECT output (No column name) 49 -- DELETE output (49 row(s) affected)
One final good practice to use when deleting data from a database is to always wrap your DELETE statement in a BEGIN TRAN - COMMIT/ROLLBACK TRAN code block. With the method outlined below you can run the BEGIN TRAN and your DELETE, then verify how many records were affected before you COMMIT your changes. As I did in the previous solution I've commented out the COMMIT part of my script so it does not get run by accident.
BEGIN TRAN DELETE FROM Main WHERE col1 IN (SELECT col1 From ToDelete); --ROLLBACK TRAN --COMMIT TRAN -- DELETE output (49999 row(s) affected)
As with the previous example once I see that too many rows were delete I can rollback the transaction and fix the script as shown below. In this case it returned the correct number and then I can run my commit statement.
BEGIN TRAN DELETE FROM Main WHERE col1 IN (SELECT col3 From ToDelete); --ROLLBACK TRAN --COMMIT TRAN -- DELETE output (49 row(s) affected)
I understand this is a trivial example, but in the real world where related or foreign key columns are not always named to follow a standard or there is no good environment to test your scripts in, issues like this can pop up. It's always best to make sure you have a backup of your data and if possible test your scripts on a copy of the production database before running any updates or deletes. Even what you think might be a small update could end up being a big issue with the wrong syntax.
Next Steps
- Other tips on table aliases:
- What Does BEGIN TRAN - ROLLBACK TRAN - COMMIT TRAN Mean
- Look for a future tip on a way to protect yourself when deleting data
This SQL Server tutorial explains how to use the GOTO statement in SQL Server (Transact-SQL) with syntax and examples.
Description
The GOTO statement causes the code to branch to the label after the GOTO statement.
Syntax
The syntax for the GOTO statement in SQL Server (Transact-SQL) consists of two parts - the GOTO statement and the Label Declaration:
GOTO statement
The GOTO statement consists of the GOTO keyword, followed by a label_name.
GOTO label_name;
Label Declaration
The Label Declaration consists of the label_name, followed by at least one statement to execute.
label_name: {...statements...}
Note
- label_name must be unique within the scope of the code.
- There must be at least one statement to execute after the Label Declaration.
Example
Let's look at how to use the GOTO statement in SQL Server (Transact-SQL).
For example:
DECLARE @site_value INT; SET @site_value = 0; WHILE @site_value <= 10 BEGIN IF @site_value = 2 GOTO TechOnTheNet; SET @site_value = @site_value + 1; END; TechOnTheNet: PRINT 'TechOnTheNet.com'; GO
In this GOTO example, we have created one GOTO statements called TechOnTheNet. If @site_value equals 2, then the code will branch to the label called TechOnTheNet.
Example:
--backup table
set @table_name = ''
set @backup_table_name = 'Temp_' + @table_name + '_' + @comment + '_' + @ticketID
if object_id(@backup_table_name) is null
begin
set @sql = 'select * into ' + @backup_table_name + ' from table WITH (NOLOCK) where id = ' + @id
exec(@sql)
end
else
begin
set @sql = 'insert into ' + @backup_table_name + 'select * from table WITH (NOLOCK) where id = ' + @id
exec(@sql)
end
BEGIN TRAN T1
BEGIN TRY
delete from table WITH (ROWLOCK)
Where id = @id
END TRY
BEGIN CATCH
ROLLBACK TRAN T1
SET @trancode = 'T1 Broken, cannot delete records for ' + @id
SELECT @trancode, ERROR_NUMBER() AS ErrorNumber
,ERROR_MESSAGE() AS ErrorMessage
goto done
END CATCH
fetch next from claim_procedure_id_cursor
into @id
end
close claim_procedure_id_cursor
deallocate claim_procedure_id_cursor
--
done:
print 'If there is an error message, please run script at off-work time. Otherwise, it means that scrinpt runs successfully.'
New Comment
Unique index in SQL Server
Create Date: July 08, 2020 at 03:15 PM         | Tag: DATABASE ADMINISTRATION         | Author Name: Sun, Charles |
Create Unique Indexes
This topic describes how to create a unique index on a table in SQL Server 2019 (15.x) by using SQL Server Management Studio or Transact-SQL. A unique index guarantees that the index key contains no duplicate values and therefore every row in the table is in some way unique. There are no significant differences between creating a UNIQUE constraint and creating a unique index that is independent of a constraint. Data validation occurs in the same manner, and the query optimizer does not differentiate between a unique index created by a constraint or manually created. However, creating a UNIQUE constraint on the column makes the objective of the index clear. For more information on UNIQUE constraints, see Unique Constraints and Check Constraints.
When you create a unique index, you can set an option to ignore duplicate keys. If this option is set to Yes and you attempt to create duplicate keys by adding data that affects multiple rows (with the INSERT statement), the row containing a duplicate is not added. If it is set to No, the entire insert operation fails and all the data is rolled back.
Note
You cannot create a unique index on a single column if that column contains NULL in more than one row. Similarly, you cannot create a unique index on multiple columns if the combination of columns contains NULL in more than one row. These are treated as duplicate values for indexing purposes.
In This Topic
-
Before you begin:
-
To create a unique index on a table, using:
Before You Begin
Benefits of a Unique Index
-
Multicolumn unique indexes guarantee that each combination of values in the index key is unique. For example, if a unique index is created on a combination of LastName, FirstName, and MiddleName columns, no two rows in the table could have the same combination of values for these columns.
-
Provided that the data in each column is unique, you can create both a unique clustered index and multiple unique nonclustered indexes on the same table.
-
Unique indexes ensure the data integrity of the defined columns.
-
Unique indexes provide additional information helpful to the query optimizer that can produce more efficient execution plans.
Typical Implementations
Unique indexes are implemented in the following ways:
-
PRIMARY KEY or UNIQUE constraint
When you create a PRIMARY KEY constraint, a unique clustered index on the column or columns is automatically created if a clustered index on the table does not already exist and you do not specify a unique nonclustered index. The primary key column cannot allow NULL values.
When you create a UNIQUE constraint, a unique nonclustered index is created to enforce a UNIQUE constraint by default. You can specify a unique clustered index if a clustered index on the table does not already exist.
For more information, see Unique Constraints and Check Constraints and Primary and Foreign Key Constraints.
-
Index independent of a constraint
Multiple unique nonclustered indexes can be defined on a table.
For more information, see CREATE INDEX (Transact-SQL).
-
Indexed view
To create an indexed view, a unique clustered index is defined on one or more view columns. The view is executed and the result set is stored in the leaf level of the index in the same way table data is stored in a clustered index. For more information, see Create Indexed Views.
Limitations and Restrictions
-
A unique index, UNIQUE constraint, or PRIMARY KEY constraint cannot be created if duplicate key values exist in the data.
-
A unique nonclustered index can contain included nonkey columns. For more information, see Create Indexes with Included Columns.
Security
Permissions
Requires ALTER permission on the table or view. User must be a member of the sysadmin fixed server role or the db_ddladmin and db_owner fixed database roles.
Using SQL Server Management Studio
To create a unique index by using the Table Designer
-
In Object Explorer, expand the database that contains the table on which you want to create a unique index.
-
Expand the Tables folder.
-
Right-click the table on which you want to create a unique index and select Design.
-
On the Table Designer menu, select Indexes/Keys.
-
In the Indexes/Keys dialog box, click Add.
-
Select the new index in the Selected Primary/Unique Key or Index text box.
-
In the main grid, under (General), select Type and then choose Index from the list.
-
Select Columns, and then click the ellipsis (...).
-
In the Index Columns dialog box, under Column Name, select the columns you want to index. You can select up to 16 columns. For optimal performance, select only one or two columns per index. For each column you select, indicate whether the index arranges values of this column in ascending or descending order.
-
When all columns for the index are selected, click OK.
-
In the grid, under (General), select Is Unique and then choose Yes from the list.
-
Optional: In the main grid, under Table Designer, select Ignore Duplicate Keys and then choose Yes from the list. Do this if you want to ignore attempts to add data that would create a duplicate key in the unique index.
-
Click Close.
-
On the File menu, click Savetable_name.
Create a unique index by using Object Explorer
-
In Object Explorer, expand the database that contains the table on which you want to create a unique index.
-
Expand the Tables folder.
-
Expand the table on which you want to create a unique index.
-
Right-click the Indexes folder, point to New Index, and select Non-Clustered Index....
-
In the New Index dialog box, on the General page, enter the name of the new index in the Index name box.
-
Select the Unique check box.
-
Under Index key columns, click Add....
-
In the Select Columns fromtable_name dialog box, select the check box or check boxes of the table column or columns to be added to the unique index.
-
Click OK.
-
In the New Index dialog box, click OK.
Using Transact-SQL
To create a unique index on a table
-
In Object Explorer, connect to an instance of Database Engine.
-
On the Standard bar, click New Query.
-
Copy and paste the following example into the query window and click Execute.
Copy
USE AdventureWorks2012; GO -- Find an existing index named AK_UnitMeasure_Name and delete it if found IF EXISTS (SELECT name from sys.indexes WHERE name = N'AK_UnitMeasure_Name') DROP INDEX AK_UnitMeasure_Name ON Production.UnitMeasure; GO -- Create a unique index called AK_UnitMeasure_Name -- on the Production.UnitMeasure table using the Name column. CREATE UNIQUE INDEX AK_UnitMeasure_Name ON Production.UnitMeasure (Name); GO
For more information, see CREATE INDEX (Transact-SQL).
SQL SERVER – Performance Impact of Unique Index
Today I am going to share a story of my client who has hired me over 7 times for my most popular consulting service Comprehensive Database Performance Health Check for different servers and situations. In the latest engagement, we had a very interesting situation of Performance Impact of Unique Index.
Background Story of Unique Index
In the recent engagement, the client wanted a minimum 20% improvement in the performance of their system. We had a very interesting situation as in the first round of health check we had gone through all the low hanging fruits and already had boosted SQL Server Performance by over 100%. Now in this round, they still want to push the performance further 20%. This time we had to start looking beyond the obvious issues and dig different in their system.
It would have been easy for me to help them tune their queries and get the additional performance but they always have a very straight forward requirement – No Change in Code. This is because in their business changing code would require a large amount of deployment and release cycle and they wanted to stay away from it.
Performance Impact of Unique Index
As we had gone through the server configurations and database settings along with the indexes the option was to look at the query. However, as mentioned earlier in this blog post, I was not allowed to change the code and that lead me to look at their table structure a bit more in-depth.
While looking into the table structure, I realized that most of the indexes were not marked as Unique indexes. However, after talking to a developer and senior architect, we realized that indexes can be unique. We changed the nearly 70% indexes from non-clustered indexes to non-clustered unique indexes and noticed a lot of performance improvement. At the end of the day when my client ran the final performance test, they discovered by just changing the index type we were able to get nearly 23% of performance improvement which was over 3% from what we actually wanted.
Reproduction of the Scenario
First, we will create two tables. On one table we will create a non-clustered index and on another one, we will create a non-clustered unique index.
1 2 |
|
1 2 |
|
Now let us populate both the tables with the data
1 2 3 4 5 6 7 |
|
1 2 3 4 5 6 7 |
|
Once the data is populated, let us turn on the statistics.
1 2 |
|
Additionally, let us turn on the actual execution plan. Here is how you can enable the actual execution plan in SQL Server Management Studio (SSMS).
Now run the following two statements in a single batch.
1 2 3 4 5 6 |
|
At this point in time, we will observe two things 1) Statistics 2) Execution Plan
Statistics Details in Messages
Let us see details about the statistics in the messages. I have removed details which were not relevant and kept only the logical reads.
Table ‘Table1’. Scan count 1, logical reads 5612
Table ‘Table2’. Scan count 1, logical reads 5435
You can see in the statistics details the logical reads are much lesser in table 2 where we do Unique non-clustered Indexes. This means that unique non-clustered indexes really helped in reducing the IO from the disk by few hundreds of the pages.
Winner: Unique Indexes by saving IO
Execution Plan
When we noticed the execution plan, we noticed that the execution plan of table1 (non-unique index) is 55% and for the table2 (unique index) is 45%.
It is clear that unique indexes also removed an operator from the execution plan which was Stream Aggregate and also performed better in terms of overall Query Cost.
Winner: Unique Indexes by saving overall resources
Here is the code to clean up our scenario.
1 2 |
|
Final Verdict
In the various tests, I have done in the past converting existing nonclustered index a unique nonclustered index always helped with the performance.
However, SQL Server is a huge world and it is totally possible that I have not come across the scenario where unique nonclustered indexes are slower. I am sure the scenario exists somewhere and this is where I will request you to share the details with me and I will publish on this blog with due credit to you.
Well, until we find another story unique nonclustered index is a clear winner.
Reference: Pinal Dave (https://blog.sqlauthority.com)
New CommentSql server automatically trim right spaces in where clause
Create Date: May 11, 2020 at 02:07 PM         | Tag: SQL         | Author Name: Sun, Charles |
When I compared a word with right white spaces and without spaces, the sql server give the same results. It looks like sql server does automatically right trim. I found the useful post below.
Spaces in WHERE clause for SQL Server [duplicate]
I want to find out the records in which a certain column contains exactly one space and nothing else. So I wrote the first of the following queries:
select COUNT(*)
from mytable
where col = ' ' -- One space
select COUNT(*)
from mytable
where col = ' ' -- Two spaces
select COUNT(*)
from mytable
where col = ' ' -- Three spaces
However, all three queries return the exact same records. Does Microsoft SQL Server not distinguish between the amount of spaces? How can I query exactly for one, two or more spaces?
Yes, it ignores trailing spaces in comparisons.
You can try to append a delimiting character.
SELECT count(*)
FROM mytable
WHERE col + 'X' = ' X';
New Comment