Delete duplicate rows in sql

3/29/2023

If we are going to delete duplicate rows, we need to map all of the rows we are about to delete to the rows we are going to keep.

Often, this is not a desirable (or possible) option. It does make it much easier if you have many tables referencing TableX, because if you alter the PK, you will need to alter all of those other tables. In my opinion, while it's not ideal, it's OK to leave that identity as your PK if you must, as long as you add a true natural unique constraint on the logical key of your table. In many cases, it will be easier to choose the second option. Once you have scrubbed the data and determined that columns A and B form a good key, you have a choice to make: Update all foreign tables to reference TableX by columns A and B, or leave ID as the primary key and add a unique constraint on columns A,B. But that's a whole 'nother blog I suppose ….) (By the way, if you have MS Access or Crystal Reports or even Excel handy, do yourself a favor and don't try to generate this report using a combination of Query Analyzer and Notepad– use the proper tools for the job. If it's not even close, then those two columns will probably not be good candidates for your natural key. Scrub the data (e.g., make sure "Jeff" matches "Jeffrey") as necessary until you are please that this report produces the desired results. Create a report off a SELECT similiar to this:įor each "MinID", verify that the those rows returned are true duplicates. It's very important to make absolutely sure that columns A and B truly are a good primary key candidate. You'd like to delete duplicates and clean up TableX, but how do you resolve those foreign keys to their "new" ID's ? For example, if ID's 3,4 and 6 are all duplicates, and you decide to keep only 3, you must first repoint all foreign tables referencing ID's 4 and 6 to reference ID 3 before you safely remove the duplicate rows. Many other tables have foreign key references to TableX through the ID column. It is clear a better, natural primary key is a combination of columns A and B.

TableX has an identity column "ID" set up as the primary key, and contains many duplicates. Here's a scenario many of us have encountered: Delete Duplicates And Resolve Foreign Key References in SQL

0 Comments

Delete duplicate rows in sql

Leave a Reply.

Author

Archives

Categories