tsql - "Merging" two tables in T-SQL - replacing or preserving duplicate IDs


I have a web application that uses a large table (millions of rows, approximately 30 columns). Let's call that table TableA. Among its 30 columns, this table has a primary key called "id" and another column called "Campaign ID".

As part of the application, users can upload new sets of data for new "campaigns".

These data sets have the same structure as TableA, but usually contain only 10,000-20,000 rows.

Every row in a new data set will have a unique "id", but all rows will share the same campaign ID. In other words, the user is loading the complete data for a new "campaign", so all 10,000 rows have the same campaign ID.

In general, users are uploading data for a new campaign, so there are no rows in TableA with the same campaign ID. Since the "id" is unique to each campaign, the id of every row of new data will be unique in TableA.

However, in the rare case where a user tries to load a new set of rows for a "campaign" that is already in the database, the requirement was to first remove all the old rows for that campaign from TableA, and then insert the new rows from the new data set.

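Therefore, my stored procedure was simple:

  • Load the new data into a temporary table (#TableB)
  • DELETE any existing rows in TableA with the same campaign ID
  • INSERT INTO TableA ([columns]) SELECT [columns] FROM #TableB
  • DROP TABLE #TableB

This worked just fine.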
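The steps above can be sketched in T-SQL roughly as follows. This is only an illustration under assumptions: the variable @CampaignID, the column names campaignID, col1 and col2, and the surrounding transaction are placeholders that do not come from the original procedure, and #TableB is assumed to be loaded already.

```sql
-- Sketch of the original "delete the campaign, then insert" behavior.
-- @CampaignID, campaignID, col1 and col2 are hypothetical names.
DECLARE @CampaignID int;
SET @CampaignID = 42;  -- SQL Server 2005 has no DECLARE ... = initializer

BEGIN TRAN;

-- Remove every existing row for this campaign.
DELETE FROM TableA
WHERE campaignID = @CampaignID;

-- Insert all uploaded rows from the temporary table.
INSERT INTO TableA (id, campaignID, col1, col2)
SELECT id, campaignID, col1, col2
FROM #TableB;

COMMIT TRAN;

-- Clean up the temporary table.
DROP TABLE #TableB;
```

Wrapping the delete and insert in one transaction is an assumption here; it keeps the campaign from being left half-loaded if the insert fails.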

But the new requirement is to give users 3 options for handling "duplicates" when they upload new data, that is, cases where the user is uploading data for a campaign that is already in TableA.

1. Remove all the data in TableA with the same campaign ID, then insert all the new data from #TableB. (This is the old behavior. With this option, there will never be duplicates.)
2. If a row in #TableB has the same id as a row in TableA, then update that row in TableA with the row from #TableB. (Effectively, this "replaces" the old data with the new data.)
3. If a row in #TableB has the same id as a row in TableA, then ignore that row in #TableB. (Basically, this preserves the original data and ignores the new data.)

The user does not choose this row by row; they choose how the data will be merged, and that choice is applied to the entire data set.

In a similar application I worked on that used MySQL, I used the "LOAD DATA INFILE" statement with the "REPLACE" or "IGNORE" option. But I do not know how to do this with SQL Server / T-SQL.

Any solution must be efficient enough to handle the fact that TableA has millions of rows, and #TableB (the new data set) can contain 10k-20k rows.

I googled for something like a "MERGE" command (which seems to be supported in SQL Server 2008), but I only have access to SQL Server 2005.

In pseudocode, I want something like this:

If the user selects option 1: [I am all set here - I already have this working]

If the user selects option 2 (replace):

    MERGE into TableA as the target, using #TableB as the source
    ON TableA.id = #TableB.id
    WHEN MATCHED: update the row in TableA with the row from #TableB
    WHEN NOT MATCHED: insert the row from #TableB into TableA

If the user selects option 3 (preserve):

    MERGE into TableA as the target, using #TableB as the source
    ON TableA.id = #TableB.id
    WHEN MATCHED: do nothing
    WHEN NOT MATCHED: insert the row from #Ta
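For reference only, the option 2 pseudocode maps fairly directly onto the MERGE statement in SQL Server 2008 and later; it will not run on SQL Server 2005. The column names campaignID and col1 are assumptions standing in for the real column list.

```sql
-- SQL Server 2008+ only. campaignID and col1 are hypothetical column names.
MERGE TableA AS tgt
USING #TableB AS src
    ON tgt.id = src.id
WHEN MATCHED THEN
    -- Option 2: overwrite the existing row with the uploaded row.
    UPDATE SET tgt.campaignID = src.campaignID,
               tgt.col1       = src.col1
WHEN NOT MATCHED BY TARGET THEN
    -- Rows with a new id are inserted.
    INSERT (id, campaignID, col1)
    VALUES (src.id, src.campaignID, src.col1);
```

Leaving out the WHEN MATCHED clause should give the option 3 behavior, since matched rows are then left untouched.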

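How about this?

Option 2:

    begin tran;
    -- remove the rows in TableA that are about to be replaced
    delete from TableA where exists (select 1 from #TableB where TableA.id = #TableB.id);
    -- then insert everything from #TableB
    insert into TableA select * from #TableB;
    commit tran;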
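As an alternative sketch for option 2 that works on SQL Server 2005, matching rows can be updated in place and only the unmatched rows inserted, which avoids deleting and re-inserting rows that already exist. This is not the approach given above, just a variation; campaignID, col1 and col2 are assumed column names.

```sql
-- Option 2 variation: UPDATE matches in place, INSERT the rest.
-- campaignID, col1 and col2 are hypothetical column names.
BEGIN TRAN;

-- Overwrite rows in TableA whose id also appears in #TableB.
UPDATE a
SET    a.campaignID = b.campaignID,
       a.col1       = b.col1,
       a.col2       = b.col2
FROM   TableA AS a
INNER JOIN #TableB AS b ON a.id = b.id;

-- Insert the rows from #TableB whose id is not yet in TableA.
INSERT INTO TableA (id, campaignID, col1, col2)
SELECT b.id, b.campaignID, b.col1, b.col2
FROM   #TableB AS b
WHERE  NOT EXISTS (SELECT 1 FROM TableA AS a WHERE a.id = b.id);

COMMIT TRAN;
```

The trade-off is that every column has to be listed in the SET clause, whereas the delete-then-insert version can rely on the two tables having the same layout.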

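Option 3:

    begin tran;
    -- discard the incoming rows whose id already exists in TableA
    delete from #TableB where exists (select 1 from TableA where TableA.id = #TableB.id);
    -- then insert what is left of #TableB
    insert into TableA select * from #TableB;
    commit tran;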
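If #TableB should be left untouched, option 3 can also be written as a single anti-join insert, sketched below. Because it is one statement, it is atomic without an explicit transaction. The column names campaignID, col1 and col2 are assumptions.

```sql
-- Option 3 variation: insert only rows whose id is not already in TableA.
-- campaignID, col1 and col2 are hypothetical column names.
INSERT INTO TableA (id, campaignID, col1, col2)
SELECT b.id, b.campaignID, b.col1, b.col2
FROM   #TableB AS b
LEFT JOIN TableA AS a ON a.id = b.id
WHERE  a.id IS NULL;  -- no match in TableA, so the row is new
```

Since id is the primary key of TableA and therefore never NULL, a NULL a.id after the LEFT JOIN can only mean that no matching row exists.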

As for performance, you should be fine as long as the id field in TableA (the big table) is indexed.
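Since id is described as the primary key of TableA, it should already have an index. The statements below are an optional sketch of two further indexes that may help; the index names and the campaignID column name are assumptions, and whether they pay off depends on the actual data and query plans.

```sql
-- Optional: index the temp table after loading it, so lookups by id are cheap.
-- UNIQUE relies on each uploaded row having a distinct id.
CREATE UNIQUE CLUSTERED INDEX IX_TableB_id ON #TableB (id);

-- Optional: helps the option 1 delete find a campaign's rows without a full scan.
-- campaignID is a hypothetical column name.
CREATE NONCLUSTERED INDEX IX_TableA_campaignID ON TableA (campaignID);
```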

