Skip to content

upsert

The Upsert Strategy is used to efficiently update existing records while inserting new ones in a database or data warehouse. It ensures that incoming data is merged with existing data based on a unique key, preventing duplicates and maintaining data integrity.

How It Works

  • Check for Existing Records: Match incoming data against the existing table using a unique key (e.g., ID, order_number).
  • Update Matching Records: If a record exists, update the relevant fields with new values.
  • Insert New Records: If no matching record is found, insert the new row into the table.
  • Ensure Data Consistency: This process maintains a single source of truth, reducing redundancy.

When to Use It

  • When handling incremental updates where some records need modification, while others are new.
  • In event-driven architectures, where real-time updates are needed in a data pipeline.

Example

js
{
    "identity": "fgXOw4zY"
    "source_uri": "postgresql://postgres:@localhost:5432/db_name",
    "destination_uri": "clickhouse://default:@localhost:9000/dlt?http_port=8123&secure=0",
    "resources": [
      {
        "source_table_name": "users",
        "write_disposition_config": { 
          "type": "merge"
          "strategy": "upsert"
          "config": { 
                "primary_key": "id", 
                "hard_delete_column": "deleted_at", 
            } 
        } 
      }
    ]
}

Parameters Table

ParameterTypeRequiredDescription
primary_keystring/list✅ YesThe primary key(s) is used for primary-key based upserts.
hard_delete_columnstring❌ NoA column that marks records for deletion from the destination dataset.

Parameter Details

1. primary_key

  • Used to uniquely identify records in the dataset.
  • Can be a single column (string) or multiple columns (tuple).
  • Update a record if the key exists in the target table
  • Insert a record if the key does not exist in the target table
  • Needs to be unique

2. hard_delete_column

  • Specifies a column that indicates whether a record should be deleted.
  • If this column is present and marked (e.g., is_deleted = True), the corresponding record will be removed before inserting new data.

WARNING

The primary-key is expected to be unique.De-duplication is not handled in this strategy.

Released under the MIT License.