WordPress CSV Importer Plugin – Skipping Duplicates

The meat of this post (until I copy it over) is here:

http://wordpress.org/support/topic/plugin-csv-importer-using-a-unique-field-to-prevent-duplicates

There have been a few difficulties in making the required changes to the plugin, so I'm just going to include my version for download here. This is version 0.3.7 of “CSV Importer” for WordPress, but it has a filter hook at the point where records are imported, so a custom filter can validate each record and decide whether to import it or skip it. In my case the validation involved checking for duplicates, so that no record was loaded twice.

csv-importer 0.3.7 [download the importer zip file]

So, here are some belated details of how this worked.

The importer download listed above applies a new filter to each row it reads from the CSV file. Here is the main import loop within the customised CSV importer:

foreach ($csv->connect() as $csv_data) {
    $csv_data = apply_filters('csv_importer_validate', $csv_data, $this, $skipped + $imported + 1);
    if (!empty($csv_data) && $post_id = $this->create_post($csv_data, $options)) {
        $imported++;
        $comments += $this->add_comments($post_id, $csv_data);
        $this->create_custom_fields($post_id, $csv_data);
    } else {
        $skipped++;
    }
}
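To see why returning null from the filter skips a row, here is a plain-PHP sketch of the same counting logic with WordPress taken out. The function simulate_import() is hypothetical: the $validate callable stands in for apply_filters(), and post creation is assumed always to succeed, so only the filter decides whether a row counts as imported or skipped.

```php
<?php
// Plain-PHP sketch of the bookkeeping in the importer loop.
// simulate_import() is hypothetical: $validate stands in for
// apply_filters('csv_importer_validate', ...), and post creation is
// assumed always to succeed, so only the filter decides the outcome.
function simulate_import(array $rows, callable $validate)
{
    $imported = 0;
    $skipped = 0;

    foreach ($rows as $csv_data) {
        // Same row number the real loop passes: rows handled so far, plus one.
        // null stands in for the importer object ($this in the plugin).
        $csv_data = $validate($csv_data, null, $skipped + $imported + 1);

        // empty() is true for null, false and array(), so any of them skips the row.
        if (!empty($csv_data)) {
            $imported++;
        } else {
            $skipped++;
        }
    }

    return array('imported' => $imported, 'skipped' => $skipped);
}

// Example: a validator that drops rows with a blank SKU.
$result = simulate_import(
    array(array('sku' => 'A1'), array('sku' => ''), array('sku' => 'B2')),
    function ($row, $importer, $row_number) {
        return $row['sku'] === '' ? null : $row;
    }
);
// $result is array('imported' => 2, 'skipped' => 1)
```

Note that the row number is only "the Nth row seen", not a line number in the file, so a header row or blank lines are not counted.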

That filter can be caught in your own code. I happened to do that in the custom theme's functions.php. However, it may make more sense to put it into a custom plugin rather than the theme. Depending on whether your filter handler is a static method, a method on an instantiated object (such as a singleton), or a free-floating global function, one of the following should work:


add_filter('csv_importer_validate', __CLASS__ . '::csv_importer_validate', 10, 3);

or

add_filter('csv_importer_validate', array($this, 'csv_importer_validate'), 10, 3);

or

add_filter('csv_importer_validate', 'my_function_csv_importer_validate', 10, 3);
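All three forms are ordinary PHP callables, and the trailing "10, 3" means priority 10 (the default) and three accepted arguments; WordPress passes only the first argument to a callback unless that fourth parameter asks for more. The sketch below, with hypothetical class and function names, checks each form outside WordPress by calling it roughly the way a filter with accepted_args = 3 would be called.

```php
<?php
// Sketch: the three add_filter() callback forms as plain PHP callables.
// Class and function names are hypothetical.

class My_Csv_Validator
{
    // Static method: registered as __CLASS__ . '::csv_importer_validate'.
    public static function csv_importer_validate($csv_data, $importer, $row_number)
    {
        return $csv_data;
    }
}

class My_Csv_Plugin
{
    // Instance method: registered as array($this, 'csv_importer_validate').
    public function csv_importer_validate($csv_data, $importer, $row_number)
    {
        return $csv_data;
    }
}

// Global function: registered by name.
function my_function_csv_importer_validate($csv_data, $importer, $row_number)
{
    return $csv_data;
}

$callbacks = array(
    'My_Csv_Validator::csv_importer_validate',
    array(new My_Csv_Plugin(), 'csv_importer_validate'),
    'my_function_csv_importer_validate',
);

$row = array('sku' => 'A1');
$results = array();
foreach ($callbacks as $callback) {
    // Roughly what happens with accepted_args = 3: the value being filtered
    // plus the two extra arguments given to apply_filters().
    $results[] = call_user_func_array($callback, array($row, null, 1));
}
```

If the "3" were left off, the handler would receive only $csv_data, and PHP would complain about the missing $importer and $row_number arguments.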

So what do you do in your filter handler? The basic structure of the filter handler will look like this:

function csv_importer_validate($csv_data, $importer, $row_number) {
    // $csv_data contains a row from the CSV input file.
    // $importer is the importer object, so you can write log entries like this:
    //     $importer->log['error'][] = 'message';   (or $importer->log['notice'][])
    // $row_number is handy for logging.
    // There are two possible return values:
    // 1. If you want this row to be imported, return $csv_data.
    // 2. If you want this row to be discarded and not imported, return null.
    return $csv_data; // default: import the row unchanged
}
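As a filled-in example of that structure, here is a sketch of a handler that rejects rows whose key columns are blank and logs why. The function name is hypothetical; the column names sku and track_number are taken from the track example further on, and the log array keys follow the comment in the skeleton.

```php
<?php
// Hypothetical filter handler: skip rows that lack the key columns.
// Column names 'sku' and 'track_number' follow the track example;
// adjust them to your own CSV headings.
function my_require_key_columns($csv_data, $importer, $row_number)
{
    foreach (array('sku', 'track_number') as $column) {
        if (!isset($csv_data[$column]) || trim($csv_data[$column]) === '') {
            // Record why the row was dropped, using the importer's log array.
            $importer->log['error'][] = "Row $row_number: missing $column, skipped.";
            return null; // discard this row
        }
    }
    return $csv_data; // import this row unchanged
}
```

It would be registered with the same add_filter() call as any other global function handler.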

So, you would like your filter to perform a check to see if the row has already been imported. You basically need to take some fields from $csv_data and use them to look up an existing post using get_posts(). If you find a post, then it has already been loaded, and you don’t need to load it again.

You may actually want to load it again, and instead modify $csv_data to make it unique. I’ve not tried that, but I can see it could be useful in some instances.
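If you did want to go that way, one approach would be to rewrite a single field until it no longer clashes. This is an untried sketch: make_row_unique() and the slug field are hypothetical, and $exists is any callable that reports whether a value is already taken (in WordPress it could wrap a get_posts() lookup).

```php
<?php
// Hypothetical alternative to skipping: append -2, -3, ... to one field until
// the value is free. $exists answers "is this value already taken?".
function make_row_unique(array $csv_data, $field, callable $exists)
{
    $base = $csv_data[$field];
    $candidate = $base;
    $suffix = 2;

    while ($exists($candidate)) {
        $candidate = $base . '-' . $suffix;
        $suffix++;
    }

    $csv_data[$field] = $candidate;
    return $csv_data; // always returned, so the row is still imported
}
```

With "intro" and "intro-2" already taken, a row whose slug is "intro" would come back as "intro-3".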

In my example, I was loading playable track samples (URLs to MP3s) into a record shop. The tracks were a custom post type (called, unsurprisingly, “tracks”) and had some custom key data. Each track had a title and a URL, which was its basic payload. Then there was the SKU of the shop product it linked to, and the track number. The combination of the SKU and track number was the unique key of the track.
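Treating SKU plus track number as one composite key also makes it cheap to catch a CSV file that repeats the same track within a single run, which a get_posts() lookup restricted to published posts would miss if the importer is creating drafts. Here is a sketch with hypothetical helpers track_key() and seen_before(); the "|" separator is an assumption that it never appears in a SKU.

```php
<?php
// Sketch: SKU + track number as one composite key, plus an in-memory record
// of keys already seen in this import run. Both functions are hypothetical.
function track_key(array $csv_data)
{
    // The separator keeps 'AB' + '12' distinct from 'A' + 'B12'.
    return $csv_data['sku'] . '|' . $csv_data['track_number'];
}

function seen_before(array $csv_data)
{
    static $seen = array(); // persists across calls within one PHP request

    $key = track_key($csv_data);
    if (isset($seen[$key])) {
        return true;
    }
    $seen[$key] = true;
    return false;
}
```

A filter handler could return null when seen_before($csv_data) is true, before doing the database lookup.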

So we do a get_posts() lookup on the tracks, with a meta_query looking at the SKU and track number (also an optional disc number, but we won’t complicate this example with that). This is the meta_query:

$meta_query = array(
    array(
        'key' => 'sku',
        'value' => $csv_data['sku'],
    ),
    array(
        'key' => 'track_number',
        'value' => $csv_data['track_number'],
    ),
);
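Because the meta_query has to be rebuilt for every row, it can help to wrap it in a function. This sketch uses a hypothetical name, build_track_meta_query(); the explicit 'relation' => 'AND' is WordPress's default for a meta_query and is spelled out only to make clear that both keys must match the same post.

```php
<?php
// Sketch: build the per-row meta_query. The function name is hypothetical.
function build_track_meta_query(array $csv_data)
{
    return array(
        'relation' => 'AND', // the default, written out: both clauses must match
        array(
            'key'   => 'sku',
            'value' => $csv_data['sku'],
        ),
        array(
            'key'   => 'track_number',
            'value' => $csv_data['track_number'],
        ),
    );
}
```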

and this is the lookup to see if there are any tracks with this SKU and track number already loaded:

$tracks = get_posts(array(
    'post_type' => 'track',
    'numberposts' => 1,
    'meta_query' => $meta_query,
    'post_status' => 'publish',
    'suppress_filters' => true, // Just to keep the overhead down
));

If $tracks is not empty, the track has already been loaded, so return null to prevent this row from being imported; otherwise return $csv_data.
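Putting the lookup and the return values together, a complete handler for the track example might look like the sketch below. The function name is hypothetical, and the fourth parameter $find is an addition for testing only: it defaults to get_posts(), and since the filter is registered with three accepted arguments, WordPress never overrides it.

```php
<?php
// Sketch of a complete duplicate-skipping filter for the track example.
// The function name is hypothetical. $find defaults to WordPress's get_posts();
// it is a parameter only so the logic can be exercised outside WordPress.
function my_track_csv_importer_validate($csv_data, $importer, $row_number, $find = 'get_posts')
{
    $tracks = call_user_func($find, array(
        'post_type'        => 'track',
        'numberposts'      => 1,
        'post_status'      => 'publish',
        'suppress_filters' => true,
        'meta_query'       => array(
            array('key' => 'sku', 'value' => $csv_data['sku']),
            array('key' => 'track_number', 'value' => $csv_data['track_number']),
        ),
    ));

    if (!empty($tracks)) {
        $importer->log['notice'][] = "Row $row_number: track already loaded, skipped.";
        return null; // duplicate: do not import
    }

    return $csv_data; // new track: import it
}

// In WordPress (functions.php or a small plugin):
// add_filter('csv_importer_validate', 'my_track_csv_importer_validate', 10, 3);
```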

What you use to check for uniqueness depends on what you are loading and where your unique keys are. Hopefully the above example is enough to get started.

3 Responses to WordPress CSV Importer Plugin – Skipping Duplicates

  1. Andy 2013-10-17 at 15:31

    This is exactly what I’m looking for – thanks!
    Would you be so kind as to share your filter for checking duplicates?
    I’m having trouble figuring out how to define my unique (external) key.

    • Jason Judge 2013-10-18 at 00:49

      I’ve added some more details to the main post. It lists the main guts of the filter for the use-case I had. If you want to see the end result, the track listings are on albums such as this:

      http://www.soulbrother.com/shop/honey-wine/

  2. Andy 2013-10-21 at 14:43

    Ah, I get it. Thanks a million – you’re a life saver!
