MySQL – finding counts of data by columns

Okay, the title may not be clear, but this is basically what I was trying to do, and I present here the script that achieved it.

During a data migration of a legacy Access database (actually, nine Access databases) to Sugar CRM, I wanted to be able to find any data that was missing. Looking at the numbers of rows migrated is only half the story.

Because the source data was spread over so many different databases, there was a complex set of mapping rules, to map the old database columns onto the new database. This was not a simple column to column mapping – some recoreds split out into multiple records. Some columns became flags in other tables or reference links between two tables. Due to this complexity, some source tables did not get mapped properly, so some columns were not complete. This left me with, for example, all the postal codes from databases 1 to 5, but no postal codes from databases 6 to 9. Trying to find these was proving a pain.

The script I came up with in the end, with some help from the fine people at Stack Overflow, can be downloaded below. It is a PHP5 script, is not complicated and hopefully not obscure in any way.

The script will loop through each table, then look through each column in each table and report the number of rows for which that column contains some data. The output is a simple HTML table for each MySQL table.

I hope it is useful. Have fun with it!

<?php

// Author: Jason Judge <jason.judge@academe.co.uk>
// Date: August 2011
// Script: PHP5 with PDO.
// Purpose: to list the counts of non-NULL and non-empty string columns
// on a list of tables. It basically tells us how much data there is in
// each column. If the primary key column is included, then that tells us
// the total number of rows in that table, for comparison.
// Licence: GPL2.0

class table {

    // DB login criteria.
    public $db_host = 'localhost';
    public $db_name = 'database-name';
    public $db_user = 'user-name';
    public $db_pass = 'user-password';

    // A list of all the tables that we would like to look through.
    // These are the main tables in a SugarCRM migration.
    public $all_tables = array(
        'accounts',
        'accounts_cstm',
        'contacts',
        'contacts_cstm',
        'sw_airports',
        'sw_airports_cstm',
        'sw_bookings',
        'sw_bookings_cstm',
        'sw_expedition',
        'sw_expedition_cstm',
    );

    // Any columns we are not interested in.
    // These will be excluded across all the tables.
    public $skip_columns = array(
        //'id',
        //'id_c',
        'date_entered',
        'date_modified',
        'modified_user_id',
        'created_by',
        'assigned_user_id',
    );

    function main() {
        // Connect to database
        $db = new PDO('mysql:host='.$this->db_host.';dbname='.$this->db_name.';charset=UTF8', $this->db_user, $this->db_pass);
        $db->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);

        echo '<html><body>';

        // Loop for each table.
        foreach($this->all_tables as $table) {
            echo "<h1>Table: $table</h1>";

            // Get the list of columns in the table.
            $sql = 'SELECT column_name FROM information_schema.columns where TABLE_NAME = :table';

            $stmt = $db->prepare($sql);
            if (!$stmt) print_r($db->errorInfo());

            $stmt->bindParam(':table', $table, PDO::PARAM_STR);
            $stmt->execute();

            // Fetch the column names into an array.
            // PDO may have a short-cut for this, but I'm new to PDO.
            $columns = array();
            while ($row = $stmt->fetch(PDO::FETCH_ASSOC, PDO::FETCH_ORI_NEXT)) {
                $columns[] = ($row['column_name']);
            }

            echo '<table>';

            // Loop for each column in the table.
            foreach($columns as $column) {
                // Skip any columns in our skip-list.
                if (in_array($column, $this->skip_columns)) continue;

                // Write out a table row for each column.
                echo '<tr>';

                    // Fetch the count of rows that match our not-empty criteria in
                    // the table.
                    $sql = 'SELECT COUNT(*) AS not_empty FROM ' . $table
                        . ' WHERE ' . $column . ' IS NOT NULL AND ' . $column . ' <> ''';

                    $stmt = $db->prepare($sql);
                    if (!$stmt) print_r($db->errorInfo());

                    $stmt->execute();

                    $row = $stmt->fetch(PDO::FETCH_ASSOC, PDO::FETCH_ORI_NEXT);

                    // Display the table cells: teh column name and the number of not-empty rows.
                    echo "<td>$column</td>";
                    echo "<td>" . $row['not_empty'] . '</td>';

                echo '</tr>';
            }
            echo '</table>';
        }

        echo '</body></html>';
    }

}

$table = new table();
$table->main();
No comments yet.

Leave a Reply