Logical Keys in data

Use information_schema to learn about logical/business keys and find associations in messy schemas.


Context

Databases often evolve messily, especially in enterprise systems.
Relationships may exist in the data long before they are enforced as foreign keys. In that case, you can use information_schema to discover candidate logical or business keys by looking for columns that behave similarly across tables.

Here is a deliberately messy schema:

CREATE DATABASE shop;
USE shop;

CREATE TABLE purchase (
    item_id       INT,
    quantity      INT,
    total         DECIMAL(10, 2),
    user_id       INT,
    purchase_date DATE
);

CREATE TABLE audit_log (
    ItemId  INT,
    Date    DATE,
    Message VARCHAR(100),
    UserId  INT
);

CREATE TABLE Product (
    `Item Id` INT,
    Name      VARCHAR(20),
    Price     INT
);

There are no declared foreign keys, but item_id, ItemId, and Item Id are strong candidates for the same logical identifier.
The key idea is not to “prove” a relationship from metadata alone, but to reduce the search space before you validate it against actual data.


Query information_schema

information_schema.columns contains structural metadata for every table in the database. A self-join lets you compare columns across tables. Start by matching on data_type:

SELECT c1.table_name  AS `Table Name 1`,
       c1.column_name AS `Column Name 1`,
       c1.data_type   AS `Simple Data Type`,
       c1.column_type AS `Extended Data Type`,
       c2.table_name  AS `Table Name 2`,
       c2.column_name AS `Column Name 2`,
       c2.data_type   AS `Simple Data Type`,
       c2.column_type AS `Extended Data Type`
FROM information_schema.columns c1
         JOIN information_schema.columns c2
              ON c1.table_schema = c2.table_schema
                  AND c1.data_type = c2.data_type
                  AND c1.table_name <> c2.table_name
WHERE c1.table_schema = 'shop'
ORDER BY c1.table_name, c1.column_name;

data_type is broad (VARCHAR, INT, DATE), so it can generate many false positives. For example, two VARCHAR columns are not necessarily compatible if one is VARCHAR(20) and the other is VARCHAR(100).
If you want stricter matching, use instead: column_type.

SELECT c1.table_name  AS `Table Name 1`,
       c1.column_name AS `Column Name 1`,
       c1.data_type   AS `Simple Data Type`,
       c1.column_type AS `Extended Data Type`,
       c2.table_name  AS `Table Name 2`,
       c2.column_name AS `Column Name 2`,
       c2.data_type   AS `Simple Data Type`,
       c2.column_type AS `Extended Data Type`
FROM information_schema.columns c1
         JOIN information_schema.columns c2
              ON c1.table_schema = c2.table_schema
                  AND c1.column_type = c2.column_type
                  AND c1.table_name <> c2.table_name
WHERE c1.table_schema = 'shop'
ORDER BY c1.table_name, c1.column_name;

That is more precise, but still not enough to identify business keys. The real signal comes from names, because logical keys are often duplicated with inconsistent capitalisation. Normalising names with LOWER() or UPPER() is a cheap, high-value heuristic:

SELECT c1.table_name  AS `Table Name 1`,
       c1.column_name AS `Column Name 1`,
       c1.data_type   AS `Simple Data Type`,
       c1.column_type AS `Extended Data Type`,
       c2.table_name  AS `Table Name 2`,
       c2.column_name AS `Column Name 2`,
       c2.data_type   AS `Simple Data Type`,
       c2.column_type AS `Extended Data Type`
FROM information_schema.columns c1
         JOIN information_schema.columns c2
              ON c1.table_schema = c2.table_schema
                  AND c1.column_type = c2.column_type
                  AND c1.table_name <> c2.table_name
WHERE c1.table_schema = 'shop'
  AND UPPER(c1.column_name) = UPPER(c2.column_name)
ORDER BY c1.table_name, c1.column_name;

Names can also differ by spaces and underscores, so normalise those too:

SELECT c1.table_name  AS `Table Name 1`,
       c1.column_name AS `Column Name 1`,
       c1.data_type   AS `Simple Data Type`,
       c1.column_type AS `Extended Data Type`,
       c2.table_name  AS `Table Name 2`,
       c2.column_name AS `Column Name 2`,
       c2.data_type   AS `Simple Data Type`,
       c2.column_type AS `Extended Data Type`
FROM information_schema.columns c1
         JOIN information_schema.columns c2
              ON c1.table_schema = c2.table_schema
                  AND c1.column_type = c2.column_type
                  AND c1.table_name <> c2.table_name
WHERE c1.table_schema = 'shop'
  AND REPLACE(REPLACE(LOWER(c1.column_name), ' ', ''), '_', '')
    = REPLACE(REPLACE(LOWER(c2.column_name), ' ', ''), '_', '')
ORDER BY c1.table_name, c1.column_name;

Summary