Explore Database with Information Schema

Use information_schema to inspect unfamiliar databases: tables, columns, foreign keys, and indexes, with an emphasis on what the metadata can and cannot reliably tell you.


Context

Starting work on a mature database can be intimidating, especially in enterprise systems. information_schema gives you a portable way to inspect the schema without touching application tables. It is the fastest way to build a mental model of the database: what exists, how tables relate, and where the likely performance and integrity constraints are. information_schema is metadata, not truth in every detail. Some values are exact, some are estimates, and some are engine-specific. Use it to orient yourself, then verify assumptions with the actual workload.


Tables

The TABLES view helps you identify the biggest and most important tables. Row counts and sizes are especially useful for spotting hot paths and migration risks.

SELECT
    table_name AS name,
    table_rows AS estimated_row_count,
    create_time AS created_at,
    format_bytes(t.data_length + t.index_length) AS size_mb
FROM information_schema.TABLES
WHERE table_schema = '<name>'
ORDER BY table_rows DESC;

Two details matter here:

If you need exact counts, query the table directly — but do that selectively, because COUNT(*) on large tables can be expensive.


Columns

The COLUMNS view shows how the schema is shaped: types, nullability, defaults, and numeric precision. This is where you learn whether the domain model is strict or loosely typed.

General Statistics

SELECT
    COUNT(*) AS column_count,
    data_type,
    is_nullable
FROM information_schema.COLUMNS
WHERE table_schema = '<name>'
GROUP BY data_type, is_nullable
ORDER BY column_count DESC;

This is useful for spotting patterns, such as many nullable columns or an overuse of wide text fields. A high nullability rate can be a design signal: either the domain is optional, or the schema is carrying too many “unknown yet” states.

Numerical Columns

SELECT
    table_name,
    column_name,
    numeric_precision,
    numeric_scale
FROM information_schema.COLUMNS
WHERE table_schema = '<name>'
  AND data_type IN ('decimal', 'numeric')
ORDER BY numeric_scale DESC, table_name, column_name;

Precision and scale matter because they reveal the business contract encoded in the schema. A DECIMAL(19,4) column says something very different from a DECIMAL(10,0): one is likely money-like, the other is likely an integer-like quantity.

A useful mental model: schema types are part of your domain rules. If the scale is too small, the bug may be silent truncation; if it is too large, you may be storing and comparing values more loosely than intended.


Relationships (FKs)

Foreign keys tell you how tables are connected and what the database will do on update/delete. That is more than documentation — it is the enforcement layer for referential integrity.

SELECT
    cu.table_name,
    cu.column_name,
    cu.constraint_name AS fk_name,
    cu.referenced_table_name,
    cu.referenced_column_name,
    rc.update_rule,
    rc.delete_rule
FROM information_schema.KEY_COLUMN_USAGE cu
         LEFT JOIN information_schema.REFERENTIAL_CONSTRAINTS rc
                   ON cu.constraint_schema = rc.constraint_schema
                       AND cu.constraint_name = rc.constraint_name
WHERE cu.table_schema = '<name>'
  AND cu.referenced_table_name IS NOT NULL
ORDER BY cu.table_name, cu.column_name;

This query is valuable because it exposes two hidden invariants:

If a schema has no foreign keys, that does not necessarily mean it has no relationships; it may mean the application enforces them instead. That is a major risk signal, because integrity then depends on every write path being correct.


Indexes

Indexes are where schema understanding becomes performance understanding. A table can look normal until you inspect whether its access patterns are actually supported.

KEY_COLUMN_USAGE shows constrained key columns, but it does not fully describe indexes. For a more complete view, use STATISTICS.

SELECT
    table_name,
    index_name,
    column_name,
    seq_in_index,
    non_unique,
    cardinality,
    nullable
FROM information_schema.STATISTICS
WHERE table_schema = '<name>'
ORDER BY table_name, index_name, seq_in_index;

A few practical rules:

An index is not “good” just because it exists. It is good when it matches real predicates, joins, and sort orders. A low-cardinality index on a highly repetitive column may add write cost without helping reads.


How to use this in practice

Use information_schema in this order:

  1. Tables — identify large or suspiciously important tables.
  2. Columns — learn domain shape, nullability, and type discipline.
  3. Relationships — verify integrity rules and delete/update behaviour.
  4. Indexes — check whether the likely access paths are actually supported.

That sequence works because it moves from structure to semantics to performance. It also helps you avoid a common failure mode: optimizing before you understand the data model.