Most developers have worked with databases and written queries to fetch data, for example loading a user's posts. When the same kind of query runs over and over, we want it to run faster than a full scan of the table every time. This is where indexes come into play.
To play around with some dummy data and see how fast such queries are without an index, start a PostgreSQL container by running the following command:
docker run -p 5432:5432 -e POSTGRES_PASSWORD=mypassword -d postgres
Connect to the container (you can find its ID with docker ps) and open a psql session so we can create some dummy data:
docker exec -it container_id /bin/bash
psql -U postgres
Create a simple schema for a Medium-like application:
CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    password VARCHAR(255) NOT NULL,
    name VARCHAR(255)
);

CREATE TABLE posts (
    post_id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL,
    title VARCHAR(255) NOT NULL,
    description TEXT,
    image VARCHAR(255),
    FOREIGN KEY (user_id) REFERENCES users(user_id)
);
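Note that PostgreSQL automatically creates indexes to back the PRIMARY KEY and UNIQUE constraints above, but the foreign key column posts.user_id gets no index by default. If you want to confirm which indexes already exist, the pg_indexes view shows them:

SELECT indexname, indexdef FROM pg_indexes WHERE tablename IN ('users', 'posts');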
Seed the database with some dummy data:
DO $$
DECLARE
    returned_user_id INT;
BEGIN
    -- Insert 5 users
    FOR i IN 1..5 LOOP
        INSERT INTO users (email, password, name)
        VALUES ('user'||i||'@example.com', 'pass'||i, 'User '||i)
        RETURNING user_id INTO returned_user_id;

        -- Insert 500,000 posts for this user
        FOR j IN 1..500000 LOOP
            INSERT INTO posts (user_id, title, description)
            VALUES (returned_user_id, 'Title '||j, 'Description for post '||j);
        END LOOP;
    END LOOP;
END $$;
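The script inserts 5 users with 500,000 posts each (2.5 million rows in total), so it may take a few minutes to finish. As a quick sanity check once it completes, you can count the rows per user:

SELECT user_id, COUNT(*) FROM posts GROUP BY user_id ORDER BY user_id;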
Now run a query that fetches a user's posts and check how long it takes:
EXPLAIN ANALYZE SELECT * FROM posts WHERE user_id = 1 LIMIT 5;
Focus on the Execution Time reported at the bottom of the output and note it down so we can compare it later.
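One caveat: because of LIMIT 5, PostgreSQL can stop its sequential scan as soon as it finds five matching rows, and user 1's rows happen to sit near the start of the table, so this timing may look deceptively small. For a clearer unindexed baseline, you can also time a query that has to touch every row, for example a count for the same user (the exact numbers will vary on your machine):

EXPLAIN ANALYZE SELECT COUNT(*) FROM posts WHERE user_id = 1;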