Most developers have worked with databases and written queries to fetch data, for example loading a user's posts. When the same kind of query runs over and over, we want it to run faster than a full scan of the table every time. This is where indexes come into play.
To play around with some dummy data and see how fast such queries are without an index, start a PostgreSQL container by running the following command:
docker run -p 5432:5432 -e POSTGRES_PASSWORD=mypassword -d postgres
Connect to the container (you can find its ID with docker ps) and open a psql session so we can create some dummy data:
docker exec -it container_id /bin/bash
psql -U postgres
Create a simple schema for a Medium-like application:
CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    password VARCHAR(255) NOT NULL,
    name VARCHAR(255)
);

CREATE TABLE posts (
    post_id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL,
    title VARCHAR(255) NOT NULL,
    description TEXT,
    image VARCHAR(255),
    FOREIGN KEY (user_id) REFERENCES users(user_id)
);
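Note that PostgreSQL automatically creates indexes to back the PRIMARY KEY and UNIQUE constraints above, but the foreign key column posts.user_id gets no index by default. If you want to confirm which indexes already exist, the pg_indexes view shows them:

SELECT indexname, indexdef FROM pg_indexes WHERE tablename IN ('users', 'posts');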
Seed the database with some dummy data:
DO $$
DECLARE
    returned_user_id INT;
BEGIN
    -- Insert 5 users
    FOR i IN 1..5 LOOP
        INSERT INTO users (email, password, name)
        VALUES ('user'||i||'@example.com', 'pass'||i, 'User '||i)
        RETURNING user_id INTO returned_user_id;

        -- Insert 500,000 posts for this user
        FOR j IN 1..500000 LOOP
            INSERT INTO posts (user_id, title, description)
            VALUES (returned_user_id, 'Title '||j, 'Description for post '||j);
        END LOOP;
    END LOOP;
END $$;
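The script inserts 5 users with 500,000 posts each (2.5 million rows in total), so it may take a few minutes to finish. As a quick sanity check once it completes, you can count the rows per user:

SELECT user_id, COUNT(*) FROM posts GROUP BY user_id ORDER BY user_id;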
Now run a query that fetches a user's posts and check how long it takes:
EXPLAIN ANALYZE SELECT * FROM posts WHERE user_id = 1 LIMIT 5;
Focus on the Execution Time reported at the bottom of the output and note it down so we can compare it later.
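One caveat: because of LIMIT 5, PostgreSQL can stop its sequential scan as soon as it finds five matching rows, and user 1's rows happen to sit near the start of the table, so this timing may look deceptively small. For a clearer unindexed baseline, you can also time a query that has to touch every row, for example a count for the same user (the exact numbers will vary on your machine):

EXPLAIN ANALYZE SELECT COUNT(*) FROM posts WHERE user_id = 1;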