Postgres JSONB is a Double-edged Sword
For the first six or so months of my document generation website, I followed the conventional wisdom of database design passed down from generation to generation in Stack Overflow discussions, Reddit threads, and even my very own Information Management courses in university: normalize anything and absolutely everything under the sun.
Each piece of related data got its own neat table, linked by foreign keys. For context, my `summary_reports` table (a type of document my university required for organizational compliance) would need to have relations to separate tables for `objectives`, `outcomes`, `challenges`, and `comments`. It was clean, absolutely textbook. It felt right.
But "right" doesn't always mean "fast."
The Slow Creep of Latency
My application, a Hono-Bun server, felt snappy at first in development. But as I started to simulate more realistic usage for even a modest number of users (aiming for around 200 active), a problem emerged: API response times for operations like fetching or creating a full summary report were… sluggish. Noticeably laggy, and it did not help at all that the VPS hosting my server is located in Germany while my users are in the Philippines.
After digging into the logs and tracing requests, the culprit wasn't a single inefficient query. It was death by a thousand cuts, or, in this case, queries. To construct a single summary report object for an HTTP response, my service layer was orchestrating way too many database calls:
- Fetch the main `summary_reports` row.
- Fetch all related `report_objectives` rows.
- Fetch all related `report_outcomes` rows.
- Fetch all related `report_challenges` rows.
- Fetch all related `report_comments` rows.
You don't even want to know how much slower write operations were.
Each of these, while individually fast, added up. Network latency for each round trip to the database, the overhead of query parsing, and the application logic to stitch it all together created a bottleneck. My supposedly beautiful, textbook-normalized schema was, ironically, hampering the performance of every request that touched the database.
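To make the problem concrete, here's a minimal sketch of what that orchestration looked like. The function names and file paths are hypothetical stand-ins, not my actual service code:

```ts
import { eq } from "drizzle-orm";
import { db } from "./db"; // hypothetical Drizzle instance
import {
  summaryReports,
  reportObjectives,
  reportOutcomes,
} from "./summary_reports.model";

// Every section of the report costs its own round trip to Germany.
async function getFullSummaryReport(reportId: number) {
  const [report] = await db
    .select()
    .from(summaryReports)
    .where(eq(summaryReports.id, reportId)); // round trip 1

  const objectives = await db
    .select()
    .from(reportObjectives)
    .where(eq(reportObjectives.reportId, reportId)); // round trip 2

  const outcomes = await db
    .select()
    .from(reportOutcomes)
    .where(eq(reportOutcomes.reportId, reportId)); // round trip 3

  // ... two more round trips for challenges and comments,
  // then stitch everything together in application code.
  return { ...report, objectives, outcomes };
}
```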
The Crossroads of Tradition vs. My Current Use Case
I found myself at a crossroads. Reddit threads, Stack Overflow answers, and various tech blogs largely preach the gospel of normalization: "Always use separate tables, especially if your data is not dynamic (i.e., has a consistent structure)." And my individual data items were structured, meaning an objective always had specific fields, a comment was always a comment.
But the user doesn't care about my database's E-R diagram. They only care about how quickly the page loads.
Enter JSONB: A Pragmatic Leap
I started researching alternatives that could reduce the query storm. PostgreSQL's `JSONB` type kept popping up. The idea was simple but powerful: what if all those related collections of objectives, outcomes, etc., lived inside the main summary report row as JSONB arrays?
Conceptually, the shift looked like this:
Before (Simplified Drizzle ORM Snippets):
```ts
// summary_reports.model.ts
import { pgTable, serial, integer, text } from "drizzle-orm/pg-core";

export const summaryReports = pgTable("summary_reports", {
  id: serial("id").primaryKey(),
  /* ... so on and so forth */
});

export const reportObjectives = pgTable("report_objectives", {
  id: serial("id").primaryKey(),
  reportId: integer("report_id").references(() => summaryReports.id),
  objective: text("objective"),
  achievement: text("achievement"),
});

// ... and similar for outcomes, challenges, comments
```
After (Simplified Drizzle ORM Snippet):
```ts
// merged_summary_reports.model.ts
import { pgTable, serial, jsonb } from "drizzle-orm/pg-core";
import type { ReportObjectiveItem, ReportOutcomeItem } from "./types"; // app-level item types

export const mergedSummaryReports = pgTable("merged_summary_reports", {
  id: serial("id").primaryKey(),
  // ... other main fields
  objectives: jsonb("objectives").$type<ReportObjectiveItem[]>().default([]),
  outcomes: jsonb("outcomes").$type<ReportOutcomeItem[]>().default([]),
  // ... and similar for challenges, comments
});
```
This meant that fetching a complete summary report could potentially become a single database query. The transaction logic for creating or updating a report would also simplify, operating on a single table. It was a compelling prospect.
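As a sketch (again with hypothetical names), the read path collapses to this:

```ts
import { eq } from "drizzle-orm";
import { db } from "./db"; // hypothetical Drizzle instance
import { mergedSummaryReports } from "./merged_summary_reports.model";

// One round trip: the row already carries objectives, outcomes,
// challenges, and comments as JSONB arrays.
async function getFullSummaryReport(reportId: number) {
  const [report] = await db
    .select()
    .from(mergedSummaryReports)
    .where(eq(mergedSummaryReports.id, reportId));

  return report;
}
```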
Trading Joins for JSON: Shaving a Whole 3s off Response Time
The refactor involved several steps:
- Updating Drizzle ORM Schemas: As shown above, consolidating the table definitions.
- Modifying Zod Schemas: Ensuring my validation layer understood the new nested JSONB structures (see the sketch after this list).
- Rewriting the Database Service Layer: My `summaryReportService` no longer needed complex functions to insert into five different tables within a transaction. Now it was a single insert or update to `mergedSummaryReports`.
- Simplifying API Controllers: The Hono router endpoints for fetching or mutating reports became cleaner.
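Here's a minimal sketch of that validation layer, assuming item shapes matching the model above; my real schemas have more fields and constraints:

```ts
// summary_report.schema.ts (hypothetical file name)
import { z } from "zod";

// Each nested item is validated in the application layer, since the
// database no longer enforces the structure inside the JSONB arrays.
export const reportObjectiveItemSchema = z.object({
  objective: z.string().min(1),
  achievement: z.string().min(1),
});

export const summaryReportSchema = z.object({
  // ... other main fields
  objectives: z.array(reportObjectiveItemSchema).default([]),
  // ... and similar for outcomes, challenges, comments
});
```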
The most immediate result? API response times plummeted. Operations that previously involved 5+ distinct database queries now executed in one. The difference was stark: average response time dropped from 4.2s to 1.2s, and I plan to optimize further by building a cache layer with Redis.
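The write path simplified just as much. Here's a sketch of the new create operation, assuming the hypothetical names and the Zod schema from the sketches above:

```ts
import { db } from "./db"; // hypothetical Drizzle instance
import { mergedSummaryReports } from "./merged_summary_reports.model";
import { summaryReportSchema } from "./summary_report.schema";

// One validated insert instead of a five-table transaction.
async function createSummaryReport(input: unknown) {
  const payload = summaryReportSchema.parse(input); // Zod validation

  const [created] = await db
    .insert(mergedSummaryReports)
    .values(payload)
    .returning();

  return created;
}
```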
The JSONB Trade-offs: What I Gained and What I "Gave Up"
Switching to JSONB wasn't a silver bullet; it came with real trade-offs. Here's what I learned from reading online:
- Data Integrity Shift: While PostgreSQL ensures the `JSONB` is valid JSON, the internal structure of my arrays (e.g., ensuring every objective has an `objective` and an `achievement` field) is now more heavily reliant on my application-level validation (I love Zod). The database itself isn't enforcing these nested schemas as rigidly as it would with separate typed columns.
- Querying Inside JSONB: If I needed to run complex ad-hoc queries like "find all reports across the entire system where a specific outcome text appears," it's doable with JSONB operators, but it's generally more complex and potentially less performant than querying an indexed column in a separate `report_outcomes` table.
  - Crucial Detail: GIN Indexes. This is something I plan to look into after reading up on it online. For any frequent querying into JSONB fields (e.g., `WHERE objectives @> '[{"objective": "Enhance leadership"}]'`), adding a GIN index on the JSONB column (`CREATE INDEX idx_gin_objectives ON merged_summary_reports USING GIN (objectives);`) is absolutely essential for performance. Without it, such queries would perform full table scans (there's a query sketch after this list).
- TOAST and Compression: Large JSONB fields are managed by PostgreSQL's TOAST (The Oversized Attribute Storage Technique). This means they might be compressed and stored "out-of-line." When you query even a small part of a TOASTed JSONB column, PostgreSQL might have to "de-TOAST" (retrieve and decompress) the entire value. The choice of compression algorithm (`pglz` vs. `lz4` in newer PostgreSQL versions) can also impact CPU usage and storage here.
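For what it's worth, here's a sketch of what such a containment query might look like through Drizzle's `sql` helper; I haven't needed this in production yet, and the names are the same hypothetical ones as above:

```ts
import { sql } from "drizzle-orm";
import { db } from "./db"; // hypothetical Drizzle instance
import { mergedSummaryReports } from "./merged_summary_reports.model";

// The @> containment operator is exactly what the GIN index above
// is meant to accelerate; without it, this is a full table scan.
async function findReportsByObjective(objective: string) {
  const filter = JSON.stringify([{ objective }]);

  return db
    .select()
    .from(mergedSummaryReports)
    .where(sql`${mergedSummaryReports.objectives} @> ${filter}::jsonb`);
}
```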
Was JSONB the Right Choice for Me?
For my specific scenario – a document generation app with ~200 users where the primary performance bottleneck was fetching complete "document-like" entities (reports, proposals) via multiple discrete queries – yes, moving to JSONB was a significant win. It directly addressed my main pain point: high API latency due to excessive database round trips.
I will admit, however, that I probably wouldn't recommend this method, or even do it at all, if this application had to scale into the thousands of users. But since it won't, it's not something I need to deal with. (hehe)
The common advice to "always use separate tables, especially if your data is not dynamic" is sound from a relational purity and universal query flexibility standpoint and is definitely still the right mindset. However, my data, while structured within the JSONB arrays, was being treated as part of a larger, cohesive document in my application.
This journey taught me that "best practices" are often contextual. Understanding the trade-offs is key. While I sacrificed some of the database-level enforcement and the simplicity of certain ad-hoc queries, I gained a crucial improvement in user-facing performance for my most common operations.
The road to optimized architecture is one of continuous learning. What works today for 200 users might need a huge ass refactor at 20,000. But for now, JSONB has been a welcome ally in my quest for a snappier application.