From Data Lakes to Data Puddles: Shrinking What Engineers See to What They Actually Need

Team Simpl

Most teams don’t suffer from a lack of data.

They suffer from too much surface area.

Warehouses, lakes, event streams, OLTP databases, caches, feature stores, search indexes. On top of that: dashboards, admin consoles, SQL IDEs, log viewers, APM, ad‑hoc scripts.

When something concrete needs to be answered — “What happened to this user’s order?” — you don’t need the lake. You need a puddle: a small, opinionated slice of data that answers one question well.

This post is about making that shift on purpose.

It’s about:

  • Shrinking what engineers see to what they actually need
  • Turning sprawling lakes into focused, question-shaped puddles
  • Doing it in a way that’s calm, safe, and repeatable

A tool like Simpl exists exactly for this middle layer: a focused, read‑first database browser that helps teams move from “everything, all at once” to “just the rows that matter.”


Why shrinking the surface area matters

Wide data access feels powerful. It also quietly erodes attention, safety, and shared understanding.

1. Cognitive load: the invisible tax

Every extra table, column, chart, or tool you expose is a decision:

  • Which table is the source of truth?
  • Is this column still used?
  • Does this dashboard reflect production or a replica?

Under pressure — an incident, a high‑stakes support ticket — that decision tax compounds. People:

  • Scan schema trees instead of following a clear trail
  • Re‑run old queries because they can’t tell which one is canonical
  • Bounce between tools, hoping one of them “looks right”

We wrote about this in more depth in Database Work Without the Map: Navigating Production by Question, Not Schema or Service. The short version: when you start from the map, you invite noise; when you start from the question, you can afford to see less.

2. Safety: more surface, more risk

Wide visibility tends to drag along wide capability:

  • Browsers that can both read and mutate
  • Consoles that expose every table, including the sharp ones
  • IDEs that make it trivial to run UPDATE in production

The more surface you expose, the more you rely on:

  • Informal norms (“please don’t touch that table”)
  • Tribal knowledge (“oh, that dashboard is wrong”)
  • Heroics (“ask Alice, she knows which query is safe”)

Shrinking the surface area — to read‑only, opinionated, question‑shaped puddles — reduces the blast radius by design.

3. Collaboration: everyone sees a different lake

When everyone navigates a massive data lake on their own, each person builds a private mental map:

  • Custom filters in dashboards
  • Local SQL scripts
  • Saved queries in personal folders

The result: during incidents, you’re not just debugging the system. You’re debugging each other’s maps.

Puddles — small, shared, named ways to read — create a common language:

  • “Run the Failed Payment Trail.”
  • “Check the User Signup Puddle for that email.”

We explored this idea for recurring debug flows in The Quiet Query Template: Turning Recurring Debug Flows into Opinionated, One-Click Reads.




From lakes to puddles: a different mental model

Most tools are lake‑shaped:

  • Start at the schema or catalog
  • Show everything by default
  • Let you carve out what you need with filters and SQL

A puddle‑first approach inverts this:

Start from a concrete question, then expose only the minimum data needed to answer it.

You’re no longer asking, “Where in the lake is my answer?” You’re asking, “What is the smallest, stable view of data that answers this class of questions?”

Some examples of puddles:

  • Single user journey across services

  • Failed payment investigation

    • Inputs: payment ID or external processor ID
    • Output: relevant rows from payments, payment_attempts, invoices, plus a minimal log excerpt
  • Feature flag rollout check

    • Inputs: flag name, environment
    • Output: a small set of rows showing current flag state, recent changes, and affected user cohorts

Each of these is a puddle: narrow, named, and reusable.

A browser like Simpl is built to host these puddles: instead of dropping you into a blank SQL editor or a full admin console, it lets you define and reuse opinionated read flows that match how your team actually investigates.


Principles for designing good data puddles

Shrinking the surface area is not just about hiding tables. It’s about designing better entry points.

Here are some principles that keep puddles calm and useful.

1. Question-first, not schema-first

Every puddle should answer a sentence you can say out loud:

  • “Show me everything I need to understand why this user’s order failed.”
  • “Show me the last 10 events for this job and whether they wrote rows.”
  • “Show me what changed for this feature flag in the last hour.”

If you can’t phrase the question clearly, the puddle will sprawl.

Checklist:

  • Write the question in plain language.
  • List the minimum entities and fields needed.
  • Refuse to add anything that doesn’t serve the question.

2. Read-only by default

Puddles are for understanding, not surgery.

  • No INSERT, UPDATE, or DELETE.
  • No schema changes.
  • No “quick fix in prod” buttons.

This is where a focused browser like Simpl helps: it’s designed as a read‑first layer that sits between full admin tools and raw CLI access, so you can safely give more engineers access to production‑like data without handing them a loaded admin console.
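Read-only is easiest to trust when the connection enforces it instead of a norm. Here is a minimal sketch using Python's built-in sqlite3 module and SQLite's query_only pragma, chosen only because it runs anywhere; the helper names and the example table are hypothetical. On a server database the same stance is usually expressed as a role that only holds SELECT grants (PostgreSQL also has a default_transaction_read_only setting, but a session can override it, so the role is the stronger guard).

```python
import sqlite3


def make_read_only(conn: sqlite3.Connection) -> sqlite3.Connection:
    """Switch a SQLite connection into query-only mode (hypothetical helper)."""
    # With query_only on, SQLite rejects any change to the database
    # (INSERT, UPDATE, DELETE, CREATE, DROP, ...) with an error.
    conn.execute("PRAGMA query_only = ON")
    return conn


def is_write_blocked(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True if the connection refuses to execute `sql`."""
    try:
        conn.execute(sql)
    except sqlite3.DatabaseError:
        return True
    return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Seed a hypothetical table while the connection is still writable.
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    conn.execute("INSERT INTO orders (status) VALUES ('failed')")
    conn.commit()

    make_read_only(conn)
    print(is_write_blocked(conn, "UPDATE orders SET status = 'ok'"))  # True
    print(conn.execute("SELECT status FROM orders").fetchall())  # [('failed',)]
```

Reads keep working; only the write paths are closed, which is exactly the surface a puddle needs.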

3. Opinionated inputs, not free-form filters

Puddles work best when the entry conditions are clear:

  • Good: “Paste a user_id or email here.”
  • Good: “Paste a payment_id or stripe_charge_id.”
  • Less good: “Filter by any column on any table.”

Tight inputs:

  • Make the puddle easier to use under pressure
  • Encourage consistent usage across the team
  • Reduce the chance of expensive, unbounded queries
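Tight inputs can be enforced before any query runs. The Python sketch below accepts exactly one user_id (digits only) or one email address and rejects everything else; the function name and the deliberately loose email pattern are assumptions for illustration, not a complete email validator.

```python
import re
from typing import Tuple, Union

# Deliberately loose pattern: one "@", no spaces, a dot in the domain.
# An assumption for illustration, not a complete email validator.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
USER_ID_RE = re.compile(r"[0-9]+")


def parse_user_input(raw: str) -> Tuple[str, Union[int, str]]:
    """Classify a pasted value as ("user_id", int) or ("email", str).

    Anything else is rejected instead of becoming a free-form filter.
    """
    value = raw.strip()
    if USER_ID_RE.fullmatch(value):
        return ("user_id", int(value))
    if EMAIL_RE.fullmatch(value):
        return ("email", value)
    raise ValueError(f"Expected a user_id or an email, got {raw!r}")


if __name__ == "__main__":
    print(parse_user_input(" 42 "))             # ('user_id', 42)
    print(parse_user_input("ana@example.com"))  # ('email', 'ana@example.com')
```

The parsed value is then bound as a query parameter, so a pasted WHERE clause never reaches the database.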

4. Small, stable outputs

A puddle should feel like a tight report, not a schema browser with training wheels.

Concretely:

  • Limit to the fewest tables that matter
  • Pre‑join where it’s stable and safe
  • Show only the columns that carry signal
  • Order rows by a clear, question‑aligned key (usually time or severity)

If people keep exporting to CSV and rebuilding their own views, the puddle is either too small (missing key fields) or too large (hard to scan).
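A pre-joined view is one way to keep the output small and stable. This sketch uses SQLite through Python's sqlite3 module so it is self-contained; the orders and payments tables, their columns, and the order_payment_puddle view are all hypothetical. The view joins once, leaves out a noisy column, and every read orders by time and caps the row count.

```python
import sqlite3

SCHEMA = """
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    status TEXT NOT NULL,
    created_at TEXT NOT NULL,
    internal_notes TEXT          -- noisy column the puddle leaves out
);
CREATE TABLE payments (
    id INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders(id),
    amount_cents INTEGER NOT NULL,
    outcome TEXT NOT NULL,
    created_at TEXT NOT NULL
);
-- Pre-join once and keep only the columns that carry signal.
CREATE VIEW order_payment_puddle AS
SELECT o.user_id,
       o.id         AS order_id,
       o.status     AS order_status,
       p.outcome    AS payment_outcome,
       p.amount_cents,
       p.created_at AS event_at
FROM orders AS o
JOIN payments AS p ON p.order_id = o.id;
"""


def read_puddle(conn: sqlite3.Connection, user_id: int, limit: int = 20) -> list:
    """Return one user's newest rows, ordered by time and capped at `limit`."""
    return conn.execute(
        "SELECT order_id, order_status, payment_outcome, amount_cents, event_at "
        "FROM order_payment_puddle WHERE user_id = ? "
        "ORDER BY event_at DESC LIMIT ?",
        (user_id, limit),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO orders VALUES (1, 7, 'failed', '2024-01-01T10:00:00', 'retry?')")
    conn.execute("INSERT INTO payments VALUES (10, 1, 1500, 'declined', '2024-01-01T10:00:05')")
    print(read_puddle(conn, user_id=7))
    # [(1, 'failed', 'declined', 1500, '2024-01-01T10:00:05')]
```

If a field keeps getting requested, it goes into the view once instead of into everyone's private query.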

5. Linked, not duplicated

Puddles don’t have to contain everything. They can link to deeper trails:

  • From a User Overview Puddle → link to User Billing Trail
  • From a Failed Job Puddle → link to Job Retry History

The key is that the links are opinionated and one‑click, not “open the console and start over.” We explore this kind of design more in Database Work Without Bookmarks: Using Opinionated Trails Instead of Saved Queries.
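Links can be modeled as data: each puddle declares which other puddles it points to and which input carries over, so following a link never means starting from a blank console. A minimal Python sketch; the Puddle and Link classes and the registry are hypothetical, with names borrowed from the examples above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Link:
    """A one-click jump to another puddle that reuses an input we already have."""
    target: str
    carries: str  # name of the input passed along, e.g. "user_id"


@dataclass
class Puddle:
    name: str
    inputs: tuple
    links: list = field(default_factory=list)


# Hypothetical registry using the puddle names from the text.
REGISTRY = {
    p.name: p
    for p in [
        Puddle("User Overview Puddle", ("user_id",), [Link("User Billing Trail", "user_id")]),
        Puddle("User Billing Trail", ("user_id",)),
        Puddle("Failed Job Puddle", ("job_id",), [Link("Job Retry History", "job_id")]),
        Puddle("Job Retry History", ("job_id",)),
    ]
}


def follow_links(name: str, params: dict) -> list:
    """Return ready-to-run (target_name, params) pairs for a puddle's links."""
    jumps = []
    for link in REGISTRY[name].links:
        target = REGISTRY[link.target]
        # Carry over only an input the target accepts and we actually have.
        if link.carries in target.inputs and link.carries in params:
            jumps.append((target.name, {link.carries: params[link.carries]}))
    return jumps


if __name__ == "__main__":
    print(follow_links("User Overview Puddle", {"user_id": 42}))
    # [('User Billing Trail', {'user_id': 42})]
```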


A concrete path: shrinking your own lake

You don’t need a full data re‑architecture to start. You can move from lake to puddles incrementally.

Here’s a pragmatic sequence.

Step 1: Inventory the questions, not the tables

Spend a week collecting the real questions people ask:

  • Support tickets
  • On‑call incidents
  • Product analytics questions that keep repeating

For each, write a one‑line summary:

  • “Why did this user get double‑charged?”
  • “Which users were affected by this feature flag misconfiguration?”
  • “Which jobs have been stuck in processing for more than 30 minutes?”

You’ll notice patterns. A small set of question types drive most of the noisy, ad‑hoc data work.
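Once each collected question has been tagged by hand with a type, a few lines of Python show how concentrated the demand is. The function name and the tags are hypothetical; the point is the shape of the calculation, not the numbers.

```python
from collections import Counter


def question_type_coverage(tags: list, top_n: int) -> tuple:
    """Return the top_n most common question types and the share of all
    collected questions they account for."""
    if not tags:
        return [], 0.0
    top = Counter(tags).most_common(top_n)
    covered = sum(count for _, count in top)
    return [tag for tag, _ in top], covered / len(tags)


if __name__ == "__main__":
    # Hypothetical tags from a week of tickets and incidents.
    tags = ["payment_trail"] * 5 + ["stuck_jobs"] * 3 + ["flag_rollout", "other"]
    print(question_type_coverage(tags, top_n=2))
    # (['payment_trail', 'stuck_jobs'], 0.8)
```

The types at the top of that list are the first candidates for a puddle.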

Step 2: Identify the current path to answers

For each recurring question type, trace the current workflow:

  • Which tools are opened?
  • Which queries are run (or re‑run from Slack history)?
  • Where do people get stuck or ask for help?

Map this as a simple list:

  1. Open logs and search for user_id.
  2. Copy correlation ID.
  3. Open SQL IDE, run SELECT * FROM orders WHERE user_id = ….
  4. Realize there’s a payments table; join manually.
  5. Paste screenshots into Slack.

This is your before picture.

If you want a deeper framing of this exercise, From Tickets to Tables to Code: A Straight-Line Workflow for Everyday Production Debugging walks through it step by step.

Step 3: Design the first puddle

Pick one high‑value question type. For that question, design a puddle with:

  • Name: clear and non‑clever (e.g., User Payment Trail)
  • Inputs: one or two identifiers (e.g., user_id, email)
  • Tables: the minimum set that carries the story
  • Columns: only those needed to answer “what happened?” and “who is affected?”
  • Order: usually by time or lifecycle step

Then, decide where this puddle lives:

  • In a focused browser like Simpl as an opinionated read flow
  • As a parameterized query behind a small internal tool
  • As a runbook in SQL, wired into your existing database browser

The tool matters less than the stance: question‑first, read‑only, small surface.
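For the "parameterized query behind a small internal tool" option, a spec can hold exactly the fields listed above and render itself into one bounded SELECT. A Python sketch under stated assumptions: the PuddleSpec class, the user_payment_events view, and its columns are hypothetical, and the spec accepts a single identifier (supporting email as well would mean resolving it to a user_id first).

```python
import re
from dataclasses import dataclass

# Identifiers are interpolated into SQL, so only plain names are allowed.
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


@dataclass(frozen=True)
class PuddleSpec:
    """Hypothetical spec: name, input, source, columns, and order."""
    name: str
    input_column: str  # the single identifier the puddle accepts
    source: str        # a table or pre-joined view
    columns: tuple
    order_by: str
    limit: int = 50

    def __post_init__(self) -> None:
        for ident in (self.input_column, self.source, self.order_by, *self.columns):
            if not IDENT_RE.fullmatch(ident):
                raise ValueError(f"Unsafe identifier: {ident!r}")

    def to_sql(self) -> str:
        """Render a read-only SELECT; only the input value is a bound parameter."""
        cols = ", ".join(self.columns)
        return (
            f"SELECT {cols} FROM {self.source} "
            f"WHERE {self.input_column} = ? "
            f"ORDER BY {self.order_by} LIMIT {int(self.limit)}"
        )


USER_PAYMENT_TRAIL = PuddleSpec(
    name="User Payment Trail",
    input_column="user_id",
    source="user_payment_events",  # hypothetical pre-joined view
    columns=("event_at", "order_id", "payment_outcome", "amount_cents"),
    order_by="event_at",
)
```

Identifiers come from the spec, which the team reviews like code; the pasted value is bound as a "?" parameter, e.g. conn.execute(USER_PAYMENT_TRAIL.to_sql(), (user_id,)).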

Step 4: Make it the default entry point

A puddle only works if people reach for it first.

  • Link it from your on‑call runbook
  • Paste it into the “how to debug payments” doc
  • Pin it in the relevant Slack channel

Then, during the next incident or support ticket of that type, insist on starting from the puddle:

  • “Paste the user_id into the User Payment Trail first.”
  • “If that doesn’t answer it, then we drop to raw SQL.”

You’re not banning the lake. You’re inserting a calm, opinionated layer in front of it.

Step 5: Iterate based on real use

After a few weeks, review how the puddle behaved:

  • Which columns did people ignore?
  • Which fields did they keep asking to add?
  • Did anyone still feel the need to open the full schema right away?

Adjust:

  • Remove unused fields
  • Add the one or two missing ones that kept coming up
  • Tighten inputs if queries were too broad or slow

Then repeat the cycle for the next question type.




Where a tool like Simpl fits

You can build puddles with almost any stack, but some tools fight you. They’re built for:

  • Arbitrary exploration
  • Full admin control
  • BI‑style aggregation and charting

Browser, Not BI: When to Reach for Simpl Instead of Yet Another Dashboard goes deeper on this, but the short version is:

  • Dashboards tell you that something is wrong.
  • Admin tools let you change what is wrong.
  • A browser like Simpl helps you quietly read why it’s wrong, at the row level.

Simpl is opinionated about this middle layer:

  • Read‑only by design for everyday production work
  • Question‑shaped entry points, not just schema trees
  • Opinionated trails instead of a graveyard of saved queries

It’s a good fit if you want to:

  • Give more engineers safe, direct access to production‑like data
  • Encode recurring debug flows as reusable trails
  • Shrink the visible surface without hiding the truth

What you get when you work in puddles

Once you start shrinking the surface area, a few things change.

1. Calmer incidents

At 2 a.m., the difference between:

  • “Open the console and find the right tables,” and
  • “Paste the user_id into this trail,”

is the difference between wandering and walking.

Your on‑call engineers:

  • Spend less time scanning schemas
  • Spend more time understanding what actually happened
  • Make fewer risky moves out of frustration

We wrote more about this scenario in The Calm Data On‑Call: A Minimal Workflow for Incident Reads at 2 a.m.

2. Safer access for more people

When the default surface is a small, read‑only puddle, you can:

  • Let more engineers self‑serve
  • Let support or success teams answer more questions directly
  • Reduce the number of “can you run this query for me?” bottlenecks

Instead of training everyone on your entire lake, you train them on a small set of named puddles.

3. Shared understanding that survives turnover

Puddles are documentation that runs:

  • They encode how you investigate, not just what you select.
  • They survive onboarding, role changes, and incident churn.

New teammates don’t need to memorize table relationships. They learn the puddles:

  • “For payments, start here.”
  • “For signups, start there.”

Over time, your lake remains large, but your working surface — the part people actually touch — stays small, intentional, and calm.


Summary

Moving from data lakes to data puddles is not about owning less data. It’s about showing less, on purpose.

You:

  • Start from concrete questions, not from schemas
  • Design small, read‑only, opinionated views that answer those questions
  • Make those views the default entry points for incidents, support, and everyday debugging
  • Use tools like Simpl to host these puddles as calm, repeatable trails

The result is a team that:

  • Spends less time wandering through lakes
  • Makes fewer risky moves in production
  • Shares a clearer, more durable understanding of how the system actually behaves in rows

Take the first step

You don’t need to redesign your stack.

Pick one recurring question — the one that keeps sending people back to raw SQL or screenshots in Slack.

  1. Write the question in a single sentence.
  2. List the minimum tables and columns needed to answer it.
  3. Build a small, read‑only puddle that takes one input and returns that view.
  4. Make it the default path for that question for the next month.

If you want a place built for this kind of work, try hosting that first puddle in Simpl. Treat it as an experiment in seeing less:

  • Less schema.
  • Less tool‑hopping.
  • Less noise.

Just the rows you actually need — and nothing more.
