
Redr · Study Guide

Software Mistakes and Tradeoffs

How to Make Good Programming Decisions

Tomasz Lelek & Jon Skeet

Unofficial AI-assisted study guide. Not affiliated with or endorsed by the author or publisher. For educational use — supplements, not replaces, the original work.

Contents

  • 01 Introduction
  • 02 Code Duplication Is Not Always Bad
  • 03 Exceptions vs. Other Error-Handling Patterns
  • 04 Balancing Flexibility and Complexity
  • 05 Premature Optimization vs. Optimizing the Hot Path
  • 06 Simplicity vs. Cost of Maintenance for Your API
  • 07 Working Effectively With Date and Time Data
  • 08 Leveraging Data Locality and Memory of Your Machines
  • 09 Third-Party Libraries: Libraries You Use Become Your Code
  • 10 Consistency and Atomicity in Distributed Systems
  • 11 Delivery Semantics in Distributed Systems
  • 12 Managing Versioning and Compatibility
Ch. 01

Introduction

Every meaningful software decision is a tradeoff that forecloses other paths. The chapter previews recurring tensions — testing strategy, code patterns, architecture — and argues engineers must weigh competing forces consciously rather than apply rules mechanically.

Ch. 01

Decisions Limit Future Evolution

The longer a system lives, the harder it becomes to back out of earlier design decisions. Tradeoff awareness is what keeps optionality alive — the wrong choice locked in early compounds into expensive migrations later.

Ch. 01

Unit vs. Integration Test Proportions

Heavy unit testing is fast and isolated but misses integration bugs; integration tests catch real wiring but are slower and more brittle. Neither extreme is correct — the right mix depends on deployment risk and feedback-speed needs.

Ch. 01

Patterns Are Context-Dependent

GoF code patterns and architecture patterns all carry costs. Applying Singleton, Strategy, microservices, or event-driven architecture outside their sweet spot adds complexity without payoff.

Ch. 01

Microservices Buy Scale With Complexity

Microservices enable independent scaling and deployment but introduce network, consistency, and operational overhead. They are appropriate only when the organization and load justify the cost.

Ch. 01

Code Quality Is Multidimensional

Readability, testability, modifiability, and performance often pull against each other. "Quality" must be defined per project rather than treated as a universal absolute.

Ch. 01

Time-to-Market and SLAs Constrain Design

External pressures — deadlines, contractual availability targets, SLAs — often dictate which tradeoff is acceptable. The "best" technical answer loses to the shippable answer when the calendar bites.

Ch. 01 · Vocab
Tradeoff
A decision that gains in one dimension at the cost of another.
Unit test
A test exercising a single component in isolation, typically without I/O.
Integration test
A test exercising multiple components together, including external systems.
Test pyramid
Heuristic test mix — many unit tests, fewer integration, very few end-to-end.
Ch. 01 · Vocab
Design pattern
A reusable solution template for a recurring code-level problem.
Architecture pattern
A system-level structural choice (monolith, microservices, event-driven).
Scalability
Capacity to handle growing load by adding resources.
SLA
Contractual guarantee for metrics like availability or latency.
Ch. 01 · Quiz 1 / 4

Multiple choice

A startup with five engineers is deciding between a monolith and microservices for a brand-new product with uncertain requirements and modest expected load. Which framing is most consistent with this chapter?

Ch. 01 · Quiz 2 / 4

Spot the issue

A team mandates a fixed ratio: 90% unit tests, 10% integration tests, no exceptions, regardless of project. What's the mistake?

Ch. 01 · Quiz 3 / 4

Multiple choice

Which of the following is the strongest sign that "code quality" is being treated naively on a project?

Ch. 01 · Quiz 4 / 4

True / False

Once an early design decision is locked in, the cost of reversing it tends to compound as the system grows.

Ch. 02

Code Duplication Is Not Always Bad

The chapter pushes back on dogmatic DRY, arguing premature abstraction couples unrelated code paths and slows delivery, especially across service boundaries. The right question is not "is this duplicated?" but "do these call sites have the same reason to change?"

Ch. 02

DRY Can Hide Complexity, Not Remove It

Forcing two superficially similar code paths into a single abstraction couples them. When their requirements later diverge, the abstraction breaks down and the cost of untangling exceeds the original duplication.

Ch. 02

Duplication Across Codebases vs. Within One

Removing duplication inside one service is usually cheap. Removing it across independently deployed services drags in coordination, versioning, and release-cycle costs that often exceed the savings.

Ch. 02

Shared Libraries Cause Version Coordination Pain

When N services depend on a shared library, a breaking change forces synchronized upgrades. Even non-breaking changes create lock-step deployment pressure that erodes service autonomy.

Ch. 02

Microservice Extraction Is a Heavy Hammer

Extracting common logic into its own service adds network calls, failure modes, and availability dependencies. The operational complexity is often worse than tolerating the original duplication.

Ch. 02

Loose Coupling Can Be Bought With Duplication

Two services owning their own copy of similar logic remain free to evolve at different speeds. Autonomy is frequently more valuable than a single canonical implementation.

Ch. 02

Same Code, Different Reasons to Change

Apparent duplication may serve distinct business domains. Merging them violates the single-responsibility intent and produces brittle abstractions that fight every future change.
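
As a minimal sketch of look-alike code with different reasons to change, suppose two hypothetical domains both round monetary amounts to two decimals. The class name, method names, and the assumption that invoicing needs half-up rounding are illustrative only and do not come from the book.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical example: two domains that round money for different reasons.
class RoundingByDomain {

    // Invoicing rounds half-up (assumed here to be a compliance rule).
    static BigDecimal roundForInvoice(BigDecimal amount) {
        return amount.setScale(2, RoundingMode.HALF_UP);
    }

    // Analytics uses banker's rounding to reduce bias across many values.
    static BigDecimal roundForReport(BigDecimal amount) {
        return amount.setScale(2, RoundingMode.HALF_EVEN);
    }

    public static void main(String[] args) {
        BigDecimal amount = new BigDecimal("1.005");
        System.out.println(roundForInvoice(amount)); // 1.01
        System.out.println(roundForReport(amount));  // 1.00
    }
}
```

Had both call sites shared one helper, the first domain-specific rule would have forced a mode flag or a fork of the helper. Keeping two small functions lets each domain change on its own schedule.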

Ch. 02

Premature Duplication Beats Premature Abstraction

It is easier to deduplicate later, once the real shape is known, than to untangle a wrong abstraction. Wait for the third occurrence before extracting.

Ch. 02 · Vocab
DRY
Principle stating every piece of knowledge should have a single authoritative representation.
Tight coupling
Components depend strongly on each other's internals and must change together.
Loose coupling
Components interact through narrow, stable interfaces and evolve independently.
Shared library
A packaged module of code depended on by multiple services or applications.
Ch. 02 · Vocab
Microservice extraction
Refactoring shared logic into a separately deployed service.
Abstraction
A generalized interface that hides implementation differences behind a common surface.
Delivery speed
How quickly a team can ship changes, often degraded by coordination costs.
Ch. 02 · Quiz 1 / 4

Spot the issue

Two independently deployed services, Billing and Reporting, each have a 40-line function that formats currency values for their own domain. A platform engineer extracts the function into a shared library used by both services. Six months later, Billing needs to round half-up for tax compliance while Reporting still needs banker's rounding. What's the underlying mistake?

Ch. 02 · Quiz 2 / 4

Multiple choice

What is the strongest argument for tolerating duplication across two independently deployed microservices?

Ch. 02 · Quiz 3 / 4

Multiple choice

Which heuristic does the chapter offer for when extraction is actually justified?

Ch. 02 · Quiz 4 / 4

Spot the issue

A platform team publishes a `common-utils` JAR depended on by 14 internal services. A "small" non-breaking change ships, and the team posts in #engineering: "Please redeploy your service this week so everyone is on v2.3." What does this reveal?

Ch. 03

Exceptions vs. Other Error-Handling Patterns

Error handling is fundamentally an API-design problem. The chapter surveys checked/unchecked exceptions, anti-patterns, third-party boundaries, concurrency, and the functional Try type — arguing the right mechanism depends on whether the error is recoverable and where it crosses boundaries.

Ch. 03

Exception Hierarchy Design

A well-shaped hierarchy — specific exceptions inheriting from broader categories — lets callers catch at the right level of granularity instead of falling back to catch-all blocks that swallow unrelated failures.
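
A small sketch of such a hierarchy, with invented exception names, might look like the following. It shows a caller handling the one case it can recover from and letting the broader category cover the rest.

```java
// Hypothetical hierarchy: a broad category with specific subtypes.
class ExceptionHierarchyDemo {

    static class StorageException extends RuntimeException {
        StorageException(String message) { super(message); }
    }

    static class RecordNotFoundException extends StorageException {
        RecordNotFoundException(String id) { super("No record with id " + id); }
    }

    static class StorageUnavailableException extends StorageException {
        StorageUnavailableException(String message) { super(message); }
    }

    // Illustrative lookup that can fail in two distinct ways.
    static String load(String id) {
        if (id.isEmpty()) {
            throw new StorageUnavailableException("backend unreachable");
        }
        if (id.equals("42")) {
            return "answer";
        }
        throw new RecordNotFoundException(id);
    }

    // The specific catch handles the recoverable case; the broader catch
    // covers the rest of the category. Unrelated failures still propagate.
    static String loadOrDefault(String id) {
        try {
            return load(id);
        } catch (RecordNotFoundException e) {
            return "default";
        } catch (StorageException e) {
            return "unavailable";
        }
    }
}
```

Because neither catch block names `Exception` or `Throwable`, a programming error such as a `NullPointerException` is not swallowed along the way.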

Ch. 03

Checked vs. Unchecked in Public APIs

Checked exceptions force callers to acknowledge failure modes (correctness aid, signature pollution); unchecked keep signatures clean but rely on documentation. Pick based on whether the error is recoverable and part of the contract.

Ch. 03

Anti-Pattern — Exceptions as Control Flow

Throwing and catching to drive normal logic is slow and obscures intent. Expected outcomes belong in return types, not the exception channel.
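
One way to keep an expected outcome in the return type is sketched below, assuming a hypothetical `parsePort` helper where malformed input is routine rather than exceptional.

```java
import java.util.OptionalInt;

class ParseWithoutExceptions {

    // Malformed input is an expected outcome here, so it is reported through
    // the return type instead of being thrown to the caller.
    static OptionalInt parsePort(String text) {
        if (text == null || text.isEmpty() || text.length() > 5) {
            return OptionalInt.empty();
        }
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < '0' || c > '9') {
                return OptionalInt.empty();
            }
        }
        int port = Integer.parseInt(text); // safe: at most 5 ASCII digits
        return port <= 65535 ? OptionalInt.of(port) : OptionalInt.empty();
    }
}
```

A caller then writes `parsePort(input).orElse(8080)` or checks `isPresent()`, and the signature itself documents that absence is a normal result.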

Ch. 03

Anti-Pattern — Swallowing Exceptions

Empty catch blocks and missed `try-with-resources` usage cause silent bugs and leaks. Original causes must be preserved via exception chaining when rethrowing.

Ch. 03

Wrap Third-Party Exceptions at the Boundary

Translating external exceptions into your domain types prevents library types from bleeding into callers. It also protects you when you swap the dependency for a different implementation.
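
A rough sketch of boundary translation follows. `VendorClient` and `VendorTimeoutException` are stand-ins for a real library's types, and `ProfileGateway` is an invented adapter name.

```java
class BoundaryTranslationDemo {

    // Stand-in for a third-party exception type.
    static class VendorTimeoutException extends Exception {
        VendorTimeoutException(String message) { super(message); }
    }

    // Stand-in for the third-party client; it always fails in this sketch.
    static class VendorClient {
        String fetch(String key) throws VendorTimeoutException {
            throw new VendorTimeoutException("timed out fetching " + key);
        }
    }

    // Domain exception owned by this codebase.
    static class ProfileLookupException extends RuntimeException {
        ProfileLookupException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // The adapter is the only place that knows about vendor types.
    static class ProfileGateway {
        private final VendorClient client = new VendorClient();

        String profileFor(String userId) {
            try {
                return client.fetch(userId);
            } catch (VendorTimeoutException e) {
                // Chain the original so the root cause stays in stack traces.
                throw new ProfileLookupException("Could not load profile " + userId, e);
            }
        }
    }
}
```

Callers of `ProfileGateway` catch only `ProfileLookupException`, so replacing the vendor client later touches the adapter and nothing else.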

Ch. 03

Exceptions Cross Thread Boundaries Poorly

In `ExecutorService` or `CompletableFuture` workflows, exceptions are captured into the future or lost unless you explicitly use `get`, `exceptionally`, `handle`, or an uncaught-exception handler.
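
The sketch below illustrates both behaviours with `CompletableFuture`: a failure recovered through `exceptionally`, and the same failure surfacing only when the result is read. The task and method names are made up for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

class AsyncFailureDemo {

    // Illustrative task that always fails on the worker thread.
    static String failingTask() {
        throw new IllegalStateException("worker failed");
    }

    // exceptionally() turns the captured failure into a fallback value.
    static String withFallback() {
        return CompletableFuture
                .supplyAsync(AsyncFailureDemo::failingTask)
                .exceptionally(ex -> "fallback")
                .join();
    }

    // Without a handler, the failure only surfaces when the result is read:
    // join() rethrows it wrapped in a CompletionException.
    static Throwable causeSeenByJoin() {
        try {
            CompletableFuture.supplyAsync(AsyncFailureDemo::failingTask).join();
            return null;
        } catch (CompletionException e) {
            return e.getCause();
        }
    }
}
```

If nobody ever calls `join`, `get`, or attaches a handler, the `IllegalStateException` in this sketch is never observed by any thread.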

Ch. 03

The Functional Try<T> Type

`Try<T>` encodes success/failure in the value, making errors first-class data that flows through `map`/`flatMap`. It composes cleanly in streams and async chains where exceptions are awkward.
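
To make the idea concrete, here is a deliberately minimal, hand-written `Try` type. It is a sketch under simplifying assumptions (it only captures `RuntimeException` and offers only `map`), not the API of any particular library.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Minimal, hypothetical Try type; real projects would more likely use a library.
abstract class Try<T> {

    static <T> Try<T> of(Supplier<T> supplier) {
        try {
            return new Success<>(supplier.get());
        } catch (RuntimeException e) {
            return new Failure<>(e);
        }
    }

    abstract <R> Try<R> map(Function<? super T, ? extends R> f);
    abstract T getOrElse(T fallback);
    abstract boolean isSuccess();

    static final class Success<T> extends Try<T> {
        private final T value;
        Success(T value) { this.value = value; }
        <R> Try<R> map(Function<? super T, ? extends R> f) {
            // A throwing mapper becomes a Failure instead of escaping.
            return Try.of(() -> f.apply(value));
        }
        T getOrElse(T fallback) { return value; }
        boolean isSuccess() { return true; }
    }

    static final class Failure<T> extends Try<T> {
        private final RuntimeException error;
        Failure(RuntimeException error) { this.error = error; }
        <R> Try<R> map(Function<? super T, ? extends R> f) {
            return new Failure<>(error); // skip the mapper, carry the error
        }
        T getOrElse(T fallback) { return fallback; }
        boolean isSuccess() { return false; }
    }
}
```

For example, `Try.of(() -> Integer.parseInt(text)).map(n -> n * 2).getOrElse(-1)` yields the doubled number for valid input and `-1` otherwise, without a `try`/`catch` at the call site.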

Ch. 03 · Vocab
Checked exception
Exception the compiler forces callers to catch or declare.
Unchecked exception
RuntimeException or Error subclass that propagates without compile-time enforcement.
Exception hierarchy
Inheritance tree of exception types organizing error categories.
Exception swallowing
Catching an exception without logging, rethrowing, or otherwise handling it.
Ch. 03 · Vocab
Exception chaining
Preserving the original exception as the cause of a wrapping exception.
Try-with-resources
Java construct that auto-closes AutoCloseable resources even on exception.
Try<T> monad
Functional container holding either a success value or captured failure.
CompletableFuture
Java's async primitive; exceptions surface via exceptionally/handle/get.
Ch. 03 · Quiz 1 / 4

Spot the issue

What's the mistake?

public Integer parseAge(String s) {
    try {
        return Integer.parseInt(s);
    } catch (NumberFormatException e) {
        return null;
    }
}
// Caller:
for (String s : userInputs) {
    Integer age = parseAge(s);
    if (age != null) { process(age); }
}
Ch. 03 · Quiz 2 / 4

Spot the issue

What's the mistake?

Future<Result> f = executor.submit(() -> doWork());
// later...
if (f.isDone()) {
    handle(f.get());  // crashes? what crashes?
}
Ch. 03 · Quiz 3 / 4

Multiple choice

A service depends on a third-party HTTP client that throws `OkHttpException`. The service's domain layer catches `OkHttpException` directly throughout the codebase. What's the chapter's recommended fix?

Ch. 03 · Quiz 4 / 4

Multiple choice

What is the most accurate framing of checked vs. unchecked exceptions in a public library API?

Ch. 04

Balancing Flexibility and Complexity

The chapter examines tension between making an API extensible enough to accommodate diverse clients versus keeping it simple to maintain safely. Every public extension point becomes a permanent contract, so flexibility should be added deliberately rather than by default.

Ch. 04

Start Simple, Add Flexibility on Demand

Begin with a robust but inextensible component. Introduce extension points only after identifying a concrete client need, avoiding speculative generality that costs forever.

Ch. 04

Public Extension Points Are Forever

Removing functionality from a published API is a breaking change. Every hook or listener you expose becomes a maintenance liability you typically cannot retract.

Ch. 04

Pluggable Frameworks via Dependency Injection

Rather than hardcoding a metrics library, let clients supply their own through an interface. This gives flexibility without ballooning the surface area of the core API.
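
A minimal sketch of this arrangement is shown below. The `MetricsRecorder` interface, `RecordWriter` component, and counter name are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class PluggableMetricsDemo {

    // Narrow interface owned by the component; clients adapt their own
    // metrics library to it.
    interface MetricsRecorder {
        void increment(String counterName);
    }

    // The component depends only on the interface, never on a concrete library.
    static class RecordWriter {
        private final MetricsRecorder metrics;

        RecordWriter(MetricsRecorder metrics) {
            this.metrics = metrics;
        }

        void write(String record) {
            // ... the actual write is elided in this sketch ...
            metrics.increment("records.written");
        }
    }

    // A client-side adapter; this one just remembers the calls it received.
    static class InMemoryRecorder implements MetricsRecorder {
        final List<String> seen = new ArrayList<>();
        public void increment(String counterName) { seen.add(counterName); }
    }
}
```

A client using a different metrics library would write its own few-line `MetricsRecorder` adapter, and the core API would gain no new public types in the process.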

Ch. 04

Hooks Let Clients Inject Behavior, but Require Guards

A hooks API allows arbitrary client code to execute inside your component's lifecycle. You must defend against slow hooks, throwing hooks, and hooks that mutate shared state.
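
The following sketch shows one possible guard, with invented names: each hook runs inside a `try` block so that a throwing hook cannot break the host. It does not address slow hooks, which would need something like a timeout on a separate executor, and it relies on `String` being immutable to keep hooks from mutating shared state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

class GuardedHooksDemo {

    private final List<UnaryOperator<String>> hooks = new ArrayList<>();
    private int failedHooks = 0;

    void registerHook(UnaryOperator<String> hook) {
        hooks.add(hook);
    }

    // Runs each hook defensively: a throwing or null-returning hook is
    // skipped and counted, and the record continues unchanged.
    String applyHooks(String record) {
        String current = record;
        for (UnaryOperator<String> hook : hooks) {
            try {
                String next = hook.apply(current);
                if (next != null) {
                    current = next;
                } else {
                    failedHooks++;
                }
            } catch (RuntimeException e) {
                failedHooks++;
            }
        }
        return current;
    }

    int failedHooks() { return failedHooks; }
}
```

Counting failures rather than silently ignoring them gives the host a signal it can log or expose as a metric.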

Ch. 04

Performance Cost of Hooks Is Real

Every hook invocation adds overhead and places untrusted client code on your hot path. Even simple hooks can dominate runtime when called inside tight loops.

Ch. 04

Listeners vs. Hooks

Listeners are notified after an event and cannot influence the host's behavior — invariants stay safe. Hooks are more powerful because they can alter outcomes, but that power is exactly what makes them dangerous.

Ch. 04

Immutability Preserves Invariants When Extending

Designing listener payloads and hook inputs around immutable data prevents client extensions from corrupting internal state — a classic failure mode of plugin-style APIs.

Ch. 04

Flexibility Cost-Benefit Analysis

For each proposed extension point, ask whether the gain in client capability justifies the cost of testing, documenting, and supporting it across future versions.

Ch. 04 · Vocab
Extensibility
Ability for clients to add or modify behavior without forking the source.
Hook
A registered callback the host invokes at a defined point, able to alter inputs/outputs.
Listener
A registered callback notified of events but unable to change host behavior.
Pluggable framework
Design that accepts a client-supplied implementation behind an interface.
Ch. 04 · Vocab
Immutability
Property of data objects that cannot be mutated after construction.
Breaking change
Any modification to a public API forcing clients to change their code.
API surface
Total set of types, methods, and configuration points a library exposes.
Ch. 04 · Quiz 1 / 4

Spot the issue

A library author adds a `BeforeWriteHook` extension point "in case some future user wants to transform records." No client has asked for it. The hook is exposed as `public`, runs inside the write loop, and accepts mutable record objects. What's wrong?

Ch. 04 · Quiz 2 / 4

Multiple choice

A team needs to let clients observe record-write events without letting clients alter what gets written. Which extension mechanism best matches that intent?

Ch. 04 · Quiz 3 / 4

Multiple choice

Why does the chapter recommend designing hook and listener payloads as immutable data?

Ch. 04 · Quiz 4 / 4

Spot the issue

A logging library hardcodes Dropwizard Metrics as its sole metrics backend. A new client uses Micrometer and needs the library to emit metrics there instead. What's the better design?

Ch. 05

Premature Optimization vs. Optimizing the Hot Path

The chapter reframes Knuth's adage by distinguishing premature optimization (no data) from targeted optimization of the hot path (the small fraction handling most work). It demonstrates how to define SLAs, locate bottlenecks with load tests, and validate improvements with microbenchmarks.

Ch. 05

Premature Optimization Means Optimizing Without Data

Optimizing without an SLA or measurements typically wastes effort and complicates code. The same optimization, guided by data on a real hot path, can be the right call.

Ch. 05

The 80/20 Hot Path

A small subset of routes or methods accounts for the vast majority of execution time. Concentrating effort on that subset yields disproportionate gains versus blanket optimization.

Ch. 05

Define an SLA Before Optimizing

Throughput, latency, and concurrent-user targets convert "make it faster" into a falsifiable goal. You then know when to stop optimizing and avoid over-engineering.
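
As an illustration of turning a latency target into a yes/no check, the sketch below computes a nearest-rank percentile over recorded latencies. The class and method names are hypothetical, and real load-testing tools report percentiles themselves.

```java
import java.util.Arrays;

class SlaCheck {

    // Nearest-rank percentile: the smallest sample such that at least
    // `percentile` percent of samples are less than or equal to it.
    // Assumes the array holds at least one sample.
    static long percentile(long[] latenciesMillis, double percentile) {
        long[] sorted = latenciesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(percentile * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    // The SLA becomes a falsifiable question instead of "make it faster".
    static boolean meetsSla(long[] latenciesMillis, long p99TargetMillis) {
        return percentile(latenciesMillis, 99.0) <= p99TargetMillis;
    }
}
```

Once `meetsSla` returns true under the target load, further optimization of that path has no stated goal left to serve.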

Ch. 05

Load-Test to Expose Hot Paths

Tools like Gatling configure concurrent users against your SLA, producing traffic realistic enough to surface bottlenecks that unit-level reasoning misses entirely.

Ch. 05

Instrument With MetricRegistry

Adding timers and meters around suspected hot regions turns guesses into measurements, letting you locate the genuine hot path rather than the one you assumed existed.

Ch. 05

JMH Microbenchmarks Validate Localized Changes

Once a hot method is identified, JMH provides JVM-aware, warmup-correct measurements that confirm a proposed optimization actually helps — and by how much.

Ch. 05

Benchmark Before and After

A "performance improvement" may turn out to be no improvement at all. Always benchmark before and after rather than trusting intuition about what is slow.

Ch. 05

Caching Has Its Own Tradeoffs

Caching can dramatically improve throughput, but it brings invalidation logic, a memory footprint, and a dependence on workload assumptions. The hit rate that justified it can shift the moment traffic patterns change.
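
One way to keep those tradeoffs visible is a cache that is bounded and reports its own hit rate. The sketch below is a hypothetical, single-threaded LRU cache built on `LinkedHashMap`; it assumes the loader never returns `null` and it does not handle invalidation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical bounded LRU cache that tracks its own hit rate.
class MeasuredCache<K, V> {

    private final Map<K, V> entries;
    private long hits;
    private long misses;

    MeasuredCache(final int maxEntries) {
        // accessOrder=true keeps iteration order least-recently-used first.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // bounds the memory footprint
            }
        };
    }

    V get(K key, Function<K, V> loader) {
        V cached = entries.get(key);
        if (cached != null) {
            hits++;
            return cached;
        }
        misses++;
        V loaded = loader.apply(key);
        entries.put(key, loaded);
        return loaded;
    }

    double hitRate() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

Exporting `hitRate()` as a metric makes a shift in traffic patterns show up as a falling number rather than as unexplained heap growth.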

Ch. 05 · Vocab
Hot path
The subset of code executed for nearly every request; highest-leverage optimization target.
Premature optimization
Performance work undertaken before measurements justify it.
SLA
Specification of required throughput, latency, and capacity.
NFR
Nonfunctional requirement — how well the system performs vs. what it does.
Ch. 05 · Vocab
Pareto principle
80/20 observation — a small portion produces most of the output.
Gatling
Scala-based load-testing framework for simulating concurrent users.
JMH
Java Microbenchmark Harness for JVM-aware microbenchmarks.
Microbenchmark
Small, focused performance test of a single method or fragment.
Ch. 05 · Quiz 1 / 4

Spot the issue

An engineer says: "I'm rewriting the parser in hand-rolled bytecode-friendly Java because it's probably the hot path." There's no profile, no load test, no SLA — just a hunch. What's the mistake?

Ch. 05 · Quiz 2 / 4

Multiple choice

A service must serve 500 requests/second at p99 < 200ms for 1,000 concurrent users. Which artifact does the chapter say you should produce first, before any optimization work?

Ch. 05 · Quiz 3 / 4

Multiple choice

Which tool combination does the chapter recommend for the workflow "find the real hot path, then validate a localized fix"?

Ch. 05 · Quiz 4 / 4

Spot the issue

A team adds an in-memory cache fronting a query that runs 10,000x/minute, and ships immediately. Six months later, traffic patterns shift toward many unique queries, the hit rate drops below 5%, and the cache now consumes 8 GB of heap while doing little. What does the chapter say this team got wrong?

Ch. 06

Simplicity vs. Cost of Maintenance for Your API

The chapter compares two strategies for surfacing dependent-library configuration: directly re-exposing settings vs. wrapping them in a tool-specific abstraction. Through lifecycle events for adding and deprecating settings, it shows the abstracted API carries substantially higher long-term cost.

Ch. 06

APIs Have UX, and UX Has a Price

REST endpoints, CLIs, and library configurations are user interfaces too. Making them ergonomic for callers requires sustained maintenance investment from the API owner.

Ch. 06

Direct Exposure Minimizes Wrapper Maintenance

Forwarding the dependent client's configuration object means new underlying settings appear automatically — at the cost of leaking the dependent library's vocabulary to your users.

Ch. 06

Abstracted Settings Improve UX, Demand Maintenance

Defining your own configuration types and translating to the underlying library gives users a cleaner, tool-shaped API. But every change in the underlying library must be re-mapped through the abstraction layer.
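
The contrast between the two strategies can be sketched with plain `Properties`. The property keys and the `ToolSettings` type below are invented for illustration and do not correspond to any real library.

```java
import java.util.Properties;

class ConfigStrategiesDemo {

    // Direct exposure: whatever keys the caller sets are forwarded untouched,
    // so new settings in the underlying library need no wrapper change.
    static Properties directExposure(Properties callerSupplied) {
        Properties forwarded = new Properties();
        forwarded.putAll(callerSupplied);
        return forwarded;
    }

    // Abstracted settings: the wrapper owns its vocabulary and must map
    // every field onto the underlying library's keys by hand.
    static class ToolSettings {
        final int timeoutSeconds;
        final int maxConnections;

        ToolSettings(int timeoutSeconds, int maxConnections) {
            this.timeoutSeconds = timeoutSeconds;
            this.maxConnections = maxConnections;
        }

        Properties toLibraryProperties() {
            Properties mapped = new Properties();
            // Key names are invented for this sketch.
            mapped.setProperty("client.timeout.ms", String.valueOf(timeoutSeconds * 1000L));
            mapped.setProperty("client.pool.max", String.valueOf(maxConnections));
            return mapped;
        }
    }
}
```

In the abstracted version, a setting the library adds tomorrow stays unreachable until `ToolSettings` gains a field and a mapping line for it.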

Ch. 06

Configuration Is Part of the Public Contract

Users wire configuration into deployment scripts and infrastructure-as-code. Changing it is as breaking as changing method signatures, which constrains how either approach can evolve.

Ch. 06

Asymmetric Cost of Adding Settings

The direct-exposure tool gets new settings for free. The abstracted tool requires a new field, validation, mapping code, documentation, and tests for every addition.

Ch. 06

Asymmetric Cost of Removing Settings

Removing a setting from the direct tool forces an immediate user-facing break. The abstracted tool can keep the old name working via translation — trading short-term UX stability for long-term complexity.

Ch. 06

Match Strategy to Evolution Rate

Stable libraries with sophisticated users tolerate direct exposure. Rapidly evolving dependencies with broad audiences justify the maintenance cost of abstraction.

Ch. 06 · Vocab
API UX
How easy and pleasant an API is to learn, use correctly, and evolve with.
Direct exposure
Wrapper forwards a dependent library's configuration types unchanged.
Abstraction layer
Wrapper defines its own vocabulary and maps it onto the dependent library.
Deprecation
Marking a feature as discouraged and slated for removal, with a migration window.
Ch. 06 · Vocab
Leaky abstraction
An abstraction that fails to fully hide its underlying implementation.
Configuration mechanism
The set of settings and validation rules parameterizing a library.
Ch. 06 · Quiz 1 / 4

Multiple choice

A wrapper around a rapidly evolving SDK chooses to directly expose the SDK's configuration object. The SDK then adds a new connection-timeout setting. What is the natural consequence for the wrapper and its users?

Ch. 06 · Quiz 2 / 4

Spot the issue

A wrapper around an SDK chose the abstracted-settings approach, defining its own configuration types and mapping them to the underlying library. The SDK is on a 4-week release cadence and adds, renames, or removes settings every release. The wrapper team is now buried in mapping work and lags two SDK versions behind. What does the chapter suggest about the original strategy?

Ch. 06 · Quiz 3 / 4

Multiple choice

Why does the chapter argue that configuration is part of the public contract?

Ch. 06 · Quiz 4 / 4

Spot the issue

A wrapper using the abstraction-layer approach decides to remove a deprecated setting in the next release because the underlying SDK removed it. The maintainer skips the deprecation window. Which advantage of the abstraction layer does the chapter say is thrown away here?

Ch. 07

Working Effectively With Date and Time Data

Largely authored by Jon Skeet (creator of Noda Time), this chapter argues date/time is deceptively simple and produces disproportionate production bugs. It builds precise vocabulary, then walks through scoping requirements, picking the right library, writing testable code, and surviving DST and tzdb edge cases.

Ch. 07

Machine Time vs. Civil Time

Machine time is a single instant on the timeline; civil time is a wall-clock reading tied to a calendar and place. Mixing the two — storing a `LocalDateTime` when you need an `Instant` — is the most common date/time bug.

Ch. 07

A Time Zone Is Not an Offset

A UTC offset (`+01:00`) is a fixed displacement. A time zone (`Europe/Warsaw`) is a rule set mapping instants to offsets over time. Storing only the offset loses DST information and breaks future calculations.
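
A short `java.time` sketch makes the difference visible: the same zone yields different offsets at different instants. The helper name is hypothetical, and the dates are arbitrary winter and summer instants.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

class ZoneVersusOffset {

    // Asks the zone's rules which offset applies at a given instant.
    static ZoneOffset offsetAt(String zoneId, String isoInstant) {
        return ZoneId.of(zoneId).getRules().getOffset(Instant.parse(isoInstant));
    }

    public static void main(String[] args) {
        System.out.println(offsetAt("Europe/Warsaw", "2021-01-15T12:00:00Z")); // +01:00
        System.out.println(offsetAt("Europe/Warsaw", "2021-07-15T12:00:00Z")); // +02:00
    }
}
```

A record that stored only `+01:00` in January cannot tell a later reader that the same city would be at `+02:00` in July.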

Ch. 07

Limit Your Scope Before Writing Code

A calendar reminder needs civil time; a log timestamp needs an instant. Picking the narrowest representation avoids carrying complexity you'll never use.

Ch. 07

Use a Modern Library

java.time (JSR-310) and Noda Time model the key distinctions at the type level. Legacy `java.util.Date`/`Calendar` and `System.DateTime` conflate concepts and force bugs at every boundary.

Ch. 07

Inject a Clock for Testability

Calling `Instant.now()` directly hard-codes "the real clock" and makes tests flaky and time-bound. Pass a `Clock` abstraction so tests can supply a fixed or fake clock.
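
A minimal sketch of clock injection with `java.time.Clock` follows. The `OrderService` shape here is an assumption made for the example.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;

class ClockInjectionDemo {

    static class OrderService {
        private final Clock clock;

        // Production code would pass Clock.systemUTC(); tests pass a fixed clock.
        OrderService(Clock clock) {
            this.clock = clock;
        }

        Instant place() {
            return Instant.now(clock); // reads the injected clock, not the system one
        }
    }

    public static void main(String[] args) {
        Clock fixed = Clock.fixed(Instant.parse("2026-05-12T14:30:00Z"), ZoneOffset.UTC);
        System.out.println(new OrderService(fixed).place()); // 2026-05-12T14:30:00Z
    }
}
```

With a fixed clock, a test can assert an exact timestamp and will give the same result on any day it runs.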

Ch. 07

Serialize in ISO 8601 / RFC 3339

Use machine-parseable formats with explicit offsets (`2026-05-12T14:30:00Z`) when crossing processes or storage. Locale-dependent strings silently corrupt round-trips.

Ch. 07

Calendar Arithmetic Is Not Commutative

Adding "1 month" then "1 day" need not equal "1 day" then "1 month" near month boundaries. Decide and document the policy rather than trusting library defaults blindly.
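
The effect is easy to reproduce with `java.time.LocalDate`. The start date below is chosen to sit near a month boundary, and the method names are illustrative.

```java
import java.time.LocalDate;

class CalendarArithmeticDemo {

    static LocalDate monthThenDay(LocalDate start) {
        return start.plusMonths(1).plusDays(1);
    }

    static LocalDate dayThenMonth(LocalDate start) {
        return start.plusDays(1).plusMonths(1);
    }

    public static void main(String[] args) {
        LocalDate start = LocalDate.of(2023, 1, 30);
        // plusMonths clamps to the last valid day: Jan 30 + 1 month = Feb 28.
        System.out.println(monthThenDay(start)); // 2023-03-01
        System.out.println(dayThenMonth(start)); // 2023-02-28
    }
}
```

Both results are defensible, which is why the order of operations belongs in the requirements rather than being left to whichever call happened to be written first.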

Ch. 07

DST Creates Skipped and Ambiguous Times

Spring-forward skips local times; fall-back duplicates them. Code must explicitly choose a resolver strategy (push forward, push back, throw) — defaults vary by library and silently produce wrong instants.
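
The sketch below shows how `java.time` resolves both cases by default, using the 2021 transitions in `Europe/Warsaw`. Other libraries may resolve them differently, so this illustrates one library's behaviour rather than a general rule.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

class DstResolutionDemo {

    static final ZoneId WARSAW = ZoneId.of("Europe/Warsaw");

    // 02:30 on 2021-03-28 does not exist in Warsaw (clocks jump 02:00 -> 03:00).
    // java.time resolves the gap by shifting forward by the gap length.
    static ZonedDateTime skipped() {
        return ZonedDateTime.of(LocalDateTime.of(2021, 3, 28, 2, 30), WARSAW);
    }

    // 02:30 on 2021-10-31 occurs twice. java.time picks the earlier offset
    // by default; the later one must be requested explicitly.
    static ZonedDateTime ambiguousEarlier() {
        return ZonedDateTime.of(LocalDateTime.of(2021, 10, 31, 2, 30), WARSAW);
    }

    static ZonedDateTime ambiguousLater() {
        return ambiguousEarlier().withLaterOffsetAtOverlap();
    }
}
```

Here `skipped()` lands on 03:30 at `+02:00`, while the two ambiguous results share the same local time but are one hour apart as instants.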

Ch. 07 · Vocab
Instant
A single unambiguous point on the global timeline, typically nanoseconds since epoch.
Epoch
The fixed reference point from which instants are measured (commonly Unix epoch).
LocalDateTime
Civil value with no zone or offset attached.
UTC offset
A fixed signed duration between local time and UTC at a specific moment.
Ch. 07 · Vocab
Time zone (IANA / tzdb)
Named rule set mapping each instant to a UTC offset over time.
ZonedDateTime
Civil time combined with a zone, including full DST rules.
DST transition
Discontinuity producing skipped or ambiguous local times.
tzdb
IANA time zone database — the authoritative, frequently-updated dataset of zone rules.
Ch. 07 · Quiz 1 / 4

Spot the issue

What's the mistake?

// Schedule a 9am wake-up alarm for tomorrow in the user's city
Instant alarm = Instant.now().plus(Duration.ofHours(24));
saveAlarm(alarm);
Ch. 07 · Quiz 2 / 4

Multiple choice

A service stores every event as `(local_datetime, utc_offset)` — for instance `(2026-05-12T14:30, +01:00)`. Six months later, the team needs to compute "this event time, but one year from now, in the same city." Why is this representation insufficient?

Ch. 07 · Quiz 3 / 4

Spot the issue

What's the mistake?

public class OrderService {
    public void place(Order o) {
        o.setPlacedAt(Instant.now());  // direct system clock
        repo.save(o);
    }
}
Ch. 07 · Quiz 4 / 4

Multiple choice

For data crossing process or storage boundaries, the chapter recommends serializing date/time values as:

Ch. 08

Leveraging Data Locality and Memory of Your Machines

When datasets exceed a single machine's memory, the dominant cost is moving bytes over the network. This chapter explains data locality ("ship code to data"), contrasts in-memory vs. disk-based processing (Spark vs. MapReduce), and shows how to implement standard and broadcast joins.

Ch. 08

Move Computation to Data

Serialized code is small; partitioned data is huge. Pushing the function to where the data already lives turns an O(dataset) network transfer into an O(code) one — the founding insight behind MapReduce and Spark.

Ch. 08

Partitioning vs. Sharding

Partitioning splits one logical dataset within a system for parallel processing. Sharding splits it across independently administered systems. Mechanics overlap; operational implications differ.

Ch. 08

Partitioning Algorithm Choice Matters

Hash partitioning gives uniform distribution but destroys range locality. Range partitioning preserves locality but invites hotspots. Choose based on which queries dominate.
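
The difference can be sketched in a few lines of plain Java. This is a toy model with integer keys and an identity hash, not how any particular engine assigns partitions; production systems use a mixing hash function.

```java
class PartitioningDemo {

    // Hash partitioning: spreads keys across partitions, but neighbouring
    // keys land on unrelated partitions.
    static int hashPartition(int key, int partitions) {
        return Math.floorMod(Integer.hashCode(key), partitions);
    }

    // Range partitioning over [0, maxKey): neighbouring keys share a
    // partition, so a range scan touches few partitions.
    static int rangePartition(int key, int maxKey, int partitions) {
        int width = (maxKey + partitions - 1) / partitions; // ceiling division
        return Math.min(key / width, partitions - 1);
    }
}
```

With 4 partitions and keys below 4,000, the keys 1000 to 1003 hit four different hash partitions but a single range partition. That same concentration is what turns a popular key range into a hotspot.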

Ch. 08

Co-Partitioning Makes Joins Cheap

A join is cheap iff both sides are partitioned the same way on the join key. Otherwise the system must shuffle — a full network reshuffle that often dominates job time.

Ch. 08

Broadcast Join for Small Sides

Sending a small lookup table to every executor lets each partition of the large table join locally. The tradeoff is memory pressure per executor and a hard upper bound on broadcast size.
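
Stripped of any cluster machinery, the per-partition work of a broadcast join can be sketched as a hash lookup. The row layout (`String[]{eventId, countryId}`) and names are assumptions made for this toy example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class BroadcastJoinSketch {

    // Joins one partition of the large side against a full in-memory copy of
    // the small side. No rows of the large side leave their partition.
    static List<String> joinPartition(List<String[]> eventsPartition,
                                      Map<String, String> broadcastCountries) {
        List<String> joined = new ArrayList<>();
        for (String[] event : eventsPartition) {
            String eventId = event[0];
            String countryId = event[1];
            String countryName = broadcastCountries.get(countryId);
            if (countryName != null) { // inner join: drop unmatched rows
                joined.add(eventId + "," + countryName);
            }
        }
        return joined;
    }
}
```

Every executor holds its own copy of `broadcastCountries`, which is exactly where the memory pressure and the size limit come from.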

Ch. 08

MapReduce Disk-First vs. Spark RAM-First

MapReduce writes intermediate results to disk between every stage (resilient but slow). Spark keeps RDDs/DataFrames in memory across stages (fast but constrained by cluster RAM).

Ch. 08

Memory Hierarchy Drives Design

RAM access is roughly five orders of magnitude faster than a spinning-disk seek and about three to four orders faster than a datacenter network round trip. Designs that minimize the slowest hop almost always win.

Ch. 08

API Choice Encodes Join Strategy

Using `broadcast(df)` explicitly tells Spark to ship the small side. Without it, Spark may default to a sort-merge or shuffle-hash join that costs far more.

Ch. 08 · Vocab
Data locality
Principle that computation should run on the node holding its input data.
Partition
One slice of a distributed dataset, assigned to a single executor.
Shard
Independently-managed partition, typically of a database.
Hash partitioning
Assigns rows by hash(key) mod N; uniform but locality-destroying.
Ch. 08 · Vocab
Shuffle
All-to-all network exchange between executors, usually from a partition mismatch.
Broadcast join
Ships the small side to every executor, eliminating shuffle on the large side.
RDD / DataFrame
Spark's resilient distributed in-memory dataset abstractions.
Executor
A JVM process on a worker node that holds partitions and runs tasks.
Ch. 08 · Quiz 1 / 4

Spot the issue

A Spark job joins a 2 TB `events` table with a 5 MB `country_codes` lookup table. The job runs for hours; the Spark UI shows a massive shuffle of `events` across the cluster. The code reads:

events.join(country_codes, "country_id")

What's the better join strategy?
Ch. 08 · Quiz 2 / 4

Multiple choice

A dataset in a Spark cluster is hash-partitioned by `user_id`. The team frequently runs range queries like "all events between user_id 1,000 and 2,000." Why is performance disappointing, and what's the tradeoff?

Ch. 08 · Quiz 3 / 4

Multiple choice

The data-locality principle "move computation to data" wins primarily because:

Ch. 08 · Quiz 4 / 4

Spot the issue

A team picks Spark over MapReduce specifically because "RAM is faster than disk" — but their job's working set is 5x cluster memory, so Spark constantly spills to disk and the job underperforms a tuned MapReduce equivalent. What did they miss?

Ch. 09

Third-Party Libraries: Libraries You Use Become Your Code

Importing a library imports its behavior, bugs, concurrency model, license, and transitive dependencies — and you become responsible for all of them. The chapter gives a structured framework for choosing, configuring, testing, and maintaining dependencies, ending with an adopt-or-reimplement checklist.

Ch. 09

Library Defaults Aren't Safe Defaults

Maintainers tune defaults for the common case across all users, rarely your case. Audit and override connection pool sizes, timeouts, retry counts, and serialization formats deliberately.
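
The JDK's own `java.net.http.HttpClient` (Java 11+) serves as a stand-in here for any client library. It has no connect or request timeout unless one is set, so the sketch sets both explicitly. The 2-second and 5-second values are arbitrary placeholders, not recommendations.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

class ExplicitTimeoutsDemo {

    // The JDK's HttpClient has no connect timeout unless one is set.
    static HttpClient clientWithConnectTimeout() {
        return HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
    }

    // Likewise, a request without a timeout can wait indefinitely for a response.
    static HttpRequest requestWithTimeout(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(5))
                .build();
    }
}
```

Writing the values down in code, even when they match the library's defaults, records that someone chose them on purpose.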

Ch. 09

Sync vs. Async Locks In Concurrency Model

A blocking client called from a reactive pipeline eats your thread pool. An async client in synchronous code adds complexity for no benefit. Match the library's model to your application's.

Ch. 09

Test With Fakes, Not Mocks of Third-Party Types

Mocking a library's classes couples your tests to its API surface and stubs in behavior the real code doesn't have. Prefer a vendor fake or a thin internal interface you own.
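
A sketch of the "thin internal interface plus fake" option is given below. `BlobStore`, `InMemoryBlobStore`, and `AvatarService` are hypothetical names, and the production implementation that would wrap the real third-party client is omitted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class FakeOverMockDemo {

    // Thin interface owned by the application; the production implementation
    // (not shown) would delegate to the third-party storage client.
    interface BlobStore {
        void put(String key, String value);
        Optional<String> get(String key);
    }

    // Fake: a real, simplified implementation backed by a map.
    static class InMemoryBlobStore implements BlobStore {
        private final Map<String, String> blobs = new HashMap<>();
        public void put(String key, String value) { blobs.put(key, value); }
        public Optional<String> get(String key) {
            return Optional.ofNullable(blobs.get(key));
        }
    }

    // Code under test depends only on the owned interface.
    static class AvatarService {
        private final BlobStore store;
        AvatarService(BlobStore store) { this.store = store; }

        void saveAvatar(String userId, String image) {
            store.put("avatar/" + userId, image);
        }

        String avatarOrDefault(String userId) {
            return store.get("avatar/" + userId).orElse("default.png");
        }
    }
}
```

Tests exercise `AvatarService` through real put-then-get behaviour, so they keep passing when the third-party client's method signatures change.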

Ch. 09

Use the Library's Integration Testing Toolkit

Many mature libraries ship a Testcontainers image, embedded server, or emulator. These give real behavior under test without production cost.

Ch. 09

Transitive Dependencies Cause Dependency Hell

Two libraries may demand incompatible versions of a common third dependency. Resolve via build-tool pinning or dependency convergence — don't pretend the conflict isn't there.

Ch. 09

Vendor Lock-In Is a Tradeoff, Not a Sin

A managed cloud SDK may give 10x productivity at the cost of portability. Make the choice explicitly and isolate the lock-in behind an interface if exit is plausible.

Ch. 09

License Is a Property You Must Verify

Permissive licenses (MIT, Apache 2.0) carry few obligations. Copyleft licenses (GPL, AGPL) can require you to open-source your derived work. Getting this wrong means months of lawyer-driven rewrites.

Ch. 09

A Library vs. a Framework

A library is a tool you call. A framework calls you — inversion of control. Frameworks give leverage but lock your application architecture to their lifecycle.

Ch. 09 · Vocab
Transitive dependency
A library pulled in indirectly through another dependency.
Diamond dependency
Two dependencies demand incompatible versions of a third.
Test double
Generic term for any stand-in used in place of a real collaborator.
Fake
A working but simplified implementation suitable for tests.
Ch. 09 · Vocab
Mock
A test double programmed with expectations and verified afterward.
Inversion of control
Framework calls your code rather than the other way around.
Vendor lock-in
Cost of switching away from a specific vendor or library.
Copyleft license
License requiring derivative works to be distributed under compatible terms.
Ch. 09 · Quiz 1 / 4

Spot the issue

A SaaS company ships a closed-source product that links statically against a GPL-licensed library "because it had the best parser." Sales wants to keep the source private. What does the chapter say went wrong?

Ch. 09 · Quiz 2 / 4

Multiple choice

Why does the chapter recommend vendor fakes or thin internal interfaces over Mockito-style mocks of third-party classes?

Ch. 09 · Quiz 3 / 4

Spot the issue

A service imports a popular HTTP client with default settings: unbounded connection pool, no read timeout, infinite retries. The service deploys to production and a single slow downstream causes thread starvation and a cascading outage. What's the chapter's framing?

Ch. 09 · Quiz 4 / 4

Multiple choice

A team is choosing between a managed cloud queue SDK (10x faster integration, deep cloud lock-in) and an open-source self-hosted broker (portable, more operational work). The chapter's framing is:

Ch. 10

Consistency and Atomicity in Distributed Systems

Moving from a single-node application to a horizontally scaled deployment breaks assumptions about atomicity. This chapter walks through retries, idempotency, CQRS, and deduplication, showing how naive single-node implementations produce subtle race conditions when generalized.

Ch. 10

At-Least-Once Is the Network's Default

When service A times out and retries a call to service B, B may already have processed the original request. From A's side, a lost request and a lost response look identical, so designing around duplicates is non-negotiable once you leave a single node.

Ch. 10

Idempotency Makes Retries Safe

Design operations so that applying the same request twice has the same effect as applying it once, for example upserts keyed on a stable business key rather than blind inserts. Otherwise retries silently corrupt state.
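
The contrast between a blind insert and an upsert on a stable business key can be sketched with in-memory collections. The ledger classes and the `paymentId` key are hypothetical. A real system would use its database's upsert or conditional-write primitive instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PaymentLedgers {
    // Blind insert: every retry appends another row.
    static final class AppendOnlyLedger {
        private final List<Long> amountsInCents = new ArrayList<>();

        void record(String paymentId, long amountInCents) {
            amountsInCents.add(amountInCents);
        }

        long total() {
            long sum = 0;
            for (long amount : amountsInCents) sum += amount;
            return sum;
        }
    }

    // Upsert keyed on the business key: a retry overwrites the same entry.
    static final class KeyedLedger {
        private final Map<String, Long> amountByPaymentId = new HashMap<>();

        void record(String paymentId, long amountInCents) {
            amountByPaymentId.put(paymentId, amountInCents);
        }

        long total() {
            long sum = 0;
            for (long amount : amountByPaymentId.values()) sum += amount;
            return sum;
        }
    }

    public static void main(String[] args) {
        AppendOnlyLedger blind = new AppendOnlyLedger();
        KeyedLedger keyed = new KeyedLedger();
        for (int attempt = 0; attempt < 2; attempt++) { // original call plus one retry
            blind.record("pay-1", 2500);
            keyed.record("pay-1", 2500);
        }
        System.out.println("blind insert total: " + blind.total());
        System.out.println("keyed upsert total: " + keyed.total());
    }
}
```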

Ch. 10

CQRS — Command Query Responsibility Segregation

Splitting the write path from the read path lets each side scale and evolve independently. The tradeoff is eventual consistency between the two models that callers must account for.
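
A toy, single-process sketch of the split follows. The inventory domain, the class name, and the explicit `projectPendingEvents()` step are assumptions made for illustration. In a real deployment the projection would typically run asynchronously, which is where the eventual consistency comes from.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

final class CqrsSketch {
    // Write side: authoritative state plus a queue of change events.
    private final Map<String, Integer> stockBySku = new HashMap<>();
    private final Queue<String[]> pendingEvents = new ArrayDeque<>();
    // Read side: a denormalized view, updated only when events are projected.
    private final Map<String, String> availabilityView = new HashMap<>();

    // Command: validates, updates the write model, emits an event.
    void handleAdjustStock(String sku, int delta) {
        int newLevel = stockBySku.getOrDefault(sku, 0) + delta;
        if (newLevel < 0) {
            throw new IllegalStateException("stock cannot go negative");
        }
        stockBySku.put(sku, newLevel);
        pendingEvents.add(new String[] {sku, Integer.toString(newLevel)});
    }

    // Stands in for an asynchronous projector catching up with the write side.
    void projectPendingEvents() {
        String[] event;
        while ((event = pendingEvents.poll()) != null) {
            int level = Integer.parseInt(event[1]);
            availabilityView.put(event[0], level > 0 ? "IN_STOCK" : "SOLD_OUT");
        }
    }

    // Query: reads only the view, so it may lag behind the write model.
    String queryAvailability(String sku) {
        return availabilityView.getOrDefault(sku, "UNKNOWN");
    }

    public static void main(String[] args) {
        CqrsSketch app = new CqrsSketch();
        app.handleAdjustStock("sku-1", 3);
        System.out.println("before projection: " + app.queryAvailability("sku-1"));
        app.projectPendingEvents();
        System.out.println("after projection: " + app.queryAvailability("sku-1"));
    }
}
```

The window between the command and the projection is the stale-read period that callers have to account for.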

Ch. 10

Naive Deduplication Breaks Under Concurrency

"Check if seen, then mark seen" is a classic check-then-act race. Two nodes processing the same message concurrently both pass the check before either mark lands.

Ch. 10

Single-Node Solutions Hide Shared State

In-process maps or local caches give the illusion of correctness during development. They fail the moment the service is replicated.

Ch. 10

Atomic Compound Operations Beat Locking

A single atomic primitive — conditional write, `INSERT ... ON CONFLICT`, compare-and-set — collapses check and act into one step and eliminates the race window.
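
The same idea can be sketched inside one JVM, where `Set.add` on a concurrent set plays the role of the conditional write. This only illustrates the shape of the fix. Across replicas the atomic step has to live in the shared store (for example a unique-key insert), and the class and message ID below are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

final class AtomicDeduplicator {
    private final Set<String> seen = ConcurrentHashMap.newKeySet();
    private final AtomicInteger processedCount = new AtomicInteger();

    // add() checks and marks in one atomic step: one caller per ID gets true.
    boolean processOnce(String messageId) {
        if (!seen.add(messageId)) {
            return false; // another caller already claimed this ID
        }
        processedCount.incrementAndGet(); // stands in for real processing
        return true;
    }

    int processedCount() {
        return processedCount.get();
    }

    // Releases several threads at once against the same message ID.
    static int raceOnSameId(int threads) {
        AtomicDeduplicator dedup = new AtomicDeduplicator();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                    dedup.processOnce("msg-42");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        start.countDown();
        try {
            for (Thread worker : workers) worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return dedup.processedCount();
    }

    public static void main(String[] args) {
        System.out.println("processed: " + raceOnSameId(8));
    }
}
```

Note that this sketch marks before it processes, so a failure during processing would need the claim released or the work retried by the claimant.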

Ch. 10

Distributed Transactions Trade Availability for Consistency

Database transactions give atomicity for free on one node. Porting that guarantee across services requires two-phase commit or sagas at significant availability and latency cost.
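
For the saga option, a minimal orchestration sketch is shown below. The step names and the simulated failure are hypothetical. Unlike a database transaction, a saga does not hide intermediate states: it only undoes completed steps through compensating actions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class SagaSketch {
    static final class Step {
        final Runnable action;
        final Runnable compensation;

        Step(Runnable action, Runnable compensation) {
            this.action = action;
            this.compensation = compensation;
        }
    }

    // Runs steps in order; on failure, compensates completed steps in reverse.
    static boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action.run();
                completed.push(step);
            } catch (RuntimeException failure) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run();
                }
                return false;
            }
        }
        return true;
    }

    // Order flow where the last step fails; returns the log of what happened.
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        List<Step> steps = new ArrayList<>();
        steps.add(new Step(() -> log.add("charge card"), () -> log.add("refund card")));
        steps.add(new Step(() -> log.add("reserve inventory"), () -> log.add("release inventory")));
        steps.add(new Step(
                () -> { throw new IllegalStateException("order DB unavailable"); },
                () -> log.add("delete order")));
        run(steps);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```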

Ch. 10

Choose the Boundary of Atomicity Deliberately

Identify the smallest unit of work that must be atomic and design state so that unit fits inside a single storage primitive, rather than spreading it across services.

Ch. 10 · Vocab
At-least-once delivery
Guarantee where a message is delivered one or more times; duplicates possible.
Idempotency
Performing an operation N times produces the same result as performing it once.
CQRS
Architectural pattern separating write-side commands from read-side query models.
Deduplication
Detecting and discarding duplicate requests, typically keyed on a request ID.
Ch. 10 · Vocab
Race condition
Bug where correctness depends on unpredictable interleaving of concurrent operations.
Atomic operation
An operation executing as an indivisible unit; never observed in an intermediate state.
Distributed transaction
Transaction spanning multiple resource managers, coordinated via 2PC or sagas.
Ch. 10 · Quiz 1 / 4

Spot the issue

What's the mistake?

// Deduplication across two app nodes sharing a database
if (!seen.contains(messageId)) {
    process(message);
    seen.add(messageId);
}
Ch. 10 · Quiz 2 / 4

Multiple choice

A team migrates a single-node service to three replicas behind a load balancer. Their in-process dedupe map (`HashMap<UUID, Boolean>`) now silently lets duplicates through. The chapter's diagnosis is:

Ch. 10 · Quiz 3 / 4

Multiple choice

Why does the chapter argue that idempotency is the foundation for safe retries in distributed systems?

Ch. 10 · Quiz 4 / 4

Spot the issue

A microservices team needs an order-creation flow to atomically (1) charge a card via the payments service, (2) reserve inventory via the inventory service, and (3) persist the order in its own DB. They wrap all three in a single distributed transaction using two-phase commit. What's the tradeoff the chapter highlights?

Ch. 11

Delivery Semantics in Distributed Systems

Building on Chapter 10, this chapter concretely shows how delivery semantics work in event-driven systems using Kafka as the running example. It walks through producer acks, idempotent and transactional producers, consumer offset management, and how combined choices yield at-most-once, at-least-once, or effectively exactly-once.

Ch. 11

Delivery Semantics Are End-to-End

At-most-once, at-least-once, and exactly-once are emergent results of producer config, broker durability, and consumer commit timing combined. Tuning only one side is a common mistake.

Ch. 11

Producer Acks Trade Durability for Latency

`acks=0` is fire-and-forget (fastest, can lose data). `acks=1` waits for the partition leader only. `acks=all` waits for all in-sync replicas — most durable, highest latency.
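
The three settings can be written down as plain `java.util.Properties`, which is what a Kafka producer is configured from. Only the `acks` key is set here. Connection settings such as `bootstrap.servers` and the serializers are left out, and the class and method names are hypothetical.

```java
import java.util.Properties;

final class ProducerAckProfiles {
    // Fastest: the producer does not wait for any broker acknowledgment.
    static Properties fireAndForget() {
        return withAcks("0");
    }

    // Waits for the partition leader only.
    static Properties leaderOnly() {
        return withAcks("1");
    }

    // Waits for all in-sync replicas: most durable, highest latency.
    static Properties allInSyncReplicas() {
        return withAcks("all");
    }

    private static Properties withAcks(String value) {
        Properties props = new Properties();
        props.setProperty("acks", value);
        return props;
    }

    public static void main(String[] args) {
        System.out.println(fireAndForget());
        System.out.println(leaderOnly());
        System.out.println(allInSyncReplicas());
    }
}
```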

Ch. 11

Idempotent Producers Eliminate Retry Duplicates

Kafka's idempotent producer attaches a producer ID and per-partition sequence numbers, letting the broker drop duplicate sends caused by retries on transient errors.
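
The mechanism can be illustrated with a toy log for a single partition. This is a simplified model written for this guide, not Kafka's broker code: it ignores sequence gaps, producer epochs, and batching, and `SequenceDedupLog` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SequenceDedupLog {
    private final List<String> records = new ArrayList<>();
    // Highest sequence number accepted so far, per producer ID.
    private final Map<Long, Integer> lastSequenceByProducer = new HashMap<>();

    // Returns true if appended, false if dropped as a duplicate retry.
    boolean append(long producerId, int sequence, String value) {
        int last = lastSequenceByProducer.getOrDefault(producerId, -1);
        if (sequence <= last) {
            return false; // already written: the earlier ack was probably lost
        }
        lastSequenceByProducer.put(producerId, sequence);
        records.add(value);
        return true;
    }

    List<String> records() {
        return new ArrayList<>(records);
    }

    public static void main(String[] args) {
        SequenceDedupLog log = new SequenceDedupLog();
        log.append(7L, 0, "order-created");
        log.append(7L, 0, "order-created"); // retry after a transient error
        log.append(7L, 1, "order-paid");
        System.out.println(log.records());
    }
}
```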

Ch. 11

Transactional Producers Give Atomic Multi-Partition Writes

Wrapping sends and consumer offsets in a transaction lets a stream-processing job atomically commit what it read and what it wrote — the basis of Kafka's effectively-exactly-once.

Ch. 11

Commit Timing Determines Semantics

Commit-before-process yields at-most-once (crash loses messages). Process-before-commit yields at-least-once (crash replays them). Auto-commit hides this choice and usually surprises you.
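
The two orderings can be compared with a small simulation of one consumer that crashes once and restarts from its last committed offset. The class, the list standing in for a partition, and the crash model are all illustrative assumptions. No Kafka client is involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CommitTimingSketch {
    // Returns the records whose processing completed, in order.
    // The consumer crashes once while handling the record at crashIndex.
    static List<String> run(List<String> partition, boolean commitBeforeProcess, int crashIndex) {
        List<String> processed = new ArrayList<>();
        int committedOffset = 0; // where a restarted consumer resumes
        int position = 0;
        boolean crashPending = true;
        while (position < partition.size()) {
            boolean crashNow = crashPending && position == crashIndex;
            if (commitBeforeProcess) {
                committedOffset = position + 1;                 // commit first
                if (!crashNow) processed.add(partition.get(position));
            } else {
                processed.add(partition.get(position));         // process first
                if (!crashNow) committedOffset = position + 1;
            }
            if (crashNow) {
                crashPending = false;
                position = committedOffset;                     // restart from last commit
            } else {
                position++;
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> partition = Arrays.asList("a", "b", "c");
        System.out.println("commit-before-process: " + run(partition, true, 1));
        System.out.println("process-before-commit: " + run(partition, false, 1));
    }
}
```

With the crash on the second record, the first ordering skips `b` and the second handles it twice.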

Ch. 11

Exactly-Once Requires an Idempotent Sink

Kafka can deliver each record exactly once into Kafka itself. Writing to an external system (DB, HTTP API) only achieves exactly-once if that sink is idempotent or transactional.

Ch. 11

Rebalancing Replays In-Flight Messages

When consumers join or leave the group, partitions are reassigned. Messages processed but not committed get redelivered to the new owner — where most at-least-once duplicates appear.

Ch. 11

Offset Reset Shapes Recovery

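These are the two common `auto.offset.reset` choices, applied when a consumer group has no valid committed offset. `earliest` replays all retained history (heavy, complete). `latest` skips the backlog (light, accepts loss). Choose by workload: backfill vs. live feed.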

Ch. 11 · Vocab
At-most-once / at-least-once / exactly-once
Three canonical delivery guarantees.
Offset
Integer position of a record within a Kafka partition.
Consumer group
Set of consumers sharing the work of reading a topic.
Partition
Unit of parallelism and ordering in Kafka; records ordered within, not across.
Ch. 11 · Vocab
In-sync replica (ISR)
Follower replica caught up with the leader, eligible for acknowledgment.
Replication factor
Number of copies of each partition maintained across brokers.
Idempotent producer
Kafka producer mode that deduplicates retries via producer ID + sequence numbers.
Rebalance
Process of redistributing partitions when consumer group membership changes.
Ch. 11 · Quiz 1 / 4

Spot the issue

A Kafka consumer is configured with `enable.auto.commit=true` and runs the loop below. What's the delivery-semantics mistake?

for (ConsumerRecord<String, Order> r : poll()) {
    chargeCard(r.value());      // external HTTP call
    shipOrder(r.value());       // external HTTP call
}
Ch. 11 · Quiz 2 / 4

Multiple choice

A team claims their pipeline is "exactly-once" because they enabled Kafka's idempotent producer and use transactional reads. Downstream, the consumer writes results into PostgreSQL via plain `INSERT`. Where does the chapter say the guarantee actually breaks?

Ch. 11 · Quiz 3 / 4

Multiple choice

Which producer `acks` setting trades the most durability for the lowest latency?

Ch. 11 · Quiz 4 / 4

Spot the issue

A consumer commits offsets every 30 seconds in a background thread. During a deployment, the consumer group rebalances, and a partition is reassigned to a different consumer mid-batch. The new owner replays the last 28 seconds of messages, causing duplicate charges downstream. What's the chapter's framing?

Ch. 12

Managing Versioning and Compatibility

The chapter treats versioning as a first-class concern across four contexts: abstract version semantics, libraries, network APIs, and data storage. It distinguishes source/binary/semantic compatibility, explains diamond dependencies, and uses Protocol Buffers as the worked example for evolving persisted data.

Ch. 12

Backward vs. Forward Compatibility

Backward-compatible means new code reads old data. Forward-compatible means old code tolerates new data. APIs and storage often need both, and they require different design moves.

Ch. 12

Semantic Versioning Encodes a Contract

MAJOR.MINOR.PATCH communicates intent — MAJOR breaks, MINOR adds compatibly, PATCH fixes compatibly. The contract only works if you actually honor it, and marketing versions frequently don't.
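
The contract can be expressed as a small compatibility check. This sketch handles only plain `MAJOR.MINOR.PATCH` strings and ignores pre-release tags, build metadata, and the `0.x` range, where SemVer promises no stability. The `SemVer` class is written for this guide, not taken from a library.

```java
final class SemVer implements Comparable<SemVer> {
    final int major;
    final int minor;
    final int patch;

    SemVer(int major, int minor, int patch) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
    }

    static SemVer parse(String text) {
        String[] parts = text.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected MAJOR.MINOR.PATCH: " + text);
        }
        return new SemVer(
                Integer.parseInt(parts[0]),
                Integer.parseInt(parts[1]),
                Integer.parseInt(parts[2]));
    }

    @Override
    public int compareTo(SemVer other) {
        if (major != other.major) return Integer.compare(major, other.major);
        if (minor != other.minor) return Integer.compare(minor, other.minor);
        return Integer.compare(patch, other.patch);
    }

    // Same MAJOR and not older: the contract says callers should keep working.
    boolean isCompatibleUpgradeTo(SemVer candidate) {
        return major == candidate.major && candidate.compareTo(this) >= 0;
    }

    public static void main(String[] args) {
        SemVer current = parse("1.4.2");
        System.out.println(current.isCompatibleUpgradeTo(parse("1.5.0")));
        System.out.println(current.isCompatibleUpgradeTo(parse("2.0.0")));
    }
}
```

The check is only as trustworthy as the publisher's discipline, which is the card's point about marketing versions.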

Ch. 12

Source, Binary, and Semantic Are Independent Axes

A change can recompile cleanly (source-compatible) yet break already-compiled callers (binary-incompatible), or vice versa. Semantic compatibility — same observable behavior — is a third orthogonal property.

Ch. 12

Diamond Dependencies Force Single-Version Choice

When A depends on B and C, and they require v1 and v2 of the same library, the build picks one. Library authors must preserve compatibility across MINOR/PATCH or risk wedging every downstream graph.

Ch. 12

Internal Libraries Can Break Rules Deliberately

Inside one org with a monorepo or atomic deploys, you can co-evolve callers and library, removing much of the cost of breaking changes and shifting the tradeoffs.

Ch. 12

Network APIs Need a Version-Discovery Story

URL-based (`/v1/...`), header-based, and content-negotiation strategies each trade clarity, caching, and routing. The chapter emphasizes customer-friendly clarity over cleverness.

Ch. 12

Protocol Buffers Encode Evolution Rules

Adding fields with new tag numbers and never reusing or renumbering tags makes both backward and forward compatibility the default. Repurposing a tag number is the canonical breaking change.
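
Why stable tag numbers give compatibility in both directions can be shown with a toy model in which a message is just a map from tag number to value. This is not the Protocol Buffers wire format or its generated API. The field layout (1 = name, 2 = email, 3 = status) is a hypothetical schema.

```java
import java.util.HashMap;
import java.util.Map;

final class TaggedFieldsSketch {
    // Reader built against the old schema: knows tags 1 and 2 only.
    // Any other tag in the message is never looked up, so it is ignored.
    static String readWithOldSchema(Map<Integer, String> message) {
        return message.getOrDefault(1, "") + " <" + message.getOrDefault(2, "") + ">";
    }

    // Reader built against the new schema: also knows tag 3,
    // and falls back to a default when old data lacks it.
    static String readWithNewSchema(Map<Integer, String> message) {
        return readWithOldSchema(message) + " status=" + message.getOrDefault(3, "0");
    }

    public static void main(String[] args) {
        Map<Integer, String> oldMessage = new HashMap<>();
        oldMessage.put(1, "Ada");
        oldMessage.put(2, "ada@example.com");

        Map<Integer, String> newMessage = new HashMap<>(oldMessage);
        newMessage.put(3, "200");

        // Backward compatible: new code reads old data.
        System.out.println(readWithNewSchema(oldMessage));
        // Forward compatible: old code tolerates new data.
        System.out.println(readWithOldSchema(newMessage));
    }
}
```

If tag 3 were later reused for a field with a different meaning, both readers would silently misinterpret stored data, which is why repurposing a tag is the breaking change.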

Ch. 12

Separate API Representation From Storage

Persisting your wire protobuf directly couples external API evolution to your database. A translation layer lets the API and the storage schema each evolve and version independently.
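
A translation layer can be as small as a pair of mapping functions. In this sketch `ApiUserV1` and `StoredUser` are hypothetical plain classes standing in for a generated wire message and a database row.

```java
import java.util.Locale;

final class UserMapping {
    // Shape exposed on the wire (hypothetical v1 API message).
    static final class ApiUserV1 {
        final String displayName;
        final String email;

        ApiUserV1(String displayName, String email) {
            this.displayName = displayName;
            this.email = email;
        }
    }

    // Shape persisted in the database; free to change without touching the API.
    static final class StoredUser {
        final String givenName;
        final String familyName;
        final String emailLowercase;

        StoredUser(String givenName, String familyName, String emailLowercase) {
            this.givenName = givenName;
            this.familyName = familyName;
            this.emailLowercase = emailLowercase;
        }
    }

    static StoredUser toStorage(String givenName, String familyName, String email) {
        return new StoredUser(givenName, familyName, email.toLowerCase(Locale.ROOT));
    }

    // The only place that knows both shapes.
    static ApiUserV1 toApi(StoredUser stored) {
        return new ApiUserV1(stored.givenName + " " + stored.familyName, stored.emailLowercase);
    }

    public static void main(String[] args) {
        ApiUserV1 api = toApi(toStorage("Ada", "Lovelace", "Ada@Example.com"));
        System.out.println(api.displayName + " / " + api.email);
    }
}
```

A v2 API message would add a second mapping function next to `toApi` while `StoredUser` stays unchanged.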

Ch. 12 · Vocab
Semantic versioning
MAJOR.MINOR.PATCH scheme where each component signals a compatibility promise.
Backward compatibility
New producer/server works with old consumers/data.
Forward compatibility
Old consumer tolerates new producer/data it doesn't fully understand.
Source compatibility
Old caller code recompiles unchanged against the new library.
Ch. 12 · Vocab
Binary compatibility
Old compiled artifacts keep linking and running without recompilation.
Diamond dependency
Build graph where two paths reach a library at different versions.
Protocol Buffers
Schema-driven binary serialization with tag-numbered fields enabling compatible evolution.
Breaking change
Any change violating the current compatibility contract.
Ch. 12 · Quiz 1 / 4

Spot the issue

A team using Protocol Buffers needs to remove an obsolete field `string legacy_email = 4;`. They edit the `.proto` to delete the line and reuse tag number `4` for a new `int32 status_code = 4;` field. What's wrong?

Ch. 12 · Quiz 2 / 4

Multiple choice

A library author changes a `public` method's parameter type from a class to a newly introduced interface that the class implements, then recompiles. Existing client source still compiles, but client artifacts compiled against the old JAR fail with `NoSuchMethodError` at runtime. Which compatibility axis was broken?

Ch. 12 · Quiz 3 / 4

Multiple choice

Service A depends on library L version 1.4. Service A also imports library M, which depends on L version 2.0 (which dropped a method A uses), so A pulls in L 2.0 transitively. The build picks one version. What does the chapter say the authors of L should have done?

Ch. 12 · Quiz 4 / 4

Spot the issue

A team persists its wire-format Protobuf messages directly into MongoDB as the storage schema. Two years later, the public API needs a breaking v2 redesign of one field. What does the chapter say is wrong with this setup?

Key Takeaways

01

Every meaningful design decision is a tradeoff; patterns are tools, not rules.

02

Premature abstraction is more expensive than premature duplication — defer the merge until the real shape is known.

03

Optimize only the hot path, and only with measurements that tie back to a stated SLA.

04

At-least-once is the network's default; build for idempotency rather than wishing duplicates away.

05

Date/time, third-party libraries, and versioning are deceptively simple — they leak into your design forever.

06

Public extension points, configuration surfaces, and wire formats are forever; add them deliberately.