♦️ Pipeline Style: then, tap, and the Art of Chaining

2026-03-25

Ruby's then method lets you pipe data through transformations like Unix pipes. tap lets you peek at data mid-chain without changing it. Combined with method chaining, you can build entire text processing pipelines that read top-to-bottom, left-to-right, just like a shell script.

Part 1: then / yield_self

then passes the receiver into a block and returns the block's result. Think of it like a Unix pipe: data flows through transformations.

# Pipeline style - like Unix pipes
result = data
  .then { |d| parse(d) }
  .then { |d| transform(d) }
  .then { |d| format(d) }

# One-liner version
File.read(path).then { JSON.parse(_1) }.then { _1["users"] }

# Practical: read, parse, extract
config = File.read("/etc/app.conf")
  .then { |text| text.split("\n") }
  .then { |lines| lines.reject { _1 =~ %r~^#~ || _1.strip.empty? } }  # skip comments and blank lines
  .then { |lines| lines.map { _1.split("=", 2) } }
  .then { |pairs| pairs.to_h }

Perl has no built-in equivalent. You'd nest function calls or use temporary variables. Ruby just lets the data flow.
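Here is a self-contained sketch of the same idea that runs without touching the filesystem. It swaps the file read for an in-memory string; the SAMPLE_CONF constant and its keys are made up for illustration. It skips blank lines so that to_h only ever sees key/value pairs.

```ruby
# Hypothetical config text standing in for File.read("/etc/app.conf")
SAMPLE_CONF = <<~CONF
  # database settings
  host=localhost
  port=5432

  url=postgres://localhost/app?sslmode=require
CONF

config = SAMPLE_CONF
  .then { |text| text.split("\n") }
  .then { |lines| lines.reject { _1.start_with?("#") || _1.strip.empty? } }
  .then { |lines| lines.map { _1.split("=", 2) } }  # limit of 2 keeps "=" inside values
  .then { |pairs| pairs.to_h }

p config
# => {"host"=>"localhost", "port"=>"5432", "url"=>"postgres://localhost/app?sslmode=require"}
```

Note that every value is still a string; converting "5432" to an integer would be one more step in the pipeline.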

yield_self is the original name, added in Ruby 2.5. then is an alias for it, added in Ruby 2.6. Prefer then unless you have to support 2.5.
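A quick check that the two names behave the same, as a minimal sketch (it needs Ruby 2.6 or later, since that is where then appeared):

```ruby
a = 21.yield_self { |n| n * 2 }  # available since Ruby 2.5
b = 21.then { |n| n * 2 }        # available since Ruby 2.6

p a       # => 42
p a == b  # => true
```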

Part 2: tap (Debug Mid-Chain)

tap passes the receiver into a block but returns the receiver itself, not the block's result. That makes it perfect for debugging without breaking the chain, as long as the block doesn't mutate the object:

# Inspect data at each step without changing it
data.map(&:chomp)
    .tap { |x| STDERR.puts "After chomp: #{x.size} lines" }
    .select { _1 =~ %r~error~i }
    .tap { |x| STDERR.puts "After filter: #{x.size} lines" }
    .map(&:downcase)
    .tap { |x| STDERR.puts "Final: #{x.first(3).inspect}" }

The key difference:

then returns the BLOCK'S result (transforms the value)
tap returns the ORIGINAL value (inspects without changing)

tap is printf debugging for method chains. You'll use it more than you think.
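To see the difference side by side, here is a small sketch that runs the same block through both methods. The variable names are made up for illustration. The last lines show one caveat: tap yields the object itself, not a copy, so a mutating block leaves its mark.

```ruby
numbers = [3, 1, 2]

via_then = numbers.then { |n| n.sum }  # the block's result comes back
via_tap  = numbers.tap  { |n| n.sum }  # the receiver comes back; the sum is discarded

p via_then                 # => 6
p via_tap                  # => [3, 1, 2]
p via_tap.equal?(numbers)  # => true (the very same object)

# Caveat: a mutating block changes what tap hands back.
sorted = numbers.dup.tap { |n| n.sort! }
p sorted                   # => [1, 2, 3]
p numbers                  # => [3, 1, 2] (untouched, because we sorted a dup)
```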

Part 3: Method Chaining Best Practices

# Good: clean pipeline
lines
  .map(&:strip)
  .reject(&:empty?)
  .select { _1 =~ %r~ERROR~ }
  .map { _1.split[0] }
  .uniq
  .sort

# vs. Perl-style temp variables (also fine, more familiar):
stripped = lines.map(&:strip)
non_empty = stripped.reject(&:empty?)
errors = non_empty.select { _1 =~ %r~ERROR~ }
timestamps = errors.map { _1.split[0] }
result = timestamps.uniq.sort

Both approaches are valid. Chaining is more concise; temp variables are more debuggable. Pick what reads best for your situation.
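A middle ground is to keep the chain and drop in a tap where you would otherwise want a temp variable to inspect. The sketch below runs both styles on a few made-up log lines (the sample data is an assumption, not real output) and confirms they agree.

```ruby
# Hypothetical sample input: timestamp first, then level, then message
lines = [
  "  10:01 ERROR disk full  ",
  "",
  "10:02 INFO started",
  "10:01 ERROR disk full",
  "09:59 ERROR timeout"
]

# Chained, with a tap to peek at an intermediate value (goes to STDERR)
chained = lines
  .map(&:strip)
  .reject(&:empty?)
  .tap { |kept| warn "non-empty lines: #{kept.size}" }
  .select { _1 =~ %r~ERROR~ }
  .map { _1.split[0] }
  .uniq
  .sort

# Same steps with temp variables
stripped   = lines.map(&:strip)
non_empty  = stripped.reject(&:empty?)
errors     = non_empty.select { _1 =~ %r~ERROR~ }
timestamps = errors.map { _1.split[0] }
result     = timestamps.uniq.sort

p chained            # => ["09:59", "10:01"]
p chained == result  # => true
```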

Part 4: A Real Text Processing Pipeline

#!/usr/bin/env ruby

# Process Apache access log - extract top IPs with errors

File.readlines("/var/log/apache2/access.log")
  .map(&:chomp)
  .select { _1 =~ %r~ [45]\d{2} ~ }          # 4xx and 5xx responses
  .map { _1.split[0] }                         # extract IP (first field)
  .tally                                        # count occurrences
  .sort_by { |ip, count| -count }              # sort by count descending
  .first(10)                                    # top 10
  .each { |ip, count| printf "%6d  %s\n", count, ip }

Twelve lines, shebang and comment included. It reads the file, filters error responses, extracts IPs, counts them, sorts by frequency, takes the top 10, and prints a formatted report. No temporary variables. No explicit loops. Just data flowing through transformations.
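To try the pipeline without a real log file, here is a sketch that feeds it a handful of invented lines in Apache's combined log format. The SAMPLE_LOG data is made up. The sketch also swaps in a slightly stricter pattern than the script above: it anchors the status code to the closing quote of the request, on the assumption that a byte count such as 404 should not be mistaken for a status.

```ruby
# Hypothetical log lines (documentation IP range, invented requests)
SAMPLE_LOG = [
  '192.0.2.1 - - [25/Mar/2026:10:00:00 +0000] "GET /a HTTP/1.1" 404 512 "-" "curl/8.0"',
  '192.0.2.2 - - [25/Mar/2026:10:00:01 +0000] "GET /b HTTP/1.1" 200 1024 "-" "curl/8.0"',
  '192.0.2.1 - - [25/Mar/2026:10:00:02 +0000] "POST /c HTTP/1.1" 500 0 "-" "curl/8.0"',
  '192.0.2.3 - - [25/Mar/2026:10:00:03 +0000] "GET /d HTTP/1.1" 403 77 "-" "curl/8.0"',
  '192.0.2.2 - - [25/Mar/2026:10:00:04 +0000] "GET /e HTTP/1.1" 200 404 "-" "curl/8.0"'
]

top = SAMPLE_LOG
  .map(&:chomp)
  .select { _1 =~ %r~" [45]\d{2} ~ }    # status right after the quoted request
  .map { _1.split[0] }                  # extract IP (first field)
  .tally                                # count occurrences
  .sort_by { |_ip, count| -count }      # sort by count descending
  .first(10)                            # top 10

top.each { |ip, count| printf "%6d  %s\n", count, ip }
p top
# => [["192.0.2.1", 2], ["192.0.2.3", 1]]
```

The last sample line has a 200 status and a 404-byte body. The looser pattern would count it as an error; the anchored one does not.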

Part 5: Implicit Return

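Ruby methods automatically return the value of the last expression they evaluate. Since every method call hands back a value, the next call in a chain always has something to work on: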

def double(n)
  n * 2       # no 'return' needed
end

# This works because each method returns a value that feeds the next
"hello".upcase.reverse.chars.first
# => "O"

Perl subs also return the value of their last evaluated expression, but an explicit return is the more common style there. In Ruby, nearly every expression evaluates to a value, including if and case, and that's what makes pipelines possible.
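As a small sketch of that point: an if expression is itself a value, so a method can end on one and still feed a chain. The classify and shout methods below are hypothetical names used only for illustration.

```ruby
# The if/elsif/else expression is the last thing evaluated,
# so whichever branch runs supplies the return value.
def classify(status)
  if status >= 500
    :server_error
  elsif status >= 400
    :client_error
  else
    :ok
  end
end

def shout(text)
  text.strip.upcase + "!"  # last expression, returned implicitly
end

p classify(404)                         # => :client_error
p "  hello ".then { shout(_1) }         # => "HELLO!"
p [200, 404, 503].map { classify(_1) }  # => [:ok, :client_error, :server_error]
```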