Pipeline Style: then, tap, and the Art of Chaining

2026-03-25

Ruby's then method lets you pipe data through transformations like Unix pipes. tap lets you peek at data mid-chain without changing it. Combined with method chaining, you can build entire text processing pipelines that read top-to-bottom, left-to-right, just like a shell script.

Part 1: then / yield_self

then passes the receiver into a block and returns the block's result. Think of it like a Unix pipe: data flows through transformations.
    # Pipeline style - like Unix pipes
    result = data
      .then { |d| parse(d) }
      .then { |d| transform(d) }
      .then { |d| format(d) }

    # One-liner version (JSON.parse needs require "json")
    File.read(path).then { JSON.parse(_1) }.then { _1["users"] }

    # Practical: read, parse, extract
    config = File.read("/etc/app.conf")
      .then { |text| text.split("\n") }
      .then { |lines| lines.reject { _1.strip.empty? || _1 =~ %r~^#~ } }  # skip blanks and comments
      .then { |lines| lines.map { _1.split("=", 2) } }
      .then { |pairs| pairs.to_h }                                        # to_h needs [key, value] pairs
No Perl equivalent. You'd use nested function calls or temporary variables. Ruby just lets the data flow.
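Here is a minimal sketch of the same idea that runs without touching the filesystem. The config_text heredoc is a made-up stand-in for a real file, and the key names in it are assumptions for illustration only.

```ruby
# Hypothetical config text; stands in for the contents of a real file.
config_text = <<~CONF
  # database settings
  host=localhost
  port=5432

  url=postgres://localhost/app?sslmode=require
CONF

# Each then call receives the previous step's result and returns a new value.
config = config_text
  .then { |text| text.lines.map(&:strip) }
  .then { |lines| lines.reject { |l| l.empty? || l.start_with?("#") } }
  .then { |lines| lines.map { |l| l.split("=", 2) } }
  .then { |pairs| pairs.to_h }

puts config.inspect
```

The limit of 2 on split matters here: the url value contains its own "=", and splitting only on the first one keeps the value intact.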

yield_self is the original name, introduced in Ruby 2.5. Ruby 2.6 added then as an alias for it. Prefer then in new code.

Part 2: tap (Debug Mid-Chain)

tap passes the receiver into a block but returns THE ORIGINAL OBJECT, not the block's result. Perfect for debugging without breaking the chain:
    # Inspect data at each step without changing it
    data.map(&:chomp)
      .tap { |x| STDERR.puts "After chomp: #{x.size} lines" }
      .select { _1 =~ %r~error~i }
      .tap { |x| STDERR.puts "After filter: #{x.size} lines" }
      .map(&:downcase)
      .tap { |x| STDERR.puts "Final: #{x.first(3).inspect}" }

The key difference: then returns the block's result, while tap returns the receiver itself and ignores whatever the block returns.

tap is printf debugging for method chains. You'll use it more than you think.
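A small side-by-side sketch makes the contrast concrete. The words array and the seen array (used here in place of STDERR so the output can be inspected) are invented for the example.

```ruby
words = %w[apple banana cherry]

# then returns whatever the block returns.
via_then = words.then { |w| w.size }

# tap runs the block, discards its return value, and hands back the receiver.
via_tap = words.tap { |w| w.size }

# tap in a chain: record sizes along the way without disturbing the data.
seen = []   # hypothetical log sink instead of STDERR
result = words
  .map(&:upcase)
  .tap { |x| seen << x.size }
  .select { |w| w.include?("AN") }
  .tap { |x| seen << x.size }

puts via_then.inspect
puts via_tap.inspect
puts result.inspect
puts seen.inspect
```

Swap either tap for then in the chain and the next step would receive the seen array instead of the list of words, which is exactly the mistake tap exists to prevent.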

Part 3: Method Chaining Best Practices

    # Good: clean pipeline
    lines
      .map(&:strip)
      .reject(&:empty?)
      .select { _1 =~ %r~ERROR~ }
      .map { _1.split[0] }
      .uniq
      .sort

    # vs. Perl-style temp variables (also fine, more familiar):
    stripped   = lines.map(&:strip)
    non_empty  = stripped.reject(&:empty?)
    errors     = non_empty.select { _1 =~ %r~ERROR~ }
    timestamps = errors.map { _1.split[0] }
    result     = timestamps.uniq.sort
Both approaches are valid. Chaining is more concise; temp variables are more debuggable. Pick what reads best for your situation.
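To check that the two styles really are interchangeable, here is a sketch that runs both over the same input. The log lines are fabricated sample data, and the assumption is that the timestamp is the first whitespace-separated field.

```ruby
# Made-up log lines: timestamp first, then level, then message.
lines = [
  "  2026-03-25T10:00:01 ERROR disk full  ",
  "",
  "2026-03-25T10:00:02 INFO started",
  "2026-03-25T10:00:01 ERROR disk full",
  "2026-03-25T09:59:58 ERROR timeout"
]

# Chained version.
chained = lines
  .map(&:strip)
  .reject(&:empty?)
  .select { |l| l.include?("ERROR") }
  .map { |l| l.split[0] }
  .uniq
  .sort

# Temp-variable version: same steps, each one named.
stripped   = lines.map(&:strip)
non_empty  = stripped.reject(&:empty?)
errors     = non_empty.select { |l| l.include?("ERROR") }
timestamps = errors.map { |l| l.split[0] }
stepwise   = timestamps.uniq.sort

puts chained.inspect
puts chained == stepwise
```

The named intermediates cost a few extra lines but give you something to inspect in a debugger; the chain gets the same visibility by dropping a tap between steps.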

Part 4: A Real Text Processing Pipeline

    #!/usr/bin/env ruby
    # Process Apache access log - extract top IPs with errors
    File.readlines("/var/log/apache2/access.log")
      .map(&:chomp)
      .select { _1 =~ %r~" [45]\d{2} ~ }     # 4xx and 5xx status, right after the quoted request
      .map { _1.split[0] }                   # extract IP (first field)
      .tally                                 # count occurrences
      .sort_by { |ip, count| -count }        # sort by count descending
      .first(10)                             # top 10
      .each { |ip, count| printf "%6d %s\n", count, ip }
One expression, eight chained calls. It reads the file, filters error responses, extracts IPs, counts them, sorts by frequency, takes the top 10, and prints a formatted report. No temporary variables. No explicit loops. Just data flowing through transformations.
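If you want to try the pipeline without a real access log, the following sketch wraps it in a method and feeds it a few invented log lines. The method name top_error_ips and the sample entries are assumptions for illustration. The sketch also assumes the status code follows directly after the closing quote of the request, so it anchors the match there to avoid confusing a response size such as 512 with a status code. tally requires Ruby 2.7 or later.

```ruby
# Invented sample entries in Apache common log format.
sample_log = [
  '10.0.0.1 - - [25/Mar/2026:10:00:00 +0000] "GET /a HTTP/1.1" 404 512',
  '10.0.0.2 - - [25/Mar/2026:10:00:01 +0000] "GET /b HTTP/1.1" 200 512',
  '10.0.0.1 - - [25/Mar/2026:10:00:02 +0000] "GET /c HTTP/1.1" 500 128',
  '10.0.0.3 - - [25/Mar/2026:10:00:03 +0000] "POST /d HTTP/1.1" 503 64'
]

# Hypothetical helper: returns [ip, count] pairs, most frequent first.
def top_error_ips(lines, limit = 10)
  lines
    .map(&:chomp)
    .select { |line| line =~ /" [45]\d{2} / }   # status right after the quoted request
    .map { |line| line.split[0] }               # IP is the first field
    .tally
    .sort_by { |_ip, count| -count }
    .first(limit)
end

top = top_error_ips(sample_log)
top.each { |ip, count| printf "%6d %s\n", count, ip }
```

Returning the pairs instead of printing inside the method keeps the pipeline testable; the final each is the only step with a side effect.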

Part 5: Implicit Return

Ruby methods return the last expression automatically, which enables chaining:
    def double(n)
      n * 2   # no 'return' needed
    end

    # This works because each method returns a value that feeds the next
    "hello".upcase.reverse.chars.first   # => "O"

Perl subs also return the value of the last evaluated expression, but an explicit return is the more common style there. In Ruby, nearly everything is an expression that evaluates to a value, and that's what makes pipelines possible.
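As a rough illustration, here are three small methods with no return keyword anywhere, including one whose value comes from an if expression. The method names and the sample sentence are made up for the example.

```ruby
# The last expression is the return value.
def normalize(text)
  text.strip.downcase
end

# Builds on normalize; the chain's final result is what the method returns.
def word_counts(text)
  normalize(text).split.tally
end

# if is an expression too: whichever branch runs supplies the value.
def label(n)
  if n > 1
    "repeated"
  else
    "unique"
  end
end

counts = word_counts("  The cat saw the other cat ")
labels = counts.transform_values { |n| label(n) }

puts counts.inspect
puts labels.inspect
```

Because every method hands back a usable value, the caller can keep chaining (transform_values here) without any glue code in between.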

Created By: Wildcard Wizard. Copyright 2026