SeattleDataGuy’s Newsletter • 906 implied HN points • 23 Feb 26
- Backfills are an unavoidable part of data work — you need them when source data is corrected, pipelines have bugs, or schemas and logic change.
- They’re hated because they can be expensive, slow, and risky at scale, can disrupt downstream users, and erode stakeholder trust when numbers shift unexpectedly.
- Design for safe backfills by building parameterized, rerunnable pipelines, adding strong data quality checks, communicating changes clearly, and using table-swaps or other strategies when partitions or immutable storage formats make in-place fixes risky.