The hottest Data Engineering Substack posts right now

And their main takeaways
Data Engineering Central 137 implied HN points 20 Mar 23
  1. Future-proof yourself against AI to stay relevant in the changing landscape of software engineering.
  2. There are three types of people when it comes to AI and programming: those who don't use AI and dismiss it, those who use it to enhance their work, and those who rely on it completely and may become less effective engineers.
  3. The impact of AI on software engineering is inevitable and will lead to changes in the field over time.
Sung’s Substack 139 implied HN points 14 Mar 23
  1. Data engineering involves many tedious tasks and manual checks, which make it hard to reach a state of flow.
  2. Software engineers have smoother workflows and better tools than data engineers, allowing them to focus on their work and enjoy the process.
  3. The data engineering workflow could be improved with real-time monitoring, interactive previews, and streamlined processes.
timo's substack 78 implied HN points 26 Mar 23
  1. Finding a niche involves identifying what you enjoy and what is consistently needed in your projects.
  2. Tracking data is easily understood, but may have a negative reputation due to its association with web tracking practices.
  3. Measurement is a broader term than tracking, and data collection is often overlooked in the data engineering process.
Counting Stuff 54 implied HN points 11 Jul 23
  1. It is beneficial to have familiarity with running a small server to learn skills and appreciate the work of Ops and SRE professionals.
  2. Consider the value of running a small server for hosting personal projects like a homepage or resume.
  3. Exploring web-based RSS apps can help manage information overload and stay updated with blogs and newsletters.
Software Snack Bites 50 implied HN points 28 Jun 23
  1. Memphis provides a better developer experience for stream processing.
  2. Memphis is designed for quick setup, cost efficiency, and user-friendly monitoring.
  3. Memphis is a platform of choice for companies looking to replace or enhance their streaming platforms.
MLOps Newsletter 39 implied HN points 20 Feb 23
  1. Google open-sourced their blackbox optimization library named Vizier for reliable tuning and optimization.
  2. Pinterest introduced Lightweight Ranking to recommend Pins with better relevance and build scalable ML models.
  3. Netflix uses ML to predict Out of Memory issues in production, overcoming data engineering challenges like structuring data.
Data Products 5 implied HN points 08 Jan 24
  1. Data quality is crucial for machine learning projects, and poor data quality can have negative impacts on both society and individuals.
  2. Advances in Generative AI highlight the importance of high-quality data and the potential shortage of such data.
  3. Data quality affects the machine learning product development cycle, including ongoing maintenance costs of ML pipelines.
Data Products 2 implied HN points 27 Feb 24
  1. Chad Sanderson announced an upcoming book on Data Contracts with O'Reilly, covering topics like what data contracts are, how they work, implementation, examples, and the future implications. The book will delve into Data Quality and Governance.
  2. The first two chapters of the book are available for free on the O'Reilly website. They cover the importance of data contracts and the real goals of data quality initiatives, totaling about 45 pages of content.
  3. Chad Sanderson is currently selecting technical reviewers for the book. Interested individuals can reach out to him to share their thoughts on an advance copy.
Data Products 3 implied HN points 04 Dec 23
  1. Producers need to move towards consumer-defined data contracts to improve data quality and alignment with user needs.
  2. A phased approach of awareness, collaboration, and contract ownership helps in successful data contract adoption.
  3. Starting with consumer-defined contracts drives communication, awareness, and problem visibility, leading to long-term benefits.
Data Products 1 HN point 07 Jul 23
  1. Data requires a source of truth that microservices cannot inherently provide without a shift in software engineering practices.
  2. Not all data is equally valuable, so treating all data as microservices can be costly and restrictive.
  3. The data development lifecycle differs from software development, requiring flexibility, reuse, and tight coupling that conflict with typical microservices architecture.
Bytewax 0 implied HN points 20 Apr 23
  1. Writing a custom input connector for Bytewax involves answering important questions about partitions, source building, and resuming state.
  2. Using Bytewax's recovery system for failure recovery requires proper snapshotting and an understanding of how to resume reading from a specific spot.
  3. Delivery guarantees in Bytewax are at-least-once by default, and ensuring exactly-once processing may require coordination with the output connector.
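
The link between snapshotting, resuming, and at-least-once delivery can be sketched in plain Python. This is a toy simulation and not Bytewax's API: `run_with_snapshots`, `snapshot_every`, and `crash_at` are hypothetical names chosen for illustration.

```python
from typing import List, Optional, Tuple


def run_with_snapshots(source: List[str], start: int, snapshot_every: int,
                       crash_at: Optional[int], sink: List[str]) -> Tuple[int, bool]:
    """Read `source` from offset `start`, appending each item to `sink`.

    Returns (last_snapshot_offset, finished). A snapshot stores the next
    offset to read and is taken after every `snapshot_every` items.
    """
    last_snapshot = start
    for offset in range(start, len(source)):
        if crash_at is not None and offset == crash_at:
            # Simulated failure: anything read since the last snapshot
            # will be read again after the restart.
            return last_snapshot, False
        sink.append(source[offset])
        if (offset + 1) % snapshot_every == 0:
            last_snapshot = offset + 1
    return len(source), True


source = ["a", "b", "c", "d", "e"]
sink: List[str] = []

# First run fails at offset 3; the last snapshot points at offset 2.
resume_from, finished = run_with_snapshots(source, 0, 2, crash_at=3, sink=sink)

# Second run resumes from the snapshot, so "c" is delivered a second time.
run_with_snapshots(source, resume_from, 2, crash_at=None, sink=sink)

print(sink)  # ['a', 'b', 'c', 'c', 'd', 'e']
```

Every item reaches the sink at least once, but "c" arrives twice. Getting an exactly-once effect would need help from the output side, for example idempotent writes keyed by offset.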
Reflective Software Engineering 0 implied HN points 12 Jan 24
  1. Having unit tests for SQL queries can help catch bugs introduced during code refactorings or changes.
  2. When writing unit tests for SQL queries, focus on testing the specific parts responsible for building the query rather than the entire method.
  3. Refactoring code for testability can involve moving pure functions outside of the class for easier testing and simplifying methods to focus on specific tasks.
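
As a minimal sketch of the last two takeaways, assume a hypothetical `build_user_query` helper (not taken from the original post). The query-building logic is a pure function outside any class, so a test can check the SQL text and parameters without a database connection.

```python
from typing import List, Optional, Tuple


def build_user_query(min_age: Optional[int] = None,
                     country: Optional[str] = None) -> Tuple[str, List[object]]:
    """Hypothetical pure function: builds a parameterized query, touches no database."""
    clauses: List[str] = []
    params: List[object] = []
    if min_age is not None:
        clauses.append("age >= ?")
        params.append(min_age)
    if country is not None:
        clauses.append("country = ?")
        params.append(country)
    sql = "SELECT id, name FROM users"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


def test_build_user_query() -> None:
    # No filters: the base query is returned unchanged.
    assert build_user_query() == ("SELECT id, name FROM users", [])
    # Both filters: clauses are joined with AND, parameters keep their order.
    sql, params = build_user_query(min_age=18, country="DE")
    assert sql == "SELECT id, name FROM users WHERE age >= ? AND country = ?"
    assert params == [18, "DE"]


test_build_user_query()
```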
Reflective Software Engineering 0 implied HN points 30 Dec 23
  1. Test-driven development (TDD) is a valuable tool for ensuring software quality and driving great software design.
  2. Testing data integrations and clients, especially in complex data platforms, can be challenging due to less control over underlying databases. Strategies like mocking HTTP interactions can help in testing.
  3. Separating concerns and creating small, testable units of code can enhance confidence in the system, reduce fear of regression, and improve overall software quality.
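
One way to mock HTTP interactions, sketched under assumptions: `TableClient` and the `/tables/<name>/stats` path are hypothetical and stand in for whatever client the post tests. The HTTP call is injected, so the test replaces it with `unittest.mock.Mock` and never touches the network.

```python
from typing import Callable
from unittest.mock import Mock


class TableClient:
    """Hypothetical client for a data platform's HTTP API."""

    def __init__(self, http_get: Callable[[str], dict]) -> None:
        # The HTTP call is injected so that tests can substitute a fake.
        self._http_get = http_get

    def row_count(self, table: str) -> int:
        payload = self._http_get(f"/tables/{table}/stats")
        return int(payload["row_count"])


# The fake returns a canned payload instead of performing a request.
fake_get = Mock(return_value={"row_count": "42"})
client = TableClient(http_get=fake_get)

assert client.row_count("orders") == 42
fake_get.assert_called_once_with("/tables/orders/stats")
```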
Three Data Point Thursday 0 implied HN points 15 Jun 23
  1. Building products with LLMs is challenging and requires addressing multiple issues.
  2. PandasAI offers AI-powered features for data analysis, focusing on integrating LLMs smartly into products.
  3. Consider switching to SQLMesh from dbt, especially if you are a data engineer or data scientist needing a more developer-focused analytics tool.
ingest this! 0 implied HN points 12 Mar 24
  1. Rust is reshaping data engineering by offering performance, safety, and concurrency, making it a strong contender alongside languages like Python.
  2. Learning Rust through 'The Rust Programming Language' book provides a solid foundation, with hands-on projects to enhance understanding.
  3. Mathesar is an open-source tool providing a spreadsheet-like interface to PostgreSQL databases, making data collaboration easier and more accessible.
Tributary Data 0 implied HN points 28 Aug 23
  1. Data scrubbing in streaming data pipelines is essential for cleaning and processing data in real-time to ensure it's ready for consumption.
  2. In-broker data transformations powered by WebAssembly (Wasm) are revolutionizing how data processing tasks are handled in streaming data platforms, reducing dependency on external systems.
  3. WebAssembly (Wasm) provides developers with flexibility, performance, security, and portability benefits for server-side processing in frameworks like Redpanda Data Transforms, streamlining data processing tasks within brokers.