The hottest ETL Substack posts right now

And their main takeaways
The Orchestra Data Leadership Newsletter 79 implied HN points 25 Feb 24
  1. ETL (Extract-Transform-Load) and ELT (Extract-Load-Transform) have been the key data engineering paradigms, but with the rise of the cloud the need for in-transit data transformation has decreased (the two approaches are contrasted in the sketch after this list).
  2. Fivetran, a widely known data company, is potentially shifting back to ETL methods by offering pre-built transformation features, effectively simplifying the data modeling process for users.
  3. This points to a possible resurgence of ETL practices in the data industry, with vendors like Fivetran leading the way by offering ETL-like services within their platforms.
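To make the paradigm difference concrete, here is a minimal Python sketch contrasting ETL, where rows are cleaned in flight, with ELT, where raw rows are landed first and transformed with SQL inside the warehouse. An in-memory SQLite database stands in for a cloud warehouse, and the table and column names are made up for illustration; this is not any vendor's API.

```python
import sqlite3

def extract_rows():
    """Hypothetical raw rows pulled from a source system."""
    return [("A@Example.com", "19.99"), ("b@example.com", "5.00")]

def etl(conn):
    """ETL: clean rows in flight, then load only the finished table."""
    cleaned = [(email.lower(), float(amount)) for email, amount in extract_rows()]
    conn.execute("CREATE TABLE orders_clean_etl (email TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders_clean_etl VALUES (?, ?)", cleaned)

def elt(conn):
    """ELT: land raw rows untouched, then transform inside the warehouse with SQL."""
    conn.execute("CREATE TABLE orders_raw (email TEXT, amount TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?)", extract_rows())
    conn.execute(
        "CREATE TABLE orders_clean_elt AS "
        "SELECT LOWER(email) AS email, CAST(amount AS REAL) AS amount FROM orders_raw"
    )

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a cloud warehouse
    etl(conn)
    elt(conn)
    print(conn.execute("SELECT * FROM orders_clean_elt").fetchall())
```

The ELT branch is what cheap cloud storage and elastic compute made attractive: landing raw data and transforming it later is often simpler than maintaining transformation logic in transit.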
The Orchestra Data Leadership Newsletter 39 implied HN points 09 Jan 24
  1. The article discusses building a data release pipeline to analyze Hubspot data using Coalesce, a no-code ELT tool on Snowflake.
  2. A key issue was Hubspot's data model, which made it hard to consolidate form-fill data and messages into a meaningful view.
  3. Setting up Coalesce involves defining storage mappings, granting access to Coalesce users, and carefully separating development and production environments so data is not overwritten (a rough sketch of the kind of Snowflake grants involved follows this list).
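The setup steps in point 3 boil down to Snowflake objects and grants. The sketch below, using the Snowflake Python connector, shows the general shape with entirely hypothetical names (COALESCE_ROLE, COALESCE_SVC, ANALYTICS_DEV/PROD, TRANSFORM_WH); it is not the post's actual configuration, and an admin role is assumed.

```python
import os
import snowflake.connector

# Entirely hypothetical object names; substitute your own.
SETUP_STATEMENTS = [
    # Separate databases per environment so dev runs cannot overwrite prod data.
    "CREATE DATABASE IF NOT EXISTS ANALYTICS_DEV",
    "CREATE DATABASE IF NOT EXISTS ANALYTICS_PROD",
    # Role assumed by the Coalesce service user.
    "CREATE ROLE IF NOT EXISTS COALESCE_ROLE",
    "GRANT USAGE ON WAREHOUSE TRANSFORM_WH TO ROLE COALESCE_ROLE",
    "GRANT ALL ON DATABASE ANALYTICS_DEV TO ROLE COALESCE_ROLE",
    "GRANT USAGE ON DATABASE ANALYTICS_PROD TO ROLE COALESCE_ROLE",
    "GRANT ROLE COALESCE_ROLE TO USER COALESCE_SVC",
]

conn = snowflake.connector.connect(
    account="my_account",
    user="ADMIN_USER",
    password=os.environ["SNOWFLAKE_PASSWORD"],
    role="ACCOUNTADMIN",  # an admin role is assumed for object creation and grants
)
for statement in SETUP_STATEMENTS:
    conn.cursor().execute(statement)
conn.close()
```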
The Orchestra Data Leadership Newsletter 39 implied HN points 30 Dec 23
  1. Data teams are increasingly turning to low-code solutions to streamline data release pipelines, using tools like Airflow while questioning the need for extensive code writing and infrastructure maintenance.
  2. The complex cloud environment has led to the development of specialized data tools, making the orchestration of data pipelines challenging and highlighting the importance of governance, data quality, and scalability.
  3. No-code solutions like dbt core and Hightouch are already integrated into many data tools, simplifying orchestration and suggesting that future data architectures will pair workflow orchestrators with efficient data quality checks (a toy sketch of that pattern follows this list).
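As a toy illustration of the "workflow orchestrator plus data quality checks" pattern, the Airflow DAG below (assuming Airflow 2.x) chains a dbt run, a placeholder row-count gate, and a downstream sync trigger. The commands and task names are illustrative stand-ins, not the actual integration points of dbt core or Hightouch.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

def check_row_counts():
    """Placeholder data-quality gate: fail the run if a model came back empty."""
    rows = 42  # pretend this is the result of SELECT COUNT(*) against the warehouse
    if rows == 0:
        raise ValueError("orders model is empty; blocking the downstream sync")

with DAG(
    dag_id="data_release_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    run_models = BashOperator(task_id="dbt_run", bash_command="dbt run")
    quality_gate = PythonOperator(task_id="row_count_check", python_callable=check_row_counts)
    # Stand-in for kicking off a reverse-ETL sync (e.g. Hightouch) via its API or CLI.
    trigger_sync = BashOperator(task_id="trigger_sync", bash_command="echo 'trigger sync'")

    run_models >> quality_gate >> trigger_sync
```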
davidj.substack 47 implied HN points 23 Feb 24
  1. Real-time data streaming from databases like MySQL to data warehouses such as Snowflake can significantly reduce analytics latency, making data processing faster and more efficient.
  2. Streamkap offers a cost-effective option for streaming ETL, promising to be both faster and more affordable than traditional tools like Fivetran.
  3. Implementing Streamkap in a data architecture can yield substantial improvements, such as cutting data update lag to under 5 minutes and delivering real-time analytics value to customers (a rough way to measure that lag is sketched below).
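Whatever CDC tool sits in the middle, end-to-end lag can be sanity-checked by writing a timestamped heartbeat row into MySQL and polling Snowflake until it arrives. The sketch below does that with generic connector libraries; the hosts, credentials, and heartbeat table are hypothetical, and this is not Streamkap's API.

```python
import time
from datetime import datetime, timezone

import pymysql
import snowflake.connector

# Hypothetical connection details and table names.
mysql_conn = pymysql.connect(host="mysql-host", user="app", password="...", database="app")
sf_conn = snowflake.connector.connect(account="acct", user="ANALYST", password="...")

marker = datetime.now(timezone.utc).isoformat()

# 1. Write a heartbeat row to the source database.
with mysql_conn.cursor() as cur:
    cur.execute("INSERT INTO heartbeat (marker, created_at) VALUES (%s, NOW())", (marker,))
mysql_conn.commit()

# 2. Poll the warehouse until the CDC pipeline has delivered the row.
start = time.time()
while True:
    found = sf_conn.cursor().execute(
        "SELECT 1 FROM RAW.APP.HEARTBEAT WHERE MARKER = %s", (marker,)
    ).fetchall()
    if found:
        print(f"replication lag: roughly {time.time() - start:.0f} seconds")
        break
    time.sleep(15)
```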
Tributary Data 0 implied HN points 29 Sep 22
  1. Stateful stream processors and streaming databases have different approaches in handling data ingestion and state persistence.
  2. Stream processors require state manipulation logic to be known and embedded in advance, while streaming databases let consumers manipulate state ad hoc (the contrast is sketched after this list).
  3. Stream processors are ideal for automated, machine-driven decision-making, while streaming databases cater to human decision-makers needing fast, ad-hoc data access.
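A toy way to see the contrast without assuming any particular engine: the "stream processor" below fixes its aggregation logic before events arrive, while the "streaming database" simply persists raw events so a consumer can ask new questions ad hoc afterwards.

```python
import sqlite3

events = [
    {"username": "a", "amount": 10},
    {"username": "b", "amount": 7},
    {"username": "a", "amount": 3},
]

# Stream-processor style: the aggregation (state) logic is fixed before events arrive.
running_totals = {}  # the embedded, pre-planned state
for event in events:  # imagine this loop consuming a Kafka topic
    running_totals[event["username"]] = running_totals.get(event["username"], 0) + event["amount"]
print("pre-planned aggregate:", running_totals)

# Streaming-database style: persist raw events, let consumers query ad hoc later.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (username TEXT, amount INT)")
db.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(e["username"], e["amount"]) for e in events],
)
# A human analyst can now ask a question nobody anticipated when the pipeline was built.
print(db.execute("SELECT username, MAX(amount) FROM events GROUP BY username").fetchall())
```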
realkinetic 0 implied HN points 15 Oct 20
  1. AWS Glue is a managed service for building ETL jobs on AWS, eliminating the need to manage server infrastructure and making it easy to implement analytics pipelines.
  2. Automating the deployment process of Glue jobs with a CI/CD pipeline, using tools like GitHub Actions, can streamline the workflow and ensure continuous deployment of ETL processes.
  3. With GitHub Actions you can convert Jupyter notebooks to Python scripts, upload them to S3, update the Glue job definition, and configure the AWS CLI for deployment, keeping the process efficient and repeatable (a boto3 sketch of those steps follows this list).
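The deployment step can be approximated with nbconvert and boto3, as sketched below with hypothetical bucket, job, and notebook names. In the post this logic runs inside a GitHub Actions workflow after AWS credentials are configured for the CLI; note that Glue's update_job replaces the job definition, so a real script should carry over any other fields returned by get_job.

```python
import subprocess

import boto3

BUCKET = "my-glue-scripts"            # hypothetical S3 bucket
JOB_NAME = "orders_etl"               # hypothetical Glue job
NOTEBOOK = "notebooks/orders_etl.ipynb"
SCRIPT = "notebooks/orders_etl.py"

# 1. Convert the Jupyter notebook into a plain Python script.
subprocess.run(["jupyter", "nbconvert", "--to", "script", NOTEBOOK], check=True)

# 2. Upload the generated script to S3 where Glue can read it.
boto3.client("s3").upload_file(SCRIPT, BUCKET, "scripts/orders_etl.py")

# 3. Point the existing Glue job at the new script location.
glue = boto3.client("glue")
current = glue.get_job(JobName=JOB_NAME)["Job"]
glue.update_job(
    JobName=JOB_NAME,
    JobUpdate={
        "Role": current["Role"],
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": f"s3://{BUCKET}/scripts/orders_etl.py",
            "PythonVersion": "3",
        },
    },
)
```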