Three Data Point Thursday

Three Data Point Thursday is a Substack dedicated to enhancing business intelligence through data and AI. It explores the strategic implementation of data teams, AI advancements, data analytics, synthetic data, and open-source contributions to building data-driven companies. The newsletter emphasizes practical approaches for leveraging data for business value, innovation, and efficiency.

Data Strategy, Artificial Intelligence, Data Analytics, Business Intelligence, Open Source in Data, Synthetic Data, Data Engineering, Machine Learning, Data-Driven Decision Making, Community Building in Tech

The hottest Substack posts of Three Data Point Thursday

And their main takeaways
0 implied HN points • 09 Jun 22
  1. The last mile of analytics is crucial for startups but may not be the main bottleneck for all companies.
  2. Encrypted Spark allows for computations on encrypted datasets without significant speed issues.
  3. Data observability, inspired by software engineering, helps determine system health based on outputs.
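Observability judges a data system by its outputs rather than its internals. The sketch below assumes a table's health can be summarized by two output signals, freshness (time since last load) and volume (row count). The `TableStats` type, the `check_health` helper, and the thresholds are all hypothetical and do not come from any particular observability tool:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class TableStats:
    """Observed outputs of a pipeline run (hypothetical fields)."""
    last_loaded_at: datetime
    row_count: int

def check_health(stats, now, max_staleness, expected_rows, tolerance=0.5):
    """Return a list of problems detected purely from the table's outputs."""
    problems = []
    # Freshness: has the table been updated recently enough?
    if now - stats.last_loaded_at > max_staleness:
        problems.append("stale")
    # Volume: is the row count within +/- tolerance of the expected count?
    if abs(stats.row_count - expected_rows) > tolerance * expected_rows:
        problems.append("volume_anomaly")
    return problems

now = datetime(2022, 6, 9, 12, 0, tzinfo=timezone.utc)
stats = TableStats(last_loaded_at=now - timedelta(hours=30), row_count=400)
print(check_health(stats, now, max_staleness=timedelta(hours=24), expected_rows=1000))
# -> ['stale', 'volume_anomaly']
```

Neither check looks inside the pipeline code. Both infer health from what the pipeline produced, which is the software-engineering idea the takeaway refers to.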
0 implied HN points • 06 Jan 22
  1. The post covers NFTs along with data, product thinking, and nbdev.
  2. The post is by Sven Balnojan, shared on January 6, 2022.
  3. For more information, visit www.thdpth.com.
0 implied HN points • 07 Oct 21
  1. Matt Turck published an extensive map of the data landscape, showing how much the space has grown over the past few years.
  2. dbt's Event Logging package keeps track of model runs and provides useful metrics for analyzing them.
  3. HomeToGo's use of Apache Airflow showcases the tool's prominence in data orchestration, emphasizing developer experience.
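Once model runs are logged, simple metrics fall out of the log. The sketch below assumes a hypothetical list of start and completion events and computes how long each model took. The event names and row layout are illustrative only and are not the package's actual audit-table schema:

```python
from datetime import datetime

# Hypothetical audit-log rows: (event_name, model, timestamp).
# The real package's table layout may differ; these names are assumptions.
events = [
    ("model deployment started", "orders", datetime(2021, 10, 7, 6, 0, 0)),
    ("model deployment completed", "orders", datetime(2021, 10, 7, 6, 2, 30)),
    ("model deployment started", "customers", datetime(2021, 10, 7, 6, 2, 30)),
    ("model deployment completed", "customers", datetime(2021, 10, 7, 6, 3, 0)),
]

def model_durations(rows):
    """Pair start/complete events per model and return durations in seconds."""
    started = {}
    durations = {}
    for event, model, ts in rows:
        if event == "model deployment started":
            started[model] = ts
        elif event == "model deployment completed" and model in started:
            durations[model] = (ts - started.pop(model)).total_seconds()
    return durations

print(model_durations(events))
# -> {'orders': 150.0, 'customers': 30.0}
```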
0 implied HN points • 28 Oct 21
  1. Hex notebooks are worth a try for storytelling and building small apps.
  2. Hex has a strong vision for the future of the data market.
  3. The levels used to classify autonomous driving can serve as a useful framework for designing autonomous systems.
0 implied HN points • 02 Sep 21
  1. Building a data platform involves stages such as ingestion, storage, and analytics, each with its own specialized tools.
  2. There is ongoing debate about dbt's future importance and the evolving role of data lakes and SQL in the data ecosystem.
  3. Commercial open-source companies like TimescaleDB emphasize the importance of broad adoption and credibility before monetization.
0 implied HN points • 10 Jun 21
  1. Revive your dashboards by creating a single source of truth and replacing old reports with curated ones.
  2. Consider trying TerminusDB, a graph document store with built-in versioning and a simple approach.
  3. Focus on improving the developer experience for data professionals, for example by making data easier to track.
0 implied HN points • 11 Feb 21
  1. lakeFS provides versioning and branching for data lakes.
  2. Declarative DAG tools like Boundary-layer help simplify data pipelines.
  3. SQLPad offers easy SQL editing with versioned connections.
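A declarative DAG tool lets you describe *what* depends on *what* as plain data, and the tool works out the execution order. The sketch below uses a hypothetical Python dictionary as the spec and resolves it with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The task names are made up, and real tools such as Boundary-layer use their own configuration formats:

```python
from graphlib import TopologicalSorter

# A pipeline declared as data: each task lists the tasks it depends on.
# Task names are hypothetical; real declarative tools use their own formats.
pipeline = {
    "extract_orders": [],
    "extract_customers": [],
    "join": ["extract_orders", "extract_customers"],
    "publish_report": ["join"],
}

def execution_order(spec):
    """Resolve the declarative spec into a valid run order.

    Raises graphlib.CycleError if the declared dependencies form a cycle.
    """
    graph = {task: set(deps) for task, deps in spec.items()}
    return list(TopologicalSorter(graph).static_order())

order = execution_order(pipeline)
print(order)
```

Because the spec is just data, it can be validated, diffed, and reviewed without running any pipeline code. This is one of the simplifications declarative tools aim for.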
0 implied HN points • 15 Jun 23
  1. Building products with LLMs is challenging and requires addressing multiple issues.
  2. PandasAI offers AI-powered features for data analysis, focusing on integrating LLMs smartly into products.
  3. Consider switching from dbt to SQLMesh, especially if you are a data engineer or data scientist who needs a more developer-focused analytics tool.
0 implied HN points • 17 Mar 23
  1. Dark data is information collected but not utilized, similar to dark matter in the universe.
  2. There are six categories of data, including data that is used, data that is not used but should be, and data that should be collected but isn't.
  3. Unique data, especially dark data, is valuable to a company and can provide a competitive advantage.
0 implied HN points • 13 Jan 22
  1. Dagster focuses on data assets, not just pipelines.
  2. Transitioning to a data mesh may not be suitable for every company at the moment.
  3. Start-ups can benefit from implementing a decentralized data culture from the beginning.
0 implied HN points • 18 Aug 22
  1. Data lakes now have three levels for better organization.
  2. Snapshotting data with dbt is ideal but can be challenging.
  3. Nbdocs is a helpful framework for technical documentation in notebooks.
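dbt snapshots record a type-2 slowly changing dimension. Each detected change closes the previous row's validity window and opens a new one, and dbt tracks these windows in the `dbt_valid_from` and `dbt_valid_to` columns. The Python sketch below illustrates that logic under simplifying assumptions: it compares a single value column, ignores deleted rows, and uses a hypothetical `snapshot` helper. It is not how dbt itself implements snapshots:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class SnapshotRow:
    key: str
    value: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version

def snapshot(history, source, today):
    """Apply one snapshot run: close changed rows and append new versions.

    `history` is the list of SnapshotRow versions so far; `source` maps
    key -> current value. Deletions are ignored in this sketch.
    """
    current = {r.key: r for r in history if r.valid_to is None}
    for key, value in source.items():
        row = current.get(key)
        if row is None:
            history.append(SnapshotRow(key, value, today))
        elif row.value != value:
            row.valid_to = today  # close the old version
            history.append(SnapshotRow(key, value, today))
    return history

history = []
snapshot(history, {"c1": "Berlin"}, date(2022, 8, 1))
snapshot(history, {"c1": "Hamburg"}, date(2022, 8, 18))
for r in history:
    print(r)
```

Part of what makes snapshotting hard in practice is visible even here. If a snapshot run is skipped, intermediate changes are lost for good, because only the source's current state is ever compared.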
0 implied HN points • 17 Feb 22
  1. Kestra, a new data orchestrator, is mature and scalable.
  2. Feedback indicates that Airbyte and Meltano may not yet be ideal for mature data engineering teams.
  3. Nadia Eghbal's book on open source offers insightful concepts for anyone working in the open-source space.
0 implied HN points • 18 Mar 22
  1. The post compares Web3 Leaders with Data Leaders.
  2. The author is Sven Balnojan.
  3. The article was published on March 18, 2022.
0 implied HN points • 12 Aug 21
  1. TikTok's algorithm success is linked to how long users watch videos. It's easy to reverse engineer and could focus more on long-term wins.
  2. Snowflake differs from other platforms by truly decoupling storage from compute. This gives it a unique selling point in the market.
  3. In the data industry, open-source solutions dominate, leading to a winner-takes-all market dynamic similar to search engines.
0 implied HN points • 24 Mar 22
  1. Pricing is challenging in the data space and can impact alignment.
  2. Maxime is a respected figure in the industry, sharing valuable insights on data engineering trends.
  3. Kensu.io offers a data observability toolkit worth exploring.
0 implied HN points • 12 May 22
  1. Functional Data Engineering emphasizes reproducibility and efficiency in data processing.
  2. Nordic Data Architectures prioritize tools with large communities for efficient data processing.
  3. Calculating website carbon emissions can raise awareness and promote environmental responsibility.
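A common reading of functional data engineering is that each task is a pure function of its inputs and overwrites a whole output partition. Re-running a task therefore reproduces the same result instead of appending duplicates. The sketch below assumes an in-memory dictionary stands in for partitioned warehouse tables, and all names in it are hypothetical:

```python
from typing import Dict, List

# Hypothetical in-memory "warehouse": table partitions keyed by date string.
warehouse: Dict[str, List[dict]] = {}

def daily_revenue(raw_orders: List[dict], ds: str) -> List[dict]:
    """Pure transform: output depends only on its inputs, no side effects."""
    total = sum(o["amount"] for o in raw_orders if o["date"] == ds)
    return [{"date": ds, "revenue": total}]

def run_task(raw_orders: List[dict], ds: str) -> None:
    """Overwrite the whole partition for `ds`, so re-runs are idempotent."""
    warehouse[ds] = daily_revenue(raw_orders, ds)

orders = [
    {"date": "2022-05-12", "amount": 30},
    {"date": "2022-05-12", "amount": 12},
    {"date": "2022-05-11", "amount": 99},
]
run_task(orders, "2022-05-12")
run_task(orders, "2022-05-12")  # re-running yields the same state, not duplicates
print(warehouse)
# -> {'2022-05-12': [{'date': '2022-05-12', 'revenue': 42}]}
```

Overwriting per partition rather than appending is what makes backfills safe. Any past day can be recomputed in isolation without touching the others.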
0 implied HN points • 26 Aug 22
  1. DAGs in data systems can have downsides.
  2. Data systems benefit from fault-tolerant communication.
  3. Avoid bad practices in data workflows and focus on the output.
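One common building block for fault-tolerant communication between data systems is retrying transient failures with exponential backoff. The sketch below assumes the remote call can be wrapped as a zero-argument function. The `call_with_retries` helper and its defaults are hypothetical, and the post's own recommendations may differ:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def call_with_retries(fn: Callable[[], T], attempts: int = 4,
                      base_delay: float = 0.1,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying failures with exponentially growing delays."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)  # 0.1s, 0.2s, 0.4s, ...
    raise AssertionError("not reached: the loop returns or re-raises")

# Usage: a flaky dependency that fails twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("upstream unavailable")
    return "ok"

delays = []
print(call_with_retries(flaky, sleep=delays.append))  # -> ok
print(delays)  # -> [0.1, 0.2]
```

The `sleep` function is passed in as a parameter so the backoff schedule can be observed in tests without actually waiting.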
0 implied HN points • 24 Nov 22
  1. Be a data concierge rather than a data plumber: deliver the right data, not a lot of data.
  2. Well-written dbt models are modular, readable, and fast.
  3. Even well-written code decays over time, so refactor often to maintain quality.
0 implied HN points • 09 Sep 21
  1. Tabular received Series A funding to work on Iceberg, an analytical table format.
  2. Firebolt aims to be faster than Snowflake through its own approach to data handling.
  3. Gloo.us implemented a Kafka-based data mesh to decentralize data ownership and track data versions.