Three Data Point Thursday

Three Data Point Thursday is a Substack dedicated to enhancing business intelligence through data and AI. It explores the strategic implementation of data teams, AI advancements, data analytics, synthetic data, and open-source contributions to building data-driven companies. The newsletter emphasizes practical approaches for leveraging data for business value, innovation, and efficiency.

Data Strategy, Artificial Intelligence, Data Analytics, Business Intelligence, Open Source in Data, Synthetic Data, Data Engineering, Machine Learning, Data-Driven Decision Making, Community Building in Tech

The hottest Substack posts of Three Data Point Thursday

And their main takeaways
0 implied HN points • 10 Mar 22
  1. Web3 leadership skills are underrated in the data space
  2. Focusing on education over documentation can be more effective for product adoption
  3. A data-driven culture is essential for a lean data mesh setup
0 implied HN points • 28 Feb 22
  1. The book 'Data Mesh in Action' has been updated with more examples, a definition of Data Mesh, and a new chapter on data governance.
  2. An article discusses building Data Mesh on Google Cloud and provides technical insights.
  3. Exploring 'Data as Code' and its application in enhancing data team productivity using concepts from Domain-Driven Design and microservices.
0 implied HN points • 24 Feb 22
  1. Iceberg and Tabular are making working with data easier for everyone.
  2. Make complex data tools approachable and easy to use for all.
  3. Data versioning is underrated but crucial for managing data effectively.
0 implied HN points • 03 Feb 22
  1. DataOps is a new concept that aims to make data teams more productive by applying agile, software engineering, and manufacturing principles.
  2. Amazon is leveraging AI superpowers to dominate the physical in-store fashion space, showing how data network effects will shape future competition.
  3. Trino, a query engine, offers the benefit of joining data across multiple sources, making it a valuable tool for data analysis.
0 implied HN points • 27 Jan 22
  1. Data Science Infrastructure is crucial for success and companies are improving in this area.
  2. The Metaflow framework continues to innovate with features like MetaCards for machine learning pipelines.
  3. Emphasizing business capabilities over domains can be more practical in the data world.
0 implied HN points • 20 Jan 22
  1. The Data Mesh paradigm is becoming practical with a new ebook focusing on implementation.
  2. In the future, unstructured data will dominate, and technologies for querying it are still developing.
  3. TASTI offers a unique approach to querying unstructured data, potentially paving the way for innovative start-up ideas.
0 implied HN points • 14 Jan 22
  1. Improvements can be made to existing data teams without implementing Data Mesh.
  2. Data teams often create monoliths and dependencies that can be broken down into independent pieces.
  3. Applying concepts like microservices and domain-driven design to the data world can enhance productivity of data teams.
0 implied HN points • 17 Dec 21
  1. Sven Balnojan discusses 3 important questions to consider before implementing a data mesh system.
  2. He points out 3 significant challenges faced by businesses and the data world today.
  3. Sven also shares insights from a talk on modern data warehouse practices, including CI/CD and full component testing.
0 implied HN points • 16 Dec 21
  1. The Julia programming language is gaining prominence in the data science community and is worth exploring in 2022.
  2. MLOps (machine learning operations) is an emerging field with a growing set of tools addressing infrastructure needs.
  3. Being open, not necessarily open-source, can still leverage community support and advantages, as seen with Levels Health sharing their approach publicly.
0 implied HN points • 25 Nov 21
  1. Table formats like Iceberg challenge traditional approaches by allowing parts to be exchanged on the fly, potentially reducing vendor lock-ins.
  2. The need for a cheat sheet for a product indicates a usability update may be necessary.
  3. Understanding platforms from an architectural perspective can make managing them more straightforward by focusing on core components, interface stability, and ever-changing complements.
0 implied HN points • 12 Nov 21
  1. Hex Technologies has a strong vision for the data space and focuses on combining different parts of the data pipeline to provide value to end-users.
  2. The company is positioned to grow alongside the expected exponential increase in the value of data.
  3. Hex Technologies is still evolving in terms of network effects and open-source strategies, but there is potential for growth in these areas.
0 implied HN points • 29 Oct 21
  1. The data space still lacks a big platform that harnesses network effects.
  2. Tabular and GoodData are potential candidates for building platforms in the future.
  3. Building a true data platform involves forgoing short-term revenues, focusing on network effects, and connecting different parts of the data world.
0 implied HN points • 21 Oct 21
  1. Newcomers in the graphDB space are mostly open-source based.
  2. Consider exploring open source for data companies to tackle the 'data snowflake' problem.
  3. Product management in the data space can benefit from using the WSJF method for value estimations.
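
As a rough illustration of the WSJF (Weighted Shortest Job First) method mentioned above, the sketch below scores backlog items as cost of delay divided by job size, following the common SAFe-style definition. The `Job` class, the example backlog, and all scores are hypothetical and are not taken from the post, which may use a different variant.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """A backlog item scored on relative (e.g. Fibonacci-style) scales."""
    name: str
    business_value: int
    time_criticality: int
    risk_reduction: int
    job_size: int  # relative effort; must be > 0

    @property
    def cost_of_delay(self) -> int:
        # Cost of delay is the sum of the three value components.
        return self.business_value + self.time_criticality + self.risk_reduction

    @property
    def wsjf(self) -> float:
        # Higher WSJF means more value per unit of effort, so do it sooner.
        return self.cost_of_delay / self.job_size


def rank_by_wsjf(jobs):
    """Return jobs ordered from highest to lowest WSJF score."""
    return sorted(jobs, key=lambda job: job.wsjf, reverse=True)


# Hypothetical data-team backlog, for illustration only.
backlog = [
    Job("dashboard refresh", business_value=8, time_criticality=5, risk_reduction=2, job_size=5),
    Job("churn model", business_value=13, time_criticality=3, risk_reduction=5, job_size=13),
    Job("data quality checks", business_value=5, time_criticality=8, risk_reduction=8, job_size=3),
]

ranked = rank_by_wsjf(backlog)
for job in ranked:
    print(f"{job.name}: WSJF = {job.wsjf:.2f}")
```

With these made-up scores, the small but urgent "data quality checks" item ranks first, which is the behavior WSJF is meant to produce: short, valuable jobs ahead of large ones.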
0 implied HN points • 30 Sep 21
  1. Follow a product-thinking approach in building internal platforms to align with customer needs.
  2. Explore AI tools like GitHub Copilot to assist with programming tasks and bring expert coaching to a broader audience.
  3. Consider implementing unique contribution incentives, like the 'Token Race', to engage and empower community members in open-source projects.
0 implied HN points • 08 Apr 21
  1. Machine learning libraries can help with image quality assessment, model explanations, and time series forecasting.
  2. The ELI5 Python package provides explanations for predictions from common frameworks like Keras and XGBoost.
  3. Tsfresh simplifies time series machine learning by automating feature calculations and batch processing.
0 implied HN points • 01 Apr 21
  1. Functional data engineering emphasizes the importance of the functional paradigm for efficient batch processing and machine learning.
  2. Functional machine learning frameworks like fklearn offer reproducibility and production-ready models through functional programming paradigms.
  3. Building evolutionary architectures for data involves using specialized tools for specific purposes to create adaptable and flexible data structures.
0 implied HN points • 04 Mar 21
  1. Data will power everything in the future, so it's essential to understand and collect data points.
  2. Joining a data mesh learning slack community can provide a great learning opportunity on building data meshes.
  3. Implementing SLAs and SLOs for data teams can improve transparency, communication, and data quality.
0 implied HN points • 04 Feb 21
  1. Graph embedding involves extracting graph information into a table form for machine learning.
  2. Real graph learning utilizes the graph directly in the learning mechanism without losing information.
  3. Topological learning focuses on the visual features of data and can be used to train machine learning models.
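
To make the first takeaway above concrete, here is a minimal sketch of turning a graph into a table with one row per node, which a standard tabular model could then consume. It uses simple hand-crafted structural features (degree and average neighbor degree) rather than a learned embedding; the function name, the feature choice, and the example edges are assumptions for illustration and do not come from the post.

```python
from collections import defaultdict


def node_feature_table(edges):
    """Flatten an undirected graph into one feature row per node."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    rows = []
    for node in sorted(neighbors):
        nbrs = neighbors[node]
        degree = len(nbrs)
        # Average degree of adjacent nodes: a coarse summary of the neighborhood.
        avg_neighbor_degree = sum(len(neighbors[n]) for n in nbrs) / degree
        rows.append(
            {"node": node, "degree": degree, "avg_neighbor_degree": avg_neighbor_degree}
        )
    return rows


# Hypothetical four-node graph: a triangle (a, b, c) with d attached to c.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
table = node_feature_table(edges)
for row in table:
    print(row)
```

The table keeps only what the chosen features capture, so some structural information is lost in the flattening. That loss is the trade-off the second takeaway contrasts with learning on the graph directly.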
0 implied HN points • 21 Jan 21
  1. DataOps companies aim for one or fewer errors per year by testing both value and innovation pipelines.
  2. Centralized efforts are crucial for improving data quality in companies with chain-link logic, like Airbnb's Data Quality Initiative.
  3. Testing with dbt includes using snapshots for data transformation and showcasing output tests to end-users.
0 implied HN points • 08 Feb 24
  1. The newsletter made significant changes in 2023 like switching from private to open mode, rearranging topics, and focusing on longer, thoughtful content.
  2. The author adopted a new writing schedule, resulting in increased productivity and better-quality content.
  3. The overriding theme of the newsletter remains 'Making companies smarter with data & AI', with a focus on specific topics like data trends, product management, and data startup management.
0 implied HN points • 13 Jul 23
  1. Surgical fine-tuning in ML makes algorithms better suited for specific business contexts through precise changes, an advancement over regular fine-tuning.
  2. Entity-centric data modeling marries ML feature engineering with data engineering, improving data operations for companies.
  3. Estimating efforts for ML projects can be simplified by considering the cost of delay and the real-time requirement of the algorithm.