The hottest Analytics Substack posts right now

And their main takeaways
SeattleDataGuy’s Newsletter 494 implied HN points 19 Feb 25
  1. Always focus on the real problem behind a request, not just what is being asked. This helps you deliver better solutions that actually meet business needs.
  2. Using clear frameworks can help organize your thoughts and make complex investigations easier. A structured approach leads to clearer communication and better results.
  3. Keep your communication simple and focused on what matters to your stakeholders. This helps everyone stay on the same page and reduces confusion.
Kneeling Bus 185 implied HN points 28 Feb 25
  1. Courtsiding is when someone at a game places bets based on what they see in real time, exploiting the delay before betting apps update. This shows how technology can create new opportunities to win at gambling.
  2. Sports betting is changing the way we consume sports media, with odds and spreads becoming more common on screens. This shift reflects a deeper trend where everything is becoming about numbers and predictions.
  3. As gambling expands into everyday life, people might start betting on their own actions. This could offer a new sense of agency: even when traditional success seems out of reach, people may still find success in unexpected places.
SeattleDataGuy’s Newsletter 812 implied HN points 06 Feb 25
  1. Data engineers are often seen as roadblocks, but cutting them out can lead to major problems later on. Without them, the data can become messy and unmanageable.
  2. Initially, removing data engineers may seem like a win because things move quickly. However, this speed can cause chaos as data quality suffers and standards break down.
  3. A solid data strategy needs structure and governance. Rushing without proper planning can lead to a situation where everything collapses under the weight of disorganization.
Chad Ford's NBA Big Board 19 implied HN points 31 Oct 24
  1. Scouting international NBA prospects is tough because they often get limited playing time and face uneven competition, which makes their true potential hard to assess.
  2. Some young players, like Nolan Traore, show great promise but have mixed stats, indicating areas where they need to improve.
  3. The article covers the top prospects from Europe first, with talents from Australia and China to be covered later, and suggests a strong international class for the next NBA draft.
Data People Etc. 231 implied HN points 11 Feb 25
  1. Data is more powerful when it has a purpose. It should tell a clear story; otherwise, it's just clutter.
  2. Building a strong data system is like creating a world. A good structure connects different pieces and helps everyone understand the bigger picture.
  3. Data engineering is important because it helps manage and present large amounts of information, making sure everything works smoothly and accurately.
VuTrinh. 1658 implied HN points 24 Aug 24
  1. Parquet is a columnar file format: it stores data column by column instead of row by row. This makes it faster to read just the columns a query needs instead of scanning every field.
  2. A Parquet file groups rows into row groups, and each row group stores one column chunk per column. This layout balances read and write performance and helps users manage large datasets efficiently.
  3. Parquet uses encodings such as dictionary encoding and run-length encoding (RLE) to save space. These methods shrink the stored data and speed up reads by reducing how much data has to be scanned.
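To make the encoding idea concrete, here is a minimal Python sketch of dictionary encoding followed by run-length encoding on a single column. It is a simplified illustration, not Parquet's actual on-disk format (Parquet stores the dictionary in a separate dictionary page and writes the codes with an RLE/bit-packing hybrid); the column values and function names are made up for the example.

```python
from itertools import groupby


def dictionary_encode(values):
    """Replace each value with a small integer code; codes follow first-seen order."""
    dictionary, index, codes = [], {}, []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def run_length_encode(codes):
    """Collapse consecutive repeated codes into (code, run_length) pairs."""
    return [(code, len(list(group))) for code, group in groupby(codes)]


def decode(dictionary, runs):
    """Expand the runs and map codes back to the original values."""
    return [dictionary[code] for code, length in runs for _ in range(length)]


country = ["US", "US", "US", "DE", "DE", "US", "US", "FR"]
dictionary, codes = dictionary_encode(country)
runs = run_length_encode(codes)
print(dictionary)  # ['US', 'DE', 'FR']
print(runs)        # [(0, 3), (1, 2), (0, 2), (2, 1)]
```

Eight strings become a three-entry dictionary plus four runs of small integers. Low-cardinality or sorted columns benefit most, because they produce longer runs.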
Stealing Signals 499 implied HN points 08 Oct 24
  1. Offensive football is evolving, with more explosive plays and downfield shots. Quarterbacks are getting better at making big plays, which makes the game more enjoyable.
  2. In fantasy leagues, it's important to play for high-scoring potential rather than just trying to avoid losses. Playing safe can lead to missed opportunities and a loss, so always aim for the best possible plays.
  3. Analyzing football can be a complex task, and it's common for analysts to have blind spots. It's crucial to keep digging deep and not rely only on surface-level insights to make informed decisions.
Stealing Signals 599 implied HN points 03 Oct 24
  1. Route data is essential for understanding how well players are performing. Different sources count routes in different ways, which can create confusion.
  2. The NFL has started providing its own routes data, which could help standardize how we analyze player performance. This might make comparisons easier and clearer moving forward.
  3. Stats like TPRR (Targets Per Route Run) help us understand player efficiency, but they need to be used alongside other context like player roles and QB performance for better insights.
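TPRR is a simple ratio: targets divided by routes run. The sketch below uses hypothetical numbers to show why inconsistent route counts across data sources matter: the same target total yields a different TPRR depending on how many routes a source recorded.

```python
def tprr(targets: int, routes_run: int) -> float:
    """Targets Per Route Run: the share of a player's routes that drew a target."""
    if routes_run <= 0:
        raise ValueError("routes_run must be positive")
    return targets / routes_run


# Hypothetical receiver: 9 targets, but two data sources disagree on routes run.
targets = 9
source_a_routes = 36
source_b_routes = 30

print(tprr(targets, source_a_routes))  # 0.25
print(tprr(targets, source_b_routes))  # 0.3
```

A six-route disagreement moves this player's TPRR from 0.25 to 0.30, which is why a standardized route source makes comparisons easier.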
Freddie deBoer 3712 implied HN points 30 Nov 24
  1. Chiefs fans celebrated a narrow win over a bad team with their war chant, which some see as embarrassing and inappropriate. It's not cool to act like you just beat a top team when you barely won against the worst one.
  2. There are concerns about the Chiefs' performance this season compared to past years. Their offensive play has slowed down, and some fans and analysts feel they aren't as dominant as before.
  3. Many Chiefs fans act like a lot of people hate them because they are successful. Instead, they should recognize their team's success and stop complaining about being disrespected, as they are now a winning franchise.
benn.substack 1713 implied HN points 13 Dec 24
  1. Getting good at something often just takes a little focused effort over time. Many people don't actively try to improve, so they stay at a decent skill level rather than reaching their full potential.
  2. In fields like data analytics, it's essential to specialize to truly excel. Being a generalist might keep you busy, but it can lead to a career without a clear direction or growth.
  3. To stand out and achieve more in their careers, people need to identify a specific area of expertise and commit to it. Relying on being 'good at data' isn't usually enough to make a significant impact.
SeattleDataGuy’s Newsletter 400 implied HN points 17 Jan 25
  1. The data tools market is seeing a lot of consolidation lately, with companies merging or getting acquired. This means there are fewer companies competing, but it can lead to better tools overall.
  2. Acquisitions can be a mixed bag for customers. While some products improve after being bought, others might lose their features or support, making it risky for users.
  3. There's a push toward bundled data solutions, with customers wanting fewer but more comprehensive tools. This could change how data companies operate and how startups survive in the future.
davidj.substack 59 implied HN points 12 Feb 25
  1. SDF and SQLMesh are alternatives to dbt for data transformation. Both are built on modern technology and aim to be easier to use and faster.
  2. SDF has a built-in local database, so developers can test queries locally instead of running them on a paid cloud data warehouse. This can speed up development and reduce costs.
  3. Both tools offer column-level lineage to track changes, but SQLMesh provides a better workflow for managing breaking changes. SQLMesh also has unique features like Virtual Data Environments that enhance developer experience.
No Grass in the Clouds 139 implied HN points 11 Oct 24
  1. Brentford has been scoring quickly, netting goals within the first 90 seconds of their games. This gives them a strong advantage over their opponents.
  2. Teams that score first tend to win more often, making early scoring really important in soccer.
  3. Brentford's strategy could be a smart playbook for other teams to follow to boost their chances of winning games.
benn.substack 1099 implied HN points 29 Nov 24
  1. Many jobs in areas like think tanks or journalism are more about creating a background or illusion rather than producing real change or value. They serve as props for the more influential figures.
  2. There's a concern that if AI can produce this kind of content, it might not be because AI is better, but because the original jobs may not have mattered as much as once thought.
  3. In analytics, there's a question of whether the insights businesses claim to offer are real or just part of the narrative they tell to appear competent and important.
Trench Warfare 79 implied HN points 15 Oct 24
  1. True Pressure Rate (TPR) is a new tool for evaluating pass-rushers that focuses on the quality of pressures, not just the amount. This helps to understand who the best defenders really are.
  2. Pressures are categorized into three quality levels: Rare High Quality, High Quality, and Low Quality. This classification provides deeper insight into a player's performance and effectiveness.
  3. The Pressure Quality Ratio (PQR) compares high-quality pressures to low-quality ones. This helps identify players who may not have a lot of pressures but are still working hard and making an impact.
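The exact formulas behind TPR and PQR aren't given here, so the following is only a sketch under assumptions: each pressure is labeled with one of the three tiers, and both "Rare High Quality" and "High Quality" count as high quality when computing the ratio. The tier labels, function name, and player data are hypothetical.

```python
from collections import Counter

# Assumption: both "rare_high" and "high" tiers count as high-quality pressures.
HIGH_QUALITY_TIERS = ("rare_high", "high")


def pressure_quality_ratio(pressures):
    """Ratio of high-quality pressures to low-quality pressures (hypothetical definition)."""
    counts = Counter(pressures)
    high = sum(counts[tier] for tier in HIGH_QUALITY_TIERS)
    low = counts["low"]
    if low == 0:
        # Avoid division by zero: only high-quality pressures gives an unbounded ratio.
        return float("inf") if high else 0.0
    return high / low


# Hypothetical pass-rushers: A has fewer total pressures, but better ones.
player_a = ["rare_high"] * 2 + ["high"] * 6 + ["low"] * 4   # 12 pressures
player_b = ["high"] * 5 + ["low"] * 15                       # 20 pressures

print(pressure_quality_ratio(player_a))            # 2.0
print(round(pressure_quality_ratio(player_b), 2))  # 0.33
```

Under these assumptions, player A ranks below B on raw pressure count but well above B on PQR, which is the kind of player the ratio is meant to surface.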
benn.substack 1099 implied HN points 22 Nov 24
  1. Data quality is important for making both strategic and operational decisions, as inaccurate data can lead to poor outcomes. Good data helps companies know what customers want and improve their services.
  2. AI models can tolerate some bad data better than traditional methods because they average out inaccuracies. This means these models might not break as easily if some of the input data isn’t perfect.
  3. Businesses now care more about AI than they used to about regular data reporting. This shift in focus might make data quality feel more important, even if it doesn’t technically impact AI model performance as much.
VuTrinh. 519 implied HN points 06 Aug 24
  1. Notion uses a flexible block system, letting users customize how they organize their notes and projects. Each block can be changed and moved around, making it easy to create what you need.
  2. To manage its huge volume of data, Notion moved from a single database to a more complex setup with multiple shards and instances. This change helps it handle growing user demand and analytics needs more efficiently.
  3. By creating an in-house data lake, Notion saved a lot of money and improved data processing speed. This new system allows them to quickly get data from their main database for analytics and support new features like AI.
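As an illustration of the sharding idea, here is a minimal sketch that routes records to a logical shard by hashing a partition key, then maps logical shards onto a smaller number of physical database instances. The key name (`workspace_id`), shard counts, and function names are assumptions for illustration, not details of Notion's production setup.

```python
import hashlib

# Hypothetical layout: many logical shards spread over fewer physical instances.
NUM_LOGICAL_SHARDS = 64
NUM_INSTANCES = 8
SHARDS_PER_INSTANCE = NUM_LOGICAL_SHARDS // NUM_INSTANCES


def logical_shard(workspace_id: str) -> int:
    """Map a partition key to a logical shard with a stable hash."""
    # hashlib is used instead of the built-in hash(), which is randomized per process for str.
    digest = hashlib.sha256(workspace_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_LOGICAL_SHARDS


def physical_instance(shard: int) -> int:
    """Contiguous ranges of logical shards live on the same physical instance."""
    return shard // SHARDS_PER_INSTANCE


shard = logical_shard("workspace-123")
print(shard, physical_instance(shard))
```

Keeping more logical shards than instances means capacity can later be added by moving whole logical shards to new instances, without rehashing every key.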
SeattleDataGuy’s Newsletter 365 implied HN points 27 Dec 24
  1. Self-service analytics is still a goal for many companies, but it often falls short. Users might struggle with the tools or want different formats for the data, leading to more questions instead of fewer.
  2. Becoming truly data-driven is a challenge for many organizations. Trust issues with data, preference for gut feelings, and poor communication often get in the way of making informed decisions.
  3. People need to be data literate for businesses to succeed with data. The data team must present insights clearly, while business teams should understand and trust the data they work with.
Silver Bulletin 232 implied HN points 06 Jan 25
  1. The Hall of Fame should consider many factors, not just one statistic like Wins Above Replacement (WAR). This means looking at achievements, player talent, and character too.
  2. Players might have high WAR scores but lack the greatness often associated with Hall of Fame status. For example, a consistent but average player shouldn't necessarily be in the Hall over a standout who had fewer career years.
  3. Voters for the Hall of Fame are required to consider a player's overall impact, including postseason performances and fan appeal. This makes it a more complex decision than just focusing on statistics.
VuTrinh. 359 implied HN points 30 Jul 24
  1. Netflix's data engineering stack uses tools like Apache Iceberg and Spark for building batch data pipelines. This helps them transform and manage large amounts of data efficiently.
  2. For real-time data processing, Netflix relies on Apache Flink and a tool called Keystone. This setup makes it easier to handle streaming data and send it where it needs to go.
  3. For data quality and scheduling, Netflix uses the Write-Audit-Publish (WAP) pattern to audit data before it is published and built Maestro to manage workflows. These help keep the data process organized and reliable.
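The Write-Audit-Publish pattern can be sketched in a few lines: write new data somewhere consumers can't see it, run audits against it, and publish only if every audit passes. This toy version uses a Python dict as a stand-in catalog; the table name, audit functions, and `__staging` suffix are hypothetical, and a real pipeline would stage data inside the table format itself rather than in memory.

```python
from typing import Callable, Dict, List

Rows = List[dict]
Audit = Callable[[Rows], bool]


def write_audit_publish(catalog: Dict[str, Rows], table: str,
                        new_rows: Rows, audits: List[Audit]) -> bool:
    """Stage new data, run every audit, and publish only if all of them pass."""
    staging = f"{table}__staging"
    catalog[staging] = list(new_rows)                      # Write: consumers never read staging
    if all(audit(catalog[staging]) for audit in audits):   # Audit
        catalog[table] = catalog.pop(staging)              # Publish: promote staged data
        return True
    del catalog[staging]                                   # Failed audit: production untouched
    return False


def not_empty(rows: Rows) -> bool:
    return len(rows) > 0


def ids_present(rows: Rows) -> bool:
    return all(row.get("id") is not None for row in rows)


catalog = {"events": [{"id": 1}]}
print(write_audit_publish(catalog, "events", [{"id": 2}, {"id": None}], [not_empty, ids_present]))  # False
print(catalog["events"])  # [{'id': 1}]
```

Because the bad batch never reaches `events`, downstream readers keep seeing the last good version instead of partially broken data.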
The Data Ecosystem 399 implied HN points 21 Jul 24
  1. Poor data quality is a big problem for organizations, but it's often misunderstood. It's not just about fixing bad data; you need to figure out what's causing the issues.
  2. Data quality has many aspects, like accuracy and completeness. Good data helps businesses make better decisions, while bad data can cost a lot of money.
  3. To solve data quality issues, you need a complete approach that looks at different root causes. Simply fixing one part won't fix everything, and different sources might create new problems.
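Two of the quality dimensions mentioned above can be measured directly. Below is a minimal sketch, with hypothetical data and function names, that scores completeness (share of non-missing values) and rule-based validity. Validity is only a rough proxy for accuracy: it catches impossible values, while true accuracy needs a trusted reference to compare against.

```python
def completeness(rows, column):
    """Share of rows where `column` is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) not in (None, ""))
    return filled / len(rows)


def validity(rows, column, rule):
    """Share of non-missing values that pass a business rule."""
    values = [row[column] for row in rows if row.get(column) is not None]
    if not values:
        return 0.0
    return sum(1 for value in values if rule(value)) / len(values)


customers = [
    {"email": "a@example.com", "age": 34},
    {"email": "", "age": 29},
    {"email": "c@example.com", "age": 250},
    {"email": "d@example.com", "age": 41},
]
print(completeness(customers, "email"))                         # 0.75
print(validity(customers, "age", lambda age: 0 <= age <= 120))  # 0.75
```

Each score points at a different root cause (a missing capture step versus a bad input), which is why one fix rarely solves every quality problem.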
clkao@substack 79 implied HN points 30 Sep 24
  1. GitHub succeeded because it created tools that developers really wanted and used. The combination of Git's technical features and GitHub's social features made it very popular.
  2. Analytics and data workflows still lag behind modern software development practices. It's important to find better ways to show the value of data to businesses.
  3. There's a new way to think about pricing that considers what buyers really want, not just traditional methods. This can lead to smarter pricing strategies.
Huddle Up 40 implied HN points 29 Jan 25
  1. The NFL is exploring new tracking technology to improve accuracy in measuring first downs during games. This could make it easier to determine if a play is successful or not.
  2. Fans are frustrated because they feel the NFL is slow to adopt technology that other sports have already embraced. Tennis, for example, has used Hawk-Eye for years, and it works much faster.
  3. Some people are questioning whether this new technology is actually needed or if it complicates the game more than it helps. There are mixed feelings about its impact on the sport.
davidj.substack 59 implied HN points 13 Jan 25
  1. The gold layer in data architecture has drawbacks, including the loss of information and inflexibility for users. This means important data could be missing, and making changes is hard.
  2. Universal semantic layers offer a better solution by allowing users to request data in plain language without complicated queries. This makes data use easier and more accessible for everyone.
  3. Switching from a gold layer to a semantic layer can improve efficiency and user experience, as it avoids the rigid structure of the gold layer and adapts to user needs more effectively.
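The core idea of a semantic layer is that metrics are defined once and queries are generated on demand, instead of being frozen into precomputed gold tables. The sketch below is a hypothetical, much-simplified version: requests name metrics and dimensions rather than using plain language, and the metric definitions, table names, and functions are assumptions, not any specific product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    expression: str  # SQL aggregate that defines the metric once
    table: str


# Hypothetical metric definitions, maintained in one place.
METRICS = {
    "revenue": Metric("revenue", "SUM(amount)", "orders"),
    "order_count": Metric("order_count", "COUNT(*)", "orders"),
}


def compile_query(metric_names, dimensions=()):
    """Turn a request for metrics sliced by dimensions into SQL at query time."""
    metrics = [METRICS[name] for name in metric_names]
    tables = {metric.table for metric in metrics}
    if len(tables) != 1:
        raise ValueError("this sketch only handles metrics from a single table")
    columns = list(dimensions) + [f"{m.expression} AS {m.name}" for m in metrics]
    sql = f"SELECT {', '.join(columns)} FROM {tables.pop()}"
    if dimensions:
        sql += f" GROUP BY {', '.join(dimensions)}"
    return sql


print(compile_query(["revenue", "order_count"], ["region"]))
# SELECT region, SUM(amount) AS revenue, COUNT(*) AS order_count FROM orders GROUP BY region
```

Slicing by a new dimension only changes the request, not a table build, which is the flexibility a fixed gold layer lacks.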
davidj.substack 179 implied HN points 02 Dec 24
  1. SQLMesh recently announced that it is backwards compatible with dbt projects. This means teams can gradually switch to SQLMesh without having to do a big migration all at once.
  2. Using SQLMesh can help improve the clarity of data workflows and avoid broken DAGs during development. It offers features that make managing complex data stacks easier.
  3. Migrating to SQLMesh is possible even for those who aren't very tech-savvy. The process can be simple and done in an afternoon, making it accessible for teams to test and implement.
VuTrinh. 399 implied HN points 20 Apr 24
  1. Lakehouse architecture combines the strengths of data lakes and data warehouses. It aims to solve the problems that arise from keeping these two systems separate.
  2. This new approach allows for better data management, including features like ACID transactions and efficient querying of big datasets. It enables real-time analytics on raw data without needing complex data movements.
  3. With the help of technologies like Delta Lake and similar systems, the Lakehouse can handle both structured and unstructured data efficiently, making it a promising solution for modern data needs.
Data Science Weekly Newsletter 959 implied HN points 29 Dec 23
  1. This week, there's a focus on using data science techniques for practical decision-making, highlighted by an interview with Steven Levitt, who discusses making tough choices using data.
  2. There's a roundup of AI developments from 2023, showing how the field has evolved over the past year, which can help professionals stay updated.
  3. Understanding data quality is essential, as it directly impacts how useful data is for decision-making and analysis in any organization.
HyperArc 59 implied HN points 05 Aug 24
  1. AI can help us learn about the Olympics and analyze different aspects, like who won medals and the athletes' physical attributes. The questions start simple and grow more complex over time.
  2. While AI is good at remembering information and summarizing it, it struggles with reasoning about things it hasn't seen before. This means it can't always come up with new insights without the right data.
  3. For businesses, using AI with their private data can lead to smarter insights and faster decisions. It's important to combine human knowledge with AI to make the best use of available information.
The Data Ecosystem 199 implied HN points 02 Jun 24
  1. It's important to focus on what the business truly needs from data, not just what they think they want. Conversations should help uncover real goals and challenges.
  2. Data projects often fail because teams don't ask the right questions or fully understand the business context. Engaging stakeholders regularly is key to success.
  3. A clear step-by-step process helps develop effective data solutions. Start with building a strong data foundation before moving on to more complex analytics.
escape the algorithm 579 implied HN points 06 Feb 24
  1. Substack's network effects might be exaggerated: data shows that most new subscribers come from sources other than Substack.
  2. Subscriber growth on Substack may not be due solely to Substack's technology: many readers find newsletters through recommendations from other writers or external sources.
  3. The power of a newsletter audience lies more in the people than the platform: leaving Substack might not hurt growth as much as anticipated.
Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots 59 implied HN points 31 Jul 24
  1. OpenAI bought Rockset to improve its data retrieval infrastructure, which helps it use AI more effectively.
  2. The acquisition suggests that LLMs are increasingly seen as tools, with the focus shifting to building useful applications on top of them.
  3. Rockset's technology will help OpenAI work better with developers and make it easier to access and use real-time data for AI products.
benn.substack 1278 implied HN points 19 Jan 24
  1. The modern data stack ecosystem is shifting as interest in generative AI takes over.
  2. The hype surrounding data tools can lead to rapid product development but also instability and distraction.
  3. Startups can find success by focusing on rebuilding existing ideas in a more deliberate and stable manner.
The Data Ecosystem 179 implied HN points 26 May 24
  1. A business strategy is the game plan for a company to reach its goals. It involves having a clear vision, mission, and set of goals to guide the organization.
  2. Good business strategies have defined components that everyone in the company knows. This helps avoid confusion and keeps everyone focused on the same objectives.
  3. Data plays a crucial role in shaping modern business strategies. Companies need to integrate data and analytics into their plans to make informed decisions and stay competitive.
davidj.substack 71 implied HN points 03 Dec 24
  1. There's a new public repository called bluesky-data where people can collaborate and follow along with its development. It's easy to get started by setting it up on your local machine.
  2. Using sqlmesh with the Bluesky data can provide real-time data availability, while also allowing for a more complete view of the data in a batch processing style. This means you can get both immediate updates and historical data.
  3. It's better to start with dlt and then initialize sqlmesh within that project. This way, you can efficiently manage large datasets without needing to compute everything each time.
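dlt and sqlmesh each have their own incremental-processing features; rather than guess at their APIs, here is a tool-agnostic sketch of the underlying idea. A stored watermark records the newest timestamp already processed, so each run handles only rows newer than it. The row fields (`id`, `created_at`) and the function name are hypothetical.

```python
from datetime import datetime


def incremental_load(source_rows, state):
    """Return only rows newer than the stored watermark, then advance it."""
    watermark = state.get("watermark")
    new_rows = [
        row for row in source_rows
        if watermark is None or row["created_at"] > watermark
    ]
    if new_rows:
        state["watermark"] = max(row["created_at"] for row in new_rows)
    return new_rows


state = {}
posts = [
    {"id": 1, "created_at": datetime(2024, 12, 1, 9, 0)},
    {"id": 2, "created_at": datetime(2024, 12, 1, 9, 5)},
]
print(len(incremental_load(posts, state)))  # 2: the first run loads everything

posts.append({"id": 3, "created_at": datetime(2024, 12, 1, 9, 10)})
print([row["id"] for row in incremental_load(posts, state)])  # [3]: later runs load only new rows
```

A strict "newer than" filter can miss late-arriving rows whose timestamps fall behind the watermark; real tools usually add a lookback window or reprocess recent intervals to cover that case.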
davidj.substack 47 implied HN points 20 Dec 24
  1. If you're using dbt to run analytics, switching to sqlmesh is a good idea. It offers more features and is easy to learn while still being compatible with dbt.
  2. sqlmesh helps manage data environments and is more comprehensive in handling analytics tasks compared to dbt. It's simpler to transition from dbt to sqlmesh than from older methods like stored procedures.
  3. When using sqlmesh, think about where to run it and how to store its state. You have choices like using a different database or a cloud service, which can save you money and hassle.
davidj.substack 59 implied HN points 10 Dec 24
  1. Virtual data environments in SQLMesh let you test changes without affecting the main data. This means you can quickly see how something would work before actually doing it.
  2. Using snapshots, you can easily create different versions of data models. Each version is tied to a unique fingerprint, so versions don't interfere with one another.
  3. Creating and managing development environments is much easier now. With just a command, you can set up a new environment that looks just like production, making development smoother.
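SQLMesh's real fingerprinting and environment logic is more involved, so treat this as a simplified sketch of the idea rather than its implementation: hash each model's whitespace-normalized SQL together with its parents' fingerprints, and treat an environment as a set of pointers from model names to fingerprints. All model names and the `fingerprint` function are hypothetical.

```python
import hashlib


def fingerprint(query, parent_fingerprints=()):
    """Hash a model's whitespace-normalized SQL together with its parents' fingerprints."""
    normalized = " ".join(query.split())
    h = hashlib.sha256(normalized.encode("utf-8"))
    for parent in sorted(parent_fingerprints):
        h.update(parent.encode("utf-8"))
    return h.hexdigest()[:12]


orders_fp = fingerprint("SELECT * FROM raw.orders")
revenue_fp = fingerprint("SELECT SUM(amount) AS revenue FROM orders", [orders_fp])

# An environment is just a mapping from model name to snapshot fingerprint.
prod = {"orders": orders_fp, "revenue": revenue_fp}

# "Creating" a dev environment copies pointers, not data; then one model is edited.
dev = dict(prod)
dev["revenue"] = fingerprint(
    "SELECT SUM(amount) AS revenue, COUNT(*) AS n FROM orders", [dev["orders"]]
)

print(prod["revenue"] != dev["revenue"])  # True: the edited model gets a new snapshot
print(prod["orders"] == dev["orders"])    # True: the untouched model is shared
```

Because unchanged models keep the same fingerprint, a new environment can reuse existing tables and only build what actually changed, while including parent fingerprints ensures an upstream edit also produces new versions downstream.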
The Data Ecosystem 259 implied HN points 13 Apr 24
  1. The data industry is really complicated and often misunderstood. People usually talk about symptoms, like bad data quality, instead of getting to the real problems underneath.
  2. It's important to see the entire data ecosystem as connected, not just as separate parts. Understanding how these parts work together can help us find new opportunities and improve how we use data.
  3. This newsletter aims to break down complex data topics into simple ideas. It's like a cheat sheet for everything related to data, helping readers understand what each part is and why it matters.