The hottest Analytics Substack posts right now

And their main takeaways

I Am Begging Sports Media to Stop Misusing Regression to the Mean

Freddie deBoer • 1856 implied HN points • 03 Jul 25

Regression to the mean says that an extreme result tends to be followed by one closer to average, even when conditions stay the same. If a team's situation changes, any shift in performance reflects a new factor, not regression.
Using regression to the mean incorrectly can lead to confusion. If someone thinks a team will do worse because they lost players, that’s not regression to the mean; it’s a different kind of prediction.
There’s a risk of making mistakes by assuming past results will always influence future ones, like betting based on past game outcomes. Each situation should be judged on its own conditions.
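
The statistical effect described above can be sketched with a small simulation. This is an illustrative model, not anything from the post: each hypothetical team gets a fixed skill plus independent luck in each of two seasons, and nothing about the teams changes between seasons. The function names, team count, and noise level are all assumptions.

```python
import random

def simulate_seasons(n_teams=1000, noise=1.0, seed=0):
    """Return (season1, season2) results for n_teams hypothetical teams.

    Each team has a fixed skill; each season adds independent luck.
    """
    rng = random.Random(seed)
    skill = [rng.gauss(0, 1) for _ in range(n_teams)]
    season1 = [s + rng.gauss(0, noise) for s in skill]
    season2 = [s + rng.gauss(0, noise) for s in skill]
    return season1, season2

def top_decile_means(season1, season2):
    """Average both seasons over the teams in season 1's top 10%."""
    order = sorted(range(len(season1)), key=lambda i: season1[i], reverse=True)
    top = order[: len(order) // 10]
    mean1 = sum(season1[i] for i in top) / len(top)
    mean2 = sum(season2[i] for i in top) / len(top)
    return mean1, mean2

season1, season2 = simulate_seasons()
first, second = top_decile_means(season1, season2)
print(f"top decile, season 1: {first:.2f}")
print(f"same teams, season 2: {second:.2f}")
```

The season-1 leaders stay above average in season 2 but land closer to the mean, with no roster change involved. A drop caused by losing players would be a separate effect that this model does not include.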

How to Win at Gambling and in Life

Kneeling Bus • 185 implied HN points • 28 Feb 25

🎾 Sports Betting Media Culture Analytics Technology

Courtsiding is when someone at a game places bets based on what they see in real time, taking advantage of the delay in betting apps. This shows how technology can create new opportunities to win in gambling.
Sports betting is changing the way we consume sports media, with odds and spreads becoming more common on screens. This shift reflects a deeper trend where everything is becoming about numbers and predictions.
As gambling expands into everyday life, people might start betting on personal actions. This can create new ways to have agency, suggesting that even if traditional success seems difficult, there are still ways to find success in unexpected places.

Which way from here?

benn.substack • 1048 implied HN points • 06 Jun 25

🕹 Technology Data science Analytics Software Artificial Intelligence Business Intelligence

Data tools are getting more advanced, but many people still struggle with knowing how to use them effectively. This means that having the right tools isn't enough if users lack direction.
The industry is shifting focus from traditional analytics towards building AI systems and infrastructure. Companies are now adapting their technologies to support AI applications instead of just analyzing data.
Self-serve BI tools aren't being used as intended because people often don't know what questions to ask. Providing clearer direction and goals might help users make better use of available data.

Behind the Numbers: A Deep Dive into 2025's Top International NBA Prospects (Part 1)

Chad Ford's NBA Big Board • 19 implied HN points • 31 Oct 24

🎾 Sports Basketball Scouting International Prospects Analytics

Scouting international NBA prospects is tough because they often play less and face varying competition, making it hard to assess their true potential.
Some young players, like Nolan Traore, show great promise but have mixed stats, indicating areas where they need to improve.
The article highlights top players from Europe now, with plans to cover talents from Australia and China later, suggesting a strong international class for the next NBA draft.

Nearly Headless BI

davidj.substack • 59 implied HN points • 25 Jun 25

🕹 Technology Data science Business Intelligence Artificial Intelligence Analytics Semantics

Snowflake and Databricks are using a semantic layer, which helps make data easier to understand and access. This is a shift from older methods that relied heavily on text-based commands.
The rise of AI has changed what businesses need from their analytics tools. Now, having a semantic layer is a must for companies that want to stay competitive in agentic analytics.
Headless business intelligence is fading away as companies now blend traditional analytics with smarter, AI-driven tools. This could change how data warehouses and BI tools work together in the future.


I spent 8 hours learning Parquet. Here’s what I discovered

VuTrinh. • 1658 implied HN points • 24 Aug 24

🕹 Technology Data Engineering Data Storage Data processing Analytics

Parquet is a special file format that organizes data in columns. This makes it easier and faster to access specific data when you don't need everything at once.
The structure of Parquet involves grouping data into row groups and column chunks. This helps balance the performance of reading and writing data, allowing users to manage large datasets efficiently.
Parquet uses smart techniques like dictionary and run-length encoding to save space. These methods reduce the amount of data stored and speed up the reading process by minimizing the data that needs to be scanned.
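
A minimal sketch of the two encodings named above, applied to one column of values. This only illustrates the idea; it is not Parquet's actual on-disk format, and the function names and sample column are made up for the example.

```python
def dictionary_encode(values):
    """Replace each value with an integer index into a list of distinct values."""
    dictionary = []
    index_of = {}
    indices = []
    for v in values:
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        indices.append(index_of[v])
    return dictionary, indices

def run_length_encode(items):
    """Collapse consecutive repeats into [value, run_length] pairs."""
    runs = []
    for item in items:
        if runs and runs[-1][0] == item:
            runs[-1][1] += 1
        else:
            runs.append([item, 1])
    return runs

def decode(dictionary, runs):
    """Rebuild the original column from the dictionary and the runs."""
    return [dictionary[idx] for idx, count in runs for _ in range(count)]

column = ["US", "US", "US", "DE", "DE", "US", "FR", "FR", "FR", "FR"]
dictionary, indices = dictionary_encode(column)
runs = run_length_encode(indices)
print(dictionary)  # ['US', 'DE', 'FR']
print(runs)        # [[0, 3], [1, 2], [0, 1], [2, 4]]
```

Ten strings shrink to a three-entry dictionary plus four runs. Both encodings pay off most when a column has few distinct values that repeat in long stretches.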

Stealing Signals, Week 5, Part 2

Stealing Signals • 499 implied HN points • 08 Oct 24

🎾 Sports Football Fantasy Analytics Game analysis Player Performance

Offensive football is evolving, with more exciting plays and downfield shots happening. Quarterbacks are becoming better at making big plays, which makes the game more enjoyable.
In fantasy leagues, it's important to play for high-scoring potential rather than just trying to avoid losses. Playing safe can lead to missed opportunities and a loss, so always aim for the best possible plays.
Analyzing football can be a complex task, and it's common for analysts to have blind spots. It's crucial to keep digging deep and not rely only on surface-level insights to make informed decisions.

TPRR check-in: Who has made strides earning volume so far?

Stealing Signals • 599 implied HN points • 03 Oct 24

🎾 Sports Football Analytics Statistics Data Performance

Routes data is really important for understanding how well players are performing. Different sources measure these routes in different ways, which can create confusion.
The NFL has started providing its own routes data, which could help standardize how we analyze player performance. This might make comparisons easier and clearer moving forward.
Stats like TPRR (Targets Per Route Run) help us understand player efficiency, but they need to be used alongside other context like player roles and QB performance for better insights.
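
As the name says, TPRR is targets divided by routes run. A short sketch, with a hypothetical function name and made-up numbers:

```python
def targets_per_route_run(targets, routes_run):
    """Targets Per Route Run: the share of a player's routes that drew a target."""
    if routes_run <= 0:
        raise ValueError("routes_run must be positive")
    return targets / routes_run

# Hypothetical receiver: 30 targets on 120 routes.
print(targets_per_route_run(30, 120))  # 0.25
```

Because different providers count routes differently, the same player can get a different TPRR depending on where the routes figure comes from.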

Uh, Guys: Brentford Might've Just Solved Soccer

No Grass in the Clouds • 139 implied HN points • 11 Oct 24

🎾 Sports Soccer Analytics Team Strategy

Brentford has been scoring quickly, netting goals in the first 90 seconds of their games. This gives them a strong advantage over the other teams.
Teams that score first tend to win more often, making early scoring really important in soccer.
Brentford's strategy could be a smart playbook for other teams to follow to boost their chances of winning games.

True Pressure Rate (TPR): Week 6 Update

Trench Warfare • 79 implied HN points • 15 Oct 24

🎾 Sports Football Analytics Player evaluation Defense Statistics

True Pressure Rate (TPR) is a new tool for evaluating pass-rushers that focuses on the quality of pressures, not just the amount. This helps to understand who the best defenders really are.
Pressures are categorized into three quality levels: Rare High Quality, High Quality, and Low Quality. This classification provides deeper insight into a player's performance and effectiveness.
The Pressure Quality Ratio (PQR) compares high-quality pressures to low-quality ones. This helps identify players who may not have a lot of pressures but are still working hard and making an impact.

How does Notion handle 200 billion data entities?

VuTrinh. • 519 implied HN points • 06 Aug 24

🕹 Technology Data Engineering Database Management Analytics Machine Learning

Notion uses a flexible block system, letting users customize how they organize their notes and projects. Each block can be changed and moved around, making it easy to create what you need.
To manage the huge amount of data, Notion shifted from a single database to a more complex setup with multiple shards and instances. This change helps them handle heavier user demand and analytics needs more efficiently.
By creating an in-house data lake, Notion saved a lot of money and improved data processing speed. This new system allows them to quickly get data from their main database for analytics and support new features like AI.
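
The summary does not say how Notion assigns data to shards. One common approach is hash-based routing, sketched below. The key format, shard count, and function name are all assumptions for illustration.

```python
import hashlib

def shard_for(key, num_shards):
    """Map a string key to a shard number using a stable hash.

    hashlib is used instead of the built-in hash() so the mapping
    stays the same across processes and restarts.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Hypothetical keys and shard count.
for key in ["workspace-a", "workspace-b", "workspace-c"]:
    print(key, "-> shard", shard_for(key, num_shards=8))
```

A plain modulo scheme like this remaps most keys when the shard count changes, which is one reason real systems often add a layer of logical shards on top of the physical databases.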

Chiefs Fans Doing the "War Chant" After Beating the Worst Team in the League by Two is the Most Pathetic Thing I've Ever Seen From Any Fanbase

Freddie deBoer • 3712 implied HN points • 30 Nov 24

🎾 Sports Football Fan culture Analytics Team Dynamics Media Coverage

Chiefs fans celebrated a narrow win over a bad team with their war chant, which some see as embarrassing and inappropriate. It's not cool to act like you just beat a top team when you barely won against the worst one.
There are concerns about the Chiefs' performance this season compared to past years. Their offensive play has slowed down, and some fans and analysts feel they aren't as dominant as before.
Many Chiefs fans act like a lot of people hate them because they are successful. Instead, they should recognize their team's success and stop complaining about being disrespected, as they are now a winning franchise.

The Hall of Fame is about more than WAR

Silver Bulletin • 232 implied HN points • 06 Jan 25

🎾 Sports Baseball Analytics Hall of Fame Player evaluation Statistics

The Hall of Fame should consider many factors, not just one statistic like Wins Above Replacement (WAR). This means looking at achievements, player talent, and character too.
Players might have high WAR totals but lack the greatness often associated with Hall of Fame status. For example, a consistent but average player shouldn't necessarily be in the Hall over a standout who had a shorter career.
Voters for the Hall of Fame are required to consider a player's overall impact, including postseason performances and fan appeal. This makes it a more complex decision than just focusing on statistics.

Netflix Data Engineer Stack

VuTrinh. • 359 implied HN points • 30 Jul 24

🕹 Technology Data Engineering Software Tools Streaming Analytics Infrastructure

Netflix's data engineering stack uses tools like Apache Iceberg and Spark for building batch data pipelines. This helps them transform and manage large amounts of data efficiently.
For real-time data processing, Netflix relies on Apache Flink and a tool called Keystone. This setup makes it easier to handle streaming data and send it where it needs to go.
To ensure data quality and scheduling, Netflix has developed tools like the WAP pattern for auditing data and Maestro for managing workflows. These tools help keep the data process organized and reliable.
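
WAP stands for Write-Audit-Publish: new data is written to a staging area, checked, and only then made visible to consumers. The sketch below shows that control flow with in-memory lists standing in for tables. It is not Netflix's implementation, and the function name, checks, and sample rows are hypothetical.

```python
def write_audit_publish(rows, staging, production, checks):
    """Run one Write-Audit-Publish cycle.

    Returns the names of failed checks; an empty list means the batch
    was published. Production is left untouched if any check fails.
    """
    # Write: land the new batch in staging, not in production.
    staging.clear()
    staging.extend(rows)

    # Audit: run every check against the staged data.
    failures = [name for name, check in checks.items() if not check(staging)]
    if failures:
        return failures

    # Publish: swap the audited batch into production.
    production.clear()
    production.extend(staging)
    return []

checks = {
    "non_empty": lambda rows: len(rows) > 0,
    "no_null_ids": lambda rows: all(r.get("id") is not None for r in rows),
}

staging, production = [], [{"id": 1}]
print(write_audit_publish([{"id": 2}, {"id": None}], staging, production, checks))
print(production)  # still the old data, because the audit failed
```

The point of the pattern is that downstream readers never see a batch that has not passed its audits.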

Issue #15 – The Data Quality Conundrum (Part 1 – Root Causes)

The Data Ecosystem • 399 implied HN points • 21 Jul 24

💼 Business Data Quality Management Analytics Governance Process Improvement

Poor data quality is a big problem for organizations, but it's often misunderstood. It's not just about fixing bad data; you need to figure out what's causing the issues.
Data quality has many aspects, like accuracy and completeness. Good data helps businesses make better decisions, while bad data can cost a lot of money.
To solve data quality issues, you need a complete approach that looks at different root causes. Simply fixing one part won't fix everything, and different sources might create new problems.

Stuff Worth Reading - Sep 2024

clkao@substack • 79 implied HN points • 30 Sep 24

🕹 Technology Software Development Version Control Analytics Pricing Strategy

GitHub succeeded because it created tools that developers really wanted and used. The combination of Git's technical features and GitHub's social features made it very popular.
Analytics and data workflows still lag behind traditional software development practices. It's important to find better ways to show the value of data to businesses.
There's a new way to think about pricing that considers what buyers really want, not just traditional methods. This can lead to smarter pricing strategies.

Is this a career?

benn.substack • 1713 implied HN points • 13 Dec 24

💼 Business Analytics Data science Careers Expertise Technology

Getting good at something often just takes a little focused effort over time. Many people don't actively try to improve, so they stay at a decent skill level rather than reaching their full potential.
In fields like data analytics, it's essential to specialize to truly excel. Being a generalist might keep you busy, but it can lead to a career without a clear direction or growth.
To stand out and achieve more in their careers, people need to identify a specific area of expertise and commit to it. Relying on being 'good at data' isn't usually enough to make a significant impact.

Is It Time to Say Goodbye to Data Engineers?

SeattleDataGuy’s Newsletter • 812 implied HN points • 06 Feb 25

🕹 Technology Data Engineering Software Development Data Management Business Intelligence Analytics

Data engineers are often seen as roadblocks, but cutting them out can lead to major problems later on. Without them, the data can become messy and unmanageable.
Initially, removing data engineers may seem like a win because things move quickly. However, this speed can cause chaos as data quality suffers and standards break down.
A solid data strategy needs structure and governance. Rushing without proper planning can lead to a situation where everything collapses under the weight of disorganization.

From Average To Exceptional: 5 Skills To Help You Become The Data Expert Everyone Wants To Work With

SeattleDataGuy’s Newsletter • 494 implied HN points • 19 Feb 25

💼 Business Data science Career development Communication Analytics Team Collaboration

Always focus on the real problem behind a request, not just what is being asked. This helps you deliver better solutions that actually meet the business needs.
Using clear frameworks can help organize your thoughts and make complex investigations easier. A structured approach leads to clearer communication and better results.
Keep your communication simple and focused on what matters to your stakeholders. This helps everyone stay on the same page and reduces confusion.

Searching for insight

benn.substack • 1099 implied HN points • 29 Nov 24

💼 Business Economics Analytics Media Marketing Publishing Data

Many jobs in areas like think tanks or journalism are more about creating a backdrop or an illusion than about producing real change or value. They serve as props for more influential figures.
There's a concern that if AI takes over content production, it won't be because AI is better but because the original jobs never mattered as much as once thought.
In analytics, there's a question of whether the insights businesses claim to offer are real or just part of the narrative they tell to appear competent and important.

🔍 The best free investing tools

Compounding Quality • 2987 implied HN points • 13 Apr 23

💰 Finance Investing Stocks Resources Financial News Analytics

The internet is full of investing resources that can be a gold mine if used well.
Morningstar provides a wide range of investment research and services.
Dataroma allows tracking the portfolios of top investors in the world.

Does data quality matter?

benn.substack • 1099 implied HN points • 22 Nov 24

🕹 Technology Data Quality AI Models Software Development Business strategy Analytics

Data quality is important for making both strategic and operational decisions, as inaccurate data can lead to poor outcomes. Good data helps companies know what customers want and improve their services.
AI models can tolerate some bad data better than traditional methods because they average out inaccuracies. This means these models might not break as easily if some of the input data isn’t perfect.
Businesses now care more about AI than they used to about regular data reporting. This shift in focus might make data quality feel more important, even if it doesn’t technically impact AI model performance as much.

From Boom to Bundle: The Great Consolidation of Data Tools

SeattleDataGuy’s Newsletter • 400 implied HN points • 17 Jan 25

🕹 Technology Data Tools Mergers & Acquisitions Analytics Data science Business Intelligence

The data tools market is seeing a lot of consolidation lately, with companies merging or getting acquired. This means there are fewer companies competing, but it can lead to better tools overall.
Acquisitions can be a mixed bag for customers. While some products improve after being bought, others might lose their features or support, making it risky for users.
There's a push for bundled data solutions where customers want fewer, but more comprehensive tools. This could change how data companies operate and how startups survive in the future.

Do we need the Lakehouse architecture?

VuTrinh. • 399 implied HN points • 20 Apr 24

🕹 Technology Data architecture Data Management Machine Learning Analytics

Lakehouse architecture combines the strengths of data lakes and data warehouses. It aims to solve the problems that arise from keeping these two systems separate.
This new approach allows for better data management, including features like ACID transactions and efficient querying of big datasets. It enables real-time analytics on raw data without needing complex data movements.
With the help of technologies like Delta Lake and similar systems, the Lakehouse can handle both structured and unstructured data efficiently, making it a promising solution for modern data needs.

Data Science Weekly - Issue 527

Data Science Weekly Newsletter • 959 implied HN points • 29 Dec 23

🕹 Technology Data science Machine Learning Artificial Intelligence Data Engineering Analytics

This week, there's a focus on using data science techniques for practical decision-making, highlighted by an interview with Steven Levitt, who discusses making tough choices using data.
There's a roundup of AI developments from 2023, showing how the field has evolved over the past year, which can help professionals stay updated.
Understanding data quality is essential, as it directly impacts how useful data is for decision-making and analysis in any organization.

Holy Grails of Data: Self-Service, Single Truths, and the Role of AI

SeattleDataGuy’s Newsletter • 365 implied HN points • 27 Dec 24

🕹 Technology Data science AI Analytics Business Intelligence Machine Learning

Self-service analytics is still a goal for many companies, but it often falls short. Users might struggle with the tools or want different formats for the data, leading to more questions instead of fewer.
Becoming truly data-driven is a challenge for many organizations. Trust issues with data, preference for gut feelings, and poor communication often get in the way of making informed decisions.
People need to be data literate for businesses to succeed with data. The data team must present insights clearly, while business teams should understand and trust the data they work with.

Olympics, AI, and some BI

HyperArc • 59 implied HN points • 05 Aug 24

🕹 Technology AI Data science Analytics Machine Learning Software Development

AI can help us learn about the Olympics and analyze different aspects, like who won medals and their physical attributes. It starts with basic questions and gets more complicated over time.
While AI is good at remembering information and summarizing it, it struggles with reasoning about things it hasn't seen before. This means it can't always come up with new insights without the right data.
For businesses, using AI with their private data can lead to smarter insights and faster decisions. It's important to combine human knowledge with AI to make the best use of available information.

Issue #8 - Deliver on the Data Needs, not the Data Desires

The Data Ecosystem • 199 implied HN points • 02 Jun 24

💼 Business Data Strategy Stakeholder engagement Analytics Data Governance Business Intelligence

It's important to focus on what the business truly needs from data, not just what they think they want. Conversations should help uncover real goals and challenges.
Data projects often fail because teams don't ask the right questions or fully understand the business context. Engaging stakeholders regularly is key to success.
A clear step-by-step process helps develop effective data solutions. Start with building a strong data foundation before moving on to more complex analytics.

Is B2C and B2B Growth... the same?

Elena's Growth Scoop • 1257 implied HN points • 08 Aug 23

💼 Business Marketing Analytics Growth

B2C and B2B growth are not the same.
Collaboration with experts like Enzo Avigo can provide valuable insights.
Consider a 7-day free trial to access full post archives for more information.

Worldbuilding with data

Data People Etc. • 231 implied HN points • 11 Feb 25

🕹 Technology Data Engineering Data science Big Data Data Visualization Analytics

Data is more powerful when it has a purpose. It should tell a clear story, otherwise it's just clutter.
Building a strong data system is like creating a world. A good structure connects different pieces and helps everyone understand the bigger picture.
Data engineering is important because it helps manage and present large amounts of information, making sure everything works smoothly and accurately.

Is Substack exaggerating its network effects?

escape the algorithm • 579 implied HN points • 06 Feb 24

💼 Business Tech Marketing Analytics Platforms Subscription

Substack's network effects might be exaggerated: Data shows that most new subscribers come from sources other than Substack.
Subscriber growth on Substack may not solely be due to Substack's technology: Many readers find newsletters due to recommendations from other writers or external sources.
The power of a newsletter audience lies more in the people than the platform: Leaving Substack might not drastically impact growth as much as anticipated.

OpenAI Acquired Rockset

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 59 implied HN points • 31 Jul 24

🕹 Technology AI Data Analytics Infrastructure Applications

OpenAI bought Rockset to make their data retrieval system better, which helps in using AI more effectively.
The acquisition shows that LLMs are being seen more like a tool, and the focus is shifting to building useful applications using these technologies.
Rockset's technology will help OpenAI work better with developers and make it easier to access and use real-time data for AI products.

Issue #7 - The Business Strategy, Where The Data Journey Starts

The Data Ecosystem • 179 implied HN points • 26 May 24

💼 Business Strategy Data Leadership Analytics Performance

A business strategy is the game plan for a company to reach its goals. It involves having a clear vision, mission, and set of goals to guide the organization.
Good business strategies have defined components that everyone in the company knows. This helps avoid confusion and keeps everyone focused on the same objectives.
Data plays a crucial role in shaping modern business strategies. Companies need to integrate data and analytics into their plans to make informed decisions and stay competitive.

Intro to SQL Indexes

Data Engineering Central • 589 implied HN points • 17 Jan 24

🕹 Technology Databases Indexes Performance Analytics

Indexes are crucial for improving performance in SQL operations and data access.
Clustered and non-clustered indexes are the two main types to understand in SQL indexing.
Understanding use cases and query access patterns is key to designing effective indexes for data warehouses.
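
A small sketch of how an index changes a lookup, using Python's built-in `sqlite3` module. SQLite is a stand-in here, not a data warehouse, and it does not use the terms "clustered" and "non-clustered": its `INTEGER PRIMARY KEY` table is stored in key order (roughly the clustered case), while `CREATE INDEX` builds a separate structure (roughly the non-clustered case). The table and index names are made up.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return SQLite's query-plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

lookup = "SELECT amount FROM orders WHERE customer_id = ?"

# Without a secondary index, filtering on customer_id scans the whole table.
plan_before = query_plan(conn, lookup, (42,))

# A secondary index lets SQLite seek straight to the matching rows.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, lookup, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan text varies by SQLite version, but the first plan reports a scan and the second names the index. Which columns to index depends on the filters and joins that queries actually use.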

Issue #1 - We Need to Rethink Data

The Data Ecosystem • 259 implied HN points • 13 Apr 24

🕹 Technology Data Management Information Systems Data Strategy Business Intelligence Analytics

The data industry is really complicated and often misunderstood. People usually talk about symptoms, like bad data quality, instead of getting to the real problems underneath.
It's important to see the entire data ecosystem as connected, not just as separate parts. Understanding how these parts work together can help us find new opportunities and improve how we use data.
This newsletter aims to break down complex data topics into simple ideas. It's like a cheat sheet for everything related to data, helping readers understand what each part is and why it matters.

GroupBy #41: Uber’s Batch Data Infrastructure with Google Cloud Platform

VuTrinh. • 99 implied HN points • 25 Jun 24

🕹 Technology Data Engineering Cloud Computing Machine Learning Infrastructure Analytics

Uber is moving its huge amount of data to Google Cloud to keep up with its growth. They want a smooth transition that won't disrupt current users.
They are using existing technologies to make sure the change is easy. This includes tools that will help keep data safe and accessible during the move.
Managing costs is a big concern for Uber. They plan to track and control spending carefully as they switch to cloud services.

Data Science Weekly - Issue 543

Data Science Weekly Newsletter • 219 implied HN points • 19 Apr 24

🕹 Technology Data science Machine Learning AI Analytics Data Engineering

Statistical ideas have a big impact on the world. Learning about important papers can help us understand how statistics shape modern research and decision-making.
Machine Learning teams have different roles that face unique challenges. Understanding these personas can help leaders support their teams better.
Using vector embeddings can greatly improve search experiences in apps. They simplify search features that previously seemed too complex to build.
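
A minimal sketch of embedding-based search: each document is represented by a vector, and a query is answered by ranking documents by cosine similarity to the query vector. The three-dimensional vectors below are hand-written toy values; a real app would get them from an embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, index, top_k=2):
    """Return the top_k document ids most similar to query_vec."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:top_k]]

# Toy index: document id -> hypothetical embedding.
index = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "return an item": [0.8, 0.2, 0.1],
}

query = [1.0, 0.0, 0.0]  # stands in for the embedding of a refund-related question
print(search(query, index))  # ['refund policy', 'return an item']
```

The query matches "return an item" even though the two share no keywords, which is the main advantage over plain text matching.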

Procella - The query engine at YouTube

VuTrinh. • 79 implied HN points • 29 Jun 24

🕹 Technology Data Engineering Cloud Computing Database Systems Analytics

YouTube built Procella to combine different data processing needs into one powerful SQL query engine. This means they can handle many tasks, like analytics and reporting, without needing separate systems for each task.
Procella is designed for high performance and scalability by keeping computing and storage separate. This makes it faster and more efficient, allowing for quick data access and analysis.
The engine uses clever techniques to reduce delays and improve response times, even when many users are querying at once. It constantly optimizes and adapts, making sure users get their data as quickly as possible.

Indian IT’s GenAI Lag

Sector 6 | The Newsletter of AIM • 439 implied HN points • 14 Jan 24

🕹 Technology AI IT Business Analytics Software

Indian IT companies like Infosys and TCS have shown strong financial performance, but they lack confidence in generating revenue from generative AI.
In contrast, Accenture is making notable progress with generative AI, securing significant investments and showcasing strong growth.
Many Indian IT firms are reducing new hiring and focusing more on training current employees, highlighting an emphasis on automation and upskilling rather than bringing on fresh talent.

The dire state of B2B marketing attribution

Elena's Growth Scoop • 628 implied HN points • 09 Nov 23

💼 Business Marketing B2B Digital Analytics

B2B marketing attribution efforts are often non-existent.
B2B companies struggle with measuring attribution on the pipeline level.
Many B2B businesses are stuck in last-click attribution, neglecting other channels.