The hottest Data interpretation Substack posts right now

And their main takeaways
Thinking in Bets 138 implied HN points 01 Nov 24
  1. Annie Duke is starting a new opinion column in The Washington Post, focusing on risk and decision-making. She'll share insights on how we interpret important data.
  2. The column will discuss the misleading nature of data interpretation, particularly regarding Black voters' support in elections. Duke argues that misinterpretations can be more harmful than misinformation.
  3. Annie's background as a decision scientist and former poker player helps her analyze how people make choices, which she'll explore in her writing.
Wyclif's Dust 1609 implied HN points 05 Jun 25
  1. Scientism can happen when researchers make general claims about science without considering the limits of their studies. It's important for scientists to recognize when their findings may not apply broadly.
  2. Social scientists often use big concepts that sound scientific, but they sometimes fail to acknowledge the unique context of their studies. This can lead to misleading conclusions about complex issues.
  3. The way some researchers present their findings may resemble 'cargo cult science,' where they follow scientific methods superficially but miss the deeper understanding needed for true insights. It's essential to connect the rigor of research with the actual realities of the world.
a newsletter for infovores. 91 implied HN points 26 Jan 26
  1. Don’t automatically write off odd poll responses as random bad-faith answers; surprising percentages can represent real opinions that matter politically.
  2. Nontrivial shares of people—even inside expected groups—can hold hawkish or conspiratorial views, so small percentages can still equal large, consequential numbers.
  3. Before dismissing a result, check the question wording, pollster credibility, timing, survey method, and whether other sources corroborate it to judge if it’s noise or a real signal.
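The point about small percentages can be checked with simple arithmetic. The sketch below uses purely hypothetical numbers (a population of 250 million adults and a 4% share); neither figure comes from the post, and the function name is made up for illustration.

```python
def people_holding_view(population, share):
    """Convert a poll share into an approximate head count."""
    return round(population * share)

# Hypothetical inputs, chosen only to illustrate the scale effect.
population = 250_000_000  # assumed adult population
share = 0.04              # assumed 4% poll response

count = people_holding_view(population, share)
print(f"{share:.0%} of {population:,} is about {count:,} people")
```

Even a share that looks like noise in a crosstab can correspond to millions of people once it is scaled to a national population.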
Passing Time 3816 implied HN points 06 Nov 24
  1. A major claim about government spending's role in GDP growth was proven incorrect with simple research. It turns out only about 30% of recent GDP growth was due to government spending, not the 85% stated.
  2. The podcast hosts did not provide critical analysis or challenge each other's claims during the discussion, which raises concerns about their credibility.
  3. It's important to verify information from sources you trust, especially when it comes to economic data, to avoid being misled.
Mindful Modeler 279 implied HN points 09 Apr 24
  1. Machine learning is about building prediction models. It covers a wide range of applications, but may not be perfect for unsupervised learning.
  2. Machine learning is about learning patterns from data. This view is useful for understanding ML projects beyond just prediction.
  3. Machine learning is automated decision-making at scale. It emphasizes the purpose of prediction, which is to facilitate decision-making.
Mindful Modeler 479 implied HN points 09 Jan 24
  1. Properly accounting for non-i.i.d. data in machine learning helps prevent data leakage, overfitting, and overly optimistic performance evaluation.
  2. For modeling data with dependencies, classical statistical approaches like mixed effect models can be used to correctly estimate coefficients.
  3. In non-i.i.d. data situations, the data splitting setup must align with the real-world use case of the model to avoid issues like row-wise leakage and over-optimistic model performance.
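One common way to avoid row-wise leakage is to split by group rather than by row, so that all rows from one entity land on the same side of the split. The sketch below is a minimal illustration in plain Python, not the post's own code; `group_split` and the patient IDs are hypothetical names introduced here.

```python
import random

def group_split(groups, test_fraction=0.25, seed=0):
    """Split row indices so every group lands entirely in train or in test."""
    unique_groups = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique_groups)
    # Hold out at least one whole group.
    n_test = max(1, round(len(unique_groups) * test_fraction))
    test_groups = set(unique_groups[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx

# Hypothetical data: 8 rows from 4 patients, two rows per patient.
patient_ids = ["a", "a", "b", "b", "c", "c", "d", "d"]
train_idx, test_idx = group_split(patient_ids)

train_groups = {patient_ids[i] for i in train_idx}
test_groups = {patient_ids[i] for i in test_idx}
print("train groups:", sorted(train_groups))
print("test groups:", sorted(test_groups))
```

Whether a group-wise split is the right choice depends on the use case: it matches a model that will be applied to unseen entities, while a time-based split may fit better when the model predicts future rows for known entities.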
Stealing Signals 439 implied HN points 31 Oct 23
  1. Teams may not always give 100% effort every game in the NFL due to strategic reasons.
  2. Watching games can give a big advantage in fantasy football over just looking at stats.
  3. The first-read targets dataset may not accurately reflect an offense's intentions in play calling and should be analyzed cautiously.
Mindful Modeler 279 implied HN points 25 Jul 23
  1. In the post's universe analogy, SHAP values are like forces acting on a planet, which helps explain machine learning model predictions.
  2. Each feature in a machine learning model contributes as a force, with SHAP values showing how it impacts the prediction.
  3. SHAP values aim to maintain the prediction's equilibrium by considering all forces, revealing which features are vital.
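The "equilibrium" idea corresponds to the property that Shapley values sum to the difference between the prediction and a baseline prediction. The sketch below computes exact Shapley values for a tiny hypothetical model by averaging over all feature orderings. It is a simplification that assumes a single baseline point stands in for "absent" features; it does not use the SHAP library, and `shapley_values` and `predict` are illustrative names.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values, averaging marginal contributions over all orderings."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)   # start with every feature "absent"
        prev = predict(current)
        for j in order:
            current[j] = x[j]      # switch feature j to its actual value
            new = predict(current)
            phi[j] += new - prev   # marginal contribution of feature j
            prev = new
    return [p / len(orders) for p in phi]

# Hypothetical model with one additive term and one interaction term.
def predict(row):
    return 3.0 * row[0] + row[1] * row[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]  # assumed reference point
phi = shapley_values(predict, x, baseline)

print("contributions:", phi)
print("sum of contributions:", sum(phi))
print("prediction minus baseline:", predict(x) - predict(baseline))
```

In this toy case the interaction between the second and third features is shared equally between them, and the contributions add up to the gap between the prediction and the baseline.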
Mindful Modeler 479 implied HN points 20 Sep 22
  1. Correlation between features can significantly impact the interpretability of machine learning models, both technically and philosophically.
  2. Identifying and addressing correlation issues is crucial for accurate model interpretation. Techniques include grouping correlated features, decorrelation methods like PCA, feature selection, causal modeling, and conditional interpretation.
  3. Entanglement of interpretation due to correlation makes it challenging to isolate the impact of individual features in machine learning models.
Mindful Modeler 319 implied HN points 08 Sep 22
  1. Focus on better machine learning by thinking like a statistician.
  2. Prioritize model interpretation, pay attention to data, and maintain a critical mindset.
  3. Stay tuned for more updates and insights on mindfulmodeler.substack.com
FILWD 39 implied HN points 30 Jan 24
  1. Data-reality gaps exist when there is a disconnect between how data represents something and the underlying reality.
  2. A data generation model helps identify gaps such as selection bias and interpretation gaps.
  3. Understanding the different gaps in data can lead to more accurate visualization and interpretation.
Mindful Modeler 99 implied HN points 21 Mar 23
  1. Use background data creatively in analysis by treating it as more than a nuisance for estimation.
  2. Leverage background data to explore different scenarios, such as distribution shifts, feature effects in various data groups, and the stability of model predictions.
  3. Background data plays a crucial role in model-agnostic interpretation methods like Shapley values and permutation feature importance, and choosing it deliberately is an opportunity to enhance the analysis.
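To make the role of background data concrete, here is a minimal sketch of permutation feature importance in plain Python. The rows passed in as `X` are the background data, so swapping in a different subset (for example, one data group) changes the result. The model, data, and function names are hypothetical and chosen only for illustration.

```python
import random

def permutation_importance(predict, X, y, feature, seed=0):
    """Increase in mean squared error after shuffling one feature column."""
    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    rng = random.Random(seed)
    shuffled = [row[feature] for row in X]
    rng.shuffle(shuffled)
    # Rebuild each row with the shuffled value in place of the original one.
    X_perm = [row[:feature] + [v] + row[feature + 1:]
              for row, v in zip(X, shuffled)]
    return mse(X_perm) - mse(X)

# Hypothetical model that only uses feature 0.
def predict(row):
    return 2.0 * row[0]

# Hypothetical background data: 20 rows, 2 features.
X = [[float(i), float(i % 3)] for i in range(20)]
y = [predict(row) for row in X]

imp_used = permutation_importance(predict, X, y, feature=0)
imp_unused = permutation_importance(predict, X, y, feature=1)
print("importance of feature 0:", imp_used)
print("importance of feature 1:", imp_unused)
```

Running the same function on different background subsets is one simple way to probe how feature effects vary across groups or under a distribution shift.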
Independent SAGE continues 19 implied HN points 04 Apr 24
  1. Currently, there are low levels of Covid in hospitals and the community. The data suggest that the situation is better than many people think.
  2. Some claims about high Covid cases and hospitalizations are misleading. It's important to examine the evidence and context behind those claims.
  3. Overall, the chances of getting severely sick from Covid are much lower now than before, thanks largely to vaccinations and improved immunity.
inexactscience 39 implied HN points 22 Jul 23
  1. Correlation does not mean one thing causes another. Just because two things are related doesn't mean one causes the other.
  2. Many people mistakenly think the correlation coefficient is a percentage. This can be misleading and lead to wrong conclusions.
  3. To understand how much one thing explains another, use the coefficient of determination, not the correlation. Squaring the correlation gives you a clearer picture of the relationship.
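A small worked example shows why the correlation coefficient should not be read as a percentage. The data below are made up for illustration, and `pearson_r` is a hand-written helper rather than a library call. The reading of r squared as a share of variance explained assumes a simple linear relationship between two variables.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

r = pearson_r(xs, ys)
r_squared = r ** 2
print(f"r = {r:.2f}, r squared = {r_squared:.2f}")
```

Here a correlation of 0.8 corresponds to a coefficient of determination of 0.64, so under a linear fit about 64% of the variance in one variable is accounted for by the other, not 80%.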
Cybernetic Forests 19 implied HN points 09 Jul 23
  1. The story explores the disconnect between data produced by the body and how machines interpret it, highlighting the complexities in translating and calibrating data.
  2. It questions the dangers of misinterpreting brain activity as a linear flow of information, emphasizing the importance of understanding gaps when reconstructing signals.
  3. The narrative offers a prescient warning about the misuse of automated statistical analysis systems to determine societal control based on physical characteristics, urging critical examination of the tools and notions used.
Cybernetic Forests 59 implied HN points 04 Jul 21
  1. Machines understand models of reality through data, influenced by what is deemed significant, leading to gaps and potential misinterpretations.
  2. Datasets are contextual and not universally applicable, emphasizing the importance of clear documentation and awareness of data limitations.
  3. Creating a 'Tourist's Guide to Datasets' with annotations and personal insights can enhance understanding and avoid misuse when data is reused for different purposes.
Musings on Markets 0 implied HN points 31 Aug 16
  1. Mean reversion is the idea that extreme results will return to the average over time. This is seen in sports and investing, but it can lead us to make wrong assumptions about future performance.
  2. There are two types of mean reversion: time series mean reversion, which looks at past average values over time, and cross-sectional mean reversion, which compares values against the average of similar items. Both have their own risks and assumptions.
  3. Structural changes in the economy or companies can disrupt mean reversion, meaning trusting it too much could lead to poor investment decisions. It's important to stay aware of these changes and not just rely on historical data.