The hottest Data interpretation Substack posts right now

And their main takeaways
Thinking in Bets • 138 implied HN points • 01 Nov 24
  1. Annie Duke is starting a new opinion column in The Washington Post, focusing on risk and decision-making. She'll share insights on how we interpret important data.
  2. The column will discuss the misleading nature of data interpretation, particularly regarding Black voters' support in elections. Duke argues that misinterpretations can be more harmful than misinformation.
  3. Annie's background as a decision scientist and former poker player helps her analyze how people make choices, which she'll explore in her writing.
Passing Time • 3816 implied HN points • 06 Nov 24
  1. A major claim about government spending's role in GDP growth was proven incorrect with simple research. It turns out only about 30% of recent GDP growth was due to government spending, not the 85% stated.
  2. The podcast hosts did not provide critical analysis or challenge each other's claims during the discussion, which raises concerns about their credibility.
  3. It's important to verify information from sources you trust, especially when it comes to economic data, to avoid being misled.
Mindful Modeler • 279 implied HN points • 09 Apr 24
  1. Machine learning is about building prediction models. It covers a wide range of applications, but may not be perfect for unsupervised learning.
  2. Machine learning is about learning patterns from data. This view is useful for understanding ML projects beyond just prediction.
  3. Machine learning is automated decision-making at scale. It emphasizes the purpose of prediction, which is to facilitate decision-making.
Mindful Modeler • 479 implied HN points • 09 Jan 24
  1. Handling non-i.i.d. data properly in machine learning helps prevent data leakage, overfitting, and overly optimistic performance evaluation.
  2. For modeling data with dependencies, classical statistical approaches such as mixed-effects models can be used to estimate coefficients correctly.
  3. In non-i.i.d. data situations, the data splitting setup must align with the real-world use case of the model to avoid issues like row-wise leakage and over-optimistic model performance.
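
One way to picture the splitting point above is a group-aware train/test split, where all rows that share a group (for example, the same patient) land on the same side. This is only a minimal sketch: the function name `group_train_test_split`, the patient labels, and the 25% test fraction are illustrative assumptions, not something taken from the post.

```python
import random

def group_train_test_split(groups, test_fraction=0.25, seed=0):
    """Split row indices so every group lands entirely in train or in test."""
    unique_groups = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique_groups)
    # Hold out at least one whole group for testing.
    n_test = max(1, round(len(unique_groups) * test_fraction))
    test_groups = set(unique_groups[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx

# Hypothetical data: several rows per patient (dependent rows share a label).
patients = ["a", "a", "b", "b", "b", "c", "d", "d"]
train_idx, test_idx = group_train_test_split(patients)
train_patients = {patients[i] for i in train_idx}
test_patients = {patients[i] for i in test_idx}
print("train groups:", sorted(train_patients))
print("test groups:", sorted(test_patients))
```

A plain row-wise shuffle would let rows from the same patient appear in both sets, which is the row-wise leakage the takeaway warns about.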
Mindful Modeler • 279 implied HN points • 25 Jul 23
  1. SHAP values are like forces acting on a planet in a universe analogy, helping explain machine learning model predictions.
  2. Each feature in a machine learning model contributes as a force, with SHAP values showing how they impact the prediction.
  3. SHAP values aim to maintain the prediction's equilibrium by considering all forces, revealing which features are vital.
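
The "equilibrium" in the analogy corresponds to the property that Shapley attributions add up to the gap between the prediction and a baseline prediction. The sketch below computes exact Shapley values for a tiny model by averaging over all feature orderings. It is an illustration under stated assumptions: `toy_model`, the input `x`, and the single all-zero `baseline` point are made up here, and real SHAP implementations average over a background dataset and use approximations rather than enumerating every ordering.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values: average marginal contribution over all feature orders."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)      # start with every feature "absent"
        previous = predict(current)
        for j in order:
            current[j] = x[j]         # switch feature j to its actual value
            updated = predict(current)
            phi[j] += updated - previous
            previous = updated
    return [total / len(orders) for total in phi]

def toy_model(features):
    # Hypothetical model: one additive term plus one interaction term.
    return 2.0 * features[0] + features[1] * features[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
print(sum(phi), toy_model(x) - toy_model(baseline))
```

The two interacting features split the interaction term equally, and the attributions sum to the difference between the prediction and the baseline prediction.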
Mindful Modeler • 479 implied HN points • 20 Sep 22
  1. Correlation between features can significantly impact the interpretability of machine learning models, both technically and philosophically.
  2. Identifying and addressing correlation issues is crucial for accurate model interpretation. Techniques include grouping correlated features, decorrelation methods like PCA, feature selection, causal modeling, and conditional interpretation.
  3. Entanglement of interpretation due to correlation makes it challenging to isolate the impact of individual features in machine learning models.
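
Of the techniques listed above, grouping correlated features is the simplest to sketch. The example below uses a greedy pass that puts a feature into the first group containing a feature it correlates with strongly. The 0.9 threshold, the helper names, and the toy columns are assumptions for illustration; the post may group features differently.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def group_correlated(features, threshold=0.9):
    """Greedily group features whose absolute correlation meets the threshold."""
    groups = []
    for name, values in features.items():
        for group in groups:
            if any(abs(pearson(values, features[m])) >= threshold for m in group):
                group.append(name)
                break
        else:
            groups.append([name])  # no strongly correlated group found
    return groups

# Hypothetical columns: the same height in two units, plus an unrelated age.
height_cm = [150, 160, 170, 180, 190]
features = {
    "height_cm": height_cm,
    "height_in": [h / 2.54 for h in height_cm],
    "age": [30, 22, 41, 25, 35],
}
groups = group_correlated(features)
print(groups)
```

Interpreting each group as one unit avoids trying to split credit between features that carry nearly the same information.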
Mindful Modeler • 99 implied HN points • 21 Mar 23
  1. Utilize background data creatively in analysis by considering it as more than just a nuisance for estimation.
  2. Leverage background data to explore different scenarios like distribution shifts, feature effects in various data groups, and stability of model predictions.
  3. Background data plays a crucial role in model-agnostic interpretation methods like Shapley values and permutation feature importance, providing opportunities to enhance analysis by smart selection.
Independent SAGE continues • 19 implied HN points • 04 Apr 24
  1. Currently, there are low levels of Covid in hospitals and the community. The data suggest that the situation is better than many people think.
  2. Some claims about high Covid cases and hospitalizations are misleading. It's important to examine the evidence and context behind those claims.
  3. Overall, the chances of getting severely sick from Covid are much lower now than before, thanks largely to vaccinations and improved immunity.
inexactscience • 39 implied HN points • 22 Jul 23
  1. Correlation does not imply causation: two things being related does not mean that one causes the other.
  2. Many people mistakenly think the correlation coefficient is a percentage. This can be misleading and lead to wrong conclusions.
  3. To understand how much one thing explains another, use the coefficient of determination, not the correlation. Squaring the correlation gives you a clearer picture of the relationship.
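
A small numeric check makes the difference between the two quantities concrete. The sketch below computes the correlation coefficient r for a made-up dataset, squares it, and compares it with the share of variance explained by a least-squares line. The data points and helper names are illustrative assumptions; the equality between r squared and explained variance holds for simple linear regression with an intercept.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def variance_explained(xs, ys):
    """Share of variance in ys explained by a least-squares line on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data points.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)
r_squared = r ** 2
explained = variance_explained(xs, ys)
print(f"r = {r:.2f}, r^2 = {r_squared:.2f}, variance explained = {explained:.2f}")
```

Here a correlation of 0.8 corresponds to 64% of the variance explained, not 80%, which is the misreading the post describes.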
Cybernetic Forests • 19 implied HN points • 09 Jul 23
  1. The story explores the disconnect between data produced by the body and how machines interpret it, highlighting the complexities in translating and calibrating data.
  2. It questions the dangers of misinterpreting brain activity as a linear flow of information, emphasizing the importance of understanding gaps when reconstructing signals.
  3. The narrative offers a prescient warning about the misuse of automated statistical analysis systems to determine societal control based on physical characteristics, urging critical examination of the tools and notions used.
Cybernetic Forests • 59 implied HN points • 04 Jul 21
  1. Machines understand models of reality through data, influenced by what is deemed significant, leading to gaps and potential misinterpretations.
  2. Datasets are contextual and not universally applicable, emphasizing the importance of clear documentation and awareness of data limitations.
  3. Creating a 'Tourist's Guide to Datasets' with annotations and personal insights can enhance understanding and avoid misuse when data is reused for different purposes.
Granted • 19 implied HN points • 04 Aug 19
  1. Strive to be better, not the best. The competition should be with your past and future self.
  2. Data doesn't really talk, people interpret it. Question the competence and integrity of data interpreters.
  3. To be a good mentee, value your mentor's time, seek clear guidance, be open to ideas, and reflect on your progress.
Musings on Markets • 0 implied HN points • 31 Aug 16
  1. Mean reversion is the idea that extreme results will return to the average over time. This is seen in sports and investing, but it can lead us to make wrong assumptions about future performance.
  2. There are two types of mean reversion: time series mean reversion, which looks at past average values over time, and cross-sectional mean reversion, which compares values against the average of similar items. Both have their own risks and assumptions.
  3. Structural changes in the economy or companies can disrupt mean reversion, meaning trusting it too much could lead to poor investment decisions. It's important to stay aware of these changes and not just rely on historical data.
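
The two kinds of mean reversion can point in opposite directions for the same item, which a short sketch makes visible. The company names and ratio values below are invented for illustration, and the two helper functions are hypothetical; they simply compare a current value with the item's own past average and with the current average of its peers.

```python
from statistics import mean

# Hypothetical valuation ratios per company; the last entry is the current value.
history = {
    "A": [10, 12, 14, 18],
    "B": [18, 20, 22, 20],
    "C": [30, 28, 26, 22],
}

def time_series_gap(series):
    """Current value minus the item's own historical average."""
    return series[-1] - mean(series[:-1])

def cross_sectional_gap(name, data):
    """Current value minus the current average of the other items."""
    peers = [series[-1] for other, series in data.items() if other != name]
    return data[name][-1] - mean(peers)

ts_gap = time_series_gap(history["A"])
cs_gap = cross_sectional_gap("A", history)
print("A vs its own past:", ts_gap)
print("A vs its peers:", cs_gap)
```

Company A looks high against its own history but low against its peers, so the conclusion depends on which average is assumed to be the one it reverts to.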