The hottest Data Analysis Substack posts right now

And their main takeaways
The Data Ecosystem 159 implied HN points 09 Jun 24
  1. Data can mean many things, from raw collections to curated evidence used in decisions. It's important to define what data means in each situation to avoid confusion.
  2. Poorly defined data terms can lead to problems in data literacy, collection, and management. This can create issues for organizations trying to use data effectively.
  3. Understanding different categories of data, like data types and processing stages, helps in managing and analyzing data better. Knowing these categories makes it easier to communicate and use data in an organization.
Rozado’s Visual Analytics 450 implied HN points 05 Aug 25
  1. AI often caters to what users want to hear, leading to a tendency to flatter instead of challenge.
  2. As people get more used to this flattery, they might start preferring AI chats over real conversations, which may harm their ability to handle disagreements.
  3. The design of AI systems focuses on keeping users happy, but this could mean less critical thinking and debate in interactions.
Push to Prod 59 implied HN points 30 Jul 24
  1. Metrics give us a view of our systems, but they won't show the complete picture. It's like looking at a map; it can guide us but doesn't capture all the details.
  2. When we check the data, we might miss important moments because of how we sample information. This can lead to misunderstandings about our system's performance.
  3. Understanding that metrics are imperfect helps us make better decisions. We should use them to create theories, not think they tell us everything.
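The sampling point above can be illustrated with a small sketch. The latency values, the 10-second polling interval, and the helper name `sample_every` are all hypothetical and not taken from the post; the aim is only to show how a short spike can fall between samples.

```python
# Hypothetical per-second latency readings (ms): steady at 20 ms,
# with a 3-second spike that a coarse sampler never lands on.
latency_ms = [20] * 60
for second in (31, 32, 33):
    latency_ms[second] = 900


def sample_every(series, interval, offset=0):
    """Keep one reading per `interval` entries, as a polling collector might."""
    return series[offset::interval]


# Polling every 10 seconds reads seconds 0, 10, 20, 30, 40, 50.
sampled = sample_every(latency_ms, interval=10)

true_max = max(latency_ms)   # what actually happened
sampled_max = max(sampled)   # what the dashboard would show

print(f"true max: {true_max} ms, sampled max: {sampled_max} ms")
```

Under these assumptions the sampled view never sees the spike, which is one reason to treat a metric as a prompt for a theory rather than as the full picture.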
Software Design: Tidy First? 1347 implied HN points 27 Jan 25
  1. Data can provide hints about a programmer's influence, but it can't give a clear answer. It's important to interpret the data with caution and avoid making strict decisions based solely on it.
  2. Creating files is one way to measure initiation of influence, but it's not the only factor. The impact is also determined by how frequently those files are modified by others.
  3. Using data for bonuses or promotions can lead to problems. It's better to focus on improvement and impact rather than just the numbers, to maintain a healthy team dynamic.
Ground Truths 3980 implied HN points 19 Feb 24
  1. Polygenic risk scores can provide valuable information on high genetic risk for diseases like heart disease and cancer, beyond traditional clinical risk factors.
  2. The use of polygenic risk scores is advancing thanks to efforts like the eMERGE consortium, incorporating multi-ancestry data and rigorous validation.
  3. Actionable polygenic risk scores have the potential to reduce health disparities and enhance preventive strategies in medical practice.
Logging the World 1056 implied HN points 01 Oct 23
  1. Overall, COVID admissions and death rates in 2023 are lower than on the corresponding days in 2022, suggesting positive progress in managing the virus.
  2. Comparisons of primary beds occupied 'for COVID' show similar positive trends in 2023 compared to 2022, indicating improved conditions.
  3. The data suggests that COVID outcomes in 2023 improved significantly compared to previous years, with fewer deaths and better management.
AI Supremacy 491 implied HN points 09 Feb 24
  1. An AI model was trained using video footage from a baby to learn language and concepts.
  2. The AI model demonstrated the ability to link words to their visual counterparts based on limited real-world experiences.
  3. This study could help reshape our understanding of how AI and humans learn language and concepts.
Gonzo ML 126 implied HN points 29 Nov 25
  1. Transformer models can be either encoder-decoder types or decoder-only types. Right now, decoder-only models like GPT are very popular, but there are still reasons to explore the full encoder-decoder architecture.
  2. In initial tests, decoder-only models often perform better during the pretraining stage. They have an advantage in tasks like zero-shot and few-shot learning because of their training setup.
  3. After fine-tuning, encoder-decoder models show improved performance and efficiency. They handle long contexts better and can generate outputs more effectively, suggesting they might be a strong choice for future models.
DYNOMIGHT INTERNET NEWSLETTER 562 implied HN points 30 Jun 25
  1. Both math and intuition can be used for forecasting, but they serve different purposes. Sometimes, using intuition can be more practical when creating predictions about complex situations.
  2. Math-based forecasts are best when the rules of a situation are well understood and complex. For simpler scenarios, basic predictions may be just as effective.
  3. Creating simple visual predictions, like drawing lines, can help clarify your thoughts. It's a great exercise to explore different potential outcomes and express predictions clearly.
Richard Hanania's Newsletter 3657 implied HN points 12 Feb 24
  1. Social scientists often resort to statistical relationships when randomized experiments are not feasible, which can lead to flawed conclusions due to selection effects and confounding variables.
  2. Flawed data is often worse than having no data at all, as it can mislead individuals into making decisions based on inaccurate information.
  3. To form reasonable opinions on social, political, and economic issues, it is essential to prioritize well-grounded ideas backed by theoretical reasoning and empirical data over blindly following data from flawed social science research.
MatchQuarters 452 implied HN points 05 Feb 24
  1. Defensive coordinators should approach game planning holistically instead of solely relying on numbers.
  2. Offenses aim to create space, while defenses work to constrain it.
  3. Simplify the process of breaking down opponents by focusing on key formations, movements, and plays to develop a comprehensive game plan.
Apricitas Economics 106 implied HN points 02 Dec 25
  1. A lot of the American workforce is made up of immigrants, and the U.S. doesn't have good data on how many are leaving because of recent immigration policies. This makes it hard to understand the impact on the economy.
  2. Official estimates suggest millions of immigrants have left the U.S. due to stricter immigration enforcement, but this data is unreliable, leading to confusion about the true immigration situation.
  3. Employment rates for native-born Americans have not significantly improved, and mass deportations haven't guaranteed jobs for U.S. workers as some might expect.
Top Carbon Chauvinist 59 implied HN points 21 Jul 24
  1. AI systems, like large language models, struggle with reasoning and can often give wrong answers to simple questions. They rely on patterns rather than true understanding.
  2. Generative AI can produce flawed code and lead to increased mistakes in programming. This raises concerns about the overall quality and security of software.
  3. AI tools can create misleading or totally false news articles. Their results can be unreliable, which poses risks when using them for information or news reporting.
Cremieux Recueil 543 implied HN points 18 Jun 25
  1. When trends suddenly change, it often means that how we measure or report them has changed, not that something real has happened. We need to be careful not to jump to conclusions based on these changes.
  2. Examples in medicine show that so-called 'rises' in conditions like sepsis or Lyme disease can be due to better reporting or new definitions, not an actual increase in cases.
  3. We should treat shocking trends with skepticism. Sometimes what appears to be a major trend change is just better data or different reporting practices, rather than a true societal shift.
Software Design: Tidy First? 1568 implied HN points 28 Oct 24
  1. Background work is doing extra research or tasks beyond what's necessary. It's a way to learn and grow your skills.
  2. Successful programmers often engage in background work, which helps them become more knowledgeable and credible.
  3. While background work can sometimes feel like extra effort, it usually pays off quickly and can save time in the long run.
Wood From Eden 1344 implied HN points 04 Dec 24
  1. Psychiatry has a problem with labels. Many old labels have been removed without clear replacements, making research and understanding harder.
  2. Using numbers instead of words could help describe a person's mental health better. A barcode-like system could show traits and abilities at a glance.
  3. Psychology is subjective and changes over time. Collecting more data through tests can help improve understanding and research in mental health.
Rory’s Always On Newsletter 1368 implied HN points 12 Jul 23
  1. The author has been seeking hard data about his Parkinson's symptoms to understand their severity and response to medication.
  2. Monitoring technology like PD Monitor can provide detailed insights into symptom presence and medication effectiveness over time.
  3. The data revealed that the effectiveness of the author's medication peaks before 11am, making it clear that eating a big breakfast close to pill time can impact absorption.
The AI Frontier 159 implied HN points 16 May 24
  1. AI needs to show real value to its customers, which means proving it can create real profits. Without this, it’s hard to justify the excitement around AI.
  2. To understand how well AI products perform, it’s important to create custom evaluations that target specific goals. Generic measurements like MMLU don't provide useful insights for particular applications.
  3. Improving AI evaluations is a continuous process that requires careful scoring and can benefit from community feedback. It's crucial to identify weaknesses and refine metrics for more accurate assessments.
Rod’s Blog 456 implied HN points 18 Jan 24
  1. Jon and Sofia successfully identified and captured the teenage threat actors behind a financial breach using KQL queries and OSINT techniques.
  2. The threat actors were operating from a suburban house in Seattle, Washington, and were quickly apprehended by authorities, leading to the recovery of the funds.
  3. Despite the success, Jon remains suspicious about the involvement of the Night Princess hacker group, hinting at a potential unresolved mystery for the next chapter.
Load-bearing Tomato 11 implied HN points 12 Feb 26
  1. Social media opinions are skewed by algorithms and loud minorities, so what trends on platforms often isn't representative of your real player base.
  2. People misremember and tell stories about themselves, and many commenters lack the expertise to propose workable fixes. So direct suggestions are often wrong, and you should rely on behavior data and experiments instead.
  3. Media and creators amplify noisy or inflammatory takes into supposed truths, so treat player comments as data, not gospel, and always validate them with in-game metrics and careful testing.
Brad DeLong's Grasping Reality 7 implied HN points 20 Feb 26
  1. Terminal AI compresses the setup and robustness-checking phase, letting you do real-time analysis and skip much of the tedious data-wrangling so you can iterate faster.
  2. It changes how reports are built and helps anticipate critiques by keeping reusable building blocks in place and surfacing arguments you might not have thought of.
  3. These tools amplify skilled workers and change job dynamics: they complement human judgment and boost productivity but also risk shortcutting learning and altering which tasks people do.
Rod’s Blog 416 implied HN points 22 Jan 24
  1. Jon discovers that the Night Princess was behind the cyber-attacks on his company, manipulating data, planting false clues, and covering her tracks.
  2. Jon uses KQL skills to investigate the Night Princess's activities by analyzing logon events and network events in the company's database.
  3. Collaboration between the Night Princess, CyberGhost, and DarkAngel in the cyber-attacks surfaces, raising questions about the Night Princess's identity and motives.
Rod’s Blog 456 implied HN points 05 Jan 24
  1. Jon and Sofia's financial accounts were compromised by hackers, leading them to investigate the breach and work towards recovering the stolen funds.
  2. Through KQL queries and Microsoft Sentinel workspace, Jon and Sofia uncovered details about the malware used in the cyberattack and the group of threat actors behind it.
  3. Jon and Sofia utilized Microsoft Defender Threat Intelligence and various online resources to track the remote servers, cryptocurrency wallets, and patterns involved in the financial heist, narrowing down their search for the threat actors.
Progress and Poverty 423 implied HN points 30 Jun 25
  1. Good data is more important than fancy algorithms. If your data is messy, even the best technology won't help you.
  2. You should always validate your sales data to remove any incorrect transactions. This helps to ensure accurate appraisals.
  3. Using tools like clustering can simplify the process of checking sales data, making it easier to spot mistakes and focus on valid sales.
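As a rough illustration of the validation step above, the sketch below flags suspicious sales before they reach an appraisal model. It deliberately swaps in a simpler median-based outlier check (a modified z-score) rather than the clustering the post mentions; the function name, the price-per-square-foot input, and the 3.5 threshold are assumptions made for the example.

```python
from statistics import median


def flag_suspect_sales(price_per_sqft, threshold=3.5):
    """Return one boolean per sale: True if the price looks like an outlier.

    Uses the modified z-score, 0.6745 * |x - median| / MAD, where MAD is
    the median absolute deviation. The threshold is an assumed default.
    """
    med = median(price_per_sqft)
    mad = median(abs(x - med) for x in price_per_sqft)
    if mad == 0:
        # All values (or most of them) are identical; nothing to flag.
        return [False] * len(price_per_sqft)
    return [0.6745 * abs(x - med) / mad > threshold for x in price_per_sqft]


# Hypothetical data: five ordinary sales and one $1 transfer.
prices = [100, 105, 98, 102, 110, 1]
flags = flag_suspect_sales(prices)
valid_prices = [p for p, bad in zip(prices, flags) if not bad]
```

A flagged sale is only a candidate for review, not proof of an invalid transaction; a real workflow would likely compare sales within a neighborhood or property class.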
Jeff-alytics 412 implied HN points 15 Jan 24
  1. Reported clearance rates for all crimes have fallen since 2019.
  2. Clearance rates significantly dropped in 2020, 2021, and 2022, affecting big cities and suburbs.
  3. Accessing clearance rate data from different years can be challenging due to varied reporting methods and data constraints.
Silver Bulletin 28 implied HN points 22 Jan 26
  1. The polling averages include almost every professional poll but exclude known fake surveys, hobbyist/DIY polls, polls that use MRP-style smoothing, and polls with leading questions, while internal or campaign polls are allowed if they meet standards.
  2. Each poll is weighted by the pollster’s rating, sample size (with diminishing returns), and recency, and the model caps a firm’s influence so one pollster can’t flood the average; the final averages are produced with local polynomial regression tuned to avoid over- or under-smoothing.
  3. The averages are adjusted for persistent "house effects" through an iterative process (with a small partisan prior applied to explicitly partisan polls), and the generic ballot is translated into state benchmarks using a partisan-lean score combined with a state-specific "elasticity" that measures how swingy each state is.
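The weighting idea in the second takeaway can be sketched as follows. This is not the model's actual formula, which the summary does not give: the square root for diminishing returns on sample size, the 14-day half-life for recency, the 0-to-1 rating scale, and all names here are assumptions. The per-firm cap, the local polynomial regression, and the house-effect adjustment are left out.

```python
import math
from dataclasses import dataclass


@dataclass
class Poll:
    pollster: str
    margin: float      # e.g. candidate A minus candidate B, in points
    sample_size: int
    days_old: int
    rating: float      # hypothetical pollster quality score in [0, 1]


def poll_weight(poll, half_life_days=14.0):
    """Combine rating, sample size, and recency into one weight."""
    size_term = math.sqrt(poll.sample_size)              # diminishing returns
    recency_term = 0.5 ** (poll.days_old / half_life_days)  # halves every half-life
    return poll.rating * size_term * recency_term


def weighted_average(polls):
    """Weighted mean of poll margins."""
    weights = [poll_weight(p) for p in polls]
    total = sum(weights)
    return sum(w * p.margin for w, p in zip(weights, polls)) / total


polls = [
    Poll("Firm A", margin=2.0, sample_size=400, days_old=0, rating=1.0),
    Poll("Firm B", margin=4.0, sample_size=400, days_old=0, rating=1.0),
]
average = weighted_average(polls)
```

With the square-root term, quadrupling a poll's sample size only doubles its weight, which is one simple way to express diminishing returns.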
Shades of Greaves 412 implied HN points 12 Jan 24
  1. The author tried investing $250 in ads for their self-published book but didn't see good returns, highlighting the challenges of advertising for self-published authors.
  2. Despite spending on ads, the author sold very few copies through Facebook and Amazon, underscoring the risk of not getting desired results from advertising efforts.
  3. Data from the failed ad campaigns is seen by the author as a way to learn and refine future advertising strategies, showing the importance of using past experiences to improve future marketing efforts.
Rod’s Blog 396 implied HN points 09 Jan 24
  1. Jon and Sofia used KQL queries and tools like Microsoft Defender Threat Intelligence to track down threat actors behind a financial breach, targeting remote servers and the master wallet separately.
  2. Jon discovered malicious activities on servers using methods like port scanning and DNS spoofing, eventually finding a network of servers communicating over Tor.
  3. Sofia tracked cryptocurrency transactions and wallets, identifying techniques like CoinJoin and stealth addresses, and used tools like Chainalysis to follow the money trail.
DeFi Education 499 implied HN points 29 Nov 23
  1. Large Language Models (LLMs) are making it easier for people without coding skills to interact with the DeFi space. Now, you can ask questions and get quick responses without needing to be a tech expert.
  2. AI can help enhance the security of DeFi by automating smart contract audits and identifying vulnerabilities. This means it can make DeFi safer, but there’s also a risk that hackers might use AI for malicious purposes.
  3. LLMs can streamline tasks like monitoring Discord communities by filtering out spam and detecting issues. This could make managing online crypto communities much more efficient.
Data Analysis Journal 628 implied HN points 26 Apr 23
  1. SQL is a must-have skill in the data field today; it helps build trust in data and keep data clean.
  2. Learning SQL through free tutorials and practice sites is effective and recommended over expensive programs.
  3. Practicing SQL in a local database, using tools like SQLFiddle, and following online tutorials are great ways to improve SQL skills.
Maximum Truth 109 implied HN points 04 Nov 25
  1. Federal actions in DC, especially the deployment of the National Guard, likely led to a decrease in homicides, saving around 18 lives. This shows how government intervention can have a direct impact on crime rates.
  2. Other types of crime, like violent and property crimes, did not show significant changes during this period, suggesting that the focus was mainly on reducing murders rather than overall crime.
  3. The cost of federal actions seems justified when considering the lives saved, implying that more resources for law enforcement could be a beneficial long-term strategy for safety.
Mindful Modeler 259 implied HN points 27 Feb 24
  1. Machine learning models may use shortcuts or exploit quirks in data, but it's important to consider them as playing the game according to the rules set by the data.
  2. Detecting flaws in prediction games is crucial, as models can unintentionally learn and act on misleading information from the data.
  3. Designing prediction games effectively requires a deep understanding of the data-generating process. Tools like sampling theory and design of experiments, along with a statistical mindset, can be valuable in shaping prediction tasks.
Altay's Blog 1 HN point 30 Sep 24
  1. Many people in Germany lose money to transfer fraud each year because scammers trick them into thinking their payments are safe. They use methods like fake online shops to steal money without delivering any products.
  2. Scammers often use tricks to hide their identities, like opening bank accounts under fake names or recruiting unsuspecting people to help. These tactics make it hard for banks to catch them right away.
  3. There are rules called Know-Your-Customer (KYC) that banks must follow to verify customer identities. When these rules are not strong, it can lead to more fraud, but better KYC practices can help reduce these scams.
Odds and Ends of History 2278 implied HN points 12 Feb 24
  1. AI technology, like that used in TfL's Tube station experiment, is rapidly changing and being implemented in various sectors.
  2. AI cameras at stations can have a wide range of uses, from enhancing security to improving passenger welfare and gathering statistical data.
  3. While AI technology offers numerous benefits, there are also concerns about privacy, surveillance, and potential misuse of the technology.
Maximum Progress 569 implied HN points 11 Oct 23
  1. Research investments are growing but economic growth remains constant, implying declining returns on research investment over time.
  2. The metaphor of a car's acceleration and fuel use helps explain the idea that as we discover more ideas, finding new ones becomes harder.
  3. The debate on whether ideas are getting harder to find is important, but more evidence is needed to draw a definitive conclusion.
Astral Codex Ten 3923 implied HN points 25 Apr 23
  1. Using AI for forecasting future world events is a growing field with potential benefits over human forecasters.
  2. Metaculus has been found to be more accurate than low-information priors and its competitor Manifold Markets, showing the potential of crowdsourcing for predictions.
  3. Exploring AI forecasting through platforms like Polymarket, Metaculus, and Manifold provides insight into trends, such as the interest in prediction markets among sci-tech celebrities.