The hottest Data Analysis Substack posts right now

And their main takeaways
Mindful Modeler 59 implied HN points 06 Dec 22
  1. 'The Infinite Data Hallucinator' explores using GPT-3 to create fictitious datasets for testing ML models and for teaching.
  2. The 'Infinite Data Hallucinator' is a Jupyter notebook that uses the OpenAI API and a pandas DataFrame to generate datasets from a user-provided prompt.
  3. While the generated datasets may have superficial coherence, they are not entirely realistic, and there are limitations due to token limits when creating larger datasets.
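The workflow described in the points above (prompt in, tabular text out, parsed into rows) can be sketched roughly as follows. This is not the notebook's actual code: `fake_completion` is a hypothetical stand-in for the OpenAI API call, the canned CSV reply is invented for illustration, and the standard-library `csv` module is used in place of a pandas DataFrame so the sketch runs without dependencies.

```python
import csv
import io


def fake_completion(prompt: str) -> str:
    """Hypothetical stand-in for a language-model API call.

    A real version would send `prompt` to the model and return its text
    reply; here a canned CSV string is returned so the sketch runs offline.
    """
    return "name,age,city\nAda,36,London\nLinus,28,Helsinki\n"


def hallucinate_dataset(description: str, n_rows: int) -> list[dict[str, str]]:
    """Ask the (stand-in) model for CSV text and parse it into row dicts."""
    prompt = (
        f"Generate a CSV dataset with a header row and {n_rows} rows. "
        f"Dataset description: {description}"
    )
    reply = fake_completion(prompt)
    # Model output is plain text, so it must be parsed before use.
    return list(csv.DictReader(io.StringIO(reply)))


rows = hallucinate_dataset("people and where they live", n_rows=2)
for row in rows:
    print(row)
```

Because each reply has to fit within the model's token limit, a larger dataset would likely need several calls whose rows are then concatenated.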
The Data Score 19 implied HN points 11 Dec 23
  1. The fashion industry in the US is promoting more aggressively this holiday season, with an increase in the percentage of products discounted and a decrease in average percentage markdown compared to last year.
  2. 48% of fashion retailers are promoting more aggressively this year, while 48% are promoting less aggressively, showing variations in promotional strategies among different brands.
  3. Flywheel's web-mined pricing data indicates a response to the holiday season through increased discounting activity, leading to a greater percentage of products being sold out.
Sunday Letters 19 implied HN points 11 Dec 23
  1. The job market is always changing, just like it did when agriculture jobs shrank a century ago. People need to adapt and learn new skills to keep up.
  2. Everyone now has the chance to do data analysis, which is great for innovation. Fast and low-cost experiments help us find unexpected insights.
  3. Understanding basic concepts like mean vs median is becoming more important. It helps people ask better questions and make sense of the data they encounter.
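A small worked example of the mean-versus-median point, using Python's standard `statistics` module. The salary figures are hypothetical and chosen only to show how one outlier separates the two measures.

```python
from statistics import mean, median

# Hypothetical annual salaries; the last value is a deliberate outlier.
salaries = [40_000, 42_000, 45_000, 48_000, 500_000]

average = mean(salaries)   # pulled upward by the single outlier
middle = median(salaries)  # the middle value, unaffected by the outlier

print(f"mean:   {average:,.0f}")
print(f"median: {middle:,.0f}")
```

Here the mean is 135,000 while the median is 45,000, so asking which of the two a reported "average" refers to can change the conclusion.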
Conspirador Norteño 44 implied HN points 22 Nov 24
  1. The 'For You' feed on X shows mostly posts from accounts you don't follow. In fact, more than half of the recommended posts come from these unfamiliar sources.
  2. Elon Musk's posts are the most frequently suggested, even to users who do not follow him. This indicates that trending figures often dominate the recommendation algorithm.
  3. Connections between suggested accounts are mostly based on repost interactions. Most recommended accounts have links to the ones you already follow, showing a network effect.
Natto Thoughts 19 implied HN points 07 Dec 23
  1. The post discusses disinformation and how it can harm individuals and society.
  2. Tips are provided to detect and avoid disinformation, including advice on how to investigate sources and spot deepfakes.
  3. Various professionals, such as litigators, intelligence analysts, fact-checkers, and historians, provide valuable insights for countering disinformation.
The Security Industry 13 implied HN points 16 Jul 25
  1. There are many cybersecurity companies with fewer than 50 employees showing growth. In fact, there are currently 459 of them that have positive growth this year.
  2. Some companies from last year's Fast 50 list have continued to thrive and are on track to join a larger group called the Cyber 150.
  3. Tracking data helps identify which smaller companies are rising quickly in the cybersecurity field, making it easier to spot potential leaders.
Data Thoughts 79 implied HN points 21 Oct 22
  1. Working in data often feels lonely, since a lot of the work is done solo on a computer, but there's magic in that solitude.
  2. Events and communities bring people together, making these lonely moments feel connected and meaningful, especially in the data field.
  3. The joy of working with data comes from the love of the craft itself, not just the outcomes or recognition, and that passion can survive even in tough times.
jonstokes.com 154 implied HN points 18 May 23
  1. Different approaches to evaluating AI performance have practical implications in development, deployment, and regulation.
  2. Language models like GPT-4 struggle with resolving ambiguity in human language due to limitations in understanding context.
  3. Using an engineering approach, providing relevant context, and improving language parsing can help mitigate language model biases and inaccuracies.
ASeq Newsletter 14 implied HN points 20 Jun 25
  1. Oxford Nanopore has stopped sharing details about its customer base, which raises concerns about growth. It's unclear how many customers they really have now.
  2. The MinION, which has a lot of users, isn't very profitable for Oxford, so its slowing growth might not be a big issue.
  3. Research funding seems to be declining overall, which could affect Oxford and other companies in the field, especially for their larger customers.
Three Data Point Thursday 19 implied HN points 16 Nov 23
  1. Time series models, like TimeGPT, are advancing and will provide a significant boost in machine learning capabilities.
  2. Adding time as a feature in models can enhance data analysis due to the information richness of recent data.
  3. Although skepticism exists around time series machine learning models, advancements in generic models like TimeGPT are removing some barriers.
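One common reading of "adding time as a feature" is to derive numeric columns from each timestamp before handing the data to a model. The sketch below shows that idea only; the feature names are illustrative assumptions and are not taken from the post or from TimeGPT.

```python
from datetime import datetime


def time_features(ts: datetime) -> dict[str, int]:
    """Turn one timestamp into simple numeric features for a model."""
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),  # Monday = 0, Sunday = 6
        "month": ts.month,
        "is_weekend": int(ts.weekday() >= 5),
    }


features = time_features(datetime(2023, 11, 16, 9, 30))
print(features)
```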
School Shooting Data Analysis and Reports 19 implied HN points 15 Nov 23
  1. School shootings go beyond high profile incidents like Parkland, impacting hundreds of schools with lockdowns and swatting hoaxes, creating a broader emotional and social toll on students.
  2. Swatting, the practice of making false 911 calls to trigger a police response, poses a real danger to schools and has become a widespread issue, including multi-state serial swattings.
  3. Collaboration between The Economist and the K-12 School Shooting Database sheds light on the increasing security spending in schools, revealing the mismatch between rising security measures and the continued occurrences of shootings.
Artificial Ignorance 25 implied HN points 06 Mar 25
  1. Several new advanced AI models have been released recently, improving reasoning and knowledge. These models, like OpenAI's GPT-4.5 and Google's Gemini 2.0, excel in different areas.
  2. AI is becoming more interactive with features that let it browse the web and perform tasks for users. This shows a shift towards AI that can take action, not just chat.
  3. The best AI models now cost more, with some requiring premium subscriptions. While powerful models like GPT-4.5 have high access fees, other new features may be available for free with some limits.
Cremieux Recueil 96 implied HN points 31 Dec 23
  1. The observed Black-White intelligence gap in standardized test performance has shown some variations over the years.
  2. Errors were found in a study that claimed a significant closure in the intelligence gap between Black and White individuals.
  3. Recent data and analyses suggest that the racial intelligence gap in the U.S. has not significantly closed and remains consistent with historical observations.
The Orchestra Data Leadership Newsletter 19 implied HN points 13 Nov 23
  1. Zero ELT aims to streamline data processing by eliminating traditional extraction, loading, and transformation tools.
  2. Zero ELT tools are evolving to specialize by use case rather than by function, leading to a trade-off between stack complexity and having the best tool for the job.
  3. Zero ELT tools, while promising in simplifying processes, may create data silos, lack interoperability with other tools, and bring about stack complexity issues.
Conspirador Norteño 36 implied HN points 28 Nov 24
  1. Handle squatting is when people register social media handles to sell them later. Even though Bluesky allows custom domain names as handles, some still try to squat.
  2. Buying account names is risky and usually a bad idea. It's better to create your own accounts instead of getting them from spammers.
  3. Some recent accounts on Bluesky show repetitive bios and were created in batches, indicating possible spam activity. One such account even changed its bio to seem more legitimate.
inexactscience 39 implied HN points 27 Mar 23
  1. Running Coibion-Gorodnichenko regressions with individual data can lead to misleading results. It's important to use appropriate data types to avoid confusion in the findings.
  2. Regressions on individual forecasts tend to yield negative estimates, whereas regressions on average forecasts yield positive ones. This means that the insights from these regressions can differ significantly based on the data used.
  3. The methodology is sensitive to noise and measurement errors, which can skew results. Researchers need to be cautious and robust in their approach to ensure accurate interpretations.
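For reference, a Coibion-Gorodnichenko regression relates forecast errors to forecast revisions, and the sign of the fitted slope is presumably what the points above are about. The sketch below fits that slope by ordinary least squares; the `revisions` and `errors` values are made up for illustration and are not data from the post.

```python
def ols_slope(x: list[float], y: list[float]) -> float:
    """Slope b of the least-squares line y = a + b * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var_x = sum((xi - x_bar) ** 2 for xi in x)
    return cov_xy / var_x


# Illustrative numbers only.
# revision = current forecast minus the previous forecast of the same target
# error    = realized value minus the current forecast
revisions = [-1.0, 0.0, 1.0, 2.0]
errors = [-0.5, 0.0, 0.5, 1.0]

slope = ols_slope(revisions, errors)
print(f"slope on revisions: {slope:.2f}")
```

A positive slope is usually read as under-reaction to new information and a negative slope as over-reaction, which is why noise in individual-level data can flip the interpretation.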
Sustainability by numbers 75 implied HN points 19 Mar 24
  1. American households primarily use electricity for heating, cooling, and controlling humidity.
  2. Future challenges in energy demand will revolve around balancing supply and demand, particularly for temperature control like heating and cooling.
  3. Electricity consumption is dominated by heating, cooling, and humidity control in households, highlighting the importance of efficient solutions in this area.
The Security Industry 10 implied HN points 25 Jul 25
  1. At Black Hat 2025, there will be 307 exhibitors focusing on cybersecurity. This event gives you a chance to meet many of the top vendors in the industry.
  2. These cybersecurity vendors have received over $43 billion in funding, showing the industry's rapid growth and strong investment interest.
  3. Despite global challenges, the number of exhibitors remains steady compared to last year. This indicates that companies still want to participate and showcase their solutions.
Jakob Nielsen on UX 27 implied HN points 30 Jan 25
  1. DeepSeek's AI model is cheaper and uses a lot less computing power than other big models, but it still performs well. This shows smaller models can be very competitive.
  2. Investments in AI are expected to keep growing, even with cheaper models available. Companies will still spend billions to advance AI technology and achieve superintelligence.
  3. As AI gets cheaper, more people will use it and businesses will likely spend more on AI services. The demand for AI will increase as it becomes more accessible.
davidj.substack 35 implied HN points 18 Nov 24
  1. Taking risks is a natural part of business. Employees at all levels face risks, and their roles should help manage those risks effectively.
  2. Data teams need to engage with business risks and help optimize rewards. Building data infrastructure should only be a means to support this goal.
  3. Not everyone is suited for risk-taking roles in the private sector. Some people may excel at politics but fail to deliver real results, which leads to inefficiencies in recruitment.
The Security Industry 11 implied HN points 03 Jul 25
  1. The Cyber 150 list includes cybersecurity companies with between 50 and 500 employees, showcasing those on the rise before they grow too big.
  2. Funding is flowing into these companies, with some receiving over $100 million, totaling around $2.3 billion in the first half of 2025 alone.
  3. Companies that grow past 500 employees graduate from the Cyber 150 list, while those that fail to grow drop off it, highlighting their changing status in the industry.
Askwhy: UX Research, Product Management, Design & Careers 33 implied HN points 27 Nov 24
  1. Always start with a clear hypothesis when analyzing data. This helps focus your research and prevents getting lost in too much information.
  2. Use a mix of qualitative and quantitative data for a better understanding. This means looking at both numbers and user feedback to get the full picture.
  3. Document your analysis process carefully. This helps others understand your findings and allows for better collaboration in the future.
Conspirador Norteño 36 implied HN points 02 Nov 24
  1. Community Notes on the X platform use a unique voting system to check facts, requiring a mix of helpful ratings. This makes it harder to manipulate which information is shown.
  2. Recent voting patterns show large bursts of upvotes or downvotes after political posts, often favoring right-leaning perspectives. This suggests some users might be trying to game the system.
  3. Out of many notes reviewed, most aimed to correct or add context to political content. While some notes were rated 'helpful,' others still need more varied ratings to be visible.
Rod’s Blog 19 implied HN points 10 Oct 23
  1. Zero-day exploits are dangerous because they exploit unknown software vulnerabilities and can have severe consequences like data breaches and system disruptions.
  2. To protect against zero-day exploits, organizations can monitor reported vulnerabilities, install next-generation antivirus solutions, perform rigorous patch management, segment networks with firewalls, and deploy advanced endpoint protection solutions.
  3. Microsoft Sentinel, a cloud-native SIEM solution, can help organizations protect against zero-day exploits by collecting data at cloud scale, detecting threats with analytics and intelligence, and investigating and responding with automation and orchestration.
Rod’s Blog 19 implied HN points 31 May 23
  1. Using the count operator in KQL can help understand the overall impact of a situation by providing the exact number of occurrences of a specific event or data in a table.
  2. The count operator syntax is simple, with just the table name followed by the count operator, making it easy to implement in queries.
  3. Adding the count operator to queries can significantly enhance their impact by providing summarized, relevant data instead of rows of information to manually sift through.
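In KQL the pattern described above is written as the table name piped into `count`, for example `SecurityEvent | count` (the table name is only an example). The Python sketch below mimics that behavior on a small in-memory table; the `security_events` rows are hypothetical.

```python
# Hypothetical in-memory stand-in for a log table.
security_events = [
    {"EventID": 4624, "Account": "alice"},
    {"EventID": 4625, "Account": "bob"},
    {"EventID": 4625, "Account": "carol"},
]

# Equivalent in spirit to the KQL query:  SecurityEvent | count
# The result is a single number rather than every row in the table.
total = len(security_events)
print(total)
```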
Rod’s Blog 19 implied HN points 31 May 23
  1. The where operator in KQL is essential for filtering and retrieving exact, actionable data, and it improves query performance.
  2. When learning KQL, it's beneficial to type out queries character-by-character to solidify new knowledge.
  3. Consider using the KQL Playground as a learning environment to avoid frustrations with example queries not showing results.
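As a rough illustration of what `where` does, the Python sketch below filters a hypothetical table the way a KQL query such as `SigninLogs | where ResultType != "0"` would. The table name, column name, and rows are assumptions chosen for the example, not taken from the post.

```python
# Hypothetical sign-in records; a ResultType of "0" is treated as success here.
signin_logs = [
    {"User": "alice", "ResultType": "0"},
    {"User": "bob", "ResultType": "50126"},
    {"User": "carol", "ResultType": "0"},
    {"User": "dave", "ResultType": "50053"},
]

# Equivalent in spirit to the KQL query:  SigninLogs | where ResultType != "0"
# Only rows matching the predicate are kept; the rest are discarded early.
failed = [row for row in signin_logs if row["ResultType"] != "0"]

for row in failed:
    print(row["User"], row["ResultType"])
```

Filtering early keeps later steps of a query working on fewer rows, which is the performance benefit the first point refers to.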
Magid and Co 19 implied HN points 12 Jun 23
  1. This post provides data on Series A deals done in the last week.
  2. The information covers Series A deals worldwide (excluding China) where companies raised over $5M and are not focused on therapeutics.
  3. Readers can subscribe for free to receive new posts and support the author's work.
Brain Lenses 19 implied HN points 07 Mar 23
  1. A trial balloon is a test of messaging or direction to gauge public response.
  2. Using trial balloons can help predict reactions to potential decisions.
  3. Trial balloons are widely used in politics, business, and other areas to shape public opinion.
Rod’s Blog 19 implied HN points 31 May 23
  1. Understanding the table schema in KQL is vital as it helps in finding data in an organized manner with the use of columns and types.
  2. KQL column types are basic, time, and complex, and knowing them alters the query approach for specific columns.
  3. The UI in KQL provides shortcuts for querying tables, expanding tables to view schema, using functions like stored procedures, and filtering data columns.