The hottest Data Analysis Substack posts right now

And their main takeaways
Data at Depth 19 implied HN points 08 Jun 23
  1. Data visualization skills are crucial for modern data analysis, and mapping skills are a valuable addition to visualization abilities.
  2. Python libraries like Folium, Plotly, and Dash can be used to display data effectively.
  3. Interactive mapping tutorials using Python can help in visualizing US education trends with tools like Folium, Plotly, and Dash.
Wooly's Post Repository 19 implied HN points 23 Jul 23
  1. The data on housing prices and construction can be confusing and counterintuitive, leading to difficulties in drawing clear conclusions.
  2. YIMBY goals require a significant amount of construction to impact housing prices, but achieving such high construction rates can be challenging.
  3. Confidence in real estate research should be lowered due to the complexity and potential errors in the data, making it important to approach conclusions with caution.
Rod’s Blog 19 implied HN points 31 May 23
  1. Understanding the Kusto Query Language (KQL) is essential for utilizing tools like Microsoft Sentinel to monitor security and detect threats.
  2. Building your first Microsoft Sentinel Analytics Rule involves filtering data, summarizing information, and assigning entities for investigations.
  3. Creating a Watchlist in Microsoft Sentinel can enhance the intelligence of your KQL query by filtering out trusted users and capturing potential threats more accurately.
Rod’s Blog 19 implied HN points 31 May 23
  1. The Join operator in KQL is used to merge rows from two tables by matching values of specified columns.
  2. Using different flavors of Join, like inner, leftouter, rightouter, and fullouter, can change how data is displayed in the results.
  3. To practice and understand the Join operator better, examples can be tested in the KQL Playground or explored in advanced tutorials like 'Addicted to KQL'.
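The difference between join flavors can be sketched outside KQL as well. The following is a minimal Python illustration, not KQL itself and not taken from the post: the `signins` and `alerts` tables, their column names, and the `join` helper are all hypothetical, and only the inner and leftouter flavors are modeled.

```python
# Hypothetical sample tables; names and columns are illustrative only.
signins = [
    {"user": "alice", "ip": "10.0.0.1"},
    {"user": "bob", "ip": "10.0.0.2"},
    {"user": "carol", "ip": "10.0.0.3"},
]
alerts = [
    {"user": "alice", "alert": "impossible travel"},
    {"user": "carol", "alert": "password spray"},
]


def join(left, right, key, kind="inner"):
    """Merge rows from two tables on `key`; kind is 'inner' or 'leftouter'."""
    # Index the right table by key so each left row can look up its matches.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    right_columns = {col for row in right for col in row if col != key}
    result = []
    for row in left:
        matches = index.get(row[key], [])
        if matches:
            for match in matches:
                result.append({**row, **match})
        elif kind == "leftouter":
            # Keep the unmatched left row and fill right-hand columns with None.
            result.append({**row, **{col: None for col in right_columns}})
    return result


inner_rows = join(signins, alerts, "user", kind="inner")
left_rows = join(signins, alerts, "user", kind="leftouter")
print(len(inner_rows), len(left_rows))
```

With this sample data, the inner flavor drops the row for `bob` because no alert matches it, while the leftouter flavor keeps it with an empty `alert` value.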
Rod’s Blog 19 implied HN points 31 May 23
  1. The Union operator in KQL allows you to combine data from multiple tables to display all rows together, while the Join operator is used for more specific results by matching column values of two tables.
  2. Union in KQL supports wildcard usage to merge multiple tables and can be used to combine tables from different data sources like Log Analytics Workspaces.
  3. In Microsoft security tools like Microsoft Sentinel and Defender, the Join operator is commonly used for creating Analytics Rules for specific results, while Union is useful for advanced hunting tasks.
Rod’s Blog 19 implied HN points 31 May 23
  1. The Let statement in KQL allows you to create variables that can be used throughout the query, aiding in better query performance.
  2. Let statements can be used to create variables either from scratch, from existing data, or from Microsoft Sentinel Watchlists.
  3. It's important to terminate the Let statement with a semicolon so that the variable is stored correctly for query execution.
The Grey Matter 19 implied HN points 01 Aug 23
  1. The Dunning-Kruger effect is likely a statistical artifact, not a genuine psychological phenomenon.
  2. The popular interpretation of the Dunning-Kruger effect as 'the dumbest people think they're the smartest' is a distortion.
  3. Replication of the Dunning-Kruger effect through simulation suggests it may not be a real psychological finding.
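One way such a simulation might look is sketched below. It is not the post's own code: it assumes, purely for illustration, that actual skill and self-assessed skill are independent uniform random percentiles, and then groups people by quartile of actual skill, as the familiar Dunning-Kruger chart does.

```python
import random
import statistics

random.seed(0)
N = 10_000

# Assumption: actual and self-assessed percentiles are unrelated random draws.
actual = [random.uniform(0, 100) for _ in range(N)]
perceived = [random.uniform(0, 100) for _ in range(N)]


def quartile_means(actual_scores, perceived_scores):
    """Return (mean actual, mean perceived) for each quartile of actual score."""
    pairs = sorted(zip(actual_scores, perceived_scores))
    size = len(pairs) // 4
    means = []
    for q in range(4):
        chunk = pairs[q * size:(q + 1) * size]
        mean_actual = statistics.mean([a for a, _ in chunk])
        mean_perceived = statistics.mean([p for _, p in chunk])
        means.append((mean_actual, mean_perceived))
    return means


means = quartile_means(actual, perceived)
for quartile, (mean_actual, mean_perceived) in enumerate(means, start=1):
    print(f"Q{quartile}: actual {mean_actual:.1f}, perceived {mean_perceived:.1f}")
```

Under this assumption the bottom quartile appears to overestimate itself and the top quartile appears to underestimate itself, even though the data contain no psychological effect at all.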
Rod’s Blog 19 implied HN points 31 May 23
  1. The Distinct operator in KQL helps in delivering results based on a distinct combination of provided columns.
  2. Distinct can be used to get precise results and is essential for tasks like security hunting operations.
  3. By combining Distinct with other operators like Summarize, you can manipulate data to show specific insights and counts in KQL.
Rod’s Blog 19 implied HN points 31 May 23
  1. The Project operator in KQL selects specific data columns to display, which makes security analysis more efficient.
  2. The Project operator variants like Project-away, Project-keep, Project-rename, and Project-reorder offer additional functionalities like excluding columns, renaming headers, and reordering columns in query results.
  3. Understanding and utilizing these Project operator variants can enhance data visualization and streamline data analysis processes in Kusto Query Language.
Rod’s Blog 19 implied HN points 31 May 23
  1. Custom data views in KQL are crucial for tailoring information to each environment's unique requirements for security and operations.
  2. The Extend operator in KQL allows users to create custom columns on the fly for query results, enhancing data analysis and presentation.
  3. By using the Extend operator, it's possible to generate calculated columns, append them to results, and combine existing data to display meaningful information in KQL queries.
Rod’s Blog 19 implied HN points 31 May 23
  1. The Render operator in KQL allows you to turn data into visualizations like area graphs, bar charts, and pie charts among others.
  2. Using KQL to create visualizations is crucial for tasks like developing dashboards in Microsoft Sentinel, providing real-time insights to security teams.
  3. Learning to transform data into graphs and charts can make information more engaging and appealing, especially for visual or hands-on learners.
Steve Kirsch's newsletter 9 implied HN points 15 Jul 25
  1. Grok 4 acknowledged that the COVID vaccines may have caused more harm than good. It recognized that the data showed little benefit from the vaccines during critical periods.
  2. The conversation highlighted that despite the claims of safety, there is significant evidence pointing to increased mortality rates among vaccinated individuals after booster shots.
  3. Many experts and organizations, like the CDC, have been criticized for not engaging with the data that suggests harm from the vaccines, leading to concerns about transparency and willingness to discuss the issue.
The Security Industry 25 implied HN points 03 Jan 25
  1. In 2024, investments in cybersecurity reached an impressive $16.1 billion, which is a big jump of 60% from the previous year.
  2. A total of 432 cybersecurity companies received funding, with many rounds exceeding $100 million, showing strong interest in the industry.
  3. Looking ahead, experts believe that funding in 2025 could surpass 2024, indicating a growing demand for tech and security services.
Jakob Nielsen on UX 21 implied HN points 13 Feb 25
  1. AI models are getting better at reducing false information, called hallucinations. This means they are less likely to make things up over time.
  2. Bigger AI models generally make fewer mistakes. As AI technology improves, we can expect even fewer errors from future models.
  3. While waiting for better AI, improving user experience can help users spot and double-check misleading information, making it easier to trust AI outputs.
Engineering Enablement 21 implied HN points 05 Feb 25
  1. Metrics for developers should help improve their work experience, not just measure their output. Goodhart's Law reminds us that once metrics are tied to rewards, they can become misleading.
  2. Developer experience is more about effectiveness than happiness. Measuring how developers feel needs to focus on the frustrations they face, and not just on making them comfortable.
  3. Using benchmarks is important but context is key. Just like medical tests, numbers need interpretation to make sense; comparing different teams requires understanding their unique challenges.
Harnessing the Power of Nutrients 79 implied HN points 19 Feb 22
  1. Understanding the impact of COVID vaccines on all-cause mortality is crucial for assessing their risk versus reward.
  2. Manipulation of data definitions can lead to misinterpretation of findings, emphasizing the importance of transparent reporting.
  3. All-cause mortality is a key metric to evaluate, but other factors like long-term complications and individual risk profiles should also be considered.
Harnessing the Power of Nutrients 79 implied HN points 19 Feb 22
  1. Some studies suggest natural immunity from past infection can be as good or even better than full vaccination at protecting against COVID-19 infection.
  2. The new CDC study does not directly compare infection risk between vaccinated and naturally immune populations, but instead looks at hospitalized individuals with COVID-like symptoms.
  3. The study raises questions about the effectiveness of vaccines in preventing hospitalization for COVID-like illness and emphasizes the importance of examining data carefully to draw meaningful conclusions.
The Open-Source Blueprint 5 HN points 04 Apr 24
  1. Python has a strong ecosystem for data-related libraries and first-party clients for databases, making it a good choice for data tools.
  2. JavaScript also has a large ecosystem of data libraries, first-party clients for major databases, and excellent support for building frontend experiences.
  3. Choosing between Python and JavaScript for building data tools depends on the project requirements and the potential need for incorporating web frameworks.
Sarah's Newsletter 59 implied HN points 29 Mar 22
  1. Python's ease of use and readability have made it one of the top 5 most popular languages.
  2. Abstractions like AWS Lambda can be efficient but may become harmful if not managed properly, leading to security and cost problems.
  3. Using SQL GUI tools for data aggregation can speed up the process but may lead to inaccurate results and wrong decisions due to lack of testing and QA processes.
Data Thoughts 39 implied HN points 21 Jan 23
  1. Data quality is all about how useful the data is for the specific task at hand. What is considered high quality in one situation might not be in another.
  2. There are several key aspects of data quality, including accuracy, completeness, consistency, and uniqueness. Each of these factors helps to determine how reliable the data is.
  3. Improving data quality involves preventing errors, detecting them when they occur, and repairing them. It's about making sure the data is accurate and useful over time.
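The detection step can be illustrated with a few simple checks. This is a minimal sketch under stated assumptions: the `records` sample, its column names, the helper functions, and the 0 to 120 age range are all hypothetical and do not come from the post.

```python
# Hypothetical records; the duplicate id, missing email, and negative age are deliberate.
records = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": None, "age": 29},
    {"id": 2, "email": "b@example.com", "age": 29},
    {"id": 3, "email": "c@example.com", "age": -5},
]


def completeness(rows, column):
    """Fraction of rows with a non-missing value in `column`."""
    return sum(row[column] is not None for row in rows) / len(rows)


def duplicate_keys(rows, key):
    """Key values that appear more than once (a uniqueness check)."""
    seen, dupes = set(), set()
    for row in rows:
        if row[key] in seen:
            dupes.add(row[key])
        seen.add(row[key])
    return dupes


def out_of_range(rows, column, low, high):
    """Rows whose value falls outside [low, high] (a basic accuracy check)."""
    return [
        row for row in rows
        if row[column] is not None and not (low <= row[column] <= high)
    ]


email_completeness = completeness(records, "email")
duplicate_ids = duplicate_keys(records, "id")
bad_ages = out_of_range(records, "age", 0, 120)
print(email_completeness, duplicate_ids, [row["id"] for row in bad_ages])
```

Which checks matter, and what thresholds count as acceptable, depends on the task the data is meant to serve.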
Dev Interrupted 18 implied HN points 18 Feb 25
  1. AI models sometimes miss important details, like humans do. For example, they may overlook obvious outliers in data visualizations.
  2. Banks are changing their hiring tactics to attract tech talent by offering more flexibility and modern tools. This helps them stay competitive against tech firms.
  3. In a world where AI is growing, the ability to focus deeply is becoming more valuable than just knowing how to use AI tools. Staying focused can help engineers excel.
Steve Kirsch's newsletter 9 implied HN points 11 Jun 25
  1. Time series graphs can show if a vaccine is safe or not by plotting daily deaths after vaccination. A safe vaccine should show a flat line after the initial period.
  2. Current data for COVID vaccines shows increasing mortality rates after vaccination, which suggests they may not be safe. Many reports don’t show this data.
  3. The medical community often ignores clear signs of vaccine risks, despite evidence appearing in graphs and reports, leading to frustration among those who analyze the data.
Cremieux Recueil 66 implied HN points 07 Dec 23
  1. The post collects and acknowledges errors made by the author.
  2. Errors mentioned include coding mistakes, misconceptions, and incorrect claims.
  3. The author provides updates and corrections for the errors identified.
davidj.substack 95 implied HN points 10 May 23
  1. Excel is still widely used in the data space for its ease of use and versatility.
  2. Data teams aim to reduce Excel use due to limitations such as scalability and version control issues.
  3. New tools like Count and Equals are emerging to address Excel limitations and improve collaboration in data analysis.
Graphs For Science 52 implied HN points 24 Feb 24
  1. k-Core Decomposition is a way to explore the structure of networks by identifying the largest subgraph where every node has a specified minimum degree.
  2. The k-Core Decomposition algorithm involves recursively removing nodes with degrees lower than a specified threshold to reveal the k-core and k-shell structure of a graph.
  3. The degree of a node in a k-core doesn't have an upper limit, providing unique insights into network connectivity beyond traditional degree-based analysis.
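The removal procedure described above can be sketched in a few lines of plain Python. This is an illustrative version, not the post's code: the `k_core` function and the sample graph are hypothetical, the removal is written iteratively with a work list instead of recursion, and the adjacency mapping is assumed to be symmetric (an undirected graph).

```python
def k_core(adjacency, k):
    """Return the set of nodes in the k-core of an undirected graph.

    `adjacency` maps each node to the set of its neighbours and is
    assumed to be symmetric.
    """
    # Work on a copy so the caller's graph is left untouched.
    remaining = {node: set(neighbours) for node, neighbours in adjacency.items()}
    queue = [node for node, neighbours in remaining.items() if len(neighbours) < k]

    while queue:
        node = queue.pop()
        if node not in remaining:
            continue  # already removed via an earlier queue entry
        for neighbour in remaining.pop(node):
            remaining[neighbour].discard(node)
            # Removing `node` may push a neighbour below the threshold.
            if len(remaining[neighbour]) < k:
                queue.append(neighbour)
    return set(remaining)


# Hypothetical graph: a triangle a-b-c with a tail c-d-e.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
two_core = k_core(graph, 2)
print(sorted(two_core))
```

In the sample graph, node `c` has degree 3 but still belongs to the 2-core, which illustrates that membership sets only a minimum degree, not a maximum.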
CalculatedRisk Newsletter 23 implied HN points 02 Dec 24
  1. The Freddie Mac House Price Index went up 3.7% compared to last year, showing a steady increase in home prices.
  2. Florida has many cities experiencing large price declines, with 18 out of the top 35 cities affected.
  3. If more houses are available for sale and sales remain low, we might see a slowdown in home price growth early next year.
HackBoyFly 1 HN point 17 Jul 24
  1. Using a Monte Carlo Simulation can help estimate a wide range of potential outcomes when making investment decisions like buying an apartment in Stockholm.
  2. Historical data, mean annual returns, and standard deviations are crucial inputs for simulations to introduce randomness and variability to financial projections.
  3. Visualizing simulations through charts can provide insights on possible outcomes, such as optimistic and pessimistic scenarios, aiding in making informed decisions about investments.
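A minimal sketch of such a simulation follows. The figures are placeholders and not taken from the post: the starting value, the 5% mean annual return, the 10% standard deviation, and the assumption that yearly returns are independent and normally distributed are all hypothetical.

```python
import random


def simulate_final_values(initial_value, mean_return, std_dev, years, runs, seed=None):
    """Simulate `runs` price paths and return the sorted final values."""
    rng = random.Random(seed)
    finals = []
    for _ in range(runs):
        value = initial_value
        for _ in range(years):
            # Assumption: each year's return is an independent normal draw.
            value *= 1 + rng.gauss(mean_return, std_dev)
        finals.append(value)
    return sorted(finals)


def percentile(sorted_values, fraction):
    """Pick the value at the given fraction of a sorted list."""
    index = min(int(fraction * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[index]


finals = simulate_final_values(
    initial_value=3_000_000,  # hypothetical purchase price
    mean_return=0.05,         # hypothetical mean annual return
    std_dev=0.10,             # hypothetical standard deviation
    years=10,
    runs=5_000,
    seed=42,
)
pessimistic = percentile(finals, 0.05)
median = percentile(finals, 0.50)
optimistic = percentile(finals, 0.95)
print(f"5%: {pessimistic:,.0f}  50%: {median:,.0f}  95%: {optimistic:,.0f}")
```

The 5th and 95th percentiles stand in for the pessimistic and optimistic scenarios; in practice the mean and standard deviation would be estimated from historical data.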
Ill-Defined Space 19 implied HN points 10 Jan 25
  1. In 2024, 2,807 spacecraft were deployed globally, about 1.5% fewer than in 2023. Despite the decrease in the number of deployments, the total mass of these spacecraft increased by 28%.
  2. SpaceX was the leading company, responsible for around 71% of all spacecraft deployed, mainly for its Starlink internet satellites. Other nations and companies started making larger deployments, especially in China.
  3. While the U.S. led global deployments, many countries participated, though the total number of nations involved dropped significantly from 54 in 2023 to just 39 in 2024.
Tigerfeathers! 24 implied HN points 07 Nov 24
  1. Pixxel is developing a fleet of satellites with special cameras that can see details beyond what regular cameras can, helping monitor Earth's health and detect issues like pollution and crop problems.
  2. The founders of Pixxel, Awais and Kshitij, began their journey in college and faced many challenges, including launch failures and funding issues, but they remained determined and adapted their business strategy.
  3. Pixxel aims not just to serve Earth, but eventually wants to use their technology for exploring resources in space, showing how their ambitions go far beyond just satellite imaging.
LatchBio 17 implied HN points 29 Jan 25
  1. There are many open-source tools for biological imaging like Napari, ImageJ, Cellpose, CellProfiler, and Suite2p. Each tool has unique features and helps scientists visualize and analyze complex biological data.
  2. Using these tools, scientists can perform tasks such as tracking embryo development, analyzing protein interactions, segmenting cells, and studying neural activity. This technology makes research more efficient and accurate.
  3. Modern data infrastructure can greatly improve the use of these imaging tools. Centralizing resources, using container templates, and optimizing data transfer enhances research productivity and collaboration among teams.
Steve Kirsch's newsletter 7 implied HN points 26 Jun 25
  1. Czech time series data shows a big increase in deaths after vaccination, suggesting the vaccines might not be safe.
  2. If the vaccines were safe, death rates would stay flat or not increase significantly, but the data shows a clear rise over time.
  3. Health authorities may ignore this data and won't admit they were wrong, which makes it hard for people to trust them.
CodeFaster 72 implied HN points 23 Jul 23
  1. The Unix one-liner uses commands like cat, tac, cut, and less to process a CSV file.
  2. 'cat' reads the file, 'tac' prints its lines in reverse order, 'cut' selects specific columns, and 'less' displays the data page by page.
  3. This one-liner is handy for quickly examining and navigating through large CSV files in the terminal.
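The post's exact one-liner is not reproduced here. As a rough Python analogue of the reverse-then-select steps, the sketch below uses a hypothetical `reversed_columns` helper and made-up sample data; paging, which 'less' would handle, is left out.

```python
import csv
import io


def reversed_columns(csv_text, columns):
    """Yield the selected columns of each row, last row first."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    for row in reversed(rows):
        yield [row[i] for i in columns]


# Hypothetical sample data standing in for a CSV file on disk.
sample = "id,name,score\n1,ana,90\n2,ben,85\n"
result = list(reversed_columns(sample, columns=[0, 2]))
for row in result:
    print(",".join(row))
```

Unlike 'cut' with a comma delimiter, the `csv` module also copes with quoted fields that contain commas, at the cost of reading the whole input into memory before reversing it.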