The hottest Data Analysis Substack posts right now

And their main takeaways

Introducing The Shake V2

The Shake • 137 implied HN points • 26 Mar 23

The Shake V2 is a brand new version of The Shake that has officially launched.
The Shake is now more than just a newsletter and has evolved into a data provider, resource hub, and product lab.
The Shake V2 will continue to offer on-chain analysis, interactive educational tools, and expand into the greater DWeb ecosystem.

September 20, 2023 - What this Week's Data Tell Us

Frank’s Alabama COVID Newsletter • 137 implied HN points • 20 Sep 23

🏥 Health & Wellness COVID-19 Vaccination Data Analysis Public Health Prevention

Florida and Arkansas have hospitalization rates higher than Alabama's due to lower vaccination rates.
Nationwide hospitalizations for Covid-19 have decreased compared to previous years.
Expired at-home Covid-19 test kits may still provide reliable results, but it's better to check for extended expiration dates or get a new test.

GroupBy #30: Uber- How LedgerStore Supports Trillions of Indexes, Composable Data Systems: Lessons from Apache Calcite Success

VuTrinh. • 39 implied HN points • 09 Apr 24

🕹 Technology Data Engineering Data Analysis Software Development Cloud Computing Machine Learning

LedgerStore at Uber can handle trillions of indexes, making it a powerful tool for managing large-scale data efficiently.
Apache Calcite helps build flexible data systems with strong query optimization features, which are vital for many data applications.
Spotify's data platform plays a critical role in their operations, guiding how to build effective data systems in organizations.

The Equation That Outsmarts AI: y=mx+b

Shrek's Substack • 4 HN points • 19 Aug 24

🕹 Technology AI Machine Learning Education Mathematics Data Analysis

The way you ask questions and set the model's temperature can really affect how well AI solves math problems. Clear prompts and specific instructions can help improve its accuracy.
AI like GPT-4o struggles with big numbers and can make mistakes about half the time when calculating linear equations. It works better with smaller numbers.
It's important to be careful when using AI for math, especially in education. Using other tools to double-check results can help avoid mistakes.
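
One way to double-check a model's answer to a linear equation is to recompute it with exact arithmetic. The sketch below is an illustration, not the post's own method; the function names and the sample numbers are hypothetical.

```python
from fractions import Fraction


def line_value(m, x, b):
    """Return y = m*x + b exactly, using rational arithmetic (no rounding)."""
    return Fraction(m) * Fraction(x) + Fraction(b)


def check_answer(m, x, b, claimed_y):
    """Return True if a claimed answer (e.g. from an AI model) is exactly right."""
    return line_value(m, x, b) == Fraction(claimed_y)


# Hypothetical large inputs, chosen only for illustration.
print(line_value(123456789, 987654321, 42))

# A small case that is easy to verify by hand: 2*10 + 1 = 21.
print(check_answer(2, 10, 1, 21))
```

Python integers and `Fraction` values have arbitrary precision, so the check stays exact however large the numbers are.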

The KQL Mysteries: Chapter 2

Rod’s Blog • 99 implied HN points • 04 Dec 23

🕹 Technology Cybersecurity Data Analysis Network Security

Jon and Sofia used KQL queries to identify and isolate an infected computer in the finance department.
The malware was discovered disguised as a legitimate application, hidden in the Recycle Bin to avoid detection.
Jon and Sofia's discovery of the global financial breach hints at a larger, more sinister threat by a group known as Night Princess.

Math for Software Engineering[Math Mondays]

Technology Made Simple • 139 implied HN points • 21 Mar 23

🕹 Technology Mathematics Software Engineering Data Analysis Algorithms Machine Learning

Linear algebra is crucial for software engineers, especially for work involving vectors and matrices. Understanding the basics is key for most developers.
Probability and Statistics play a significant role in analyzing data, and even non-AI professionals can benefit from grasping concepts like causal inference. Focus on foundational principles before diving deeper.
Calculus, though important, may not be essential for all software engineers. Studying up to Calc-2 is generally adequate, as it appears in various other topics.

E8 - 🤖 AI Integration for Faster, Better Product Development

The Product Channel By Sid Saladi • 6 implied HN points • 29 Dec 24

🕹 Technology AI Integration Product Development Data Analysis Market research Innovation

AI can help improve product development by analyzing customer feedback and identifying what users want. Using AI for market research can spot new opportunities and gaps in the market.
Integrating AI into decision-making processes, like demand forecasting and risk assessment, can save time and resources. This way, product managers can make smarter choices about what to build.
AI makes the design and development phases faster and more efficient. It can quickly create prototypes and help optimize engineering tasks, leading to quicker product launches.

The KQL Mysteries: Chapter 1

Rod’s Blog • 99 implied HN points • 27 Nov 23

🕹 Technology Cybersecurity Data Analysis Networking

KQL's search operator is a powerful tool for finding potential threats in a company's data environment.
Using specific queries like filtering by tables and applying operators like 'has' can help pinpoint suspicious activities in data.
Collaborating with trusted teammates is crucial in verifying and responding to potential cybersecurity threats promptly.

Microsoft Sentinel SOC 101: Detecting and Mitigating Spear Phishing with Microsoft Sentinel

Rod’s Blog • 59 implied HN points • 12 Feb 24

🕹 Technology Cybersecurity Cloud Computing Data Analysis Automation Best Practices

Spear phishing is a serious cyber-attack that targets specific individuals or organizations. Microsoft Sentinel's tools can help detect and prevent these types of threats.
Microsoft Sentinel allows for the creation of custom analytics rules based on KQL queries to identify potential spear phishing activities. This helps in early detection of threats.
Automation and playbooks in Microsoft Sentinel enable immediate responses like blocking URLs or initiating password resets upon detecting a spear phishing attempt.

Data Storytelling UN Food Security Data With Python: From CSV to Data Visualization

Data at Depth • 59 implied HN points • 11 Feb 24

🕹 Technology Data Visualization Data Analysis

Data storytelling is a powerful tool for communicating development data to the world.
Understanding global food security is essential for creating effective policies to ensure everyone has enough food.

Against using Nicholas Cage movies to teach correlation and causation

Unconfusion • 39 implied HN points • 31 Mar 24

🚌 Education Statistics Pedagogy Cognition Research Methodology Data Analysis

Using silly examples to teach correlation and causation can let students off too easily. It's important to challenge them with examples that make them think.
Most teaching examples use time-series data, but many real-world correlations don't fit this model. We should focus on typical variations found in research.
Mixing random correlations with spurious connections creates confusion. Teaching should clearly explain how confounders can lead to false relationships.
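
A small simulation can show how a confounder produces a correlation between two variables that do not affect each other. This is a minimal sketch under stated assumptions: the variable names, the noise model, and the sample size are all hypothetical and not taken from the post.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=5000, seed=0):
    """Draw a confounder z, then x and y that each depend on z but not on each other."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    y = [zi + rng.gauss(0, 1) for zi in z]
    return z, x, y


z, x, y = simulate()

# x and y look related, although neither causes the other.
raw = pearson(x, y)

# Remove the confounder's contribution (its coefficient is 1 by construction).
adjusted = pearson(
    [xi - zi for xi, zi in zip(x, z)],
    [yi - zi for yi, zi in zip(y, z)],
)

print(f"raw correlation: {raw:.2f}")
print(f"after removing the confounder: {adjusted:.2f}")
```

In this setup the raw correlation is clearly positive, while the adjusted one is close to zero, because the relationship came entirely from the shared cause.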

Illumina Constellation, 2x500bp MiSeq i100 Reads

ASeq Newsletter • 14 implied HN points • 13 Nov 24

🔬 Science Genetics Biotechnology Medical Research Data Analysis

Illumina might be able to increase its read length to 1Kb, which is a good sign for better sequencing.
There could be a new way to use sequencers where you just add DNA and it handles the library prep itself.
This new method may make Illumina devices more appealing compared to other platforms for various uses.

Every shooting at a school in May 2024

School Shooting Data Analysis and Reports • 19 implied HN points • 01 Jun 24

📰 News Gun Violence School Shootings Podcasts Data Analysis Security Measures

The number of school shooting incidents in May 2024 continues a rising trend over the last 3 years, but the increase from 2023 to 2024 is not exponential.
The number of victims in May 2024 is higher compared to 2023 but notably lower than in 2022, when a tragic incident in Uvalde involved multiple fatalities and injuries.
In May 2024, shootings often occurred at night and during school events such as graduations, frequently at unauthorized post-graduation parties on campus, which underscores the importance of proactive policing.

Microsoft Sentinel SOC 101: How to Detect and Mitigate SQL Injection Attacks with Microsoft Sentinel

Rod’s Blog • 119 implied HN points • 27 Sep 23

🕹 Technology Cybersecurity Cloud Computing Data Analysis Threat Detection Incident Response

SQL injection attacks exploit vulnerabilities in web applications to access sensitive data.
Microsoft Sentinel uses advanced analytics rules and integrates with Defender for SQL to detect and respond to SQL injection attacks effectively.
Organizations can benefit from automated incident response, threat hunting, and incident investigation capabilities in Microsoft Sentinel to mitigate the impact of SQL injection attacks.

In Defense of Human Senses

Cybernetic Forests • 119 implied HN points • 30 Apr 23

🕹 Technology Artificial Intelligence Human experience Data Analysis AI Art

Human perception of images is deeply intertwined with personal experiences and emotions, shaping how images are interpreted and associated with memories.
Creating art involves a fusion of individual lived experiences and learned skills over time, contrasting with the quick generation of images by AI devoid of personal experiences.
AI images are structured based on categories and datasets, emphasizing the need for artists to negotiate these categories and infuse individualized interpretations into the process.

Microsoft Sentinel SOC 101: How to Detect and Mitigate Inactive Account Sign-ins with Microsoft Sentinel

Rod’s Blog • 59 implied HN points • 05 Feb 24

🕹 Technology Security Detection Mitigation Identity Management Data Analysis

Microsoft Sentinel helps in detecting and mitigating inactive account sign-ins by collecting and analyzing sign-in logs from Microsoft Entra ID using the Kusto Query Language.
To mitigate inactive account sign-ins, actions include investigating the source, blocking or disabling the account, resetting credentials, and educating users on security best practices.
Best practices for managing inactive accounts in Microsoft Entra ID include defining a policy for account lifecycle, implementing provisioning and deprovisioning processes, monitoring account activity, and educating users.

Tanks for the memories

Logging the World • 179 implied HN points • 11 Dec 22

🔬 Science Statistics Probability Military History Cybersecurity Data Analysis

In a raffle with a large number of tickets, the biggest number drawn out starts to show some structure as more tickets are selected.
By looking at the maximum value drawn in a raffle, one can estimate the total number of tickets, a result known in statistics as the German tank problem.
Sequential numbering schemes can reveal interesting insights, as seen in situations like the Skripal poisonings and Novak Djokovic's COVID test, highlighting the importance of careful numbering practices.
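
The idea of estimating the total from the largest number seen can be sketched with the standard German tank estimator, N ≈ m + m/k - 1, where m is the largest number observed and k is the sample size. The raffle size and sample size below are hypothetical values for illustration.

```python
import random


def estimate_total(sample):
    """Estimate the total count N from serial numbers drawn without replacement.

    Uses the standard estimator m + m/k - 1, where m is the sample maximum
    and k is the number of observations.
    """
    k = len(sample)
    m = max(sample)
    return m + m / k - 1


# Hypothetical raffle: tickets numbered 1..1000, 20 drawn without replacement.
rng = random.Random(1)
drawn = rng.sample(range(1, 1001), 20)
print(f"largest ticket drawn: {max(drawn)}")
print(f"estimated number of tickets: {estimate_total(drawn):.0f}")
```

The correction term m/k - 1 accounts for the fact that the largest number drawn usually falls somewhat short of the true total.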

Creating Individualised Chess Engines

Chess Engine Lab • 39 implied HN points • 26 Mar 24

🕹 Technology Artificial Intelligence Data Analysis Programming Chess Machine Learning

An engine called Maia focused on predicting human moves accurately instead of just being the strongest in chess, resulting in a more meaningful impact, especially for club-level players.
By individualizing chess engines to predict moves of specific players, accuracy can be increased by 4-5% and players can be identified with 98% accuracy from a pool of 400, based on their game patterns.
Identifying players through their mistakes is a crucial aspect - as mistakes are unique to individual players, understanding and fixing them can greatly aid in chess improvement.

The State Of Gun Violence In The State of New York

Jeff-alytics • 117 implied HN points • 24 Jul 23

🇺🇸 U.S. Politics Gun Violence Data Analysis

Shooting data in New York shows a positive trend with declining gun violence in the state.
Gun violence trends in New York City have also seen a significant decrease in shooting victims.
The available data from New York provides valuable insights, but it's uncertain if the declining gun violence trend will be mirrored in other states.

Pt 1: Understanding $OP Liquidity Incentives

DeFi Weekly • 117 implied HN points • 18 May 23

🔮 Crypto Data Analysis Incentives Product Development

Airdrops may have low retention rates, making incentives ineffective.
Ensuring rewards do not exceed costs is crucial for sustainable growth.
Long-term impact of incentives should focus on real usage and product quality.

Mass Value Report for March

Planetocracy • 117 implied HN points • 31 Mar 23

🔬 Science Space Exploration Data Analysis Rocket Technology

The analysis focuses on SpaceX Falcon 9 launches and its mass-to-orbit capability.
SpaceX is increasing its flight rate through reuse and adding more boosters to its fleet.
Future analysis will include data on Falcon 9, mass estimates for beyond-Earth orbits, and the transition to Starship for maintaining pace.

2024 DORA Report

Engineering Enablement • 15 implied HN points • 30 Oct 24

🕹 Technology Software Development DevOps Artificial Intelligence Data Analysis Engineering

Using AI tools can actually make software delivery worse, as they lead to larger code changes that are riskier. This is surprising because many people think AI would improve coding efficiency.
Software delivery performance indicators are becoming more independent from each other. This year's report shows some unexpected trends, like medium performance groups having fewer failures than high performance groups.
To boost productivity, companies should focus on creating user-friendly internal platforms for developers. It's important for leaders to understand their team's needs and provide clear support to improve overall performance.

Novaxia vs Bigpharmia

Logging the World • 199 implied HN points • 04 Nov 22

🔬 Science Health Graphs Vaccines Modeling Data Analysis

Understand the impact of vaccines on disease spread: Novaxia and Bigpharmia are examples of two scenarios showing how vaccines can affect the spread of a disease differently.
Graphs help visualize data trends: Using different types of graphs can show how disease spread changes over time and the effectiveness of interventions like vaccines.
Consider the importance of logarithmic scales: Logarithmic scales can provide a different perspective on data trends, allowing for better understanding of the impact of interventions like vaccines.
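
The point about logarithmic scales can be illustrated numerically: a quantity that grows by a constant factor climbs in ever larger steps on a linear scale but in equal steps on a log scale. The starting count and growth factor below are hypothetical, chosen only for illustration.

```python
import math


def exponential_cases(initial, growth, days):
    """Case counts that multiply by a fixed factor each day."""
    return [initial * growth ** d for d in range(days)]


# Hypothetical outbreak: 10 cases, growing 50% per day for 8 days.
cases = exponential_cases(10, 1.5, 8)

# Day-to-day steps on a linear scale keep getting larger...
linear_steps = [b - a for a, b in zip(cases, cases[1:])]

# ...while on a log scale every step has the same size, log10(1.5).
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(cases, cases[1:])]

print([round(s, 1) for s in linear_steps])
print([round(s, 3) for s in log_steps])
```

On a log-scale plot this constant step appears as a straight line, so a change in slope signals a change in the growth rate, for example after an intervention.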

5 learnings from making UX less annoying for 20M+ users

CommandBlogue • 19 implied HN points • 28 May 24

🕹 Technology User Experience Product Design Data Analysis Software Development Customer Engagement

Users don't easily forget bad experiences, like annoying pop-ups. Once trust is lost, it's hard to regain, so it's important to be careful with how you present information to them.
Beautiful design attracts users and keeps them engaged. Nowadays, a nice look matters just as much as solving a problem, since many products are similar.
Users prefer having multiple options. If they feel like they don't need help at first, they might still end up needing it later, so providing a way for them to revisit guides is key.

The PacBio Benchtop - The Vega

ASeq Newsletter • 14 implied HN points • 07 Nov 24

🕹 Technology Biotechnology Genomics Instrumentation Data Analysis Healthcare

The new PacBio Vega is a benchtop DNA sequencer that provides 60Gb of data in just 24 hours and costs $169,000. There's also a lower cost option for labs that need less capacity.
When compared to Oxford Nanopore's PromethION, the Vega appears to deliver better accuracy and more consistent results, making it a suitable choice for smaller labs needing reliable output.
The launch of the Vega could help PacBio increase revenue and broaden its market presence, as it appeals to labs that want access to high-quality sequencing without breaking the bank.

From Lionesses to Flying Wingers: Revealing the New Heroes of The Beautiful Game

Workforce Futurist by Andy Spence • 244 implied HN points • 16 Aug 23

🎾 Sports Football Data Analysis Well-being

Tracking data in football helps with performance improvement and injury prevention.
Analyzing skill ecosystems is crucial in talent scouting, even in the workplace.
Using data to empower individuals to analyze their performance can lead to better organizational outcomes.

Last word on LNT

Gordian Knot News • 139 implied HN points • 14 Jan 24

🔬 Science Research Data Analysis

The Linear No-Threshold (LNT) model for predicting harm from radiation exposure is criticized as inaccurate.
Comparing different dose rate profiles with the same total dose is crucial to understanding radiation harm models.
Dose rate is a critical factor in DNA damage repair, impacting cancer incidence predictions in radiation exposure.
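
Comparing dose rate profiles is informative because a linear no-threshold prediction depends only on the total dose. The sketch below assumes the simplest form of LNT, risk proportional to cumulative dose; the coefficient and the two profiles are hypothetical placeholders, not real risk figures.

```python
def lnt_excess_risk(dose_profile, risk_per_unit_dose):
    """Excess risk under a simple LNT model: proportional to cumulative dose.

    dose_profile is a list of doses per time period; how the dose is spread
    over time does not enter the calculation.
    """
    return risk_per_unit_dose * sum(dose_profile)


# Hypothetical coefficient, for illustration only (not a real risk value).
RISK_PER_UNIT = 0.0001

# Two hypothetical profiles with the same total dose of 100 units.
acute = [100.0]            # everything in one period
chronic = [1.0] * 100      # spread evenly over 100 periods

print(lnt_excess_risk(acute, RISK_PER_UNIT))
print(lnt_excess_risk(chronic, RISK_PER_UNIT))
```

Both calls return the same value, so any observed difference in harm between such profiles is something this form of the model cannot represent.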

Concise Chain-of-Thought (CCoT) Prompting

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 59 implied HN points • 24 Jan 24

🕹 Technology AI Machine Learning Prompt engineering Natural Language Data Analysis

Concise Chain-of-Thought (CCoT) prompting helps make AI responses shorter and faster. This means you save on costs and get quicker answers.
Using CCoT, the response length can be reduced by almost 50%, but it can lead to lower performance in math problems. So, it’s a trade-off between speed and accuracy.
For cost-saving in AI, focusing on reducing the number of output tokens is key since they are generally more expensive. CCoT is one way to achieve this without sacrificing performance too much.
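
The cost argument can be made concrete with a little arithmetic. In this sketch the token counts and per-token prices are hypothetical, and the output length is simply halved to approximate the reduction described above.

```python
def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost of one request; prices are per 1,000 tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1000


# Hypothetical prices per 1,000 tokens, with output priced higher than input.
INPUT_PRICE = 0.01
OUTPUT_PRICE = 0.03

# Hypothetical request: 500 prompt tokens, 400 answer tokens with standard
# chain-of-thought, 200 with a concise variant.
baseline = request_cost(500, 400, INPUT_PRICE, OUTPUT_PRICE)
concise = request_cost(500, 200, INPUT_PRICE, OUTPUT_PRICE)
saving = 1 - concise / baseline

print(f"baseline: {baseline:.4f}  concise: {concise:.4f}  saving: {saving:.0%}")
```

With these assumed numbers the total cost falls by roughly a third rather than a half, because the input tokens still have to be paid for.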

2023 Wrapped: a year of sickness and health

art fish intelligence • 58 implied HN points • 21 Jan 24

🏥 Health & Wellness Data Analysis Personal Health Women's Health Self-reflection Healthy living

In 2023, the author analyzed their patterns of sickness and health through data collected from sources like Google Maps location history and Apple Health.
The analysis revealed insights such as spending almost half of the year unwell and correlations between health factors like exercise and location.
Key findings included the impact of menstrual cycle on sickness, the importance of rest during certain phases, and the value of personal data exploration for health insights.

Datasette enrichments and Datasette comments

Datasette Newsletter • 78 implied HN points • 04 Dec 23

🕹 Technology Data Analysis Collaboration Plugins

New Datasette enrichments feature enables data modification and enhancement through plugins.
Datasette comments allow collaboration on data analysis by attaching and replying to comments on any row of data.
Enrichments and comments are now available on Datasette Cloud for collaborative data analysis.

Freddie Mac House Price Index Increased in September; Up 3.6% Year-over-year

CalculatedRisk Newsletter • 14 implied HN points • 31 Oct 24

💰 Finance Real Estate Market Trends Investment Economics Data Analysis

The Freddie Mac House Price Index went up by 3.6% compared to last year. This shows that house prices are on the rise.
Many cities in Florida are struggling with real estate; 17 out of the 30 worst performing cities are located there.
The Freddie Mac index is based on specific loans and includes sales data to track house prices accurately.

Driving Change: 8 Learnings.

The Future Does Not Fit In The Containers Of The Past • 113 implied HN points • 28 Jan 24

💼 Business Change Management Leadership Culture Incentives Data Analysis

Change is difficult but necessary to avoid irrelevance.
Understanding human emotions and incentives is key to driving change.
Reducing fear, addressing company culture, and inspiring leadership are crucial in navigating change.

Horrors Of The Theoretical Maximum

ASeq Newsletter • 14 implied HN points • 30 Oct 24

🕹 Technology Bioinformatics Data Analysis Genomics Research Methods

Vendors sometimes quote theoretical maximums for data output, which can be misleading. It's important to understand that these numbers might not reflect actual performance.
Comparing different technologies can be complicated because they have different specifications and capabilities. Each technology, like PacBio, Oxford Nanopore, and Illumina, has its unique strengths and limitations.
In the real world, the difference between what is theoretically possible and what is actually achieved can be significant. This means we should be cautious and not rely solely on theoretical figures.

How to Get UEBA Costs for Microsoft Sentinel

Rod’s Blog • 99 implied HN points • 09 Oct 23

🕹 Technology IT Management Cost Analysis Data Analysis

UEBA costs for Microsoft Sentinel are based on the amount of data analyzed and can vary based on factors like the tables used.
A KQL query can help estimate and break down the costs for UEBA in Microsoft Sentinel.
By utilizing the provided KQL query, you can calculate and observe the estimated costs for the UEBA solution within Microsoft Sentinel.

Microsoft Sentinel SOC 101: How to Detect and Mitigate Phishing Attacks with Microsoft Sentinel

Rod’s Blog • 99 implied HN points • 19 Sep 23

🕹 Technology Cybersecurity Data Analysis Automation Threat Detection Cloud Computing

Phishing attacks are a significant threat that targets human vulnerabilities and can lead to identity theft or financial fraud.
Organizations can mitigate phishing attacks by adopting a 'defense in depth' strategy that includes user education, email filtering, and incident response planning.
Utilizing Microsoft Sentinel, Kusto Query Language (KQL), and integrating with Microsoft 365 Threat Protection can enhance proactive threat hunting and response capabilities against phishing attacks.

Getting GEO Information for IP Addresses without Using a Microsoft Sentinel Playbook

Rod’s Blog • 99 implied HN points • 06 Jun 23

🕹 Technology Data Analysis

A Kusto function called geo_info_from_ip_address() enables retrieving geolocation details for IP addresses without relying on third-party APIs.
This function can gather Country, State, City, Latitude, and Longitude info for both IPv4 and IPv6 addresses.
While IP-API.com offers additional details like IP management entity and mobile device indication, they may not always be necessary.

Group Assignment in a Frontend vs. Backend A/B Test

Sarah's Newsletter • 99 implied HN points • 19 Sep 23

🕹 Technology Experimentation UI/UX Data Analysis Frontend Backend

Decide which product feature should be behind a test, read the results of an A/B test, and prioritize features based on data.
Understand that frontend tests focus on user experience and user groups in the browser, while backend tests require business logic and user assignment in the database.
Choose frontend user group assignment for speed and simplicity via firing analytics events; go for backend assignment for more complete data by storing user assignment in a database model.
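
Backend group assignment can be sketched as follows. This is one possible approach and not necessarily the post's: it assumes a deterministic hash picks the group and that the result is then stored, with an in-memory dictionary standing in for the database model. All names are hypothetical.

```python
import hashlib

# Stand-in for a database table of stored assignments (hypothetical).
assignments = {}


def assign_group(user_id, experiment, groups=("control", "treatment")):
    """Pick a group deterministically from a hash of experiment and user id."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(groups)
    return groups[bucket]


def get_or_assign(user_id, experiment):
    """Return the stored group for a user, assigning and storing it on first use."""
    key = (experiment, user_id)
    if key not in assignments:
        assignments[key] = assign_group(user_id, experiment)
    return assignments[key]


print(get_or_assign("user-42", "new-checkout"))
print(get_or_assign("user-42", "new-checkout"))  # same group on every call
```

Storing the assignment means later analysis can join it against any other backend data, which is the "more complete data" trade-off described above.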

Understanding Variation in PTEN

Holodoxa • 99 implied HN points • 07 Sep 23

🔬 Science Genetics Research Data Analysis Cancer

Understanding genomic data variation and its effect is a significant challenge in genetic research.
Deep Mutational Scanning (DMS) and Multiplex Assays of Variant Effects (MAVEs) are crucial methods to study how mutations impact protein function.
MAVE data on PTEN has provided insights into its function, stability, and clinical implications, aiding in the understanding of PTEN variation.

Joe's Nerdy Rants #3

Joe Reis • 98 implied HN points • 03 Jun 23

🕹 Technology AI Data Analysis Software Engineering Podcasts Events

In many companies, there is a divide between software engineering and data teams.
Data is becoming more integrated into applications, blurring the lines between data and software.
The divide between software and data teams will eventually disappear as data becomes more critical to businesses.

Practical AI is a Big Tent: Five Ideas for CEOs, Leaders, and Investors

Mike Talks AI • 98 implied HN points • 27 Aug 23

🕹 Technology AI Algorithms Data Analysis Team Building

Practical AI encompasses various machine learning algorithms and techniques, including optimization and Operations Research.
The concept of Practical AI allows for the inclusion of both established and emerging approaches in the field.
To effectively solve real-world problems, AI leaders need a diverse set of skills and expertise, and must understand the strengths and weaknesses of different algorithms.