The hottest Data Analysis Substack posts right now

And their main takeaways
Talking to Computers: The Email 0 implied HN points 14 Jun 24
  1. Using synonyms in search helps users find what they need faster. It allows them to use their own words instead of worrying about exact terms.
  2. Creating synonyms can be tricky, but observing how users search can help build a better list. Watching what terms people actually use is more effective than guessing.
  3. While synonyms cover many cases, they struggle with specific long terms. For more complex searches, vector search technology might be a better solution.
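The synonym idea in the takeaways above can be sketched as query expansion: each query word is widened to its whole synonym group before matching. This is a minimal illustration, not the post's implementation; the `SYNONYMS` table and the helper names are hypothetical, and a real list would come from observing user searches.

```python
# Hypothetical synonym table; in practice it would be built from observed queries.
SYNONYMS = {
    "sofa": {"couch", "settee"},
    "tv": {"television"},
}

def build_lookup(groups):
    """Map every term to the full set of terms in its synonym group."""
    lookup = {}
    for head, alternatives in groups.items():
        group = {head} | alternatives
        for term in group:
            lookup[term] = group
    return lookup

def expand_query(query, lookup):
    """Return, for each query word, the set of terms that may match it."""
    return [lookup.get(word, {word}) for word in query.lower().split()]

def matches(document, expanded):
    """A document matches if it contains at least one term from every group."""
    words = set(document.lower().split())
    return all(words & group for group in expanded)

LOOKUP = build_lookup(SYNONYMS)
```

With this table, a search for "sofa" matches a document titled "Grey couch with cushions". Matching here is word by word, which is why long multi-word terms are a poor fit for this approach.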
Talking to Computers: The Email 0 implied HN points 22 Apr 24
  1. Sometimes, it's okay to have a few irrelevant search results mixed in with the good ones. This balance can help show more options, even if some aren't what you wanted.
  2. Businesses often choose to include a small number of unrelated items in search results. This helps them find a middle ground between showing only perfect matches and potentially missing out on useful items.
  3. In AI systems, occasional mistakes or 'hallucinations' can spark creativity. The goal is to find the balance that suits the situation.
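The trade-off described above is usually measured with precision (how many returned results are relevant) and recall (how many relevant items were returned). The sketch below uses made-up item IDs and is only an illustration of the two measures, not something taken from the post.

```python
def precision_recall(returned, relevant):
    """Compute precision and recall for one result list."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical catalogue: four items are truly relevant to the query.
relevant = {"a", "b", "c", "d"}

# Strict matching: only perfect matches, but half the useful items are missed.
strict = precision_recall(["a", "b"], relevant)

# Looser matching: one irrelevant item ("x") slips in, but more useful items show up.
loose = precision_recall(["a", "b", "c", "x"], relevant)
```

Here the strict list scores precision 1.0 and recall 0.5, while the looser list scores 0.75 on both, which is the middle ground the takeaways describe.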
machinelearninglibrarian 0 implied HN points 18 Sep 23
  1. Hugging Face's datasets don't have built-in groupby features, but you can use Polars to handle this. You can load datasets with Polars and perform group operations easily.
  2. Polars allows you to work with large datasets efficiently using lazy evaluation. This means you can process data without needing to load everything into memory all at once.
  3. You can visualize data comparisons after grouping by specific columns, making it easier to understand patterns or insights from the data.
Alex's Personal Blog 0 implied HN points 10 Oct 24
  1. September's inflation data showed a 0.2% monthly rise in consumer prices, with the year-over-year change at 2.4%. This suggests some ongoing economic pressure.
  2. Crunchbase is focusing on AI by enhancing its data tools. They introduced AI-powered search features to improve access to their extensive data.
  3. OpenAI is projected to have significant cash losses but could still become profitable by 2029 with a strong revenue base. The risks of high spending in this sector are considerable.
Coin Metrics' State of the Network 0 implied HN points 22 Oct 24
  1. New metrics help track Bitcoin and Ethereum flows to and from exchanges. This data can show how much people are buying or selling and help understand the market.
  2. There has been an increase in miners sending Bitcoin to exchanges recently. This could be due to them wanting to secure profits before changes in Bitcoin rewards.
  3. Crypto.com is gaining a larger share of the Bitcoin market lately. By looking at trading volumes and flow data, we can tell whether market activity is genuine or inflated by fake trades.
ASeq Newsletter 0 implied HN points 12 Nov 24
  1. The PacBio Vega Chips are similar to the Revio chips, but they provide much less data. This means they might not be as powerful for certain tasks.
  2. The data from the Vega chips is available for analysis, so readers can examine it themselves.
  3. This information is part of a subscription service, which means you can get more insights if you become a paid member.
Martin’s Newsletter 0 implied HN points 17 Sep 24
  1. The best day for submitting new AI research papers tends to be Tuesday. This timing is likely chosen to catch attention after the weekend.
  2. This year has seen fewer exciting advancements in AI-based human synthesis, with technologies being reused rather than creating entirely new concepts.
  3. New research is focusing on better facial expression recognition and human reconstruction from single images, showing promise in areas like understanding micro-emotions.
davidj.substack 0 implied HN points 17 Dec 24
  1. There's a new command called `sqlmesh cube_generate` that helps build models for data analysis. It's designed to make working with data easier for users.
  2. The tool outputs useful information in a structured format, which includes joins and fields for data analysis. This makes it simple to understand how the data connects.
  3. Even if there are challenges with complex data types, the output is still effective and can be enhanced using AI, showing there's room for creativity in data modeling.
A Small, Good Thing 0 implied HN points 30 Dec 24
  1. Many people just want basic monitoring tools that are easy to use and affordable. They care more about practical solutions than getting into complex observability concepts.
  2. There's a balance between reliability, shipping speed, and team well-being that needs to be carefully managed. It's important not to sacrifice too much reliability just to be fast.
  3. The focus should be on delivering a cost-effective way to monitor systems, rather than just aiming for the latest version of observability. It's essential to figure out who will handle the work involved.
Theory A : Visualize Value Investing 0 implied HN points 14 Jan 25
  1. A new trading journal feature helps you see all your open positions in one place. This makes it easier to keep track of different option contracts and their expiration dates.
  2. There's improved bid-ask data with a new system that's more accurate. You can now see where the current price is in relation to your contracts with a color-coded line.
  3. The free access to options data has been extended from 30 days to 180 days. This gives you more time to analyze market trends without needing a paid subscription.
Kartick’s Blog 0 implied HN points 21 Jan 25
  1. Variance helps us understand risk in different jobs. A steady job is low risk, while a startup can be very unpredictable.
  2. The median is a strong way to find a typical value because it's not easily affected by extreme numbers. So, when data is messy, the median usually gives a better answer than the mean.
  3. To get better estimates, look at a lot of data over time. More data usually means less error, helping you make smarter decisions.
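The three points above can be checked with Python's standard `statistics` module. The numbers are invented for illustration: two hypothetical income streams with the same mean but very different spread, a small sample with one extreme value, and the usual standard-error formula `sd / sqrt(n)` for the "more data, less error" point.

```python
import math
import statistics

# Two made-up yearly incomes (in thousands) with the same mean of 50.
steady_job = [50, 50, 51, 49, 50]
startup = [0, 0, 0, 10, 240]

steady_spread = statistics.pstdev(steady_job)   # small: low risk
startup_spread = statistics.pstdev(startup)     # large: high risk

# One extreme value drags the mean far more than the median.
salaries = [48, 50, 52, 51, 49]
with_outlier = salaries + [1000]
mean_value = statistics.mean(with_outlier)      # pulled up to about 208
median_value = statistics.median(with_outlier)  # stays at 50.5

def standard_error(sd, n):
    """Standard error of the mean for n independent observations."""
    return sd / math.sqrt(n)
```

Quadrupling the sample size halves the standard error, so the gain from extra data is real but shrinks as the sample grows.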
Nano Thoughts 0 implied HN points 20 Jan 25
  1. Not all zeros in data mean the same thing. Sometimes, they can indicate something was never there, or other times, they mean something was just missed.
  2. Zero inflation occurs when a large dataset contains many readings that come back as zero, often more than expected. This can make it hard to understand what's really going on behind those zeros.
  3. There are different methods to deal with zeros in data, like checking if they are real or just unnoticed signals. Choosing the right method is important to get accurate insights.
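One simple way to check for excess zeros, offered here as an assumption rather than the post's method, is to compare the observed share of zeros with the share a Poisson model with the same mean would predict, which is `exp(-mean)`. The counts below are made up.

```python
import math

def zero_excess(counts):
    """Return (observed zero fraction, zero fraction expected under Poisson)."""
    n = len(counts)
    observed = sum(1 for c in counts if c == 0) / n
    mean = sum(counts) / n
    expected = math.exp(-mean)  # P(X = 0) for a Poisson with this mean
    return observed, expected

# Hypothetical readings: 60 zeros out of 100, mean count 0.9.
counts = [0] * 60 + [1] * 10 + [2] * 15 + [3] * 10 + [4] * 5
observed, expected = zero_excess(counts)
```

In this example 60% of readings are zero while a Poisson model would predict about 41%, a gap that hints some zeros may be missed signals rather than true absences. The Poisson baseline is only one possible reference model.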
The Strategy Toolkit 0 implied HN points 27 Jan 25
  1. People expect randomness to seem chaotic, but true randomness can appear ordered. This misunderstanding affects how we perceive things like music playlists.
  2. Users often complain about problems with shuffle algorithms, thinking they should never see clusters of songs from the same artist. But statistically, that can happen and is actually normal.
  3. Our brains are wired to look for patterns, making us think randomness should behave in a way that fits our expectations, rather than how it actually works.
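The claim that clusters are normal under a fair shuffle can be verified by brute force on a tiny playlist. The sketch below enumerates every ordering of a hypothetical six-track playlist (three artists, two songs each) and counts how many place two songs by the same artist back to back. The playlist and function names are illustrative.

```python
from itertools import permutations

def has_adjacent_repeat(artists):
    """True if any two neighbouring tracks share an artist."""
    return any(a == b for a, b in zip(artists, artists[1:]))

def fraction_with_clusters(playlist):
    """Share of all orderings with at least one same-artist pair in a row.

    playlist is a list of (artist, title) tuples; every ordering is equally
    likely under a uniform shuffle. Only practical for small playlists.
    """
    orders = list(permutations(playlist))
    clustered = sum(
        has_adjacent_repeat([artist for artist, _ in order]) for order in orders
    )
    return clustered / len(orders)

playlist = [
    ("A", "a1"), ("A", "a2"),
    ("B", "b1"), ("B", "b2"),
    ("C", "c1"), ("C", "c2"),
]
share = fraction_with_clusters(playlist)  # 480 of 720 orderings, i.e. 2/3
```

Two thirds of perfectly fair shuffles of this playlist contain a same-artist cluster, so seeing one is the expected outcome rather than evidence of a broken algorithm.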
ASeq Newsletter 0 implied HN points 27 Feb 25
  1. Roche is working on new nanopore sequencing technology, focusing on how much the instruments will cost to produce. Understanding these costs is important for the technology's success.
  2. The nanopore sequencing process involves collecting a large amount of data quickly, which means the data rates are extremely high. This could lead to challenges in storing and processing such vast amounts of information.
  3. Since the raw data volume is so large, it's unlikely that most users will store it all. Instead, they will probably need to focus on analyzing only the most crucial information collected.
ASeq Newsletter 0 implied HN points 09 Jun 25
  1. The PromethION flowcell has an average output of about 84Gb per run. This is important for understanding how much data you can expect.
  2. In comparison, the PacBio flowcell seems to produce higher quality data with around 120-150Gb. This could make it a better option for some users.
  3. Cost per gigabyte is lower for PacBio, making it potentially more affordable when analyzing large amounts of data.
Expand Mapping with Mike Morrow 0 implied HN points 14 Jul 25
  1. You can choose how SQL query results are stored in Hex, either in memory or in the database. This affects how quickly you can run follow-up queries.
  2. There are two types of SQL commands in Hex: one that queries directly from the database and another that queries from a local in-memory dataframe. This choice can impact how your data is used.
  3. Hex allows you to chain SQL queries, which makes handling complex tasks easier. However, you need to be aware of where each query pulls data from to avoid surprises.
Expand Mapping with Mike Morrow 0 implied HN points 14 Aug 25
  1. Supervised machine learning helps us understand how inputs relate to outputs, but just because two things move together doesn't mean one causes the other.
  2. To prove something causes another, experiments are the best way, but we can also make educated guesses using causal diagrams, like trees that show how different factors connect.
  3. Machine learning models are great at predictions but aren't designed to show cause and effect; we can use them to help create clearer models for understanding these relationships.
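A small simulation makes the correlation-versus-causation point concrete. In this hypothetical setup a hidden factor `z` drives both `x` and `y`, so they move together even though `x` has no effect on `y`. When `x` is instead assigned at random, as in an experiment, the relationship disappears. All variable names and noise levels are assumptions chosen for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate(n=5000, seed=0):
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]            # hidden common cause
    x_observed = [zi + rng.gauss(0, 0.5) for zi in z]  # x depends on z
    y = [zi + rng.gauss(0, 0.5) for zi in z]           # y depends on z, not on x
    x_assigned = [rng.gauss(0, 1) for _ in range(n)]   # experiment: x set at random
    return pearson(x_observed, y), pearson(x_assigned, y)

r_observational, r_experimental = simulate()
```

The observational correlation comes out strong (about 0.8 in expectation for these noise levels) while the experimental one is close to zero. A predictive model trained on the observational data would happily use `x` to predict `y`, which is useful for prediction but says nothing about what changing `x` would do.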
The Healthtech Initiative 0 implied HN points 05 Jan 26
  1. Most people quit their new‑year sleep resolutions almost immediately — 60% stop within 48 hours and the median streak is one day, with under 3% lasting beyond five days.
  2. People who kept up the changes at first actually slept worse short‑term: they went to bed earlier and tracked routines more, yet their time to fall asleep rose to roughly 26 minutes or more.
  3. Trying harder often makes sleep worse, so the common New Year’s resolution approach to ‘optimize’ sleep is counterproductive and needs a different framework.
FutureIQ 0 implied HN points 07 Jan 26
  1. A well-formed two-armed spiral galaxy called Alaknanda was observed at redshift z≈4, meaning we see it as it was about 12 billion years ago — only ~1.5 billion years after the Big Bang.
  2. The galaxy’s mature disk and clear spiral arms so early in cosmic history conflict with current models that predict such structures need about 3–4 billion years to form, so our theories of galaxy formation need revision or expansion.
  3. The discovery relied on deep JWST infrared data, gravitational lensing, and advanced analysis of public datasets, highlighting how modern instruments and open data can enable unexpected breakthroughs.
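The ages quoted for z≈4 can be reproduced with the closed-form age of a flat ΛCDM universe, t(z) = (2 / (3 H0 √ΩΛ)) · asinh(√(ΩΛ/Ωm) · (1+z)^(-3/2)). The cosmological parameters below (H0 = 70 km/s/Mpc, Ωm = 0.3, ΩΛ = 0.7) are round-number assumptions, not values taken from the post, and radiation is ignored.

```python
import math

# Assumed round-number cosmology (not from the post).
H0 = 70.0                   # Hubble constant in km/s/Mpc
OMEGA_M = 0.3               # matter density parameter
OMEGA_L = 0.7               # dark-energy density parameter (flat universe)

KM_PER_MPC = 3.0857e19
SECONDS_PER_GYR = 3.1557e16

def age_at_redshift(z):
    """Age of a flat LambdaCDM universe at redshift z, in billions of years."""
    hubble_time_gyr = KM_PER_MPC / H0 / SECONDS_PER_GYR
    prefactor = 2.0 / (3.0 * math.sqrt(OMEGA_L))
    x = math.sqrt(OMEGA_L / OMEGA_M) * (1.0 + z) ** -1.5
    return hubble_time_gyr * prefactor * math.asinh(x)

age_then = age_at_redshift(4.0)   # roughly 1.5 Gyr after the Big Bang
age_now = age_at_redshift(0.0)    # roughly 13.5 Gyr
lookback = age_now - age_then     # roughly 12 Gyr
```

Under these assumptions the universe was about 1.5 billion years old at z = 4 and the light has travelled for about 12 billion years, consistent with the figures in the first takeaway. Different parameter choices shift the numbers by a few percent.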