The hottest Data Substack posts right now

And their main takeaways

(Mostly) Closing The Book On Murder in 2023

Jeff-alytics • 216 implied HN points • 29 Jan 24

Murder rates likely fell by about 12% in over 200 cities in 2023.
Some cities saw an increase in murder, like Topeka, Greensboro, and Shreveport.
The murder trend appeared positive in 2024 with fewer cities showing an increase.

Agent AI: Agentic Applications Are Software Systems With A Foundation Model AI Backbone & Defined Autonomy via Tools

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 19 implied HN points • 05 Aug 24

🕹 Technology AI Software Models Applications Data

Agentic Applications are advanced software systems that use AI models to operate more independently. They can navigate and process information effectively using tools.
The MindSearch framework helps break down complex questions into simpler parts, making it easier to find answers online. It simulates how humans think and search for information.
There are special agents in this system, like WebPlanner and WebSearcher, that work together to gather and organize information from the web, enhancing the problem-solving process.

Is RBAC Still Relevant? Am I?

Permit.io’s Substack • 99 implied HN points • 25 Apr 24

🕹 Technology Software Security Data Development Innovation

RBAC is still important as it simplifies the management of user permissions by linking them to roles, making it easier for developers and users to understand.
Newer models like ABAC and ReBAC are gaining popularity because they offer more flexibility and can handle complex permission requirements better than RBAC.
Using RBAC as a foundation allows developers to build more advanced authorization systems by layering on additional models, adapting to the changing needs of applications.
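As a rough illustration of layering an attribute-style rule on top of roles, here is a minimal Python sketch. The role names, permission strings, the department rule, and the `can` helper are all hypothetical; they are not taken from the post or from any Permit.io API.

```python
# Hypothetical roles and permissions, for illustration only.
ROLE_PERMISSIONS = {
    "viewer": {"document:read"},
    "editor": {"document:read", "document:write"},
}


def has_permission(user, action):
    """Plain RBAC: allowed if any of the user's roles grants the action."""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in user["roles"])


def can(user, action, resource):
    """RBAC check first, then an attribute-based (ABAC-like) condition on top."""
    if not has_permission(user, action):
        return False
    # Assumed extra rule: writes are limited to the user's own department.
    if action == "document:write":
        return user["department"] == resource["department"]
    return True


alice = {"roles": ["editor"], "department": "finance"}
bob = {"roles": ["viewer"], "department": "finance"}
report = {"department": "finance"}
memo = {"department": "legal"}
```

Here `can(alice, "document:write", memo)` is denied by the attribute rule even though the role alone would allow it, which is the kind of layering the summary describes.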

IT-Harvest Launches AI Assistants for Industry Research

The Security Industry • 8 implied HN points • 15 Jan 25

🕹 Technology AI Cybersecurity Software Data Innovation

IT-Harvest has launched AI assistants called HarvestIQ.ai, which help users research companies and products in the cybersecurity field. These assistants are designed to make finding information easier and faster.
The HarvestIQ Assistants feature chat interfaces that allow users to ask questions about cybersecurity vendors and products, providing detailed responses and insights. This is especially helpful for professionals needing quick access to relevant data during discussions.
The tools are cost-effective compared to traditional research methods and integrate advanced technologies to assist users in selecting the best cybersecurity solutions for their needs.

Moving Beyond Doomism: Data-Driven Strategies for Effective Climate Content

This Week in MCJ (My Climate Journey) • 393 implied HN points • 14 Mar 23

🌞 Climate & Environment Data Communication Behavior Circular Economy Psychology

Data-driven decisions are crucial in climate content to engage mainstream audiences effectively.
Promoting self-interest in climate content yields more results than focusing on planetary benefits.
Starting with simple, relatable content and gradually guiding individuals towards impactful actions can drive engagement and awareness.

AI as Symptom and Dream

SCIENCE GODDESS • 393 implied HN points • 08 May 23

🕹 Technology AI Ethics Workforce Community Data

Many AI researchers are calling for a pause in advanced AI research due to concerns about potential apocalyptic scenarios.
There is a need to question the motives and proposed solutions of prominent AI organizations and figureheads.
Ethical considerations around AI should focus on issues like worker exploitation and power concentration, rather than just sensationalized fears of AI surpassing humanity.

The Core Primitive of AI: Experimental Learning

Shakos Metaheuristics • 294 implied HN points • 28 Nov 23

🕹 Technology AI Data

LLMs can extract higher-order reasoning abilities from training data
Corpus of human language acts as a vast universe of knowledge for AI training
Training data for AI models can be seen as a composition of all recorded experiments

An Inventor's Quest for the NHL Pt. 41

An Engineering Self-Study • 667 implied HN points • 07 Feb 24

🕹 Technology Inventions Engineering YouTube Data

The inventor had setbacks but is now back on track with building prototypes and filming updates.
The inventor is making money as a YouTube partner and finds it rewarding.
There's a philosophical shift towards being less anti-establishment and more open to using data in future designs.

How AI becomes biased, visually explained 🐧

Year 2049 • 8 implied HN points • 17 Jan 25

🕹 Technology AI Bias Data Learning Ethics

AI can show bias based on how it learns from the data given to it. If the data contains biases, the AI will likely reflect those biases in its decisions.
Using simple examples, like a penguin metaphor, helps explain complex AI concepts. It's easier to understand difficult ideas with relatable stories.
It's important to be aware of AI bias as it affects how AI technologies interact with people. Being educated about these biases can lead to better, fairer AI development.

The Sequence Opinion #485: What's Wrong With AI Benchmarks

TheSequence • 56 implied HN points • 06 Feb 25

🕹 Technology AI Data Evaluation Machine Learning Benchmarking

AI benchmarks are currently facing issues like data contamination and memorization, which affect how accurately they evaluate models. It's important to find better ways to test these systems.
New benchmarks are popping up all the time, making it hard to keep track of what each one measures. This could lead to confusion in understanding AI capabilities.
There's a need for clearer and more standard methods in AI evaluation to really see how well these models perform and improve their reliability.
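One common way to probe for the data contamination mentioned above is to measure n-gram overlap between a benchmark item and training text. The sketch below is a simplified assumption of how such a check might look; the function names and the 5-gram window are illustrative and not drawn from the post.

```python
def ngrams(text, n=5):
    """Return the set of lowercase word n-grams in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(benchmark_item, training_text, n=5):
    """Fraction of the item's n-grams that also appear in the training text."""
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        return 0.0
    return len(item_grams & ngrams(training_text, n)) / len(item_grams)


question = "what is the capital of france and its population"
leaked_corpus = "trivia dump: what is the capital of france and its population today"
clean_corpus = "the quick brown fox jumps over the lazy dog"

leaked_score = overlap_ratio(question, leaked_corpus)
clean_score = overlap_ratio(question, clean_corpus)
```

A high ratio suggests the item may have been seen during training; real contamination checks typically add normalization and fuzzy matching on top of this idea.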

Import AI 329: Compute IS data; don't build AI agents; AI needs a precautionary principle

Import AI • 399 implied HN points • 15 May 23

🕹 Technology AI Data Ethics Language Models AI Development

Building AI scientists to advise humans is a safer alternative to building AI agents that act independently
There is a need for a precautionary principle in AI development to address threats to democracy, peace, safety, and work
Approaches like Self-Align show the potential for AI systems to self-bootstrap using synthetic data, leading to more capable models

Swimming in the Tensions

Cybernetic Forests • 199 implied HN points • 21 Jan 24

🎨 Art & Illustration AI Art Archives Data Cultural Diversity Ethics

When creating images with AI, we are essentially building data visualizations based on training data, and this can lead to reproducing stereotypes found in the training data.
Archives, like Wikimedia Commons, require curation and community engagement to ensure responsible and equitable representation in AI training datasets.
There is a need to recognize the cultural and emotional value of images and data, and to approach AI training data as more than just facts, but as part of a larger social and cultural fabric.

AI#29: Take a Deep Breath

Don't Worry About the Vase • 716 implied HN points • 14 Sep 23

🕹 Technology AI Data Research Ethics Discourse

Taking a deep breath can help with problem-solving.
Language models can be useful in generating prompts and responses.
Developing AI systems to reason and code is a focus for future technology.

SIEM 4.0: The Essentialist Evolution

Detection at Scale • 59 implied HN points • 28 May 24

🕹 Technology Security AI Data Software Engineering Automation

Security teams are moving towards prioritizing impactful MITRE tactics over complete ATT&CK coverage to reduce distracting alerts and focus on critical threats.
Moving from alerts on individual behaviors to risk-based alerts brings in more context, reducing alert volume and making each alert more meaningful.
The evolution to SIEM 4.0 includes opening up data lakes, adopting 'as code' principles, and utilizing AI to automate routine tasks so human analysts can focus on high-value work.
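The risk-based alerting idea can be sketched in a few lines of Python: instead of alerting on every behavior, scores accumulate per entity and only entities above a threshold raise an alert. The behavior names, weights, and threshold below are made-up values for illustration, not anything specified in the post.

```python
from collections import defaultdict

# Hypothetical risk weights per detected behavior.
RISK_SCORES = {"failed_login": 10, "new_admin_role": 40, "unusual_download": 30}
ALERT_THRESHOLD = 60  # assumed cut-off


def risk_alerts(events, threshold=ALERT_THRESHOLD):
    """Sum risk per entity and return only entities at or above the threshold."""
    totals = defaultdict(int)
    for entity, behavior in events:
        totals[entity] += RISK_SCORES.get(behavior, 0)
    return {entity: score for entity, score in totals.items() if score >= threshold}


events = [
    ("alice", "failed_login"),
    ("bob", "new_admin_role"),
    ("bob", "unusual_download"),
    ("alice", "failed_login"),
]
alerts = risk_alerts(events)
```

Four raw events collapse into a single alert for the one entity whose combined behavior crosses the threshold.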

sqlmesh janitor

davidj.substack • 119 implied HN points • 13 Dec 24

🕹 Technology Software Data Engineering Cloud Development

Sqlmesh offers various command-line interface commands that help manage and maintain your data projects effectively. For example, the `clean` command helps fix any issues that might arise during execution.
The new tool has unique features that improve development, like automatic data contract handling and optimized incremental models, making it easier to work with large datasets without unnecessary costs.
Competition in the data transformation space is healthy. It pushes tools like dbt and sqlmesh to improve, ultimately benefiting users by providing better features and experiences.

How To Set Up Subscription Analytics For Growth Reporting - Issue 182

Data Analysis Journal • 196 implied HN points • 17 Jan 24

💼 Business Analytics Subscription Data Revenue Metrics

Subscription apps are rapidly growing as a business model.
Companies are targeting the subscription analytics space to help optimize plans, pricing, and revenue.
Setting up subscription analytics for reporting can help with data-driven decision-making.

TT Chapter: Fat-Tailed Distributions

Software Design: Tidy First? • 154 implied HN points • 04 Nov 24

🕹 Technology Software Design Development Data Engineering

Fat-tailed distributions show that extreme events can happen more often than we expect. This is important for planning in various fields.
When designing software, it's good to focus on creating simple models first. This can help make complex concepts easier to understand.
Being an empirical designer means you rely on real-world data and observations to guide your design decisions. This approach can lead to better results.
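To make the first takeaway concrete, the sketch below compares how quickly tail probabilities shrink for a thin-tailed standard normal versus a fat-tailed Pareto distribution. The Pareto shape and scale values are arbitrary assumptions, and the two distributions are not matched in mean or variance; the point is only the difference in how fast each tail decays.

```python
import math


def normal_tail(k):
    """P(Z > k) for a standard normal variable."""
    return 0.5 * math.erfc(k / math.sqrt(2))


def pareto_tail(x, alpha=2.0, x_min=1.0):
    """P(X > x) for a Pareto variable with shape alpha and scale x_min."""
    return (x_min / x) ** alpha if x >= x_min else 1.0


for k in (2, 5, 10):
    print(f"threshold {k}: normal {normal_tail(k):.3e}, pareto {pareto_tail(k):.3e}")
```

The normal tail collapses super-exponentially as the threshold grows, while the Pareto tail falls off only polynomially, so extreme values stay far more plausible under the fat-tailed model.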

That Little Question of Humanity…

Technically Optimistic • 59 implied HN points • 24 May 24

🕹 Technology AI Copyright Privacy Data Artificial Intelligence

Celebrities like Scarlett Johansson are facing challenges with AI replicating their voices and likenesses without consent, raising important questions about ownership and rights.
Actors like Clark Gregg are advocating for the protection of their biometric data, pushing for the rights to own and control their scans, and be compensated for their use.
The intersection of technology and personal identity is a complex issue that prompts reflection on what it means to be human in a world where even famous personalities are at risk of having their identities manipulated.

Composition to the Rescue

FREST Substack • 9 implied HN points • 16 Jan 25

🕹 Technology Software Development Architecture Security Data

Current software systems are often too complex and difficult to modify, which makes them less user-friendly. We need simpler ways to build software that anyone can change easily.
Many businesses overcomplicate software development, focusing too much on rigid structures instead of flexible systems. We should aim for systems that work like Excel and FileMaker, where changes can be made swiftly.
A new approach to software composition is needed, one that allows everyone to understand and manipulate tools. By focusing on natural relations and simple queries, we can create software that is accessible to all, not just a select few.

The Five Layers of Incident Response (Part 1)

Detection at Scale • 59 implied HN points • 21 May 24

🕹 Technology Security Automation Data Detection Incident Response

Detection Engineering involves automating SecOps using software engineering and data principles to enhance defense capabilities without eliminating human roles.
For effective Incident Response, use the 'Five Layers of IR', which include Playbook Management, the Data Layer, and the Presentation Layer.
The Playbook sets the strategy, Data Layer defines necessary logs for playbooks, and Presentation Layer visualizes alerts and actions for human analysis.

Dashboard Update: NIST Subcategories, MITRE Subtechniques and Mitigations

The Security Industry • 30 implied HN points • 20 Nov 24

🕹 Technology Cybersecurity Software Data Research Compliance

The platform now includes detailed information on over 9,000 cybersecurity products, helping professionals match their needs with available solutions. Users can see how each product aligns with NIST and MITRE standards.
Customers will soon be able to analyze their entire security stack, finding overlaps and gaps in their cybersecurity coverage. This feature will help them save costs and improve efficiency.
Traditional research firms only cover a small fraction of the cybersecurity industry. By capturing detailed data on all products, this platform aims to provide a more comprehensive view of available options.

How Generative AI is Transforming Healthcare

Gradient Flow • 139 implied HN points • 22 Feb 24

🕹 Technology AI Healthcare Data Podcasts Models

Generative AI in healthcare can transform patient care by providing personalized treatment suggestions, streamlining documentation, and enhancing communication.
Generative AI enables the development of privacy-assured synthetic medical data for research and prediction of health outcomes through data analysis.
Specialized models tailored to specific tasks through fine-tuning offer more efficient and accurate solutions compared to broader capabilities, highlighting the importance of personalized AI approaches.

AI, The Idiot Ant Queen

Desystemize • 1404 implied HN points • 07 Mar 23

🕹 Technology AI Ethics Science Engineering Data

Artificial intelligence could lead to a loss of understanding and agency in decision-making
AI ethics issues stem from existing power imbalances and biases, not just the capabilities of AI systems
The real concern with AI is the potential control it may have over societal institutions, impacting human autonomy and decision-making

Weekly Dose of Optimism #125

Not Boring by Packy McCormick • 92 implied HN points • 20 Dec 24

🕹 Technology Energy AI Sustainability Data Innovation

Commonwealth Fusion is making big strides toward clean energy with plans for the world's first commercial fusion power plant in Virginia, which could be operational by the early 2030s.
Off-grid solar microgrids could greatly help power AI data centers quickly and affordably, making use of solar energy, especially in sunny regions like the U.S. Southwest.
A new method called HORNET combines atomic force microscopy and AI to map RNA structures. This could improve our understanding of RNA and lead to better treatments for diseases.

Monarch Matrices(M2) instead of Transformers?

MLOps Newsletter • 176 implied HN points • 14 Jan 24

🕹 Technology AI Machine Learning Data Libraries

Monarch Matrices (M2) are proposed as a replacement for Transformers in models.
M2 uses structured Monarch matrices to improve efficiency in capturing relationships and reduce computational costs.
Replacing attention and MLPs with Monarch matrices in M2 enhances model performance and simplifies learning parameters.

What You Need to Know About GPT-4 and PaLM 2

Gradient Flow • 319 implied HN points • 01 Jun 23

🕹 Technology AI Data Ethics Performance Scaling

Leading-edge AI models like GPT-4 and PaLM 2 are becoming less open due to growing costs, IP protection, and misuse concerns.
Insights from technical reports of these models help in understanding capabilities, risks, and benefits, aiding in developing strategies to manage potential harm.
GPT-4 and PaLM 2 underwent rigorous testing for responsible AI behavior, outperforming predecessors in various tasks and showing advancements in performance, scalability, and efficiency.

An Ocean of Data

Justin E. H. Smith's Hinternet • 466 implied HN points • 12 Mar 24

🕹 Technology Data Artificial Intelligence Writing Publishing Digital Transformation

The data produced in just one minute in 2023 was 169,371 times more than that produced in the entire 18th century.
The analogy of "pissing into the ocean" likens the massive amount of data generated daily to a drop in the vast ocean.
The role of a writer has evolved significantly from the 18th century, with the digital era signaling the end of traditional writing as we knew it.

7 Must-Have Features for Crafting Custom LLMs

Gradient Flow • 299 implied HN points • 21 Sep 23

🕹 Technology AI ML Generative AI Large Language Models Data

Crafting custom large language models (LLMs) is essential for addressing concerns about intellectual property, data security, and privacy.
Tools for building custom LLMs must include versatile tuning techniques, human-integrated customization, and data augmentation capabilities.
Developing multiple custom LLMs requires support for experimentation with tools such as MLflow, distributed computing accelerators, and thorough documentation to support alignment, accuracy, and reliability.

Import AI 328: Cheaper StableDiffusion; sim2soccer; AI refinement

Import AI • 339 implied HN points • 08 May 23

🕹 Technology AI Research Simulation Data Robotics

Training image models can be cheaper with smart tweaks like Low Precision GroupNorm and Low Precision LayerNorm. Companies like Mosaic are leading the way in AI industrialization.
Prominent AI researcher Geoff Hinton has expressed concerns about the rapid progress and control of advanced AI models. His departure from Google highlights the growing worries in the field.
New companies like Lamini are offering services to fine-tune existing AI models, indicating further industrialization of AI. Startups like these are bridging the gap between AI products and consumers.

Joe's Nerdy Rants #2

Joe Reis • 294 implied HN points • 27 May 23

🕹 Technology Data AI Business Startups Learning

Identify your motivation to learn in a rapidly changing industry by finding your ultimate goal or purpose.
Focus on mastering the fundamentals of a topic by understanding it from end to end and learning from first principles.
Be patient, read widely, and connect various ideas together to grow your knowledge over time.

The Sequence Chat: The End of Data. Or Maybe Not

TheSequence • 105 implied HN points • 20 Nov 24

🕹 Technology AI Data Generative AI Machine Learning Models

There's a big debate about whether we're running out of data for AI. Some people believe that as AI keeps growing, we might hit a point where there's just not enough new data to use.
Many AI models have already used a lot of data from the internet. This raises concerns that without fresh and vast data sources, these models might not improve much anymore.
To tackle the data issue, some suggest focusing on getting better quality data or even creating new, artificial datasets. This could help keep AI development moving forward.

The AI Conference in SF: The Future of AI is Now!

Gradient Flow • 319 implied HN points • 18 May 23

🕹 Technology AI Conferences Data Machine Learning Podcasts

The AI Conference in San Francisco aims to bridge the gap between research and real-world applications of AI by providing a vendor-neutral platform for networking and learning.
The conference is seeking speakers with expertise in implementing AI across various industries like healthcare, finance, manufacturing, and more, as well as in model development and deployment.
Cutting-edge developments in AI include advancements such as a benchmarking platform for large language models with Elo ratings, reduced latency in Apache Spark Structured Streaming, and AI systems like Med-PaLM 2 for medical question answering.

Where's PlayStation - and Sony's games - headed in 2023?

The GameDiscoverCo newsletter • 275 implied HN points • 31 May 23

🕹 Technology Gaming Data Updates Strategy

Sony is pushing ahead with the PS5 despite supply constraints
Sony is expanding beyond consoles to focus on PC and mobile gaming
Getting insights on Games as a Service processes from big titles can be challenging due to transparency concerns and potential harassment of developers

Building The Most Basic LangChain Chatbot

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 59 implied HN points • 06 May 24

🕹 Technology AI Software Chatbots NLP Data

Chatbots use Natural Language Understanding (NLU) to figure out what users want by detecting their intentions and important information.
With Large Language Models (LLMs), chatbots can understand and respond to conversations more naturally, moving away from rigid, rule-based systems.
Building a chatbot now involves using advanced techniques like retrieval-augmented generation (RAG) to pull in useful information and provide better answers.

Testing a book's code, part 4: Testing Matplotlib data visualizations

Mostly Python • 314 implied HN points • 01 Feb 24

🕹 Technology Programming Data Testing

Testing data visualization programs involves assessing both terminal and graphical outputs.
Automated testing of Matplotlib programs can be challenging due to the appearance of the Matplotlib plot viewer.
One approach to overcome the challenge of testing Matplotlib programs is to modify the files to generate image files for testing.

TAO - Meta's Scalable architecture powering world's largest social graph

Engineering At Scale • 120 implied HN points • 09 Nov 24

🕹 Technology Software Architecture Data Systems Engineering

Meta created TAO to handle the huge amount of data and user interactions on its platform. This system helps generate personalized content for over 2 billion users very quickly.
TAO uses a layered architecture that includes caching and data storage to improve performance. This design helps distribute the load and maintain fast responses even when many users are active.
TAO prioritizes high availability over strict data consistency. This means it can sometimes show slightly out-of-date information, but it still works well for users, especially during busy times.
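The availability-over-consistency trade-off can be illustrated with a toy read-through cache. This is a simplified sketch and not Meta's actual design: the `ReadThroughCache` class, the time-based expiry, and the key format are all assumptions made for illustration.

```python
import time


class ReadThroughCache:
    """Toy cache layer in front of a slower store (not TAO's real mechanism)."""

    def __init__(self, store, ttl_seconds=5.0, clock=time.monotonic):
        self.store = store        # dict standing in for the storage layer
        self.ttl = ttl_seconds
        self.clock = clock        # injectable so the example is deterministic
        self._entries = {}        # key -> (value, time cached)

    def get(self, key):
        entry = self._entries.get(key)
        now = self.clock()
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]       # cache hit, possibly stale
        value = self.store.get(key)
        self._entries[key] = (value, now)
        return value


store = {"user:1:friends": ["user:2"]}
fake_now = [0.0]
cache = ReadThroughCache(store, ttl_seconds=5.0, clock=lambda: fake_now[0])

first = cache.get("user:1:friends")
store["user:1:friends"] = ["user:2", "user:3"]  # write that bypasses the cache
stale = cache.get("user:1:friends")             # served from cache, out of date
fake_now[0] = 10.0                              # cached entry has now expired
fresh = cache.get("user:1:friends")
```

Between the write and the expiry, readers get a fast but slightly outdated answer, which mirrors the behavior the summary describes.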

Generative AI 2023: Why This Year Marks a Major Turning Point

Gradient Flow • 199 implied HN points • 16 Nov 23

🕹 Technology AI Data Software Development Podcast Startups

Generative AI, particularly large language models like GPT-4, is rapidly gaining mainstream adoption across various sectors like chatbots, computer programming, medicine, and law.
Executives and managers are increasingly recognizing the transformative potential of generative AI, with surveys showing high interest and willingness to invest in the technology for efficiency and growth.
Studies highlight the significant productivity gains generative AI provides, benefiting lower-performing workers and increasing productivity in areas like writing tasks and customer service by substantial percentages.

Gsnowflake

benn.substack • 792 implied HN points • 07 Jul 23

🕹 Technology Data AI Cloud Computing Natural Language Processing

Google is technically a database but differs from traditional databases in its structure and content.
Snowflake is introducing features like Document AI that hint at a shift towards focusing on information retrieval rather than just data analysis.
The market for an information database could potentially be larger and more accessible than traditional data warehouses, offering simpler access to basic facts and connections.

The 4 Problems of Airdrops

DeFi Weekly • 235 implied HN points • 26 Apr 23

🔮 Crypto Liquidity Valuations Data

Decentralisation theatrics don't necessarily protect against legal issues with airdrops.
Airdrops can lead to early liquidity for team members and investors, impacting valuations.
Inflated user counts from airdrops may not reflect genuine user ownership or value creation.

The Joe Reis Show - My New Solo Podcast

Joe Reis • 235 implied HN points • 08 Mar 23

🎙 Podcasts Technology Data

Joe Reis has launched a new solo podcast called The Joe Reis Show.
The podcast will feature candid takes on the technology and data industry.
Listeners can find the podcast on Spotify.