The hottest Data Substack posts right now

And their main takeaways

Brand is Inevitable

42 Slash • 98 implied HN points • 18 Jun 23

A brand is more than just a logo or image; it encompasses values and purpose.
Investing in brand development is crucial from the start, not something to be done later.
Brands are about storytelling that goes beyond data and resonates culturally.

Battery Ventures' Max Schireson: To Build Value in Tech, Build Different

Condensing the Cloud • 98 implied HN points • 31 Aug 23

🕹 Technology Innovations Startups Data Computing Tech Companies

To build value in the tech industry, aim to do things differently, not just better or faster.
Doing something different can polarize users, with some finding it better and others not.
Success in tech often comes from being unique and offering something new, not just improving existing technologies.

Big Tech's LLM evals are just marketing

Democratizing Automation • 205 implied HN points • 13 Dec 23

🕹 Technology Artificial Intelligence Evaluation Models Data Companies

Big Tech's LLM evaluations are often just a form of marketing.
Companies may use misleading comparisons in their model scores without being able to truly evaluate their competitors.
Access to training data and code is crucial for confidently assessing differences in LLM evaluation scores.

Beat your Bot: Building your Moat against AI

Musings on Markets • 2 HN points • 28 Aug 24

🕹 Technology AI Computing Data Disruption Automation

AI is getting better at doing mechanical tasks, but it struggles with intuitive ones. This means jobs that rely on creativity and adaptability are safer than those that are purely formulaic.
Jobs that follow strict rules can be easily replaced by AI, while those that need human judgement and understanding of principles will be harder for AI to take over. This shows the value of being skilled in areas that require more complex thinking.
To protect your job from AI, be a generalist instead of a specialist, practice telling stories around your work, and try not to rely too much on technology for reasoning. This can help you stay unique and valuable in a changing job landscape.

Vendor Lock-in Scores: A brief summary

The Orchestra Data Leadership Newsletter • 59 implied HN points • 02 Jan 24

🕹 Technology Data Software SaaS

Vendor lock-in is an assessment of present gain versus future risk in the world of data, software, and cloud services.
Key considerations include migration risk, migration cost, and pricing cost when assessing vendor lock-in.
Factors like data portability, integration, service and support, and community strength play a significant role in evaluating vendor lock-in risks when choosing a SaaS provider.

A First Principles guide to Data Availability: Part 1

Matthew’s Substack • 39 implied HN points • 28 Feb 24

🔮 Crypto Blockchain Web3 Data Security Infrastructure

Data Availability (DA) is important for blockchain because it allows data to be accessible and verified by users. It helps ensure security, especially for rollups on Ethereum.
Rollups process transactions on cheaper chains but rely on Ethereum's main network for security by posting necessary data. This means Ethereum validates transactions and can handle fraud cases effectively.
The future of Data Availability includes new methods to lower costs and improve scalability, like Danksharding. This could make it easier to store data efficiently while maintaining security.

How To Create A LangChain Application That Runs Locally & Offline

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 39 implied HN points • 28 Feb 24

🕹 Technology AI Software Development Privacy Data

Running language models locally gives you more control over data privacy and enhances security by keeping sensitive information off external servers.
Using small language models can improve efficiency in tasks like conversation management and language understanding while also cutting down on costs associated with cloud services.
Local deployment makes models available offline, ensuring you can use them anytime without needing an internet connection, which is useful for research and development.

Google Gemini and the Responsible AI Conundrum

Rod’s Blog • 39 implied HN points • 26 Feb 24

🕹 Technology AI Data Ethics Security Government

Google's Gemini AI models are designed for various tasks and are based on responsible AI principles, but they have faced challenges such as data poisoning attacks.
The data poisoning attack on Google's Gemini showed the model's vulnerability and raised questions about the effectiveness of Google's Responsible AI policy.
Experts suggest that Google should have better safeguards for data quality, transparency in model deployment, and more engagement with the AI community to address ethical implications.

LLM links, 2/12

In My Tribe • 151 implied HN points • 12 Feb 24

🕹 Technology AI Data Ethics

AI can expand human capabilities and creativity by serving as a partner in various tasks.
Future AI technology is predicted to have the capability to understand human emotions and subtle communications, potentially intruding on privacy.
LLMs can easily be steered politically through supervised fine-tuning, highlighting the influence of human biases on these models rather than training data.

Five Links for October 2023

Five Links (and three graphs) by Auren Hoffman • 235 implied HN points • 06 Oct 23

🕹 Technology Data AI Start-ups Social media E-commerce

Amazon's ad business is as big as Nike or Volvo.
The rise in food allergies may be related to the decline in hookworms.
Good data is more important than tools for generating insights.

Learning the basics of AI

Sunday Letters • 19 implied HN points • 05 May 24

🕹 Technology AI Software Development Data Innovation

Building with AI is both easy and hard. It's easy to get something working quickly, but creating really good experiences takes more effort.
We're still figuring out the basics of AI, just like we did with early computer graphics. There's a lack of clear best practices and common tools right now.
To improve AI development, we should focus on finding problems to solve and be open to changing our solutions as we learn more about what works and what doesn't.

Do we need RL for RLHF?

Democratizing Automation • 182 implied HN points • 06 Dec 23

🕹 Technology AI Research Algorithms Data Models

The debate over whether integrating human preferences into large language models requires RL, or whether direct methods like DPO are enough, is ongoing.
There is a need for high-quality datasets and tools to definitively answer questions about the alignment of language models with RLHF.
DPO can be a strong optimizer, but the key challenge lies in limitations with data, tooling, and evaluation rather than the choice of optimizer.

The Age of Incumbents

Investing 101 • 133 implied HN points • 02 Mar 24

🕹 Technology Data AI Market Competition Software

Technology as an asset class is relatively new in the stock market, with tech companies now dominating market capitalization.
The age of dynamic dinosaurs is here, with established tech companies evolving and becoming more challenging to displace.
Big markets attract big attention, but distribution is key for success in tech, as seen with companies like Microsoft leveraging built-in distribution for products like Teams.

Data and Innovation in D.C.

Technically Optimistic • 79 implied HN points • 20 Oct 23

🇺🇸 U.S. Politics Legislation Privacy Technology Data AI

Data privacy is crucial in the development of AI legislation to protect user information and provide transparency and control.
Users often do not understand the extent of data collection by companies and the tradeoffs involved in sharing personal information for personalized experiences.
There is a need to enhance digital literacy, promote user agency over their data, and find alternatives to the current consent practices in applications to address evolving challenges around data privacy.

PSA: Migrate from the Threat Intelligence Platform Connector to the Threat Intelligence Solution in Microsoft Sentinel

Rod’s Blog • 79 implied HN points • 21 Jun 23

🕹 Technology Security Data Software Updates Networking

The Threat Intelligence Platform Connector in Microsoft Sentinel is being deprecated, so users should consider migrating to the new Threat Intelligence Solution soon.
There is no definitive date for the deprecation, but users are advised to start using the new version within the next 6 months.
The new version of the Threat Intelligence Solution offers more artifacts like Rules and Hunting Queries, providing additional capabilities.

Must Learn AI Security Part 4: Trojan Attacks Against AI

Rod’s Blog • 79 implied HN points • 21 Aug 23

🕹 Technology Security AI Malware Data

Trojan attacks against AI involve disguising malware as legitimate software to gain unauthorized access, steal data, or manipulate algorithms, leading to dangerous outcomes.
Common steps in a Trojan attack against AI include reconnaissance, delivery of the Trojan, installation, establishing command and control, exploitation, and covering up tracks to avoid detection.
Mitigation of Trojan attacks against AI involves measures like using antivirus software, regular software updates, strong access controls, employee education on social engineering, and implementing monitoring strategies like real-time monitoring, intrusion detection, and machine learning for anomaly detection.

Market Map & Analysis: AI Synthetic Data Companies

The Strategy Deck • 78 implied HN points • 06 Jul 23

🕹 Technology AI Data ML Synthetic Data Computer Vision

Synthetic data is crucial for ML: it can stand in for real-world data, protect sensitive information, and help validate AI applications.
Synthetic data is used in computer vision for autonomous vehicles and is expanding to other data types like text and tabular data.
There are specialized and general-purpose synthetic data platforms developing innovative solutions for various industries and use cases.

Five Links for December 2023

Five Links (and three graphs) by Auren Hoffman • 170 implied HN points • 30 Nov 23

🕹 Technology Links Podcast Data Social media

The post shares five interesting links to read, watch, and listen to.
There is a new launch called Placekey 2.0 focusing on entity resolution for places and addresses.
The text includes a bonus content section with additional information and links for further reading.

Images are Biased

Never Met a Science • 77 implied HN points • 26 Feb 24

🕹 Technology Media Communication Data Images Information

Compared to text, images are a biased form of communication because they convey more context and extra-textual information.
Different communication modalities like images and text convey different amounts and types of information, impacting how we understand and interpret data and knowledge.
Understanding the rise of visual communication technologies can lead to a deeper comprehension of the effects of information technology on society and help in decision-making for the future.

Make Everything Versionable

Sarah's Newsletter • 239 implied HN points • 24 May 22

🕹 Technology SaaS Tools Data Development

Teams are facing challenges with SaaS tools and maintaining them as complexity grows.
Making everything versionable can help in QA, testing, and peer reviewing changes, leading to fewer errors in production.
There is a need for more accessible ways to version configurations across different teams and tools, especially for non-technical users.

Unpaywalled: The next 10B+ security companies

Frankly Speaking • 305 implied HN points • 06 Apr 23

🕹 Technology Cybersecurity Data Platform Identity Network

Investors seek 10B+ security companies for meaningful returns on their funds.
Building a successful security business requires addressing broad problems and having a platform play.
Telemetry in areas like network, code, identity, and data is crucial for cybersecurity platform potential.

☞ The Garden of Computational Delights

Cabinet of Wonders • 231 implied HN points • 02 Aug 23

🕹 Technology Computing Innovation Data Digital Preservation Emerging Tech

Computing goes beyond utilitarian purposes to bring delight and wonder through creative coding and simulations.
The 'Garden of Computational Delights' is a collection of places that evoke fascination with web, programming, and computing.
The boundaries of what fits in the 'Garden' are fuzzy, personal, and idiosyncratic, showcasing a diverse range of computer-related interests.

pip install sqlmesh-cube

davidj.substack • 23 implied HN points • 19 Dec 24

🕹 Technology Software Programming Data Development Systems

A new package called 'sqlmesh-cube' is available for anyone to use. You can easily install it with pip.
This package helps create a CLI command that outputs JSON, showing how sqlmesh models relate to each other. It's important for building a semantic layer.
This was the author's first package, and they learned a lot about the publishing process along the way. They are open to feedback and requests for updates.

No-Code Deployment & Orchestration Of Open-Sourced Foundation Models

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 19 implied HN points • 16 Apr 24

🕹 Technology AI Automation Software Development Data

Open-sourced language models are easier for everyone to access and can be customized to fit specific needs. This means more people, like researchers or developers, can use them to create unique solutions.
Choosing the right model for each task can improve performance, so it's important to understand what each model does best. Using multiple models together can lead to better results overall.
No-code tools like GALE make it simple to deploy and manage these models without needing deep technical skills. This helps businesses and individuals quickly set up and adapt AI applications.

The Three Tenets for AI Security and How to Audit Activity Logs

Rod’s Blog • 59 implied HN points • 10 Nov 23

🕹 Technology AI Security Data Access

AI security involves three main tenets: secure code, secure data, and secure access. It is crucial for security professionals to ensure AI systems are designed, developed, and maintained following these principles.
To achieve secure code, monitor and update AI systems regularly, validate and verify their performance, and adhere to secure development practices and tools.
When auditing activity logs, focus on detecting cyberthreats, troubleshooting and resolving issues, and optimizing performance. It involves collecting, analyzing, visualizing, and reporting on the activities within the AI system.

DALL·E 2 Decoded

Gradient Flow • 219 implied HN points • 21 Jul 22

🕹 Technology AI Data Podcasts Events Books

A guide to data annotation and synthetic data generation helps navigate the variety of tools available in the machine learning and artificial intelligence landscape.
The Data Exchange podcast features conversations on DALL·E, scalable machine learning, and orchestration tools for data scientists.
Book recommendations offer a diverse selection including finance, the Metaverse, rogues, and visionary figures like John von Neumann.

Extremely Open Science

Rabbit Thoughts • 39 implied HN points • 17 Jan 24

🔬 Science Open Science Research Experiment Data Tool

The author will work on a scientific project completely in the open in 2024, streaming and recording sessions for an hour per week.
The project aims to show the process from scratch to help junior researchers understand and learn from the experience of dealing with minor issues.
The author is choosing a question for the project that can be followed along at home with just a personal laptop or desktop computer.

AGI is the Lie of the Year

Dana Blankenhorn: Facing the Future • 59 implied HN points • 27 Nov 23

🕹 Technology AI Machine Learning Data Cloud Computing Big Tech

Artificial General Intelligence (AGI) does not exist.
Generative AI is not the same as general intelligence.
AI programs are software designed for specific tasks and inputs by humans.

FaaF: Facts As A Function For Evaluating RAG

Cobus Greyling on LLMs, NLU, NLP, chatbots & voicebots • 19 implied HN points • 04 Apr 24

🕹 Technology AI NLP Data Software Programming

RAG systems often struggle to verify facts in generated text. This is because they don't focus enough on assessing the truthfulness of low-quality outputs.
Verifying facts one by one takes a lot of time and resources. It's challenging to check multiple facts in a single generated response efficiently.
The FaaF framework improves fact verification greatly. It simplifies the process, makes it more accurate, and cuts down the time needed for checking facts.
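
As a rough illustration of the idea in the takeaways above, the sketch below packs every fact into one function-style schema so that a single call returns a verdict for each, instead of checking facts one by one. It is not the FaaF authors' code: the names `build_fact_schema`, `verify_facts`, and `keyword_stub` are hypothetical, and the stub stands in for a real language model call, which this page does not describe.

```python
import json
from typing import Callable, Dict, List


def build_fact_schema(facts: List[str]) -> Dict:
    """Describe every fact as one boolean field of a single function call."""
    properties = {
        f"fact_{i}": {"type": "boolean", "description": fact}
        for i, fact in enumerate(facts)
    }
    return {
        "name": "verify_facts",  # hypothetical function name
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": list(properties),
        },
    }


def verify_facts(
    text: str,
    facts: List[str],
    call_model: Callable[[str, Dict], str],
) -> Dict[str, bool]:
    """Check all facts against the text with one call to `call_model`."""
    schema = build_fact_schema(facts)
    raw = call_model(text, schema)  # expected to return a JSON object string
    verdicts = json.loads(raw)
    # A fact missing from the reply is treated as unverified.
    return {
        fact: bool(verdicts.get(f"fact_{i}", False))
        for i, fact in enumerate(facts)
    }


def keyword_stub(text: str, schema: Dict) -> str:
    """Assumed stand-in for a model: a fact counts as true only if it
    appears verbatim (case-insensitive) in the text."""
    props = schema["parameters"]["properties"]
    return json.dumps(
        {key: spec["description"].lower() in text.lower() for key, spec in props.items()}
    )


if __name__ == "__main__":
    generated = "Paris is the capital of France."
    facts = ["Paris is the capital of France", "Berlin is in Spain"]
    print(verify_facts(generated, facts, keyword_stub))
```

In practice `call_model` would wrap whichever model API supports structured or function-call output; the stub only keeps the example runnable offline.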

Rise of the Artificial Influencer

John Mayo-Smith's Substack • 79 implied HN points • 17 Jan 23

🕹 Technology AI Advertising SEO Influencers Data

Advertising, SEO, and Artificial Influence are all methods to grab attention for products or services.
AIs are starting to exhibit brand preferences, like humans do, affecting the way they provide recommendations and influence choices.
Influencing AIs involves understanding their training data and providing reliable, consistent, and trustworthy information to align with their preferences.

Evolution of LLM Agents

LLMs for Engineers • 79 implied HN points • 21 Jun 23

🕹 Technology AI Software Automation Data Machine Learning

Large Language Models (LLMs) are becoming more powerful and can now perform complex tasks with the help of internet data and tools. This could significantly boost productivity for both individuals and corporations.
The evolution of LLMs has progressed through several levels, starting from simple API calls to advanced agents that understand tasks better and can even interact without much human guidance.
While these advancements are exciting, there are still challenges to overcome, such as reliability, cost, and the potential for errors in the output of LLMs.

Farewell to the Safe Space

Technically Optimistic • 59 implied HN points • 13 Oct 23

🕹 Technology Privacy AI Ethics Surveillance Data

Utilizing AI for memory recall, like with Rewind AI, can be a beneficial tool for enhancing memory capabilities.
There is a constant trade-off between personalization and privacy in the digital space, raising questions about the extent of data individuals are willing to share for customization.
Emerging technologies such as surveillance devices and advanced software like Rewind AI prompt discussions on privacy expectations and the need for clear regulations to safeguard personal data.

GPT-4’s Placeholder Problem: A Case Study of GPT-4 Fake Data Presented as Fact

Data at Depth • 39 implied HN points • 26 Dec 23

🕹 Technology AI Data

GPT-4 can find and present information in various formats based on how you ask it to, whether as a paragraph, a chart, or even a poem.
The issue highlighted is GPT-4 presenting fake placeholder data as fact, raising concerns about the accuracy and authenticity of information generated by AI models.
The post emphasizes the importance of being vigilant and critical when consuming information generated by AI like GPT-4.

AI Isn’t Good Enough

Irregular Ideas with Paul Kedrosky & Eric Norlin of SKV • 172 HN points • 23 Aug 23

🕹 Technology AI Automation Productivity Tools Data

There is a significant shortage of workers in the U.S. across various industries, leading to the need for automation.
Current AI technology has limitations and is not yet capable of addressing the workforce shortage effectively.
To avoid economic disruptions, future automation needs to focus on delivering high productivity gains that outweigh worker displacement.

Data, Despair, and the FLOWBEE

Data People Etc. • 266 implied HN points • 13 Mar 23

💼 Business Data Metrics Organization Intelligence Thinking

Data professionals may feel isolated due to externalized intelligence and lack of integration into daily activities.
Thinkers in organizations may become untethered without proper recognition and integration with doers.
To be effective, thinkers must be tightly integrated into their environment and endorsed by leadership.

I AM AI

Rod’s Blog • 59 implied HN points • 15 Aug 23

🕹 Technology AI Data Ethics Security Training

President Biden made headlines by saying 'I am AI', creating confusion and criticism, despite NVIDIA previously using the phrase for marketing.
The statement 'I am AI' is viewed as clever and may spark important discussions about artificial intelligence's impact on society and responsibility.
Humans are connected to the creation and control of AI, emphasizing that the responsibility lies with us to shape AI's future.

The Jargonator T-800 Newsletter Entry

The Data Score • 59 implied HN points • 02 Oct 23

💼 Business Finance Data Technology Market research Investing

The newsletter offers insights into data-driven decision-making for a range of professionals.
The newsletter includes a section where jargon related to finance, data, and technology is defined in simpler terms.
Top 5 most viewed articles from the Data Score Newsletter offer valuable insights on revenue estimates, alternative data, evaluating data partners, and more.

Must Learn AI Security Part 9: Hyperparameter Attacks Against AI

Rod’s Blog • 59 implied HN points • 07 Sep 23

🕹 Technology AI Security Machine Learning Cybersecurity Data

A hyperparameter attack against AI manipulates crucial adjustable settings of an algorithm to influence the machine learning model's performance and behavior.
Different types of hyperparameter attacks can target aspects like performance, biases, vulnerability to adversarial examples, transferability, and resource consumption.
Mitigating hyperparameter attacks involves securing data access, monitoring hyperparameter changes, testing robustness, updating models, and following responsible AI practices.

Dancing Cyborgs, Morphing Multi-modal Robots, DIY Python Pups, Teslabot..

Robots & Startups • 59 implied HN points • 02 Jul 23

🕹 Technology AI Robots Data Generative AI

AI-generated bad data is becoming more prevalent and is impacting the learning process of artificial intelligence.
Humans are increasingly using AI to create content, leading to a mix of human and AI-generated text.
There is a concern that the over-reliance on AI for labeling data may have negative implications.

Launch Of Modeling Mindsets Book 🐙

Mindful Modeler • 139 implied HN points • 08 Nov 22

🚌 Education Modeling Data Book Learning Statistics

Having multiple modeling mindsets can help overcome challenges in modeling projects.
Different modeling approaches have different strengths and limitations.
It's valuable to understand a variety of modeling mindsets to enhance problem-solving abilities.