The hottest Cloud Computing Substack posts right now

And their main takeaways

Nebius Part 1: the Smooth (Developer) Operator

Interconnected • 277 implied HN points • 17 Feb 25

🕹 Technology AI Cloud Computing Developer Tools Investment Tech Companies

Nebius is focused on creating a smooth experience for developers. They make it easy for developers to start using their platform without unnecessary steps, which is important for building cool AI projects.
The company has a strong background thanks to its roots in Yandex, which gives it experience in running cloud services effectively. This experience helps Nebius offer a wide range of cloud solutions, not just GPU rentals.
While some may worry about Nebius's Russian connections, the company has distanced itself from that past. With significant funding and a solid road ahead, it seems ready to grow and succeed free from those burdens.

PyIceberg: Current State and Roadmap

Ju Data Engineering Newsletter • 396 implied HN points • 28 Oct 24

🕹 Technology Data Engineering Software Development Big Data Open Source Cloud Computing

Improving the user interface is crucial for more teams to use Iceberg, especially those that use Python for their data work.
PyIceberg, which is a Python implementation, is evolving quickly and currently supports various catalog and file system types.
While PyIceberg makes it easy to read and write data, it still has limitations compared to using Iceberg with Spark, such as handling deletes and managing metadata.

Was Zuck Right about Chinese AI Models?

Interconnected • 4751 implied HN points • 13 Jan 25

🕹 Technology AI Cloud Computing Data Privacy Censorship Global Competition

Chinese AI models can answer sensitive questions when run locally, but they often censor answers in cloud settings. This shows a difference in behavior based on where the models are hosted.
Censorship in AI models is more about the cloud platforms than the models themselves. This poses challenges for Chinese cloud providers wanting to compete internationally.
Even though some see Chinese AI as censored, it can still be powerful and competitive. Users may prefer to download and run these models locally to avoid censorship and make the most of their capabilities.

Iceberg + Single Node Engines

Ju Data Engineering Newsletter • 515 implied HN points • 17 Oct 24

🕹 Technology Data Engineering Cloud Computing Big Data Software Development Data Management

The use of Iceberg allows for separate storage and compute, making it easier to connect single-node engines to the data pipeline without needing extra steps.
There are different approaches to integrating single-node engines, including running all processes in one worker or handling each transformation with separate workers.
Partitioning data can improve efficiency by allowing independent processing of smaller chunks, which reduces the limitations of memory and speeds up data handling.
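
The partitioning point above can be illustrated with a minimal sketch. It is not taken from the post: the `day` and `amount` columns, the helper names, and the sum used as the transformation are all hypothetical stand-ins. It only shows how splitting rows by a partition key yields chunks that can be processed independently, each small enough for one single-node worker.

```python
from collections import defaultdict


def partition_by(rows, key):
    """Group rows into independent chunks keyed by a partition column."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


def process_partition(rows):
    """Stand-in transformation: total the hypothetical 'amount' column of one chunk."""
    return sum(row["amount"] for row in rows)


rows = [
    {"day": "2024-10-01", "amount": 10},
    {"day": "2024-10-01", "amount": 5},
    {"day": "2024-10-02", "amount": 7},
]

# Each chunk is handled on its own, so it could be sent to a separate worker.
totals = {
    day: process_partition(chunk)
    for day, chunk in partition_by(rows, "day").items()
}
```

Because no chunk depends on another, peak memory is bounded by the largest partition rather than by the whole dataset.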

Analyze research papers with Gemini 2.0

Gonzo ML • 126 implied HN points • 23 Feb 25

🕹 Technology Artificial Intelligence Machine Learning Natural Language Processing Data science Cloud Computing

Gemini 2.0 models can analyze research papers quickly and accurately, supporting large amounts of text. This means they can handle complex documents like academic papers effectively.
The DeepSeek-R1 model shows that strong reasoning abilities can be developed in AI without the need for extensive human guidance. This could change how future models are trained and developed.
Distilling knowledge from larger models into smaller ones allows for efficient and accessible AI that can perform well on various tasks, which is useful for many applications.

Clouded Judgement 6.13.25 - The Battle for Data Ownership

Clouded Judgement • 7 implied HN points • 13 Jun 25

🕹 Technology Data Management Software Development Artificial Intelligence Cloud Computing SaaS

You might think you own your data, but companies can make it hard to use. For example, Slack has new rules that limit how you can access your own conversation data.
If other apps like Salesforce or Workday follow Slack's lead, it could become really tough for companies to use their data in AI projects. This means you might not have as much control as you thought.
The fight for data ownership is a big deal right now. As software shifts towards AI, who controls the data will be a key factor in how companies operate.

DataFrame

davidj.substack • 35 implied HN points • 20 Feb 25

🕹 Technology Data science Machine Learning Programming Cloud Computing Open Source

Polars Cloud allows for scaling across multiple machines, making it easier to handle large datasets than using just a single machine. This helps in processing data faster and more efficiently.
Polars is simpler to use compared to Pandas and often performs better, especially when transforming data for machine learning tasks. It supports familiar methods that many users already know.
Unlike SQL, which runs well on cloud services, using Pandas and R for large-scale transformations has been challenging. The new Polars Cloud aims to bridge this gap, providing more scalable solutions.

AWS CEO Matt Garman Talks Amazon’s Big Bets in AI Chips, Reasoning, and Nuclear Energy

Big Technology • 5129 implied HN points • 03 Dec 24

🕹 Technology AI Development Cloud Computing Nuclear Energy Partnerships

Amazon is focusing heavily on AI and has introduced new AI chips, reasoning tools, and a large AI training cluster to enhance their cloud services. They want customers to have more options and better performance for their AI needs.
AWS believes in providing choices to customers instead of pushing one single solution. They aim to support various AI models for different use cases, which gives developers flexibility in how they build their applications.
For energy solutions, Amazon is investing in nuclear energy. They see it as a clean and important part of the future energy mix, especially as demand for energy continues to grow.

The Overview Of Apache Spark

VuTrinh. • 879 implied HN points • 07 Sep 24

🕹 Technology Data processing Software Engineering Distributed Systems Open Source Cloud Computing

Apache Spark is a powerful tool for processing large amounts of data quickly. It does this by using many computers to work on the data at the same time.
A Spark application has different parts, like a driver that directs processing and executors that do the work. This helps organize tasks and manage workloads efficiently.
The main data unit in Spark is called RDD, which stands for Resilient Distributed Dataset. RDDs are important because they make data processing flexible and help recover data if something goes wrong.

I spent 6 hours learning how Apache Spark plans the execution for us

VuTrinh. • 659 implied HN points • 10 Sep 24

🕹 Technology Data science Software Engineering Big Data Cloud Computing Machine Learning

Apache Spark uses a system called Catalyst to plan and optimize how data is processed. This system helps make sure that queries run as efficiently as possible.
In Spark 3, a feature called Adaptive Query Execution (AQE) was added. It allows Spark to change its plan while a query is running, based on statistics collected at runtime.
Airbnb uses this AQE feature to improve how they handle large amounts of data. This lets them dynamically adjust the way data is processed, which leads to better performance.

The Weekly Kaitchup #64

The Kaitchup – AI on a Budget • 59 implied HN points • 25 Oct 24

🕹 Technology AI Machine Learning Software Data science Cloud Computing

Qwen2.5 models have been improved and now come in a 4-bit version, making them efficient for different hardware. They perform better than previous models on many tasks.
Google's SynthID tool can add invisible watermarks to AI-generated text, helping to identify it without changing the text's quality. This could become a standard practice to distinguish AI text from human writing.
Cohere has launched Aya Expanse, new multilingual models that outperform many existing models. They took two years to develop and involved thousands of researchers, improving language support and performance.

I spent 5 hours learning how Google manages terabytes of metadata for BigQuery.

VuTrinh. • 399 implied HN points • 17 Sep 24

🕹 Technology Data Engineering Cloud Computing Database Systems Big Data

Metadata is really important because it helps organize and access data efficiently. It tells systems where files are and which ones can be ignored during processing.
Google's BigQuery uses a unique system to manage metadata that allows for quick access and analysis of huge datasets. Instead of putting metadata with the data, it keeps them separate but organized in a smart way.
The way BigQuery handles metadata improves performance by making sure that only the relevant information is accessed when running queries. This helps save time and resources, especially with very large data sets.

Kubernetes for Data Engineers

VuTrinh. • 859 implied HN points • 03 Sep 24

🕹 Technology Data Engineering Cloud Computing DevOps Software Development Infrastructure

Kubernetes is a powerful tool for managing containers, which are bundles of apps and their dependencies. It helps you run and scale many containers across different servers smoothly.
Understanding how Kubernetes works is key. It compares the actual state of your application with the desired state to make adjustments, ensuring everything runs as expected.
To start with Kubernetes, begin small and simple. Use local tools for practice, and learn step-by-step to avoid feeling overwhelmed by its many components.
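
The desired-state idea above can be sketched as a toy reconciliation loop. This is not Kubernetes code and uses no Kubernetes API: the workload names, the replica counts, and the `reconcile` and `apply_actions` helpers are hypothetical, chosen only to show how comparing actual state with desired state produces corrective actions.

```python
def reconcile(desired, actual):
    """Return the actions needed to move actual replica counts to the desired ones."""
    actions = []
    for name, want in desired.items():
        have = actual.get(name, 0)
        if have < want:
            actions.append(("start", name, want - have))
        elif have > want:
            actions.append(("stop", name, have - want))
    # Anything running that is no longer declared gets stopped.
    for name, have in actual.items():
        if name not in desired and have > 0:
            actions.append(("stop", name, have))
    return actions


def apply_actions(actions, actual):
    """Apply the actions to a copy of the actual state and drop empty entries."""
    new_state = dict(actual)
    for verb, name, count in actions:
        delta = count if verb == "start" else -count
        new_state[name] = new_state.get(name, 0) + delta
    return {name: count for name, count in new_state.items() if count > 0}


desired = {"web": 3, "worker": 2}
actual = {"web": 1, "worker": 4, "legacy": 1}

actions = reconcile(desired, actual)
actual = apply_actions(actions, actual)
```

Running `reconcile` again once the states match returns an empty list, which is why such a loop can run continuously without causing churn.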

What hath AWS wrought?

Cloud Irregular • 2661 implied HN points • 10 Dec 24

🕹 Technology Cloud Computing Software Development DevOps Infrastructure

At this year's AWS re:Invent, there were no major new services launched, which is quite different from previous years. Instead, AWS focused on enhancing existing services and features.
In the past, AWS released many new services, but many of them didn't succeed. This led to dissatisfaction within the developer community.
Now, AWS seems to be concentrating on improving its core offerings. This change could help revive interest and excitement in the AWS developer community.

Scaling Distributed Systems with the Scatter-Gather Pattern

Engineering At Scale • 60 implied HN points • 15 Feb 25

🕹 Technology Distributed Systems Microservices Cloud Computing Software Design System Architecture

The Scatter-Gather pattern helps speed up data retrieval by splitting requests to multiple servers at once, rather than one after the other. This makes systems respond faster, especially when lots of data is needed.
Using this pattern can improve system efficiency by preventing wasted time waiting for responses from each service. This means the system can handle more requests at once.
However, implementing Scatter-Gather can be tricky. It requires careful handling of errors and managing different data sources to ensure the information is accurate and reliable.
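
A minimal sketch of the pattern described above, using Python's `asyncio`. It is an illustration under assumptions, not the post's implementation: `query_shard` is a hypothetical stand-in for a real network call, and the shard names, delays, and timeout are made up. Requests are sent concurrently, and failures are collected rather than allowed to abort the whole gather step.

```python
import asyncio


async def query_shard(name, delay, fail=False):
    """Hypothetical backend call: sleeps to simulate latency, may fail."""
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{name} unavailable")
    return {"shard": name, "items": [f"{name}-item"]}


async def scatter_gather(shards, timeout=1.0):
    """Send all requests at once, then gather successes and count failures."""
    tasks = [asyncio.wait_for(query_shard(*shard), timeout) for shard in shards]
    # return_exceptions=True keeps one failing shard from cancelling the rest.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    ok = [r for r in results if not isinstance(r, Exception)]
    failed = len(results) - len(ok)
    return ok, failed


shards = [("a", 0.01), ("b", 0.02), ("c", 0.01, True)]
ok, failed = asyncio.run(scatter_gather(shards))
```

Total latency is close to that of the slowest shard instead of the sum of all of them. The caller still has to decide whether a partial result with `failed > 0` is acceptable.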

($) DeepSeek Diffusion

Interconnected • 123 implied HN points • 07 Feb 25

🕹 Technology AI Open Source Geopolitics Cloud Computing Cybersecurity

The ongoing discussion about DeepSeek focuses too much on the rivalry between the U.S. and China. The more important question is whether technology is open source or closed source.
Open source technology, like DeepSeek, can spread quickly and widely, getting adopted by various companies across the globe.
Major cloud providers, including U.S. companies, are offering DeepSeek models to their customers, showing its significant impact in the tech world.

I spent 5 hours learning how Google lets us build a Lakehouse.

VuTrinh. • 139 implied HN points • 24 Sep 24

🕹 Technology Cloud Computing Data Engineering Software Development Information Storage

Google's BigLake allows users to access and manage data across different storage solutions like BigQuery and object storage. This makes it easier to work with big data without needing to move it around.
The Storage API enhances BigQuery by letting external tools like Apache Spark and Trino directly access its stored data, speeding up the data processing and analysis.
BigLake tables offer strong security features and better performance for querying open-source data formats, making it a more robust option for businesses that need efficient data management.

Resilient Cyber Newsletter #15

Resilient Cyber • 119 implied HN points • 24 Sep 24

🕹 Technology Cybersecurity AI Security Software Development Cloud Computing Digital Transformation

Some software vendors are creating security problems by delivering buggy products. Customers should demand better security from their suppliers during purchase.
As companies rush to adopt AI, many are overlooking crucial security measures, which poses a big risk for future incidents.
Supporting open source software maintainers is vital because many of them are unpaid. Companies should invest in the projects they rely on to ensure their continued health and security.

Uber’s Big Data Revolution: From MySQL to Hadoop and Beyond

VuTrinh. • 279 implied HN points • 14 Sep 24

🕹 Technology Data Engineering Big Data Cloud Computing Data Management Data Analytics

Uber evolved from simple data management with MySQL to a more complex system using Hadoop to handle huge amounts of data efficiently.
They faced challenges with data reliability and latency, which slowed down their ability to make quick decisions.
Uber introduced a system called Hudi that allowed for faster updates and better data management, helping them keep their data fresh and accurate.

The massive DeepSeek affect

TP’s Substack • 37 implied HN points • 15 Feb 25

🕹 Technology AI Models Open Source Consumer Electronics Software Development Cloud Computing

DeepSeek has gained huge popularity in China, surpassing major competitors and reaching 30 million daily active users. This shows that users really like its features.
Chinese companies are rapidly integrating DeepSeek into their products, from smartphones to cars, suggesting that more devices will soon be using this powerful AI tool.
The rise of DeepSeek is changing how people in China use AI and might even provide better search options compared to existing services like Baidu. It's a big deal for the tech industry there.

How Adaptive AI Microcontainers Outmaneuver Modern Cybersecurity Threats in AI Workloads

Phoenix Substack • 14 implied HN points • 20 Feb 25

🕹 Technology AI Security Cybersecurity Microservices Software Development Cloud Computing

AI workloads are important for businesses but are also very attractive targets for cyber threats. This means we need better ways to protect them.
Traditional security methods struggle because they can be predictable and static, making it easier for hackers to get in and steal data or disrupt systems.
Adaptive AI Microcontainers offer a modern solution by constantly changing and healing themselves, making it much harder for cybercriminals to succeed.

How do we run Kafka 100% on the object storage?

VuTrinh. • 519 implied HN points • 27 Aug 24

🕹 Technology Software Cloud Computing Data Engineering

AutoMQ enables Kafka to run entirely on object storage, which improves efficiency and scalability. This design removes the need for tightly coupled compute and storage, allowing more flexible resource management.
AutoMQ uses a unique caching system to handle data, which helps maintain fast performance for both recent and historical data. It has separate caches for immediate and long-term data needs, enhancing read and write speeds.
Reliability in AutoMQ is ensured through a Write Ahead Log system using AWS EBS, which helps recover data after crashes. This setup allows for fast failover and data persistence, so no messages get lost.
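
The write-ahead log idea can be sketched in a few lines. This is a generic illustration, not AutoMQ's implementation: `TinyStore`, the JSON line format, and the use of a local temporary file in place of an EBS volume are all assumptions. Each message is made durable in the log before it is acknowledged in memory, and a restart rebuilds state by replaying the log.

```python
import json
import os
import tempfile


class TinyStore:
    """Hypothetical message store that persists every append to a log first."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.messages = []
        self._replay()

    def _replay(self):
        """Rebuild in-memory state from the log after a restart."""
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path) as wal:
            for line in wal:
                self.messages.append(json.loads(line))

    def append(self, message):
        # Write and fsync the log entry before touching in-memory state.
        with open(self.wal_path, "a") as wal:
            wal.write(json.dumps(message) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        self.messages.append(message)


wal_path = os.path.join(tempfile.mkdtemp(), "wal.log")
store = TinyStore(wal_path)
store.append({"offset": 0, "value": "hello"})
store.append({"offset": 1, "value": "world"})

# Simulate a crash and restart: a fresh instance recovers from the log alone.
recovered = TinyStore(wal_path)
```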

I spent 4 hours learning Apache Iceberg. Here's what I found.

VuTrinh. • 799 implied HN points • 10 Aug 24

🕹 Technology Data Engineering Software Development Database Management Big Data Cloud Computing

Apache Iceberg is a table format that helps manage data in a data lake. It makes it easier to organize files and allows users to interact with data without worrying about how it's stored.
Iceberg has a three-layer architecture: data, metadata, and catalog, which work together to track and manage the actual data and its details. This structure allows for efficient querying and data operations.
One cool feature of Iceberg is its ability to time travel, meaning you can access previous versions of your data. This lets you see changes and retrieve earlier data as needed.
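
Time travel can be pictured with a toy model in which every commit produces a new immutable snapshot. This is a sketch of the concept only: `VersionedTable` and its methods are hypothetical and do not reflect Iceberg's actual metadata layout or any Iceberg API.

```python
class VersionedTable:
    """Toy table that keeps every snapshot as an immutable tuple of file names."""

    def __init__(self):
        self.snapshots = []

    def commit(self, added_files):
        """Create a new snapshot from the previous one plus the added files."""
        previous = self.snapshots[-1] if self.snapshots else ()
        self.snapshots.append(previous + tuple(added_files))
        return len(self.snapshots) - 1  # snapshot id

    def scan(self, snapshot_id=None):
        """List the files visible at a snapshot; defaults to the latest one."""
        if snapshot_id is None:
            snapshot_id = len(self.snapshots) - 1
        return list(self.snapshots[snapshot_id])


table = VersionedTable()
first = table.commit(["a.parquet"])
second = table.commit(["b.parquet"])

old_view = table.scan(first)  # the table as of the first commit
current_view = table.scan()   # the latest state
```

Old snapshots are never modified, so reading an earlier version is only a matter of choosing which snapshot to scan.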

I spent 7 hours diving deep into Apache Iceberg

VuTrinh. • 339 implied HN points • 31 Aug 24

🕹 Technology Data Engineering Software Development Cloud Computing Big Data Database Management

Apache Iceberg organizes data into a data layer and a metadata layer, making it easier to manage large datasets. The data layer holds the actual records, while the metadata layer keeps track of those records and their changes.
Iceberg's manifest files help improve read performance by storing statistics for multiple data files in one place. This means the reader can access all needed statistics without opening each individual data file.
Hidden partitioning in Iceberg allows users to filter data without needing extra columns, saving space. It records transformations on columns instead, helping streamline queries and manage data efficiently.
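
The manifest statistics point can be sketched as min/max pruning. The `FileStats` record, the `ts` column, and the file paths are hypothetical, and real manifests store much more than this. The sketch only shows how per-file statistics kept in one place let a reader skip files without opening them.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical manifest entry: one data file plus min/max of a 'ts' column."""
    path: str
    min_ts: int
    max_ts: int


manifest = [
    FileStats("data/f1.parquet", 0, 99),
    FileStats("data/f2.parquet", 100, 199),
    FileStats("data/f3.parquet", 200, 299),
]


def files_to_scan(manifest, lo, hi):
    """Keep only files whose [min_ts, max_ts] range overlaps the query range [lo, hi]."""
    return [f.path for f in manifest if f.max_ts >= lo and f.min_ts <= hi]


selected = files_to_scan(manifest, 150, 210)
```

Here the query range 150 to 210 overlaps only the second and third files, so the first file is skipped using the statistics alone.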

Postgres in a box

benn.substack • 920 implied HN points • 06 Dec 24

🕹 Technology Databases Cloud Computing Software AI Data Management

Software has changed from being sold in boxes in stores to being bought as subscriptions online. This makes it easier and cheaper for businesses to manage.
The new trend is separating storage from computing in databases. This lets companies save money by only paying for the data they actually use and the calculations they perform.
There's a push towards making data from different sources easily accessible, so you can use various tools without being trapped in one system. This could streamline how businesses work with their data.

16 Cybersecurity Startups Selected for Google Growth Academy

The Security Industry • 11 implied HN points • 16 Feb 25

🕹 Technology Cybersecurity Artificial Intelligence Startups Data Management Cloud Computing

IT-Harvest is part of Google's Growth Academy for 2025, focusing on supporting cybersecurity startups. This helps them connect with experts and gain valuable resources.
The platform has evolved to meet the needs of security teams, showing strong interest in their data tools and features. Users can now map their security tools to important frameworks like NIST CSF.
They are using AI to streamline data collection and analysis, which makes understanding cybersecurity products faster and easier. This change has made their tools more appealing to companies and consultants alike.

Data's final format

Data People Etc. • 391 implied HN points • 09 Dec 24

🕹 Technology Data Management Cloud Computing Software Development Tech Innovation

Apache Iceberg™ is a popular way to manage data, offering features like scalability and openness. However, using it can feel complicated and less exciting than expected.
CSV format is an easy and humble way to manage data, requiring no special knowledge or complex setups. It’s simple and widely understood, making it a go-to choice for many.
The transformation of data management through projects like Iceberg™ resembles building a transcontinental railroad. It's a huge effort aimed at improving the way we process and use information in the modern world.

Issue #103

Infra Weekly Newsletter • 9 implied HN points • 20 Feb 25

🕹 Technology Infrastructure Programming Cloud Computing Software Development DevOps

The HashiTalks 2025 event is happening now, and you can check it out for the latest in technology.
You no longer need a DynamoDB table for remote state locking in Terraform when using S3, which simplifies the process.
The Infra Weekly Newsletter covers infrastructure and programming topics, providing useful updates and tutorials each week.

Diving Deep into LinkedIn's Data Infrastructure: My 6-Hour Learning & Key Takeaways

VuTrinh. • 299 implied HN points • 03 Aug 24

🕹 Technology Data Engineering Software Architecture Databases Distributed Systems Cloud Computing

LinkedIn's data infrastructure is organized into three main tiers: data, service, and display. This setup helps the system to scale easily without moving data around.
Voldemort is LinkedIn's key-value store that efficiently handles high-traffic queries and allows easy scaling by adding new nodes without downtime.
Databus is a change data capture system that keeps LinkedIn's databases synchronized across applications, allowing for quick updates and consistent data flow.

Practical Data Engineering using AWS Cloud Technologies

VuTrinh. • 339 implied HN points • 23 Jul 24

🕹 Technology Cloud Computing Data Engineering Software Development Information Systems

AWS offers a variety of tools for data engineering like S3, Lambda, and Step Functions, which can help anyone build scalable projects. These tools are often underused compared to newer options but are still very effective.
Services like SNS and SQS can help manage data flow and processing. SNS allows for publishing messages while SQS aids in handling high event volumes asynchronously.
Using AWS for data engineering is often simpler than switching to modern tools. It's easier to add new AWS services to your existing workflow than to migrate to something completely new.

Google Struggles To Support Anthropic, a Key AI Partner With a New Amazon Deal

Big Technology • 9007 implied HN points • 29 Sep 23

🕹 Technology AI Cloud Computing Startups Partnerships

Google is having challenges supporting Anthropic due to a new deal with Amazon.
Google Cloud Platform engineers are working weekends to address issues impacting Anthropic.
The situation highlights the high-stakes nature of partnerships in the AI industry.

Wait, is cloud bad?

Cloud Irregular • 7244 implied HN points • 24 Oct 23

🕹 Technology Cloud Computing Data Centers Tech industry

DHH believes established companies that can amortize capital investments should reconsider the cloud.
Different types of companies require different approaches to the cloud vs. data center decision.
Switching back from the cloud to a data center may bring back old problems that cloud solutions had addressed.

What does DigitalOcean do?

Technically • 14 implied HN points • 18 Feb 25

🕹 Technology Cloud Computing Web Development Software Engineering User Experience Infrastructure

DigitalOcean is a service that rents out servers to developers for building web applications. It helps developers run their apps without needing their own hardware.
Unlike bigger companies like AWS or Google Cloud, DigitalOcean is independent and not owned by a massive tech giant. This makes their approach more focused on users.
They focus on simplicity and user experience, making it easier for developers to use their services compared to other cloud providers.

Hype cycles

Bite code! • 10520 implied HN points • 24 Jun 23

🕹 Technology Programming Web Development Software Architecture Data Management Cloud Computing

XML was once believed to be the future, but turned out to create technical debt instead.
Blindly following every hype in technology can lead to failed projects and wasted money.
Using the right tool for the right job is crucial in software development, avoiding unnecessary complexity and costs.

Leaving Google Cloud

Cloud Irregular • 4878 implied HN points • 03 Jan 24

🕹 Technology Tech news Cloud Computing Career development Tech Events Consulting

Leaving a familiar job for the unknown can be both challenging and exhilarating.
At times, there may not be a clear, traditional career path to follow, and you might need to create your own unique journey.
Prioritizing creating things that bring joy to people can drive your career decisions and future goals.

How ADR Addresses Gaps in the Detection & Response Landscape

Resilient Cyber • 99 implied HN points • 20 Aug 24

🕹 Technology Cybersecurity Software Application Security Data Protection Cloud Computing

Application Detection & Response (ADR) is becoming important because attackers are increasingly targeting application vulnerabilities. This shift means we need better tools that focus specifically on applications.
Modern software systems are complex, making it hard for traditional security tools to catch real threats. That's why understanding how these systems interact can help identify harmful behavior more effectively.
There’s a big push to find and fix security issues early in the development process. However, this focus on early detection often misses what's actually happening in real-life applications, making runtime security like ADR crucial.

"Here's the sad state of the cloud"

Cloud Irregular • 3696 implied HN points • 22 Jan 24

🕹 Technology Cloud Computing Artificial Intelligence Tech industry Tech Culture Data Privacy

The cloud landscape is shifting from big hyperscalers to more specialized services like standalone databases and DIY cloud-in-a-box.
Using tools like Nightshade to protect art from being exploited by AI may not be the best strategy; focusing on creating original, high-quality art is key.
Google, despite criticism, remains a significant player in the tech industry, seen as a symbol of intellectual prowess and innovation.

IaC wars

Cloud Irregular • 3252 implied HN points • 06 Feb 24

🕹 Technology Cloud Computing Infrastructure as Code Programming Languages Serverless

Different cloud providers have different approaches to Infrastructure-as-Code.
There is a need for tools that can migrate configurations to idiomatic Infrastructure-as-Code templates.
New configuration languages like Pkl are emerging to address frustrations with existing options.

Qualcomm’s Cloud AI 100 PCIe: Now For All

More Than Moore • 93 implied HN points • 06 Jan 25

🕹 Technology AI hardware Cloud Computing Machine Learning Embedded Systems Data processing

Qualcomm's Cloud AI 100 PCIe card is now available for the wider embedded market, making it easier to use for edge AI applications. This means businesses can run AI locally without relying heavily on cloud services.
There are different models of the Cloud AI 100, offering various compute powers and memory capacities to suit different business needs. This flexibility helps businesses select the right fit based on how much AI processing they require.
Qualcomm is keen to support partnerships with OEMs to build appliances that use their AI technology, but they are not actively marketing it widely. Interested users are encouraged to reach out directly for collaboration opportunities.

Take on the Kubernetes Resume Challenge

Cloud Irregular • 3104 implied HN points • 14 Feb 24

🕹 Technology Cloud Computing Programming Training Community

The Cloud Resume Challenge community is launching a Kubernetes Challenge throughout March to help individuals build their Kubernetes skills by deploying a basic e-commerce website.
The challenge focuses on learning the operations of a K8s cluster such as configuration, scaling, monitoring, and persistence, offering guidance to prevent going off track.
Participants will work through the challenge together over 4 weeks in the CRC Discord server, with special incentives for those who complete it.