Artificial Fintelligence

Artificial Fintelligence analyzes the cutting edge of AI research, exploring innovations in models, inference techniques, and market trends. It discusses developments in Mixture of Experts models, transformer optimizations, Large Language Models (LLMs), the AI market, and advances in image generation, with a focus on efficiency, scalability, and effectiveness.

Topics: AI Research and Development, Model Optimization and Inference Techniques, Large Language Models (LLMs), AI Market Trends, Image Generation Technologies

The hottest Substack posts of Artificial Fintelligence

And their main takeaways
8 implied HN points 01 Mar 24
  1. Batching is a key optimization for modern deep learning systems, letting a model process many inputs at once with little extra time (see the sketch below).
  2. Modern GPUs run the batched operations concurrently, so latency stays roughly flat as batch size grows, up to the point where the GPU's compute is saturated.
  3. For convolutional networks the advantage of batching is smaller, because each weight is already reused across many spatial positions within a single input.
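A minimal sketch of the batching effect, assuming PyTorch and a CUDA device are available (the layer size is a hypothetical stand-in for one transformer projection; on CPU the effect is much weaker):

```python
import time
import torch

# Hypothetical size: a single linear projection, as in one transformer layer.
d_model = 4096
weight = torch.randn(d_model, d_model, device="cuda", dtype=torch.float16)

def time_batch(batch_size, iters=50):
    """Average wall-clock time of one batched matmul at the given batch size."""
    x = torch.randn(batch_size, d_model, device="cuda", dtype=torch.float16)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        _ = x @ weight  # one matmul per iteration
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

for b in (1, 8, 64, 512):
    print(f"batch={b:4d}  {time_batch(b) * 1e6:8.1f} us per matmul")

# Up to some threshold the per-matmul time barely changes with batch size:
# the GPU is bound by loading the weight matrix from memory, not by compute.
```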
13 implied HN points 29 Jan 24
  1. FLOPs in LLMs are mainly spent on the QKV projections, the attention output projection, and the FFN (see the sketch below).
  2. Wider models parallelize better and favor lower latency, while adding depth increases inference time roughly linearly.
  3. Empirical analysis shows roughly linear scaling of inference latency as model dimensions increase.
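As a rough illustration of where those FLOPs go, here is a back-of-the-envelope count per token per layer, assuming the common d_ff = 4·d_model convention (a sketch, not the post's exact accounting):

```python
def flops_per_token_per_layer(d_model, d_ff=None):
    """Approximate forward-pass FLOPs per token for one transformer layer.

    A matmul of a (1, d) vector with a (d, k) matrix costs ~2*d*k FLOPs.
    Attention-score FLOPs (which depend on sequence length) are ignored here.
    """
    d_ff = d_ff or 4 * d_model
    qkv = 3 * 2 * d_model * d_model    # Q, K, V projections
    attn_out = 2 * d_model * d_model   # attention output projection
    ffn = 2 * 2 * d_model * d_ff       # FFN up- and down-projections
    return {"qkv": qkv, "attn_out": attn_out, "ffn": ffn,
            "total": qkv + attn_out + ffn}

print(flops_per_token_per_layer(4096))
# With d_ff = 4*d_model this totals ~24*d_model^2 FLOPs per token per layer,
# dominated by the FFN, then the QKV projections.
```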
13 implied HN points 13 Dec 23
  1. The LLM API market has grown as new competitors like Bard, Claude, and Gemini have entered.
  2. Competition in the LLM market is driving efficiency gains and lower prices for hosting services.
  3. The market for LLM APIs will likely bifurcate into high-end expensive models and low-end cost-effective ones, with open-weight models improving in quality and falling in cost.
16 implied HN points 23 Nov 23
  1. Implement a KV cache for the decoder to speed up transformer inference (see the sketch below).
  2. Consider speculative decoding with a smaller draft model to speed up decoding when excess compute capacity is available.
  3. Quantization is a powerful tool for reducing model size without significant performance trade-offs, especially at 4 or more bits of precision.
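A minimal sketch of the KV-cache idea in plain PyTorch (toy single-head attention; the names and shapes are illustrative, not taken from the post):

```python
import torch

d_model = 64
Wq = torch.randn(d_model, d_model)
Wk = torch.randn(d_model, d_model)
Wv = torch.randn(d_model, d_model)

k_cache, v_cache = [], []  # grows by one entry per decoded token

def decode_step(x_new):
    """One decode step: attend the newest token over all cached keys/values.

    Without the cache we would recompute K and V for every previous token
    on every step; with it, each step only projects the single new token.
    """
    q = x_new @ Wq                        # (1, d_model)
    k_cache.append(x_new @ Wk)
    v_cache.append(x_new @ Wv)
    K = torch.cat(k_cache)                # (t, d_model)
    V = torch.cat(v_cache)                # (t, d_model)
    scores = (q @ K.T) / d_model ** 0.5   # (1, t)
    return torch.softmax(scores, dim=-1) @ V

for _ in range(5):                        # decode five toy tokens
    out = decode_step(torch.randn(1, d_model))
```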
3 implied HN points 12 Dec 23
  1. The post discusses the evolution of the LLM API market
4 implied HN points 07 Mar 23
  1. Models need to generate their own data for self-improvement, as seen in examples like AlphaZero.
  2. Models should adapt to new domains without requiring vast amounts of existing data, as the CLIP model does.
  3. Improving the efficiency of models, such as autoregressive sampling, is crucial for advancing AI.
3 HN points 29 Mar 23
  1. The post traces the evolution of GPT models over the past five years, highlighting key differences between them.
  2. It explores the significant impact of model size, dataset size, and training strategy on language model performance.
  3. The Chinchilla and LLaMA papers reveal insights about optimal model sizes, dataset sizes, and computational techniques for training large language models (see the sketch below).
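As an illustration of the Chinchilla-style rule of thumb (training compute ≈ 6·N·D, with roughly 20 training tokens per parameter at the compute-optimal point), here is a sketch rather than the papers' exact fits:

```python
def chinchilla_optimal(compute_flops, tokens_per_param=20.0):
    """Roughly compute-optimal parameter count N and token count D.

    Uses C ~= 6*N*D and the Chinchilla heuristic D ~= 20*N, so
    N ~= sqrt(C / (6 * tokens_per_param)).
    """
    n_params = (compute_flops / (6 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

n, d = chinchilla_optimal(5.76e23)  # roughly Chinchilla-scale training compute
print(f"params ~ {n / 1e9:.0f}B, tokens ~ {d / 1e12:.2f}T")
# Gives ~70B parameters and ~1.4T tokens, in line with Chinchilla's choice.
```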
1 HN point 11 Apr 23
  1. CLIP aligns text and image embeddings, making it useful for applications like search, image generation, and zero-shot classification (see the sketch below).
  2. DALL-E introduced a large-scale autoregressive transformer for text-to-image generation, offering an alternative to the then-prevalent GAN models.
  3. GLIDE employs a 3.5B-parameter diffusion model to convert text embeddings into images, exploring guidance methods such as CLIP guidance and classifier-free guidance.
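A minimal sketch of CLIP-style zero-shot classification: score an image embedding against one text-prompt embedding per class via cosine similarity. The embeddings below are random placeholders standing in for real CLIP encoder outputs, not the library's actual API:

```python
import torch
import torch.nn.functional as F

def zero_shot_classify(image_emb, text_embs):
    """Return a probability per class from cosine similarities in the shared space."""
    image_emb = F.normalize(image_emb, dim=-1)   # (d,)
    text_embs = F.normalize(text_embs, dim=-1)   # (num_classes, d)
    logits = 100.0 * text_embs @ image_emb       # temperature-scaled similarities
    return logits.softmax(dim=-1)

# Toy stand-ins for real CLIP encoder outputs (hypothetical embeddings).
d = 512
image_emb = torch.randn(d)
text_embs = torch.randn(3, d)  # e.g. prompts "a photo of a {cat, dog, car}"
print(zero_shot_classify(image_emb, text_embs))
```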
1 implied HN point 02 Mar 23
  1. The website www.artfintel.com will be launching soon.
  2. Finbarr Timbers will write 1-2 articles per month about current advances in AI research.
  3. The focus will be on accelerating AI research.