Experiments with NLP and GPT-3

This Substack explores the potential and challenges of working with GPT-3 and NLP technologies through various experiments. It covers creating local language models, building and utilizing the AI stack with embeddings and vector databases, understanding neural networks, generating educational content, and developing efficient encoding methods for text.

Natural Language Processing, Machine Learning, GPT-3, Embeddings, Vector Databases, Neural Networks, Educational Content Generation, Large Language Models, AI in Content Creation

The hottest Substack posts of Experiments with NLP and GPT-3

And their main takeaways
7 implied HN points 23 Jun 23
  1. The LLM App stack is important in the AI world today.
  2. Embeddings from OpenAI and Huggingface play a key role in giving meaning to data.
  3. VectorDBs like Pinecone and Vespa are crucial for managing embeddings in the AI stack.
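The stack described above boils down to two operations: embed your data, then store and search the embeddings. As a minimal sketch of the second half, here is a tiny in-memory vector store with cosine-similarity search. The class name and its `upsert`/`query` methods are illustrative, not the actual API of Pinecone or Vespa, which add persistence, approximate-nearest-neighbour indexes, and metadata filtering on top of this core idea.

```python
import math

class TinyVectorStore:
    """Toy in-memory stand-in for a vector DB (illustrative only)."""

    def __init__(self):
        self.items = {}  # id -> embedding vector

    def upsert(self, item_id, vector):
        self.items[item_id] = vector

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def query(self, vector, top_k=1):
        # Rank all stored vectors by cosine similarity to the query.
        scored = sorted(
            ((self._cosine(vector, v), item_id) for item_id, v in self.items.items()),
            reverse=True,
        )
        return [item_id for _, item_id in scored[:top_k]]

store = TinyVectorStore()
store.upsert("doc-cat", [0.9, 0.1, 0.0])  # pretend embeddings from OpenAI/HF
store.upsert("doc-car", [0.1, 0.9, 0.2])
print(store.query([0.85, 0.15, 0.05]))  # → ['doc-cat']
```

In production, the embeddings would come from a model such as OpenAI's or a Huggingface sentence transformer; the store's job is only to make nearest-neighbour lookup fast at scale.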
1 HN point 12 Mar 23
  1. Large language models are not AGI but are making significant advancements in solving various NLP problems.
  2. LLMs excel in tasks like parts of speech tagging, semantic parsing, named entity recognition, and question answering.
  3. LLMs can automate back office work and offer solutions for tasks like stemming, lemmatization, relationship extraction, summarization, keyword extraction, and text generation.
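One reason LLMs cover so many of these tasks is that each one reduces to a prompt template around the same completion call. The sketch below shows a few such templates; `TASKS` and `build_prompt` are hypothetical names for illustration, and the actual API call to a model is deliberately left out since any provider's SDK would slot in the same way.

```python
# Hypothetical prompt templates: one chat-style LLM call per classic NLP task.
# The model call itself is omitted; any completion API would consume `prompt`.
TASKS = {
    "ner": "Extract the named entities (people, places, organizations) from: {text}",
    "summarize": "Summarize in one sentence: {text}",
    "keywords": "List the five most important keywords in: {text}",
}

def build_prompt(task, text):
    """Fill the template for a given task with the input text."""
    return TASKS[task].format(text=text)

prompt = build_prompt("ner", "Ada Lovelace worked with Charles Babbage in London.")
print(prompt)
```

The same pattern extends to stemming, relationship extraction, or question answering by adding templates, which is what makes a single model a drop-in replacement for a shelf of task-specific NLP tools.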
1 HN point 09 Feb 23
  1. Embeddings play a crucial role in NLP research and solutions.
  2. Experimenting with alternative embeddings like the KE Sieve method can lead to significant size reduction and efficient search operations.
  3. By building an embedding space from scratch using methods like TF-IDF and KE Sieve, it's possible to create unique and effective sentence embeddings for various applications.
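The TF-IDF starting point mentioned above can be built from scratch in a few lines. This is a minimal sketch of that first stage only; the KE Sieve step that the post layers on top is the author's own method and is not reproduced here.

```python
import math
from collections import Counter

def tfidf_embeddings(sentences):
    """Build bag-of-words TF-IDF vectors from scratch (no libraries)."""
    docs = [s.lower().split() for s in sentences]
    vocab = sorted({w for doc in docs for w in doc})
    n = len(docs)
    # Document frequency and smoothed inverse document frequency per word.
    df = {w: sum(1 for doc in docs if w in doc) for w in vocab}
    idf = {w: math.log(n / df[w]) + 1.0 for w in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Term frequency (count / doc length) weighted by IDF.
        vectors.append([counts[w] / len(doc) * idf[w] for w in vocab])
    return vocab, vectors

vocab, vecs = tfidf_embeddings(["the cat sat", "the dog ran", "the cat ran"])
```

Words that appear everywhere ("the") get the minimum weight, while rarer words dominate each vector, which is exactly the property that makes TF-IDF a usable, if crude, sentence embedding.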
0 implied HN points 30 Oct 24
  1. There are open source projects planned for 2025 that focus on AI technology. These projects mainly include advancements in language models, speech processing, and computer vision.
  2. Community involvement is encouraged, and anyone interested in AI-related activities can get in touch to participate.
  3. The guiding principles of these projects are based on the AI Punk's manifesto, emphasizing collaboration and innovation in the field of AI.
0 implied HN points 11 Jun 23
  1. Sam Altman ("sama") believes building foundational models to compete with OpenAI's ChatGPT is hopeless without significant investment.
  2. The current approach depends heavily on data and compute resources, which OpenAI has in abundance.
  3. The author plans to build foundational models using the KESieve algorithm, focus on math, involve students, and avoid traditional funding methods.
0 implied HN points 07 Oct 24
  1. Websites can have a certain flow or structure, similar to stories. This means the way content is organized can affect how users experience the site.
  2. Using AI can help analyze website content to identify strengths and areas for improvement. It can suggest ways to make a site more engaging and comprehensive.
  3. Improving a website involves expanding the topics covered, deepening content on existing topics, and making connections between different parts of the site clearer.
0 implied HN points 05 Mar 24
  1. In the AI field, access to large amounts of compute and data is crucial but expensive, raising a barrier for many researchers and making funding a major determinant of success.
  2. The author emphasizes the importance of simpler, more accessible experiments in AI research, drawing inspiration from V.S. Ramachandran's approach in neuroscience. Small, innovative solutions may offer promising alternatives to standard big science methods.
  3. There is a push for exploring new ways to tackle AI challenges beyond the current reliance on GPUs and deep learning models. The idea of creating open-source datasets and involving young talents from India in research signifies a shift towards more inclusive and collaborative approaches.
0 implied HN points 09 Mar 23
  1. For about $2, roughly 1 million tokens can be generated, enough for a wide variety of content: code, articles, novels, tweets, and more.
  2. Generating content using AI may not always result in high-quality or unique output; success may involve integrating AI into existing processes.
  3. The key is to leverage generative AI as a part of the creative pipeline rather than relying solely on the AI to do all the work.
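Taking the post's $2-per-million-token figure at face value, the per-item economics are easy to sketch. The rate below comes from the post itself; real prices vary by model and change over time, and the token-per-article estimate is a rough rule of thumb.

```python
# Back-of-the-envelope cost estimate at the $2-per-million-token rate
# cited in the post (actual pricing varies by model and over time).
PRICE_PER_MILLION_TOKENS = 2.00

def generation_cost(tokens):
    """Dollar cost of generating a given number of tokens."""
    return tokens * PRICE_PER_MILLION_TOKENS / 1_000_000

# A ~500-word article is very roughly 700 tokens.
print(f"${generation_cost(700):.4f} per article")          # → $0.0014 per article
print(f"${generation_cost(1_000_000):.2f} per million")    # → $2.00 per million
```

At a fraction of a cent per article, raw generation cost is negligible; as the takeaways note, the real work is fitting the output into an editorial pipeline that ensures quality.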