
Embeddings

Next up we’re going to use embeddings to find related blog posts.

Embeddings are a fascinating technology related to large language models, and they are part of what powers Retrieval-Augmented Generation (RAG), which our sister company Visma Spcs uses for their new AI Assistant, for instance.

Let’s break down the concept of embeddings.

The Basic Idea

The core concept behind embeddings is relatively simple: they convert content (like a blog post in CowPress) into a numerical format. Here’s how it works:

  1. Content to Numbers: When you write a blog post, embeddings transform this text into an array of floating-point numbers. Think of it as translating words into a language that computers understand better.


  2. Consistent Length: An interesting aspect of this array is that its length is constant. This means no matter how long your blog post is, the array will always have the same number of elements, say 300, 1,000, or 1,536 numbers. This consistency is determined by the embedding model you choose.

  3. Visualising the Array: Imagine these numbers as coordinates in a multi-dimensional space. This might sound abstract, but it’s like plotting points on a graph, only with many more dimensions than the usual three. That’s hard to visualise directly, but a 3D plot of points in space conveys the same idea at a smaller scale.
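The fixed-length property above can be sketched in code. Note that `toy_embed` below is a purely illustrative stand-in, not a real embedding model: it just hashes words into a fixed number of buckets, whereas a real model learns positions that capture semantic meaning. The point it demonstrates is that the output array always has the same number of elements, however long the input is.

```python
import hashlib

def toy_embed(text: str, dims: int = 8) -> list[float]:
    """Toy illustration only: map text to a fixed-length vector by
    hashing each word into one of `dims` buckets and counting.
    A real embedding model would return, say, 1,536 learned floats."""
    vec = [0.0] * dims
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    return vec

short_post = toy_embed("cows love grass")
long_post = toy_embed("a much longer blog post about cows and grass " * 50)

# Both arrays have the same, fixed length regardless of input size.
assert len(short_post) == len(long_post) == 8
```

With a real model you would swap `toy_embed` for an API call or a local model, but the shape of the result, one fixed-length array per document, stays the same.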

Why Use Embeddings?

Embeddings are not just about turning text into numbers. They serve a more significant purpose:

  1. Semantic Meaning: The position of these numerical arrays in multi-dimensional space represents what the content is about—their semantic meaning. This could relate to various attributes of the text, like themes, emotions, or key concepts.

  2. Discovering Relationships: By analysing these positions, we can uncover relationships between different pieces of content. For example, blog posts that are close together in this space might cover similar topics or have similar tones.

  3. The Mystery of Numbers: While the exact meaning of each individual number in the array is a bit of a mystery, their collective pattern helps us discern useful information about the text, enabling things like clustering similar content, semantic search, or recommending related articles on CowPress.
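The "close together in this space" idea from point 2 is usually measured with cosine similarity. The sketch below uses tiny hand-made 3-dimensional vectors rather than real model output; real embeddings have hundreds or thousands of dimensions, but the maths is identical.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors:
    1.0 means pointing the same way (very similar),
    0.0 means unrelated (perpendicular)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings for three blog posts.
post_a = [0.9, 0.1, 0.0]
post_b = [0.8, 0.2, 0.1]  # points roughly the same way as post_a
post_c = [0.0, 0.1, 0.9]  # points in a different direction

print(cosine_similarity(post_a, post_b))  # ~0.98: closely related
print(cosine_similarity(post_a, post_c))  # ~0.01: unrelated
```

Ranking all other posts by their similarity to the current one is the basic recipe for a "related posts" feature: embed every post once, then sort by cosine similarity at query time.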

So, in essence, embeddings translate your blog content into a numerical format that can be analysed and compared in a multi-dimensional space. This helps you discover underlying patterns and relationships in the data, and in practice gives you powerful ways to search and cluster it.

Now let’s take a look at how we can use them to build related blog posts.


Last update: November 17, 2023
Created: November 16, 2023