---
title: "Doing algebra with the meaning of words"
layout: single
excerpt: "how word embeddings can tell us about the relationship between words"
tags: [til, statistics]
---
Learning lots of crazy stuff about embeddings from this great [guide](https://lena-voita.github.io/nlp_course/word_embeddings.html). The basic idea is to represent every word in a language by a vector which encodes its **semantic meaning**. You can then find similar words/topics by looking for vectors that are close to each other.
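For example, with a set of pretrained vectors loaded through gensim (a minimal sketch; the vectors file is a placeholder you'd download separately), finding similar words is one line:

```python
from gensim.models import KeyedVectors

# Load pretrained word2vec-format vectors (the path is a placeholder).
model = KeyedVectors.load_word2vec_format("word-vectors.bin", binary=True)

# The five words whose vectors are closest (by cosine similarity) to "coffee".
print(model.most_similar("coffee", topn=5))
```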
It turns out that when you train this kind of word embedding on a large amount of text, many linear relationships emerge between words. For example, the vector offset between **king** and **queen** is about the same as the offset between **man** and **woman**.
This means you can perform a kind of addition and subtraction on the meanings of words!
```
bar - alcohol + coffee = cafe
musician - music + science = researcher
```
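In gensim, this arithmetic is exactly what the `positive`/`negative` arguments of `most_similar` do: sum the `positive` vectors, subtract the `negative` ones, and return the nearest words to the result. A sketch, reusing the `model` loaded above:

```python
# bar - alcohol + coffee ≈ cafe
print(model.most_similar(positive=["bar", "coffee"], negative=["alcohol"], topn=3))

# musician - music + science ≈ researcher
print(model.most_similar(positive=["musician", "science"], negative=["music"], topn=3))
```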
This idea is similar to [how emojis are encoded](https://developers.mattermost.com/blog/all-about-emojis/), with the astronaut (👨‍🚀) emoji literally being represented as 👨/👩 + 🚀. This is also used to encode all the different family combinations, e.g. 👨 + 👨 + 👧 = 👨‍👨‍👧
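You can check this in Python: composite emojis are just the base emojis glued together with the zero-width joiner character (U+200D).

```python
# Emoji ZWJ sequences: composite emojis are base emojis joined by the
# zero-width joiner (U+200D).
ZWJ = "\u200d"

astronaut = "👨" + ZWJ + "🚀"            # man + rocket
family = "👨" + ZWJ + "👨" + ZWJ + "👧"  # man + man + girl

print(astronaut)  # renders as 👨‍🚀 where ZWJ sequences are supported
print(family)     # renders as 👨‍👨‍👧
```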
# Learning translations
An added bonus is that you can transfer what you've learnt about one language to another. You just need a small dictionary connecting the two languages: use it to fit a mapping between the two embedding spaces, and you get other translations for free (there's a sketch of this below the figure)!
![Matching the word embedding spaces of two languages](https://lena-voita.github.io/resources/lectures/word_emb/analysis/cross_lingual_matching-min.png)
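Concretely, the matching can be set up as a least-squares problem: take the vectors for the dictionary pairs, fit a linear map from one embedding space to the other, and translate new words by mapping their vectors across and taking the nearest neighbour. A minimal numpy sketch (the embedding matrices here are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder embeddings: row i of X is the source-language vector and
# row i of Y the matching target-language vector for a seed dictionary pair.
X = rng.standard_normal((50, 300))
Y = rng.standard_normal((50, 300))

# Fit the linear map W that minimises ||X @ W - Y||^2.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def translate(source_vec, target_vecs, target_words):
    """Map a source-language vector across, return the nearest target word."""
    mapped = source_vec @ W
    sims = target_vecs @ mapped / (
        np.linalg.norm(target_vecs, axis=1) * np.linalg.norm(mapped)
    )
    return target_words[int(np.argmax(sims))]
```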

0 commit comments

Comments
 (0)