
Debiaswe: try to make word embeddings less sexist

Word embeddings are a widely used representation for text data. A well-known example in natural language processing (NLP) is Word2vec, which uses a neural network to learn latent vector representations of words. It turns out that relations in this latent vector space capture semantic relations quite well. For example, by looking up the nearest vectors to a given word you typically end up with highly related or synonymous words. Another typical example is that when you take the vector for “king”, subtract “man” and add “woman”, you end up close to the vector for “queen”, so even a form of conceptual arithmetic is possible.
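
As a rough illustration of both operations, here is a minimal sketch using gensim’s pretrained vectors. The model name `word2vec-google-news-300` and the example words are assumptions chosen for illustration; this is not part of Debiaswe itself.

```python
# Minimal sketch of word-vector similarity and analogy arithmetic with gensim.
# Assumes the pretrained "word2vec-google-news-300" vectors are available
# via gensim-data (a large download on first use).
import gensim.downloader as api

# Load pretrained Word2vec vectors as a KeyedVectors object.
model = api.load("word2vec-google-news-300")

# Nearest neighbours: highly related or synonymous words.
print(model.most_similar("king", topn=5))

# Analogy arithmetic: king - man + woman ≈ queen
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```

With these pretrained vectors, the analogy query typically returns “queen” as its top hit.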