Words to Vectors

Manually tagging millions of rows of transactions with the correct merchant group is extremely time-consuming and costly, which is why we train models to do the job for us. But how does a computer understand text? Enter the world of word embeddings, which revolves around representing words as vectors.


[Table: 'petrol' appearing in two different transaction string patterns]

Most word embedding techniques rely on a core assumption: words that appear in the same contexts share semantic meaning. In our case we are less concerned with semantic meaning and more with how a word is spelt, but the same reasoning applies. One of the simplest techniques is the bag-of-words model, which represents a piece of text as a vector of word counts over a fixed vocabulary.
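A bag-of-words representation can be sketched in a few lines of Python. The transaction strings below are made-up examples, not real data; the key point is that both vectors share a dimension for 'petrol':

```python
from collections import Counter

# Toy transaction strings (illustrative examples, not real data)
transactions = [
    "CARD PAYMENT TESCO PETROL 0341",
    "POS PETROL STATION ASDA 17",
]

# Build a shared vocabulary from every word seen across the corpus
vocabulary = sorted({word for text in transactions for word in text.lower().split()})

def bag_of_words(text, vocabulary):
    """Represent a string as a vector of word counts over the vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

vectors = [bag_of_words(t, vocabulary) for t in transactions]
```

Because both strings contain 'petrol', both vectors have a 1 in the 'petrol' position, so texts with overlapping words end up close together in this vector space.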


To visualise these vectors, we can apply dimensionality-reduction techniques such as t-SNE to project them down to three dimensions. Here we focus on representing asda, tesco, amazon, link/ATM and uber.
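A minimal sketch of this projection step, assuming scikit-learn is available; the vectors here are random stand-ins for the bag-of-words vectors described above:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Stand-in vectors for five merchants (random for illustration only;
# in practice these would be the bag-of-words vectors)
labels = ["asda", "tesco", "amazon", "link/ATM", "uber"]
X = rng.random((len(labels), 50))

# t-SNE requires perplexity < n_samples; with so few points we keep it small
embedded = TSNE(n_components=3, perplexity=2, random_state=0).fit_transform(X)
print(embedded.shape)  # (5, 3)
```

Each row of `embedded` is a 3-dimensional coordinate that can be plotted directly, which is how visualisations like the one below are produced.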





We can see in this visualisation that similar pieces of text cluster closely together, which lets us measure distances between vectors and classify each transaction into its corresponding merchant group.
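The distance-based classification step can be sketched as a nearest-centroid rule. The centroids and vectors below are hypothetical numbers chosen for illustration; the distance measure (cosine) is one common choice, not necessarily the one used in production:

```python
import numpy as np

def cosine_distance(a, b):
    """1 minus the cosine similarity between two vectors."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical centroids: the mean vector of each merchant group's transactions
centroids = {
    "tesco": np.array([1.0, 0.0, 0.2]),
    "uber": np.array([0.0, 1.0, 0.1]),
}

def classify(vector, centroids):
    """Assign a vector to the merchant group with the nearest centroid."""
    return min(centroids, key=lambda group: cosine_distance(vector, centroids[group]))

print(classify(np.array([0.9, 0.1, 0.2]), centroids))  # tesco
```

A new transaction vector is simply assigned to whichever group's centroid it lies closest to in the embedding space.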


We provide data and insight services using consumer and SME transaction data from millions of users within European banks and credit card providers. We are on a mission to provide better data to decision makers, anonymised, real time and at scale.




© 2020 Fable Data, All Rights Reserved
