Known unknowns

A common data science challenge is creating a data set for training models. We discussed tagging and generating synthetic data in earlier blog posts, but there are other routes to creating training sets.

Fable Data is laser-focused on correctly classifying merchant transaction string patterns in anonymised consumer credit and debit card data. This post describes how we handle string patterns for Bill’s, a chain of restaurants with a hundred locations across the UK. Donald Rumsfeld famously said: “There are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know.”

We are not certain that our historical data contains transactions from every Bill’s restaurant, so we need to create text that represents the likely string patterns across all locations. Bill’s is a tough merchant to classify because its name is very short and the string occurs as part of many other strings, so we need a way to improve our model’s accuracy. We do have the addresses of the Bill’s restaurants, which are in shopping malls, on high streets and in cities. Our process is to use this address information to create a list of words that we expect to see alongside “Bill’s” in a transaction string. Here are a few examples:

The Orient, Intu, Trafford Centre, Camberley, The Atrium, Jubilee Arch, Bluewater, St. Pauls Place, Maidstone, Fremlin Walk, Cheapside, Marlow, Bullring, Cardinal, Bishop’s Stortford
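To make the idea concrete, here is a minimal sketch of how location keywords like these could be combined with the merchant name to produce candidate transaction strings. It is an illustration rather than our production code: the name variants, the `normalise` helper and the upper-cased, punctuation-stripped output format are all assumptions, and real card strings may look quite different.

```python
from itertools import product

# Location keywords taken from the address-derived examples above.
LOCATION_KEYWORDS = ["Trafford Centre", "Bluewater", "Fremlin Walk", "Bullring"]

# Hypothetical merchant-name variants (assumed for illustration only).
NAME_VARIANTS = ["Bills", "Bill's", "Bills Restaurant"]


def normalise(text: str) -> str:
    """Upper-case, keep letters, digits and apostrophes, collapse whitespace."""
    text = text.replace("\u2019", "'").upper()  # treat curly apostrophes as straight
    cleaned = "".join(ch if ch.isalnum() or ch in " '" else " " for ch in text)
    return " ".join(cleaned.split())


def synthetic_strings(names, locations):
    """Pair every name variant with every location keyword, skipping duplicates."""
    seen = set()
    results = []
    for name, location in product(names, locations):
        candidate = f"{normalise(name)} {normalise(location)}"
        if candidate not in seen:
            seen.add(candidate)
            results.append(candidate)
    return results


if __name__ == "__main__":
    for line in synthetic_strings(NAME_VARIANTS, LOCATION_KEYWORDS):
        print(line)
```

Each generated string can then be labelled as Bill’s and added to the training set alongside the historical examples.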


Bill’s restaurants in central London

These strings go through our vectorization and modelling pipeline and put us in a great position to correctly classify Bill’s restaurants when we encounter novel data. We’ve solved Rumsfeld’s known unknowns problem.