Topic Discovery in Recipes

by Daniel Breen

What distinguishes Italian, Mexican, or Polish cuisines from all the others?

We start by assessing the relative importance of ingredients scraped from recipes on and

  1. construct the ingredients lists
  2. plot word clouds for each cuisine showing the relative importance of different ingredients

The results show that different combinations of spices are the most obvious way to distinguish cuisines from each other. Garlic, salt, and pepper are very common across many cuisines.
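As a rough illustration of what "relative importance" could mean here, the sketch below weights each ingredient by the fraction of a cuisine's recipes that contain it. This is an assumption, not necessarily the weighting used for the original plots, and the recipes, the `recipes_by_cuisine` dictionary, and the `ingredient_weights` helper are all made up for the example.

```python
from collections import Counter


def ingredient_weights(recipes):
    """Map each ingredient to the fraction of recipes that contain it.

    `recipes` is a list of ingredient lists (hypothetical input format).
    """
    counts = Counter()
    for ingredients in recipes:
        # Count an ingredient at most once per recipe.
        counts.update(set(ingredients))
    return {ing: n / len(recipes) for ing, n in counts.items()}


# Made-up toy data, standing in for the scraped ingredient lists.
recipes_by_cuisine = {
    "italian": [
        ["garlic", "olive oil", "basil", "salt"],
        ["garlic", "tomato", "salt"],
        ["garlic", "parmesan", "olive oil"],
    ],
    "mexican": [
        ["cumin", "garlic", "cilantro", "salt"],
        ["cumin", "lime juice", "cilantro"],
    ],
}

weights = {
    cuisine: ingredient_weights(recipes)
    for cuisine, recipes in recipes_by_cuisine.items()
}

for cuisine, w in weights.items():
    # Sort by descending weight, breaking ties alphabetically.
    top = sorted(w.items(), key=lambda kv: (-kv[1], kv[0]))[:3]
    print(cuisine, top)
```

A dictionary of this shape can be handed to a plotting tool that sizes words by weight, for example `WordCloud.generate_from_frequencies` in the `wordcloud` package, if that is the library in use.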

With this information, how do the results of a clustering algorithm, latent Dirichlet allocation (LDA), compare with the word clouds?

About five main types of cuisine can be distinguished. Models with 20 or even 7 topics appear to have too many, whereas with 4 topics the topics are well defined and distinct. Still, the 7-topic model recovers something close to Mexican, Italian, Greek, Asian, and 'French' cuisines as separate topics.
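To make the idea of fitting LDA with a chosen number of topics concrete, here is a minimal, standard-library-only sketch of LDA fitted by collapsed Gibbs sampling, treating each recipe as a document and each ingredient as a word. It is a toy under stated assumptions: the write-up does not say which implementation was used, and in practice a library such as scikit-learn (`LatentDirichletAllocation`) or gensim (`LdaModel`) would be the usual choice. The function names, hyperparameters (`alpha`, `beta`, `n_iter`), and recipes below are all hypothetical.

```python
import random


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy LDA via collapsed Gibbs sampling.

    docs: list of ingredient lists. Returns (vocab, phi, doc_topic), where
    phi[t][v] is the probability of ingredient v under topic t and
    doc_topic[d][t] counts the tokens of recipe d assigned to topic t.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token
                # (unnormalized), given all other assignments.
                probs = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=probs)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed topic-ingredient distributions.
    phi = [
        [(topic_word[t][v] + beta) / (topic_total[t] + n_words * beta)
         for v in range(n_words)]
        for t in range(n_topics)
    ]
    return vocab, phi, doc_topic


def top_ingredients(vocab, phi, n=3):
    """List the n most probable ingredients for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in phi
    ]


# Made-up recipes; real input would be the scraped ingredient lists.
docs = [
    ["basil", "tomato", "olive oil", "parmesan"],
    ["tomato", "olive oil", "oregano", "basil"],
    ["soy sauce", "ginger", "scallions", "sesame oil"],
    ["ginger", "soy sauce", "sesame oil", "rice vinegar"],
]

vocab, phi, doc_topic = fit_lda(docs, n_topics=2)
for t, ingredients in enumerate(top_ingredients(vocab, phi)):
    print(f"topic {t}: {ingredients}")
```

Refitting with different values of `n_topics` and reading the top ingredients per topic is one simple way to judge, by eye, whether a model has too many topics.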

Future steps:

  1. gather more data,
  2. assess the performance of clustering,
  3. train and validate models using classification and dimensionality reduction,
  4. develop the methods and compare the models, and
  5. use the methods and models to gain insight into new data.

For example, using data containing information about purchases, such as customer ids, ingredients used in products, and geographical information, we may be able to discover groups of people who prefer characteristic kinds of ingredients. We may also be able to discover groups of ingredients or preparation processes underlying the success or failure of different food products. We can use the models to predict whether new products will be likely to succeed or not.

With All Spices

When salt, garlic, and black pepper are included in the list of ingredients, the word clouds look more similar.

Eliminating these spices draws out the differences better so that there is less overlap in key ingredients.
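The effect of dropping the shared spices can be sketched by comparing how much two cuisines' top ingredients overlap before and after filtering. The data below is invented, and the helper names and the choice of Jaccard overlap on the top three ingredients are assumptions made for illustration; only the three excluded ingredients come from the text.

```python
from collections import Counter

# The three ingredients named in the text as common across cuisines.
COMMON = {"salt", "garlic", "black pepper"}


def top_ingredients(recipes, n, exclude=frozenset()):
    """Return the n most frequent ingredients, skipping any in `exclude`.

    Assumes each recipe lists an ingredient at most once.
    """
    counts = Counter(
        ing for recipe in recipes for ing in recipe if ing not in exclude
    )
    return [ing for ing, _ in counts.most_common(n)]


def overlap(a, b):
    """Jaccard overlap between two ingredient lists (0 = disjoint, 1 = same)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


# Made-up recipes for two cuisines.
italian = [
    ["salt", "garlic", "olive oil", "basil"],
    ["salt", "garlic", "black pepper", "parmesan"],
    ["garlic", "olive oil", "tomato", "salt"],
]
thai = [
    ["salt", "garlic", "lime juice", "cilantro"],
    ["garlic", "black pepper", "fish sauce", "lime juice"],
    ["salt", "garlic", "coconut milk", "lime juice"],
]

with_all = overlap(top_ingredients(italian, 3), top_ingredients(thai, 3))
without = overlap(
    top_ingredients(italian, 3, COMMON), top_ingredients(thai, 3, COMMON)
)
print(f"overlap with all spices: {with_all:.2f}")
print(f"overlap without common spices: {without:.2f}")
```

On this toy data the overlap falls once the three shared ingredients are excluded, which mirrors the qualitative change described for the word clouds.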

Without Garlic, Salt, and Black Pepper

According to the data, French and Polish dishes tend to contain ingredients common in baked goods and dairy products. This is partly because searching with the keyword 'french' pulls up 'french toast' in many, but not all, results. "French" dishes tend toward the sweet side, while Polish dishes contain ingredients like sauerkraut and onions. Greek and Italian dishes are similar, though there might be a slight emphasis on citrus ingredients in Greek cuisine that is not as present in Italian. Japanese, Korean, Thai, Chinese, and Vietnamese food share ingredients not present in other dishes, including scallions and soy sauce. However, Vietnamese food tends more towards sweetness, Korean food emphasises sesame oil, Chinese food emphasises ginger (as do Indian and Japanese food), and Thai food emphasises lime juice and cilantro leaves.

About me

My name is Daniel. I am a physics PhD student at UC San Diego. My dissertation involves developing computational models of neurons, estimating parameters for these models, and detecting patterns in sets of estimated parameters.

I enjoy discovering insights from data which can inform decision making. When I am not learning about science and technology, I like to run, swing dance, and play violin. I look forward to applying my technical skills in industry to create interesting and valuable products.

Get in touch: