Generating Memes Using Artificial Intelligence — French Canadian Edition

Jérémi DeBlois-Beaucage
8 min read · Mar 31, 2021

This article covers a project carried out in Fall 2019 with Justine Massicotte and Vincent Espanol, as part of the Machine Learning I course at HEC Montréal, taught by Professor Laurent Charlin.

This article is also available in French. All the images presented in this article are in French; when possible, English translations are proposed.

Can artificial intelligence generate funny and interesting content? Can an algorithm create good French Canadian memes? These are the questions my colleagues and I asked ourselves several months ago. Our answer: yes, but of very variable quality.

The meme is defined by the Oxford English Dictionary as “an element of a culture or system of behavior that may be considered to be passed from one individual to another by nongenetic means, especially imitation”.

Sample meme, in a popular format: an image and a caption. Translation: Me when I generate memes.

Here is an overview of our project: how to build an algorithm able to generate a meme given any image.

Table of Contents

  1. Find, import, and clean images and captions
  2. Build and train the model: encoder-decoder neural network
  3. Generate memes from new images (with examples of memes generated by our algorithm! 🎉)

Find, import, and clean images and captions

Artificial intelligence and machine learning have been trending topics for a few years now. These areas require large amounts of data, often seen as new information goldmines.

One of the challenges in generating French Canadian memes was therefore to collect the data: no dataset was available, and memes are scattered across many pages on different social networks.

In the end, we were able to access a few thousand memes published by well-known Quebec pages.

Examples of memes in our database. “Mom: Awww you are so cute on this picture! Me on the picture:”; “When you tell your mom that you lost her tupperware”; “Me looking for $950,000 houses on DuProprio with $19 in my bank account”

Clean database

The first step was to clean the collected data:

  • First, extract the caption from the image. To do this, we used Optical Character Recognition (OCR), with Google’s Tesseract tool.
  • Second, clean the captions: punctuation was removed, along with words that were too infrequent. Only the 6,000 most frequent words were kept, forming what we call the algorithm’s dictionary (a sketch of both steps follows this list).
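For illustration, here is a minimal sketch of these two steps, assuming the memes are stored as image files in a local folder and that Tesseract with its French language pack is installed. The folder name and the cleaning rules are placeholders, not the project’s actual code.

```python
# Sketch: OCR the captions with Tesseract, then keep only the 6,000 most
# frequent words ("memes/" is a hypothetical folder of meme images).
from collections import Counter
from pathlib import Path
import re

from PIL import Image
import pytesseract

captions = {}
for path in Path("memes").glob("*.jpg"):
    # Read the caption printed on the image using the French OCR model.
    text = pytesseract.image_to_string(Image.open(path), lang="fra")
    # Strip punctuation and lowercase the text.
    text = re.sub(r"[^\w\s]", " ", text).lower()
    captions[path.name] = text.split()

# Build the dictionary: the 6,000 most frequent words; drop everything else.
counts = Counter(word for words in captions.values() for word in words)
vocab = {word for word, _ in counts.most_common(6000)}
captions = {name: [w for w in words if w in vocab]
            for name, words in captions.items()}
```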

The result? Nearly 4,000 French Canadian memes and their captions, ready to be used.

A larger number of memes would have been desirable, but the time frame and scope of the project limited our data collection.

Restructure data

At a high level, the task of the algorithm is rather simple: we present it with an image, and it generates a humorous caption related to that image.

To do this, the algorithm proceeds iteratively. It first uses the image (by analyzing its contents, for example detecting objects or faces) to predict what the 1st word of the sentence might be. Then, the algorithm uses the image and the 1st word to generate the 2nd word, then the image and the first 2 words to generate the 3rd, and so on, until the end of the sentence is predicted.

Here’s an example of input data (Input) and a perfect algorithm prediction (Target Word).

Meme (top) and data (bottom). “When you tell your mom that you lost her tupperware”
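In code, unrolling a caption into such (input, target) pairs could look like the following sketch. The `<start>` and `<end>` tokens and the helper name are illustrative assumptions, not taken from the project’s code.

```python
# Sketch: turn one meme into (image, first words) -> next word training pairs.
def make_pairs(image_name, caption_words):
    tokens = ["<start>"] + caption_words + ["<end>"]
    pairs = []
    for i in range(1, len(tokens)):
        # Input: the image plus the words generated so far.
        # Target: the next word to predict.
        pairs.append(((image_name, tokens[:i]), tokens[i]))
    return pairs

pairs = make_pairs("meme_001.jpg", ["quand", "tu", "dis", "à", "ta", "mère"])
# First pair:  (("meme_001.jpg", ["<start>"]), "quand")
# Second pair: (("meme_001.jpg", ["<start>", "quand"]), "tu")
```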

Build and train the model: encoder-decoder neural network

Generating memes can be thought of as a form of image captioning: generating a caption that describes the image. In our case, the additional constraint is that the caption must be humorous.

Such a task calls for an encoder-decoder solution. First, a neural network encodes the image, extracting information from it. This encoding phase gives us a 2,048-dimensional vector that represents the image. For this project, we used the pre-trained Inception V3 network.
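As an illustration, here is a minimal sketch of this encoding step using the Keras version of Inception V3; the exact preprocessing pipeline used in the project may differ.

```python
# Sketch: encode an image into a 2,048-dimensional vector with Inception V3.
import numpy as np
import tensorflow as tf

# Drop the classification head; average-pooling the last feature map
# yields a vector of 2,048 dimensions.
encoder = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, pooling="avg")

def encode_image(path):
    # Inception V3 expects 299x299 RGB inputs scaled to [-1, 1].
    img = tf.keras.preprocessing.image.load_img(path, target_size=(299, 299))
    x = tf.keras.preprocessing.image.img_to_array(img)
    x = tf.keras.applications.inception_v3.preprocess_input(x)
    return encoder.predict(x[np.newaxis, ...])[0]   # shape: (2048,)
```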

Training

During training, the algorithm is given the encoded image and the caption’s first words; the algorithm must then predict the caption’s next word. The better its prediction of the correct word, the lower its loss. The cost function used is categorical cross-entropy.

Concretely, there are two different inputs to the algorithm.

The complete model, with an encoder-decoder structure.

On one side, the first words of the caption are fed in. These are first encoded using word embeddings, i.e. vector representations of words (more information on word embeddings here). Then, these embeddings are fed into a Long Short-Term Memory (LSTM) recurrent neural network, which transforms the sequence of words into a single vector. This vector represents, in the eyes of the algorithm, the idea of the sentence.

On the other side, the image is fed in. It is encoded and passed through a dense layer of neurons. The result is a vector that represents, in the eyes of the algorithm, the idea of the image.

The two vectors are then joined together, and from this combined vector we predict the next word, among the dictionary of 6,000 possible words.
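Putting the two branches together, a minimal Keras sketch of this architecture might look like the following. The layer sizes and the maximum caption length are assumptions; only the overall structure, the 2,048-dimensional image input, the 6,000-word output, and the categorical cross-entropy loss follow the description above.

```python
# Sketch: the encoder-decoder model that predicts the next word of a caption.
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 6000     # size of the dictionary
MAX_LEN = 20          # maximum caption length (assumption)
EMBED_DIM = 300       # dimension of the word embeddings
UNITS = 256           # LSTM / dense size (assumption)

# Text branch: first words of the caption -> embeddings -> LSTM -> vector.
words_in = layers.Input(shape=(MAX_LEN,))
x_txt = layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(words_in)
x_txt = layers.LSTM(UNITS)(x_txt)

# Image branch: 2,048-dimensional Inception V3 vector -> dense layer -> vector.
image_in = layers.Input(shape=(2048,))
x_img = layers.Dense(UNITS, activation="relu")(image_in)

# Join the two vectors and predict the next word over the dictionary.
merged = layers.concatenate([x_txt, x_img])
next_word = layers.Dense(VOCAB_SIZE, activation="softmax")(merged)

model = tf.keras.Model(inputs=[image_in, words_in], outputs=next_word)
model.compile(loss="categorical_crossentropy", optimizer="adam")
```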

Performance measures

It is difficult to constrain training with performance measures: in the end, the goal is to make people laugh. Moreover, a model that simply repeated previously seen memes word for word would score well on standard metrics while falling short of that goal. We therefore had to strike a balance: the algorithm should come up with sound, intelligible captions, but these should not be copied verbatim from memes it has already seen.

Several tools are used to calibrate this creativity, including the number of passes the model makes over the training data (the number of epochs) and the amount of randomness in word selection (the model temperature).

Two different metrics were used to evaluate the models: humorous quality (how much the meme makes us personally laugh) and perplexity (more information on perplexity here).
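As a reminder of how perplexity relates to the loss used in training, it is simply the exponential of the average cross-entropy per predicted word, as this small helper illustrates (the numeric example is only illustrative, not a result from our experiments).

```python
# Sketch: perplexity is exp(average negative log-likelihood per word).
# Lower perplexity means the model is less "surprised" by the captions.
import numpy as np

def perplexity(mean_cross_entropy):
    """mean_cross_entropy: average categorical cross-entropy, in nats."""
    return float(np.exp(mean_cross_entropy))

print(perplexity(4.2))   # a loss of 4.2 nats/word gives a perplexity of ~67
```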

Language issue

Memes pose an additional challenge through their use of colloquial terms. On top of this comes the bilingual nature of French Canadian memes: English and French words often appear in the same sentence.

Our solution was to use two different word-embedding corpora: each word was associated with a French representation when one existed (from the fastText corpus), and with an English representation otherwise (from the GloVe corpus).
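A minimal sketch of this lookup, assuming the French fastText vectors and the English GloVe vectors have already been loaded into plain dictionaries of the same dimension; the random fallback for words found in neither corpus is an illustrative assumption.

```python
# Sketch: prefer the French fastText vector, fall back to the English GloVe
# vector, and use a random vector for out-of-corpus (often colloquial) words.
import numpy as np

EMBED_DIM = 300

def lookup(word, fasttext_fr, glove_en):
    if word in fasttext_fr:
        return fasttext_fr[word]          # French representation, if it exists
    if word in glove_en:
        return glove_en[word]             # otherwise the English representation
    return np.random.normal(scale=0.1, size=EMBED_DIM)  # unknown word

# embedding_matrix[i] would then hold the vector of the i-th dictionary word:
# embedding_matrix = np.stack([lookup(w, fasttext_fr, glove_en) for w in vocab])
```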

Generate memes

Once the model is trained, we can move on to the best part: generating new memes.

The process is as follows: the new image is first reformatted and then encoded, after which the 1st word is decoded. With the image and the 1st word, we decode the 2nd word, and so on, until the algorithm predicts the end of the caption.

Architecture used during meme generation: words are predicted one at a time, until the end of the caption is predicted.
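Here is a minimal, greedy version of that loop, reusing the hypothetical model, token names, and input sizes from the sketches above; the probabilistic sampling strategies we actually used are described just below.

```python
# Sketch: encode the image once, then decode one word at a time until <end>.
import numpy as np

def generate_caption(image_vector, model, word_to_id, id_to_word, max_len=20):
    words = ["<start>"]
    for _ in range(max_len):
        # Pad the words generated so far to the fixed input length.
        ids = [word_to_id[w] for w in words] + [0] * (max_len - len(words))
        probs = model.predict([image_vector[np.newaxis, :],
                               np.array([ids])])[0]
        next_word = id_to_word[int(np.argmax(probs))]   # greedy choice
        if next_word == "<end>":
            break
        words.append(next_word)
    return " ".join(words[1:])
```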

In the end, the prediction made by the trained model is probabilistic: the algorithm assigns a probability between 0 and 1 to each of the 6,000 words in the dictionary. Given these probabilities, we considered two techniques for constructing a sentence.

  • Choose one word at a time
    In this approach, the algorithm chooses one word at a time, following the probability distribution: if the model predicts a value of 0.1 for a word, that word has a 10% chance of being selected. The creativity of the model can be adjusted with the temperature, which controls how sharply the probabilities are rescaled: the higher the temperature, the flatter the distribution and the more likely improbable words are to be chosen (see the sketch after this list).
  • Choose several words at a time
    This approach, called beam search, considers several combinations of words at the same time. In our case, the model kept the most probable combinations of 3 consecutive words at each step. More details on beam search here.
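Here is a minimal sketch of the temperature sampling used in the first technique; the small smoothing constant is an illustrative detail.

```python
# Sketch: sample the next word with a temperature-adjusted distribution.
import numpy as np

def sample_with_temperature(probs, temperature=1.0):
    probs = np.asarray(probs, dtype=np.float64)
    # Rescale the log-probabilities by the temperature, then re-normalize.
    logits = np.log(probs + 1e-10) / temperature
    scaled = np.exp(logits)
    scaled /= scaled.sum()
    return int(np.random.choice(len(scaled), p=scaled))

# temperature -> 0: almost always the most probable word (less "creative")
# temperature  = 1: sample straight from the model's distribution
# temperature  > 1: improbable words are chosen more often (more "creative")
```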

Several experiments were run with different hyperparameters: the number of epochs the model was trained for, the model temperature, and the type of prediction.

Here are some examples of memes generated by the algorithm.

“Me talking to my heart” ; “When you realize that you don’t have enough hens” ; “When you leave your headphones home”

Conclusion

The initial goal of this project was to generate humorous Quebecois-flavored memes from any image. This goal has been partially met.

First, many of the best memes were, in our opinion, very absurd. Here, the incongruity of the algorithm brings a humorous aspect.

“I need this carpet”; “When you are in an insect”; “Tag a girl that would eat this”

However, the best memes would be those which, in their incongruity, remain relevant and intelligent. The algorithm failed to generate such memes.

It is also unclear whether the algorithm correctly took the image context into account. Several memes exhibited a disconnect between the image and the caption, even though certain entities were recognized, such as a person or an object.

This project was only intended as a brief overview of the application of artificial intelligence in the world of humor. In the future, several avenues would be interesting to explore.

  • Generate one character at a time, not full words
    The algorithm generates words here, but another avenue often used in similar projects is to predict one character at a time.
  • Have a better dataset
    It would be nice to have more memes, and to extract each meme’s image and caption more accurately.
  • Not fixing the encoder weights
    In this project, the encoder was frozen: during training, if the encoder had “badly” recognized certain entities in the image, that mistake could not be corrected, since the encoder’s weights were never updated.
  • Not fixing the word embedding weights
    The first words of the meme provide context for the rest of the sentence, and those first words are “understood” by the model through word embeddings. If the model “misunderstood” a word, its embedding could not be corrected either, since the embedding weights were also frozen.
  • Add attention
    Attention has attracted the keen interest of many researchers in the field. More details here.

While artificial intelligence is still far from performing a stand-up set at the Bordel Comédie Club, it managed to make us laugh on several occasions. The future of the field looks very exciting!

Questions, comments, ideas? Feel free to reach out to me via LinkedIn!

The Python code used is available on GitHub (disclaimer: the code was not thoroughly cleaned).
