AI Love You: the AI-powered dating App

After completing the year-long Junior Data Science Development program at Xomnia, Olivier Schwirtz, Oeljana Smits, Ronald van Velzen, and Paul Ozkohen created AI Love You, the dAIting App, a prototype for an AI-powered dating application. In the blog below, our junior data scientists explain their project.

Creating a profile on a dating app is a struggle for many, including our friends and single Xomnians ;). When joining a dating app, we ask ourselves many questions, like "what are the best photos to choose, and which ones should be put first?", and "what should be written in my bio to catch the attention of potential matches?". To help people in the love market find answers to those pressing questions in today’s romantic scene, we came up with the 'AI Love You' app.

Users of this app can simply upload a group of photos to 'AI Love You', and the app will tell them which ones are best suited for their dating profile. For photos with a lower score, the app will give some suggestions about how to improve photo composition, sharing feedback like “This looks like a group picture, try uploading a picture with fewer people”, “this picture is a bit blurry, try a picture that is a bit sharper” or even “try wearing a red shirt instead”.

The app can also generate an interesting and funny bio (short, medium or long) based on the user’s gender, hobby and favorite food. The user can even help the app a little by suggesting a starting sentence. This is a great help for people who are not very creative, or who find it hard to speak about themselves.


The data science behind the dating app

Image Related Classification in the app

We used OpenCV's pre-trained Haar cascade classifiers for the majority of the image-related classifications. This let us quickly detect people and their faces, and use those detections when scoring an image.

To determine the sharpness of a picture, we started by detecting edges with the Laplacian operator, the basis of the Laplacian edge detector. We then calculated the variance of the Laplacian response over the whole photo: a sharp photo has many strong edges and therefore a high variance, while a blurry one has a low variance.
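The variance-of-Laplacian measure can be sketched without any dependencies, as below. This is a minimal illustration that assumes a grayscale image given as a list of rows and uses the common 4-neighbour Laplacian kernel; the function name is hypothetical, and in practice OpenCV's `cv2.Laplacian` would do the filtering.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    gray: 2D list of grayscale intensities, at least 3x3, rows of equal length.
    """
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3 pixels")
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Sum of the four direct neighbours minus four times the centre.
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    return pvariance(responses)
```

A perfectly flat image has no edges and gives a variance of zero, while an image with strong local contrast gives a large value.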

To measure the “redness” of a photo, we looked at the share of pixels that fall within a certain range of RGB values. Red has been shown to be subconsciously attractive to people, so wearing something red or adding other red details could give your picture the extra nudge it needs. Once calculated, each measure is converted to a 0-100 score using a non-linear scaling function.
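The sketch below shows one way such a redness share and a non-linear 0-100 conversion could look. The RGB thresholds and the choice of a logistic curve are assumptions for illustration only; the article does not state which ranges or scaling functions the app uses.

```python
import math


def red_share(pixels, min_red=150, max_other=90):
    """Fraction of (r, g, b) pixels counted as 'red'.

    The thresholds are hypothetical: a pixel is red when its red channel is
    high and its green and blue channels are both low.
    """
    pixels = list(pixels)
    if not pixels:
        return 0.0
    red = sum(1 for r, g, b in pixels
              if r >= min_red and g <= max_other and b <= max_other)
    return red / len(pixels)


def to_score(value, midpoint, steepness):
    """Map a raw measure to 0-100 with a logistic curve (assumed scaling).

    Values equal to `midpoint` score exactly 50; `steepness` controls how
    quickly the score saturates on either side.
    """
    return 100.0 / (1.0 + math.exp(-steepness * (value - midpoint)))
```

For example, `to_score(red_share(pixels), midpoint=0.3, steepness=10)` would give a photo with 30% red pixels a score of 50, with the parameters again being made-up values.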

**Image Related Classification in AI Love You App**

The app’s biography generator

The bio generator uses a language generation model. Its architecture is based on OpenAI’s GPT-2, a large transformer model that has been shown to be great at producing grammatically correct sentences. To train the model to produce bios, we gathered texts that people wrote on their profiles on OKCupid, an online dating app. We then took a pre-trained GPT-2 model and fine-tuned it on these texts, so that its existing language knowledge could be bent towards producing grammatically correct dating profile bios.

We also wanted to let the users specify keywords describing themselves, such as their favorite hobbies or food, so that those interests are mentioned in their bios. However, GPT-2 does not have any default functionality for conditional generation based on keywords. The only thing that is fed to GPT-2 prior to text generation is a prompt. This is usually in the following syntax: ‘<|startoftext|>[starting text here]’. GPT-2 will then keep generating words until the ‘<|endoftext|>’ token is generated, at which point the generation stops.

**AI Love You App’s biography generator interface**

However, by manipulating the starting prompt, conditional generation on keywords is possible with GPT-2. First, we extracted keywords automatically from the OKCupid texts using this project (other methods for extracting keywords were also experimented with, but gave less accurate results). During training, the keywords would then be placed in the text right after the ‘<|startoftext|>’ token, using a specific syntax. For example, one training example could look like this:

<|startoftext|>~^interest look family times a-smile enjoy hang~} I'm new here...just trying this out. I enjoy hanging out with friends and family but I can be a homebody at times. I'm looking for someone that has the same interest as me. someone that can put a smile on my face . :)<|endoftext|>

During training, the model should learn the link between the keywords given before the ‘~}’ marker and the bio text that follows it. After training, a bio can be generated by manipulating the prompt, for example by feeding the model the following input containing some keywords:

‘<|startoftext|>~^pizza cycling cats~}’
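The training format and the generation prompt described above can be sketched with two small helpers. The marker strings are copied from the examples in this article; the function names and the optional `starting_text` argument are hypothetical.

```python
START, END = "<|startoftext|>", "<|endoftext|>"
KW_OPEN, KW_CLOSE = "~^", "~}"


def build_prompt(keywords, starting_text=""):
    """Build a generation prompt: start token, keyword block, optional opener."""
    prefix = f"{START}{KW_OPEN}{' '.join(keywords)}{KW_CLOSE}"
    return f"{prefix} {starting_text}" if starting_text else prefix


def format_training_example(keywords, bio):
    """Build one training line: keyword prompt, the bio text, then the end token."""
    return f"{build_prompt(keywords)} {bio}{END}"


print(build_prompt(["pizza", "cycling", "cats"]))
# <|startoftext|>~^pizza cycling cats~}
```

Because training and generation share the same prefix layout, the model sees at generation time exactly the kind of context it was fine-tuned on.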

The model will then start creating a bio that is related to at least some of these keywords. The generated sentences don’t always contain the keywords, however. To mitigate this, multiple bios are generated and the one that contains the most of the given keywords is shown. To strike a balance between bio quality and generation time, we let the model generate 10 bios and pick the best one from those.
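The selection step can be sketched as a simple best-of-N loop. Here `generate` stands in for a call to the fine-tuned model and is an assumption, as are the function names; the keyword check is a plain case-insensitive substring match, which is cruder than what a real system might use.

```python
from typing import Callable, Sequence


def keyword_hits(bio: str, keywords: Sequence[str]) -> int:
    """Count how many of the keywords appear in the bio (case-insensitive)."""
    text = bio.lower()
    return sum(1 for kw in keywords if kw.lower() in text)


def pick_best_bio(generate: Callable[[], str],
                  keywords: Sequence[str],
                  n_candidates: int = 10) -> str:
    """Generate n_candidates bios and return the one mentioning most keywords.

    `generate` is a placeholder for the model call; ties go to the
    earliest candidate.
    """
    candidates = [generate() for _ in range(n_candidates)]
    return max(candidates, key=lambda bio: keyword_hits(bio, keywords))
```

Raising `n_candidates` improves the chance that some bio covers every keyword, at the cost of a proportionally longer wait for the user.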

Ideas to develop the concept

Ideas to enhance this app concept include more sophisticated techniques for scoring images, such as emotion detection (are you smiling or not?) and detection of lighting quality, as well as some more fun detections, for instance spotting whether there’s a pet in the picture. Moreover, the bio generator could be improved to return more coherent bios, as the separate sentences sometimes contradict each other.



Written by

Sarah Hassan

Content creator and passionate storyteller. On a mission to tell Xomnia's story to the most diverse and wide audience.