This opportunity is based in Lausanne

Cross-lingual Toxicity Detection (Diploma Thesis/Internship)



Online interaction is now a prevalent part of many people's lives, and the youngest users, who use social media and instant messaging platforms the most, are also the most vulnerable. Because discussions can get heated, users may face personal insults, harassment or other kinds of hateful messages.

Several solutions to this problem are already in use: manual or keyword-based censorship, moderators or users flagging toxic messages, etc. But these systems either need constant surveillance by human moderators or can be abused (users flagging a message for no reason, or using new, non-blacklisted words to evade censorship).

The goal of this project is to give users a safe and healthy experience in online chats by implementing an intelligent moderation system that detects and identifies toxic messages to assist moderators in their task.


Challenges: The project must identify different classes of toxicity in online discussions correctly and in a timely manner, where messages can be produced in large volumes and with great variety. The internship also targets cross-lingual models, so it will be necessary to use aligned datasets to train a model on multiple languages. Moreover, the available annotated datasets might come from different sources and thus contain different labels, so being able to merge these datasets will be required as well.
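To make the dataset-merging challenge concrete, here is a minimal sketch in Python of one possible approach: mapping each source's labels onto a shared label set before training a multi-label classifier. The source names, label names and mappings below are purely hypothetical and are not taken from any actual dataset used in the project.

```python
from typing import Dict, List, Tuple

# Hypothetical unified label set for the merged corpus.
UNIFIED_LABELS: List[str] = ["insult", "threat", "hate"]

# Hypothetical per-source mappings from original labels to unified labels.
LABEL_MAPS: Dict[str, Dict[str, str]] = {
    "source_a": {"insult": "insult", "threat": "threat", "identity_hate": "hate"},
    "source_b": {"offensive": "insult", "hate_speech": "hate"},
}


def to_unified(source: str, labels: List[str]) -> List[int]:
    """Convert one example's source labels into a multi-hot vector."""
    mapping = LABEL_MAPS[source]
    # Labels with no counterpart in the unified set are dropped.
    active = {mapping[label] for label in labels if label in mapping}
    return [1 if name in active else 0 for name in UNIFIED_LABELS]


def merge(
    datasets: Dict[str, List[Tuple[str, List[str]]]]
) -> List[Tuple[str, List[int]]]:
    """Merge (text, labels) rows from several sources into one list."""
    merged = []
    for source, rows in datasets.items():
        for text, labels in rows:
            merged.append((text, to_unified(source, labels)))
    return merged
```

One design question this sketch leaves open is how to treat labels that a source never annotates: here they are encoded as 0, which conflates "absent" with "not annotated", so a real system might need to mask them during training instead.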


Project applications: Social media/online discussions/chatbots

What you will learn: You will be a junior data scientist, developing your skills in machine learning (deep learning, natural language processing).


Possible extensions: Integrating the model with a moderating chatbot to test it with users online.

Keywords: NLP, toxicity detection, multi-label classification, cross-lingual models, transfer learning, text embeddings

In this role

In this project, the goal is to:

  • Build a model able to detect and identify toxic messages
  • Use transfer learning to train the model on several languages

What we offer

Diploma Thesis / Internship in Lausanne. Join our team as an intern and you will find a young, dynamic and culturally diverse working environment.

About your profile

  • Required: machine learning and deep learning, NLP
  • Software engineering, Python, deep learning/ML libraries (Keras, TensorFlow, scikit-learn, NLTK, spaCy, etc.)

If you are interested in applying for this position, please send us your complete application (CV, cover letter, letter of reference, diplomas and certificates).
