Online interaction is now a prevalent part of many people's lives, and the youngest users, who use social media and instant messaging platforms the most, are also the most vulnerable. As discussions can get heated, users may face personal insults, harassment or other kinds of hateful messages.
Several solutions to this problem are already in use: manual or keyword-based censorship, moderators or users flagging toxic messages, etc. But these systems either need constant surveillance by human moderators or can be abused (users flagging a message for no reason, or using new non-blacklisted words to evade censorship).
The goal of this project is to guarantee users a safe and healthy experience in online chats by implementing an intelligent moderating system that detects and identifies toxic messages in order to assist moderators in their task.
Challenges: In this project, it is necessary to identify different classes of toxicity in online discussions correctly and in a timely manner, even though messages can be produced in large volumes and variety. The internship also aims towards cross-lingual models, so it will be necessary to use aligned datasets to train a model on multiple languages. Moreover, the available annotated datasets might come from different sources and thus contain different labels, so being able to merge these datasets will be required as well.
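To illustrate the dataset-merging challenge, here is a minimal Python sketch of one possible approach: mapping each source's labels onto a shared label set and encoding every message as a multi-hot vector suitable for multi-label classification. The label names, the source names and the helper functions are all hypothetical and chosen only for illustration; the project does not prescribe a label scheme or a merging method.

```python
from typing import Dict, List, Tuple

# Hypothetical unified label set (an assumption, not defined by the project).
UNIFIED_LABELS: List[str] = ["insult", "harassment", "hate"]

# Hypothetical mappings from each source's own label names to the unified set.
LABEL_MAPS: Dict[str, Dict[str, str]] = {
    "source_a": {"insult": "insult", "threat": "harassment", "identity_hate": "hate"},
    "source_b": {"offensive": "insult", "hateful": "hate"},
}

Example = Tuple[str, List[str]]  # (message text, source-specific labels)
Unified = Tuple[str, List[int]]  # (message text, multi-hot vector)


def to_multi_hot(labels: List[str], label_map: Dict[str, str]) -> List[int]:
    """Translate source labels to unified ones and encode them as 0/1 flags."""
    mapped = {label_map[name] for name in labels if name in label_map}
    return [1 if name in mapped else 0 for name in UNIFIED_LABELS]


def merge_datasets(datasets: Dict[str, List[Example]]) -> List[Unified]:
    """Concatenate all sources into one list that shares the unified label set."""
    merged: List[Unified] = []
    for source, examples in datasets.items():
        label_map = LABEL_MAPS[source]
        for text, labels in examples:
            merged.append((text, to_multi_hot(labels, label_map)))
    return merged


if __name__ == "__main__":
    toy = {
        "source_a": [("message 1", ["threat", "insult"])],
        "source_b": [("message 2", ["hateful"]), ("message 3", [])],
    }
    for text, vector in merge_datasets(toy):
        print(text, vector)
```

One limitation of this sketch: a source that has no equivalent for a unified label (here, "source_b" has nothing mapped to "harassment") yields a 0 for it, although the true value is unknown rather than negative. A real pipeline would probably need to mask such labels during training.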
Project applications: Social media/online discussions/chatbots
What you will learn: You will be a junior data scientist, developing your skills in machine learning (deep learning, natural language processing).
Possible extensions: Integrating the model with a moderating chatbot to test it with users online.
Keywords: NLP, toxicity detection, multi-label classification, cross-lingual models, transfer learning, text embeddings
In this role
In this project, the goal is to:
Build a model able to detect and identify toxic messages
Use transfer learning to train the model on several languages
What we offer
Diploma Thesis / Internship in Lausanne. Join our team as an intern and you will find a young, dynamic and culturally diverse working environment.
With 50+ years of history and over 1000 specialists, we offer a unique spectrum of experience, skills and technical innovations.