Call for Supporters: Research Thesis in NLP

Authors
  • Minh N. Ta

💡 This blog post is a call for up to 5 friends to join me in working on the detection of machine-generated content, a topic I will extend into my graduation research thesis by the end of next June. The research will be carried out at the Foundation Models Lab, BKAI Research Center.

Research topics

Easy access to large language models (LLMs) has led to a proliferation of machine-generated text, and it is now often hard to tell whether a piece of text was written by a human or generated by a machine. This raises concerns about potential misuse, particularly in educational and academic settings. It is therefore important to develop practical systems that can automate this detection.

In academia, and especially at HUST, students may overuse LLMs for their own purposes, which can erode their abilities. My aim is therefore to create a system that can detect machine-generated content in two domains:

  • Coding exercises in IT courses (Introduction to Programming, Data Structures and Algorithms, Applied Algorithms, etc.).
  • Student reports for their projects, especially thesis or capstone projects.

A note on the approach: I expect to use a small classifier to build a black-box detector (with acceptable quality), and a GAN-style architecture to build an explainable detector.

Some Related Publications

From me

  • Mervat Abassy*, Kareem Elozeiri*, Alexander Aziz*, Minh Ngoc Ta*, Raj Vardhan Tomar*, Bimarsha Adhikari*, Saad El Dine Ahmed*, et al. "LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection". In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 336–343, Miami, Florida, USA. Association for Computational Linguistics.
  • Yuxia Wang, et al. "GenAI Content Detection Task 1: English and Multilingual Machine-generated Text Detection: AI vs. Human". In Proceedings of the 31st International Conference on Computational Linguistics (COLING). Abu Dhabi, UAE: Association for Computational Linguistics, Jan. 2025. (to appear).
  • Two other publications are expected to appear at ACL/NAACL 2025.

From others

What I offer and what I need

What I need

  • A strong interest in NLP and LLM research.
  • A good background in math and programming (especially in Python).
  • Experience in implementing machine learning and deep learning algorithms.
  • A beautiful soul and an eagerness to learn.
  • Prior knowledge of LLMs or NLP is not required.

What I have and offer

  • Access to GPUs for research purposes.
  • Support for your own related projects.
  • Guidance on how to do research from A to Z.
  • Potentially, your name on a publication or on a scientific research competition entry with my supervisor.
  • No salary but unlimited funds for iced tea, bubble tea, etc. 😆

Supervisors and Extended Research Team

These people may not join our project directly, but they and I will be carrying out the same research:

My thesis supervisor:

The extended research team:

If you are interested in...

Please email me at minh@tnminh.com if you want to work with me or have further questions about this topic.

You can also comment on this blog post for further discussion.

❗️ Deadline for Application: 23:59 12/12/2024.