Below you will find current topics that can be worked on in the context of the Seminar Software Engineering, a Bachelor's Thesis, or a Master's Thesis. The context indicates the scope of the work, and the keywords provide further information about the topic and its domain.
The Apertus project from EPFL and ETH Zurich focuses on developing a Swiss-based Large Language Model (LLM) with strong multilingual capabilities. While the model performs competitively on general language tasks, it currently struggles with structured programming challenges.
Motivation
LLMs increasingly rely on synthetic data for continued improvement, as most publicly available code datasets (e.g., code scraped from GitHub) have already been used extensively to train existing models. However, ensuring the correctness and usefulness of synthetic code remains a major challenge.
This project proposes a competitive training framework inspired by GAN-like systems (a minimal sketch follows the list below):
One model generates synthetic code samples
Another model evaluates and tests correctness
Feedback is used to iteratively improve generation quality
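To illustrate how such a loop could look, here is a minimal sketch, not the project's prescribed design: `generator` stands for any callable wrapping the code-generating LLM, the evaluator is reduced to executing unit tests in a fresh interpreter, and all names are placeholders.

```python
import subprocess
import sys
import tempfile

def run_tests(candidate: str, test_code: str, timeout: int = 10) -> bool:
    """Run the candidate code together with its unit tests in a fresh
    interpreter; a zero exit status counts as passing."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate + "\n\n" + test_code)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False

def training_round(generator, tasks):
    """One generator-vs-evaluator round: samples that pass the tests
    become new training data; failures are kept as feedback signal."""
    accepted, rejected = [], []
    for prompt, test_code in tasks:
        candidate = generator(prompt)  # hypothetical LLM call
        bucket = accepted if run_tests(candidate, test_code) else rejected
        bucket.append((prompt, candidate))
    return accepted, rejected
```

In a full system the `rejected` bucket would drive the feedback step, e.g. as negative examples or as prompts for repair, which is exactly the part this project would design.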
A major challenge in evaluating modern LLMs is determining whether a model has previously seen benchmark data during training. This project focuses on detecting and mitigating training data contamination.
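One common heuristic for detecting contamination is n-gram overlap between benchmark items and the training corpus, similar in spirit to the decontamination checks reported for several public LLMs. A minimal sketch follows; whitespace tokenization and the window size n=8 are illustrative choices, not fixed parameters of the project.

```python
def ngrams(tokens, n=8):
    """All contiguous n-token windows of a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def contamination_score(benchmark_text: str, corpus_text: str, n: int = 8) -> float:
    """Fraction of the benchmark's n-grams that also appear in the
    training corpus; high overlap suggests the benchmark leaked into
    the training data."""
    bench = ngrams(benchmark_text.split(), n)
    if not bench:
        return 0.0
    return len(bench & ngrams(corpus_text.split(), n)) / len(bench)
```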
Among the fastest-growing application areas of LLMs are scientific computing, mathematics, and formal reasoning. However, current models still struggle with:
Mathematical proof generation
Symbolic reasoning
Scientific code correctness
Long-step logical inference
This project introduces students to new scientific benchmarks and explores how existing models can be extended to perform better on STEM-related tasks.
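To make "benchmark" concrete: most such benchmarks reduce to scoring model outputs against reference answers. Below is a minimal exact-match harness as an illustration; `model` and the (question, answer) pair format are assumptions, and real proof or code benchmarks need stronger checkers (proof assistants, test suites) than string matching.

```python
def exact_match_accuracy(model, problems):
    """Score a model on (question, reference_answer) pairs using
    normalized exact match; `model` is any callable that maps a
    question string to an answer string."""
    def normalize(s: str) -> str:
        return " ".join(s.strip().lower().split())
    correct = sum(normalize(model(q)) == normalize(a) for q, a in problems)
    return correct / len(problems)
```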