Current Topics

Below, you will find current topics that can be worked on as part of the Seminar Software Engineering, a Bachelor’s thesis, or a Master’s thesis. The context indicates the scope of the work, and the keywords give you further information about the topic and its domain.

 Apertus: Improving Coding Capabilities

Context

The Apertus project from EPFL and ETH focuses on developing a Swiss-based Large Language Model (LLM) with strong multilingual capabilities. While the model performs competitively on general language tasks, it currently struggles with structured programming challenges such as:

  • Long-horizon reasoning over multiple files
  • Code refactoring and abstraction
  • Repository-level understanding
  • Debugging and test-driven development
  • Reliable code generation under constraints

 Competitive Training of LLMs

Context

LLMs increasingly rely on synthetic data for continued improvement, as most publicly available datasets (e.g., GitHub) have already been extensively used in training existing models. However, ensuring the correctness and usefulness of synthetic code remains a major challenge.

Motivation

This project proposes a competitive training framework inspired by generative adversarial networks (GANs):

  • One model generates synthetic code samples
  • Another model evaluates and tests correctness
  • Feedback is used to iteratively improve generation quality
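
The loop described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the "generator" is a fixed pool of hypothetical candidate programs rather than a model, the "evaluator" is a small test suite, and the feedback step simply reweights candidates by their score. It is not the project's actual method.

```python
from typing import Dict, List, Tuple

# Hypothetical "generator" output: candidate implementations of an
# absolute-value function, given as source strings.
CANDIDATES: List[str] = [
    "def solve(x):\n    return x",
    "def solve(x):\n    return -x",
    "def solve(x):\n    return x if x >= 0 else -x",
]

# Hypothetical test suite used by the "evaluator": (input, expected output).
TESTS: List[Tuple[int, int]] = [(3, 3), (-4, 4), (0, 0)]


def evaluate(source: str) -> float:
    """Run a candidate and return the fraction of tests it passes."""
    namespace: dict = {}
    try:
        # Toy only: real systems would sandbox untrusted generated code.
        exec(source, namespace)
        solve = namespace["solve"]
        passed = sum(1 for arg, expected in TESTS if solve(arg) == expected)
    except Exception:
        # Code that fails to compile or crashes counts as fully incorrect.
        return 0.0
    return passed / len(TESTS)


def train(rounds: int = 3) -> Dict[str, float]:
    """Feedback loop: scale each candidate's weight by its score every round."""
    weights = {src: 1.0 for src in CANDIDATES}
    for _ in range(rounds):
        for src in CANDIDATES:
            weights[src] *= 0.5 + evaluate(src)
    return weights


weights = train()
best = max(weights, key=weights.get)
print(best)
```

In a real setup, both roles would be learned models and the feedback would update model parameters; the sketch only shows how test-based scores can steer which samples are preferred.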

 LLMs for Science

Context

One of the fastest-growing application areas of LLMs is scientific computing, mathematics, and formal reasoning. However, current models still struggle with:

  • Mathematical proof generation
  • Symbolic reasoning
  • Scientific code correctness
  • Long-step logical inference

This project introduces students to new scientific benchmarks and explores how existing models can be extended to perform better on STEM-related tasks.
