Agent-Based Code Repair | Current Topics
Context
Large language models (LLMs) have become popular over the last few years, one reason being the quality of the outputs these models generate. LLM agents go a step further: they allow LLMs to use memory, tools, and sequential thinking.
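To make the agent idea concrete for code repair, the following is a minimal sketch of a propose-test-retry loop. All names here (call_llm, run_tests, apply_patch) are hypothetical toy stand-ins, not part of any existing system; only the structure of the loop is the point.

    # Minimal agent-style repair loop (illustrative sketch, not a real system).
    # call_llm, run_tests, and apply_patch are toy stand-ins for an LLM client,
    # a test runner, and a patching step.

    def run_tests(source: str) -> tuple[bool, str]:
        # toy check standing in for a real test suite
        return ("def add(" in source, "NameError: function 'add' not found")

    def call_llm(prompt: str) -> str:
        # toy stand-in: a real agent would query an LLM with this prompt
        return "def add(a, b):\n    return a + b\n"

    def apply_patch(source: str, patch: str) -> str:
        return patch  # toy: replace the file wholesale

    def repair(source: str, max_rounds: int = 5) -> str:
        memory: list[str] = []                 # failure logs act as the agent's memory
        for _ in range(max_rounds):
            passed, log = run_tests(source)    # tool use: the test runner
            if passed:
                return source
            memory.append(log)                 # sequential thinking: carry context forward
            prompt = f"Fix this code:\n{source}\nFailures so far:\n" + "\n".join(memory)
            source = apply_patch(source, call_llm(prompt))
        return source

    print(repair("def ad(a, b):\n    return a + b\n"))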
Agentic AI System for Automated Systematic Literature Reviews | Current Topics
Context
This project operates within the domain of agentic AI, information retrieval, and research methodology automation. Systematic literature reviews (SLRs) are fundamental to evidence-based research, yet their execution remains largely manual and time-intensive. The project builds on an existing paper collection system that queries academic databases (arXiv, Semantic Scholar, DBLP) using structured configurations, and aims to transform it into a multi-agent pipeline where researchers provide a natural language topic description and receive a draft survey report with human oversight at critical decision points.
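A minimal sketch of such a pipeline is shown below. Every stage function is a hypothetical placeholder (only the database names come from the project description); the point is the human approval gate at each critical decision point.

    # Illustrative multi-agent SLR pipeline with human oversight gates.
    # All stage functions are hypothetical placeholders.

    def human_approves(stage: str, artifact) -> bool:
        print(f"[{stage}] produced: {artifact!r}")
        return input("approve? [y/n] ").strip().lower() == "y"

    def topic_to_queries(topic: str) -> list[str]:
        # an agent would turn a natural-language topic into structured queries
        return [topic]                          # toy: pass the topic through

    def search(queries: list[str]) -> list[str]:
        # would fan out to the arXiv, Semantic Scholar, and DBLP APIs
        return [f"paper matching '{q}'" for q in queries]

    def draft_survey(papers: list[str]) -> str:
        return "Draft survey over: " + "; ".join(papers)

    def run_slr(topic: str) -> str | None:
        queries = topic_to_queries(topic)
        if not human_approves("query generation", queries):
            return None                         # critical decision point
        papers = search(queries)
        if not human_approves("screening", papers):
            return None
        return draft_survey(papers)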
Context
Modern LLMs are increasingly enhanced with the ability to interact with external tools such as the following (a minimal dispatch sketch appears after the list):
- Code interpreters
- Search engines
- Databases
- Simulated environments
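One common pattern behind such tool use is a registry that maps tool names to functions and lets the model's structured output select the call. The sketch below uses toy tools and is not tied to any specific framework's API.

    # Minimal tool-dispatch loop (illustrative; not a specific framework's API).
    import json

    def run_python(code: str) -> str:           # toy "code interpreter"
        return str(eval(code, {}, {}))

    def search_web(query: str) -> str:          # hypothetical search stub
        return f"top result for '{query}'"

    TOOLS = {"run_python": run_python, "search_web": search_web}

    def dispatch(model_output: str) -> str:
        """The model emits JSON like {"tool": "run_python", "arg": "2 + 2"}."""
        call = json.loads(model_output)
        return TOOLS[call["tool"]](call["arg"])

    print(dispatch('{"tool": "run_python", "arg": "2 + 2"}'))   # -> 4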
Apertus: Improving Coding Capabilities | Current Topics
Context
The Apertus project from EPFL and ETH Zurich focuses on developing a Swiss-based Large Language Model (LLM) with strong multilingual capabilities. While the model performs competitively on general language tasks, it currently struggles with structured programming challenges such as:
- Long-horizon reasoning over multiple files
- Code refactoring and abstraction
- Repository-level understanding
- Debugging and test-driven development
- Reliable code generation under constraints
Competitive Training of LLMs | Current Topics
Context
LLMs increasingly rely on synthetic data for continued improvement, as most publicly available datasets (e.g., GitHub) have already been extensively used in training existing models. However, ensuring the correctness and usefulness of synthetic code remains a major challenge.
Motivation
This project proposes a competitive training framework inspired by GAN-like systems (a minimal sketch of one round follows the list):
- One model generates synthetic code samples
- Another model evaluates and tests correctness
- Feedback is used to iteratively improve generation quality
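The sketch below shows one round of such a loop under toy assumptions: generate and write_tests are hypothetical stand-ins for the two competing models, and correctness is checked by simply executing the candidate against the generated tests.

    # Sketch of one round of GAN-style competitive training for code.
    # generate and write_tests are hypothetical stand-ins for the two models.

    def generate(task: str) -> str:
        return "def square(x):\n    return x * x\n"   # toy generator output

    def write_tests(task: str) -> str:
        return "assert square(3) == 9"                # toy evaluator output

    def passes(code: str, tests: str) -> bool:
        env: dict = {}
        try:
            exec(code, env)          # run the candidate, then its tests
            exec(tests, env)
            return True
        except Exception:
            return False

    def training_round(tasks: list[str]) -> list[tuple[str, str]]:
        kept = []
        for task in tasks:
            code, tests = generate(task), write_tests(task)
            if passes(code, tests):
                kept.append((task, code))   # verified sample joins the dataset
            # failures would be fed back as negative signal to the generator
        return kept

    print(training_round(["square an integer"]))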
Consolidating Unstructured Knowledge into Structured Documentation | Current Topics
Context
Over the last few years, the usage of large language models (LLMs), Retrieval-Augmented Generation (RAG), and Agentic AI has increased, as has the quality of the generated outputs. While RAG enables LLMs to leverage (internal) company information to answer more complex and detailed questions, we face the issue that this information is not necessarily well-structured, centralized, or available in high quality (e.g., a significant amount of knowledge resides in emails, where the information is distributed across multiple conversations, folders, and mailboxes). Therefore, the quality of answers produced by RAG systems strongly depends on the quality of the available information.
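As one possible shape of the consolidation step, the sketch below groups scattered emails by conversation thread and distills each thread into a single structured note. summarize() is a hypothetical stand-in for an LLM call; the schema is purely illustrative.

    # Illustrative consolidation step: group scattered emails by thread and
    # distill each thread into one centralized note.
    from collections import defaultdict

    def summarize(texts: list[str]) -> str:
        return " / ".join(texts)        # toy: a real system would call an LLM here

    def consolidate(emails: list[dict]) -> dict[str, str]:
        threads = defaultdict(list)
        for mail in emails:             # emails arrive from many folders and mailboxes
            threads[mail["thread_id"]].append(mail["body"])
        # one structured, centralized note per conversation thread
        return {tid: summarize(bodies) for tid, bodies in threads.items()}

    notes = consolidate([
        {"thread_id": "T1", "body": "Decision: use PostgreSQL."},
        {"thread_id": "T1", "body": "Migration planned for Q3."},
    ])
    print(notes["T1"])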
Data Contamination of LLMs | Current Topics
Context
A major challenge in evaluating modern LLMs is determining whether a model has previously seen benchmark data during training. This project focuses on detecting and mitigating training data contamination.
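One widely used detection heuristic (among several; its use here is an assumption, not the project's prescribed method) is checking n-gram overlap between a benchmark item and the training corpus. A minimal version, with an illustrative threshold:

    # Minimal n-gram overlap check, a common contamination heuristic.

    def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
        toks = text.split()
        return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

    def contaminated(benchmark_item: str, training_doc: str,
                     n: int = 8, threshold: float = 0.5) -> bool:
        """Flag the item if many of its n-grams appear verbatim in training data."""
        item = ngrams(benchmark_item, n)
        if not item:
            return False
        overlap = len(item & ngrams(training_doc, n)) / len(item)
        return overlap >= threshold   # threshold is an illustrative choice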
Diffusion-based Code Language Model | Current Topics
Context
While standard code models generate text one token at a time (autoregressively), Diffusion Language Models (DLMs) generate and refine an entire block of code simultaneously. This allows the model to look ahead and fix structural errors in a non-linear fashion, turning inference into an online optimization problem.
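To make the contrast with autoregressive decoding concrete, the following caricature shows iterative unmasking: all positions start masked and are filled in confidence order over several refinement passes. The "model" is a random stub; only the non-autoregressive control flow is the point.

    # Caricature of diffusion-style decoding: refine a whole block at once by
    # repeatedly unmasking the most confident positions. predict() is a random stub.
    import random

    VOCAB = ["def", "f", "(", "x", ")", ":", "return", "x"]

    def predict(tokens: list) -> list[tuple[float, str]]:
        """Stub: per-position (confidence, token) guesses."""
        return [(random.random(), random.choice(VOCAB)) for _ in tokens]

    def decode(length: int = 8, steps: int = 4) -> list[str]:
        tokens: list = [None] * length            # fully masked start
        per_step = max(1, length // steps)
        while None in tokens:
            guesses = predict(tokens)
            masked = [i for i, t in enumerate(tokens) if t is None]
            # unmask the currently most confident masked positions
            masked.sort(key=lambda i: guesses[i][0], reverse=True)
            for i in masked[:per_step]:
                tokens[i] = guesses[i][1]
        return tokens

    print(decode())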
LLMs for Science | Current Topics
Context
One of the fastest-growing application areas of LLMs is scientific computing, mathematics, and formal reasoning. However, current models still struggle with:
- Mathematical proof generation
- Symbolic reasoning
- Scientific code correctness
- Long-step logical inference
This project introduces students to new scientific benchmarks and explores how existing models can be extended to perform better on STEM-related tasks.