 B.Sc. Thesis

  •  Behavior-Driven Development in the Age of AI-Assisted Programming |  Current Topics

    Context

    Behavior-Driven Development (BDD) is a software development approach that uses structured, natural-language specifications (typically written in the Gherkin language) to describe system behavior through concrete examples and scenarios. These specifications support a shared understanding between developers, testers, and domain experts and can be directly linked to automated tests.

    With the rise of “vibe coding” and Large Language Models (LLMs), software development is increasingly driven by informal prompts and rapid prototyping. While this enables fast development, it often lacks systematic specification and traceability. BDD offers a structured way to describe expected behavior and may serve as a high-quality input for AI-based code generation.

    This project is conducted in collaboration between multiple universities (FHNW and the University of Sannio, Italy) and investigates how BDD practices can be combined with modern LLM-based development.
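
    As a toy illustration of how such specifications link to automated tests, the sketch below parses a Gherkin-style scenario and dispatches each step to a registered handler. This is a minimal stand-in for frameworks such as behave or pytest-bdd; the decorator API, the scenario, and the cart example are all invented for illustration.

```python
import re

# A minimal Gherkin-style scenario (hypothetical example).
SCENARIO = """
Given a cart with 2 items
When the user adds 3 items
Then the cart contains 5 items
"""

# Step registry: regex pattern -> handler, in the spirit of BDD frameworks
# (the decorator API here is simplified, not behave's actual interface).
STEPS = []

def step(pattern):
    def register(fn):
        STEPS.append((re.compile(pattern), fn))
        return fn
    return register

@step(r"Given a cart with (\d+) items")
def given_cart(ctx, n):
    ctx["cart"] = int(n)

@step(r"When the user adds (\d+) items")
def when_add(ctx, n):
    ctx["cart"] += int(n)

@step(r"Then the cart contains (\d+) items")
def then_check(ctx, n):
    assert ctx["cart"] == int(n), "scenario failed"

def run_scenario(text):
    """Execute every non-empty line of the scenario against the registry."""
    ctx = {}
    for line in filter(None, (l.strip() for l in text.splitlines())):
        for pattern, fn in STEPS:
            match = pattern.fullmatch(line)
            if match:
                fn(ctx, *match.groups())
                break
        else:
            raise ValueError(f"no step definition matches: {line}")
    return ctx

result = run_scenario(SCENARIO)
```

    Because each step is plain natural language with a machine-checkable binding, the same scenario can serve both as documentation for domain experts and as input for test automation.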

  •  Feature Engineering for Classification-Based Merge Conflict Resolution |  Current Topics

    Context

    Merge conflict resolution remains a significant challenge in Git-based software development, as manual conflict resolutions slow down collaboration and reduce developer productivity. However, empirical research results suggest that a vast majority of chunk resolutions found in practice can be derived from a fixed set of conflict resolution patterns, combining the ours, theirs, and base parts of a conflicting chunk in a pre-defined way. These findings form the foundation for phrasing merge conflict resolution as a classification problem, and thus using traditional machine learning for predicting conflict resolutions.
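
    The fixed set of patterns mentioned above can be made concrete with a small sketch: parse a diff3-style conflict chunk into its ours, base, and theirs parts, then combine them in a pre-defined way. The chunk and the pattern names below are invented for illustration; the catalogs studied in the literature are larger.

```python
# Parse a Git diff3-style conflict chunk into its ours / base / theirs parts.
CONFLICT = """<<<<<<< ours
left_version()
||||||| base
original_version()
=======
right_version()
>>>>>>> theirs"""

def parse_chunk(text):
    parts = {"ours": [], "base": [], "theirs": []}
    current = None
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            current = "ours"
        elif line.startswith("|||||||"):
            current = "base"
        elif line.startswith("======="):
            current = "theirs"
        elif line.startswith(">>>>>>>"):
            current = None
        elif current is not None:
            parts[current].append(line)
    return parts

# A few fixed resolution patterns, each combining the parts in a
# pre-defined way (names are illustrative, not a standard taxonomy).
PATTERNS = {
    "take_ours":        lambda p: p["ours"],
    "take_theirs":      lambda p: p["theirs"],
    "take_base":        lambda p: p["base"],
    "ours_then_theirs": lambda p: p["ours"] + p["theirs"],
    "theirs_then_ours": lambda p: p["theirs"] + p["ours"],
}

parts = parse_chunk(CONFLICT)
resolved = PATTERNS["ours_then_theirs"](parts)
```

    Under this framing, a classifier only has to predict the pattern label for a chunk rather than generate resolution text token by token.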

  •  RL-based Training for Code in LLMs |  Current Topics

    Context

    Large Language Models (LLMs) have shown strong performance in code generation, completion, and repair tasks. However, supervised pretraining on massive code corpora is limited by data quality, lack of explicit feedback, and the inability to capture correctness beyond next-token prediction. Recent research has explored Reinforcement Learning (RL) based training approaches to refine LLMs for code. By leveraging feedback signals—such as compilation success, test case execution, or static analysis warnings—models can be trained to better align with correctness and developer intent.
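
    One way to picture such a feedback signal is a scalar reward built from compilation and test execution. The sketch below is a toy: the reward weights and the use of Python's compile/exec as stand-ins for a build-and-test harness are illustrative assumptions, not a recipe from the literature.

```python
# Toy reward signal for RL-based code training: a candidate program earns
# partial reward for parsing and executing, plus reward per passing check.

def reward(candidate_source, checks):
    """Return a scalar reward in [0, 1] for a candidate program."""
    try:
        code = compile(candidate_source, "<candidate>", "exec")
    except SyntaxError:
        return 0.0                      # no reward if it does not even parse
    namespace = {}
    try:
        exec(code, namespace)
    except Exception:
        return 0.1                      # small reward for parsing only
    passed = 0
    for check in checks:                # each check is a predicate on the namespace
        try:
            if check(namespace):
                passed += 1
        except Exception:
            pass
    return 0.2 + 0.8 * passed / len(checks)

good = "def add(a, b):\n    return a + b\n"
buggy = "def add(a, b):\n    return a - b\n"
checks = [
    lambda ns: ns["add"](2, 3) == 5,
    lambda ns: ns["add"](-1, 1) == 0,
]
```

    A policy-gradient or preference-based trainer would then optimize the model against this signal instead of pure next-token likelihood.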

  •  Trust-VR: Feasibility of AI Architectures for Virtual Pediatric Patients |  Current Topics

    Context

    Managing distressed pediatric patients in clinical environments is a challenging yet critical skill for healthcare professionals. Patients exhibit diverse emotional responses—ranging from anxiety and shyness to outright resistance—making it essential for clinicians to adapt their approach. Traditional training methods often lack the realism and variability needed to prepare professionals for these high-stakes interactions, particularly when it comes to emotional and behavioral dynamics in children.

 M.Sc. Thesis

  •  Behavior-Driven Development in the Age of AI-Assisted Programming |  Current Topics

    Context

    Behavior-Driven Development (BDD) is a software development approach that uses structured, natural-language specifications (typically written in the Gherkin language) to describe system behavior through concrete examples and scenarios. These specifications support a shared understanding between developers, testers, and domain experts and can be directly linked to automated tests.

    With the rise of “vibe coding” and Large Language Models (LLMs), software development is increasingly driven by informal prompts and rapid prototyping. While this enables fast development, it often lacks systematic specification and traceability. BDD offers a structured way to describe expected behavior and may serve as a high-quality input for AI-based code generation.

    This project is conducted in collaboration between multiple universities (FHNW and the University of Sannio, Italy) and investigates how BDD practices can be combined with modern LLM-based development.

  •  Deep Learning for Software Merge Conflict Resolution |  Current Topics

    Context

    Merge conflict resolution is a critical challenge in software development, particularly in large, collaborative projects that use version control systems like Git. When multiple developers modify the same part of a codebase, conflicts arise that require manual intervention. Existing automated resolution strategies often rely on rule-based approaches or traditional machine learning models, which struggle with complex and ambiguous cases. Deep learning has the potential to improve conflict resolution by learning patterns from historical merge conflicts and predicting optimal resolution strategies. However, identifying the most effective deep learning architecture for this task remains an open question.

  •  Energy-Efficient Configuration of Embedded Data-Processing Systems in Public Transportation |  Current Topics

    Context

    Modern public transportation vehicles, such as trams, buses, trolleybuses and trains, increasingly rely on on-board computing units to process and securely transfer large volumes of data generated by sensors and surveillance cameras. These systems often operate on limited battery power during night-time parking, when vehicles are disconnected from external energy sources. During this time window, the on-board computer must complete several computationally intensive tasks—such as software updates, video decoding, compression, encryption, and data upload—before service resumes.

    In collaboration with Supercomputing Systems AG (SCS) and a public transportation company in Romandie, this project addresses the challenge of executing these tasks reliably under strict energy and time constraints. Understanding how to configure the embedded system and how to select optimal communication protocols for data transfer in order to remain both energy-efficient and predictable is essential for dependable fleet operations.
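
    A minimal sketch of the underlying scheduling question, with invented task names, durations, and power draws: given a night-time window and an energy budget, which tasks fit? The cheapest-first greedy rule below is an arbitrary baseline for illustration, not the project's method; a real configuration would also weigh task priorities and deadlines.

```python
# Toy feasibility check for night-time task scheduling under energy and
# time budgets. All numbers are invented for illustration.

TASKS = [
    # (name, duration_h, power_W)
    ("software_update", 0.5, 40.0),
    ("video_compression", 2.0, 60.0),
    ("encryption", 0.5, 30.0),
    ("data_upload", 1.5, 20.0),
]

def plan(tasks, time_budget_h, energy_budget_wh):
    """Greedily schedule tasks, cheapest energy cost first, within budgets."""
    schedule, time_used, energy_used = [], 0.0, 0.0
    for name, dur, power in sorted(tasks, key=lambda t: t[1] * t[2]):
        energy = dur * power
        if time_used + dur <= time_budget_h and energy_used + energy <= energy_budget_wh:
            schedule.append(name)
            time_used += dur
            energy_used += energy
    return schedule, time_used, energy_used

schedule, t_used, e_used = plan(TASKS, time_budget_h=4.0, energy_budget_wh=150.0)
```

    With these numbers, video compression (120 Wh alone) does not fit, which is exactly the kind of configuration decision the project studies.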

  •  Feature Engineering for Classification-Based Merge Conflict Resolution |  Current Topics

    Context

    Merge conflict resolution remains a significant challenge in Git-based software development, as manual conflict resolutions slow down collaboration and reduce developer productivity. However, empirical research results suggest that a vast majority of chunk resolutions found in practice can be derived from a fixed set of conflict resolution patterns, combining the ours, theirs, and base parts of a conflicting chunk in a pre-defined way. These findings form the foundation for phrasing merge conflict resolution as a classification problem, and thus using traditional machine learning for predicting conflict resolutions.

  •  Predicting Merge Conflict Resolutions: WSRC vs. Random Forest |  Current Topics

    Context

    Merge conflict resolution remains a significant challenge in Git-based software development, as manual conflict resolutions slow down collaboration and reduce developer productivity. However, empirical research results suggest that a vast majority of chunk resolutions found in practice can be derived from a fixed set of conflict resolution patterns, combining the ours, theirs, and base parts of a conflicting chunk in a pre-defined way. These findings form the foundation for phrasing merge conflict resolution as a classification problem, and thus using traditional machine learning for predicting the correct resolution.

  •  RL-based Training for Code in LLMs |  Current Topics

    Context

    Large Language Models (LLMs) have shown strong performance in code generation, completion, and repair tasks. However, supervised pretraining on massive code corpora is limited by data quality, lack of explicit feedback, and the inability to capture correctness beyond next-token prediction. Recent research has explored Reinforcement Learning (RL) based training approaches to refine LLMs for code. By leveraging feedback signals—such as compilation success, test case execution, or static analysis warnings—models can be trained to better align with correctness and developer intent.

  •  Trust-VR: Feasibility of AI Architectures for Virtual Pediatric Patients |  Current Topics

    Context

    Managing distressed pediatric patients in clinical environments is a challenging yet critical skill for healthcare professionals. Patients exhibit diverse emotional responses—ranging from anxiety and shyness to outright resistance—making it essential for clinicians to adapt their approach. Traditional training methods often lack the realism and variability needed to prepare professionals for these high-stakes interactions, particularly when it comes to emotional and behavioral dynamics in children.

 Seminar

  •  Agent-Based Code Repair |  Current Topics

    Context

    Large language models (LLMs) have become popular over the last few years, one of the reasons being the quality of the outputs these models generate. LLM agents go a step further: they allow LLMs to use memory, tools, and sequential thinking.
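
    The basic loop of an agent that repairs code can be sketched as: propose a fix, run the tests, stop once they pass. The "model" below is a stub that cycles through canned patches; a real agent would query an LLM and feed test output back into the prompt.

```python
# Minimal agent-style repair loop with a stub in place of the LLM.

def run_tests(source, tests):
    """Execute the candidate and return True if every test predicate passes."""
    namespace = {}
    try:
        exec(source, namespace)
        return all(t(namespace) for t in tests)
    except Exception:
        return False

def repair(broken_source, tests, propose_fix, max_iters=5):
    """Iterate: test the candidate, and if it fails, ask for a new patch."""
    candidate = broken_source
    for attempt in range(max_iters):
        if run_tests(candidate, tests):
            return candidate, attempt
        candidate = propose_fix(candidate, attempt)
    return None, max_iters

# Canned proposals standing in for LLM output (first is still wrong).
PATCHES = [
    "def absval(x):\n    return x\n",
    "def absval(x):\n    return -x if x < 0 else x\n",
]
propose = lambda src, i: PATCHES[i % len(PATCHES)]

tests = [lambda ns: ns["absval"](-3) == 3, lambda ns: ns["absval"](2) == 2]
fixed, attempts = repair("def absval(x):\n    return -x\n", tests, propose)
```

    The interesting research questions sit inside `propose_fix`: what context the agent retrieves, which tools it calls, and how it reasons over failing test output.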

  •  Agentic AI System for Automated Systematic Literature Reviews |  Current Topics

    Context

    This project operates within the domain of agentic AI, information retrieval, and research methodology automation. Systematic literature reviews (SLRs) are fundamental to evidence-based research, yet their execution remains largely manual and time-intensive. The project builds on an existing paper collection system that queries academic databases (arXiv, Semantic Scholar, DBLP) using structured configurations, and aims to transform it into a multi-agent pipeline where researchers provide a natural language topic description and receive a draft survey report with human oversight at critical decision points.

  •  Agentic LLMs |  Current Topics

    Context

    Modern LLMs are increasingly enhanced with the ability to interact with external tools such as:

    • Code interpreters
    • Search engines
    • Databases
    • Simulated environments
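
    A minimal version of this tool-use pattern is a dispatch loop: the model emits a tool call, the runtime executes it, and the observation is fed back. The stub trajectory and tool names below are invented; production systems use the function-calling interfaces of the respective model APIs.

```python
import math

# Tool registry: name -> callable. Both tools are stubs for illustration.
TOOLS = {
    "calculator": lambda expr: eval(expr, {"__builtins__": {}}, {"sqrt": math.sqrt}),
    "search": lambda query: f"[stub result for: {query}]",
}

def agent_step(action):
    """Execute one tool call requested by the model."""
    tool = TOOLS.get(action["tool"])
    if tool is None:
        return {"error": f"unknown tool {action['tool']!r}"}
    return {"observation": tool(action["input"])}

# A canned "model trajectory": the stub model asks for two tool calls.
trajectory = [
    {"tool": "calculator", "input": "sqrt(144) + 3"},
    {"tool": "search", "input": "CARLA simulator"},
]
observations = [agent_step(a) for a in trajectory]
```

    In a real agent, each observation is appended to the model's context so the next action can depend on it.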

  •  Animating Virtual Children: Realistic Behaviors for VR Training in Pediatric Care |  Current Topics

    Context

    Managing distressed patients in clinical environments is a challenging yet critical skill for healthcare professionals. Patients exhibit diverse emotional responses, ranging from anxiety and shyness to outright resistance, making it essential for clinicians to adapt their approach. Traditional training methods often lack the realism needed to prepare professionals for these high-stakes interactions.

  •  Apertus: Improving Coding Capabilities |  Current Topics

    Context

    The Apertus project, led by EPFL and ETH Zurich, focuses on developing a Swiss Large Language Model (LLM) with strong multilingual capabilities. While the model performs competitively on general language tasks, it currently struggles with structured programming challenges such as:

    • Long-horizon reasoning over multiple files
    • Code refactoring and abstraction
    • Repository-level understanding
    • Debugging and test-driven development
    • Reliable code generation under constraints

  •  Behavior-Driven Development in the Age of AI-Assisted Programming |  Current Topics

    Context

    Behavior-Driven Development (BDD) is a software development approach that uses structured, natural-language specifications (typically written in the Gherkin language) to describe system behavior through concrete examples and scenarios. These specifications support a shared understanding between developers, testers, and domain experts and can be directly linked to automated tests.

    With the rise of “vibe coding” and Large Language Models (LLMs), software development is increasingly driven by informal prompts and rapid prototyping. While this enables fast development, it often lacks systematic specification and traceability. BDD offers a structured way to describe expected behavior and may serve as a high-quality input for AI-based code generation.

    This project is conducted in collaboration between multiple universities (FHNW and the University of Sannio, Italy) and investigates how BDD practices can be combined with modern LLM-based development.

  •  Competitive Training of LLMs |  Current Topics

    Context

    LLMs increasingly rely on synthetic data for continued improvement, as most publicly available datasets (e.g., GitHub) have already been extensively used in training existing models. However, ensuring the correctness and usefulness of synthetic code remains a major challenge.

    Motivation

    This project proposes a competitive training framework inspired by GAN-like systems:

    • One model generates synthetic code samples
    • Another model evaluates and tests correctness
    • Feedback is used to iteratively improve generation quality
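
    A deterministic toy version of this loop, with stubs standing in for both models: a generator proposes candidate function bodies, an evaluator runs hidden checks, and only the survivors are proposed in the next round.

```python
# Toy GAN-style loop for synthetic code. Both sides are deterministic
# stubs for illustration; real systems would use LLMs on both sides.

def generator(i, feedback):
    """Propose a function body, cycling through candidates (stub)."""
    candidates = feedback or ["return a - b", "return a * b", "return a + b"]
    return f"def add(a, b):\n    {candidates[i % len(candidates)]}\n"

def evaluator(source):
    """Run hidden checks; only correct samples survive."""
    namespace = {}
    exec(source, namespace)
    return namespace["add"](2, 3) == 5 and namespace["add"](0, 7) == 7

def competitive_round(feedback, n_samples=6):
    survivors = []
    for i in range(n_samples):
        sample = generator(i, feedback)
        if evaluator(sample):
            survivors.append(sample.splitlines()[1].strip())
    return survivors

round1 = competitive_round(feedback=None)    # only correct bodies survive
round2 = competitive_round(feedback=round1)  # generator now proposes survivors only
```

    In a trained system the feedback would adjust model weights rather than a candidate list, but the generate-evaluate-improve cycle is the same.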

  •  Consolidating Unstructured Knowledge into Structured Documentation |  Current Topics

    Context

    Over the last few years, the usage of large language models (LLMs), Retrieval-Augmented Generation (RAG), and Agentic AI has increased, as has the quality of the generated outputs. While RAG enables LLMs to leverage (internal) company information to answer more complex and detailed questions, we face the issue that this information is not necessarily well-structured, centralized, or available in high quality (e.g., a significant amount of knowledge resides in emails, where the information is distributed across multiple conversations, folders, and mailboxes). Therefore, the quality of answers produced by RAG systems strongly depends on the quality of the available information.
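
    The retrieval stage of such a RAG pipeline can be sketched with simple term-overlap scoring. The documents and query below are invented, and real pipelines would use dense embeddings rather than word counts; the point is only that answer quality hinges on which sources are surfaced.

```python
from collections import Counter

# Invented corpus: knowledge scattered across emails and a wiki page.
DOCS = {
    "email_1": "the deployment failed because the api key expired last week",
    "email_2": "lunch menu for the cafeteria next monday",
    "wiki_page": "rotate the api key every 90 days to avoid expiry",
}

def score(query, doc):
    """Count overlapping word occurrences between query and document."""
    q = Counter(query.lower().split())
    d = Counter(doc.lower().split())
    return sum((q & d).values())

def retrieve(query, docs, k=2):
    """Return the names of the k highest-scoring documents."""
    ranked = sorted(docs, key=lambda name: score(query, docs[name]), reverse=True)
    return ranked[:k]

top = retrieve("why did the api key expire", DOCS)
```

    Consolidating the two relevant fragments into one structured page is exactly the step this project targets: the retriever can only work with the sources that exist.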

  •  Data Contamination of LLMs |  Current Topics

    Context

    A major challenge in evaluating modern LLMs is determining whether a model has previously seen benchmark data during training. This project focuses on detecting and mitigating training data contamination.
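
    A crude but common first signal is n-gram overlap between a benchmark item and the training corpus. The sketch below is illustrative; the choice of n and any decision threshold are assumptions, and serious detection work uses stronger methods (e.g., membership-inference-style tests).

```python
# Toy contamination signal: fraction of a benchmark item's n-grams that
# also occur somewhere in the training corpus.

def ngrams(text, n=5):
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_score(benchmark_item, corpus, n=5):
    bench = ngrams(benchmark_item, n)
    if not bench:
        return 0.0
    corpus_ngrams = set().union(*(ngrams(doc, n) for doc in corpus))
    return len(bench & corpus_ngrams) / len(bench)

corpus = ["the quick brown fox jumps over the lazy dog near the river bank"]
seen = "quick brown fox jumps over the lazy dog"
unseen = "a completely different sentence about merge conflicts and graphs"

score_seen = overlap_score(seen, corpus)
score_unseen = overlap_score(unseen, corpus)
```

    A high overlap score flags a benchmark item as potentially memorized rather than solved, which is the distinction contamination studies try to make.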

  •  Deep Learning for Software Merge Conflict Resolution |  Current Topics

    Context

    Merge conflict resolution is a critical challenge in software development, particularly in large, collaborative projects that use version control systems like Git. When multiple developers modify the same part of a codebase, conflicts arise that require manual intervention. Existing automated resolution strategies often rely on rule-based approaches or traditional machine learning models, which struggle with complex and ambiguous cases. Deep learning has the potential to improve conflict resolution by learning patterns from historical merge conflicts and predicting optimal resolution strategies. However, identifying the most effective deep learning architecture for this task remains an open question.

  •  Diffusion-based Code Language Model |  Current Topics

    Context

    While standard code models generate text one token at a time (autoregressive), Diffusion Language Models (DLMs) generate and refine the entire block of code simultaneously. This allows the model to look ahead and fix structural errors in a non-linear fashion, where inference becomes an online optimization problem.
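
    The refinement idea can be illustrated with a toy loop that starts from a fully masked sequence and repeatedly commits the positions a stub "model" is most confident about. The confidence scores and the target snippet are invented; a real DLM predicts token distributions and is trained, not looked up.

```python
# Toy non-autoregressive refinement: fill a masked sequence in several
# passes, most confident positions first. The scorer is a stub lookup.

MASK = "<mask>"
TARGET = ["def", "square", "(", "x", ")", ":", "return", "x", "*", "x"]

def stub_scores(sequence):
    """Per-position (token, confidence) predictions; stands in for a DLM."""
    structural = {"def", "(", ")", ":", "return"}
    # Arbitrary assumption: the stub is more confident about structure.
    return [(tok, 0.9 if tok in structural else 0.5) for tok in TARGET]

def refine(sequence, steps):
    seq = list(sequence)
    for _ in range(steps):
        preds = stub_scores(seq)
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        # Commit roughly the better half of the remaining masked positions.
        masked.sort(key=lambda i: -preds[i][1])
        for i in masked[: (len(masked) + 1) // 2]:
            seq[i] = preds[i][0]
    return seq

out = refine([MASK] * len(TARGET), steps=4)
```

    Because every position is revisited in each pass, the model can settle structural tokens first and local details later, which is the non-linear behavior the paragraph above describes.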

  •  Energy-Aware Environment Configuration in Simulation-based Testing of Autonomous Vehicles |  Current Topics

    Context

    Autonomous vehicles (AVs) are complex cyber-physical systems that require extensive validation to ensure safety and reliability. Since real-world testing is expensive and potentially unsafe, simulation-based testing using platforms like CARLA has become a key component of AV software validation. These simulators reproduce realistic traffic scenarios, sensors, and environmental conditions, but they are computationally intensive and consume significant amounts of energy. As large-scale simulation campaigns become common (e.g., thousands of tests in continuous integration pipelines), improving the energy efficiency of simulation-based testing becomes increasingly important for sustainable software engineering.

  •  Feature Engineering for Classification-Based Merge Conflict Resolution |  Current Topics

    Context

    Merge conflict resolution remains a significant challenge in Git-based software development, as manual conflict resolutions slow down collaboration and reduce developer productivity. However, empirical research results suggest that a vast majority of chunk resolutions found in practice can be derived from a fixed set of conflict resolution patterns, combining the ours, theirs, and base parts of a conflicting chunk in a pre-defined way. These findings form the foundation for phrasing merge conflict resolution as a classification problem, and thus using traditional machine learning for predicting conflict resolutions.

  •  LLMs for Science |  Current Topics

    Context

    One of the fastest-growing application areas of LLMs is scientific computing, mathematics, and formal reasoning. However, current models still struggle with:

    • Mathematical proof generation
    • Symbolic reasoning
    • Scientific code correctness
    • Long-step logical inference

    This project introduces students to new scientific benchmarks and explores how existing models can be extended to perform better on STEM-related tasks.

  •  Predicting Merge Conflict Resolutions: WSRC vs. Random Forest |  Current Topics

    Context

    Merge conflict resolution remains a significant challenge in Git-based software development, as manual conflict resolutions slow down collaboration and reduce developer productivity. However, empirical research results suggest that a vast majority of chunk resolutions found in practice can be derived from a fixed set of conflict resolution patterns, combining the ours, theirs, and base parts of a conflicting chunk in a pre-defined way. These findings form the foundation for phrasing merge conflict resolution as a classification problem, and thus using traditional machine learning for predicting the correct resolution.

  •  RL-based Training for Code in LLMs |  Current Topics

    Context

    Large Language Models (LLMs) have shown strong performance in code generation, completion, and repair tasks. However, supervised pretraining on massive code corpora is limited by data quality, lack of explicit feedback, and the inability to capture correctness beyond next-token prediction. Recent research has explored Reinforcement Learning (RL) based training approaches to refine LLMs for code. By leveraging feedback signals—such as compilation success, test case execution, or static analysis warnings—models can be trained to better align with correctness and developer intent.

  •  Trust-VR: Feasibility of AI Architectures for Virtual Pediatric Patients |  Current Topics

    Context

    Managing distressed pediatric patients in clinical environments is a challenging yet critical skill for healthcare professionals. Patients exhibit diverse emotional responses—ranging from anxiety and shyness to outright resistance—making it essential for clinicians to adapt their approach. Traditional training methods often lack the realism and variability needed to prepare professionals for these high-stakes interactions, particularly when it comes to emotional and behavioral dynamics in children.
