Behavior-Driven Development in the Age of AI-Assisted Programming
Context
Behavior-Driven Development (BDD) is a software development approach that uses structured, natural-language specifications (typically written in the Gherkin language) to describe system behavior through concrete examples and scenarios. These specifications support shared understanding between developers, testers, and domain experts and can be directly linked to automated tests.
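To make the link between a Gherkin scenario and automated tests concrete, the following is a minimal sketch in Python. It does not use a real BDD framework: the `step` decorator, the `run_scenario` function, and the shopping-cart scenario are all hypothetical and invented for illustration. Tools such as Cucumber or behave provide this kind of matching in a far more complete form.

```python
import re

# Hypothetical Gherkin scenario, invented for illustration only.
FEATURE = """\
Feature: Shopping cart
  Scenario: Adding items increases the total
    Given an empty cart
    When I add an item priced at 5
    And I add an item priced at 7
    Then the cart total is 12
"""

STEPS = []  # registered (compiled pattern, function) pairs


def step(pattern):
    """Register a function as the implementation of a step pattern."""
    def decorator(func):
        STEPS.append((re.compile(pattern + r"$"), func))
        return func
    return decorator


@step(r"an empty cart")
def empty_cart(ctx):
    ctx["prices"] = []


@step(r"I add an item priced at (\d+)")
def add_item(ctx, price):
    ctx["prices"].append(int(price))


@step(r"the cart total is (\d+)")
def check_total(ctx, expected):
    assert sum(ctx["prices"]) == int(expected)


def run_scenario(text):
    """Run every Given/When/Then/And/But line; return the number of steps run."""
    ctx = {}
    executed = 0
    for line in text.splitlines():
        keyword = re.match(r"\s*(Given|When|Then|And|But)\s+(.*)", line)
        if not keyword:
            continue  # skip Feature/Scenario headers and blank lines
        body = keyword.group(2).strip()
        for pattern, func in STEPS:
            found = pattern.match(body)
            if found:
                func(ctx, *found.groups())
                executed += 1
                break
        else:
            raise LookupError(f"no step defined for: {body}")
    return executed
```

Calling `run_scenario(FEATURE)` executes the four steps in order and raises an `AssertionError` if the expected total does not hold, which is the sense in which the specification doubles as a test.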
With the rise of “vibe coding” and Large Language Models (LLMs), software development is increasingly driven by informal prompts and rapid prototyping. While this enables fast development, it often lacks systematic specification and traceability. BDD offers a structured way to describe expected behavior and may serve as a high-quality input for AI-based code generation.
This project is conducted in collaboration between multiple universities (FHNW, the University of Sannio, Italy) and investigates how BDD practices can be combined with modern LLM-based development.
Motivation
We seek to answer the following research questions.
RQ1: Practitioner Experience with Gherkin Refactoring
What are practitioners’ experiences regarding Gherkin usage and refactoring?
- Analyze existing survey data
- Study how often and why scenarios are refactored
- Assess the need for automated refactoring support
RQ2: Code Generation from Structured vs. Informal Specifications
Can LLMs generate better code from BDD specifications than from informal natural-language descriptions?
- Design coding tasks of varying difficulty
- Compare BDD-based vs. informal inputs
- Evaluate multiple open-source coding LLMs
- Analyze code quality and correctness
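One possible way to measure correctness in such a comparison is the fraction of test cases that a generated solution passes. The sketch below is an assumption about how this could be set up, not the project's actual protocol: the `pass_rate` function, the leap-year task, and the two candidate solutions are hypothetical, and the candidates are hand-written placeholders rather than real model output. No LLM is called, and nothing here indicates which input style performs better.

```python
def pass_rate(source, func_name, cases):
    """Return the fraction of (args, expected) cases the candidate code passes."""
    namespace = {}
    try:
        # Generated code is untrusted; in a real study, run it in a sandbox.
        exec(source, namespace)
        func = namespace[func_name]
    except Exception:
        return 0.0  # code that fails to load, or lacks the function, passes nothing
    passed = 0
    for args, expected in cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a runtime error counts as a failed case
    return passed / len(cases)


# Hypothetical task: decide whether a year is a leap year.
CASES = [((2024,), True), ((2023,), False), ((1900,), False), ((2000,), True)]

# Hand-written placeholders standing in for code generated from two prompts.
CANDIDATE_A = """
def is_leap_year(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
"""

CANDIDATE_B = """
def is_leap_year(year):
    return year % 4 == 0
"""

if __name__ == "__main__":
    for name, code in [("A", CANDIDATE_A), ("B", CANDIDATE_B)]:
        print(name, pass_rate(code, "is_leap_year", CASES))
```

In an actual experiment, the same test cases would be applied to every model and input style so that pass rates are comparable across conditions; code quality would need separate measures.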
RQ3: Automated Gherkin Refactoring with LLMs (Conditional)
Can LLMs automate Gherkin refactoring according to best practices?
- Apply prompting strategies to refactoring tasks
- Compare different open-source LLMs
- Validate results with human experts
For a seminar or bachelor’s thesis, students will focus on RQ1 and RQ2; for a master’s thesis, they will additionally address RQ3.
Goal
The key tasks are:
- Analysis of survey data
- Literature review on BDD and LLM-based programming
- Design of controlled experiments
- Evaluation of generated code and specifications
- Expert validation
The expected outcomes are:
- Empirical insights into BDD maintenance practices
- Evidence on the impact of structured specifications on AI-generated code
- Assessment of LLMs for automated refactoring
- Practical recommendations for AI-supported development
Requirements
-
Pointers
-
Contact
- Nitish Patkar, FHNW, nitish.patkar@fhnw.ch
- Sebastiano Panichella