Behavior-Driven Development in the Age of AI-Assisted Programming

Context

Behavior-Driven Development (BDD) is a software development approach that uses structured, natural-language specifications (typically written in the Gherkin language) to describe system behavior through concrete examples and scenarios. These specifications support a shared understanding among developers, testers, and domain experts, and can be linked directly to automated tests.
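To make the link between a Gherkin scenario and an automated test concrete, the following is a minimal sketch in Python. The shopping-cart scenario, the `step` decorator, and `run_scenario` are all hypothetical and invented for illustration; real projects would normally use an established BDD framework such as Cucumber or behave rather than a hand-written runner like this one.

```python
import re

# Hypothetical Gherkin scenario, invented for illustration only.
FEATURE = """\
Feature: Shopping cart
  Scenario: Adding items increases the total
    Given an empty cart
    When I add an item priced at 5
    And I add an item priced at 7
    Then the cart total is 12
"""

STEPS = []  # list of (compiled pattern, handler) pairs


def step(pattern):
    """Register a handler for step text matching the given regex."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register


@step(r"an empty cart")
def empty_cart(ctx):
    ctx["items"] = []


@step(r"I add an item priced at (\d+)")
def add_item(ctx, price):
    ctx["items"].append(int(price))


@step(r"the cart total is (\d+)")
def check_total(ctx, expected):
    assert sum(ctx["items"]) == int(expected)


def run_scenario(text):
    """Run every Given/When/Then/And/But line; return the number of steps run."""
    ctx = {}
    executed = 0
    for line in text.splitlines():
        match = re.match(r"\s*(Given|When|Then|And|But)\s+(.*)", line)
        if not match:
            continue  # Feature/Scenario headers and blank lines
        body = match.group(2).strip()
        for pattern, handler in STEPS:
            found = pattern.fullmatch(body)
            if found:
                handler(ctx, *found.groups())
                executed += 1
                break
        else:
            raise LookupError(f"no step definition for: {body}")
    return executed
```

Calling `run_scenario(FEATURE)` executes the four steps in order; a scenario whose expected total does not match raises an `AssertionError`, which is how the specification doubles as a test.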

With the rise of “vibe coding” and Large Language Models (LLMs), software development is increasingly driven by informal prompts and rapid prototyping. While this enables fast development, it often lacks systematic specification and traceability. BDD offers a structured way to describe expected behavior and may serve as a high-quality input for AI-based code generation.

This project is a collaboration between FHNW and the University of Sannio, Italy, and investigates how BDD practices can be combined with modern LLM-based development.

Motivation

We seek to answer the following research questions.

RQ1: Practitioner Experience with Gherkin Refactoring
What are practitioners’ experiences regarding Gherkin usage and refactoring?

Analyze existing survey data
Study how often and why scenarios are refactored
Assess the need for automated refactoring support

RQ2: Code Generation from Structured vs. Informal Specifications
Can LLMs generate better code from BDD specifications than from informal natural-language descriptions?

Design coding tasks of varying difficulty
Compare BDD-based vs. informal inputs
Evaluate multiple open-source coding LLMs
Analyze code quality and correctness
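
One possible shape for such a comparison is sketched below in Python. Everything here is an assumption made for illustration: the `Task` structure, the prompt wording, and the use of a test pass rate as the correctness measure are not prescribed by the project, and no model is actually called.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """One coding task with two specifications of the same behavior (hypothetical)."""
    name: str
    informal: str  # free-form natural-language description
    gherkin: str   # BDD specification of the same behavior


def build_prompt(task, style):
    """Build the prompt for one input condition: 'informal' or 'bdd'."""
    if style == "informal":
        spec = task.informal
    elif style == "bdd":
        spec = task.gherkin
    else:
        raise ValueError(f"unknown style: {style}")
    # Identical wording around the specification keeps the two conditions comparable.
    return f"Implement the following behavior in Python.\n\nSpecification:\n{spec}\n"


def pass_rate(outcomes):
    """Fraction of passed tests; `outcomes` is a list of booleans."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0
```

Keeping the surrounding prompt text identical and varying only the specification is one way to isolate the effect of the input format; the generated code for each condition could then be run against the same hidden test suite and summarized with `pass_rate`.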

RQ3: Automated Gherkin Refactoring with LLMs (Conditional)
Can LLMs automate Gherkin refactoring according to best practices?

Apply prompting strategies to refactoring tasks
Compare different open-source LLMs
Validate results with human experts

For a seminar or bachelor’s thesis, students will focus on RQ1 and RQ2; for a master’s thesis, they will additionally address RQ3.

Goal

The key tasks are:

Analysis of survey data
Literature review on BDD and LLM-based programming
Design of controlled experiments
Evaluation of generated code and specifications
Expert validation

The expected outcomes are:

Empirical insights into BDD maintenance practices
Evidence on the impact of structured specifications on AI-generated code
Assessment of LLMs for automated refactoring
Practical recommendations for AI-supported development

Requirements

Pointers

Contact

Nitish Patkar, FHNW, nitish.patkar@fhnw.ch
Sebastiano Panichella