Pattern-based Characterization of Evolving Variable Software

Context

Feature annotations are key to representing variability in annotative software product lines. This variability information, like the source code itself, is subject to continuous evolution. To date, few studies characterize this kind of evolution or help us understand it better. Bittner et al. proposed so-called variation diffs, a graph structure that represents the difference between two versions of a code base based on the nesting hierarchy of source code blocks and their surrounding feature annotations as well as their logical interrelations.
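
To make the idea of such a graph more concrete, the following is a minimal sketch of how a variation diff could be held in memory. It is an illustration under simplifying assumptions, not the representation used by Bittner et al.: the names `Node`, `VariationDiff`, `nest`, and `parent`, as well as the string encodings for node kinds, change types, and time, are hypothetical. The example models an unchanged line `foo();` that is newly wrapped in an annotation `FEATURE_A`.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    node_id: int
    kind: str    # assumed encoding: "artifact" (code block) or "mapping" (annotation)
    label: str   # code text or annotation formula
    change: str  # assumed encoding: "add", "rem", or "non" (unchanged)


@dataclass
class VariationDiff:
    nodes: dict = field(default_factory=dict)  # node_id -> Node
    edges: set = field(default_factory=set)    # (child_id, parent_id, time)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def nest(self, child_id, parent_id, time):
        # time is "before" or "after": the version in which the nesting holds.
        self.edges.add((child_id, parent_id, time))

    def parent(self, child_id, time):
        # Return the node that encloses child_id in the given version, if any.
        for child, parent, when in self.edges:
            if child == child_id and when == time:
                return self.nodes[parent]
        return None


diff = VariationDiff()
diff.add_node(Node(0, "mapping", "true", "non"))       # root of both versions
diff.add_node(Node(1, "mapping", "FEATURE_A", "add"))  # newly added annotation
diff.add_node(Node(2, "artifact", "foo();", "non"))    # unchanged code line
diff.nest(1, 0, "after")   # the annotation exists only in the new version
diff.nest(2, 0, "before")  # foo(); was directly below the root
diff.nest(2, 1, "after")   # foo(); is now nested below FEATURE_A
```

Here `diff.parent(2, "before")` yields the root node and `diff.parent(2, "after")` yields the `FEATURE_A` node, so the change in nesting is captured within a single graph.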

Goal

While variation diffs are well-suited to describe the difference between two versions, the goal of this thesis is to detect frequently occurring patterns (i.e., variation diff sub-graphs) within a large set of variation diffs that have already been extracted from various open-source projects hosted on GitHub. Such patterns represent typical editing operations on variable software such as annotative software product lines. The patterns found shall be compiled into a catalogue of editing operations that contributes to the general body of knowledge in the field and can be exploited in future use cases such as product-line analysis and testing.

Approach

Mining frequent variation diff sub-graphs (i.e., patterns) is an instance of the frequent subgraph mining problem. Finding frequent subgraphs in large sets of graphs is a common and well-studied problem in the field of structural pattern recognition. Recently, Fuchs and Riesen proposed a novel and efficient algorithm for extracting stable and frequent subgraphs from sets of graphs. In this project, the framework developed by Fuchs and Riesen needs to be adapted to the problem of mining variation diffs and evaluated on a real-world set of variation diffs.
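
The notion of a frequent pattern can be illustrated with a minimal sketch of support counting. This is not the matching-graph approach of Fuchs and Riesen; it only covers the simplest case, patterns consisting of a single labeled edge, and counts in how many graphs each such pattern occurs. The function name `frequent_edge_patterns`, the graph encoding as a `(node_labels, edges)` pair, and the label strings are assumptions made for this example.

```python
from collections import Counter


def frequent_edge_patterns(graphs, min_support):
    """Return single-edge patterns that occur in at least min_support graphs.

    Each graph is a pair (node_labels, edges), where node_labels maps a node
    id to its label and edges is a list of (child_id, parent_id, edge_label).
    """
    support = Counter()
    for node_labels, edges in graphs:
        # Build a set so that a pattern is counted at most once per graph.
        patterns = {
            (node_labels[child], edge_label, node_labels[parent])
            for child, parent, edge_label in edges
        }
        support.update(patterns)
    return {p: n for p, n in support.items() if n >= min_support}


# Two graphs in which unchanged code is wrapped in a new annotation,
# and one graph in which a code block is removed.
wrap_1 = (
    {0: "non mapping", 1: "add mapping", 2: "non artifact"},
    [(1, 0, "after"), (2, 0, "before"), (2, 1, "after")],
)
wrap_2 = (
    {7: "non mapping", 8: "add mapping", 9: "non artifact"},
    [(8, 7, "after"), (9, 7, "before"), (9, 8, "after")],
)
removal = (
    {0: "non mapping", 1: "rem artifact"},
    [(1, 0, "before")],
)

frequent = frequent_edge_patterns([wrap_1, wrap_2, removal], min_support=2)
```

With `min_support=2`, the three edge patterns shared by `wrap_1` and `wrap_2` are kept, while the pattern that occurs only in `removal` is discarded. A full subgraph miner would grow such small patterns into larger ones and would need to handle subgraph isomorphism, which this sketch deliberately leaves out.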

Required Skills

  • Good programming skills, particularly working with different APIs
  • Basic statistics and a bit of graph theory and algorithms

Remark

The thesis is at the intersection of software engineering and pattern recognition. It will be supervised jointly by the Software Engineering Group (Prof. Kehrer), which covers the application-oriented perspective, and the Pattern Recognition Group (Dr. Riesen), which contributes its expertise in graph pattern mining.

Further Reading

  • Bittner, P. M., Tinnes, C., Schultheiß, A., Viegener, S., Kehrer, T., & Thüm, T.: Classifying edits to variability in source code. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 196-208 (2022)
  • Mathias Fuchs, Kaspar Riesen: A novel way to formalize stable graph cores by using matching-graphs. Pattern Recognit. 131: 108846 (2022)