Predicting Merge Conflict Resolutions: WSRC vs. Random Forest
Context
Merge conflict resolution remains a significant challenge in Git-based software development, as manual conflict resolutions slow down collaboration and reduce developer productivity. However, empirical research suggests that the vast majority of chunk resolutions found in practice can be derived from a fixed set of conflict resolution patterns, each combining the ours, theirs, and base parts of a conflicting chunk in a pre-defined way. These findings make it possible to phrase merge conflict resolution as a classification problem, and thus to use traditional machine learning to predict the correct resolution.
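To make the classification framing concrete, the following minimal sketch shows how a resolved chunk could be mapped to a pattern label. The pattern set, the function names (`candidate_resolutions`, `label_chunk`), and the representation of chunk parts as lists of lines are illustrative assumptions, not the pattern catalogue used in the referenced studies.

```python
def candidate_resolutions(ours, theirs, base):
    """Map hypothetical pattern labels to the resolution each pattern would produce."""
    return {
        "ours": ours,
        "theirs": theirs,
        "base": base,
        "ours+theirs": ours + theirs,
        "theirs+ours": theirs + ours,
        "empty": [],
    }


def label_chunk(ours, theirs, base, resolved):
    """Return the first pattern that reproduces the developer's resolution, else None."""
    for label, lines in candidate_resolutions(ours, theirs, base).items():
        if lines == resolved:
            return label
    return None  # resolution is not derivable from the assumed pattern set


# Example: the developer kept both sides, ours first.
label = label_chunk(["x = 1"], ["y = 2"], ["z = 0"], ["x = 1", "y = 2"])
```

Labels obtained this way would serve as the class targets for a classifier, with chunks that match no pattern either excluded or assigned a separate class.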
Motivation
In a preliminary study, we collected a large dataset by extracting conflicts and their resolutions from the evolution of thousands of open-source projects, which may be used for training conflict resolution classifiers. While traditional classifiers (e.g., logistic regression, random forests, and support vector machines) have been evaluated on this task, with random forests showing the most promising results, the space of classification methods has not yet been fully explored.
Goal
The goal of this project is to evaluate, through a series of experiments, whether the Weighted Sparse Representation Classifier (WSRC) can outperform Random Forest on the task of predicting merge conflict resolutions.
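For orientation, a weighted sparse representation classifier could be sketched roughly as follows. This is one possible formulation under stated assumptions, not necessarily the variant to be evaluated in the project: the function name `wsrc_predict`, the exponential distance weighting, and the parameters `alpha` and `sigma` are all hypothetical choices. The weighted L1 problem is reduced to a standard Lasso by rescaling the dictionary columns, and the sample is assigned to the class with the smallest reconstruction residual.

```python
import numpy as np
from sklearn.linear_model import Lasso


def wsrc_predict(X_train, y_train, x, alpha=0.01, sigma=1.0):
    """Classify one sample x via a weighted sparse representation (sketch)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    x = np.asarray(x, dtype=float)

    # Assumed locality weights: training samples far from x are penalised more.
    dist = np.linalg.norm(X_train - x, axis=1)
    w = np.exp(dist / sigma)

    # Dictionary with one column per training sample. Dividing column i by
    # w[i] turns the weighted L1 problem into a plain Lasso in b = w * a.
    D = X_train.T / w
    lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
    lasso.fit(D, x)
    a = lasso.coef_ / w  # coefficients w.r.t. the unscaled dictionary

    # Pick the class whose training samples reconstruct x with least error.
    classes = np.unique(y_train)
    residuals = [
        np.linalg.norm(x - X_train[y_train == c].T @ a[y_train == c])
        for c in classes
    ]
    return classes[int(np.argmin(residuals))]


# Toy usage with made-up feature vectors and labels.
X_toy = [[1.0, 0.0, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.9, 0.1]]
y_toy = ["ours", "ours", "theirs", "theirs"]
pred = wsrc_predict(X_toy, y_toy, [0.0, 0.0, 0.95, 0.05])
```

Unlike Random Forest, this approach solves one L1 problem per test sample against the whole training set, so prediction cost on a large conflict dataset is a practical aspect the experiments would need to account for.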
Requirements
The student should have:
- Familiarity with classic classification methods (e.g., decision trees, random forests, SVMs, and the Weighted Sparse Representation Classifier).
- Proficiency in handling tabular data with Python (scikit-learn for Random Forest, an L1 solver for WSRC, and model evaluation on high-dimensional data).
- Understanding of Git/version control and software engineering for context.
Pointers
- https://dl.acm.org/doi/abs/10.1145/3661167.3661197
- https://www.sciencedirect.com/science/article/pii/S0950584923001878