Exploration of Self-Reflective LLMs for Code

Context

Large language models (LLMs) have become popular over the last few years, in part because of the quality of the outputs these models generate. Recent advances try to make models think more, either through simple prompting techniques or by training them to self-reflect via reinforcement learning.

Motivation

Models such as OpenAI's o1 or DeepSeek-R1 are recent reasoning models. By spending more time thinking at inference, these models achieve better performance on many tasks involving logical reasoning, including coding.

Goal

The student will follow 3 steps:

  1. Literature review of reasoning models
  2. Practical direction: code repair, vulnerability repair, or code generation
  3. Experiments and analysis of results, including a comparison with existing models; the exact practical direction from step 2 is to be discussed further

Requirements

Pointers