Overview
Members
implement_code_changes
Interestingly, we set the conversation temperature to 0, stashing the old temperature so it can be restored once we’re done coding. We call get_original_file to retrieve the file contents, then pass them into the code-writing function replace_complete_file.
Next up, code review. We enter a retry loop with n=2 and execute review_change. If everything comes back fine, we write out the file; otherwise we feed the review feedback back into regeneration and reprocess the result.
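Roughly, the flow might look like the sketch below. This is a Python reconstruction under assumptions, not the project’s actual code; convo, step, write_file, and MAX_REVIEW_RETRIES are all hypothetical names.

```python
# A minimal sketch of implement_code_changes, assuming hypothetical
# convo/step objects and helper names.
MAX_REVIEW_RETRIES = 2

def implement_code_changes(convo, step, change_description):
    # Pin the temperature to 0 for deterministic codegen, stashing
    # the old value so it can be restored afterwards.
    old_temperature = convo.temperature
    convo.temperature = 0

    original = get_original_file(convo, step)
    new_contents = replace_complete_file(convo, original, change_description)

    # Retry loop (n=2): review, and on failure feed the reviewer's
    # feedback back into regeneration.
    for _ in range(MAX_REVIEW_RETRIES):
        ok, result = review_change(convo, original, new_contents)
        if ok:
            break
        new_contents = replace_complete_file(
            convo, original, change_description, feedback=result
        )

    write_file(step["path"], new_contents)
    convo.temperature = old_temperature  # restore the stashed value
```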
Error recovery redundancy
It’s unclear whether this is the product of an organically grown codebase or a deliberate, gradual shift in how redundancy is handled, but it seems odd that there’s a retry loop both in this method and in review_change.
Additionally, if the max depth is reached in review_change, it returns a success case containing the contents of the new file. That means this loop short-circuits, since it never detects a problem, and error recovery is never performed in this method.
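In compressed form (hypothetical names throughout), the dead branch looks like this:

```python
# Because review_change reports success even at max depth, the
# failure branch below can never be reached from that path.
ok, contents = review_change(convo, original, new_contents)
if ok:                          # always true once review_change gives up
    write_file(step["path"], contents)
else:
    regenerate_with_feedback()  # effectively dead code
```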
get_original_file
Mostly IO cleanup. If a path or name isn’t included in the step being executed, we run identify_file_to_change; otherwise we read the file directly.
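A sketch of that fallback, assuming a hypothetical dict-shaped step:

```python
import os

def get_original_file(convo, step):
    path, name = step.get("path"), step.get("name")
    if not path or not name:
        # The step doesn't say which file to touch, so ask the LLM.
        path, name = identify_file_to_change(convo, step["description"])
    with open(os.path.join(path, name)) as f:
        return f.read()
```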
identify_file_to_change
Filed under “how much fuzzy matching is too much fuzzy matching?”, but undeniably neat: we take a description of the required code changes and prompt the LLM (identify_files_to_change.prompt) with that description and a list of all files in the project, asking which files need to be modified.
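Something like the following, where ask_llm is a hypothetical stand-in for the project’s conversation layer:

```python
import os

def identify_file_to_change(convo, change_description, project_root="."):
    # Enumerate every file in the project, relative to the root.
    all_files = [
        os.path.relpath(os.path.join(root, name), project_root)
        for root, _dirs, names in os.walk(project_root)
        for name in names
    ]
    # identify_files_to_change.prompt boils down to: given this change
    # description and this file list, which files need to be modified?
    return ask_llm(
        convo,
        prompt="identify_files_to_change.prompt",
        description=change_description,
        files=all_files,
    )
```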
replace_complete_file
More prompt IO! implement_changes.prompt is one of the largest prompts we’ve seen in the project so far at 3600 characters, and details specifically how the LLM is meant to write code. The model apparently struggles to output a complete, consistent code file without summarizing, as much of the prompting is negative reinforcement against things like // Rest of code here.
Note that the method name is a misnomer: it doesn’t replace anything itself, it just returns the refactored file’s contents.
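As an aside, a mechanical complement to that prompt-level negative reinforcement (not something the project necessarily does) would be to scan the output for known summarization markers:

```python
# Hypothetical guard against summarized output: reject files that
# contain markers the model uses to elide code.
SUMMARIZATION_MARKERS = (
    "// rest of code here",
    "// ... rest of the code",
    "# rest of the code",
)

def looks_summarized(contents: str) -> bool:
    lowered = contents.lower()
    return any(marker in lowered for marker in SUMMARIZATION_MARKERS)
```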
review_change
This is quite neat: first, a set of diff hunks is collected from the proposed changes and the original file. Then the LLM is prompted (review_changes.prompt) to review the set of hunks and make a decision on each, using a function definition to output those decisions as an array of hunks with their decision outcomes.
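Sketched out, the two moving parts might look like this. The schema assumes an OpenAI-style function-calling API and is illustrative, not the project’s actual definition:

```python
import difflib

def collect_hunks(original: str, proposed: str) -> list[str]:
    """Split a unified diff of the two versions into per-hunk strings."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        proposed.splitlines(keepends=True),
    )
    hunks, current = [], []
    for line in diff:
        if line.startswith(("---", "+++")):
            continue  # skip the file-level header lines
        if line.startswith("@@") and current:
            hunks.append("".join(current))
            current = []
        current.append(line)
    if current:
        hunks.append("".join(current))
    return hunks

REVIEW_FUNCTION = {
    "name": "submit_review",
    "parameters": {
        "type": "object",
        "properties": {
            "hunks": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "hunk": {"type": "string"},
                        "decision": {
                            "type": "string",
                            "enum": ["apply", "reject"],
                        },
                    },
                },
            },
        },
    },
}
```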
We also see a generic error-correction prompt at play here, which is neat. It relies on the conversation context to provide what it needs to correct the error, gives the model a short synopsis of why the action failed (in this case, that not all hunks received a review outcome), and passes the same function definition so the output shape stays the same.
Error recovery fallback
Interestingly, if it isn’t able to produce a complete review after two attempts, it returns the contents of the new file without review. I wonder what the hit rate of the reviewer is.
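Putting the retry and the fallback together, a rough sketch (reusing REVIEW_FUNCTION from the previous sketch; call_llm_function, parse_review, and apply_decisions are hypothetical helpers):

```python
MAX_REVIEW_ATTEMPTS = 2

def review_change(convo, hunks, new_contents):
    prompt = "review_changes.prompt"
    for _ in range(MAX_REVIEW_ATTEMPTS):
        response = call_llm_function(
            convo, prompt, function=REVIEW_FUNCTION, hunks=hunks
        )
        decisions = parse_review(response)
        if len(decisions) == len(hunks):
            return True, apply_decisions(hunks, decisions)
        # Generic error correction: lean on conversation context, state
        # why the action failed, and re-pass the same function
        # definition so the output shape stays identical.
        prompt = ("Not every hunk received a review outcome; "
                  "please provide a decision for each hunk.")
    # Fallback: after exhausting attempts, accept the new file
    # unreviewed and still report success to the caller.
    return True, new_contents
```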