Overview
An agent meant to do the actual development work. From what I can tell, this is the most complex agent, and it leverages a great deal of context.
Members
set_up_environment
Project Step: environment_setup
First it loops through system dependencies provided by get_architecture.
The system dependency objects returned from the LLM include a test command to verify that the dependency is installed (neat idea). If it isn't, it just prints out a message about needing to install it before continuing.
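To make the shape concrete, here's a minimal sketch of that verification loop. The field names (`name`, `test`) and the function name are my assumptions, not the actual schema:

```python
import subprocess

def check_system_dependencies(dependencies):
    # Each dependency object comes back from the LLM with a test command;
    # the "name"/"test" keys here are assumed, not confirmed
    for dep in dependencies:
        try:
            # Run the LLM-provided test command, e.g. "node --version"
            subprocess.run(dep["test"], shell=True, check=True,
                           capture_output=True)
        except subprocess.CalledProcessError:
            # No auto-install: just tell the user and move on
            print(f"Please install {dep['name']} before continuing "
                  f"(verification failed: {dep['test']})")
```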
start_coding
Internally we start tracking what % of the features have been implemented. If we hit 50%, we call out to the technical writer to create a README. It looks like at some point they want to flesh out a license and API documentation as well.
We loop through each dev task in the development plan generated by the tech lead (each task is just a str) and set the project's current-task values to the properties defining the task. For each dev task, we call implement_task.
After all tasks are processed, we call back to the TechnicalWriter to rewrite our README.
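Stitched together, the flow reads roughly like the sketch below. I'm treating tasks as the progress unit, which may not match the actual feature tracking, and the helper names besides `implement_task` are mine, not the actual GPT Pilot API:

```python
def start_coding(project, development_plan, technical_writer):
    total = len(development_plan)
    readme_written = False

    for i, dev_task in enumerate(development_plan, start=1):
        # Each dev task is just a string; copy it onto the project's
        # current-task state before implementing it
        project.current_task = dev_task
        implement_task(project, dev_task)

        # Once half the tasks are done, have the writer draft a README
        if not readme_written and i / total >= 0.5:
            technical_writer.create_readme(project)
            readme_written = True

    # With everything implemented, the writer revises the README
    technical_writer.create_readme(project)
```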
Feature Complete!
implement_task
dev_steps_to_load
I've been seeing this reference: if the first element in the project's `dev_steps_to_load` has a property `prompt_path` set to the string `breakdown.prompt`, it's meant to signify that it's the last element to be processed. Given the name I assumed it was around rehydrating state from the DB, but it seems like it's used in the normal flow?
If we aren't on the final task, we prompt the LLM to tell us the code that needs to be written (`breakdown.prompt`). It includes the entire project details, including the features, file list, all development tasks, and technical details. Some things I find interesting:
- The sheer size of this prompt. GPT-4 must have an absolutely massive working context for this to stay coherent. Or is this a side effect of a very effective prompt?
- The output format for the code files is unprompted; are they relying on GPT-4's training to always output code in the same format?
- There seems to be a mechanism for user input to self-correct, but it's unused. This isn't abnormal; it's good to have flexibility in the prompt template, and it's why I like that they're using Jinja here. I'm just curious when it stopped being utilized and why.
dev_steps_to_load
Diving in because I saw more references during control flow: it's used solely when continuing work on a project. `dev_steps_to_load` is a list of DevelopmentPlan models, each of which includes the filename of the prompt being used to execute the step.
We get back a response from the LLM. It looks like we assume the LLM has wrapped the response in a prefix and postfix made of the first five and last five words of the message, which we then split into `instructions_prefix` and `instructions_postfix`.
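If I'm reading it right, the extraction is just word slicing on the message. A reconstruction under that "first five / last five words" assumption, with a name I made up:

```python
def get_instruction_fixes(message: str):
    # The prefix/postfix are simply the first five and last five words
    # of the message, later quoted back to the LLM as delimiters
    words = message.split()
    instructions_prefix = " ".join(words[:5])
    instructions_postfix = " ".join(words[-5:])
    return instructions_prefix, instructions_postfix
```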
We then send a message referring to the output of our last message, asking the LLM to parse it, including the pre and postfix we extracted. It includes a function definition called `parse_development_task` that returns a JSON object with an array called "tasks" (see the sketch after this list). Each entry in the tasks array can be one of three values:
- A command that needs to be run to execute the action
- A file that needs to be created or updated
- A notification that human intervention is required
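For orientation, here's roughly what that function definition could look like in OpenAI function-calling terms. The enum values and property names are my guesses from the three cases above, not the verbatim schema:

```python
# A hedged sketch of the parse_development_task function definition;
# the "type" enum and per-type fields are assumed, not confirmed
PARSE_DEVELOPMENT_TASK = {
    "name": "parse_development_task",
    "description": "Parse the development step into executable actions.",
    "parameters": {
        "type": "object",
        "properties": {
            "tasks": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        # One of three kinds of entry, per the list above
                        "type": {
                            "type": "string",
                            "enum": ["command", "save_file",
                                     "human_intervention"],
                        },
                        "command": {"type": "string"},
                        "save_file": {"type": "object"},
                        "human_intervention_description": {"type": "string"},
                    },
                    "required": ["type"],
                },
            },
        },
        "required": ["tasks"],
    },
}
```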
pre and postfix
Since there isn't a mention of a pre and postfix in the code-writing prompt, is this just a method to prompt more effectively? Does telling the LLM that a pre and postfix exist around an instruction, even if it's normal text, produce more effective results?
Message cleanup when executing actions
The practice of cleaning up the message history when the LLM interacts with itself in steps like these (prompting it to write code, then tasking out what needs to be done with it), by deleting those messages from the history, seems interesting. I assume it serves not only to clean up the UI but also to keep the context more lightweight during the development process.
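A minimal sketch of the idea, assuming the conversation is a plain list of message dicts; the class and method names here are my guesses:

```python
class Conversation:
    """Assumed shape: the history is a plain list of message dicts."""

    def __init__(self):
        self.messages = []

    def remove_last_x_messages(self, x: int) -> None:
        # Drop the intermediate self-talk (e.g. the raw code-writing
        # exchange) so later prompts don't carry the extra context
        if x:
            self.messages = self.messages[:-x]
```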
We now have our `development_task` object as defined above, which we execute using `execute_task`.
execute_task
Here's where we get into some interesting conversation branch management business. After some state management code around loading / continuing from a certain step, we generate a UUID for the branch we're about to create in the conversation.
We loop over the task steps generated by the `parse_task` prompt, branching on the step's `type` property and executing a different function depending on the value (`step_save_file`, `step_command_run`, `step_delete_file`, `step_human_intervention`).
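In outline, the dispatch looks something like this; the type strings and the branching call are my shorthand rather than the exact comparisons:

```python
import uuid

def execute_task(convo, task_steps):
    # Branch the conversation under a fresh id before running the steps
    branch_name = str(uuid.uuid4())
    convo.branch(branch_name)  # hypothetical branching call

    for step in task_steps:
        # Dispatch on the step's type to the handlers described below
        if step["type"] == "command":
            step_command_run(convo, step)
        elif step["type"] == "save_file":
            step_save_file(convo, step)
        elif step["type"] == "delete_file":
            step_delete_file(convo, step)
        elif step["type"] == "human_intervention":
            step_human_intervention(convo, step)
```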
Step type values
Might just be a vestigial holdover, but there are step types here that aren't enumerated in the prompt. For example, grepping for `delete_file` yields only comparison checks. Is the LLM generating step types it wants to take, like `delete_file`, when necessary, or is this functionality that got removed?
step_command_run
Action for when the LLM determines it wants to run a command like `npm install`. Wrapper for `run_command_until_success`.
step_save_file
Action for when the LLM determines it wants to write or update a file. Wrapper for `implement_code_changes`.
step_delete_file
Action for when the LLM determines it wants to delete a file. Interestingly, it doesn't actually delete the file; instead it logs that the LLM attempted to.
step_human_intervention
Action for when the LLM determines it needs user clarification. It first calls `ask_for_human_intervention` and provides a callback that allows the user to input "R" to run the app for debugging, I assume so the user can reproduce a specific error that's occurring.
If the user inputs "continue", we assume all is well and return a success flag.
Otherwise we run `debug`, passing the user's description as the issue description along with the current state of the conversation.
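Putting that branch together, a sketch of the flow. `ask_for_human_intervention` and `debug` come from the description above; `run_app`, the field name, and the exact signatures are illustrative stand-ins:

```python
def step_human_intervention(convo, step):
    description = step["human_intervention_description"]
    # The callback lets the user press "R" to run the app and reproduce
    # the error before responding (run_app is a hypothetical stand-in)
    answer = ask_for_human_intervention(description, callback=run_app)
    if answer == "continue":
        # User says all is well; report success
        return {"success": True}
    # Anything else is treated as an issue description to debug against,
    # along with the current conversation state
    return debug(convo, issue_description=answer)
```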