We are building an LLM-based application that ingests user data from various internal sources.

It then feeds the data into various prompts, whose responses provide the answers needed to fill out forms correctly.

The prompts are stored in the codebase to allow rapid updates to different prompt chains, which usually require changes to both prompts and code.

How should metrics on the prompts be managed and matched back to the original prompt version in Git?

These include:

  • evaluation pipeline metrics
  • user feedback
  • metrics on individual prompts
  • metrics on the whole prompt chain
  • etc.

Also, how can the prompts (either the whole chain or a single part of it) be retrieved from Git and rolled back or re-evaluated efficiently?


Development/Update Phase:

    • Modify prompts in the codebase.

    • Tag and commit the changes to Git with metadata identifying the version and prompt chain.

    • Add/Update relevant metrics in the database (for example, a batch job runs to capture metrics after prompt chain execution).
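The capture step above can be sketched in Python. This is a minimal illustration, not a prescribed schema: the field names, the `git_sha` you would read from your release pipeline, and the metric keys are all assumptions. The idea is that every metrics row carries both the commit SHA and a content hash of the prompt text, so the row can be matched back to the exact prompt version in Git even if the same commit touched unrelated code.

```python
import hashlib
import json

def prompt_fingerprint(prompt_text: str) -> str:
    """Content hash identifying this exact prompt version,
    independent of unrelated code changes in the same commit."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()[:12]

def metrics_record(prompt_id: str, prompt_text: str, git_sha: str, metrics: dict) -> dict:
    """One row linking metric values back to the prompt version that produced them.
    `git_sha` is assumed to come from your CI/release pipeline."""
    return {
        "prompt_id": prompt_id,
        "prompt_hash": prompt_fingerprint(prompt_text),
        "git_sha": git_sha,  # commit that shipped this prompt version
        "metrics": json.dumps(metrics),
    }
```

A batch job after prompt-chain execution would then insert one such record per prompt per run into the metrics database.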

Evaluation Phase:

    • Track metrics and evaluate prompt performance.

    • Collect user feedback to improve prompt responses.

    • Use a dashboard to view performance over time.
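For the dashboard, a simple aggregation over the metrics records is enough to show performance per released version. A rough sketch, assuming each record carries a `git_sha` and a numeric `score` (both hypothetical field names):

```python
from collections import defaultdict

def performance_over_time(records: list) -> dict:
    """Average the score per prompt version (keyed by git SHA),
    e.g. to plot accuracy across releases on a dashboard."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["git_sha"]].append(rec["score"])
    return {sha: sum(scores) / len(scores) for sha, scores in grouped.items()}
```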

Update/Rollback Phase:

    • When performance issues arise, retrieve the version of the prompt from Git.

    • Re-run or modify the prompt chain, and roll out an update based on new metrics.
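Retrieving an old prompt version does not require a checkout or branch switch: `git show <sha>:<path>` prints the file as it existed at that commit. A small wrapper (the path and SHA here are placeholders; the injectable `run` parameter is just to make the function testable):

```python
import subprocess

def get_prompt_at_commit(git_sha: str, prompt_path: str, run=subprocess.run) -> str:
    """Return the contents of a prompt file as it existed at `git_sha`,
    without touching the working tree."""
    result = run(
        ["git", "show", f"{git_sha}:{prompt_path}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

The retrieved text can then be re-run through the evaluation pipeline, or committed back as a rollback, without disturbing unrelated code that has changed since.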

Metrics Re-Evaluation:

    • Re-evaluate the performance of a specific prompt or prompt chain, possibly using user feedback and automated tests.
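Re-evaluation is easiest when every prompt version is scored against the same evaluation set, so versions are compared on identical inputs. A minimal sketch, where `score` stands in for whatever automated test or LLM-judge you use (a hypothetical callable, not a real API):

```python
def reevaluate(prompt_versions: dict, eval_set: list, score) -> dict:
    """Score each prompt version against the same eval set.

    prompt_versions: {version_label: prompt_text}
    score: callable(prompt_text, example) -> float, supplied by your pipeline
    Returns the mean score per version, e.g. to decide a rollback.
    """
    results = {}
    for version, prompt_text in prompt_versions.items():
        scores = [score(prompt_text, example) for example in eval_set]
        results[version] = sum(scores) / len(scores)
    return results
```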

Thanks for the reply.
I understand the steps of the process, but I am having difficulty figuring out:
a. How to automatically set up metrics for a new prompt. A version release cannot automatically add a new prompt metric to the DB, because the release could contain code changes unrelated to the prompt, or changes to a different prompt. Even if each prompt is separated into its own module with its own version, when all the prompts live in the same repository, every module version gets bumped when the code is released. So my question really is: is there a way to automate releases so that versions, and therefore metrics, are updated only for the prompts that actually changed?
b. How to retrieve and re-run previous versions of a prompt quickly and efficiently, given that other commits and code changes may have been made since the version being rolled back to.
