
Introducing gpt-prompt-engineer: Experimenting with Prompts for Better AI Performance



Have you ever wondered how to improve the performance of AI models by optimizing prompts? Prompt engineering is like alchemy, where you experiment with different prompts until you find the perfect one. But what if there was a tool that could take this experimentation to a whole new level? Introducing gpt-prompt-engineer, a powerful tool that generates, tests, and ranks prompts to enhance AI performance.


  • Prompt Generation: With the help of GPT-4 and GPT-3.5-Turbo, gpt-prompt-engineer can generate a wide range of prompts based on your specific use-case and test cases.

  • Prompt Testing: The real magic happens during prompt testing. Each generated prompt is tested against a set of test cases, and their performance is compared and ranked using an ELO rating system.

  • ELO Rating System: Every prompt starts with an ELO rating of 1200. Prompts then compete head-to-head on the test cases: whenever one prompt's response is judged better than another's, the winner's rating rises and the loser's falls. The final ratings make it easy to identify the most effective prompts.

  • Classification Version: gpt-prompt-engineer also offers a version designed specifically for classification tasks. It checks each prompt's output on every test case against the expected output ('true' or 'false') and provides a table with the score for each prompt.

  • Weights & Biases Logging: Optionally, you can log your configuration details, system prompts, user prompts, test cases, and final ranked ELO ratings for each prompt using Weights & Biases. This feature is currently available only in the main gpt-prompt-engineer notebook.
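The head-to-head ranking described above can be illustrated with the standard ELO formulas. This is a minimal sketch, not the project's code: the K-factor of 32, the round-robin schedule, and the helper names `rank_prompts`, `update_elo` and `judge` are assumptions. In the real tool a model judges which of two responses is better; the toy `shorter_wins` judge here only exists to make the example runnable.

```python
import itertools

K_FACTOR = 32        # assumed update step; the tool's actual value may differ
START_RATING = 1200  # every prompt starts at 1200


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard ELO model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(rating_a: float, rating_b: float, score_a: float) -> tuple[float, float]:
    """Apply one match result; score_a is 1 (A wins), 0.5 (draw) or 0 (B wins)."""
    delta = K_FACTOR * (score_a - expected_score(rating_a, rating_b))
    # Zero-sum update: whatever A gains, B loses
    return rating_a + delta, rating_b - delta


def rank_prompts(prompts, test_cases, judge):
    """Round-robin (assumed): compare every pair of prompts on every test case.

    `judge(prompt_a, prompt_b, test_case)` is a hypothetical callback returning
    1, 0.5 or 0 from prompt_a's point of view.
    """
    ratings = {prompt: START_RATING for prompt in prompts}
    for prompt_a, prompt_b in itertools.combinations(prompts, 2):
        for case in test_cases:
            score_a = judge(prompt_a, prompt_b, case)
            ratings[prompt_a], ratings[prompt_b] = update_elo(
                ratings[prompt_a], ratings[prompt_b], score_a
            )
    # Highest rating first
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Toy stand-in judge for demonstration only: prefers the shorter system prompt
def shorter_wins(prompt_a, prompt_b, case):
    if len(prompt_a) == len(prompt_b):
        return 0.5
    return 1 if len(prompt_a) < len(prompt_b) else 0


prompts = [
    "Write a punchy headline.",
    "You are a copywriter. Write one headline.",
    "Headline:",
]
cases = [{'prompt': 'Promoting an innovative new fitness app, Smartly'}]
ranked = rank_prompts(prompts, cases, shorter_wins)
for prompt, rating in ranked:
    print(f"{rating:7.1f}  {prompt}")
```

Under this round-robin assumption, n prompts and t test cases produce C(n, 2) × t judged matches, so 10 prompts and 3 test cases already mean 45 × 3 = 135 comparisons. That growth is one reason generating many prompts gets expensive.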

How to Use

  1. Start by defining your use-case and test cases. The use-case is a description of what you want the AI to do, while test cases are specific prompts to which you want the AI to respond. For example:
# What you want the model to do
description = "Given a prompt, generate a landing page headline."

# Inputs that every generated prompt will be tested on
test_cases = [
    {'prompt': 'Promoting an innovative new fitness app, Smartly'},
    {'prompt': 'Why a vegan diet is beneficial for your health'},
    {'prompt': 'Introducing a new online course on digital marketing'},
    # Add more test cases here
]

For the classification version, your test cases should be in the following format:

# Each test case pairs an input with its expected label ('true' or 'false')
test_cases = [
    {'prompt': 'I had a great day!', 'output': 'true'},
    {'prompt': 'I am feeling gloomy.', 'output': 'false'},
    # Add more test cases here
]

  2. Choose the number of prompts you want to generate. Generating a large number of prompts can get expensive, so 10 prompts is a good starting point.

  3. Call the generate_optimal_prompt(description, test_cases, number_of_prompts) function to generate a list of potential prompts, test their performance, and rank them. For the classification version, simply run the last cell.

  4. The final ELO ratings will be displayed in a table, sorted in descending order. The higher the rating, the better the prompt. For the classification version, the score for each prompt is shown in a table.
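In the classification version, scoring comes down to checking each prompt's answer against the expected 'output' of every test case. The sketch below assumes exact matching after lowercasing and trimming whitespace; `score_prompts`, `print_score_table` and `toy_classifier` are hypothetical names, and the toy classifier stands in for the real model call.

```python
from typing import Callable

# Test cases in the classification format shown above
test_cases = [
    {'prompt': 'I had a great day!', 'output': 'true'},
    {'prompt': 'I am feeling gloomy.', 'output': 'false'},
]


def score_prompts(prompts: list[str], test_cases: list[dict],
                  classify: Callable[[str, str], str]) -> dict[str, int]:
    """Count how many test cases each system prompt gets right.

    `classify(system_prompt, user_input)` is a hypothetical stand-in for the
    model call and should return 'true' or 'false'.
    """
    scores = {}
    for system_prompt in prompts:
        correct = 0
        for case in test_cases:
            # Normalise the answer before comparing it with the expected label
            prediction = classify(system_prompt, case['prompt']).strip().lower()
            if prediction == case['output']:
                correct += 1
        scores[system_prompt] = correct
    return scores


def print_score_table(scores: dict[str, int], total: int) -> None:
    """Print one row per prompt, best score first."""
    print("Score  Prompt")
    for prompt, correct in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{correct}/{total}    {prompt}")


def toy_classifier(system_prompt: str, text: str) -> str:
    # Toy stand-in for an LLM: a "strict" prompt answers 'true' only for text
    # containing "great"; any other prompt answers 'true' for everything
    if 'strict' in system_prompt:
        return 'true' if 'great' in text.lower() else 'false'
    return 'true'


prompts = ["Be strict: answer true only for positive statements.",
           "Answer true or false."]
scores = score_prompts(prompts, test_cases, toy_classifier)
print_score_table(scores, len(test_cases))
```

Here the "strict" prompt scores 2/2 and the other 1/2, because answering 'true' for everything only matches the positive test case.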

Benefits and Use Cases

gpt-prompt-engineer offers several benefits and use cases:

  • Improved AI Performance: By testing many candidate prompts and keeping the most effective ones, you can get noticeably better responses from the model for your use-case.

  • Time and Cost Savings: gpt-prompt-engineer automates the prompt engineering process, saving you time and resources that would otherwise be spent on manual experimentation.

  • Optimized Prompt Selection: With the ELO rating system, you can easily identify the prompts that perform the best, allowing you to make informed decisions about which prompts to use in your AI models.

  • Classification Tasks: The classification version of gpt-prompt-engineer is designed specifically for classification tasks, making it easier to check each prompt's outputs against the expected labels and select the most suitable prompts.

Future Directions

gpt-prompt-engineer is an open-source project, and contributions are welcome. Here are some ideas for future development:

  • Multiple System Prompt Generators: Introduce different styles of prompt generators to cover a wider range of use-cases, such as prompts that include examples, verbose prompts, short prompts, markdown-formatted prompts, and more.

  • Automatic Test Case Generation: Develop a feature that automatically generates test cases, further streamlining the prompt engineering process.

  • Expanded Classification Support: Enhance the classification version to support more than two classes using tiktoken.


gpt-prompt-engineer is a powerful tool that takes prompt engineering to the next level. By generating, testing, and ranking prompts, you can optimize the performance of your AI models and achieve better results. Whether you're working on a specific use-case or a classification task, gpt-prompt-engineer provides the tools you need to experiment and find the most effective prompts. Give it a try and unlock the full potential of your AI models!


This project is licensed under the MIT License.


For more information about gpt-prompt-engineer, you can reach out to Matt Shumer on Twitter @mattshumer_.

Project Link: