Your Prompt Engineering Masterclass (Part 1)
Prompt engineering has become a trending topic with the rise of Large Language Models (LLMs), especially with innovations like ChatGPT. This field emerged from the need to integrate LLMs effectively into various user applications.
At its core, prompt engineering is about developing the skills and techniques to maximize the potential of LLMs. This enables the models to interact efficiently with users, answering questions and exhibiting logical reasoning.
We will explore the key aspects of prompt engineering, providing a comprehensive understanding of this essential practice. In this article we’ll cover the following:
- Why Prompt Engineering?
- What are the prompt elements?
- What are the LLM settings?
- What are the datasets that can be used for testing LLMs?
- What are the Prompt Engineering techniques?
- Summary of the prompting techniques
- Resources
1. Why Prompt Engineering?
Here’s why prompt engineering is essential:
- Task Customization: Prompt engineering allows us to tailor the inputs given to language models to specific tasks or domains. By crafting prompts that are relevant to the desired task, we can guide the model to generate more accurate and useful outputs.
- Improving Performance: Well-designed prompts can significantly improve the performance of language models. By providing the right cues and context, prompt engineering can help the model better understand the desired task and generate more relevant responses.
- Controlling Output: Prompt engineering enables us to exert some level of control over the outputs generated by language models. By carefully crafting prompts, we can influence the types of information the model relies on and encourage it to produce outputs that align with our goals.
- Mitigating Bias: Prompt engineering can help mitigate biases in model outputs. By designing prompts that encourage fairness and inclusivity, we can reduce the likelihood of the model generating biased or harmful responses.
- Enhancing Interpretability: By engineering prompts, we can make the reasoning process of language models more transparent and interpretable. This can be particularly valuable in applications where understanding the model’s decision-making process is crucial.
- Domain Adaptation: Prompt engineering can facilitate domain adaptation, allowing language models to perform effectively in specialized domains or on specific tasks by providing domain-specific prompts.
Overall, prompt engineering is a vital tool for leveraging the capabilities of language models effectively and ensuring that they produce outputs that meet our needs and standards.
2. What are the prompt elements?
A prompt contains any of the following elements:
- Instruction: a specific task or instruction you want the model to perform
- Context: external information or additional context that can steer the model to better responses
- Input Data: the input or question that we want the model to respond to
- Output Indicator: the type or format of the output.
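Putting these elements together, here is a minimal sketch of how a prompt might be assembled in Python; the classification task, the context, and the labels are purely illustrative:

```python
# Assemble a prompt from the four elements described above.
instruction = "Classify the customer review below as Positive, Negative, or Neutral."
context = "Reviews come from an online electronics store and may mention shipping issues."
input_data = "Review: The laptop arrived two weeks late, but it works perfectly."
output_indicator = "Answer with a single word: Positive, Negative, or Neutral."

prompt = "\n\n".join([instruction, context, input_data, output_indicator])
print(prompt)
```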
3. What are the LLM settings?
LLM settings are the parameters that affect the behavior of an LLM. These settings are applied at inference time, when the model is given a context and asked to predict the next tokens.
Here’s some LLM settings that are widely used in different LLMs:
- Temperature: Controls the randomness of the model's output; higher values encourage more diverse word choices. In general, lower the temperature when you want stable, reliable answers (for example, in question answering), and raise it for more creative tasks.
- Top P: As with temperature, lower this setting if you need a more exact answer. Lower values restrict sampling to the most probable tokens (nucleus sampling), making the model more deterministic.
- Max Length: The maximum number of tokens the model is allowed to generate, which helps cut off overly long or irrelevant answers.
- Stop Sequences: A specified string that signals the model to stop generating further tokens.
- Frequency Penalty: This applies a penalty to a token based on how often it has already appeared in the context. The more frequently a word occurs, the greater the penalty it receives. This means the likelihood of that token being used again is reduced, encouraging the model to generate more diverse and varied responses.
- Presence Penalty: Like the frequency penalty, this discourages repetition, but the penalty is flat: any token that has already appeared at least once is penalized by the same amount, no matter how many times it occurred. Raising it pushes the model to introduce new words and topics instead of repeating itself, which helps manage the diversity of the output.
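As a concrete illustration, here is how these settings map onto a chat completion request with the OpenAI Python SDK; the model name and the specific values are only examples, and other providers expose equivalent parameters under similar names:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",    # illustrative model name
    messages=[{"role": "user",
               "content": "Explain the difference between a list and a tuple in Python."}],
    temperature=0.2,        # low value -> more stable, less random answers
    top_p=0.9,              # nucleus sampling: keep only the most probable tokens
    max_tokens=200,         # upper bound on the number of generated tokens
    stop=["\n\n"],          # stop sequence: cut generation at a blank line
    frequency_penalty=0.3,  # penalize tokens proportionally to how often they appeared
    presence_penalty=0.1,   # flat penalty on any token that already appeared
)
print(response.choices[0].message.content)
```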
4. What are the datasets that can be used for testing LLMs?
Here are some datasets that are commonly used with LLMs:
- NumerSense (Lin et al., 2020) consists of numerical statements about common objects and concepts, where for each sentence we need to recover a masked number word. The choices are the integers from zero to ten plus the word "no", so the task can be framed as a multiple-choice problem.
- CommonsenseQA (CSQA) (Talmor et al., 2019) is a 5-way multiple-choice QA dataset about common world scenarios.
- CommonsenseQA 2.0 (CSQA2) (Talmor et al., 2021) is a binary classification dataset where we need to judge whether commonsense statements are true or false.
- Question Answering via Sentence Composition (QASC) (Khot et al., 2020) is an 8-way multiple choice QA dataset about grade school science. This dataset also includes two pieces of background knowledge per question, whose composition fully answers the question.
You can find the most recent datasets using this GitHub repo.
5. What are the Prompt Engineering techniques?
1. Zero Shot Prompting
Directly asking the LLM to perform the task without providing any examples, relying only on what the model learned during training.
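A minimal zero-shot sketch, wrapping a single-turn request in a small call_llm helper that the later examples reuse; the helper and the model name are assumptions based on the OpenAI Python SDK, not part of the original article:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def call_llm(prompt: str, temperature: float = 0.0) -> str:
    """Send a single-turn prompt and return the model's text reply."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return resp.choices[0].message.content

# Zero-shot: no examples, just the task itself.
print(call_llm("Classify the sentiment of this sentence as Positive or Negative: "
               "'The battery died after one day.'"))
```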
2. Few Shot Prompting
Giving examples to the LLM before asking the actual question. One-shot means providing a single example; for more complex tasks you can provide several, e.g. 3-shot (3 examples), 5-shot, and so on.
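A small few-shot (3-shot) sentiment-classification sketch, reusing the hypothetical call_llm helper defined in the zero-shot example above; the labeled examples are made up for illustration:

```python
# Assumes the call_llm(prompt) helper from the zero-shot example above.
few_shot_prompt = """Classify the sentiment of each sentence as Positive or Negative.

Sentence: The screen is bright and the colors are vivid.
Sentiment: Positive

Sentence: The app crashes every time I open it.
Sentiment: Negative

Sentence: Customer support answered within five minutes and fixed my issue.
Sentiment: Positive

Sentence: The battery died after one day.
Sentiment:"""

print(call_llm(few_shot_prompt))
```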
3. Chain of thought (COT) prompting
Chain-of-thought prompting shows the LLM the intermediate reasoning steps for a worked example, so that the model reproduces the same step-by-step reasoning when answering a similar question. There are different types of CoT prompting:
A) Providing explicit logical steps in the examples, which can be combined with few-shot prompting.
B) Zero-shot CoT, which simply appends one more sentence to the prompt: "Let's think step by step."
C) Automatic COT
To understand Automatic CoT, let's first look at an example and see how the LLM responds to it.
Fig. 6: Example to see the response of LLMs
As you can see, the LLM failed to account for the last two kittens in that question, which is why we need a better prompt; this is where Automatic CoT (Auto-CoT) comes in.
Auto-CoT consists of two main stages:
- Stage 1) Question clustering: partition the questions of a given dataset into a few clusters.
- Stage 2) Demonstration sampling: select a representative question from each cluster and generate its reasoning chain using Zero-Shot-CoT with simple heuristics.
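A rough sketch of those two stages is shown below. It assumes a hypothetical embed() function that returns one embedding vector per question (any sentence-embedding model would do) and reuses the hypothetical call_llm helper from the zero-shot example; the clustering uses scikit-learn's KMeans:

```python
import numpy as np
from sklearn.cluster import KMeans

# Assumes call_llm(prompt) from the zero-shot example and a hypothetical
# embed(texts) -> np.ndarray that returns one embedding vector per question.

def auto_cot_demos(questions: list[str], n_clusters: int = 4) -> list[str]:
    """Stage 1: cluster questions. Stage 2: build one Zero-Shot-CoT demo per cluster."""
    vectors = embed(questions)
    labels = KMeans(n_clusters=n_clusters, random_state=0).fit_predict(vectors)

    demos = []
    for cluster in range(n_clusters):
        # Simple heuristic: take the first question assigned to this cluster
        # (the paper also filters demos by question and rationale length).
        idx = int(np.where(labels == cluster)[0][0])
        question = questions[idx]
        rationale = call_llm(f"Q: {question}\nA: Let's think step by step.")
        demos.append(f"Q: {question}\nA: Let's think step by step. {rationale}")
    return demos
```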
4. Self-Consistency
Self-consistency aims to boost the performance of CoT prompting on tasks involving arithmetic and commonsense reasoning. Instead of taking the single answer produced by few-shot CoT, it samples multiple diverse reasoning paths from the LLM and selects the most consistent final answer among the generations; this can help in exactly the cases where a single chain of thought hurts performance.
Wang et al. (2022), the paper that proposed self-consistency, refers to standard CoT as "greedy decoding", because it greedily decodes a single optimal reasoning path: a series of short sentences that mimic the reasoning process a person might employ in solving the task.
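A minimal self-consistency sketch under those assumptions: sample several CoT answers at a non-zero temperature, extract the final answer from each, and keep the majority vote. It reuses the hypothetical call_llm helper, and the answer-extraction heuristic (take the last number) is only illustrative:

```python
import re
from collections import Counter

# Assumes the call_llm(prompt, temperature) helper from the zero-shot example.

def self_consistent_answer(question: str, n_samples: int = 5) -> str:
    """Sample several reasoning paths and return the most common final answer."""
    answers = []
    for _ in range(n_samples):
        reasoning = call_llm(f"Q: {question}\nA: Let's think step by step.",
                             temperature=0.7)  # temperature > 0 for diverse paths
        numbers = re.findall(r"-?\d+", reasoning)
        if numbers:
            answers.append(numbers[-1])  # crude heuristic: last number is the answer
    return Counter(answers).most_common(1)[0][0] if answers else ""
```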
5. Generate Knowledge Prompting
Encourages the model to generate responses based on existing knowledge. This method is used for Commonsense Reasoning. This technique involves prompting the model to first generate relevant facts needed to complete the prompt. Then it proceeds to complete the prompt. This often results in higher completion quality as the model is conditioned on relevant facts.
According to Liu et al. (2022), the following factors contribute to the performance of generated knowledge prompting:
- The quality of the generated knowledge.
- The quantity of knowledge: performance improves with more knowledge statements.
- The strategy for integrating knowledge during inference.
Fig. 10 shows an example of what ChatGPT does with this technique.
Fig. 10: Example for Generate Knowledge Prompting.
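A two-step sketch of the idea, reusing the hypothetical call_llm helper; the question and the wording of the knowledge prompt are illustrative rather than taken verbatim from Liu et al. (2022):

```python
# Assumes the call_llm(prompt) helper from the zero-shot example above.

question = "Part of golf is trying to get a higher point total than others. Yes or no?"

# Step 1: ask the model to generate relevant background knowledge first.
knowledge = call_llm(f"Generate two short facts that are relevant to answering:\n{question}")

# Step 2: answer the question conditioned on the generated knowledge.
answer = call_llm(f"Knowledge:\n{knowledge}\n\nUsing the knowledge above, answer:\n{question}")
print(answer)
```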
6. Tree of Thoughts
For complex tasks that require exploration or strategic lookahead, traditional or simple prompting techniques fall short. Yao et al. (2023) and Long (2023) proposed Tree of Thoughts (ToT), a framework that generalizes chain-of-thought prompting and encourages exploration over thoughts that serve as intermediate steps for general problem solving with language models.
The LM’s ability to generate and evaluate thoughts is then combined with search algorithms (e.g., breadth-first search and depth-first search) to enable systematic exploration of thoughts with lookahead and backtracking.
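A very condensed breadth-first ToT sketch: at each step the model proposes candidate thoughts, a second prompt rates them, and only the best few are kept for the next level. It reuses the hypothetical call_llm helper, and the prompts and scoring are greatly simplified compared to Yao et al. (2023):

```python
# Assumes the call_llm(prompt, temperature) helper from the zero-shot example.

def tree_of_thoughts(problem: str, steps: int = 3, branch: int = 3, keep: int = 2) -> str:
    """Breadth-first search over partial solutions ('thoughts')."""
    frontier = [""]  # each entry is a partial chain of thoughts
    for _ in range(steps):
        # Generation: propose several next thoughts for each partial solution.
        candidates = []
        for partial in frontier:
            for _ in range(branch):
                thought = call_llm(
                    f"Problem: {problem}\nSteps so far:\n{partial}\n"
                    "Propose the single next reasoning step.",
                    temperature=0.8,
                )
                candidates.append(partial + "\n" + thought)
        # Evaluation: rate each candidate chain and keep the highest-rated ones.
        scored = []
        for cand in candidates:
            rating = call_llm(
                f"Problem: {problem}\nPartial solution:\n{cand}\n"
                "Rate how promising this is from 1 (dead end) to 10 (very promising). "
                "Reply with the number only."
            )
            digits = "".join(ch for ch in rating if ch.isdigit())
            scored.append((int(digits) if digits else 0, cand))
        frontier = [cand for _, cand in sorted(scored, reverse=True)[:keep]]
    return frontier[0]
```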
7. Prompt Chaining
Prompt chaining breaks a complex question into a series of simpler sub-questions. This helps the model solve the problem through intermediate steps rather than answering it directly: the output from one prompt is used as the input for the next prompt.
Prompt chaining is particularly useful when building LLM-powered conversational assistants and for improving the personalization and user experience of your applications. It can also be used for parallel processing, verifying outputs, and multi-step tasks that require several distinct stages, such as researching a topic, outlining an essay, writing the essay, and then formatting it.
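A minimal two-step chaining sketch, again reusing the hypothetical call_llm helper: the first prompt produces an outline, and its output is fed verbatim into the second prompt that writes the text.

```python
# Assumes the call_llm(prompt) helper from the zero-shot example above.

topic = "How temperature affects LLM outputs"

# Prompt 1: produce an outline.
outline = call_llm(f"Write a 3-bullet outline for a short blog post about: {topic}")

# Prompt 2: the first prompt's output becomes part of the second prompt's input.
draft = call_llm(
    f"Using this outline:\n{outline}\n\nWrite a 150-word blog post about: {topic}"
)
print(draft)
```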
7. Resources:
- Retrieval Augmented Generation (RAG) | Prompt Engineering Guide (promptingguide.ai)
- Chain prompts (anthropic.com)
- Generated Knowledge Prompting for Commonsense Reasoning
- Self-Consistency Improves Chain of Thought Reasoning in Language Models
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models