What is Prompt Engineering in AI

Raktim Singh
Dec 27, 2023

Prompt engineering is a concept in AI (specifically in natural language processing) in which the task description is embedded in the input, for example as a question, rather than being given implicitly.

In other words, a particular task is expressed as a set of inputs, or prompts, to the AI engine.

Consider the AI Engine to be a model.

The prompt should contain enough detail about the problem and its context for the model to define the solution space from which responses are expected and to extract the best response to your query.

In general, prompt engineering works by converting one or more tasks into a prompt-based dataset and training a language model on it.

The language model is trained using the information or data provided in the prompt.

As a result, this approach is called “prompt-based learning.” A sizable “frozen” pre-trained language model is often sufficient for prompt engineering.

In that case, only the prompt’s representation is learned (i.e., optimized), using techniques such as “prefix-tuning” or “prompt tuning.”

A prompt may contain one or more elements listed below:

1. Question or instructions

2. Input data (optional)

3. Illustrative examples (optional)
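As a minimal sketch, the three elements above can be combined into a single prompt string. The helper function, task, and examples below are invented for illustration:

```python
# A helper (invented for illustration) that assembles the three prompt
# elements: instruction, optional examples, and optional input data.
def build_prompt(instruction, input_data=None, examples=None):
    parts = [instruction]
    if examples:
        parts.append("Examples:")
        parts.extend(f"Input: {x}\nOutput: {y}" for x, y in examples)
    if input_data:
        parts.append(f"Input: {input_data}\nOutput:")
    return "\n\n".join(parts)

prompt = build_prompt(
    "Classify the sentiment of the review as Positive or Negative.",
    input_data="The battery dies within an hour.",
    examples=[("Great screen, love it!", "Positive")],
)
print(prompt)
```

The resulting string ends with “Output:”, inviting the model to continue the pattern established by the example.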

Language model: Next, we need to understand the language model underlying prompt engineering.

Natural Language Processing (NLP) is a critical component of AI technology.

With Large Language Models (LLMs), NLP is advancing.

LLMs are neural networks with up to hundreds of billions of parameters. They are trained on voluminous amounts of text data, often hundreds of gigabytes.

As early as 2007, a group of Google researchers, T. Brants et al., empirically demonstrated that the quality of language translation, as measured by the Bilingual Evaluation Understudy (BLEU) score, improves as the size of the language model increases.

Over the years, businesses have undertaken numerous initiatives in this regard.

A few prominent large language models include:

1. OpenAI GPT-3: Released in July 2020, this model contains 175 billion parameters and was trained on more than 570 GB of text data. Training was conducted on NVIDIA V100 GPUs, using a supercomputer hosted in the Azure cloud that comprises 10,000 high-end GPUs and 285,000 CPU cores.

2. Cohere platform — A platform that provides representation language models and generative language models.

3. The Megatron-Turing Natural Language Generation model (MT-NLG), trained on NVIDIA’s Selene machine learning supercomputer, consists of 530 billion parameters and was developed in collaboration between NVIDIA and Microsoft.

4. GPT-Neo, GPT-J, and GPT-NeoX are LLMs trained on the Pile dataset, an open-source language modeling dataset spanning approximately 825 GB. These models perform admirably when given a limited number of examples.

5. BERT: BERT is a family of LLMs, which Google introduced in 2018. Its architecture is a stack of transformer encoders with around 342 million parameters.

6. Claude: Created by the company Anthropic.

7. GPT-4: The largest model in OpenAI’s GPT series, released in 2023. It is a transformer-based model.

Its parameter count has not been released to the public, though very large unofficial estimates have circulated.

It is a multimodal model, meaning it can accept both text and images as input instead of being limited to text alone.

It also introduced greater steerability, which helps users specify tone of voice and task.

Apart from these, there are also models such as Ernie, LaMDA, and Llama.

OpenAI GPT-3 and similar models can process input instructions in Spanish, French, and English. However, since the model is trained on a predominantly English-language dataset, English-language outcomes are highly probable.

The model exposes parameters, including temperature and top_p, which let users control how deterministic or varied the generated output is.
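As an illustration, a request might set these parameters as follows. This sketch assumes the legacy OpenAI Python SDK (v0.x) completion call and an example model name; the current SDK and available models may differ, so treat it as a shape, not a recipe:

```python
import os

# Hypothetical request parameters for a completion call. The model name is
# illustrative only; temperature and top_p are the knobs described above.
params = {
    "model": "text-davinci-003",  # example model name; check current docs
    "prompt": "Summarize in one line: prompt engineering embeds the task in the input.",
    "temperature": 0.2,  # closer to 0 -> more deterministic, repeatable output
    "top_p": 1.0,        # 1.0 keeps the full distribution; lower to trim unlikely tokens
    "max_tokens": 60,
}

# Only attempt a real call if a key is configured (requires the `openai` package).
if os.getenv("OPENAI_API_KEY"):
    import openai
    response = openai.Completion.create(**params)
    print(response.choices[0].text.strip())
```

For factual tasks a low temperature is typical; for creative tasks, raising temperature or lowering top_p changes how adventurous the sampling is.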

If you must utilize GPT-3/ChatGPT or any other model, consider the following:

1. Select the appropriate learning setting (zero-shot, one-shot, or few-shot) for the prompt.

2. Create an effective prompt design. The formulation of the prompt is critical. This requires (1) the ability to define the context precisely and (2) the ability to define the intent precisely.

3. Provide examples that establish a discernible “pattern” between the inputs and the anticipated output.

4. Configure the model parameters accurately.

A prompt is an input or instruction, along with its context, supplied to the model.

A model provides a response or completion in response to the prompt. Enhancing the design of the prompt results in a more precise output response.

The accuracy of the prompt itself determines the accuracy of the LLM’s output. Stated differently, input prompts to GPT-3 should be designed to yield output that aligns closely with the intended result.

GPT-3 offers a collection of models, including ‘code-davinci-002’ (part of the Codex model series), that translate between natural language and code. GPT-3 presently supports more than a dozen programming languages.

Supported languages include Python, JavaScript, SQL, CSS, Go, and Shell. The input prompt may be code or a natural-language instruction. The Codex models can bring massive efficiency gains across the Software Development Life Cycle:

1. Generate code in response to natural-language requirements.

2. Refactor code and rectify numerous errors.

3. Add required annotations within the code.

4. Generate documentation for the input code provided.

According to the OpenAI documentation, Python code generation is the area in which GPT-3 is presently the most robust.
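A typical Codex-style prompt writes the requirement as a comment followed by a function signature, and the model completes the body. In this sketch the “completion” is written by hand to show what such output might look like; an actual model response would vary:

```python
# A Codex-style prompt: the requirement as a comment, plus a signature to complete.
prompt = (
    "# Python 3\n"
    "# Return the sum of squares of a list of numbers.\n"
    "def sum_of_squares(numbers):\n"
)

# A plausible completion, hand-written here for illustration.
completion = "    return sum(n * n for n in numbers)\n"

# The concatenated text is valid Python, so we can execute and use it.
namespace = {}
exec(prompt + completion, namespace)
print(namespace["sum_of_squares"]([1, 2, 3]))  # 14
```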

GPT is a large-scale machine learning model that applies deep learning techniques to support context- and time-appropriate natural language conversations.

GPT-3 represents the third iteration of OpenAI’s original GPT language model.

By leveraging historical data and predictive analytics, GPT-3 facilitates conversational exchanges. It can draw on substantial quantities of contextually relevant text, images, and videos from internet data.

It can perform various functions, including the following:

1. Translating text between languages

2. Composing melodies or poems

3. Producing entirely fresh text or narratives

4. Producing software source code

Owing to the sampling methodology, the GPT-3 chatbot’s conversation with the customer becomes more refined and precise as the model or sample size increases.

GPT was initially introduced in 2018 with a total of 117 million parameters.

GPT-3 has 175 billion parameters, which help the model analyze pertinent data at that scale.

GPT-3 175B evaluates the contextual relevance of data and recognizes associations and patterns to assist the conversational interface between the system and its clients.

The model is built from multiple layers. Each successive layer represents a progression in intelligence, with the final layer exhibiting the highest level of sophistication.

The model can also be edited, including deleting layers and inserting new ones. This enhances its capacity to function at maximum efficiency.

By analyzing examples, GPT-3 can form associations between diverse inputs and produce textual outputs such as sentences, paragraphs, melodies, and images.

Additionally, it can detect patterns and associate words or objects. Thus, the output of a task requiring the generation of a sentence, passage, or melody using the input training data will appear to have been “created” by a human.

One advantage of GPT-3 is its potential to decrease the quantity of annotated instances required for training a deep neural network.

• It is a collection of methodologies that can be used to train a model to predict the labels of an unlabeled random subset of input data.

• This enables the neural network to learn and construct superior features that were previously possible only with high-quality labeled data.

• Present iterations produce outputs comparable to those generated by humans with minimal instruction and input data. New versions of GPT-3 were made available by OpenAI on March 15, 2022.

OpenAI introduced ChatGPT on November 30, 2022, an enhanced iteration of a model from the GPT-3.5 series.

ChatGPT: As its name implies, ChatGPT is a naturally engaging chatbot constructed on the GPT-3.5 family of large language models from OpenAI.

This chatbot surpasses its predecessors by incorporating additional functionalities, including recalling past exchanges within a conversation and filtering content for specified objectives.

Introduced in November 2022, ChatGPT attracted unparalleled interest due to its capability of producing comprehensive responses to inquiries spanning various domains.

ChatGPT is powered by AI, and users can pose inquiries to it.

ChatGPT was developed by OpenAI utilizing the RLHF (Reinforcement Learning from Human Feedback) methodology.

In this approach, an AI is instructed using a system of rewards and punishments: desired actions are rewarded, and undesirable ones are punished. This binary feedback enables a model trained with RLHF to handle inquiries appropriately and produce accurate responses.

Furthermore, this methodology facilitates seamless conversation devoid of technical jargon and terminology.

Simply put, ChatGPT is a method for prompting artificial intelligence to respond to a query in a manner resembling that of a human.

This functionality enables users to interact with ChatGPT in a manner resembling human discourse and to produce unique content. What further distinguishes ChatGPT is the “additional training” it underwent with human AI trainers.

A substantial quantity of inquiries and responses was fed into the initial language model and appended to its dataset. Additionally, experts ranked the program’s responses to various queries from worst to best.

This is the underlying rationale behind ChatGPT’s ability to produce authentic, human-like replies after comprehending the query and amassing the relevant data.

Now, the question to be answered is how businesses can utilize ChatGPT.

Given its foundation in the Large Language Model and Reinforcement Learning from Human Feedback, this system exhibits versatility beyond automating mundane duties.

In addition to generating code and customized instructions, its many potential applications include conducting research compilations, composing marketing content, delivering after-sales support, and augmenting customer engagement. It is crucial to highlight that ChatGPT was trained on external data (e.g., Wikipedia, filtered versions of Common Crawl) rather than knowledge gained from user interactions.

This distinction sets ChatGPT apart from other conversational chatbots that acquire knowledge through user interaction.

DALL-E 2
DALL-E and DALL-E 2 are deep learning models developed by OpenAI to generate digital images from natural language descriptions, called “prompts.” With ChatGPT, one can generate various kinds of text (stories, poems, etc.).

In a similar way, with DALL-E 2, one can generate various images and pictures. It also uses a version of GPT-3.
The software’s name, DALL-E, is a blend of two names.

It is taken from the animated Pixar robot character WALL-E and the Spanish surrealist artist Salvador Dalí.

How ChatGPT works

ChatGPT responds based on the context and intent behind a user’s question.

When you ask a search engine (say, Google) to look up something, it searches its database for pages that match the request.

There are two main phases here:

1. The data-gathering phase, and

2. The user interaction/lookup phase.

In ChatGPT, the data-gathering phase is called pretraining, while the user-responsiveness phase is called inference.

The magic behind generative AI is that the way pretraining works has suddenly proven to be enormously scalable.

ChatGPT uses unsupervised pretraining, which is a game changer here.

The model is trained to learn the underlying structure and patterns in the data without any specific task in mind.

A typical pretraining task is to predict the next word in a sequence.

With the entire training dataset as context, the model can apply patterns learned in the task.

For example, the model learns that the word “going” is often followed by “to,” or that “thank” is typically followed by “you.”
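This next-word pattern learning can be illustrated with a toy bigram model. The corpus below is invented, and real pretraining learns distributed parameters rather than explicit counts, but the objective is the same: predict the next word.

```python
from collections import Counter, defaultdict

# Toy illustration of the pretraining objective: count which word follows
# which in a small corpus, then "predict" the most frequent successor.
corpus = "thank you for going to the store thank you for going to bed".split()

successors = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    successors[current][nxt] += 1

def predict_next(word):
    """Return the word most frequently observed after `word` in the corpus."""
    return successors[word].most_common(1)[0][0]

print(predict_next("thank"))  # you
print(predict_next("going"))  # to
```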

We humans don’t learn every new language or process from scratch.

We rely on previous experience or knowledge (books) to help us understand & complete new tasks. ChatGPT’s technology works similarly.

It records these patterns and stores them as parameters (data points). Then, it can refer to them to make further predictions or solve problems.

At the end of the pretraining process, OpenAI said ChatGPT had developed 175 billion parameters.

This vast amount of data means more options for the system to pull from for an accurate response.

Reinforcement Learning from Human Feedback (RLHF)

LLMs are generally functional after pre-training. But ChatGPT also underwent another pioneering OpenAI process called Reinforcement Learning from Human Feedback (RLHF).

This worked in two stages:

1. The developers gave the system specific tasks to complete (e.g., answering questions or generating creative work).

2. Humans rated the LLM’s responses for effectiveness and fed these ratings back into the model so it understood its performance.

RLHF’s fine-tuning made ChatGPT more effective at generating relevant, valuable responses every time.
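The rating step can be sketched with a toy example. The candidate responses and scores below are invented; in real RLHF, human ratings train a separate reward model that then guides fine-tuning, rather than directly picking a winner.

```python
# Toy sketch of the RLHF ranking step: human raters (simulated here by
# hand-written scores) rank candidate responses, identifying which answer
# the model should be steered toward.
candidates = [
    "I don't know.",
    "Paris is the capital of France.",
    "france paris maybe??",
]

# Stand-in for human ratings; in practice these come from human labelers.
ratings = {candidates[0]: 2, candidates[1]: 9, candidates[2]: 4}

ranked = sorted(candidates, key=ratings.get, reverse=True)
best = ranked[0]
print(best)  # the highest-rated response is the one to reinforce
```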

Critical Parameters that govern the output of the model

1. Decoding: the procedure by which a model selects the tokens to be included in the output. Greedy decoding chooses the maximum-probability token at every stage of the decoding procedure. In less creative or fact-based use cases, greedy decoding generates output resembling the most prevalent language in the model’s pretraining data and the query text, which is usually preferable.

2. Sampling decoding is more stochastic and variable in nature than greedy decoding.

Note that randomness and variation are preferred in creative use cases.

3. Temperature sampling controls how likely the model is to choose a lower-probability versus a higher-probability next token: higher temperatures flatten the distribution, while lower temperatures sharpen it.

a. Top-k sampling is a method wherein the subsequent token is chosen randomly from a predetermined set of k tokens with the most significant probabilities.

b. Top-p (nucleus) sampling is a method where the subsequent token is chosen randomly from the smallest set of tokens whose cumulative probability surpasses a predetermined value, denoted as p.

Values can be specified for both the Top K and Top P.

If both parameters are set, Top K is applied first. Tokens excluded by the Top K cutoff are treated as having zero probability when Top P is calculated.

4. Random Seed: Using sampling decoding, when you repeatedly input the same prompt into a model, you will typically receive a different generated text each time.

This variation is the result of the decoding procedure incorporating deliberate pseudo-randomness.

5. Repetition penalty: An approach to address the issue of repetitive text in the generated result for the selected query, model, and parameters is to implement a repetition penalty.

6. Stopping criteria: The duration of the model’s output can be modified by specifying stop sequences and configuring minimum and maximum token values.

Text generation ceases when the model deems the output complete, a stop sequence is produced, or the maximum token limit is reached.

A stop sequence comprises a minimum of one character. Specified stop sequences cause the model to cease output generation automatically when one of the specified stop sequences appears in the generated output.

7. Minimum and maximum new tokens: Attempt to modify the parameters that govern the number of tokens generated if the model’s output is excessively brief or lengthy.
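The decoding controls above can be sketched end to end. Everything here is a toy illustration: the token scores, penalty value, and stop sequence are invented, and real implementations operate over full vocabularies of tensors rather than small dictionaries.

```python
import math
import random

# Toy next-token scores (logits); token names and values are invented.
logits = {"the": 2.0, "a": 1.5, "cat": 0.5, "zebra": -1.0}

def softmax(scores, temperature=1.0):
    """Convert scores to probabilities; low temperature sharpens the peak."""
    exps = {t: math.exp(s / temperature) for t, s in scores.items()}
    total = sum(exps.values())
    return {t: v / total for t, v in exps.items()}

def filter_top_k(probs, k):
    """Keep only the k most probable tokens (Top K)."""
    kept = sorted(probs, key=probs.get, reverse=True)[:k]
    return {t: probs[t] for t in kept}

def filter_top_p(probs, p):
    """Keep the smallest set of tokens whose cumulative probability >= p (Top P)."""
    kept, cumulative = {}, 0.0
    for t in sorted(probs, key=probs.get, reverse=True):
        kept[t] = probs[t]
        cumulative += probs[t]
        if cumulative >= p:
            break
    return kept

def apply_repetition_penalty(scores, generated, penalty=1.2):
    """Down-weight tokens already generated (simplified repetition penalty)."""
    return {
        t: ((s / penalty if s > 0 else s * penalty) if t in generated else s)
        for t, s in scores.items()
    }

def truncate_at_stop(text, stop_sequences):
    """Cut the output at the first occurrence of any stop sequence."""
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            text = text[:idx]
    return text

# Top K is applied first, then Top P over the surviving tokens, as described.
probs = softmax(logits, temperature=0.7)
probs = filter_top_k(probs, k=3)
probs = filter_top_p(probs, p=0.9)

token = random.choices(list(probs), weights=list(probs.values()))[0]
print(token)  # a sampled token from the filtered distribution
```

A greedy decoder would skip the sampling entirely and always take the highest-probability token; the filters above only matter when sampling decoding is in use.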

Before concluding, I will summarize my chat with my friend, Sai.

Me: I was watching the movie ‘A Few Good Men.’

Sai: Oh…Yeah…the iconic dialogue “You can’t handle the Truth.”…So, what is the relevance here?

Me: After Danny (Tom Cruise) picks up the case of soldiers, he comes to know that they were asked to follow the order Code Red.

But they never told him this earlier. These are the lines from that conversation.

Danny: Did Lieutenant Kendrick order you guys to give Santiago a code red?

Soldier: Yes, sir

Danny: You mind telling me why the hell you never mentioned this before?

Soldier: You didn’t ask us, sir

Sai: Interesting. They never told him because he never asked.

Me: Yes…The power of asking the right questions.

So now, one should know how to frame & ask the right questions.

ChatGPT works on a Large Language model, trained with an extensive data set.

So, you can get the answers, but you must ask the right question. More specific questions, with proper context and correct intent, will help you get the right answers quickly.

But yes, remember to ask the right question.

In the future, an engineer’s job description will include the ‘ability to frame the right set of questions.’ After that, generative tools (ChatGPT, DALL-E 2, etc.) can produce the appropriate code, stories, poems, or pictures.


Raktim Singh

Raktim did his B.Tech at IIT-BHU and joined Infosys in 1995. He is the author of the Amazon bestseller 'Driving Digital Transformation'. www.raktimsingh.com