Mastering the Art of AI Prompting

Chuka University
CDAM
Data Science
Machine Learning
AI Tools
Statistics
Author

D K. Muriithi | CDAM-Chuka University

Published

March 28, 2026

About this Course

Artificial intelligence (AI) language models are a popular tool in natural language processing (NLP) that allows computers to interact with humans using natural language. The models are trained to understand and generate human language. These models use algorithms and machine learning techniques to:

  • Analyze large amounts of text data.

  • Learn patterns and relationships in the language.

  • Generate new text that is similar in style and content to the training data

Some of the AI language models include:

  • ChatGPT, developed by OpenAI and backed by Microsoft

  • Claude, developed byAnthropic

  • Gemini, developed by Google DeepMind

  • Grok, developed by xAI (Elon Musk’s AI company)

  • DeepSeek, developed by DeepSeek (Hangzhou DeepSeek Artificial Intelligence)

  • Qwen, developed by Alibaba Cloud

  • Copilot, developed by Microsoft in collaboration with OpenAI

AI language models are capable of generating complex sentences and paragraphs, answering questions, and even writing creative fiction! They have extensive, potential applications across industries in today’s world.

What you will learn

After completing this course, you should be able to:

  • Describe an AI language model

  • Explain how an AI language model understands and responds to humans

  • Identify the rules to follow to write effective prompts to generate focused and accurate results from an AI language model

  • List the steps to sign up for a ChatGPT account

  • Follow the steps to effectively write and refine a series of prompts for ChatGPT for a travel itinerary scenario

  • Demonstrate the steps to effectively write and refine a series of prompts for ChatGPT to create a custom music playlist

What is an AI language model?

An AI language model is a type of artificial intelligence technology that is designed to understand, interpret, and generate human language. These models use machine learning algorithms to learn the patterns and structures of natural language, allowing them to analyze and produce text that appears to be written by humans.

AI language models can perform a wide range of tasks, including language translation, summarization, sentiment analysis, question-answering, and even content creation. Some of the most advanced AI language models, such as the Generative Pre-trained Transformer 5 (GPT-5), can generate highly coherent and contextually appropriate text that closely mimics human writing.

This course focuses on ChatGPT. You’ll examine how to write questions, also known as prompts, so ChatGPT generates natural language responses. To get the best responses, you must write clear and concise prompts that provide enough information for ChatGPT to understand what you are asking.

A key characteristic of ChatGPT is that it remembers the prompts you enter. This allows you to carry on a conversation with the chatbot where its responses build on your prompts. It goes beyond the one-question-one-answer interactions of other models.

AI language models need your help to grow

Imagine having a conversation with a machine that not only understands what you’re saying, but can also provide insightful and relevant responses in a matter of seconds. That’s exactly what AI language models, like ChatGPT, are capable of.

The latest AI language models have been trained on a massive collection of text, making them some of the most intelligent machines out there. They are capable of tasks ranging from answering trivia questions to composing poetry. 

But, they have limitations and room for improvement. They need your help to learn by interacting with you.

In this course, you’ll try your hand at the art of prompt writing.

  • How can your wording, tone, and context of a prompt illicit different responses from an AI language model?

  • How can you build an ongoing conversation with questions to get the most relevant answers?

  • How can you make the most of an AI language model’s incredible capabilities?

Why is prompt writing an important skill?

In a world that is increasingly reliant on machines to help with everyday tasks, being able to communicate effectively with them is essential. Whether you’re a researcher, developer, writer, or someone who’s curious about the world, knowing how to interact with AI language models can open up a whole new world of possibilities.

Are AI language models confused by improperly asked questions?

Do family or friends ever ask you questions that you have difficulty understanding? As a result, do you ever have difficulty answering those questions? Maybe you can use body language and familiarity with them to infer what they ask. Computers don’t have the benefit of human-to-human interaction.

An AI language model’s ability to understand and respond to questions depends on the quality and clarity of the input it receives. If a question is improperly asked, unclear, or contains errors, the AI language model might have difficulty understanding the question and generating an appropriate response.

However, AI language models are designed to be very robust and can often interpret even poorly worded questions to some extent, using context and other clues in the input to generate a response.

NB: It is important to note that while AI language models can generate human-like responses, they are not a perfect tool. They might sometimes provide inaccurate or irrelevant responses, especially if the input is particularly ambiguous or nonsensical.

Good prompts are the key

Prompt writing is important when using AI language models to get good results. Don’t try to cover everything in a single prompt that becomes complicated and difficult to follow. If it isn’t easy for a person to understand, then you can expect that it will not be easily understood by a computer (yet).

You want to treat the language model as the second side of a conversation. The good thing is newer language models are great at remembering the prompts you asked previously in the same chat. The models take those into consideration when formulating its responses. Therefore, it’s to your advantage to know a few basics about writing good prompts before diving into your first conversation.

Continue on to explore how to write good prompts for AI language models and take your conversations with machines to the next level!

The rules of writing an effective prompt

When it comes to writing prompts for AI language models, there are some general rules that can help ensure that your prompts are effective and produce the desired results.

Select each section to carefully review some of the do’s of prompt writing.

Be clear and specific

Clearly state what you want the AI language models to do and provide specific details about the task at hand.

Use correct grammar and spelling

Make sure your prompt is free of errors to avoid confusion or misunderstandings. New AI language models can correct spelling errors and do not need precise language. This is a very useful capability, but good grammar and spelling reduces potential errors in the responses to your prompts.

Keep it concise

Avoid making your prompt too long or complicated. Shorter prompts tend to work better.

Here’s an example of a complex prompt that is overly long and includes multiple questions within a single sentence: “Could you please tell me about the different types of software programs that are currently available for use on personal computers, including their features and functionality, as well as any advantages or disadvantages that they may have when compared to one another?”

Be polite and respectful

AI language models don’t have feelings, but it’s still good to be polite and respectful when asking for help.

Use an appropriate format

Use the correct format for the prompt depending on what you want the AI language model to do. 

For example, if you want ChatGPT to generate text, use a text generation prompt. An example of a good prompt for text generation depends on the specific task you want the model to perform. Here are a few examples:

  • For a summarization task: “Please summarize this news article in one or two sentences.”

  • For a creative writing task: “Write a short story about a person who discovers a hidden world.”

  • For a dialogue generation task: “Write a conversation between two people discussing their favorite hobbies.”

  • For a translation task: “Please translate this sentence from English to French: ’The cat is sleeping on the couch.”

  • For a question-answering task: “What is the capital of Kenya?”

Select each section to carefully review some of the dont’s of prompt writing.

Don’t use vague language

Avoid using vague or ambiguous language that can be interpreted in different ways. Be specific and clear about what you want the AI language model to do.

For example, this prompt–“Write a story about a man who caught a fish.”–is vague because it does not provide specific details about the story’s setting, plot, or characters. As a result, there are many ways to interpret the prompt and write a story based on it.

Don’t use slang or jargon

AI language models might not be familiar with slang or jargon, so it’s best to avoid using these types of terms in your prompts.

Don’t over-complicate the task

Keep the task simple and straightforward. Avoid asking for too much or making the task overly complex, as this can lead to inaccurate or incomplete results.

Don’t be too broad

Avoid writing prompts that are too broad or general. This can make it difficult for the AI language model to understand what you’re asking for and lead to poor results. 

For example, this prompt–“Write an essay about technology.”–doesn’t provide any specific focus or direction for the essay. The term “technology” is very general and can encompass a wide range of topics. Without a more specific focus, the AI language model might struggle to generate a cohesive and well-supported essay on the topic of technology. The resulting essay could end up being too general or unfocused, or it might not address the specific aspects of technology that the user was hoping to explore.

NB: Another good rule to follow is to always proofread your prompts before submitting them to an AI language model. Errors in spelling or grammar can lead to inaccurate results, so it’s important to make sure your prompts are free of errors.

With AI language models, you can request whatever you want and then refine your prompts based on the responses provided. Basically, you can have a question-answer conversation with an AI language model. In fact, the activities in this course were partially created by asking ChatGPT for advice about how to create them and they started with prompts! 

Getting started with AI tool

You need to set up an account with ChatGPT to use it in this course. But don’t worry, if you prefer not to sign up or use ChatGPT, you can still continue. This course will help you understand how to perfect your prompt writing for any generative AI model.

There are two activities in this course.

  1. In a guided activity, you’ll follow a scenario to learn about effective prompt writing for ChatGPT.

  2. Then, you’ll participate in an activity to apply the skills you learned to write your own effective prompts for ChatGPT.

For both activities, you’ll use ChatGPT. Take a minute now to set up an account with ChatGPT(opens in a new tab).

Start with a prompt

Keep these rules in mind to write an effective prompt.

  • Be clear and specific, stating with what you want ChatGPT to accomplish.

  • Use correct grammar and spelling to avoid confusion or misunderstandings.

  • Be concise by using short prompts rather than making the prompts too long or complicated.

  • Use the appropriate format by writing the prompts to request the type of response you want.

  • Proofread your prompts to reduce confusion.

Tasks

  1. Create a prompt for a folktale story titled “A farmer who shared his last meal in Kenya”

  2. Create a prompt for a Chuka University story titled “A famous philanthropist and seer by the name Jerusha Kanyua”

  3. Create a prompt for data Science and AI Career in Kenya

  4. Create a prompt for an itinerary for a Mombasa vacation.

What is generative AI?

Generative AI refers to deep-learning models that can generate high-quality text, images, and other content based on the data they were trained on.

1. Definition of Generative AI

Generative AI refers to a class of artificial intelligence systems designed to create new data that resembles existing data. Instead of only analyzing or predicting outcomes, these models learn underlying patterns and generate novel outputs such as text, images, audio, or code.

Formally, generative models approximate the data-generating distribution ( P(X) ) or conditional distribution ( P(X|Y) ), enabling the synthesis of new samples.

Example:
A generative AI model trained on medical reports can produce a new, realistic clinical summary based on input symptoms.

2. Difference from Traditional AI

Aspect Traditional AI Generative AI
Objective Prediction / classification Creation / synthesis
Output Labels, scores, decisions New content (text, images, etc.)
Learning focus P(Y/X) Model how data is generated
Example Spam detection Writing an email from scratch

Key distinction:
Traditional AI answers “What is this?”
Generative AI answers “Create something new like this.”

3. Underlying Techniques and Models

Generative AI relies heavily on advanced machine learning architectures:

a) Neural Networks

  • Function approximators that learn complex nonlinear relationships.

  • Foundation for most generative models.

b) Transformer Models

  • Introduced in the Attention Is All You Need.

  • Use self-attention mechanisms to capture long-range dependencies in data.

  • Highly effective for sequential data (e.g., text).

c) Large Language Models (LLMs)

  • Built on transformer architectures.

  • Trained on massive corpora to generate human-like text.

  • Example: ChatGPT.

d) Other Generative Models

  • GANs (Generative Adversarial Networks)
    Two networks (generator vs discriminator) compete to improve realism.

  • VAEs (Variational Autoencoders)
    Encode data into latent space and reconstruct new samples.

  • Diffusion Models
    Gradually remove noise from random signals to generate high-quality outputs (used in image generation).

4. Real-World Applications

a) Healthcare

  • Synthetic patient data generation for research.

  • Drug discovery and molecular design.

  • Medical imaging enhancement.

b) Finance

  • Fraud detection simulations.

  • Scenario generation for risk modeling.

  • Automated financial report writing.

c) Education

  • Intelligent tutoring systems.

  • Automated content generation (notes, quizzes).

  • Personalized learning pathways.

d) Creative Industries

  • Text generation (stories, scripts).

  • Image and video creation.

  • Music composition.

Example:
A model can generate realistic X-ray images to augment training datasets where real data is scarce.

5. Advantages of Generative AI

  • Automation of content creation
    Reduces human effort in writing, design, and coding.

  • Scalability
    Can generate large volumes of data quickly.

  • Personalization
    Tailors outputs to user-specific inputs.

  • Data augmentation
    Useful in domains with limited datasets (e.g., healthcare).

6. Limitations of Generative AI

  • Data dependency
    Requires large, high-quality training datasets.

  • Hallucinations
    Models may generate plausible but incorrect information.

  • Computational cost
    Training and deployment are resource-intensive.

  • Lack of true understanding
    Models rely on statistical patterns, not reasoning or consciousness.

7. Ethical Considerations

a) Bias

  • Models inherit biases from training data.

  • Can reinforce social inequalities if unchecked.

b) Misinformation

  • Capable of generating convincing fake content (text, images, deepfakes).

c) Data Privacy

  • Risk of memorizing and reproducing sensitive data.

  • Requires strict governance and anonymization.

d) Intellectual Property

  • Ownership of AI-generated content remains legally ambiguous.

8. Simple Illustrative Example

Task: Generate a short paragraph about malaria symptoms.

  • Input: “Describe common malaria symptoms.”

  • Output (generated): A coherent paragraph describing fever, chills, and fatigue.

This demonstrates learning of linguistic patterns, not factual verification.

9. Summary

  • Generative AI focuses on creating new data rather than just analyzing it.

  • Built on advanced architectures like transformers and neural networks.

  • Widely applied across healthcare, finance, education, and creative fields.

  • Offers significant benefits but introduces technical and ethical challenges.

Mastering the Art of AI Prompting for Data Science

1. Introduction to AI Prompting in Data Science

Definition

AI prompting in data science refers to the structured formulation of inputs (instructions, context, and constraints) given to AI systems to perform analytical, statistical, or computational tasks. It is essentially a form of human–AI interface design where the prompt determines the quality, reproducibility, and validity of outputs.

How Modern Models Interpret Queries

Modern AI systems, particularly transformer-based models (e.g., large language models), process prompts using:

  • Tokenization: Breaking text into units (tokens)

  • Context windows: Interpreting relationships across tokens

  • Attention mechanisms: Prioritizing relevant parts of the input

  • Pretrained statistical patterns: Mapping prompts to learned representations

These models do not “understand” data in a causal sense; they approximate patterns based on training distributions. Therefore, prompt clarity directly influences output reliability.

Role in Data Science Workflows

Prompting enhances:

  • EDA: Rapid summaries and hypothesis generation

  • Modeling: Code generation, diagnostics, and interpretation

  • Reporting: Automated narratives and visualization explanations

  • Reproducibility: Structured workflows via standardized prompts

2. Core Principles of Effective Prompting

1. Precision in Task Specification

Avoid vague instructions.

Weak:

Fit a model

Strong:

Fit a logistic regression model predicting malaria test outcome using age, temperature, and parasite type, including interaction terms between age and temperature.

2. Structured Context

Always include:

  • Dataset description

  • Variable definitions

  • Assumptions

Example:

Dataset contains 1,000 observations with variables: age (years), temperature (°C), parasite_type (categorical), malaria_result (binary).

3. Role-Based Prompting

Assign domain expertise:

Act as a data scientist analyzing clinical trial data.

This improves:

  • Statistical rigor

  • Interpretation quality

  • Method selection

4. Explicit Output Formatting

Specify structure:

Provide:

  • R code using ggplot2

  • Summary table

  • Interpretation in 3–5 bullet points

5. Iterative Refinement

Prompting is not one-shot. Use:

  • Follow-up prompts

  • Error correction

  • Output validation

3. Prompting for Key Data Science Tasks

A. Exploratory Data Analysis (EDA)

Strong Prompt Example

Perform EDA on a malaria dataset with variables (age, region, temperature, parasite_type, test_result).

  • Compute summary statistics

  • Identify missing values

  • Detect outliers

  • Generate visualizations using ggplot2

  • Provide interpretation of patterns

Example (R Code Output)

library(ggplot2)  summary(data)  ggplot(data, aes(x = age)) +   geom_histogram(binwidth = 5)  ggplot(data, aes(x = region, fill = test_result)) +   geom_bar(position = "fill") 

Interpretation

  • Age distribution is right-skewed

  • Certain regions show higher malaria prevalence

  • Missing values concentrated in temperature

B. Statistical Modeling

Prompt Example

Fit a logistic regression model predicting malaria_result using age, temperature, and parasite_type.
Report coefficients, odds ratios, confidence intervals, and model diagnostics.

Example (Python)

import statsmodels.api as sm  X = data[['age', 'temperature']] X = sm.add_constant(X) y = data['malaria_result']  model = sm.Logit(y, X).fit() print(model.summary()) 

Interpretation

  • Positive coefficient for temperature suggests increased malaria risk

  • Odds ratio > 1 indicates higher likelihood

  • Check p-values (< 0.05 for significance)

C. Machine Learning

Prompt Example

Train a Random Forest and XGBoost model to predict malaria outcome.
Include feature importance, cross-validation (5-fold), and hyperparameter tuning.

Example (Python)

from sklearn.ensemble import RandomForestClassifier from sklearn.model_selection import cross_val_score  rf = RandomForestClassifier(n_estimators=100) scores = cross_val_score(rf, X, y, cv=5)  print(scores.mean()) 

Key Outputs

  • Model accuracy

  • Feature importance ranking

  • Overfitting diagnostics

D. Data Cleaning and Transformation

Prompt Example

Clean dataset by:

  • Imputing missing values using median for numeric variables

  • Encoding categorical variables

  • Standardizing continuous features

Example (Python)

from sklearn.preprocessing import StandardScaler  data.fillna(data.median(), inplace=True) scaler = StandardScaler() data[['age', 'temperature']] = scaler.fit_transform(data[['age', 'temperature']]) 

E. Programming Support (R/Python)

Prompt Example

Generate R code using caret to train a classification model with SMOTE sampling.

library(caret) library(DMwR)  train_control <- trainControl(method="cv", number=5, sampling="smote")  model <- train(malaria_result ~ ., data=data,                method="rf",                trControl=train_control) 

4. Prompt Engineering Techniques

1. Decomposition

Break complex tasks:

Step 1: Clean data
Step 2: Perform EDA
Step 3: Fit model
Step 4: Interpret results

2. Constraints

Control outputs:

Use only base R functions
Avoid data leakage
Use 95% confidence intervals

3. Templates

Use structured prompts:

Task: Data: Method: Output format: Assumptions: 

4. Statistical Rigor Control

Explicitly request:

  • Assumptions (normality, independence)

  • Diagnostics (residual plots, VIF)

  • Uncertainty (confidence intervals)

5. Advanced Strategies

Model Interpretation

Prompt Example

Interpret Random Forest model using SHAP values and partial dependence plots.

Key Outputs

  • Feature contribution

  • Nonlinear relationships

  • Interaction effects

Reducing Hallucinations

  • Provide real data samples

  • Request “show calculations”

  • Ask for uncertainty statements

Verification

  • Cross-check outputs manually

  • Validate statistical assumptions

  • Re-run with alternative methods

Reproducible Workflows

Design prompts that:

  • Include seeds (set.seed(123))

  • Specify software versions

  • Output full scripts

6. Practical Examples (Before vs After)

Example 1: EDA

Weak Prompt

Analyze this dataset

Strong Prompt

Perform EDA on a malaria dataset with variables (age, region, temperature, parasite_type).
Generate summary statistics, visualize distributions using ggplot2, and highlight key epidemiological patterns.

Example 2: Modeling

Weak Prompt

Build a model

Strong Prompt

Fit a logistic regression model predicting malaria_result.
Include interaction terms, check multicollinearity, report odds ratios, and interpret results.

7. Common Pitfalls

  • Ambiguous variable definitions

  • Missing dataset context

  • Ignoring statistical assumptions

  • Blind trust in AI outputs

  • Lack of reproducibility

8. Hands-on Exercises

Exercise 1: Malaria Prediction

Construct a prompt to:

  • Clean data

  • Perform EDA

  • Train Random Forest

Model Answer (Example Prompt):

Clean the malaria dataset by imputing missing values and encoding categorical variables. Perform EDA with summary statistics and visualizations. Train a Random Forest model using 5-fold cross-validation and report feature importance.

Exercise 2: Clinical Trial Analysis

Task:

  • Compare treatment vs control using t-test

  • Report confidence intervals

Exercise 3: Survey Data

Task:

  • Perform chi-square test

  • Interpret association

9. Conclusion and Best Practices

Prompt Quality Checklist

  • Clear objective

  • Defined variables

  • Appropriate statistical method

  • Explicit output format

  • Reproducibility elements

Best Practices

  1. Treat prompting as experimental design

  2. Always validate outputs statistically

  3. Use iterative refinement

  4. Combine AI outputs with domain expertise

  5. Document prompts for reproducibility

Final Note

AI prompting is not a replacement for statistical reasoning. It is a force multiplier. The quality of your analytical thinking still determines the validity of the results.