About this Course
Artificial intelligence (AI) language models are a popular tool in natural language processing (NLP) that allows computers to interact with humans using natural language. The models are trained to understand and generate human language. These models use algorithms and machine learning techniques to:
Analyze large amounts of text data.
Learn patterns and relationships in the language.
Generate new text that is similar in style and content to the training data
Some of the AI language models include:
ChatGPT, developed by OpenAI and backed by Microsoft
Claude, developed byAnthropic
Gemini, developed by Google DeepMind
Grok, developed by xAI (Elon Musk’s AI company)
DeepSeek, developed by DeepSeek (Hangzhou DeepSeek Artificial Intelligence)
Qwen, developed by Alibaba Cloud
Copilot, developed by Microsoft in collaboration with OpenAI
AI language models are capable of generating complex sentences and paragraphs, answering questions, and even writing creative fiction! They have extensive, potential applications across industries in today’s world.
What you will learn
After completing this course, you should be able to:
Describe an AI language model
Explain how an AI language model understands and responds to humans
Identify the rules to follow to write effective prompts to generate focused and accurate results from an AI language model
List the steps to sign up for a ChatGPT account
Follow the steps to effectively write and refine a series of prompts for ChatGPT for a travel itinerary scenario
Demonstrate the steps to effectively write and refine a series of prompts for ChatGPT to create a custom music playlist
What is an AI language model?
An AI language model is a type of artificial intelligence technology that is designed to understand, interpret, and generate human language. These models use machine learning algorithms to learn the patterns and structures of natural language, allowing them to analyze and produce text that appears to be written by humans.
AI language models can perform a wide range of tasks, including language translation, summarization, sentiment analysis, question-answering, and even content creation. Some of the most advanced AI language models, such as the Generative Pre-trained Transformer 5 (GPT-5), can generate highly coherent and contextually appropriate text that closely mimics human writing.
This course focuses on ChatGPT. You’ll examine how to write questions, also known as prompts, so ChatGPT generates natural language responses. To get the best responses, you must write clear and concise prompts that provide enough information for ChatGPT to understand what you are asking.
A key characteristic of ChatGPT is that it remembers the prompts you enter. This allows you to carry on a conversation with the chatbot where its responses build on your prompts. It goes beyond the one-question-one-answer interactions of other models.
AI language models need your help to grow
Imagine having a conversation with a machine that not only understands what you’re saying, but can also provide insightful and relevant responses in a matter of seconds. That’s exactly what AI language models, like ChatGPT, are capable of.
The latest AI language models have been trained on a massive collection of text, making them some of the most intelligent machines out there. They are capable of tasks ranging from answering trivia questions to composing poetry.
But, they have limitations and room for improvement. They need your help to learn by interacting with you.
In this course, you’ll try your hand at the art of prompt writing.
How can your wording, tone, and context of a prompt illicit different responses from an AI language model?
How can you build an ongoing conversation with questions to get the most relevant answers?
How can you make the most of an AI language model’s incredible capabilities?
Why is prompt writing an important skill?
In a world that is increasingly reliant on machines to help with everyday tasks, being able to communicate effectively with them is essential. Whether you’re a researcher, developer, writer, or someone who’s curious about the world, knowing how to interact with AI language models can open up a whole new world of possibilities.
Are AI language models confused by improperly asked questions?
Do family or friends ever ask you questions that you have difficulty understanding? As a result, do you ever have difficulty answering those questions? Maybe you can use body language and familiarity with them to infer what they ask. Computers don’t have the benefit of human-to-human interaction.
An AI language model’s ability to understand and respond to questions depends on the quality and clarity of the input it receives. If a question is improperly asked, unclear, or contains errors, the AI language model might have difficulty understanding the question and generating an appropriate response.
However, AI language models are designed to be very robust and can often interpret even poorly worded questions to some extent, using context and other clues in the input to generate a response.
NB: It is important to note that while AI language models can generate human-like responses, they are not a perfect tool. They might sometimes provide inaccurate or irrelevant responses, especially if the input is particularly ambiguous or nonsensical.
Good prompts are the key
Prompt writing is important when using AI language models to get good results. Don’t try to cover everything in a single prompt that becomes complicated and difficult to follow. If it isn’t easy for a person to understand, then you can expect that it will not be easily understood by a computer (yet).
You want to treat the language model as the second side of a conversation. The good thing is newer language models are great at remembering the prompts you asked previously in the same chat. The models take those into consideration when formulating its responses. Therefore, it’s to your advantage to know a few basics about writing good prompts before diving into your first conversation.
Continue on to explore how to write good prompts for AI language models and take your conversations with machines to the next level!
The rules of writing an effective prompt
When it comes to writing prompts for AI language models, there are some general rules that can help ensure that your prompts are effective and produce the desired results.
Select each section to carefully review some of the do’s of prompt writing.
Be clear and specific
Clearly state what you want the AI language models to do and provide specific details about the task at hand.
Use correct grammar and spelling
Make sure your prompt is free of errors to avoid confusion or misunderstandings. New AI language models can correct spelling errors and do not need precise language. This is a very useful capability, but good grammar and spelling reduces potential errors in the responses to your prompts.
Keep it concise
Avoid making your prompt too long or complicated. Shorter prompts tend to work better.
Here’s an example of a complex prompt that is overly long and includes multiple questions within a single sentence: “Could you please tell me about the different types of software programs that are currently available for use on personal computers, including their features and functionality, as well as any advantages or disadvantages that they may have when compared to one another?”
Be polite and respectful
AI language models don’t have feelings, but it’s still good to be polite and respectful when asking for help.
Use an appropriate format
Use the correct format for the prompt depending on what you want the AI language model to do.
For example, if you want ChatGPT to generate text, use a text generation prompt. An example of a good prompt for text generation depends on the specific task you want the model to perform. Here are a few examples:
For a summarization task: “Please summarize this news article in one or two sentences.”
For a creative writing task: “Write a short story about a person who discovers a hidden world.”
For a dialogue generation task: “Write a conversation between two people discussing their favorite hobbies.”
For a translation task: “Please translate this sentence from English to French: ’The cat is sleeping on the couch.”
For a question-answering task: “What is the capital of Kenya?”
Select each section to carefully review some of the dont’s of prompt writing.
Don’t use vague language
Avoid using vague or ambiguous language that can be interpreted in different ways. Be specific and clear about what you want the AI language model to do.
For example, this prompt–“Write a story about a man who caught a fish.”–is vague because it does not provide specific details about the story’s setting, plot, or characters. As a result, there are many ways to interpret the prompt and write a story based on it.
Don’t use slang or jargon
AI language models might not be familiar with slang or jargon, so it’s best to avoid using these types of terms in your prompts.
Don’t over-complicate the task
Keep the task simple and straightforward. Avoid asking for too much or making the task overly complex, as this can lead to inaccurate or incomplete results.
Don’t be too broad
Avoid writing prompts that are too broad or general. This can make it difficult for the AI language model to understand what you’re asking for and lead to poor results.
For example, this prompt–“Write an essay about technology.”–doesn’t provide any specific focus or direction for the essay. The term “technology” is very general and can encompass a wide range of topics. Without a more specific focus, the AI language model might struggle to generate a cohesive and well-supported essay on the topic of technology. The resulting essay could end up being too general or unfocused, or it might not address the specific aspects of technology that the user was hoping to explore.
NB: Another good rule to follow is to always proofread your prompts before submitting them to an AI language model. Errors in spelling or grammar can lead to inaccurate results, so it’s important to make sure your prompts are free of errors.
With AI language models, you can request whatever you want and then refine your prompts based on the responses provided. Basically, you can have a question-answer conversation with an AI language model. In fact, the activities in this course were partially created by asking ChatGPT for advice about how to create them and they started with prompts!
Getting started with AI tool
You need to set up an account with ChatGPT to use it in this course. But don’t worry, if you prefer not to sign up or use ChatGPT, you can still continue. This course will help you understand how to perfect your prompt writing for any generative AI model.
There are two activities in this course.
In a guided activity, you’ll follow a scenario to learn about effective prompt writing for ChatGPT.
Then, you’ll participate in an activity to apply the skills you learned to write your own effective prompts for ChatGPT.
For both activities, you’ll use ChatGPT. Take a minute now to set up an account with ChatGPT(opens in a new tab).
Start with a prompt
Keep these rules in mind to write an effective prompt.
Be clear and specific, stating with what you want ChatGPT to accomplish.
Use correct grammar and spelling to avoid confusion or misunderstandings.
Be concise by using short prompts rather than making the prompts too long or complicated.
Use the appropriate format by writing the prompts to request the type of response you want.
Proofread your prompts to reduce confusion.
Tasks
Create a prompt for a folktale story titled “A farmer who shared his last meal in Kenya”
Create a prompt for a Chuka University story titled “A famous philanthropist and seer by the name Jerusha Kanyua”
Create a prompt for data Science and AI Career in Kenya
Create a prompt for an itinerary for a Mombasa vacation.
What is generative AI?
Generative AI refers to deep-learning models that can generate high-quality text, images, and other content based on the data they were trained on.
1. Definition of Generative AI
Generative AI refers to a class of artificial intelligence systems designed to create new data that resembles existing data. Instead of only analyzing or predicting outcomes, these models learn underlying patterns and generate novel outputs such as text, images, audio, or code.
Formally, generative models approximate the data-generating distribution ( P(X) ) or conditional distribution ( P(X|Y) ), enabling the synthesis of new samples.
Example:
A generative AI model trained on medical reports can produce a new, realistic clinical summary based on input symptoms.
2. Difference from Traditional AI
| Aspect | Traditional AI | Generative AI |
|---|---|---|
| Objective | Prediction / classification | Creation / synthesis |
| Output | Labels, scores, decisions | New content (text, images, etc.) |
| Learning focus | P(Y/X) | Model how data is generated |
| Example | Spam detection | Writing an email from scratch |
Key distinction:
Traditional AI answers “What is this?”
Generative AI answers “Create something new like this.”
3. Underlying Techniques and Models
Generative AI relies heavily on advanced machine learning architectures:
a) Neural Networks
Function approximators that learn complex nonlinear relationships.
Foundation for most generative models.
b) Transformer Models
Introduced in the Attention Is All You Need.
Use self-attention mechanisms to capture long-range dependencies in data.
Highly effective for sequential data (e.g., text).
c) Large Language Models (LLMs)
Built on transformer architectures.
Trained on massive corpora to generate human-like text.
Example: ChatGPT.
d) Other Generative Models
GANs (Generative Adversarial Networks)
Two networks (generator vs discriminator) compete to improve realism.VAEs (Variational Autoencoders)
Encode data into latent space and reconstruct new samples.Diffusion Models
Gradually remove noise from random signals to generate high-quality outputs (used in image generation).
4. Real-World Applications
a) Healthcare
Synthetic patient data generation for research.
Drug discovery and molecular design.
Medical imaging enhancement.
b) Finance
Fraud detection simulations.
Scenario generation for risk modeling.
Automated financial report writing.
c) Education
Intelligent tutoring systems.
Automated content generation (notes, quizzes).
Personalized learning pathways.
d) Creative Industries
Text generation (stories, scripts).
Image and video creation.
Music composition.
Example:
A model can generate realistic X-ray images to augment training datasets where real data is scarce.
5. Advantages of Generative AI
Automation of content creation
Reduces human effort in writing, design, and coding.Scalability
Can generate large volumes of data quickly.Personalization
Tailors outputs to user-specific inputs.Data augmentation
Useful in domains with limited datasets (e.g., healthcare).
6. Limitations of Generative AI
Data dependency
Requires large, high-quality training datasets.Hallucinations
Models may generate plausible but incorrect information.Computational cost
Training and deployment are resource-intensive.Lack of true understanding
Models rely on statistical patterns, not reasoning or consciousness.
7. Ethical Considerations
a) Bias
Models inherit biases from training data.
Can reinforce social inequalities if unchecked.
b) Misinformation
- Capable of generating convincing fake content (text, images, deepfakes).
c) Data Privacy
Risk of memorizing and reproducing sensitive data.
Requires strict governance and anonymization.
d) Intellectual Property
- Ownership of AI-generated content remains legally ambiguous.
8. Simple Illustrative Example
Task: Generate a short paragraph about malaria symptoms.
Input: “Describe common malaria symptoms.”
Output (generated): A coherent paragraph describing fever, chills, and fatigue.
This demonstrates learning of linguistic patterns, not factual verification.
9. Summary
Generative AI focuses on creating new data rather than just analyzing it.
Built on advanced architectures like transformers and neural networks.
Widely applied across healthcare, finance, education, and creative fields.
Offers significant benefits but introduces technical and ethical challenges.
Mastering the Art of AI Prompting for Data Science
1. Introduction to AI Prompting in Data Science
Definition
AI prompting in data science refers to the structured formulation of inputs (instructions, context, and constraints) given to AI systems to perform analytical, statistical, or computational tasks. It is essentially a form of human–AI interface design where the prompt determines the quality, reproducibility, and validity of outputs.
How Modern Models Interpret Queries
Modern AI systems, particularly transformer-based models (e.g., large language models), process prompts using:
Tokenization: Breaking text into units (tokens)
Context windows: Interpreting relationships across tokens
Attention mechanisms: Prioritizing relevant parts of the input
Pretrained statistical patterns: Mapping prompts to learned representations
These models do not “understand” data in a causal sense; they approximate patterns based on training distributions. Therefore, prompt clarity directly influences output reliability.
Role in Data Science Workflows
Prompting enhances:
EDA: Rapid summaries and hypothesis generation
Modeling: Code generation, diagnostics, and interpretation
Reporting: Automated narratives and visualization explanations
Reproducibility: Structured workflows via standardized prompts
2. Core Principles of Effective Prompting
1. Precision in Task Specification
Avoid vague instructions.
Weak:
Fit a model
Strong:
Fit a logistic regression model predicting malaria test outcome using age, temperature, and parasite type, including interaction terms between age and temperature.
2. Structured Context
Always include:
Dataset description
Variable definitions
Assumptions
Example:
Dataset contains 1,000 observations with variables: age (years), temperature (°C), parasite_type (categorical), malaria_result (binary).
3. Role-Based Prompting
Assign domain expertise:
Act as a data scientist analyzing clinical trial data.
This improves:
Statistical rigor
Interpretation quality
Method selection
4. Explicit Output Formatting
Specify structure:
Provide:
R code using
ggplot2Summary table
Interpretation in 3–5 bullet points
5. Iterative Refinement
Prompting is not one-shot. Use:
Follow-up prompts
Error correction
Output validation
3. Prompting for Key Data Science Tasks
A. Exploratory Data Analysis (EDA)
Strong Prompt Example
Perform EDA on a malaria dataset with variables (age, region, temperature, parasite_type, test_result).
Compute summary statistics
Identify missing values
Detect outliers
Generate visualizations using
ggplot2Provide interpretation of patterns
Example (R Code Output)
library(ggplot2) summary(data) ggplot(data, aes(x = age)) + geom_histogram(binwidth = 5) ggplot(data, aes(x = region, fill = test_result)) + geom_bar(position = "fill")
Interpretation
Age distribution is right-skewed
Certain regions show higher malaria prevalence
Missing values concentrated in temperature
B. Statistical Modeling
Prompt Example
Fit a logistic regression model predicting malaria_result using age, temperature, and parasite_type.
Report coefficients, odds ratios, confidence intervals, and model diagnostics.
Example (Python)
import statsmodels.api as sm X = data[['age', 'temperature']] X = sm.add_constant(X) y = data['malaria_result'] model = sm.Logit(y, X).fit() print(model.summary())
Interpretation
Positive coefficient for temperature suggests increased malaria risk
Odds ratio > 1 indicates higher likelihood
Check p-values (< 0.05 for significance)
C. Machine Learning
Prompt Example
Train a Random Forest and XGBoost model to predict malaria outcome.
Include feature importance, cross-validation (5-fold), and hyperparameter tuning.
Example (Python)
from sklearn.ensemble import RandomForestClassifier from sklearn.model_selection import cross_val_score rf = RandomForestClassifier(n_estimators=100) scores = cross_val_score(rf, X, y, cv=5) print(scores.mean())
Key Outputs
Model accuracy
Feature importance ranking
Overfitting diagnostics
D. Data Cleaning and Transformation
Prompt Example
Clean dataset by:
Imputing missing values using median for numeric variables
Encoding categorical variables
Standardizing continuous features
Example (Python)
from sklearn.preprocessing import StandardScaler data.fillna(data.median(), inplace=True) scaler = StandardScaler() data[['age', 'temperature']] = scaler.fit_transform(data[['age', 'temperature']])
E. Programming Support (R/Python)
Prompt Example
Generate R code using
caretto train a classification model with SMOTE sampling.
library(caret) library(DMwR) train_control <- trainControl(method="cv", number=5, sampling="smote") model <- train(malaria_result ~ ., data=data, method="rf", trControl=train_control)
4. Prompt Engineering Techniques
1. Decomposition
Break complex tasks:
Step 1: Clean data
Step 2: Perform EDA
Step 3: Fit model
Step 4: Interpret results
2. Constraints
Control outputs:
Use only base R functions
Avoid data leakage
Use 95% confidence intervals
3. Templates
Use structured prompts:
Task: Data: Method: Output format: Assumptions:
4. Statistical Rigor Control
Explicitly request:
Assumptions (normality, independence)
Diagnostics (residual plots, VIF)
Uncertainty (confidence intervals)
5. Advanced Strategies
Model Interpretation
Prompt Example
Interpret Random Forest model using SHAP values and partial dependence plots.
Key Outputs
Feature contribution
Nonlinear relationships
Interaction effects
Reducing Hallucinations
Provide real data samples
Request “show calculations”
Ask for uncertainty statements
Verification
Cross-check outputs manually
Validate statistical assumptions
Re-run with alternative methods
Reproducible Workflows
Design prompts that:
Include seeds (
set.seed(123))Specify software versions
Output full scripts
6. Practical Examples (Before vs After)
Example 1: EDA
Weak Prompt
Analyze this dataset
Strong Prompt
Perform EDA on a malaria dataset with variables (age, region, temperature, parasite_type).
Generate summary statistics, visualize distributions usingggplot2, and highlight key epidemiological patterns.
Example 2: Modeling
Weak Prompt
Build a model
Strong Prompt
Fit a logistic regression model predicting malaria_result.
Include interaction terms, check multicollinearity, report odds ratios, and interpret results.
7. Common Pitfalls
Ambiguous variable definitions
Missing dataset context
Ignoring statistical assumptions
Blind trust in AI outputs
Lack of reproducibility
8. Hands-on Exercises
Exercise 1: Malaria Prediction
Construct a prompt to:
Clean data
Perform EDA
Train Random Forest
Model Answer (Example Prompt):
Clean the malaria dataset by imputing missing values and encoding categorical variables. Perform EDA with summary statistics and visualizations. Train a Random Forest model using 5-fold cross-validation and report feature importance.
Exercise 2: Clinical Trial Analysis
Task:
Compare treatment vs control using t-test
Report confidence intervals
Exercise 3: Survey Data
Task:
Perform chi-square test
Interpret association
9. Conclusion and Best Practices
Prompt Quality Checklist
Clear objective
Defined variables
Appropriate statistical method
Explicit output format
Reproducibility elements
Best Practices
Treat prompting as experimental design
Always validate outputs statistically
Use iterative refinement
Combine AI outputs with domain expertise
Document prompts for reproducibility
Final Note
AI prompting is not a replacement for statistical reasoning. It is a force multiplier. The quality of your analytical thinking still determines the validity of the results.