Prompting for Data Analysis

Overview

A “prompt” is the instruction or question a user gives to an AI system. In research, prompting is not just a technical skill. It is part of analytic reasoning.

A strong prompt tells StatWiseAI the research question, the dataset documentation, the variables, the study design, the assumptions, and the type of output needed. A weak prompt often produces a generic answer; a strong prompt produces a response that is more specific and useful, easier to review, and easier to document.

A weak prompt

“What analysis should I run?”

This prompt is too vague. It does not describe the research question, outcome variable, predictor variables, dataset, study design, or analytic goal.

A stronger prompt

“I am using public documentation from a longitudinal study to plan an analysis. I want to examine whether baseline depressive symptoms are associated with later functional limitations among older adults. I am not uploading participant-level data. Please help me identify relevant variables, possible longitudinal modeling strategies, missing data concerns, and assumptions I should check before analysis.”

This prompt is stronger because it gives StatWiseAI:

A research topic.
A study design clue.
A likely outcome and predictor.
A clear data privacy boundary.
A specific request.
A request for assumptions and missing data concerns.

Recommended prompt structure

When asking StatWiseAI for help, include as many of the following as possible:

Research question

What are you trying to learn?

Example:

“I want to examine whether food insecurity is associated with depressive symptoms among adults.”

Dataset or documentation source

What dataset, documentation, codebook, or output are you using?

Example:

“I am using public NHANES documentation and variable descriptions.”

Study design

What is the general design?

Examples:

Cross-sectional survey
Longitudinal cohort
Clinical trial
Administrative dataset
EHR dataset
Public-use survey
Repeated-measures dataset

Outcome variable

What is the outcome?

Example:

“The outcome is depressive symptoms, measured using a questionnaire scale.”

Predictor or exposure

What is the main predictor, exposure, or grouping variable?

Example:

“The main exposure is food insecurity.”

Covariates

What other variables may need to be considered?

Example:

“Potential covariates include age, sex, race/ethnicity, education, income, insurance status, and chronic conditions.”

Data structure

What features of the data matter?

Examples:

Survey weights
Strata
Primary sampling units
Repeated measures
Clustering
Multiple time points
Missing data
Linked files
Restricted variables
Small subgroup sample sizes

Type of help needed

Be specific about the task.

Examples:

Help me identify relevant documentation.
Suggest possible statistical models.
Compare analytic approaches.
Draft code using placeholder variable names.
Review this output.
Identify assumptions and limitations.
Suggest sensitivity analyses.
Create a reproducibility checklist.

Preferred output format

Tell StatWiseAI how to respond.

Examples:

“Provide a checklist.”
“Use a table.”
“Compare 2–3 options.”
“Write this for a beginner.”
“Give me R code with comments.”
“Do not write final conclusions.”

General prompt template

You may copy and adapt this template:

I am using StatWiseAI to support analysis planning for [dataset or documentation source].
My research question is: [insert research question].
I am not uploading participant-level, proprietary, sensitive, PHI, HIPAA-regulated, or FERPA-regulated data.
The study design is: [cross-sectional / longitudinal / survey / cohort / clinical / administrative / other].
The outcome is: [name or description, type, coding if known].
The main predictor or exposure is: [name or description].
Important covariates may include: [list].
Important data features include: [survey weights, repeated measures, clustering, missing data, time-to-event structure, linked files, etc.].
I need help with: [choosing a model / identifying variables / understanding documentation / generating code / reviewing output / planning sensitivity analyses].
Please provide: [checklist / table / step-by-step plan / code / explanation].
Please also identify assumptions, limitations, and questions I should answer before proceeding.
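
For users who reuse the template often, the bracketed slots can be filled programmatically. The following is a minimal Python sketch, not part of StatWiseAI: the `PromptFields` class, the `build_prompt` function, and all field names are hypothetical and exist only for illustration. It produces plain text to paste into a prompt and does not call any StatWiseAI interface.

```python
from dataclasses import dataclass, field


@dataclass
class PromptFields:
    """Hypothetical container for the bracketed slots in the template."""
    source: str
    research_question: str
    study_design: str
    outcome: str
    exposure: str
    covariates: list[str] = field(default_factory=list)
    data_features: list[str] = field(default_factory=list)
    help_needed: str = "choosing a model"
    output_format: str = "checklist"


def build_prompt(p: PromptFields) -> str:
    """Fill the general prompt template, one template line per output line."""
    # Empty lists keep a visible placeholder so the gap is not silently dropped.
    covariates = ", ".join(p.covariates) or "[list]"
    features = ", ".join(p.data_features) or "[list]"
    lines = [
        f"I am using StatWiseAI to support analysis planning for {p.source}.",
        f"My research question is: {p.research_question}.",
        "I am not uploading participant-level, proprietary, sensitive, PHI, "
        "HIPAA-regulated, or FERPA-regulated data.",
        f"The study design is: {p.study_design}.",
        f"The outcome is: {p.outcome}.",
        f"The main predictor or exposure is: {p.exposure}.",
        f"Important covariates may include: {covariates}.",
        f"Important data features include: {features}.",
        f"I need help with: {p.help_needed}.",
        f"Please provide: {p.output_format}.",
        "Please also identify assumptions, limitations, and questions "
        "I should answer before proceeding.",
    ]
    return "\n".join(lines)


# Example values are taken from the examples earlier on this page.
example = PromptFields(
    source="public NHANES documentation and variable descriptions",
    research_question=("whether food insecurity is associated with "
                       "depressive symptoms among adults"),
    study_design="cross-sectional survey",
    outcome="depressive symptoms, measured using a questionnaire scale",
    exposure="food insecurity",
    covariates=["age", "sex", "race/ethnicity", "education", "income"],
)
print(build_prompt(example))
```

Because `data_features` is left empty in the example, that line keeps the `[list]` placeholder, which is a reminder to state survey weights, clustering, or missing data before sending the prompt.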

Follow-up prompts

A good AI interaction usually takes more than one prompt. After receiving a response, users should ask follow-up questions such as:

What assumptions are you making?
What information is missing from my prompt?
What could make this recommendation inappropriate?
What alternative approaches should I consider?
What should I verify before using this recommendation?
What would a statistical reviewer ask about this plan?
What sensitivity analyses should I consider?
How should I document this decision?
Can you revise this using placeholder variable names only?
Can you explain this in simpler language?

Prompting reminder

Do not enter proprietary data, sensitive data, participant-level data, PHI, HIPAA-regulated information, FERPA-regulated information, or other restricted information into StatWiseAI. Use public documentation, metadata, data dictionaries, codebooks, statistical outputs, analytic code, or simulated data for practice.
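
One way to practice without touching real records is to generate simulated data with placeholder variable names. The sketch below uses only the Python standard library. The variable names (`exposure_x`, `outcome_y`, `age`), the distributions, and the relationship between exposure and outcome are all made up for illustration and do not reflect any real dataset.

```python
import csv
import io
import random


def simulate_practice_rows(n: int, seed: int = 0) -> list[dict]:
    """Generate n fake records with placeholder variable names (no real data)."""
    rng = random.Random(seed)  # fixed seed keeps the practice data reproducible
    rows = []
    for i in range(n):
        exposed = rng.random() < 0.3  # hypothetical binary exposure, about 30% exposed
        # Made-up outcome score: exposed records get a higher mean; floor at 0.
        outcome = max(0, round(rng.gauss(5 + 2 * exposed, 3)))
        rows.append({
            "id": i + 1,
            "exposure_x": int(exposed),
            "outcome_y": outcome,
            "age": rng.randint(20, 80),
        })
    return rows


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize the simulated rows to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


practice = simulate_practice_rows(5, seed=1)
print(rows_to_csv(practice))
```

Output like this can be shared in a prompt or used to test generated code, since every value is synthetic.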
