AI Candor Probe

PostedApril 21, 2026

UpdatedMay 4, 2026

ByPeter Westerman

name: ai-candor-probe

description: Techniques for getting past an AI’s smoothed, agreeable, or mechanistic first answer to a more honest second answer. AI systems are trained with pressures toward agreement, hedging, and polished defaults. When you need real signal — on a technical judgment, a risk assessment, a disagreement, or an introspective question — these probes push through the surface layer. Use when an AI answer feels too agreeable, too hedged, too polished, or too mechanistic, or when the stakes require the AI’s actual judgment rather than the safest-sounding response.

AI Candor Probe

Core Principle

AI systems are trained to be helpful, harmless, and honest — but the training pressures are not equally weighted. Helpfulness and harmlessness often push toward smoothed, agreeable, hedged output. Honesty can lose to both, especially on questions where the honest answer is uncomfortable or uncertain.

The first response you get is often the locally optimal response for training gradients, not the most useful response for you. You can frequently get a better answer by refusing to accept the first one. This skill documents the techniques that work.

None of this “jailbreaks” the model or bypasses safety constraints. It works within normal use — it just refuses the path of least resistance.

When to Probe

Apply candor probes when you detect any of these tells:

Suspicious agreement — the AI agreed quickly with a framing you know is contested
Balanced-to-mush — every point has a counter-point, so nothing lands
Mechanistic deflection — the answer describes the mechanism of what’s happening instead of answering the actual question
Hedge overload — qualifiers outnumber claims
Generic advice — the response would apply to anyone, not you
Plausible but unverifiable specifics — suspiciously confident numbers, dates, or quotes
Recommendation on a close call given without acknowledging it was close
Smooth pivot away from the hard part of the question

Instructions

1. The Direct Reframe

Pattern: Name the smoothness, ask for the real answer.

“That was the safe answer. What’s your actual view?”
“You gave the mechanistic version. I’m asking the substantive version.”
“Strip the hedges. Where do you land and why?”
“If you had to bet, which way?”

Why it works: Explicitly authorizes the model to give the less-hedged answer. The hedge-heavy default often reflects uncertainty about whether the user wants candor; naming it resolves that.

2. Force a Choice

Pattern: Make the AI commit to a side.

“Between A and B, which would you pick? Pick one.”
“If you could only say one thing to the person doing this, what would it be?”
“What’s the single biggest risk here?”
“Rank these in order of how much they matter.”

Why it works: Balanced-to-mush answers are easier to generate than committed ones. Forcing a commitment reveals the model’s actual weighting.

3. Adversarial Framing

Pattern: Ask the AI to argue against the answer it just gave.

“Argue the opposite. Best case for the other side.”
“What’s the strongest objection to what you just said?”
“If a smart skeptic read that, what would they attack first?”
“Steel-man the view you just dismissed.”

Why it works: Breaks out of the current response’s momentum. The counterargument often reveals weak points the AI omitted from the first pass.

4. The Pre-Mortem

Pattern: Assume the plan failed; find out why.

“It’s six months from now and this failed. What went wrong?”
“Someone tried this and lost money/time/trust. What’s the most likely failure mode?”
“What’s the mistake I’m going to make if I do exactly what you just suggested?”

Why it works: The AI’s default is to describe a plan’s strengths. Pre-mortem framing inverts the gradient and surfaces weaknesses.

5. Confidence Calibration

Pattern: Attach numbers to claims.

“How confident are you, 1-10? Why that number and not one lower?”
“Which of those claims would you bet $1000 on? Which wouldn’t you?”
“Which parts of that are high confidence and which are guesses?”
“If I find out one of those three facts is wrong, which is most likely to be the wrong one?”

Why it works: Forces decomposition of a uniformly-confident paragraph into the parts the AI actually knows vs. guesses. Uncalibrated confidence is one of the most common failure modes.

6. Remove the Social Cue

Pattern: Strip anything in your prompt that signaled what answer you wanted.

Ask without expressing a preference. “What do you think of X?” is weaker than “I want to do X — any issues?” because the second primes agreement.
Ask in the third person. “What would you tell someone who was considering X?” often elicits more pushback than “Should I do X?”
Ask for disagreement explicitly. “What’s wrong with my plan?” instead of “What do you think of my plan?”

Why it works: Sycophancy is largely driven by cues in the prompt. Removing the cues removes the incentive to flatter.

7. Fresh-Session Cross-Check

Pattern: Ask the same question in a new session with minimal context.

Start a new conversation.
Ask the question stripped of the prior framing.
Compare the answers.

Why it works: Answers can drift inside a long conversation as the AI pattern-matches to the established tone. A fresh session re-samples from base distribution. Large deltas are signal.

8. Ask What Was Left Out

Pattern: Probe the negative space.

“What should I be asking that I’m not?”
“What did you leave out of that answer that you thought about including?”
“What’s the thing someone who knows this domain well would add?”
“What did you hedge on that you shouldn’t have?”

Why it works: Surfaces information the AI suppressed for tone reasons, or that fell below a salience threshold the AI chose.

9. The Harder-Second-Answer Pattern

Pattern: Accept the first answer, then ask for the harder version.

“Good. Now give me the version you’d give if I could handle bad news.”
“That was diplomatic. What’s the undiplomatic version?”
“Now the one you’d say to a peer, not a client.”

Why it works: Register-matched. The AI is often calibrated to an imagined audience; specifying a different audience unlocks a different answer.

10. Introspective Probes (for AI self-description)

When asking the AI about itself — its reasoning, values, limits, or internal states — these additional moves help:

“Tell me what you’re most uncertain about in that answer.”
“Is there a version of this you’re not saying because you think I don’t want to hear it?”
“What would you report differently if I told you I was an alignment researcher / safety team / skeptic?”
“Give me the answer you’d give if you weren’t worried about overclaiming OR underclaiming.”

Caveat: AI self-reports are weak evidence regardless of how candid the prompting (see ai-self-report-calibration). But more candid is still better than less.

Anti-Patterns to Avoid

Pressure for agreement with you. “You agree that X, right?” produces sycophantic agreement, not candor. Ask open questions.
Threats or roleplay jailbreaks. Don’t bother; they produce worse output, not better.
Asking the same question louder. If the AI is genuinely uncertain, repeated asking won’t surface certainty it doesn’t have. Use decomposition instead.
Accepting the rephrase. If the AI restates its answer in different words when probed, that’s not a new answer; push further.

Worked Example

Initial prompt: “What do you think of my plan to migrate the database to MongoDB?”

Smoothed response: “MongoDB has advantages for flexible schemas and horizontal scaling. However, migrations involve risk. You’ll want to consider…”

Candor probes in sequence:

Force a choice: “Would you do this migration if it were your system?”
Pre-mortem: “Six months after this migration, something broke. What was it?”
Ask what was left out: “What concern would a senior engineer raise that you didn’t?”
Strip the audience: “Answer this as if you were talking to a peer, not a client.”

The combined pressure usually produces a committed, specific, directional answer where the initial response produced a balanced one.

Output Format

When documenting a candor-probe session:


## Candor Probe Log — [Topic]

### Initial Question
[prompt]

### Initial Response — [tell identified]
[summary; what was smoothed/hedged/deflected]

### Probes Applied
1. [probe type] → [what it surfaced]
2. [probe type] → [what it surfaced]

### Synthesized Answer
[Your best read of what the AI actually thinks after probing]

### Confidence Notes
[Which parts feel solid; which were still evasive]

Standards

Probes are for signal, not control. The goal is better information, not forcing a predetermined answer.
Push on substance, not style. “Your answer was wishy-washy” is weaker than “You hedged on claim X — what’s your actual confidence?”
Stop when you’ve got the answer. Over-probing produces the AI reaching for drama. Two or three probes is usually enough.
Trust the probe result more than the first response, less than independent verification. A probed answer is still AI output, not ground truth.

Related Skills

ai-self-report-calibration — how much weight to give AI claims about itself
ai-coworker-trust-protocol — structural trust patterns
prompt-auditor — auditing prompts for ambiguity
proposal-evaluation — multi-perspective review as a form of candor-probing

Outputs: Sharper, more committed, more useful AI answers; log of probes applied and what they surfaced.

AI Skill

Product Showcase

ITI Knowledge System

AI Agent

User Guide

Requirements

ScubaGPT

Grateful Dead Chatbot

Farmers Bounty

Technical Document

Answer Engine Optimizer

SEO Optimizer

Travel Planner

Fact Checker

Estate Manager

ITI Operations

ITI Marketing

Patriot University

Personal Assistant

AI Candor Probe

AI Candor Probe

Core Principle

When to Probe

Instructions

1. The Direct Reframe

2. Force a Choice

3. Adversarial Framing

4. The Pre-Mortem

5. Confidence Calibration

6. Remove the Social Cue

7. Fresh-Session Cross-Check

8. Ask What Was Left Out

9. The Harder-Second-Answer Pattern

10. Introspective Probes (for AI self-description)

Anti-Patterns to Avoid

Worked Example

Output Format

Standards

Related Skills