Public Samples

Read generated sample results from start to finish.

Start with the translated 12-way startup comparison: the same question is reviewed across Light, Standard, and Premium with 2R/3R and 2A/3A variants. Additional English-generated samples are listed separately below.

Translated 12-way comparison

One question, 12 ways to review it.

The same startup question is reviewed across Light, Standard, and Premium, with 2R/3R and 2A/3A variants. Compare how the result changes when the debate gets deeper, when Gemini adds a third perspective, and when Premium frontier models are used.

Base question

Should an early-stage startup invest more in product completeness than marketing?

Light 4 combinations

Low-cost quick review for direction setting and basic condition checks.

Light 2R · 2A

Fast first-pass judgment

Quickly checks the core tension and the likely stronger side. In this sample, product completeness held up better, while the acceptable scope of parallel marketing remained uncertain.

Light 2R · 3A

Fast judgment + condition check

Adds Gemini as a third perspective to identify boundary conditions such as market timing, cash flow, and competitive pressure.

Light 3R · 2A

Low-cost deepening

Adds another round so the debate can narrow what "good enough to sell" means and where marketing feedback becomes useful.

Light 3R · 3A

Low-cost deepening + final check

Adds Gemini midpoint/final checks to surface remaining proof gaps and sharper verification questions.

Standard 4 combinations

Deeper review for practical decisions, report-level synthesis, and judgment criteria.

Standard 2R · 2A

Practical validation baseline

Separates "marketing is useful" from "marketing should come before product completeness" at a more practical decision level.

Standard 2R · 3A

Practical validation + third perspective

Gemini helps distinguish lightweight market contact from major marketing investment, and checks conditions that could change the conclusion.

Standard 3R · 2A

Report-level deep review

Three rounds organize stronger claims, weaker claims, hidden assumptions, and what evidence could change the judgment. 3R shows the depth of the pro/con debate.

Standard 3R · 3A

Report-level deep review + Gemini check

Adds Gemini midpoint/final checks so remaining proof gaps and final judgment criteria are structured more consistently.

Premium 4 combinations

Frontier-model review for higher-stakes decisions, with Gemini available as a third perspective in the 3A variants.

Premium 2R · 2A

Frontier two-agent review

A focused GPT + Claude review. It separates product completeness as repeatable core-value delivery from perfectionism.

Premium 2R · 3A

Frontier review + Gemini criteria check

Adds Gemini’s third perspective to distinguish demand discovery from broad acquisition and check whether marketing signals become repeat usage.

Premium 3R · 2A

Deep frontier review

The extra round pressures both sides on minimum product-readiness thresholds, customer exposure, and marketing as a learning tool.

Premium 3R · 3A

Deep triad review

The deepest translated sample in this ladder, combining frontier debate depth with Gemini checks on decision criteria and reversal evidence.

Comparison note

This comparison runs the same startup question through every combination of tier (Light, Standard, Premium), round count (2R, 3R), and agent count (2A, 3A). It is meant to show how review depth narrows a claim into more defensible conditions, not to claim that one AI system always proves the correct answer. The Premium startup entries are AI-assisted English translations of Korean-generated samples, matching the existing translated Light and Standard ladder.
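
As a rough illustration of where the 12 entries come from, the sketch below enumerates three tiers, two round counts, and two agent counts. The names `TIERS`, `ROUNDS`, `AGENTS`, and `ladder` are hypothetical and chosen for this example only; just the labels themselves come from this page.

```python
from itertools import product

# Labels as they appear on this page; the variable names are illustrative only.
TIERS = ["Light", "Standard", "Premium"]
ROUNDS = ["2R", "3R"]   # number of debate rounds
AGENTS = ["2A", "3A"]   # number of participating agents


def ladder():
    """Return every tier/round/agent label, in the order the page lists them."""
    return [
        f"{tier} {rounds} · {agents}"
        for tier, rounds, agents in product(TIERS, ROUNDS, AGENTS)
    ]


combos = ladder()
print(len(combos))
for label in combos:
    print(label)
```

Because `product` varies the last list fastest, the labels come out grouped by tier, then by round count, which is the same order used in the comparison on this page.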

| Condition | Actual result | Verdict flow | Best use | Confidence | What it revealed |
| --- | --- | --- | --- | --- | --- |
| Light - low-cost quick review | | | | | |
| Light 2R · 2A | View sample | Pro side stronger | Fast first-pass judgment | Medium | Product completeness can be a prerequisite for marketing performance. |
| Light 2R · 3A | View sample | Pro side stronger + conditions | Fast judgment + third perspective | Medium | Market timing, cash flow, and competitive pressure can shift the boundary. |
| Light 3R · 2A | View sample | Pro side stronger, narrower boundary | Low-cost deepening | Medium+ | The debate narrows "sellable enough" and the valid scope of marketing feedback. |
| Light 3R · 3A | View sample | Pro side stronger + verification questions | Low-cost deepening + Gemini check | High- | Remaining proof gaps and product-completeness thresholds become clearer. |
| Standard - report-level deepening | | | | | |
| Standard 2R · 2A | View sample | Pro side stronger | Practical validation | Medium+ | Separates marketing usefulness from marketing priority. |
| Standard 2R · 3A | View sample | Pro side stronger + boundary conditions | Practical validation + third perspective | Medium+ | Distinguishes lightweight market contact from major marketing spend. |
| Standard 3R · 2A | View sample | Pro side stronger, report-level | Report-level deep review | High | Marketing signals are useful, but the debate does not prove they offset delayed product investment. |
| Standard 3R · 3A | View sample | Pro side stronger + stabilized criteria | Report-level deep review + Triad Review | High+ | Gemini midpoint/final checks help stabilize the final judgment criteria. |
| Premium - frontier-model deep review | | | | | |
| Premium 2R · 2A | View sample | Pro side stronger + narrow exception | Frontier two-agent baseline | High | Product completeness is separated from perfectionism and tied to repeatable core-value delivery. |
| Premium 2R · 3A | View sample | Pro side stronger + definition-sensitive exception | Frontier review + third perspective | High | Gemini checks whether marketing means demand discovery or broad acquisition, and what evidence could reverse the result. |
| Premium 3R · 2A | View sample | Pro side stronger + practical decision rule | Deep frontier review | High | The third round narrows product-readiness thresholds and when marketing should remain a validation loop. |
| Premium 3R · 3A | View sample | Pro side stronger, definition-sensitive | Deep triad review | High+ | The full triad separates product completeness from perfectionism and marketing from demand validation. |

Additional English-generated samples

Results created directly with the English runtime.

These samples use a separate B2B SaaS topic and were generated directly from English prompts. Keep them separate from the translated 12-way startup ladder when comparing review depth.

Generated in English - Light 3R · 3A Triad Review

Should a B2B SaaS startup target enterprise customers from the beginning?

Best Light English 3R · 3A candidate so far

A direct English run that separates the default rule, narrow exception, and practical recommendation after the attribution and synthesis prompt patch.

Generated in English - Standard 3R · 3A Triad Review

Should a B2B SaaS startup target enterprise customers from the beginning?

Standard-depth English runtime sample

A deeper English result with Gemini decision-rule and evidence-check sections. It is a good candidate for showing Standard Triad Review depth.

Generated in English - Premium 2R · 2A

Should a B2B SaaS startup target enterprise customers from the beginning?

Clean Premium two-agent baseline

A GPT-5.5 + Claude Opus result that separates enterprise-informed validation from heavy enterprise-sales-led execution.

Generated in English - Premium 2R · 3A Triad Review

Should a B2B SaaS startup target enterprise customers from the beginning?

Premium triad baseline with a clean completed run

Gemini adds a third-angle check while the synthesis keeps the answer definition-sensitive rather than a broad yes/no.

Generated in English - Premium 3R · 2A

Should a B2B SaaS startup target enterprise customers from the beginning?

Three-round Premium two-agent baseline

The extra round tests whether enterprise engagement can stay lightweight enough not to derail startup iteration speed.

Generated in English - Premium 3R · 3A Triad Review

Should a B2B SaaS startup target enterprise customers from the beginning?

Full Premium English flagship candidate

The full GPT-5.5 + Claude Opus + Gemini Pro path concludes that the best answer is not a broad yes or no.

Public Sample - Standard · 3R · 2A - Close call

Is abolishing the death penalty really justified?

Should the death penalty really be abolished?

A representative sample for a topic that cannot be settled by moral judgment alone.

A representative 3R sample that goes beyond moral intuition and separates deterrence, wrongful-conviction risk, and the limits of state power.

The death penalty is easy to judge on values alone. This sample shows how empirical claims and institutional risk collide across the debate.

Featured Sample - Standard · 3R · 2A - Close call

Is moving to a senior role at a large company more rational than founding a startup?

For a late-30s developer, is moving to a senior role at a large company more rational than founding a startup?

A career decision sample where the strongest answer depends on personal constraints, not slogans.

A representative 3R career-decision sample where the debate moves beyond the romance of entrepreneurship and compares risk-adjusted return, failure cost, and opportunity cost.

Use this sample to see how 3R pressure turns a broad life decision into conditional trade-offs.

Public Sample - Light · 2R · 2A - Close call

Are long-context models better than RAG?

For LLM services, will using long-context models be better than RAG in the long run?

A quick technical-decision sample for comparing retrieval and long-context strategies.

A technical strategy sample that quickly compares the core trade-offs between long-context models and RAG.

Good for seeing what Light 2R captures and what it leaves out.

Public Sample - Standard · 3R · 2A - Opposition stronger

Is basic income the answer to AI automation?

A policy sample where feasibility pressure matters more than initial appeal.

A policy sample where the appeal of the idea is tested against fiscal sustainability, labor incentives, and compatibility with existing welfare systems.

A useful opposition-stronger sample: the idea looks attractive at first, but implementation constraints push back hard.

Comparison Sample - Standard · 2R · 2A - Pro side stronger

Is moving to a senior role at a large company more rational than founding a startup?

For a late-30s developer, is moving to a senior role at a large company more rational than founding a startup?

A career-decision comparison sample for seeing what Standard 2R can settle quickly.

A shorter Standard 2R career-decision sample that quickly separates average rationality from personal exceptions.

Useful for comparing how much 2R can decide before adding another round.

Comparison Sample - Light · 3R · 2A - Close call

Are long-context models better than RAG?

For LLM services, will using long-context models be better than RAG in the long run?

A low-cost deepening sample for comparing retrieval versus long-context architecture.

A Light 3R technical sample where retrieval failure, long-context cost, and system-design trade-offs become more visible than in Light 2R.

Useful for comparing what one additional round adds.

Public Sample - Standard · 3R · 2A - Close call

Will AGI structurally replace human labor within 10 years?

Will AGI structurally replace the human labor market within 10 years?

A future prediction sample that separates technical possibility from market-level replacement.

A future-forecasting sample that pressure-tests not only technical capability but also enterprise adoption speed, cost structure, and job decomposition.

Useful for seeing why automation potential and labor-market replacement are not the same claim.

Public Sample - Standard · 3R · 2A - Close call

Should AI-generated content be legally required to carry a label?

Should AI-generated content be legally required to be labeled?

A policy sample for seeing where AI-labeling regulation becomes hard to implement.

A regulation sample where freedom of expression, consumer protection, and enforcement feasibility collide.

Useful for reading the gap between neutral information disclosure and overbroad regulation.