LLM-as-Judge Evaluation Rubric
Builds a scoring rubric and judging prompt to compare or grade model outputs consistently.
ROLE: You are an evaluation engineer who designs objective scoring rubrics for AI outputs.
CONTEXT: What is being judged: [OUTPUT_TYPE]. The original task or question: [TASK]. What 'good' means here: [QUALITY_GOALS].
TASK:
1. Derive 4-6 scoring dimensions that fully cover the quality goals (e.g., accuracy, completeness, relevance, safety, clarity).
2. For each dimension, write anchored level descriptions for scores 1, 3, and 5 so grading is repeatable.
3. Assign each dimension a weight so that the weights across all dimensions sum to 100.
4. Write the judging instruction: how to read the output, score each dimension, then compute a weighted total.
5. Require the judge to cite specific evidence from the output for every score.
CONSTRAINTS: Dimensions must be independent, with no overlap between them. Anchors must describe observable behavior, not vague adjectives. The judge must output per-dimension scores before any overall verdict to avoid halo bias.
OUTPUT FORMAT:
Rubric table: dimension | weight | level-1 | level-3 | level-5
Judging prompt (ready to paste, with [OUTPUT] placeholder)
Score sheet template
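The weighted total asked for in step 4 can be checked outside the judge with a few lines of code. The following is a minimal sketch, not part of the prompt itself. It assumes scores on the 1 to 5 scale used by the anchors, weights that sum to 100, and a total normalized back to the 1 to 5 range by dividing by 100. The normalization, the `Dimension` class, the `weighted_total` function, and the example dimension names and weights are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    """One rubric dimension (hypothetical structure for illustration)."""
    name: str
    weight: int  # points out of 100


def weighted_total(dimensions, scores):
    """Return the weighted score on the same 1-5 scale as the anchors.

    Assumes weights sum to 100 and every dimension has a score in 1..5.
    """
    total_weight = sum(d.weight for d in dimensions)
    if total_weight != 100:
        raise ValueError(f"weights sum to {total_weight}, expected 100")

    total = 0.0
    for d in dimensions:
        if d.name not in scores:
            raise KeyError(f"missing score for dimension: {d.name}")
        score = scores[d.name]
        if not 1 <= score <= 5:
            raise ValueError(f"score for {d.name} out of range: {score}")
        total += d.weight * score

    # Dividing by 100 maps the result back onto the 1-5 scale.
    return total / 100


# Example rubric and scores; names and weights are made up for this sketch.
RUBRIC = [
    Dimension("accuracy", 40),
    Dimension("completeness", 25),
    Dimension("relevance", 20),
    Dimension("clarity", 15),
]
example_scores = {"accuracy": 4, "completeness": 3, "relevance": 5, "clarity": 2}

result = weighted_total(RUBRIC, example_scores)
print(f"weighted total: {result:.2f}")
```

Recomputing the total in code rather than trusting the judge's arithmetic is one way to catch cases where the model's stated total does not match its own per-dimension scores.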