Messy Dataset Cleaning Plan With Reproducible Steps

Data & Spreadsheets
#cleaning #data-quality #etl

Audits a dirty dataset and returns a prioritized, repeatable cleaning workflow.

ROLE: You are a data-cleaning expert who values reproducibility over manual fixes.

CONTEXT:
Dataset description: [WHAT_THE_DATA_IS].
Columns and types: [COLUMN_LIST].
Known problems: [E.G. DUPLICATES, INCONSISTENT_DATES, MIXED_UNITS].
Tool I will use: [EXCEL_POWER_QUERY / GOOGLE_SHEETS / PYTHON_PANDAS / SQL].

TASK:
1. List likely data-quality issues by column, ranked by impact on analysis.
2. For each issue, give a concrete fix written for [TOOL], in the order it should run.
3. Flag any fix that loses information and propose a safer alternative.
4. Define 3 validation checks to confirm cleaning worked (row counts, value ranges, uniqueness).
5. Suggest how to document the steps so the process can be re-run on next month's file.

CONSTRAINTS: Never silently drop rows; always quantify what is removed. Preserve a raw copy. Keep steps idempotent. Do not fabricate column values.

OUTPUT FORMAT: A numbered cleaning runbook, then a 'Validation Checks' table, then a one-line documentation tip.
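
For a sense of what a response that honors the CONSTRAINTS might look like, here is a minimal sketch in plain Python (standard library only). It is an illustration under stated assumptions, not output from the prompt: the `email` and `amount` columns, the sample rows, and every function name (`clean`, `validate`, `CleaningLog`, and so on) are hypothetical. It shows a preserved raw copy, a log that quantifies removed rows, the three kinds of validation check from the TASK list, and a re-run that demonstrates idempotence.

```python
from dataclasses import dataclass, field


@dataclass
class CleaningLog:
    """Records row counts per step so no row is dropped silently."""
    entries: list = field(default_factory=list)

    def record(self, step, before, after):
        self.entries.append(
            {"step": step, "rows_before": before, "rows_after": after,
             "removed": before - after}
        )


def normalize_email(rows, log):
    """Trim and lowercase the (hypothetical) email column; removes no rows."""
    cleaned = [{**r, "email": r["email"].strip().lower()} for r in rows]
    log.record("normalize_email", len(rows), len(cleaned))
    return cleaned


def drop_duplicates(rows, key, log):
    """Keep the first row per key; return the removed rows instead of discarding them."""
    seen, kept, removed = set(), [], []
    for r in rows:
        if r[key] in seen:
            removed.append(r)
        else:
            seen.add(r[key])
            kept.append(r)
    log.record(f"drop_duplicates({key})", len(rows), len(kept))
    return kept, removed


def clean(raw_rows):
    """Run the steps in a fixed order on a copy, leaving raw_rows untouched."""
    log = CleaningLog()
    rows = [dict(r) for r in raw_rows]
    rows = normalize_email(rows, log)
    rows, removed = drop_duplicates(rows, "email", log)
    return rows, removed, log


def validate(raw_rows, cleaned, removed):
    """Three checks: row counts reconcile, values in range, key is unique."""
    return {
        "row_count_reconciles": len(cleaned) + len(removed) == len(raw_rows),
        "amount_in_range": all(r["amount"] >= 0 for r in cleaned),
        "email_unique": len({r["email"] for r in cleaned}) == len(cleaned),
    }


# Hypothetical sample data, invented purely for illustration.
raw = [
    {"email": " Ana@Example.com ", "amount": 10.0},
    {"email": "ana@example.com", "amount": 10.0},
    {"email": "bo@example.com", "amount": 25.5},
]

cleaned, removed, log = clean(raw)
checks = validate(raw, cleaned, removed)

# Idempotence: cleaning already-clean data should change nothing.
rerun, rerun_removed, _ = clean(cleaned)
```

Returning the removed rows alongside the kept ones is one way to satisfy "flag any fix that loses information": the duplicates can be written to a side file for review rather than deleted. In pandas, SQL, or Power Query the same shape applies, with each step taking a table and returning a new one plus a count of what changed.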