Garbage In, Garbage Out
Why an AI is only ever as good as what it was shown.
Teach a child that every mango is an “apple” and they'll point at mangoes and say apple — confidently, every time. That's exactly how machine learning works. The smartest AI fed bad data is just a confident mistake-maker.

Imagine a child learning to sort fruit. Every time they reach for a mango, an adult labels it "apple." For months, this is all they see: mangoes called apples. Now ask them to identify fruit at the market. They will confidently point at mangoes and say "apple" — every single time. They are not slow or broken. They learned exactly what they were taught. The lesson was wrong.
This is precisely how machine learning works. Understanding what an AI model really is helps here: an AI does not think. It finds patterns in the data it was trained on and applies those patterns to new situations. When the training data is accurate, representative, and complete, the results can be genuinely useful. When the training data is flawed — labelled wrong, skewed toward certain groups, full of gaps — the AI learns those flaws just as faithfully as a student memorising incorrect notes.
There is a four-word phrase that captures this completely: Garbage In, Garbage Out.
What "Garbage" Actually Means
Bad data is not always obvious. It does not look like corrupted files or typos. It often looks perfectly clean and perfectly usable — it is just systematically incomplete or unrepresentative.
Consider what happened when researchers at MIT studied commercial facial recognition tools sold by major technology companies. The Gender Shades study found that one system showed a 20.8 percentage point error gap between darker-skinned women and lighter-skinned men. The system was not broken. It was working exactly as trained. The training data simply contained far fewer examples of darker-skinned faces, so the model had less to learn from — and made more mistakes on the people it had seen least.
The smartest AI fed bad data is just a confident mistake-maker.
That is the instinct worth building. Not "AI is amazing" and not "AI is dangerous." Just this: the output of an AI system is a direct function of the data that shaped it. No amount of computational sophistication changes that relationship.
The Three Ways Data Goes Wrong
There are three failure modes worth understanding, because they show up at every level of AI use.
1. Missing groups. If the training data does not include enough examples of a particular type — a skin tone, an accent, a handwriting style, a medical condition — the AI will perform worse on that type. It cannot learn what it was never shown.
2. Wrong labels. If data is labelled incorrectly during collection, the model internalises the errors. The mango-as-apple problem. Labels are the ground truth the model trusts completely.
3. Outdated data. An AI trained on last year's information will reflect last year's patterns. A language model trained on text from a decade ago will not know current events, changed terminology, or updated guidance. The world changes; the frozen training data does not.
Each of these problems produces a system that is operationally functional but subtly (or not so subtly) wrong.

Why This Matters Before the Model Exists
Here is a detail that surprises most people: the data decisions happen before the AI is built. By the time engineers are tuning a model, the character of the output is already largely determined by the composition of the training set. Data collection is not preparation for the real work — data collection is the real work.
This is why AI4K12's Big Ideas framework treats Representation and Reasoning, and Learning, as foundational concepts. Understanding that an AI represents the world through data — and that it learns exclusively from that data — is the conceptual foothold that makes every other AI topic legible. How a model reasons about a photo, a sentence, or a medical scan all flows directly from what it was shown during training.
The UNESCO AI Competency Framework for Students places "AI techniques and applications" at the centre of what students need to understand, precisely because students who understand data will ask the right questions about any AI output they encounter.
The Practical Read-Across
For parents, this reframes one common worry about AI in helpful ways. The concern is often: "What is the AI going to do to my child?" A more useful question asks what data the AI was trained on and whether that data represents the child's context. It moves from vague anxiety to specific, answerable inquiry.
For school administrators and procurement teams, the implication is structural. Evaluating an AI tool for classroom use is not primarily a question of which algorithms it uses. It is a question of what the training data covers, whose voices are in it, and whether it has been validated for the specific student population in view. Those are auditable questions — the kind that belong in a procurement conversation. The UNESCO AI Competency Framework for Students treats data interrogation as a foundational competency across both its technical and ethical dimensions — not an advanced add-on, but a baseline expectation for any student engaging with AI tools.
What a Child Learns When They Learn This
A child who understands GIGO carries a durable critical reflex. They will not blindly trust a recommendation, a generated answer, or an automated decision. They will ask: what was this trained on? Who collected the data? Does it include people like me?
These are not expert questions. They are the natural next step for anyone who understands that AI is a pattern-matching machine, not an oracle.
Digital Codi builds this understanding hands-on. Its Data Science stream (S2) and Machine Learning stream (S3) are structured around the Six Rungs learning arc — moving from Awareness through to Creation — so students do not just hear that data quality matters. They experience what happens when it does not. They label datasets, spot gaps, and see how their choices alter outputs. The "Build / Judge / Refuse" test gives them a concrete frame for evaluating any AI system: can I build a case for this output? Can I judge its data source? Should I refuse to act on it uncritically? That three-word scaffold applies to GIGO directly.
The goal is mechanical intuition — one of the Five Capacities Digital Codi is designed around. Not memorised facts about data, but a felt sense of how training shapes output. That intuition, once built, does not expire.
Sources Cited
- MIT News — Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification — retrieved 2026-05-31
- UNESCO AI Competency Framework for Students — retrieved 2026-05-31
- AI4K12 — Big Idea 2: Representation and Reasoning — retrieved 2026-05-31
