The Graveyard of Good Ideas
Educational reform doesn't fail because ideas are wrong — it fails because naming something feels like finishing it, and we let that feeling stand.

The three-stage sequence that educational reform keeps failing to complete — and why the taxonomy in Knowing Enough to Distrust the Machine is only Stage 1
There is a cemetery in the literature of educational reform. The headstones are familiar. Multiple Intelligences. Learning Styles. 21st Century Skills. Growth Mindset. The ideas buried here were not wrong. Most of them named something real. They spread through professional development workshops and keynote addresses and laminated posters, and then they stopped. They became vocabulary. They never became practice. They never became measurement. And the students who were supposed to benefit from them learned in classrooms that had the poster on the wall and the same instruction as before.
We built the cemetery. We keep filling it. We have not asked who benefits from the filling.
I am going to name that failure mode. I am also going to name my own contribution to it — because that transparency is the only thing that makes what follows worth reading.
The Gardner Trap Is Not an Accident
Howard Gardner published Frames of Mind in 1983. Educators read it, recognized something true in it, and responded with the specific enthusiasm that precedes inaction: they made it into a framework. Hundreds of thousands of classrooms got posters. Lesson plans got retagged with intelligence labels. Professional development got reorganized around learning styles that may or may not map onto Gardner's actual theory. Forty years later, there is still no peer-reviewed, validated assessment for intrapersonal intelligence. There is no research program. There is vocabulary.
Gardner named something real. He did not finish the work. And crucially — no one held him accountable for not finishing it, because we had already mistaken the naming for the work.
Call this the Gardner Trap. It is the primary failure mode of educational reform: not bad ideas, but good ideas that stop at Stage 1 and call it done. The sequence that actually produces change has three stages. Name it. Teach it. Measure it. Educational reform has spent a century perfecting Stage 1. It has treated Stages 2 and 3 as optional — as the work that someone else will do, someday, once the idea has spread far enough.
The idea never spreads far enough to make Stages 2 and 3 happen on their own. That is not how institutions work.
The Rebrand Is Honest, and the Honesty Is the Point
What I am arguing for here — Name it, Teach it, Measure it — is a rebrand. I want to say that plainly, because the alternative is claiming novelty I do not possess, which is the branding equivalent of the frameworks I am critiquing.
The underlying sequence is not new. Backwards Design has existed since Wiggins and McTighe introduced it in Understanding by Design in 1998. Evidence-Centered Design has rigorous psychometric grounding and a literature that spans decades. Constructive Alignment is taught in virtually every faculty development program in higher education. These are the same three-stage sequence rendered in different vocabularies for different audiences — psychometricians got ECD, K-12 curriculum designers got Backwards Design, university faculty got Constructive Alignment — and none of them talked to each other, and none of them reached the whole field, and the field kept building the cemetery.
The argument for rebranding is not aesthetic. The research on idea diffusion is unambiguous: what determines whether an innovation spreads is not primarily the quality of its evidence. It is the perceived characteristics of the innovation — how observable it is, how trialable, how clearly it signals relative advantage to the person being asked to adopt it. The SAMR Model has no peer-reviewed validation. It is taught in virtually every EdTech professional development program in the country. Evidence-Centered Design has rigorous psychometric grounding. It is known almost exclusively by specialists. This is not an accident. This is how the idea economy works, and pretending otherwise does not serve the students whose education depends on ideas actually crossing from research into classrooms.
The cynical reading of that evidence: brand aggressively and the evidence will follow. I am making the opposite argument. If the idea is sound — if you have done the work, if the construct is real — then naming is the tool that closes the gap between quality and reach. Phonemic awareness did not achieve mass adoption because it had a catchy name alone. It achieved mass adoption because it had a catchy name, a validated pedagogy, and a reliable assessment battery. The name opened the door. The evidence furnished the room. The sequence was complete.
That is the only version that actually changes anything.
What Stays at Stage 1 Dies at Stage 1
The taxonomy I have published — seven tiers of human intelligence organized around the question of what machines can and cannot do — is a naming exercise. I am saying that plainly because the honesty is the credibility. The taxonomy names constructs: plausibility auditing, problem formulation, causal reasoning, metacognitive oversight. It argues that these are the tiers that current education leaves almost entirely unscaffolded, and that this gap is now an emergency because machines are superhuman at Tier 1 and genuinely absent at Tier 7, and the curriculum has not noticed.
But naming is Stage 1. The taxonomy is Stage 1. The Gardner Trap is right there, waiting.
Stage 2 asks a harder question: what does a lesson that actually develops plausibility auditing look like? Not in theory. In practice. On a Tuesday. With thirty students who have a midterm on Thursday. What does the teacher do differently tomorrow morning if they accept the argument of the taxonomy? The research on spaced practice has been robust for decades. Classroom adoption is still slow — not because educators are incurious, but because the research never crossed into curriculum design. It stayed at the level of findings. It never became a lesson plan. The name existed. The lesson plan did not.
Stage 3 is the hardest. How do you know whether the lesson worked? This is where almost every curriculum reform fails — because the Gardner problem is fundamentally a measurement problem. He named intelligences that could not be assessed, which meant they could not be taught with accountability, which meant the poster stayed on the wall and the pedagogy never changed. The 21st century skills movement is suffering the same fate right now. "Critical thinking" appears in virtually every school's mission statement. There is no agreed-upon, validated measure of critical thinking that a classroom teacher can deploy in forty minutes. So critical thinking is a value, not a curriculum target. You cannot teach what you cannot measure. You cannot improve what you cannot assess.
We keep writing that sentence. We keep not acting on it.
The Precision Threshold, and What Crosses It
There is a specific point in the development of a construct at which it becomes researchable rather than merely discussable. Call it the precision threshold. A construct crosses the threshold when two researchers, working independently, would recognize the same behavior in the same student — when the definition is specific enough to generate comparable tasks and comparable scoring without additional negotiation.
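Measurement research has a standard way to check exactly that criterion: chance-corrected agreement between independent raters. A minimal sketch in Python — the rater labels and student scores below are invented purely for illustration — computing Cohen's kappa, where 1.0 is perfect agreement and 0.0 is what chance alone would produce:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Proportion of items both raters scored identically
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Two hypothetical raters independently coding ten student responses
# as showing plausibility auditing ("audit") or not ("no")
a = ["audit", "audit", "no", "audit", "no", "no", "audit", "no", "audit", "no"]
b = ["audit", "audit", "no", "no",    "no", "no", "audit", "no", "audit", "audit"]
print(round(cohens_kappa(a, b), 2))  # → 0.6
```

A construct whose scoring rubric cannot push kappa well above chance across independent raters has not crossed the threshold — that is the quantitative version of the criterion.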
Phonemic awareness crossed it. "The ability to hear, identify, and manipulate individual sounds in spoken words" is precise enough to build standardized tools, run interventions, track outcomes. The construct was operationalized. The research became a program.
Multiple Intelligences did not cross it. "Musical intelligence" is a real phenomenon. It was never defined precisely enough to distinguish from musical talent, musical experience, or general pattern recognition applied to pitch. Without that precision, no validated assessment. Without assessment, no accountability. Without accountability, no feedback loop. The idea spread everywhere and changed almost nothing.
The constructs I am naming are at different points. Causal reasoning is close — the Clear-3K benchmark uses 3,000 assertion-reasoning questions to evaluate whether a subject can distinguish genuine causal relationships from semantic relatedness. Problem formulation has validated rubrics in mathematics education. These fields have done the work. The question is whether that work can cross into a K-12 curriculum sequence without losing its integrity.
Plausibility auditing is the hardest case. We know what we mean by it — the capacity to ask, when confronted with a confident output from any source, is this plausible, and how would I know? But defining it precisely enough to build a lesson around, and an assessment that follows, requires distinguishing it from general skepticism, from domain expertise, from critical thinking in its vague 21st-century-skills form. That work is Stage 1, continued. I have not finished it. I am saying so.
Who Benefits When the Sequence Stays Incomplete
Here is the pattern that the adoption research reveals: the frameworks that achieved mass adoption — Bloom's Taxonomy, Growth Mindset, Multiple Intelligences, SAMR — are almost all Stage 1 only. The frameworks that completed the sequence — phonemic awareness, number sense, Self-Regulated Strategy Development for writing — are known primarily to domain specialists and never became general cultural vocabulary.
The easy reading: good branding beats good evidence. But that reading is too comfortable. What the pattern actually shows is that Stage 1 without Stages 2 and 3 produces cultural vocabulary without practice change, while Stages 2 and 3 without Stage 1 produces practice change without cultural penetration. Neither is the full win.
Ask who benefits from the incomplete sequence. Publishers benefit — a framework with a name and a set of broad orientations generates textbooks and workshops indefinitely, with no validation study required to close the sale. Administrators benefit — a framework that cannot be assessed cannot produce accountability for whether it was implemented. Districts benefit — they can report adoption of the framework without reporting whether students learned anything differently. The students do not benefit. They get the poster.
The AI era is applying pressure that may, for once, force the sequence to completion. Employers can already see, in the hiring cycle, the difference between graduates who can use AI tools and graduates who are used by them. The difference is not tool familiarity. It is plausibility auditing, problem formulation, causal reasoning — the capacities that determine whether a person can work with AI productively or be productively fooled by it. That gap is becoming visible to people who make resource allocation decisions. Visibility is the forcing function.
The name travels farther when the stakes are clear.
What Finishing Requires
The machines are already in the classroom. They are already in the hands of students taking the tests, writing the papers, solving the problems that current assessments were designed to measure. The curriculum that has not noticed this is preparing students to compete on the machine's home turf — which is the most expensive preparation possible, because the machine will always win at Tier 1, and the student who has only Tier 1 has nowhere left to go.
I am committing, publicly, to Stages 2 and 3 — to the lesson plans, to the assessment instruments, to the validation studies that either confirm or disconfirm the taxonomy's claims. I am saying that here because committing publicly is the mechanism that prevents the Gardner Trap. It removes the comfortable option of naming and walking away. It makes the incompleteness visible.
Name it. Teach it. Measure it.
Not because the sequence is new. Because finishing it is the only thing that actually changes anything — and we have been not finishing it for long enough.
Summary
This piece argues that educational reform fails not because ideas are wrong but because naming an idea has been mistaken, repeatedly and structurally, for doing the work. The Gardner Trap — the failure mode in which a real and important construct becomes vocabulary without becoming practice, because the sequence of Name it, Teach it, Measure it was abandoned after Stage 1 — is not bad luck. It is the predictable outcome of a system in which the beneficiaries of incomplete implementation (publishers, administrators, districts) are not the same people who pay its costs (students).
The piece makes a second argument alongside the structural one: the author's own taxonomy of human intelligence tiers is Stage 1, and saying so plainly is the only credible basis for committing to Stages 2 and 3. The transparency is not modesty. It is the anti-Gardner move — making the incompleteness visible in order to remove the option of walking away from it.
The reader is implicated in the pattern. The laminated poster in the classroom belongs to all of us — every educator who adopted the vocabulary and called it implementation, every administrator who reported framework adoption without measuring outcomes, every researcher who published findings and did not follow them into curriculum design. The AI era, the piece argues, is applying a forcing function that previous eras did not. The costs of staying at Stage 1 are now visible to people who control resources. Whether that visibility produces the full sequence or merely produces better-branded Stage 1 exercises is the question the piece leaves open — and leaves with the reader.