TL;DR
Recent developments indicate AI systems are now capable of automating most engineering tasks in AI R&D, but research processes remain less automated. This shift could accelerate AI progress significantly, though some research aspects still require human insight.
Benchmark results now suggest that AI systems can automate the majority of AI engineering tasks, including reproducing research and optimizing kernels, while research activities remain less automated.
Six key benchmarks measuring AI capability in core AI R&D skills show rapid progress, with some approaching saturation. For example, CORE-Bench, which assesses research reproduction, improved from 21.5% in September 2024 to 95.5% in December 2025, with the lead author declaring it ‘solved.’ Similarly, MLE-Bench, which tests Kaggle competition performance, advanced from 16.9% in October 2024 to 64.4% in February 2026, approaching mid-tier human levels.
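The pace implied by those two benchmark figures can be worked out with simple arithmetic. The sketch below uses only the scores and months quoted above; the day of the month (set to the 1st) and the helper names are assumptions made for illustration, and a straight-line average says nothing about how the gains were actually distributed over time.

```python
from datetime import date

# Scores and months as quoted in the text; the day of the month is assumed.
BENCHMARKS = {
    "CORE-Bench": ((date(2024, 9, 1), 21.5), (date(2025, 12, 1), 95.5)),
    "MLE-Bench": ((date(2024, 10, 1), 16.9), (date(2026, 2, 1), 64.4)),
}


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def summarize(name: str) -> dict:
    """Return elapsed months, total gain, and average gain per month."""
    (d0, s0), (d1, s1) = BENCHMARKS[name]
    months = months_between(d0, d1)
    gain = s1 - s0
    return {
        "months": months,
        "gain_points": round(gain, 1),
        "points_per_month": round(gain / months, 2),
    }


for name in BENCHMARKS:
    print(name, summarize(name))
```

On these figures, CORE-Bench gained 74.0 percentage points over 15 months and MLE-Bench gained 47.5 points over 16 months.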
These benchmarks reveal that AI can now handle complex, friction-laden engineering tasks such as reproducing research and optimizing hardware kernels at levels comparable to or exceeding human performance. Meanwhile, progress in research, defined here as generating new scientific insights, remains less clear, and some aspects may still require human creativity and insight.
Engineering is automated.
Research is the residual.
Six skill benchmarks. Edison’s framing. The question Clark leaves open is whether research is just engineering at scale.
Jack Clark’s Import AI #455 catalogs six benchmarks measuring AI capability on AI R&D tasks and concludes “AI can today automate vast swaths, perhaps the entirety, of AI engineering.” The residual question is research. The structural read on the residual: it may not be a permanent moat.
Six skills. One trajectory.
Clark catalogs six benchmarks measuring AI capability on AI R&D-relevant tasks. Each individual benchmark could be noise. Six benchmarks moving together is a curve. The pattern is the cascade observed across the broader Clark series — visible here in the specific R&D-skill domain.
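One way to see why six benchmarks moving together is harder to dismiss than one is a toy probability calculation. The sketch below assumes each benchmark's gain is spurious with the same probability and that the benchmarks are independent; both are illustrative assumptions, not claims from Clark. Because the same frontier models drive all six, real results are correlated, so the true chance of a joint fluke is larger than this sketch suggests.

```python
def chance_all_move_by_noise(p_single: float, n_benchmarks: int = 6) -> float:
    """Probability that n independent benchmarks all show a spurious gain.

    p_single is a hypothetical per-benchmark probability that an observed
    improvement is noise rather than real capability growth.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability in [0, 1]")
    return p_single ** n_benchmarks


# Hypothetical per-benchmark noise probabilities, chosen for illustration.
for p in (0.5, 0.3, 0.1):
    print(f"p_single={p}: joint chance={chance_all_move_by_noise(p):.6f}")
```

Even a coin-flip chance of noise per benchmark leaves under a 2% chance that all six moved by accident under the independence assumption.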
Three data points. Mixed signal.
Clark provides three data points on the creative-spark question. Yes-evidence: Erdős-1051, centaur math discovery, sporadic Move-37-style moments. No-evidence: low yield, framing dependence, absence of acceleration. The mixed signal is the honest read.
The data supports two readings. Pessimistic: rare moments suggest creative insight is qualitatively distinct from engineering work. Optimistic: rare moments are an artifact of low-volume exploration; more shots on goal yield more discoveries. Both readings are consistent with Clark’s “vast swaths, perhaps the entirety” claim. They differ on the residual.
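The optimistic "shots on goal" reading can be made concrete with a small sketch. It treats each research attempt as an independent trial with a fixed, small probability of producing a discovery; the probability value and the function names are hypothetical. The pessimistic reading corresponds to that per-attempt probability being effectively zero for some classes of insight, in which case no volume of attempts helps.

```python
def expected_discoveries(p_hit: float, attempts: int) -> float:
    """Expected number of discoveries across independent attempts."""
    return p_hit * attempts


def p_at_least_one(p_hit: float, attempts: int) -> float:
    """Probability that at least one attempt produces a discovery."""
    return 1.0 - (1.0 - p_hit) ** attempts


# Hypothetical per-attempt success rate; not a figure from the source.
P_HIT = 0.001

for attempts in (100, 1_000, 10_000):
    print(
        attempts,
        round(expected_discoveries(P_HIT, attempts), 3),
        round(p_at_least_one(P_HIT, attempts), 4),
    )
```

Under this model a low observed yield is compatible with a steady stream of discoveries once attempt volume rises, which is why the current data cannot separate the two readings.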
Five dimensions Clark gestures at but leaves underdeveloped.
Clark’s section is rigorous on the empirical evidence. Five strategic dimensions matter for the institutional response, which the Clark series synthesis argues is structurally inadequate.
Two readings. Different equilibria.
The structural question Clark leaves open: is research a permanent moat that bounds automated AI R&D, or is it engineering at scale that dissolves with more shots on goal? Both readings are consistent with the current data. They differ by orders of magnitude in consequences.
Figure labels for the two readings: “Productivity multiplier years” and “Recursive loop operational.”
Five audiences. Asymmetric cost of being wrong.
The institutional response should not bet on inspiration being a permanent moat. If the distinction holds, capacity built is still useful. If it closes, capacity is necessary. Asymmetric cost-of-being-wrong points toward building now.
The five audiences: industry, academia, policymakers, investors, and everyone else.
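The asymmetric cost-of-being-wrong argument can be sketched as a two-by-two decision table. Every cost value below is a hypothetical unit chosen only to show the shape of the argument, and the names are illustrative; the source gives no numbers.

```python
# Hypothetical cost units; none of these numbers come from the source.
# Keys are (action, outcome).
COSTS = {
    ("build", "moat_holds"): 1,    # capacity built is still useful
    ("build", "moat_closes"): 1,   # capacity was necessary and is in place
    ("wait", "moat_holds"): 0,     # nothing spent, nothing needed
    ("wait", "moat_closes"): 10,   # caught without capacity
}


def worst_case(action: str) -> float:
    """Largest cost the action can incur across both outcomes."""
    return max(cost for (a, _), cost in COSTS.items() if a == action)


def expected_cost(action: str, p_closes: float) -> float:
    """Expected cost given the probability that the moat closes."""
    holds = COSTS[(action, "moat_holds")]
    closes = COSTS[(action, "moat_closes")]
    return (1.0 - p_closes) * holds + p_closes * closes


def break_even_p_closes() -> float:
    """Probability of closure at which building and waiting cost the same."""
    extra_if_holds = COSTS[("build", "moat_holds")] - COSTS[("wait", "moat_holds")]
    saved_if_closes = COSTS[("wait", "moat_closes")] - COSTS[("build", "moat_closes")]
    return extra_if_holds / (extra_if_holds + saved_if_closes)


print("worst case:", {a: worst_case(a) for a in ("build", "wait")})
print("break-even p(closes):", break_even_p_closes())
```

With these placeholder costs, building has the lower worst case, and it also has the lower expected cost whenever the chance that the moat closes exceeds the break-even value. Different cost assumptions move the threshold but not the structure of the comparison.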
Engineering is automated. The residual is the question. The institutional response should not bet on inspiration being a permanent moat.
Implications of Automated Engineering on AI Development
The rapid automation of engineering tasks in AI R&D suggests a potential acceleration in AI progress, reducing costs and increasing reproducibility. However, the residual research component—encompassing hypothesis generation, novel theory development, and creative problem-solving—may slow overall innovation if not equally automated. This shift could reshape the future of AI research, emphasizing engineering and implementation over original scientific discovery.
Progress in AI R&D Capabilities and Benchmark Milestones
Over the past two years, multiple independent benchmarks covering research reproduction, Kaggle competition performance, and kernel optimization have shown consistent, rapid improvements. Although measured at different times, they reveal a common pattern of approaching saturation, indicating AI’s increasing proficiency at core engineering tasks. The ‘perspiration-vs-inspiration’ framing in Jack Clark’s analysis emphasizes that while engineering automation is nearing completion, the more creative research aspects are still less developed.
“The evidence suggests AI can today automate vast swaths, perhaps the entirety, of AI engineering, while research remains the residual challenge.”
— Thorsten Meyer
Unresolved Questions About AI Research Automation
While engineering tasks have become highly automatable, it remains unclear how much of the research process—such as hypothesis generation, scientific insight, and theory development—can be automated. The extent to which AI can fully replace or augment human creativity in research is still under investigation, and some experts warn that certain aspects may inherently require human intuition.
Next Milestones in AI R&D Automation
Expect continued benchmarking efforts to measure the limits of automation in research activities. Researchers and institutions are likely to focus on developing AI systems capable of generating novel scientific hypotheses, potentially reducing the residual gap. Additionally, advances in hardware and software optimization will further push the boundaries of engineering automation, possibly leading to a paradigm shift in how AI research is conducted.
Key Questions
What are the main benchmarks indicating AI’s progress in automation?
CORE-Bench for research reproduction, MLE-Bench for Kaggle competition performance, and kernel design benchmarks are key indicators showing rapid progress toward automation of core engineering tasks.
Why is research still considered the residual in AI development?
Research involves hypothesis generation, scientific insight, and creative problem-solving, which are less amenable to automation than engineering tasks like coding and hardware optimization.
How might this shift impact AI research in the near future?
Automation of engineering tasks could accelerate development cycles and reduce costs, but the pace of scientific discovery may depend on breakthroughs in automating research processes or integrating human creativity.
Are there risks associated with highly automating engineering in AI?
Potential risks include over-reliance on automated systems, reduced human oversight, and challenges in ensuring the quality and safety of AI-generated research outputs.
What remains uncertain about AI’s ability to automate research?
The key uncertainty is whether AI can fully replicate or augment the creative, hypothesis-driven aspects of research, which are currently less automatable than engineering tasks.
Source: ThorstenMeyerAI.com