The Impact of AI Assistance on Developing Coding Skills

Recent research indicates that artificial intelligence (AI) can significantly enhance job efficiency, allowing individuals to complete certain tasks much more quickly. Observations from a study of Claude.ai suggest that AI can accelerate task completion by up to 80%. This increase in productivity raises a question: are there drawbacks to relying on AI? Other studies suggest that AI use can reduce engagement and personal effort, as individuals begin to offload cognitive tasks to the tool.

The impact of this cognitive offloading on skill development is uncertain; it may prevent workers from acquiring new competencies or from understanding the systems they create, particularly in coding. To explore these concerns, we conducted a randomized controlled trial with software developers, focused on the potential downsides of AI assistance in the workplace.

This inquiry has significant implications for the design of AI tools that support learning, workplace AI policies, and broader societal resilience. Our focus on coding—an area where AI tools are rapidly becoming fundamental—reveals a critical tension: as automation increases and productivity rises, humans must still maintain the necessary skills to troubleshoot, guide outputs, and oversee AI use in high-stakes environments. Does AI create a shortcut to both improved efficiency and skill development, or do the benefits of AI undermine skill acquisition?

Our study examined two key questions: 1) how effectively software developers learn a new skill (in this case, a Python library) with and without AI assistance, and 2) whether AI use affects their understanding of the code they write.

Findings indicated that AI assistance correlated with a statistically significant decrease in mastery. Participants relying on AI scored an average of 17 percentage points lower on a quiz covering concepts they had engaged with just minutes earlier. While AI use slightly sped up task completion, the improvement was not statistically significant.

However, reliance on AI did not uniformly lead to poor quiz performance. How participants used AI affected how well they retained the material. Those who demonstrated a stronger grasp of the material used AI not just to generate code but also to deepen their understanding, by asking follow-up questions, requesting clarifications, or posing conceptual questions while coding independently.

Study Design

We enlisted 52 software engineers, primarily at the junior level, each having used Python weekly for over a year. These participants also had some familiarity with AI coding tools but were new to Trio, the Python library central to our study tasks.

Our study comprised three stages: a warm-up, a primary coding task implementing two features with Trio (which requires knowledge of asynchronous programming), and a quiz. Participants were told about the upcoming quiz but were encouraged to complete the task as quickly as possible.

The coding assignments were designed to replicate a self-guided tutorial experience. Each participant received a problem description, starter code, and a brief explanation of Trio concepts necessary for the solution. An online coding platform featured an AI assistant that could access participants’ code and generate correct solutions upon request.[1]
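
For readers unfamiliar with Trio, the sketch below is a minimal, hypothetical example of the kind of structured concurrency the tasks required; it is not one of the actual study tasks. A nursery launches concurrent child tasks and waits for all of them to finish.

    import trio

    async def fetch(name: str, delay: float) -> None:
        # Stand-in for an I/O-bound operation such as a network call.
        await trio.sleep(delay)
        print(f"{name} finished after {delay}s")

    async def main() -> None:
        # A nursery supervises concurrent child tasks and only exits
        # once every task started inside it has completed.
        async with trio.open_nursery() as nursery:
            nursery.start_soon(fetch, "task-a", 1.0)
            nursery.start_soon(fetch, "task-b", 2.0)

    trio.run(main)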


Evaluation Design

In designing our evaluation, we consulted existing research in computer science education to identify four types of questions commonly used to assess mastery of coding skills:

  • Debugging: The skill of identifying and diagnosing errors in code. This ability is essential for recognizing when AI-generated code is incorrect and understanding the underlying issues.
  • Code Reading: The capacity to read and comprehend code functionality. This skill is necessary for verifying AI-generated code before use.
  • Code Writing: The ability to write or select the correct coding approach. While low-level syntax may diminish in importance, high-level system design skills will remain crucial with the increasing use of AI tools.
  • Conceptual Understanding: The ability to grasp the fundamental principles behind tools and libraries. This understanding is vital for evaluating whether AI-generated code adheres to the intended software design patterns.

Our assessment concentrated primarily on debugging, code reading, and conceptual questions, which we deemed most critical for overseeing AI-generated outputs.
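
To make the debugging category concrete, here is a hypothetical example (not an actual quiz item) of the kind of snippet such a question might present, asking why the task never runs:

    import trio

    async def fetch(name: str) -> str:
        await trio.sleep(1)
        return name

    async def main() -> None:
        async with trio.open_nursery() as nursery:
            # Bug: calling an async function without awaiting or scheduling it
            # only creates a coroutine object; its body never executes, and
            # Python warns that the coroutine was never awaited.
            fetch("task-a")
            # Possible fixes:
            #   result = await fetch("task-a")       # run it sequentially
            #   nursery.start_soon(fetch, "task-a")  # run it concurrently

    trio.run(main)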

Results

Participants using AI assistance finished roughly two minutes faster, although this difference was not statistically significant. A notable gap did appear in quiz scores: the AI group averaged 50%, while the hand-coding group averaged 67%, a difference of nearly two letter grades (Cohen’s d = 0.738, p = 0.01). The largest gap was on debugging questions, a worrying sign: if reliance on AI impedes coding proficiency, developers may struggle to understand why code fails.
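
For context on the effect size: Cohen’s d is the difference between group means divided by the pooled standard deviation, and by Cohen’s common benchmarks (0.2 small, 0.5 medium, 0.8 large) a d of 0.738 is a medium-to-large effect. The post does not report group standard deviations, so the snippet below is only the textbook formula, not a reproduction of the study’s calculation.

    import math

    def cohens_d(mean_a: float, mean_b: float,
                 sd_a: float, sd_b: float,
                 n_a: int, n_b: int) -> float:
        # Pooled standard deviation across the two groups.
        pooled_sd = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2)
                              / (n_a + n_b - 2))
        # Standardized difference between the group means.
        return (mean_a - mean_b) / pooled_sd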


Qualitative Analysis: AI Interaction Modes

We sought to understand not just whether, but how participants completed their tasks. Our qualitative analysis involved manually annotating screen recordings to evaluate the time spent formulating queries, types of questions asked, kinds of errors made, and time dedicated to active coding.

Surprisingly, some participants spent up to 11 minutes (30% of their total time) interacting with the AI assistant, composing as many as 15 queries. This overhead helps explain why the AI group’s speed advantage, while present, was not statistically significant. We expect that AI’s productivity impact would be more pronounced on repetitive or familiar tasks.

Conversely, participants in the No AI group encountered more errors, including syntax mistakes and misunderstandings of Trio concepts, the very topics the quiz evaluated. We hypothesize that those who hit more Trio-related errors (i.e., the control group) likely improved their debugging abilities by resolving these issues independently.

We then categorized participants based on their interactions with AI to identify distinct patterns associated with varying results in task completion and learning.

Low-Scoring Interaction Patterns: Participants categorized under low-scoring patterns typically relied heavily on AI, either for generating code or for debugging. This group’s average quiz scores fell below 40%, indicating minimal independent thought and increased cognitive offloading. They were further divided into:

  • AI Delegation (n=4): Participants wholly depended on AI for code writing and task completion, finishing the fastest and encountering few, if any, errors.
  • Progressive AI Reliance (n=4): These participants began with a few questions but gradually relinquished all coding tasks to the AI. Their poor quiz performance resulted mainly from a lack of mastery over concepts in the second task.
  • Iterative AI Debugging (n=4): These participants relied on AI to troubleshoot their code. They posed more questions but leveraged the assistant to solve problems rather than enhance their own understanding, resulting in poor scores and slower task completion.

High-Scoring Interaction Patterns: We defined high-scoring patterns as those associated with quiz scores of 65% or above. Participants in these groups used AI both to generate code and to ask conceptual questions.

  • Generation-Then-Comprehension (n=2): Participants in this category generated code, then manually integrated it into their work. After code generation, they engaged the AI with follow-up questions to clarify their understanding. Although their speed with AI wasn’t exceptional, they demonstrated greater comprehension on the quiz. Interestingly, this approach resembled that of the AI delegation group, except these participants validated their understanding.
  • Hybrid Code-Explanation (n=3): These participants constructed hybrid queries, requesting both code generation and explanations of the generated code. The time taken to comprehend the requested explanations enhanced their understanding.
  • Conceptual Inquiry (n=7): This group focused exclusively on conceptual inquiries and applied their enhanced understanding to complete tasks. Despite encountering numerous errors, they independently worked through these issues. This mode was the fastest among high-scoring patterns and the second fastest overall, behind AI delegation.

Although our qualitative analysis does not establish a direct causal link between interaction patterns and learning outcomes, it highlights behaviors correlated with varied results.

Conclusion

Our findings suggest that aggressive integration of AI into workplaces, especially in software engineering, involves trade-offs. They also show that not all forms of AI reliance are equal; how we engage with AI while pursuing efficiency can significantly influence what we learn. Under time constraints and organizational pressures, junior developers may turn to AI for rapid task completion, potentially sacrificing skill development, particularly the ability to debug when issues arise.

While preliminary, these findings suggest that companies should thoughtfully navigate the shift to a higher ratio of AI-generated to human-written code. If AI use hinders junior engineers’ development, productivity gains may inadvertently erode the very skills needed to validate AI-produced code. Management should be deliberate about deploying AI tools at scale, pairing them with design choices that promote ongoing learning so that engineers retain the capacity to oversee the systems they build.

For those starting out in software engineering or other fields, our research is a modest reminder that deliberate skill development still matters when using AI tools. Cognitive effort, including working through challenges, plays a crucial role in attaining mastery. This principle also applies to how workers choose tools and engage with AI. Prominent LLM services offer learning modes (e.g., Claude Code’s Learning and Explanatory modes or ChatGPT’s Study Mode) structured to enhance understanding. Insights into how individuals learn with AI can guide future design, with the aim of AI that supports both efficient work and skill development.

Previous studies have produced mixed conclusions on whether AI enhances or hinders coding productivity. Our own research found that AI can decrease the time required for particular tasks by 80%. However, the two studies ask different questions and use different methodologies: our earlier observational work measured productivity among people applying established skills, whereas this study examines AI’s role when workers are learning something new. It is plausible that AI speeds up well-practiced work while slowing the development of new skills, but further investigation is needed to clarify this dynamic.

This study marks a foundational step in understanding how human-AI collaboration influences work experiences. Our sample size was relatively small, and our assessments evaluated comprehension immediately following the coding assignments. It is crucial to determine whether short-term quiz results are indicative of long-term skill development, a question unaddressed by this analysis. Future studies should explore the effects of AI on tasks beyond coding, the permanence of these effects over time as engineers become more proficient, and how AI assistance operates differently from human support in learning contexts.

Ultimately, to foster skill development in an AI-enhanced environment, a broader understanding of AI’s impact on workers is required. While productivity gains are crucial in an AI-augmented workplace, so too is the long-term cultivation of expertise that underpins these advantages.

Read the full paper for more details.

Acknowledgments

This project was led by Judy Hanwen Shen and Alex Tamkin, with editorial support from Jake Eaton, Stuart Ritchie, and Sarah Pollack.

We thank Ethan Perez, Miranda Zhang, and Henry Sleight for making this project possible through the Anthropic Safety Fellows Program. We also appreciate feedback on the experimental design from Matthew Jörke, Juliette Woodrow, Sarah Wu, Elizabeth Childs, Roshni Sahoo, Nate Rush, Julian Michael, and Rose Wang.

@misc{aiskillformation2026,
  author = {Shen, Judy Hanwen and Tamkin, Alex},
  title = {How AI Impacts Skill Formation},
  year = {2026},
  eprint = {2601.20245},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG},
  eprinttype = {arxiv}
}

Footnotes

  1. Importantly, this setup is different from agentic coding products like Claude Code; we expect that the impacts of such programs on skill development are likely to be more pronounced than the results presented here.
