Recently, Shopify has stirred considerable excitement around integrating AI into its development processes. A lot of this enthusiasm has been sparked by viral insights shared by CEO Tobi Lütke and other executives, shedding light on how AI is fundamentally transforming their engineers’ daily tasks.
In one notable post, Tobi revealed that he had “shipped more code in the last three weeks than in the entire previous decade,” attributing this remarkable output to the use of AI tools. This revelation quickly garnered attention from founders and investors alike, sparking curiosity about the authenticity of the claims and how leadership is fostering an AI-driven culture among engineers.
To delve deeper, we spoke with Farhan Thawar, Shopify’s VP and Head of Engineering. He estimates that his team has become 20% more productive due to their commitment to AI. Early on, he recognized the potential of AI tools, promptly reaching out to GitHub’s CEO about integrating Copilot into Shopify for every engineer. When told that it wasn’t feasible at the time, his response was simple: “Figure out a way.”
This exchange took place a year before the arrival of ChatGPT, which made AI broadly accessible. Even then, Farhan believed that AI would transform the way his team worked. Now, five years later, he continues to lead the charge in harnessing AI’s capabilities. In this operational guide, he shares his insights on AI, from the underlying infrastructure to measuring engineering productivity and his thoughts on its future development.
Key Takeaways from Shopify’s AI-First Engineering Strategy
- Standardized infrastructure promotes tool exploration. Instead of confining teams to a single AI tool, Shopify has standardized the underlying infrastructure—creating an LLM proxy that channels all AI requests through a single gateway. This strategy allows engineers to experiment with various tools like Claude Code, GitHub Copilot, and others simultaneously, while enabling centralized cost management, tracking usage analytics, and maintaining model flexibility. Key takeaway: standardize infrastructure, not tools, in a rapidly evolving AI landscape.
- The 20% productivity boost is legitimate, but it transcends mere code output. Farhan’s team has seen about a 20% increase in productivity, a figure he views as a conservative estimate. This gain isn’t based on traditional metrics like lines of code or pull requests, which can be manipulated. Real improvements show up in quicker prototyping, evaluating multiple approaches instead of just one, and delivering higher-quality outputs across various functions. What’s the best way to measure progress? Weekly demos that showcase tangible progress and surface obstacles in real time.
- Organic cultural adoption is more effective than top-down directives. Shopify’s approach of simplifying AI problem-solving—where leaders share how they utilize AI rather than solely showcasing their expertise—has led to organic adoption across engineering, sales, finance, and HR. Coupled with user-friendly resources such as prompt libraries and setup guides, this cultural shift has fostered unexpected innovations, with non-engineers pivoting to create tailored solutions. Combining access, enablement, and leadership support drives effective change management.
- Comprehension debt poses a significant long-term threat. While AI may significantly expedite development, Farhan cautions against neglecting deep understanding. If engineers cease to engage intellectually, they risk losing sight of how their systems function. His guideline: engineers should grasp systems two or three layers deeper than their immediate work, utilizing AI to enhance knowledge, not replace it. Companies that accumulate comprehension debt may struggle to maintain or evolve their systems when issues arise.
- Success in 2026 hinges on mastering agentic functions. The next competitive edge will be found in coordinating AI agents, whether through parallel execution (having 10 agents collaborate while humans oversee) or iterative critique loops (45+ minute thoughtful sessions involving multiple model interrogations). Farhan asserts: “If you don’t figure out how to harness agents by 2026, you’ll be left behind.” Developers must transition from code writers to system directors and output evaluators, necessitating a fresh skill set alongside updated infrastructure, workflows, and thinking paradigms.
Part 1: AI Infrastructure
1. Standardize Infrastructure, Not AI Tools
The cornerstone of Shopify’s AI strategy lies in its infrastructure. Instead of standardizing on a single AI tool, Shopify concentrated on building a foundational platform that accommodates numerous tools and models. They developed a centralized LLM proxy: a single internal gateway through which all AI requests pass. The proxy sits between Shopify’s tools and the various AI models, so requests from tools like Claude Code or Copilot are routed through it before reaching providers like OpenAI, Anthropic, or Google.
This approach offers multiple advantages, particularly in centralized cost management. Given that AI models often charge based on token usage, expenses can surge rapidly with thousands of employees involved. By purchasing tokens in bulk and using a shared gateway for tracking, Shopify benefits from discounted rates while keeping a close watch on spending across teams and initiatives.
Leadership gains visibility into active experiments and successful workflows. “I can analyze usage patterns by team, project, or individual,” Farhan elaborates. “We receive alerts if someone exceeds $250 in token spending in a single day.” Instead of imposing strict limits on AI usage, Shopify investigates substantial spending, often uncovering bold and valuable projects, such as efforts to overhaul significant segments of Shopify’s mobile codebase.
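The spend alert Farhan describes can be sketched as a simple aggregation over usage records collected at the gateway. This is a minimal illustration, not Shopify’s implementation: only the $250 daily threshold comes from the text, while the record fields, the per-token prices, and the names UsageRecord and daily_spend_alerts are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# Threshold taken from the article; everything else here is hypothetical.
DAILY_ALERT_USD = 250.0


@dataclass(frozen=True)
class UsageRecord:
    user: str
    team: str
    day: date
    input_tokens: int
    output_tokens: int
    usd_per_1k_input: float   # assumed price, not a real provider rate
    usd_per_1k_output: float  # assumed price, not a real provider rate

    @property
    def cost_usd(self) -> float:
        """Cost of this request under the assumed per-1k-token prices."""
        return (self.input_tokens / 1000) * self.usd_per_1k_input + (
            self.output_tokens / 1000
        ) * self.usd_per_1k_output


def daily_spend_alerts(records, threshold=DAILY_ALERT_USD):
    """Return (user, day, total_usd) for each user-day whose spend exceeds the threshold."""
    totals = defaultdict(float)
    for r in records:
        totals[(r.user, r.day)] += r.cost_usd
    return sorted(
        (user, day, round(total, 2))
        for (user, day), total in totals.items()
        if total > threshold
    )


# Made-up usage data for illustration only.
records = [
    UsageRecord("alice", "mobile", date(2025, 1, 6), 20_000_000, 2_000_000, 0.01, 0.03),
    UsageRecord("bob", "checkout", date(2025, 1, 6), 1_000_000, 100_000, 0.01, 0.03),
]
alerts = daily_spend_alerts(records)
```

In keeping with the approach described above, the function only reports heavy spenders. It does not block them, which leaves room for a human to look into what the spending was for.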
The LLM proxy also offers adaptability, allowing Shopify to switch models easily without disrupting the workflow, as all tools connect to the proxy rather than directly to specific providers.
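One way to picture this flexibility is a routing table inside the proxy that maps stable aliases to whichever upstream model is current. The sketch below is an assumption about how such a layer could look, not a description of Shopify’s proxy; the alias, provider, and model names are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Upstream:
    provider: str  # label only, e.g. "provider-a"
    model: str     # placeholder model name, not a real model identifier


class ModelRouter:
    """Maps stable aliases used by tools to whichever upstream is current."""

    def __init__(self, routes):
        self._routes = dict(routes)

    def resolve(self, alias):
        """Look up the upstream currently behind an alias."""
        try:
            return self._routes[alias]
        except KeyError:
            raise ValueError(f"unknown model alias: {alias!r}") from None

    def repoint(self, alias, upstream):
        # Swapping the upstream is a proxy-side change; callers keep using the alias.
        self._routes[alias] = upstream


router = ModelRouter({"coding-default": Upstream("provider-a", "model-x")})
before = router.resolve("coding-default")
router.repoint("coding-default", Upstream("provider-b", "model-y"))
after = router.resolve("coding-default")
```

Because tools only ever ask for "coding-default", the switch from one upstream to another requires no change on the tool side.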
2. Connect AI to Your Internal Systems
AI tools can significantly enhance efficiency when integrated with existing internal systems. At Shopify, this integration occurs via MCP (Model Context Protocol) servers that connect AI tools to platforms such as the company wiki, the project management tool (GSD), and the data warehouse, allowing those tools to access and retrieve information in a structured manner.
For instance, if an employee is prepping for a meeting, they can prompt an AI assistant to gather background on a person or account. The AI system can access Salesforce, review relevant Slack conversations, as well as investigate calendar events or documents within Google Workspace to compile a comprehensive overview. Importantly, access controls remain intact; AI will only pull data that the user is authorized to view. “It follows the same authentication flow,” Farhan clarifies, “so it won’t disclose information outside a user’s access rights.”
Whether developed internally or sourced from vendors, these MCP servers must adhere to the same reliability and testing standards as any other internal system. Once deeply woven into workflows, they become part of the organization’s core infrastructure.
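The access-control behavior described in this section can be illustrated with a small sketch in which the permission filter runs inside the retrieval tool, before anything reaches the model. This is not the MCP protocol itself and not Shopify’s code; the Document and User types, the group names, and search_as_user are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    source: str                # label only, e.g. "wiki" or "crm"
    allowed_groups: frozenset  # groups permitted to read this document
    text: str


@dataclass(frozen=True)
class User:
    name: str
    groups: frozenset


def search_as_user(user, query, documents):
    """Return matching documents, dropping anything the user cannot read.

    The permission filter runs inside the tool, before results reach the model,
    so the assistant never sees content outside the caller's access rights.
    """
    q = query.lower()
    return [
        d for d in documents
        if q in d.text.lower() and user.groups & d.allowed_groups
    ]


# Made-up documents and users for illustration only.
docs = [
    Document("d1", "wiki", frozenset({"everyone"}), "Acme onboarding notes"),
    Document("d2", "crm", frozenset({"sales"}), "Acme renewal pricing"),
]
rep = User("rep", frozenset({"everyone", "sales"}))
eng = User("eng", frozenset({"everyone"}))

rep_hits = [d.doc_id for d in search_as_user(rep, "acme", docs)]
eng_hits = [d.doc_id for d in search_as_user(eng, "acme", docs)]
```

The same query yields different results for the two users, which is the property Farhan describes: the assistant inherits the caller’s permissions rather than holding broader ones of its own.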
3. Make Internal Tools Easy to Build and Deploy
Farhan likens Shopify’s internal tool development to the early internet days. “Remember GeoCities? You could easily create a website with just a URL,” he explains. A similar aim drives Shopify: to eliminate friction so that anyone within the company can create and disseminate simple software swiftly.
To facilitate this, Shopify introduced an internal platform called Quick. Employees can drag and drop JavaScript, TypeScript, or HTML files, assign a URL, and deploy functional applications accessible company-wide in an instant. This innovation significantly lowers barriers for teams in sales, support, finance, and others to utilize AI tools to develop and launch simple applications without engineering assistance.
Farhan shared a recent example from a merchant meeting where he received a Quick link consolidating vital information about the merchant—from internal systems and data sources—into a straightforward dashboard.
By empowering employees to create these tailored tools independently, Shopify diminishes operational friction and fosters a culture of experimentation, as teams increasingly address their own needs, enhancing go-to-market efficiency and revenue.
4. Allow Engineers to Experiment with Various Tools
Many organizations typically launch AI adoption initiatives by selecting a single standard tool for company-wide implementation, restricting access to others. Conversely, Farhan embraced a more open approach. Instead of adhering to one AI tool, Shopify standardized the infrastructure that supports them.
Today, Shopify engineers utilize a diverse array of AI coding tools, including Cursor, Claude Code, GitHub Copilot, OpenAI Codex, and experimental options from Gemini. This intentional variety reflects the rapidly evolving AI ecosystem, which has yet to yield a single definitive tool. “At Shopify, we typically have one tool per job—except with AI,” he notes, “because it’s still uncertain which model or workflow will ultimately prevail.”
Part 2: Adoption and Enablement
5. Drive AI Adoption Through Culture and Tooling
When Farhan initially deployed Cursor at Shopify, he expected engineers to be the main users and was astonished by how quickly adoption spread. The tool soon gained popularity among sales, finance, and HR teams as well. Farhan recalls Tobi humorously suggesting that the rollout worked almost too well. The Cursor team even inquired how Farhan achieved such widespread acceptance among sales personnel.
Farhan makes a habit of demonstrating his work with AI, emphasizing not his intelligence but rather his ability to leverage technology: “It wasn’t about showcasing my effort or brilliance; it was more like, look how efficiently I can work.” This culture of open sharing, visible leadership engagement, and frictionless access to tools has made adoption appear like a natural advantage rather than a compulsory task.
Once individuals beyond R&D began utilizing AI tools, they started to create what Farhan terms “n-of-1 software”—custom, highly focused applications catering to their specific workflows. Sales teams have implemented queries, generated reports, developed decks, and prepared Monthly Business Reviews (MBRs) with minimal reliance on engineering resources. Leadership further reinforced this behavior by publicly sharing their successes.
The outcome has been extensive and deep adoption across the board, not limited to engineering. “Much of our approach includes gentle nudges, demos, and celebrating how ‘lazy’ we can be—working smarter, not harder,” he reflects. “We often say, ‘Look what I accomplished in five minutes!’” There’s no compulsion; rather, Farhan aims to illustrate the potential for others.
He asserts that the drive toward AI utilization stems primarily from culture, complemented by intentional access and enablement measures. This encompasses practical onboarding, setup instructions, connecting all systems to MCP servers, and an internal library of prompts that allow employees to adapt and reuse existing, effective workflows.
Top-down encouragement also plays a crucial role: employees are evaluated on their “AI reflexivity” during performance reviews—assessing how quickly they leverage AI when facing challenges.
6. Establish Clear Ownership for AI Enablement
At Shopify, a specialized internal team is tasked with building the AI infrastructure, enabling engineers to experiment safely and efficiently. “We have a small ML infrastructure team,” Farhan explains, “consisting of about six engineers.” Their role is not to dictate how teams employ AI; rather, they aim to eliminate barriers.
The team oversees the LLM proxy, monitors model efficiency, and ensures that engineers can access AI tools seamlessly. “They are constantly seeking ways to alleviate repetitive tasks and enhance enablement,” Farhan states. “Their ongoing inquiry is focused on reducing latency, minimizing friction, and ensuring teams can work efficiently.”
Part 3: Tracking AI Impact
7. Don’t Confuse Output with Productivity
Determining engineering productivity has always been a complex issue—and the advent of AI amplifies this complexity. With AI tools capable of generating extensive code rapidly, traditional metrics such as lines of code or pull requests become even less reliable. Increased output doesn’t inherently translate to genuine progress.
“Historically, there hasn’t been a robust metric to assess whether an engineer is productive,” Farhan admits. He provides an example from his team: an intern who deleted six lines of code saved Shopify $600,000 annually in infrastructure expenses. Under conventional productivity guidelines, such an impactful change might go unrecognized. “Automated tools might miss this kind of significant effect,” he reflects.
In his view, effective engineering has never been about maximizing code volume. Often, it’s the opposite. He references a well-known humorous exchange within the pair programming community: when asked whether pair programming would reduce the total amount of code written, the retort was that, ideally, it would reduce it even further.
This principle holds true in an AI-augmented environment. Although AI can produce large quantities of code, more code is not necessarily better code. “Code is inexpensive now,” Farhan remarks. “But I’m not looking for just code; I want solutions.” Ideally, he envisions AI assisting engineers in delivering “concise, elegant, and efficient code,” rather than merely increasing volume.
Moreover, as AI takes on a greater role, the characteristics of code that is generated, read, and modified by AI might differ significantly from those of code previously maintained by humans. Over time, new metrics may arise based on these evolving code structures, though it is too early to say what they will be.
8. Measure Impact with Real Signals, Not Vanity Metrics
Given this complexity, Farhan argues that the best indicator of progress is to conduct weekly demos where teams present their actual work. “I firmly believe that the most effective way to determine if progress is occurring is through these weekly demonstrations. Many attempt to analyze engineering productivity through various metrics,” he notes. “However, the most reliable approach remains fundamentally human.”
Through these demos, leadership can assess alignment and engage with teams to uncover successes and obstacles. Farhan does notice a slight correlation between AI usage and an increase in code delivery, though he quickly points out that this insight is tenuous—engineers could easily manipulate such metrics without improving actual productivity.
Nonetheless, he sees significant productivity increases coinciding with heightened AI usage. “Our aim is to empower every engineer with superpowers. Our goal isn’t to downsize the workforce; it’s about enhancing our engineers’ capabilities,” he explains, emphasizing that engineers are exploring numerous solutions, testing ideas, and accelerating experimentation cycles. While expenses rise from equipping employees with AI, Farhan estimates engineer productivity has increased by about 20%, particularly evident in faster feature shipping and enhanced product quality.
Part 4: Quality and Security Guardrails
9. Keep an Eye on Quality Using Reversion Rate and Humans in the Loop
As AI accelerates software development, it raises an important question: how can quality and security be maintained with a noticeable increase in code generation? Farhan emphasizes the necessity of establishing guardrails to safeguard code quality and system integrity.
A common concern is that the rise in AI-generated code could lead to an uptick in production bugs. To track this, Shopify monitors reversion rates—measuring how often pull requests are rolled back after being merged. If AI were producing lower-quality output, an increased reversion rate would be anticipated. Farhan reports that so far, this expectation has not materialized.
He notes a slight increase in the number of pull requests made weekly by engineers using AI tools, yet the reversion rate for those requests has remained relatively stable. It seems engineers are increasing their output without a corresponding decline in quality.
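The reversion-rate comparison can be written down as a small calculation: the share of merged pull requests that were later rolled back, computed separately for each group. The sketch below uses made-up data, and the ai_assisted flag is an assumption, since the article does not say how Shopify labels pull requests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    number: int
    merged: bool
    reverted: bool     # True if the merge was later rolled back
    ai_assisted: bool  # hypothetical label; the article does not say how this is tagged


def reversion_rate(prs):
    """Fraction of merged pull requests that were later reverted."""
    merged = [pr for pr in prs if pr.merged]
    if not merged:
        return 0.0
    return sum(pr.reverted for pr in merged) / len(merged)


# Illustrative data only; these are not Shopify's numbers.
prs = [
    PullRequest(1, True, False, True),
    PullRequest(2, True, True, True),
    PullRequest(3, True, False, True),
    PullRequest(4, True, False, True),
    PullRequest(5, False, False, True),   # never merged, so excluded from the rate
    PullRequest(6, True, False, False),
    PullRequest(7, True, True, False),
    PullRequest(8, True, False, False),
    PullRequest(9, True, False, False),
]

ai_rate = reversion_rate([pr for pr in prs if pr.ai_assisted])
other_rate = reversion_rate([pr for pr in prs if not pr.ai_assisted])
```

A stable rate alongside a higher pull request count is the pattern described above: more output without a matching rise in rollbacks.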
Crucially, the team never merges code without a senior engineer’s review. “Shopify hasn’t reached the point where we allow AI to commit code directly to repositories,” he explains. “We still require human evaluators for pull requests, which is now becoming a significant bottleneck as AI generates more code that requires scrutiny.”
For now, Farhan views this hurdle as a necessary precaution. As AI speeds up development, rigorous human review ensures that rapidity doesn’t compromise reliability.
10. Use AI as a Partner in Identifying Security Vulnerabilities
Another concern related to the acceleration of software development via AI is whether security protocols can keep pace. Some advocates contend that LLMs can produce more secure code than humans, particularly concerning prevalent vulnerabilities like SQL injection. However, Farhan remains skeptical of this assertion. He believes AI-generated code is often more verbose than code written by humans, potentially introducing additional opportunities for errors.
Rather than assuming AI inherently generates safer code, Shopify considers a different approach: leveraging AI as a security ally. “AI excels at identifying vulnerabilities,” Farhan points out. By supplying the model with appropriate context and prompts, engineers can instruct it to scrutinize code for logical errors, unsafe practices, or architectural vulnerabilities. In this role, the model serves more as a reviewer than as a creator.
This analysis can extend beyond the code itself, as AI can investigate APIs, assess system boundaries, and conduct fuzz testing—sending unexpected or malformed inputs to uncover latent vulnerabilities. Farhan emphasizes that this does not diminish human accountability for security. “I would never relinquish responsibility,” he states. “I would engage AI as a collaborative partner to help identify potential weaknesses.” In practical terms, that demands engineers actively guide the model. AI tools won’t autonomously search for vulnerabilities unless adequately prompted.
“You must direct it with commands like: Act as a senior security expert. Assess the following controller code for Insecure Direct Object Reference (IDOR) vulnerabilities. Specifically, investigate if the user_id or resource_id in the request parameters are being utilized to retrieve data from the database without verifying that the presently authenticated user (session.user_id) has express permission to access or modify that particular record. Highlight any lines where database lookups occur without a multi-tenant ownership check,” Farhan elaborates.
In this capacity, AI becomes a critical instrument for scaling security evaluations. Although it cannot guarantee perfect safety, it can substantially enhance the breadth of testing and review a team can achieve. “It’s a tedious task for humans, making this a promising application for LLMs,” affirms Farhan. “While it can’t ensure an absence of security vulnerabilities, it significantly amplifies the level of analysis we can perform.”
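To make the IDOR pattern in Farhan’s example prompt concrete, the sketch below contrasts a lookup that trusts the requested id with one that verifies ownership against the session user. It is a toy illustration with an in-memory dictionary standing in for a database; the ORDERS data and both function names are hypothetical.

```python
# Toy in-memory "database"; record ids and owners are made up for illustration.
ORDERS = {
    101: {"owner_id": 1, "total": 40},
    102: {"owner_id": 2, "total": 75},
}


def get_order_unsafe(session_user_id, order_id):
    # IDOR: the record is fetched by id alone; the session user is never consulted.
    return ORDERS.get(order_id)


def get_order_checked(session_user_id, order_id):
    order = ORDERS.get(order_id)
    # Ownership check: treat "not yours" the same as "not found".
    if order is None or order["owner_id"] != session_user_id:
        raise PermissionError("order not found or not accessible")
    return order


# User 1 requests order 102, which belongs to user 2.
leaked = get_order_unsafe(1, 102)
try:
    get_order_checked(1, 102)
    blocked = False
except PermissionError:
    blocked = True
```

The first function is the kind of line the prompt asks the model to flag: a database lookup with no ownership check between the request parameter and the returned record.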
11. Beware of Comprehension Debt
Farhan identifies one AI-related risk that particularly concerns him: comprehension debt. “The brain functions like a muscle,” he asserts. “Neglecting it—like skipping the gym—leads to atrophy.” As AI systems automate more tasks and generate increased quantities of code, engineers may gradually detach from their understanding of the systems they support.
While tools designed for specific use (such as custom dashboards or workflow assistants) may not necessitate extensive oversight, Farhan insists that anything interfacing with Shopify’s core commerce infrastructure still demands thorough human scrutiny. “I generally advise my team to understand systems two or three layers down from their immediate focus,” he shares. Farhan likens this perspective to elite Formula One drivers, who don’t just know how to steer their vehicle but also comprehend its engine, braking systems, and materials. This profound understanding enables them to react effectively when issues occur.
This philosophy is equally valid for engineers in an AI-integrated environment. “You shouldn’t relinquish critical thinking,” Farhan emphasizes. “Delegate the repetitive tasks to AI instead.” AI can assist in examining APIs, testing edge cases, and accelerating experimentation, but engineers must retain an understanding of the systems they construct.
“If you’re connecting to an API, let AI help you gain knowledge,” Farhan advises. “Have it analyze the API on your behalf or evaluate its boundary conditions. But don’t delegate the mental labor and say, ‘Go build this for me; I’ll be back after lunch.’”
Finally, Farhan stresses that engineers must engage with AI in a manner that complements their learning rather than substitutes it. “If your reliance on AI diminishes your cognitive engagement, I believe you will ultimately lose out in the long term,” he warns.
Part 5: Agentic Workflows
12. Prepare for Agentic Development
Looking ahead, Farhan believes that the next major transformation in software development will involve the emergence of agentic workflows—systems in which multiple AI agents collaborate alongside engineers to write, test, and refine code. In this framework, developers will spend less time on each line of code and more time guiding and assessing AI systems.
“By 2026, the focus will shift toward harnessing agents effectively,” he predicts. Practically, this entails assigning AI the repetitive coding tasks, allowing engineers to prioritize high-level decision-making. The guiding principle becomes: “How can I direct AI to handle the tedious aspects of coding, so I can focus on the strategic dimensions?”
One emergent strategy involves employing multiple agents concurrently, with some of Shopify’s senior engineers activating several AI agents to tackle various segments of a codebase. The engineer then evaluates the outputs, discarding unfit results while integrating the successful components—accelerating development significantly.
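The parallel pattern can be sketched as a fan-out followed by triage. In the sketch below, run_agent is a stand-in for a real agent call, its pass/fail behavior is invented so the example can run offline, and the task names are placeholders. None of this reflects Shopify’s actual tooling.

```python
from concurrent.futures import ThreadPoolExecutor


def run_agent(task):
    """Stand-in for an AI agent call; a real one would invoke a model or tool."""
    # Hypothetical behavior: pretend the agent fails on tasks marked "flaky".
    ok = "flaky" not in task
    return {"task": task, "patch": f"patch for {task}", "tests_pass": ok}


def fan_out(tasks, max_workers=4):
    """Run one agent per task concurrently; results come back in task order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_agent, tasks))


def triage(results):
    """Split results into candidates for human review and discards."""
    keep = [r for r in results if r["tests_pass"]]
    drop = [r for r in results if not r["tests_pass"]]
    return keep, drop


tasks = ["auth module", "flaky cart tests", "search index"]
keep, drop = triage(fan_out(tasks))
kept_tasks = [r["task"] for r in keep]
dropped_tasks = [r["task"] for r in drop]
```

The automated filter only narrows the field. In the workflow described above, the engineer still decides which of the surviving results to integrate.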
Another strategy emphasizes more profound analysis over parallelism. Instead of orchestrating numerous agents, an engineer might engage a single model in extended critique loops, wherein the AI generates a response, assesses it, revises it, and continues refining it through lengthy reasoning periods. Both methodologies highlight a shift in how engineers will interact with software systems, transitioning from exhaustive coding to orchestrating AI systems and evaluating their outputs.
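The critique loop described above has a simple shape: generate, critique, revise, and repeat until the critic has no objections or a round limit is reached. The sketch below shows that control flow with toy functions in place of model calls; critique_loop, toy_critique, and toy_revise are illustrative names, not part of any real library.

```python
def critique_loop(draft, critique, revise, max_rounds=5):
    """Refine a draft until the critic has no objections or rounds run out."""
    history = [draft]
    for _ in range(max_rounds):
        issues = critique(draft)
        if not issues:
            break
        draft = revise(draft, issues)
        history.append(draft)
    return draft, history


# Toy stand-ins for model calls so the loop can run without any API.
def toy_critique(text):
    issues = []
    if "TODO" in text:
        issues.append("unresolved TODO")
    if not text.endswith("."):
        issues.append("missing final period")
    return issues


def toy_revise(text, issues):
    # Address only the first issue per round to mimic gradual refinement.
    if issues[0] == "unresolved TODO":
        return text.replace("TODO", "").strip()
    return text + "."


final, history = critique_loop("Validate input TODO", toy_critique, toy_revise)
```

The max_rounds cap is a design choice worth keeping in any real version, since a critic that always finds something would otherwise loop indefinitely.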
“Failing to adopt agentic workflows by 2026 will leave you at a disadvantage,” Farhan warns, noting that Shopify is already investing in building the necessary infrastructure to support this approach while ensuring that engineers retain control over final decisions.
The Future of AI-First Engineering
Shopify’s journey illustrates that creating an AI-first engineering organization isn’t simply about adopting a breakthrough tool; it’s about designing an operational framework that optimizes AI’s role. Infrastructure needs to facilitate safe, cost-effective experimentation. A culture of encouragement must motivate engineers to employ AI instinctively. Additionally, robust safeguards are vital to ensure teams accelerate their pace without compromising quality or insight into the systems they construct.
As AI capabilities evolve, the role of engineers may increasingly transition from being intensive code producers to orchestrating intelligent systems. Organizations that succeed in leveraging this potential—while maintaining a deep technical understanding—will shape the future of software development. However, knowing the foundational principles is distinct from addressing the real-time decisions faced by your team. The essential strategy lies in starting with the right approach suited to your current scale and phase, all while preparing for future growth.