
Key Takeaways from ZDNET
- Free, local AI shows promise, but the time it wastes can cost more than a paid subscription.
- Random, unexplained changes made the code worse with each iteration.
- Without the ability to share screenshots, resolving Xcode errors became a tedious task.
I'm disappointed to report this, because after experimenting with the free, local AI tools Goose, Ollama, and Qwen3-coder on a simple WordPress plugin, I had hoped to cancel my costly Claude Code subscription. Goose took several attempts to get that test plugin right, but it eventually succeeded.
Related: I explored a local, open-source alternative to Claude Code—here’s how it fared.
Paying hundreds monthly for OpenAI or Anthropic’s cloud-based AIs to generate my code is a significant cost. Thus, I began testing a combination of Goose, Ollama, and Qwen3-coder to see if they could effectively replace my Claude Code subscription.
But the answer was a resounding no.
Top-tier AI models often use benchmarks like SWE-Bench Pro and GDPval-AA to substantiate their claims of superiority, which is certainly a valid testing approach.
Related: I created an iOS app within two days using just my voice—it was an electrifying experience.
However, I prefer a hands-on methodology and rely on my personal DPQ (David Patience Quotient) benchmark. Essentially, if I reach the “frak this” moment after several days of using a model or AI system, it’s failed the DPQ test.
Over the past several months, both Claude Code and OpenAI Codex have passed the DPQ, but Goose (along with Ollama and Qwen3-coder) failed miserably on this more extensive project.
(Disclosure: In April 2025, Ziff Davis, ZDNET's parent company, filed a lawsuit against OpenAI, alleging copyright infringement in the training and operation of its AI systems.)
The Assignment
If you've been following my articles, you may recall that I previously built a filament inventory management app using Claude Code. The app uses NFC tags to track spools of filament and which machine each spool is assigned to.
It's a niche problem, but that's exactly what vibe coding is for. I don't have to justify heavy development costs to a product team or show a massive ROI. All I need is a pressing need and the basic skills to guide an AI.
Claude Code had already produced working iPhone, Mac, and Apple Watch versions of this project. Still, I wanted an iPad app to round out the set.
This was the task I assigned to Goose and its companions.
Goose wasn’t required to develop the app from scratch; it only needed to determine which features from the Mac implementation (especially the expanded user interface) and the iPhone implementation (particularly the photo features) to integrate into the new iPad version.
There’s a wealth of institutional knowledge in the project—not just in the source code, but also in all the notes, statuses, and documentation that I’ve consistently required Claude Code to generate.
The Preparation
This was a potentially risky experiment. I had no way of knowing whether Goose and its companions would improve the existing code or ruin it (spoiler: they mostly ruined it).
With that in mind, I took a full ZIP backup of the entire project directory and stored it off my development machine. I also instructed Claude Code:
I’ve been tasked with evaluating a new AI coder on the team. This coder will port the filament project to the iPad, integrating the larger user interface of the Mac with the photo features of the iPhone. Note: NFC is unsupported on the iPad.
I need your assistance both before and after this programming test. Beforehand, please thoroughly audit and catalog the project so that if the new AI coder fails and leaves the code in disarray, we can revert to a known good state. As an additional backup, I will ZIP the entire project directory after you complete this phase.
After the programming test, which will occur in a subsequent session, I will want you to examine the new work. You will assess the code produced by the new AI coder for the iPad app, as well as check the code for the iOS, Mac, and Watch versions to ensure that no detrimental changes occurred.
Claude took the initiative and built tracking data on its own, anticipating it might be needed to restore the project to a working state.
Then I set Goose loose to begin its work.
Goose Desktop
I launched the Goose desktop application on my Mac Studio, pointed it at Ollama (the local LLM server), and gave Qwen3-coder the largest context window it would allow.
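If you want to reproduce a similar setup, note that the context window is normally configured on the Ollama side rather than inside Goose. Here is a minimal sketch under a couple of assumptions (that the model is pulled under the qwen3-coder tag and that your machine has the memory for the illustrative 65,536-token window); check Ollama's Modelfile documentation for the specifics:

```
# Hypothetical Modelfile: derive a large-context variant of Qwen3-coder.
# num_ctx sets the context window; 65536 is only an illustrative value.
FROM qwen3-coder
PARAMETER num_ctx 65536
```

You would then build the variant with something like "ollama create qwen3-coder-large -f Modelfile" and point Goose at that model in its provider settings.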
I told it: "Read all the documents and .MD files, and thoroughly familiarize yourself with this project."
As it absorbed the information, its attention seemed to wander. Goose identified some parts of the project but completely missed the Apple Watch implementation.
When I pointed out its mistake, Goose acknowledged, “You’re correct; I’m sorry for that oversight. I haven’t thoroughly examined the WatchOS implementation. Let me take a more in-depth look at that section of the project.”
After its reevaluation, Goose demonstrated a better grasp of the existing code. So, I inquired, “What elements will you derive from the MacOS version, and what elements will you take from the iOS/iPhone version?”
Keep in mind that the Mac version takes advantage of a larger display, while the iPhone version offers photo capabilities. And since iPads do not support NFC, that functionality should not be carried over. Goose correctly identified the wider-screen interface from the Mac implementation and the photo features from the iPhone, but it insisted it could bring over the NFC features as well.
I posed a series of guided discovery questions, including “What did you misinterpret?” and “What are you disregarding in this approach?” After multiple attempts, Goose finally grasped that iPads lack the necessary NFC functionality.
Next, I instructed it to develop a plan for the iPad implementation. Here's a crucial point: iOS (for iPhones) and iPadOS (for iPads) share the same core operating system. From Apple's perspective, iPadOS is a variant of iOS rather than a wholly separate OS like macOS.
Nonetheless, some system behaviors are exclusive to iPads (like windowing, pointer support, and multitasking), while certain APIs are only available on iPadOS or function differently there. Apple’s documentation and WWDC sessions distinctly differentiate between iOS and iPadOS.
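To make the distinction concrete, here is a minimal Swift sketch, not taken from my project, of the two decisions Goose kept stumbling over: adopting an iPad-style layout while reusing the iPhone code paths, and gating NFC on hardware support. The view name and placeholder content are hypothetical.

```swift
import SwiftUI
import UIKit
#if canImport(CoreNFC)
import CoreNFC
#endif

// Hypothetical sketch, not the actual project code: one root view that keeps
// the iPhone navigation but adopts a wider, two-pane layout on iPad.
struct SpoolRootView: View {
    var body: some View {
        if UIDevice.current.userInterfaceIdiom == .pad {
            // iPad: borrow the roomier, two-column layout from the Mac version.
            NavigationSplitView {
                Text("Spool list")      // placeholder sidebar
            } detail: {
                Text("Spool detail")    // placeholder detail pane
            }
        } else {
            // iPhone: keep the existing compact navigation.
            NavigationStack {
                Text("Spool list")
            }
        }
    }
}

// NFC scanning should only be offered where the hardware supports it.
// Current iPads report false here, which is the point Goose kept missing.
var nfcScanningAvailable: Bool {
    #if canImport(CoreNFC)
    return NFCNDEFReaderSession.readingAvailable
    #else
    return false
    #endif
}
```

The point is that a single target can serve both idioms; the iPad-specific behavior lives in a few conditional branches rather than a rewrite, which is part of why this port seemed like a reasonable job to hand off.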
When Goose argued that it would create an iOS version of the iPad app, I had to push back. Even after I pointed it to relevant web searches, Goose still struggled to distinguish between iPadOS and iOS versions.
This back-and-forth took several hours, during which I felt like I was negotiating with a stubborn, somewhat uncooperative graduate student.
Related: I managed 24 days of coding in merely 12 hours using a $20 AI tool—yet one major pitfall remains.
Eventually, Goose seemed to understand that the iPad could support multitasking, windows, and pointers, so I told it to go ahead and build the app.
However, the answer was a resolute “No.” Goose stated that it couldn’t alter actual Xcode project files, create new project targets, or make “real” file changes.
I went down another rabbit hole, trying to convince Goose that I had access to those directories and that it should, too. It got me nowhere.
When I questioned why Claude Code could perform these tasks while Goose could not, it explained that Claude Code operated in the terminal, enabling it to execute terminal commands.
Goose CLI (Running in Terminal)
Determined, I navigated to Goose’s GitHub repository and installed the Mac CLI version using the provided cURL command.
This installation detected my Ollama setup along with the Qwen3-coder model, resulting in a fully functional Goose AI environment within my terminal. Progress was made.
Upon launching Goose, I inadvertently hit return an extra time—a habitual move to clear space in the terminal. Normally, hitting return on a blank line does nothing. However, Goose interpreted this as a prompt to build a Mac app, attempting to replicate an app that was already running.
Related: I’ve compared free versus paid AI coding tools—here’s my preferred choice.
Thankfully, it failed after about ten minutes because it couldn't access any files. This felt like two steps backward.
Goose also takes random actions for reasons I can't explain. For example, after I hit return on that blank line, it decided to add 375 lines and remove 7, with no explanation of what it was doing or why.
Once again, I walked Goose through the familiarization steps, repeating the work I had already done with the desktop version. It took several prompts to make sure Goose actually absorbed the instructions rather than just going through the motions, like a student hiding a Nintendo Switch behind a textbook while pretending to pay attention.
Then we had to revisit the debate about iOS versus iPadOS, as well as whether or not an iPad could support NFC. You could practically watch the DPQ draining away.
Eventually, Goose appeared to understand the assignment, so I told it to proceed with the build. Again, it responded that it couldn't modify files.
Related: Why coding skills are increasingly crucial in the era of AI.
This led to a peculiar moment. I simply asked, "If the file system isn't read-only, what do you need to gain access to the files?" It never answered the question, but it went ahead and built what it claimed was the iPad app, announcing, "iPad Implementation Complete."
Frustration Sets In
However, the iPad implementation was far from complete. Upon attempting to run it in Xcode, I was met with a multitude of errors.
This highlighted a major limitation of Goose's terminal implementation: there is no good way to paste in or share screenshots. With both OpenAI's Codex and Anthropic's Claude Code, I can simply take a screenshot of the error messages, hand it to the AI, and get actionable guidance. Not so with Goose.
Related: 10 things I wish I had known before relying on Claude Code for my iPhone app.
Xcode doesn't let you select all the errors and copy them as text, so I had to run OCR (optical character recognition) on the error display to turn it into text, which I then pasted into Goose. After addressing those errors, Goose returned a new version, declaring it "iPad Implementation — COMPLETE."
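For anyone stuck in the same loop, the OCR step doesn't have to be manual. Below is a hedged sketch, not the tool I actually used, of a tiny macOS command-line program that uses Apple's Vision framework to pull the text out of a screenshot so it can be pasted into a terminal session. The program name and file path are placeholders.

```swift
import AppKit
import Foundation
import Vision

// Hypothetical helper: OCR a screenshot (for example, of Xcode's issue list)
// into plain text that can be pasted into a terminal-based AI session.
func recognizeText(inImageAt url: URL) throws -> String {
    guard let image = NSImage(contentsOf: url),
          let cgImage = image.cgImage(forProposedRect: nil, context: nil, hints: nil) else {
        throw CocoaError(.fileReadCorruptFile)
    }
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate    // favor accuracy over speed
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])
    let lines = (request.results ?? []).compactMap { $0.topCandidates(1).first?.string }
    return lines.joined(separator: "\n")
}

// Usage (hypothetical): ocr-errors ~/Desktop/xcode-errors.png
do {
    print(try recognizeText(inImageAt: URL(fileURLWithPath: CommandLine.arguments[1])))
} catch {
    FileHandle.standardError.write(Data("OCR failed: \(error)\n".utf8))
    exit(1)
}
```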
Regrettably, this version had even more errors.
The code was actually getting worse with each iteration. Goose would sometimes produce incomplete results, such as responses that started to address the problem and then veered off in an entirely different direction.
After ten minutes of apparently grinding through the same tasks over and over, Goose claimed, "I have successfully implemented the iPad version with all the required features and optimizations."
After six hours of effort, I had nothing that worked, and things seemed to be getting worse. That's when I hit my limit: frak this. The DPQ was at zero.
Potential Yet Uncertain
Can I categorically state that Goose is incapable of completing the task? Not necessarily. I lost patience after six exhausting hours.
Yet my frustration feels warranted, given that I’ve interacted with various coding AI systems that operate far more smoothly.
As an independent developer, I can rarely justify a $100 or $200 monthly subscription for my projects, except as an investment in learning and keeping my skills sharp. But my time is precious. I work seven days a week, so spending hours fighting with a free AI isn't actually saving me money. By that measure, Claude Code or ChatGPT Codex is the better value, even when my projects don't pay for themselves directly.
I suspect in time, Goose, Ollama, and Qwen3-coder will improve. You may even find success with them in your own projects.
Related: “I lost two hours of my day” – uncovering frustrations with AI.
But right now, Goose doesn't measure up to Claude Code. On my simple test, it failed five times before succeeding; on this larger project, I don't know whether it would ever have gotten there.
In fact, after the test, Claude Code audited Goose's work as planned. According to Claude, Goose "mangled the struct body so severely that SwiftUI expressions ended up at the top level outside the struct, before reverting those changes." In other words, Goose broke the code and then undid its own modifications.
Goose further claimed, “The git diff confirms: Added iPad detection logic, Implemented NavigationSplitView for iPad layout, Preserved the iPhone layout, Maintained all existing functionalities, Properly excluded NFC features from the iPad interface.” Yet Claude Code countered, “The irony: none of this exists. The only alteration Goose made was to temporarily disrupt one file.”
Conclusion
In summary, I believe the Goose/Ollama/Qwen3-coder combo is not yet ready for mainstream use. You may be able to make it work with considerable tinkering, but you would need to oversee the results closely and conduct thorough testing.
If you’re interested in experimenting and have small-scale projects, feel free to try Goose. However, if time management is crucial and you’re aiming for production-grade code, I’d recommend sticking with either Codex or Claude Code.
Related: For local vibe coding, explore an AI stack that substitutes Claude Code and Codex—and it’s free.
Personally, I don’t have the luxury of time to waste.
What are your thoughts? Have you experimented with local or open-source coding AIs such as Goose, Ollama, or Qwen, or do you prefer paid tools like Claude Code or Codex? How much effort are you willing to commit to save on subscription fees? Do you think local models are approaching viability for larger projects, or remain best reserved for small experiments? Lastly, how do you assess whether an AI coding tool is beneficial or contributing to complications? Share your thoughts in the comments.
*No Top Gun references were included in this article, which required significant willpower on the part of the author.
For ongoing project updates, follow me on social media. Subscribe to my weekly newsletter, and find me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.