My AI coding workflow
"Vibe coding" is not "it"
I've been AI coding since sometime in early 2022, starting with Copilot.
Since then, AI models have evolved and improved dramatically, and my workflow has gone through many, many iterations.
A lot of these lessons were learned through battle scars (like the time I took down my production site because of "vibe coding" [cringe]).
I also maintain that the only way to truly get good with these tools is to use them. And not just use them in an off-hand manner, but use them extensively, like your entire future depended on it, because, well, it does.

Listen to what Mark Cuban recently said.
If I was 16, 18, 20, 21 starting today, I would spend every waking minute learning about AI. Even if I'm sleeping, I'm listening to podcasts talking about AI, trying everything, because my whole Instagram and TikTok feeds are now AI hacks. It's one thing to have a business and figure out how you integrate AI; that's one thing. But if you're just getting started, as you learn these things, it's so much easier to start. Now, you have a mentor, whether you use Perplexity, Anthropic's Claude, ChatGPT, Gemini, it doesn't matter. You have experts, and you might say, "Well, those hallucinate." Let me just tell you, your mentors and experts don't always get it right either. So you've got these experts who can answer questions that help you understand what you don't know. You've got these tools that'll just spider everything, that'll do your emails, that'll do sales calls for you. Are you kidding me?
So without further ado, here's my current workflow.
The workflow actually starts with an idea. An idea for a feature or an improvement starts brewing in my head, and I'll use something like ChatGPT Deep Research or Perplexity to think through how it might work.
For example, I am currently implementing Deep Research in my app AskLibrary. It started with me looking at how agentic AI flows work, best practices, and how deep research flows are generally implemented. What kind of research you do varies with the specific task; all I'm trying to do at this stage is let the idea brew in my head: expand on it, consider different implementation strategies, think about it from a product / UX perspective, etc.
The lifecycle for this stage might be hours, or it might be weeks, depending on when I finally get around to actually implementing the idea in code.
But ok, let's say we're now at the code stage.
I will create a new branch, and start by talking to a reasoning model (o1-pro or o3-mini-high are pretty good here), just totally talking through the implementation. I'll use Superwhisper to speak my idea out loud, as that's a much higher-bandwidth way to deliver that information; if you type, you will get lazy, because my input can get very, very large.
My goal at this stage is to create a document outlining the idea and a high-level implementation plan. We want to clarify things and really think them through before writing a single line of code. This is not necessary for small improvements and fixes, but as the scope of the task grows, this becomes more and more critical.
we are in the middle of implementing a deep research flow, and I want to walk through it with you and create a document about it.
I want you to ask me questions first about what this feature is intended to do and how it's supposed to work.
So I used Roo Code, chose their Architect mode, and sent it this question.
I'll show more of the conversation below.
AI Conversation (long)
Me:
we are in the middle of implementing a deep research flow, and I want to walk through it with you and create a document about it.
I want you to ask me questions first about what this feature is intended to do and how it's supposed to work.
AI (Architect mode, o3-mini-high):
Could you clarify the intended purpose and workflow of the deep research flow feature? For example, what is its primary objective, what key stages or steps should it include, how should users interact with it (e.g., interactive guidance versus fully automated processing), and what kind of final output or documentation is expected from the process?
Me:
So, deep research is supposed to be a much more in-depth and advanced version of our normal question answering flow, which is already pretty advanced. So the normal flow basically works in a way where the user asks the question, we expand their query, we generate additional synthetic questions, then we do a search based on that, we find results, we identify any areas, any gaps we need to fill from the fetched results, then we fetch some more results for filling in those gaps, we summarize those results, these are key concepts as we call them. And then we answer the question, deep research will be much more thorough than this. So deep research will first, it will always start by clarifying the intent, the user might give us a small question, but we will first ask the user like a follow-up, series of follow-up questions to clarify what their meaning and intention behind asking the questions is. Then we will do iterative research, so we will maybe first based on the user's query, we will think about what all things we need to research. So like based on the query and the follow-up questions, we might come up with a list of say five things we need to research, then we will research each of them one by one, we start the first one, we do one fetch, then we review the fetched content, we see if we need to fetch anything else for that part of the process. And like up to a max number of times, maybe two or three times, then we move on to the next part, then again we do that process next, next. Then once we have completed that overall plan and we have, so for each part of the query, once we have done the search, we will like have a sort of a scratch pad where we will store any interesting observations or any noteworthy facts, maybe like a draft answer. 
Then after all of that is done, I think we will come to, like we finished each part of the research process and we have these, we have the list of total overall chunks we have fetched and we have these scratch pads or draft answers. Then we will take all of that and we will do like one additional step to see if we need to fill in any other gaps and then we have all of our chunks, all of the drafts. Then we generate the final answer. So this is like much, much more thorough, right? @/libs/aiHandler.ts
AI (Architect mode, o3-mini-high):
For the detailed documentation, what level of detail and structure do you envision? Should it include: 1) A high-level overview of the deep research flow; 2) Detailed step-by-step workflows that outline each research phase (clarification, iterative research, scratch pad assembly, and final answer generation); 3) Interactive flow diagrams (e.g., using Mermaid) illustrating the process; and 4) Specific integration details with existing modules like @/libs/aiHandler.ts? Which of these elements are most important for the document?
Me:
I want a comprehensive doc that includes both a high-level overview and detailed step-by-step workflows.
AI (Architect mode, o3-mini-high):
Should the document also include sections on integration details and interactive visualizations? For instance, would you like to include a dedicated section outlining how the deep research flow integrates with existing modules (e.g., @/libs/aiHandler.ts), and should we use Mermaid diagrams to visually represent the iterative workflow? Additionally, who is the target audience for the document (e.g., developers, technical leads, product managers)?
Me:
Incorporate a brief integration overview without heavy diagrams, targeting technical leads with a focus on workflows.
AI (Architect mode, o3-mini-high):
Below is a detailed plan outlining the document for the deep research flow feature. The plan is split into clear sections and includes a simple Mermaid diagram to illustrate the overall workflow. Please review the plan and let me know if you would like any changes or additions.
1. Introduction and Overview
- Purpose:
- Explain that deep research is a more in-depth evolution of our existing question answering flow.
- Emphasize its goal: to iteratively clarify the user’s intent, conduct focused research across multiple sub-questions, aggregate findings into a scratchpad, and finally synthesize a comprehensive answer.
- Audience:
- The document is targeted at technical leads, with a focus on workflows and integration aspects.
2. High-Level Workflow Overview
- Main Phases:
- Clarification Phase:
- Start by engaging with the user through follow-up questions to clarify the query’s intent.
- Iterative Research Phase:
- From the clarified query, identify multiple research topics.
- For each topic, perform one or more rounds of fetches and reviews to fill in information gaps.
- Scratchpad Assembly:
- Record insights, observations, and draft answers for each research fragment.
- Final Answer Generation:
- Review and refine all collected information to produce a final, comprehensive answer.
- Mermaid Diagram (Simplified):
```mermaid
graph LR
  A[User Query]
  B[Clarification Phase]
  C[Iterative Research Phase]
  D[Scratchpad Assembly]
  E[Final Answer Generation]
  A --> B
  B --> C
  C --> D
  D --> E
```
3. Detailed Step-by-Step Workflows
- Clarification Phase:
- Initiate a conversation to understand the user's true intent.
- Generate follow-up queries determining specific aspects of the question.
- Incorporate initial clarifications into the research plan.
- Iterative Research Phase:
- Identify 4–5 sub-topics based on the clarified intent.
- For each sub-topic:
- Fetch related data and insights.
- Assess if the fetched information sufficiently covers the topic; if not, perform additional fetches (up to 2–3 rounds).
- Document each iteration's outcomes in a temporary "scratchpad."
- Scratchpad Assembly:
- Aggregate all key insights from each sub-topic.
- Create draft elements or bullet points that outline the final answer components.
- Final Answer Generation:
- Synthesize the scratchpad notes into a cohesive, comprehensive answer.
- Ensure the response directly addresses the full scope of the initial query, with clarity on any remaining gaps.
4. Integration Overview
- Integration with Existing Modules:
- Explain briefly how key modules, particularly @/libs/aiHandler.ts, are integrated into this flow.
- Describe the interaction points (e.g., invoking the research expansion, processing follow-up questions, handling iterative fetches).
- Mention any configuration settings or dependency points that technical teams should be aware of.
- Technical Considerations:
- Highlight how the iterative research and final synthesis can be extended or modified.
- Note performance considerations and expected scalability as part of the integration discussion.
5. Conclusion
- Summarize how the deep research flow improves upon the existing process.
- Outline potential next steps for further refinement or testing in a production environment.
This plan provides a comprehensive document structure that includes both an overview and detailed workflows, along with a brief integration section. The focus is on clarity and actionable details for technical leads.
Are you pleased with this plan or would you like to make any changes? Once confirmed, would you like me to write this document to a markdown file?
Me:
Can you also add product perspective for a PM here? Also you missed the part about how we preserve fetched chunks, and how after all initial planned research stages are complete, we look at our scratchpad and/or chunks and do another last fetch to fill in any remaining gaps, then move on to the answer generation.
And think about this, currently, we fetch 12 chunks per query, but the original query has 4 additional queries (broader perspectives) added to it. now each chunk is ~500 tokens. There may be duplicates, but at the absolute max, it is going to be 60 chunks or 30K tokens. we do rerank down to the 24 most relevant, but in this deep research flow, those chunks could add up quickly if they're not managed well, this is an open thread that needs to be considered too.
AI (Architect mode, o3-mini-high):
Below is the updated detailed plan for the deep research flow documentation incorporating the additional points:
1. Introduction and Overview
- Purpose:
- Explain that deep research is a more in-depth evolution of our existing question answering flow.
- Emphasize its goal: to iteratively clarify the user’s intent, conduct focused research across multiple subtopics, preserve and manage fetched chunks, and synthesize a comprehensive answer.
- Audience:
- Targeted primarily at technical leads.
- Includes a Product Perspective section aimed at product managers, highlighting key product considerations and scalability challenges.
2. High-Level Workflow Overview
- Main Phases:
- Clarification Phase:
- Start by engaging the user with follow-up questions to clarify intent.
- Iterative Research Phase:
- Based on the clarified query, identify multiple research subtopics.
- For each subtopic, perform iterative fetches (up to 2–3 rounds) to gather relevant information.
- Preservation & Management of Fetched Chunks:
- For each query, 12 chunks are initially fetched.
- The query is automatically expanded with 4 additional (broader perspective) queries leading to a potential maximum of 60 chunks (~30K tokens) if no duplicates are filtered out.
- Chunks are preserved (and duplicates are handled) in a scratchpad, ensuring all fetched information is stored for later evaluation.
- Final Gap-Filling Phase:
- Once all planned research stages are complete, the accumulated scratchpad and fetched chunks are reviewed.
- A final fetch is executed to fill any remaining gaps before moving on to synthesis.
- Final Answer Generation:
- Re-rank the collection of chunks to select the 24 most relevant.
- Synthesize these into a comprehensive final answer.
- Mermaid Diagram (Simplified):
```mermaid
graph TD
  A[User Query]
  B[Clarification Phase]
  C[Iterative Research Phase]
  D[Preservation of Fetched Chunks]
  E[Final Gap-Filling Fetch]
  F[Final Answer Generation]
  A --> B
  B --> C
  C --> D
  D --> E
  E --> F
```
3. Detailed Step-by-Step Workflows
Clarification Phase:
- Engage with the user through follow-up questions.
- Refine and expand the original query based on user intent.
Iterative Research Phase:
- Identify 4–5 research subtopics (including broader perspectives).
- For each subtopic, fetch up to 12 chunks per query.
- Execute 2–3 rounds of fetches for in-depth data gathering.
Preservation & Management of Fetched Chunks:
- Store all fetched chunks in a scratchpad.
- Implement duplicate detection and management to control token usage.
- Monitor that in the worst-case scenario, up to 60 chunks (≈30K tokens) might be fetched, then re-rank to the 24 most relevant.
Final Gap-Filling Phase:
- Analyze the scratchpad and stored chunks for any missing or underrepresented information.
- Initiate an additional fetch specifically aimed at covering these gaps.
Final Answer Generation:
- Re-rank and select the top 24 most pertinent chunks.
- Synthesize the information into a cohesive final answer.
4. Integration Overview
- Module Integration:
- Outlines how key modules (e.g., @/libs/aiHandler.ts and @/libs/perspectiveBroadener.ts) are utilized within this flow.
- Describes invocation points and interactions between these modules and the deep research process.
- Technical Considerations:
- Describe configuration settings and dependencies.
- Highlight areas for potential scalability enhancements and performance optimizations.
5. Product Perspective (for PMs)
- User Value Proposition:
- Delivers a more comprehensive and accurate answer by thoroughly exploring multiple facets of a query.
- Enhances user confidence by clearly addressing any ambiguities through iterative questioning.
- Scalability and Performance Considerations:
- Addresses potential overhead where 60 chunks (up to 30K tokens) might initially be fetched.
- Explains the re-ranking mechanism that narrows down the information to the 24 most relevant, mitigating potential performance issues.
- Flags open threads regarding token management and scaling that need to be monitored and optimized in future iterations.
- Market and Product Fit:
- Highlights how deep research differentiates our product in terms of sophistication and reliability.
- Provides insights for product managers on balancing feature robustness with the associated computational cost and complexity.
6. Conclusion
- Summary:
- Recaps the key stages: clarification, iterative research, chunk preservation & gap-filling, and final answer synthesis.
- Emphasizes the enhanced research depth and improved answer quality.
- Next Steps:
- Outlines potential areas for testing, further optimization in token/chunk management, and scalable integration enhancements.
- Suggests a review cycle to refine the process based on real-world usage feedback.
This revised plan now integrates product-focused insights for PMs and addresses how fetched chunks are preserved, how a final gap-filling fetch is performed, and considerations for handling large volumes of data in deep research flows.
Me:
create a document in a new top level DOCS folder for this please
AI (Architect mode, o3-mini-high):
The deep research flow documentation has been successfully created in the DOCS folder as "deep_research_flow_documentation.md".
Me:
few clarifications. The broader perspectives are a standard part of our embedding querying workflow and happens for essentially each query we run. during the research phase, for each research subtopic, we want to maintain our "notes" based on the fetched chunks that answer that part of our overall research plan or add useful ideas and context. These notes are generated after each stage / round of subtopic research, based on the reranked fetched chunks. The reranked chunks are maintained in memory for further later processing.
After each round, we reflect on the fetched chunks, make notes, and identify follow up queries to run (which also will separately get broader perspectives generated). The follow up queries might seek out alternative perspectives, additional depth, additional breadth, with the goal of being thorough, in-depth and as true as possible.
Once we are done with one subtopic, we look at all the fetched chunks in memory, our notes, and the subtopic and the theme of each iteration of the subtopic, and we generate a comprehensive set of notes, very very detailed.
Then we move on to the next research phase and repeat the process.
Once this is done, we will then have all of our fetched chunks from each iteration in memory (note, we only store chunks and use them post reranking). We will look at the user's original query and their answers to the follow-up questions, and from all the chunks we have in memory, which might be hundreds, we will shortlist maybe 36 (post deduplication) via reranking.
Based on these shortlisted chunks, our draft answers from each subtopic research stage, and the user's original question and follow-up answers, we will do a final gap-filling phase where we will do one last set of queries, fetching 12 chunks per generated query and reranking down to 3 per query.
At this stage, we have the user's original question, follow-up questions and answers, the draft answers from each research phase, shortlisted chunks from each research phase, and the final gap-filling shortlisted chunks. We will take all of this information and produce one extremely detailed, super high-quality answer.
This is the flow. You may update the document with these additional details please.
What do you think about this flow?
AI (Architect mode, o3-mini-high):
The deep research flow documentation has been updated with the additional details regarding the standard embedding querying workflow, preservation of fetched chunks, detailed note-taking, iterative subtopic research, and final gap-filling phase. The document now covers every step, from the initial clarification to final answer generation, while also providing a product perspective for PMs.
[model switch to Sonnet 3.7 with Extended Thinking, because o3-mini-high was not being thorough enough with the plan. I've noticed with OpenAI models, they forget previous context and clarifications as you add newer details sooner than Claude]
Me:
there are tons of details I have mentioned that are not covered in the doc yet, leading to missing context or implementation details
AI (Architect mode, Sonnet 3.7 with Extended Thinking):
I've thoroughly updated the deep research flow documentation to incorporate all the detailed specifications that were missing from the previous version. The updated document now includes:
- Precise specifications on chunk counts (exactly 12 per query, exactly 36 shortlisted, exactly 3 per gap query)
- A new "Research Planning" phase that sets up the structure for the subtopic investigations
- Clear explanation that broader perspectives are a standard, automatic part of our embedding workflow
- Detailed descriptions of how follow-up queries specifically seek alternative perspectives, additional depth, and breadth
- Expanded information on the in-memory storage of chunks throughout the entire process
- More comprehensive details on the subtopic finalization process
- Enhanced integration section describing how existing modules interact with the flow
- A more detailed diagram showing the iterative nature of the per-subtopic process
- Technical considerations regarding memory management, deduplication logic, and performance optimization
This revised documentation now provides a more complete picture of the deep research flow, capturing both its iterative nature and the precise mechanisms for managing chunks throughout the process.
Me:
ok, we should move on to the implementation. One last thing, create a todo list for yourself within this document. We will need to make backend changes, and frontend changes. We will start by backend and only once that is fully implemented, will we do frontend changes. In the meantime, any frontend changes are kept to the bare minimum, our goal is to develop the functional flow first, then make it pretty later. During backend development, we will implement thorough logging to help us debug the flow. We will think about ensuring our architecture is easy to maintain and understand, and is robust. Understood?
[the chat goes on, we move to the implementation, I ask the AI to go step by step, etc.]
As you will notice:
- I ask the AI to ask me questions about the feature, why we are doing it, how it should work.
- I include relevant files for context (like the @/libs/aiHandler.ts mention).
- I actually read through everything the AI has responded with or written, very much the opposite of "vibe coding". Yes, it's a loathsome task to read this level of verbose documentation, but that is how you spot a mistake before it brings your production system down or leaks your secret keys to end users. (All things that have happened to "vibe coders", and even to Lovable itself recently; it was a whole thing: they "vibe-took-down" their own site, and even did that incorrectly. Sidenote: when Andrej Karpathy floated the vibe coding idea, it was cool, but how people ran with it is a cringe disaster.)
- I proactively correct any misconceptions, and push back on things until I am satisfied with the plan and satisfied with the AI's understanding of the idea. You will notice I push back several times.
- I get a document created in a folder in my repo to help maintain context. I'm still figuring out how I'll update this document over time, as I'm still learning this flow.
- I ask the model to create a todo list to make sure we don't miss anything, and to keep track of our progress.
- I switch models when I notice we might be going beyond the area of expertise of a specific model
So we create a plan, which we iterate on: switching models where appropriate, pushing back, clarifying things, creating a document detailing both the technical and product perspectives, and creating a todo list, all before writing a single line of code.
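To make one piece of the planned flow concrete, here is a rough TypeScript sketch of the chunk-budget math and the rerank/dedup step discussed in the conversation. All names and types here (Chunk, maxChunksPerRound, rerank) are my own illustrative assumptions, not the actual AskLibrary code; only the constants (12 chunks per query, 4 broader-perspective queries, rerank to 24) come from the conversation.

```typescript
// Illustrative sketch only; not real AskLibrary code.
type Chunk = { id: string; text: string; score: number };

const CHUNKS_PER_QUERY = 12;    // chunks fetched per query
const BROADER_PERSPECTIVES = 4; // extra queries added to each original query
const RERANK_KEEP = 24;         // chunks kept after reranking in the normal flow

// Worst case for one query plus its broader perspectives:
// 12 * (1 + 4) = 60 chunks, or ~30K tokens at ~500 tokens per chunk.
function maxChunksPerRound(): number {
  return CHUNKS_PER_QUERY * (1 + BROADER_PERSPECTIVES);
}

// Dedupe by id (keeping the higher-scored duplicate), then keep the
// top-k by score — the "rerank down to 24" step, heavily simplified.
function rerank(chunks: Chunk[], keep: number = RERANK_KEEP): Chunk[] {
  const seen = new Map<string, Chunk>();
  for (const c of chunks) {
    const prev = seen.get(c.id);
    if (!prev || c.score > prev.score) seen.set(c.id, c);
  }
  return [...seen.values()].sort((a, b) => b.score - a.score).slice(0, keep);
}
```

This back-of-envelope budget (60 chunks, ~30K tokens in the worst case, per round) is exactly why the open thread about managing chunks in the deep research flow matters: several research rounds without dedup and reranking would blow up the context very quickly.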
Once this is done, I will switch to coding mode. My model of choice here is Claude 3.7 Sonnet, though sometimes I prefer 3.5, as it's less trigger-happy about making unnecessary sweeping changes.
Once in the coding mode, I keep a close eye on things, and if I notice anything obviously wrong, I hit cancel and tell the AI to correct it or give it additional instructions.
Sometimes, things can get so messed up, that the only option left is a hard reset. Commit often.
So once in coding mode, I will keep coding in the chat with all this context if possible. As the context window fills up, you will notice the AI starts to make more stupid mistakes. At that point, starting a new chat is a good idea, and you can give it the document you created as context, with any completed todo list items already ticked off. Ask the AI to ask you questions in the new chat to understand the current state before you give it free rein to fuck up your code. Better safe than time and money wasted, right?
I will keep committing often, testing functional flows in the app, and making sure there are no obvious regressions introduced. AI can make subtle, very hard-to-notice mistakes that can break things. I also ask the AI to add inline comments about any bugs we fix or any tricky things we implement, so that both the AI and I can remember why something is the way it is.
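As an example of the kind of "why" comment I mean (the helper and the bug here are hypothetical, invented for illustration; the shape of the comment is the point):

```typescript
// Hypothetical helper showing the "why" comment left behind after a bug fix.
function dedupeChunks(chunks: { id: string }[]): { id: string }[] {
  // NOTE(bugfix): we previously deduped by array index, which silently dropped
  // valid chunks once reranking reordered them. Dedupe by stable id instead,
  // keeping the first (highest-ranked) occurrence.
  const seen = new Set<string>();
  return chunks.filter((c) => !seen.has(c.id) && (seen.add(c.id), true));
}
```

A comment like that costs a few seconds to write, and it's what stops a later AI session (or a later you) from "simplifying" the code back into the original bug.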
While coding, emotional appeals at times can guide the model to better outputs as well btw. So can "please" and "thank you". Emotional appeals can just look like "Now you've made me sad because you're not following my instructions properly" or "Now I'm crying inconsolably" (that's a personal favorite lol), these seem to make the model more "alert". These I feel are way more effective than aggression with the models. I've also heard that offering fictional rewards and cash prizes can help, but I haven't really tried those.
For big features, I will then move on to create a pull request, and I will manually review every, single, line of code. I sometimes go on a screen share with ChatGPT or Gemini and do a pair-review, this helps me kinda "think out loud" in a way, clarify my thoughts. I will instruct the model to note anything that looks wrong, and I will speak out loud, that ok, this section seems to do this, this section does that, etc. It's rubber duck reviewing but the duck talks back lol.
During all this, if I face any issues, either with a bug or because I'm fuzzy on implementation details, Perplexity or ChatGPT (o1-pro or Deep Research) are generally pretty good. Claude isn't bad either. Sometimes discussing things out of band from the main coding flow can be beneficial and mind-clearing. If I'm implementing something that already exists in open source code, I might download that code, use Repo Prompt to dump it into Claude or ChatGPT, and ask questions about how it works, because it's not enough for me that the AI understands how the code works; I want to understand how things are working as well, because it's important to me that I could implement it without AI too. AI is saving me dramatic amounts of time, but if I had to, I could code it myself.
For big enough features, this is not enough. I will take the code from my master branch and the code from my new branch, upload it to ChatGPT AND Claude using the awesome Repo Prompt tool, and ask them to review the code and flag any regressions. Yes, I will review it with both. I will use Claude 3.7 Sonnet with extended thinking and o1-pro, and sometimes even o3-mini-high, at the same time.
Well, once I'm satisfied with the implementation, I will merge the pull request.
And that's my current AI coding flow. It's possible I might have missed a few steps, and I would love to hear what your AI coding workflow looks like (if you are a coder using AI).
Why this is important
If the app you're building is going to be developed over a long period and gets popular, you will be in for a world of pain if you aren't thorough from the start. Professional developers know that maintaining code is much more challenging than writing it. And you can really code yourself into a box if you carelessly "vibe code". Over time, your edits start taking longer, the models make more mistakes and cause regressions, and you need multiple iterations for simple changes; these are all things I've experienced first-hand. It can become a nightmare, and any time you saved by being careless in the beginning will be dwarfed by the productivity-halting spaghetti mess you'll have to deal with later. It doesn't have to be that way, though, if you're careful from day one.
Tools I use for AI coding:

(I use both Roo Code and Cline. Roo is more featureful, but it's more expensive, as it seems to prefer making smaller changes, meaning more rounds; it can get infuriating at times.)

