AI engineering has quickly become one of the most valuable skill sets in tech
The problem is that most beginners have no clear idea what they should actually study
Some start with machine learning theory
Some get stuck endlessly watching tutorials
Others jump straight into prompts and agents without understanding APIs, backend basics, or how real products are actually built
The result is usually the same: a lot of confusion and very little practical skill
If your goal is to become an AI engineer, you don’t need to master every field of artificial intelligence
You need to learn how to build useful AI systems in the real world
That means learning how to:
- build end-to-end applications with LLMs
- work with model APIs such as OpenAI and Anthropic
- properly design prompts and context
- use structured outputs and tool calling
- add retrieval when needed
- deploy projects so people can actually use them
This guide was created to give you a practical 6-month roadmap
The article is 10,000+ WORDS, so reading it may take a few hours or even longer
But its real value is that for every skill you need to learn, there are resources and clear explanations of what to do
That way, within six months you can reach a working level of AI engineering – and you'll be able to start applying it for yourself within the first 1-2 months
Writing this article took more than 40 HOURS, and I worked on it together with my friend @andy_ai0
He just started building his personal brand on X, but he understands AI very well and helped a lot with this article
I definitely think he deserves your follow and support as he grows
Now let's start reading the article ⬇️
What an AI Engineer actually does
A lot of people hear the phrase "AI engineer" and imagine someone training giant models from scratch
In reality, most modern AI engineers do something much more practical
They build products and systems on top of existing models
That usually includes:
- connecting to LLM APIs
- designing prompts and context flows
- building chat, search, or automation systems
- integrating tools, databases, and external APIs
- handling structured outputs
- improving reliability, cost, and latency
- deploying AI features into real applications
So in practice, an AI engineer often sits somewhere between:
- software engineering
- product engineering
- automation
- applied AI
This is why the role is growing so fast
Companies do not only need researchers. They need people who can take models and turn them into useful products
That is also why this roadmap focuses less on heavy theory and more on practical execution
If you can build real LLM apps, retrieval systems, automations, and production-ready workflows, you are already much closer to being employable than most beginners
⏩------------------------------------------------------------------------⏪
Month 1: Get solid enough in coding and the fundamentals
Your goal this month: Become a functional Python developer
You don't need to be an expert; you just need to stop Googling basic syntax and be able to build simple programs confidently
AI engineering is first and foremost software engineering
Everything in the later months assumes you can write clean Python, use the terminal, call APIs, and manage a codebase. This month is your foundation
What to learn
1. Python
Python is the language of AI engineering. Full stop. Almost every library, API, and tutorial you'll encounter over the next six months is in Python
How to learn it:
Start with a structured course that forces you to write code, not just watch videos
The most common mistake beginners make is consuming content passively: reading along, nodding, and never opening a code editor
Fight this by coding every single example as you go
Resources:
1. Python for Everybody (Coursera, free to audit)
Link: https://www.coursera.org/specializations/python
The best starting point for absolute beginners. Dr. Chuck is one of the most beginner-friendly Python teachers on the internet
2. freeCodeCamp Python Course (YouTube, free)
Link: https://www.youtube.com/watch?v=rfscVS0vtbw
A comprehensive 4-hour video covering all the fundamentals
3. CS50P: Introduction to Programming with Python (Harvard, free)
Link: https://cs50.harvard.edu/python/
More rigorous. Includes problem sets and a final project. Great if you want structure
4. Official Python docs (the tutorial)
Link: https://docs.python.org/3/tutorial/
Dry but authoritative, use as a reference
What to focus on:
- Variables, data types, loops, conditionals, functions
- Lists, dictionaries, sets, tuples
- File I/O and working with JSON
- Classes and basic OOP (just enough to understand what you're reading)
- Error handling with try/except
- Virtual environments (venv) and pip
- Package management – understanding requirements.txt
Practice project: Build a simple CLI tool in Python. Something like a personal expense tracker that reads/writes to a JSON file, or a script that calls a public API (like a weather API) and prints formatted results
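A minimal sketch of what the expense-tracker version of this project could look like. The file name, field names, and command layout are my own choices, not prescribed anywhere:

```python
# Tiny expense-tracker CLI: persists expenses to a JSON file.
import json
import sys
from pathlib import Path

DB = Path("expenses.json")

def load() -> list[dict]:
    # Return the stored expenses, or an empty list on first run
    return json.loads(DB.read_text()) if DB.exists() else []

def save(expenses: list[dict]) -> None:
    DB.write_text(json.dumps(expenses, indent=2))

def add(description: str, amount: float) -> list[dict]:
    expenses = load()
    expenses.append({"description": description, "amount": amount})
    save(expenses)
    return expenses

def total(expenses: list[dict]) -> float:
    return sum(e["amount"] for e in expenses)

if __name__ == "__main__":
    # usage: python tracker.py add "coffee" 3.50   |   python tracker.py total
    cmd = sys.argv[1]
    if cmd == "add":
        add(sys.argv[2], float(sys.argv[3]))
    elif cmd == "total":
        print(f"{total(load()):.2f}")
```

Notice how much of Month 1 this one script touches: functions, dictionaries, file I/O, JSON, and running scripts from the terminal.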
2. Git and GitHub
Git is how professional developers save and share code. You'll need it constantly, to version your projects, collaborate, and showcase your portfolio work on GitHub
How to learn it:
Git is confusing at first because the mental model is non-obvious
Don't try to memorize commands. Instead, understand what problem Git is solving
(tracking changes, enabling collaboration, letting you undo mistakes) and the commands will make sense
Resources:
1. GitHub Skills (free, interactive)
Link: https://skills.github.com/
Official interactive courses built inside GitHub itself. Start here
2. Learn Git Branching (free, interactive)
Link: https://learngitbranching.js.org/
Hands-down the best visual tool for understanding branches and merges
3. Pro Git Book (free online book)
Link: https://git-scm.com/book/en/v2
The comprehensive reference. Skip to chapters you need
What to focus on:
- git init, add, commit, push, pull
- Branching and merging
- Understanding .gitignore
- Creating repos on GitHub and pushing local projects
- Reading and writing basic README files
Practice: From now on, every single project you build, even small scripts, should live in a GitHub repo. This builds the habit and gives you a portfolio
3. CLI / Terminal Basics
As an AI engineer you'll be running scripts, installing packages, managing servers, and navigating files entirely from the command line
Being slow or scared in the terminal is a real bottleneck
Resources:
1. The 50 most popular Linux & Terminal commands (full course for beginners)
Link: https://www.youtube.com/watch?v=ZtqBQ68cfJc
Good for absolute beginners on Linux/Mac
2. The Missing Semester of Your CS Education (MIT, free)
Link: https://missing.csail.mit.edu/
Covers shell scripting, terminal tools, and the command line fluency that most CS courses skip
What to focus on:
- Navigation: cd, ls, pwd, mkdir, rm
- Reading files: cat, less, grep
- Running Python scripts from the terminal
- Environment variables
- Basic understanding of PATH
4. JSON, APIs, HTTP, and Async Basics
You'll be calling LLM APIs from day one of Month 2
That means you need to understand how web APIs work before you ever touch OpenAI or Anthropic's SDKs
Resources:
1. HTTP basics – MDN Web Docs (free)
Link: https://developer.mozilla.org/en-US/docs/Web/HTTP/Overview
The clearest explanation of how HTTP requests and responses work
2. REST API Tutorial
Link: https://restfulapi.net/
Short and practical
3. Python requests library docs
Link: https://requests.readthedocs.io/en/latest/
Learn how to call any web API in Python
4. Python async/await (free)
Link: https://realpython.com/async-io-python/
Understanding async is essential for working with streaming LLM responses later
What to focus on:
- GET, POST requests – what they are and how to make them in Python
- Reading and writing JSON
- HTTP status codes (200, 400, 401, 404, 500 – what each means)
- What an API key is and basic auth patterns
- What async def and await do and why they exist
Practice project: Write a Python script that calls a free public API (try Open-Meteo for weather data – no API key needed) and formats the result as a clean JSON output
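Here's one way that script could be structured. The Open-Meteo endpoint and parameters below match its public docs at the time of writing, but verify them yourself; the helper names are my own:

```python
# Fetch current weather from Open-Meteo (no API key) and print clean JSON.
import json
from urllib.parse import urlencode

BASE = "https://api.open-meteo.com/v1/forecast"

def build_url(lat: float, lon: float) -> str:
    # Encode query parameters safely instead of string-concatenating them
    params = {"latitude": lat, "longitude": lon, "current_weather": "true"}
    return f"{BASE}?{urlencode(params)}"

def format_result(payload: dict) -> str:
    # Pull out just the fields we care about and re-emit clean JSON
    current = payload["current_weather"]
    return json.dumps(
        {"temperature_c": current["temperature"], "windspeed": current["windspeed"]},
        indent=2,
    )

if __name__ == "__main__":
    import requests  # pip install requests

    resp = requests.get(build_url(52.52, 13.41), timeout=10)
    resp.raise_for_status()  # turn 4xx/5xx status codes into exceptions
    print(format_result(resp.json()))
```

Separating URL building and formatting from the network call makes the logic easy to test without hitting the API – a habit that pays off constantly later.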
5. Basic SQL and Pandas
You won't need to be a data scientist, but you will regularly need to inspect, query, and manipulate data
SQL basics and pandas fluency will save you constantly
Resources:
1. SQLBolt (free, interactive)
Link: https://sqlbolt.com/
The fastest way to learn SQL from scratch. 20 short lessons with in-browser exercises
2. Pandas official getting started guide
Link: https://pandas.pydata.org/docs/getting_started/index.html
Work through the 10 Minutes to Pandas tutorial
3. Kaggle Pandas course (free)
Link: https://www.kaggle.com/learn/pandas
Hands-on, practical, short
What to focus on:
- SQL: SELECT, WHERE, GROUP BY, JOIN, ORDER BY
- Pandas: loading CSVs, filtering rows, selecting columns, basic aggregations
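The SQL keywords above fit into one runnable example using Python's built-in sqlite3 module (the tables and data here are invented for illustration):

```python
# SELECT, JOIN, WHERE, GROUP BY, ORDER BY in a single in-memory query.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    INSERT INTO users  VALUES (1, 'Ada'), (2, 'Linus');
    INSERT INTO orders VALUES (1, 1, 30.0), (2, 1, 20.0), (3, 2, 5.0);
""")

# Total spend per user, highest first
rows = conn.execute("""
    SELECT u.name, SUM(o.amount) AS total
    FROM users u
    JOIN orders o ON o.user_id = u.id
    WHERE o.amount > 1
    GROUP BY u.name
    ORDER BY total DESC
""").fetchall()

print(rows)  # [('Ada', 50.0), ('Linus', 5.0)]
```

sqlite3 ships with Python, so you can practice every SQLBolt lesson locally with zero setup.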
6. FastAPI
Resources:
1. FastAPI Official Tutorial (free)
Link: https://fastapi.tiangolo.com/tutorial/
Genuinely one of the best framework docs ever written
Work through it start to finish. Covers path parameters, request bodies, Pydantic validation, and running a dev server
2. Python API Development (19-Hour Course, freeCodeCamp, YouTube, free)
Link: https://www.youtube.com/watch?v=ZtqBQ68cfJc
Covers API design fundamentals including routes, serialization, schema validation, and SQL database integration. Builds a full social-media-style API from scratch
What to focus on: Creating GET and POST endpoints, path and query parameters, request bodies with Pydantic, running uvicorn, and using FastAPI's built-in /docs interface to test your API without writing a client
Month 1 Milestone
By the end of this month you should be able to:
- Write Python programs that read/write files, call APIs, and handle errors
- Version your code with Git and push projects to GitHub
- Navigate the terminal without hesitation
- Understand what an HTTP request is and make one in Python
- Query a SQLite database with basic SQL
- Build and run a simple FastAPI app locally
⏩------------------------------------------------------------------------⏪
Month 2: Master LLM App Development
Your goal this month: Build real AI-powered applications using the OpenAI and Anthropic APIs
By the end you should be comfortable writing prompts that work reliably, getting structured data out of models, making them call your functions, and handling everything that can go wrong
This is the core of AI engineering. Everything else in the roadmap builds on what you learn here
What to learn
1. Prompting Fundamentals
Prompting isn't just asking questions nicely. It's the craft of writing instructions that produce consistent, reliable outputs from models that are fundamentally probabilistic
As an AI engineer you'll spend a surprising amount of time here
How to learn it:
Start with Anthropic's interactive tutorial because it's the most hands-on
Then read OpenAI's official guide. After that, the Prompt Engineering Guide consolidates everything
Work through all three in order – each one reinforces the others
Resources:
1. Anthropic's Interactive Prompt Engineering Tutorial (free, GitHub)
Link: https://github.com/anthropics/prompt-eng-interactive-tutorial
A step-by-step course broken into 9 chapters with exercises, designed to give you many chances to practice writing and troubleshooting prompts yourself
Run it as Jupyter notebooks with the Claude API
2. Anthropic Prompt Engineering Docs (free)
Link: https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview
The official reference. Covers everything from basic clarity to XML structuring and agentic systems
3. OpenAI Prompt Engineering Guide (free)
Link: https://platform.openai.com/docs/guides/prompt-engineering
The official guide from OpenAI, covering prompt formats that work well with their models and lead to more useful outputs
4. PromptingGuide.ai (free)
Link: https://www.promptingguide.ai/
Covers essential techniques from basic prompting to advanced strategies, plus function calling, tool integration, and agentic systems
What to focus on: The difference between system and user messages, why specificity matters, chain-of-thought prompting (think step by step), using examples in prompts (few-shot), and how small wording changes can dramatically shift output quality
Practice: Take a real task – summarize a document, extract key info from text, classify a piece of feedback – and write 5 different prompts for it. Compare outputs. You'll immediately see how much prompt design affects reliability
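To make the system/user split and few-shot prompting concrete, here is what a messages array for the feedback-classification task might look like (the task, labels, and examples are invented):

```python
# A few-shot classification prompt expressed as a chat messages array.
FEW_SHOT_CLASSIFIER = [
    {"role": "system", "content": (
        "You classify customer feedback as exactly one of: "
        "praise, complaint, feature_request. Reply with the label only."
    )},
    # Few-shot examples: user/assistant pairs demonstrating the desired behavior
    {"role": "user", "content": "The app crashes every time I upload a file."},
    {"role": "assistant", "content": "complaint"},
    {"role": "user", "content": "Love the new dark mode!"},
    {"role": "assistant", "content": "praise"},
    # The actual input always goes last
    {"role": "user", "content": "It would be great if I could export to CSV."},
]

print([m["role"] for m in FEW_SHOT_CLASSIFIER])
```

Passing this list to any chat completions API gives the model both the rules (system) and worked examples (the user/assistant pairs) before it sees the real input.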
2. Structured Outputs / JSON Schemas
In real applications you almost never want raw text from an LLM; you want structured data you can parse, store, and use in your code
Structured outputs solve this by forcing the model to match a schema you define
Resources:
1. OpenAI Structured Outputs Guide (official docs, free)
Link: https://platform.openai.com/docs/guides/structured-outputs
Covers the feature that ensures models always generate responses adhering to your JSON Schema, so you don't need to worry about missing keys or hallucinated values
2. Instructor library (free, open source)
Link: https://python.useinstructor.com/
The cleanest way to get structured outputs from any LLM provider using Pydantic models
Works with OpenAI, Anthropic, Google, and 15+ other providers using the same code interface, with automatic retries when validation fails
This is what most production AI engineers actually use
3. OpenAI Cookbook: Structured Outputs Introduction (free)
Link: https://developers.openai.com/cookbook/examples/structured_outputs_intro/
Practical examples covering chain-of-thought outputs, structured data extraction, and UI generation, good for understanding real-world use cases
What to focus on: Defining Pydantic models for your data, passing schemas to the API, understanding the difference between structured outputs and JSON mode, and handling refusals gracefully
Practice project: Build an invoice or receipt parser. Give it raw text (e.g. "Invoice #123, $45.99 for 3 widgets, due March 30") and have it return a structured Python object with fields like invoice_number, amount, items, due_date
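A sketch of that parser using OpenAI's structured outputs directly. The schema is plain JSON Schema; the API call at the bottom follows the shape documented in the structured outputs guide, but check the current docs before relying on it, and note the model name is just an example:

```python
# Invoice parser: force the model's output to match a JSON Schema.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "amount": {"type": "number"},
        "items": {"type": "array", "items": {"type": "string"}},
        "due_date": {"type": "string"},
    },
    "required": ["invoice_number", "amount", "items", "due_date"],
    "additionalProperties": False,
}

# The response_format payload that tells the API to enforce the schema
RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {"name": "invoice", "strict": True, "schema": INVOICE_SCHEMA},
}

if __name__ == "__main__":
    import json
    from openai import OpenAI  # pip install openai; needs OPENAI_API_KEY set

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # example model; pick whatever fits your budget
        messages=[
            {"role": "system", "content": "Extract invoice fields from the text."},
            {"role": "user", "content": "Invoice #123, $45.99 for 3 widgets, due March 30"},
        ],
        response_format=RESPONSE_FORMAT,
    )
    print(json.loads(resp.choices[0].message.content))
```

With Instructor you'd define the same shape as a Pydantic model instead of a raw schema dict – same idea, nicer ergonomics.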
3. Function / Tool Calling
Tool calling is what transforms an LLM from a text generator into something that can take actions – search the web, query a database, call your API, run code. It's one of the most important skills in this entire guide
How to understand it: The model doesn't actually execute your functions
It examines the prompt and returns a structured call with the function name and arguments when it decides a tool should be used
Your code then executes the call and sends the result back
Resources:
1. OpenAI Function Calling Guide (official docs, free)
Link: https://platform.openai.com/docs/guides/function-calling
The definitive reference. Covers defining tools, the 5-step calling flow, parallel calls, and best practices
2. Anthropic Tool Use Docs (free)
Link: https://docs.anthropic.com/en/docs/build-with-claude/tool-use
Anthropic's equivalent guide for Claude. The concepts are the same, the syntax is slightly different
3. OpenAI Cookbook: How to Call Functions with Chat Models (free, GitHub)
Link: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_call_functions_with_chat_models.ipynb
A complete runnable notebook walking through the full tool-calling loop with real examples
What to focus on: Describing functions clearly in JSON Schema, parsing tool call responses, executing the function and feeding results back, handling cases where no tool call is needed, and the concept of tool_choice: "auto"
Practice project: Build a simple assistant that has three tools: get_weather(city), calculate(expression), and search_notes(query) (just search a hardcoded dict). Wire them all up and watch the model decide which one to call based on what you ask it
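The core of that loop – the model names a function, your code looks it up and runs it – can be shown without any API key. The model response below is simulated so the mechanics are visible; real responses from OpenAI or Anthropic carry the same name-plus-JSON-arguments shape in their own wrappers:

```python
# The tool-calling dispatch loop in miniature, with a simulated model response.
import json

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub - a real tool would call a weather API

def calculate(expression: str) -> str:
    # WARNING: eval on untrusted input is unsafe; acceptable for a local demo only
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"get_weather": get_weather, "calculate": calculate}

def execute_tool_call(tool_call: dict) -> str:
    # Look up the named function, decode its JSON arguments, and run it
    fn = TOOLS[tool_call["name"]]
    args = json.loads(tool_call["arguments"])
    return fn(**args)

# Simulated model output for the user question "What's 17 * 3?"
simulated = {"name": "calculate", "arguments": json.dumps({"expression": "17 * 3"})}
print(execute_tool_call(simulated))  # 51
```

In a real app you'd send `execute_tool_call`'s return value back to the model as a tool-result message so it can compose the final answer.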
4. Streaming Responses
Streaming means showing the model's output as it's being generated – word by word – rather than waiting for the full response. It makes your apps feel dramatically faster and more alive
Resources:
1. OpenAI Streaming Docs (official, free)
Link: https://platform.openai.com/docs/api-reference/streaming
The reference for adding stream=True to requests and iterating over chunks
2. Anthropic Streaming Docs (official, free)
Link: https://docs.anthropic.com/en/api/messages-streaming
Anthropic's streaming API reference with Python examples
3. How Streaming LLM APIs Work – Simon Willison (free)
Link: https://til.simonwillison.net/llms/streaming-llm-apis
A clear technical breakdown of how Server-Sent Events work under the hood for OpenAI, Anthropic, and Google, useful for understanding what's actually happening at the HTTP level
What to focus on: Setting stream=True, iterating over delta chunks, assembling the full response from parts, and wiring streaming into a FastAPI endpoint using StreamingResponse
Tip: Streaming is almost always the right choice for user-facing apps. Nobody wants to stare at a loading spinner for 10 seconds waiting for a full response to appear at once
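The pattern looks like this. The chunk-accumulation logic is plain Python; the OpenAI call in the main block follows the streaming docs but needs an API key to actually run, and the model name is just an example:

```python
# Streaming pattern: print each delta as it arrives, assemble the full text.
def consume_stream(deltas) -> str:
    parts = []
    for delta in deltas:
        if delta:  # some chunks carry no text (role markers, finish signals)
            print(delta, end="", flush=True)
            parts.append(delta)
    return "".join(parts)

if __name__ == "__main__":
    from openai import OpenAI  # pip install openai; needs OPENAI_API_KEY set

    client = OpenAI()
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Tell me a short joke"}],
        stream=True,  # the key switch: chunks instead of one response
    )
    full = consume_stream(chunk.choices[0].delta.content for chunk in stream)
```

The same `consume_stream` helper later becomes the body of a FastAPI `StreamingResponse` generator.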
5. Conversation State
LLMs are stateless – they have no memory between calls. Conversation history is something you manage by sending the full message list with every request. Understanding this is fundamental
Resources:
1. OpenAI Chat Completions Guide, Managing Conversations (official, free)
Link: https://platform.openai.com/docs/guides/conversation-state
The canonical explanation of how the messages array works and how to manage multi-turn conversations
2. Anthropic Messages API Docs (official, free)
Link: https://docs.anthropic.com/en/api/messages
Anthropic's equivalent. Same concept, worth reading both to see how they differ
What to focus on: The messages array structure, why you append both user and assistant messages, context window limits and what happens when you exceed them, and basic truncation strategies (drop oldest messages, summarize history)
Practice project: Build a simple multi-turn chatbot in the terminal. Each turn appends to the messages list. Add a /reset command to clear history, and print the current token count after each exchange
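The state management itself needs no API at all. A sketch of the append-and-truncate logic (the truncation policy – keep the system prompt, drop the oldest turns – is one of the strategies named above):

```python
# Manual conversation state: append every turn, truncate when history grows.
def truncate(messages: list[dict], max_messages: int) -> list[dict]:
    """Keep the system prompt plus only the most recent turns."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-(max_messages - len(system)):]

messages = [{"role": "system", "content": "You are a helpful assistant."}]

def chat_turn(user_input: str, reply: str) -> None:
    # In a real app `reply` comes from the API; it's passed in here as a stub
    messages.append({"role": "user", "content": user_input})
    messages.append({"role": "assistant", "content": reply})

chat_turn("Hi", "Hello!")
chat_turn("What's RAG?", "Retrieval-augmented generation.")

trimmed = truncate(messages, max_messages=3)
# The system prompt survives; the oldest user/assistant pair is dropped
print([m["role"] for m in trimmed])  # ['system', 'user', 'assistant']
```

Forgetting to append the assistant's own replies is the classic beginner bug here – the model then "forgets" what it just said.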
6. Cost, Latency, and Token Basics
Shipping AI apps without understanding costs and tokens is how you end up with surprise bills and slow apps. This is boring but critical
Resources:
1. OpenAI Pricing Page (official)
Link: https://openai.com/api/pricing
Know what input and output tokens cost per model. Bookmark this and check it whenever you pick a model
2. Anthropic Pricing Page (official)
Link: https://www.anthropic.com/pricing
Same for Claude models
3. OpenAI Tokenizer Tool (free, interactive)
Link: https://platform.openai.com/tokenizer
Paste any text and see exactly how many tokens it is. Use this constantly while you're learning
4. Tiktoken (Python library, free)
Link: https://github.com/openai/tiktoken
OpenAI's tokenizer library for counting tokens in code before sending requests
What to focus on: What a token is (roughly 4 characters, or about three-quarters of a word), how input vs output tokens are priced differently, how context window size affects what you can do, and the latency trade-off between smaller faster models and larger smarter ones
Also: don't use GPT-4/Opus for everything – cheaper models are often good enough for simple tasks
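A back-of-envelope cost estimator using the rough 4-characters-per-token heuristic (use tiktoken when you need exact counts). The prices in the example are placeholders – always read real numbers off the provider's pricing page:

```python
# Rough pre-flight cost estimate for an LLM request.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # heuristic, not exact - use tiktoken for precision

def estimate_cost(input_text: str, expected_output_tokens: int,
                  price_in_per_m: float, price_out_per_m: float) -> float:
    """Prices are USD per 1M tokens, the way providers typically quote them."""
    tokens_in = estimate_tokens(input_text)
    return (tokens_in * price_in_per_m
            + expected_output_tokens * price_out_per_m) / 1_000_000

# Placeholder prices: $0.15/M input, $0.60/M output
cost = estimate_cost("word " * 800, 500, price_in_per_m=0.15, price_out_per_m=0.60)
print(f"${cost:.6f}")
```

Running a calculation like this before wiring up a loop that calls the API thousands of times is how you avoid surprise bills.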
7. Failure Handling
LLM APIs fail. Rate limits get hit, responses time out, the model returns malformed JSON. Handling failures gracefully is what separates a demo from a production app
Resources:
1. OpenAI Error Codes Reference (official, free)
Link: https://platform.openai.com/docs/guides/error-codes
Every error type you'll encounter and what to do about it
2. Anthropic Error Handling Docs (official, free)
Link: https://docs.anthropic.com/en/api/errors
Same for Claude
3. Tenacity (Python library, free)
Link: https://tenacity.readthedocs.io/
A clean library for adding retry logic with exponential backoff to any Python function. One decorator and your retries are handled
What to focus on: Rate limit errors (429) and exponential backoff, timeout handling with httpx/requests, validating model output before using it, fallback strategies (retry with a different model, return a cached response), and never crashing your app because the LLM returned unexpected output
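Here is the retry-with-exponential-backoff pattern that Tenacity packages up, hand-rolled so the mechanics are visible (with Tenacity this collapses to a single `@retry` decorator – see its docs for the exact usage):

```python
# Exponential backoff retry decorator, plus a simulated flaky API call.
import functools
import time

def retry_with_backoff(max_attempts: int = 4, base_delay: float = 1.0):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: surface the error
                    time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
        return wrapper
    return decorator

calls = {"n": 0}

@retry_with_backoff(max_attempts=4, base_delay=0.01)
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated 429 / timeout")
    return "ok"

print(flaky())  # succeeds on the third attempt
```

In production you'd catch only the retryable exceptions (rate limits, timeouts) rather than bare `Exception` – retrying an invalid-API-key error just wastes time.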
8. Prompt Injection Awareness
Prompt injection is the #1 security risk in LLM applications
It happens when untrusted user input is combined with system instructions, allowing a user to alter, override, or inject new behavior into the prompt – causing the system to perform unintended actions or generate manipulated outputs
You don't need to be a security expert, but you need to know this exists before you ship anything
Resources:
1. OWASP Top 10 for LLM Apps – LLM01: Prompt Injection (free)
Link: https://genai.owasp.org/llmrisk/llm01-prompt-injection/
The authoritative classification covering direct injections (jailbreaking), indirect injections via external content like documents or websites, and real-world attack scenarios
2. OWASP Prompt Injection Prevention Cheat Sheet (free)
Link: https://cheatsheetseries.owasp.org/cheatsheets/LLM_Prompt_Injection_Prevention_Cheat_Sheet.html
Practical defensive patterns: input validation, privilege control, and output validation
3. Evidently AI: What is Prompt Injection (free)
Link: https://www.evidentlyai.com/llm-guide/prompt-injection-llm
A clear developer-focused explainer on attack types, risks, and design patterns to mitigate them
What to focus on: The difference between direct and indirect injection, why system prompts aren't truly "secure", the principle of least privilege for tool access, and never trusting unvalidated LLM output to make consequential decisions automatically
Month 2 Milestone
By the end of this month you should be able to:
- Write prompts that produce consistent, reliable outputs for a given task
- Get structured JSON data out of any model using Pydantic + Instructor
- Wire up tool calling so a model can call your Python functions
- Stream responses in real time through a FastAPI endpoint
- Manage multi-turn conversation history properly
- Estimate the token cost of a request before sending it
- Handle API errors, timeouts, and bad outputs without crashing
- Explain what prompt injection is and apply basic defenses
⏩------------------------------------------------------------------------⏪
Month 3: Learn RAG Properly
Your goal this month: Build systems that let LLMs answer questions from your documents, not just from their training data
By the end you should be able to ingest documents, embed and store them, retrieve the right chunks at query time, and produce answers that are grounded, accurate, and citable
RAG is the most in-demand practical skill in AI engineering right now. Almost every real enterprise AI use case – customer support bots, internal knowledge bases, document Q&A – is built on it
Understanding it deeply, not just copying a tutorial, is what separates good engineers from great ones
1. Embeddings
Before you can build a RAG system, you need to understand what embeddings actually are – because they're the foundation everything else is built on
A text embedding is a piece of text projected into a high-dimensional vector space
The position of that text in this space is represented as a long sequence of numbers
Critically, text that is semantically similar ends up close together in that space – which is what makes similarity search possible
Resources:
1 . Stack Overflow Blog: An Intuitive Introduction to Text Embeddings (free)
Link: https://stackoverflow.blog/2023/11/09/an-intuitive-introduction-to-text-embeddings/
The best beginner explanation. Written by a developer who has spent years building NLP products, with a focus on building the right intuition rather than the math
2. Google ML Crash Course: Embeddings (free)
Link: https://developers.google.com/machine-learning/crash-course/embeddings
Covers why dense vector representations solve problems that one-hot encoding can't – specifically, capturing semantic relationships between items
3. HuggingFace: Getting Started With Embeddings (free)
Link: https://huggingface.co/blog/getting-started-with-embeddings
Hands-on guide. Shows how to generate embeddings using the sentence-transformers library, host them, and use them for semantic search over a real FAQ dataset
4. OpenAI Embeddings Guide (official docs, free)
Link: https://platform.openai.com/docs/guides/embeddings
The reference for using OpenAI's text-embedding-3-small and text-embedding-3-large models in code
What to focus on: What a vector is conceptually, why similar text produces similar vectors, how cosine similarity works, the difference between embedding models (OpenAI, HuggingFace sentence-transformers), and what embedding dimension means in practice
Practice: Take 20 sentences on related topics, embed them using OpenAI or sentence-transformers, and write a simple nearest-neighbor search that returns the 3 most similar to a query. This is literally the heart of RAG in miniature
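The "heart of RAG in miniature" looks like this. The vectors here are tiny hand-made stand-ins; in practice they come from an embedding model, but the cosine-similarity and top-k logic is identical:

```python
# Cosine similarity + top-k nearest-neighbor search over a toy "corpus".
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], corpus: dict[str, list[float]], k: int = 3) -> list[str]:
    # Rank every stored text by similarity to the query vector
    ranked = sorted(corpus, key=lambda t: cosine_similarity(query, corpus[t]), reverse=True)
    return ranked[:k]

corpus = {
    "cats are pets":  [0.9, 0.1, 0.0],
    "dogs are pets":  [0.8, 0.2, 0.0],
    "stocks went up": [0.0, 0.1, 0.9],
}
print(top_k([0.85, 0.15, 0.0], corpus, k=2))  # ['cats are pets', 'dogs are pets']
```

Vector databases do exactly this, just with approximate indexes so it stays fast at millions of vectors.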
2. Chunking
Your documents are too large to embed as a whole. Chunking is the process of breaking them into smaller pieces before embedding
How you chunk your documents directly affects your system's ability to find relevant information and give accurate answers; even a perfect retrieval system fails if it searches over poorly prepared data
Resources:
1. Weaviate: Chunking Strategies for RAG (free)
Link: https://weaviate.io/blog/chunking-strategies-for-rag
The most practical guide. Covers fixed-size, recursive, and semantic chunking, with clear guidance on when to use each
2. Unstructured: Chunking for RAG Best Practices (free)
Link: https://unstructured.io/blog/chunking-for-rag-best-practices
A technical deep-dive on chunk sizes, overlap, and how the embedding model's context window imposes hard limits
A good starting point for experimentation is a chunk size of around 250 tokens (approximately 1,000 characters), combined with a 10-20% overlap between consecutive chunks to avoid losing context at boundaries
3. LangChain Text Splitters Docs (official, free)
Link: https://python.langchain.com/docs/concepts/text_splitters/
The practical reference for using RecursiveCharacterTextSplitter, MarkdownTextSplitter, and semantic splitters in code
What to focus on: Fixed-size chunking with overlap as your baseline, recursive chunking for structured documents, semantic chunking for better boundary detection, and the core trade-off: chunks that are too large lose retrieval precision; chunks that are too small lose context
Beginner tip: Start with RecursiveCharacterTextSplitter from LangChain with chunk_size=500 and chunk_overlap=50. This is the most sensible default for most documents and gives you a working baseline to improve from
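To see what fixed-size chunking with overlap actually does, here it is in plain Python. In practice use LangChain's splitter as the tip suggests – this sketch just makes the overlap mechanics concrete:

```python
# Fixed-size chunking with overlap: each chunk repeats the tail of the last.
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    assert 0 <= overlap < chunk_size
    step = chunk_size - overlap  # advance by less than a full chunk
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("abcdefghij" * 20, chunk_size=50, overlap=10)
# Consecutive chunks share their last/first 10 characters at the boundary
print(len(chunks), chunks[0][-10:] == chunks[1][:10])  # 5 True
```

The overlap is what prevents a sentence that straddles a boundary from being lost to retrieval – it exists whole in at least one chunk.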
3. Vector Databases
Once you have embeddings, you need somewhere to store and search them efficiently. This is what vector databases are for
The right choice depends on your situation:
- Chroma for fast local prototyping
- Pinecone for managed, turnkey scale
- Weaviate for open-source flexibility with strong hybrid search
- Qdrant for complex filters and cost-efficient self-hosting
- pgvector if you're already on PostgreSQL and want to avoid adding another system
Resources:
1. Chroma Official Docs (free)
Link: https://docs.trychroma.com/
Chroma is perfect for individual developers and small teams who prioritize development speed and simplicity; it runs in-memory or locally with no infrastructure to manage
2. Pinecone Learning Center (free)
Link: https://www.pinecone.io/learn/
Excellent free tutorials covering vector search concepts, hybrid search, and RAG pipelines. Good provider-agnostic material even if you don't use Pinecone
3. Qdrant Documentation (free)
Link: https://qdrant.tech/documentation/
Best open-source option for production with advanced filtering. Very fast, flexible, and free to self-host
4. pgvector (open source, free)
Link: https://github.com/pgvector/pgvector
If you're building something that already uses PostgreSQL, pgvector adds vector search directly to your existing database with no new infrastructure
What to focus on: Creating a collection, inserting embeddings with metadata, querying by similarity with top_k, and filtering by metadata at query time
You don't need to understand the indexing algorithms (HNSW, IVF) – just understand how to use them
Practice project: Index 50-100 pages from any public documentation (e.g. the Python docs, or a Wikipedia article dump) into Chroma with metadata (source URL, section title). Write a query function that retrieves the 5 most relevant chunks for any question
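A sketch of that project against Chroma's Python client. The collection calls follow Chroma's docs but aren't verified here; the record-building helper is plain Python, and the sample document is invented:

```python
# Build parallel id/document/metadata lists, then index and query with Chroma.
def build_records(chunks: list[tuple[str, str, str]]):
    """chunks: (text, source_url, section_title) triples -> the parallel
    lists of ids, documents, and metadata dicts that Chroma's add() expects."""
    ids = [f"chunk-{i}" for i in range(len(chunks))]
    documents = [text for text, _, _ in chunks]
    metadatas = [{"source_url": url, "section": title} for _, url, title in chunks]
    return ids, documents, metadatas

if __name__ == "__main__":
    import chromadb  # pip install chromadb

    client = chromadb.Client()
    collection = client.create_collection("docs")
    ids, documents, metadatas = build_records([
        ("Virtual environments isolate project dependencies.",
         "https://docs.python.org/3/library/venv.html", "venv"),
    ])
    collection.add(ids=ids, documents=documents, metadatas=metadatas)
    # Retrieve the 5 most relevant chunks for a question
    results = collection.query(query_texts=["How do I isolate dependencies?"], n_results=5)
    print(results["documents"])
```

Storing the source URL and section title as metadata now is what makes filtering and citations possible later in the month.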
4. Metadata Filtering
Raw similarity search alone isn't enough for real applications. Metadata filtering lets you constrain retrieval to a relevant subset – by date, source, document type, user, category, or any other attribute you store alongside each chunk
Resources:
1. Pinecone: Metadata Filtering Guide (free)
Link: https://docs.pinecone.io/guides/data/filter-with-metadata
Clear explanation with code examples of filtering vectors by metadata fields before or during similarity search
2. LlamaIndex: Metadata Filters Guide (official docs, free)
Link: https://docs.llamaindex.ai/en/stable/module_guides/querying/node_postprocessors/node_postprocessors/
Explains how to apply filters at query time in LlamaIndex pipelines
What to focus on: Tagging every chunk with relevant metadata at ingestion time (source filename, page number, section, date, category), and using those fields to filter results at query time. This is what makes the difference between a toy demo and a production system where users can ask "only show me results from Q4 2025-Q1 2026 reports"
5. Reranking
Reranking is a technique that adds a semantic boost to the search quality of any keyword or vector search system
After first-stage retrieval returns a candidate set, a reranker re-scores those results based on true contextual relevance to the query – not just vector proximity
The two-stage pattern is: embed and search (fast, approximate) → rerank top-k (slower, more accurate). The result is dramatically better retrieval quality with only a modest latency cost
Resources:
1. Cohere Reranking Docs (official, free)
Link: https://docs.cohere.com/docs/reranking-with-cohere
The best place to start. Covers the full reranking workflow, including semi-structured data like emails and JSON documents. Requires just a single line of code to add to an existing retrieval pipeline
2. LangChain: Cohere Reranker Integration (official docs, free)
Link: https://python.langchain.com/docs/integrations/retrievers/cohere-reranker/
Explains how to wire Cohere reranking into a LangChain retriever using ContextualCompressionRetriever
What to focus on: The two-stage retrieve-then-rerank pattern, the difference between a bi-encoder (used for first-stage embedding search) and a cross-encoder (used for reranking), and the practical latency/quality trade-off of reranking top-20 vs top-5 results
6. Retrieval Quality Issues
Most RAG failures aren't model failures; they're retrieval failures. Understanding the ways retrieval can go wrong is essential for debugging real systems
Common issues to learn:
- Semantic drift: The query embedding doesn't match the relevant chunk embedding even though the information is there. Fix: try query rewriting or HyDE (Hypothetical Document Embeddings)
- Chunk boundary problems: The relevant information is split across two chunks. Fix: increase overlap or use semantic chunking
- Missing metadata context: Chunks are semantically similar to the query but belong to the wrong document, date, or user. Fix: use metadata filtering
- Top-k too small: The right chunk exists but isn't in the top 5 retrieved results. Fix: increase top_k at retrieval and reduce after reranking
Resources:
1. LangChain: Query Transformations (free)
Link: https://python.langchain.com/docs/how_to/#query-analysis
Covers query rewriting, step-back prompting, and HyDE
2. Pinecone: Improving Retrieval Quality (free)
Link: https://www.pinecone.io/learn/retrieval-augmented-generation/#retrieval-quality
Practical walkthrough of common failure modes with fixes
7. Hallucination Reduction
RAG dramatically reduces hallucinations compared to a vanilla LLM, but it doesn't eliminate them
By supplying the model with retrieved facts at runtime, RAG anchors its responses to real sources rather than relying on training data alone, and the model's output can even cite those sources, increasing transparency and trust
But retrieval failures, bad chunks, and conflicting information can still cause the model to make things up
Resources:
1. Zep: Reducing LLM Hallucinations – A Developer's Guide (free)
Link: https://www.getzep.com/ai-agents/reducing-llm-hallucinations/
Practical developer-focused guide covering prompt grounding strategies, chain-of-thought for factual tasks, and output verification patterns
2. Voiceflow: 5 Ways to Reduce LLM Hallucinations (free)
Link: https://www.voiceflow.com/blog/prevent-llm-hallucinations
Good overview of the combined strategy: RAG + chain-of-thought + guardrails together outperform any single approach
What to focus on: Prompting the model to answer only from provided context (and say "I don't know" when the answer isn't there), adding a confidence threshold before surfacing responses, and always validating retrieval quality before blaming the LLM
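A minimal sketch of that first point – a prompt builder that forces the model to answer only from retrieved context. The exact wording below is an illustrative assumption, not an official template:

```python
# Grounding prompt sketch: the model is instructed to answer ONLY from the
# retrieved chunks and to say "I don't know" when the answer isn't there.

def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    # Number the chunks so the model can reference them in its answer
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using ONLY the context below.\n"
        "If the answer is not in the context, reply exactly: I don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_grounded_prompt(
    "What is the refund window?",
    ["Refunds are issued within 14 days.", "Shipping takes 3-5 business days."],
)
```

This string is what you pass as the user (or system) message in your LLM call; everything else in the pipeline stays unchanged.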
8. Citations and Grounding
A grounded RAG system doesn't just answer – it tells you where the answer came from. This is critical for user trust and for debugging
Resources:
1. Anthropic: Giving Claude Sources (docs, free)
Link: https://docs.anthropic.com/en/docs/build-with-claude/citations
Explains how to prompt Claude to produce cited responses with source references
2. LangChain: RAG with Sources (free)
Link: https://python.langchain.com/docs/how_to/qa_sources/
Explains how to return source documents alongside answers in a LangChain RAG pipeline
What to focus on: Passing chunk metadata (source filename, page number, URL) into your prompt context, instructing the model to reference sources in its answer, and surfacing those sources in your UI or API response
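To make that concrete, here is a minimal sketch of carrying chunk metadata through the pipeline so sources can be surfaced next to the answer. The response shape is an assumption – adapt it to your API:

```python
# Citation plumbing sketch: tag each chunk with its source when building the
# prompt context, and return the same metadata alongside the final answer.

def format_context(chunks):
    """Render chunks with source tags the model can cite, e.g. [source: file p.N]."""
    return "\n".join(f"[source: {c['file']} p.{c['page']}] {c['text']}" for c in chunks)

def build_response(answer, chunks):
    """Return the answer plus the metadata of every chunk it was grounded on."""
    return {
        "answer": answer,
        "sources": [{"file": c["file"], "page": c["page"]} for c in chunks],
    }

chunks = [{"file": "handbook.pdf", "page": 12, "text": "PTO accrues monthly."}]
resp = build_response("PTO accrues monthly [source: handbook.pdf p.12]", chunks)
```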
9. Your RAG Framework: LangChain or LlamaIndex
You don't need to build a RAG pipeline from scratch. Two frameworks dominate the space and are worth knowing:
LlamaIndex puts search and indexing first: it abstracts ingestion, chunking, embedding, and querying into a few lines of code, letting you build a working prototype in an afternoon
LangChain shines when your application looks more like an orchestration engine – it excels with multi-agent workflows, tool calling, and conditional chains that query multiple LLMs or external APIs before generating an answer
For Month 3, start with LlamaIndex for RAG. Move to LangChain when you hit Month 4's agents work
Resources:
1. LlamaIndex: Introduction to RAG (official docs, free)
Link: https://developers.llamaindex.ai/python/framework/understanding/rag/
Covers the five key stages of RAG: loading, indexing, storing, querying, and evaluating – and how LlamaIndex handles each one
2. LlamaIndex Starter Tutorial (official docs, free)
Link: https://developers.llamaindex.ai/python/framework/getting_started/starter_example/
The official quickstart. Build a working RAG system in under 30 lines
3. LangChain: Build a RAG Agent (official docs, free)
Link: https://docs.langchain.com/oss/python/langchain/rag
Shows how to build a Q&A app over unstructured text using a RAG agent, from a 40-line minimal version up to a full retrieval pipeline with reranking
Practice project: Build a "chat with your docs" app. Ingest 10–20 PDF or text files (your own notes, a textbook chapter, product documentation – anything). Build a FastAPI endpoint that accepts a question, retrieves the top 5 most relevant chunks with reranking, and returns a cited answer from Claude or OpenAI. This is a real portfolio piece
Month 3 Milestone
By the end of this month you should be able to:
- Explain what an embedding is and why similar text produces similar vectors
- Chunk any document intelligently using appropriate strategies
- Store and query embeddings in a vector database with metadata filtering
- Add a reranking step to improve retrieval quality
- Debug common retrieval failures systematically
- Build a complete end-to-end RAG pipeline using LlamaIndex or LangChain that ingests documents, retrieves relevant chunks, and returns grounded, cited answers
⏩------------------------------------------------------------------------⏪
Month 4: Agents, Tools, Workflows, and Evals
Your goal this month: Build AI systems that can take sequences of actions autonomously, wire together multi-step workflows, and critically evaluate whether they're working
By the end you should be able to build a real agent from scratch, understand when agents are the wrong choice, and measure the performance of anything you build
This is where AI engineering gets genuinely complex. The skills from Month 4 are what separate junior AI engineers from people who can own an entire AI feature end to end
1. Agent Loops
An agent is not magic, it's a surprisingly simple pattern
Think of agents as goal-driven systems that constantly cycle through observing, reasoning, and acting
This loop allows them to tackle tasks that go beyond simple questions and answers, moving into real automation, tool usage, and adapting on the fly
The "thinking" happens in the prompt, the "branching" is when the agent chooses between available tools, and the "doing" happens when we call external functions. Everything else is just plumbing
Once you internalize this, even the most complex agent frameworks become readable
Resources:
1. Anthropic: Building Effective Agents (official, free)
Link: https://www.anthropic.com/research/building-effective-agents
The single best piece of writing on agents in production. Read this before writing a single line of agent code
2. OpenAI: A Practical Guide to Building Agents (official PDF, free)
Link: https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf
OpenAI's complementary guide covering agent patterns, guardrails, and safety patterns in production
3. freeCodeCamp: The Open Source LLM Agent Handbook (free)
Link: https://www.freecodecamp.org/news/the-open-source-llm-agent-handbook/
A comprehensive practical guide covering the agent loop, LangGraph, CrewAI, planning, memory, and tool use. Good for getting hands-on quickly
4. LangChain Academy: Introduction to LangGraph (free course)
Link: https://academy.langchain.com/courses/intro-to-langgraph
The official free course for LangGraph, the most widely used agent orchestration framework. Covers state, memory, human-in-the-loop, and more
What to focus on: The perceive → plan → act → observe cycle, how the agent loop terminates, what happens when a tool call fails inside a loop, and why agents are just while loops with an LLM making the branching decisions
Practice: Build an agent from scratch without any framework – just the OpenAI or Anthropic API directly. Give it 3 tools, a goal, and a loop. This is the most valuable thing you can do to actually understand what frameworks are abstracting
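Here is the skeleton of that exercise. The `decide()` function is a stub that returns canned decisions so the loop runs offline – in the real version you replace it with an OpenAI or Anthropic tool-calling request – but the loop itself (decide → act → observe, with a hard iteration cap) is the whole pattern:

```python
# Framework-free agent loop sketch. The LLM is stubbed; everything else is
# exactly the plumbing a real agent needs.

TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}

def decide(goal, history):
    """LLM stub: picks the next action. A real agent asks the model for this."""
    if not history:
        return {"action": "tool", "tool": "add", "args": (2, 3)}
    if len(history) == 1:
        return {"action": "tool", "tool": "upper", "args": (f"sum is {history[-1]}",)}
    return {"action": "finish", "answer": history[-1]}

def run_agent(goal, max_iters=10):
    history = []
    for _ in range(max_iters):          # hard cap prevents infinite loops
        step = decide(goal, history)    # "thinking": the model chooses
        if step["action"] == "finish":  # termination condition
            return step["answer"]
        try:                            # "doing": execute the chosen tool
            result = TOOLS[step["tool"]](*step["args"])
        except Exception as exc:        # feed failures back instead of crashing
            result = f"tool error: {exc}"
        history.append(result)          # "observing": result goes into context
    return "gave up: iteration limit reached"

answer = run_agent("add 2 and 3, then shout the result")
```

Once this loop is clear, frameworks like LangGraph become readable: they are this while loop plus state management, checkpointing, and nicer ergonomics.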
2. Tool Selection
Writing good tools is half the job. The descriptions for your tools and their parameters are the user manual for the LLM. If the manual is vague, the LLM will misuse the tool. Be painfully, relentlessly explicit
A poorly described tool will be called wrong, called at the wrong time, or ignored entirely. A well-described tool behaves predictably and gets selected correctly across a wide range of inputs
Resources:
1. OpenAI: Function Calling Best Practices (official docs, free)
Link: https://platform.openai.com/docs/guides/function-calling/best-practices
The canonical guide to writing tool descriptions that work reliably, with naming conventions and parameter documentation patterns
2. Anthropic: Tool Use Best Practices (official docs, free)
Link: https://docs.anthropic.com/en/docs/build-with-claude/tool-use/implement-tool-use#best-practices-for-tool-definitions
Anthropic's equivalent. Pay particular attention to the guidance on when to let the model choose vs forcing a specific tool
What to focus on: Writing tool names that are self-explanatory verbs, writing descriptions that explain when to call the tool (not just what it does), keeping parameters minimal and well-typed, and designing tools with the LLM as the caller
Beginner tip: Test every tool description by asking yourself: "If I had no documentation and only this JSON schema, would I know exactly when and how to call this?" If not, it needs more work
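To illustrate, here is a tool definition written to that standard, in the OpenAI function-calling schema shape. The `search_orders` tool itself is hypothetical:

```python
# A well-described tool definition: the description says WHEN to call it
# (and when not to), not just what it does, and parameters stay minimal.

good_tool = {
    "name": "search_orders",  # self-explanatory verb phrase
    "description": (
        "Search the customer's past orders by keyword. "
        "Call this when the user asks about a specific purchase, delivery, "
        "or refund. Do NOT call it for general product questions."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Keywords from the user's question"},
            "limit": {"type": "integer", "description": "Max results to return, default 5"},
        },
        "required": ["query"],  # minimal, well-typed parameters
    },
}
```

Compare this with a vague description like "Searches orders" – the model has no way to know when this tool applies versus a product-catalog tool.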
3. State Management
In LangGraph, state is a shared memory object that flows through the graph. It stores all the relevant information – messages, variables, intermediate results, and decision history – and is managed automatically throughout execution
Understanding state is the key to building agents that can handle multi-turn tasks, recover from failures, and hand off between components cleanly
Resources:
1. LangGraph Official Docs: State Management (free)
Link: https://langchain-ai.github.io/langgraph/concepts/low_level/#state
The definitive reference. Covers state schemas, reducers, and how state flows through nodes and edges
2. DataCamp: LangGraph Agents Tutorial (free)
Link: https://www.datacamp.com/tutorial/langgraph-agents
Covers the fundamentals of state, nodes, and edges with hands-on code, building up to stateful agents with persistent memory across sessions
3. Real Python: LangGraph in Python (free)
Link: https://realpython.com/langgraph-python/
A thorough tutorial building a complete stateful LangGraph agent, with detailed explanations of the state graph and conditional edges
What to focus on: Defining state schemas with TypedDict, how reducers work for merging parallel updates, the difference between in-memory state and persisted checkpointing, and how human-in-the-loop pauses work by inspecting and modifying state mid-execution
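A framework-free sketch of those ideas: a TypedDict state schema and a hand-written reducer. LangGraph attaches reducers declaratively via Annotated types; here the merge is written out explicitly so you can see what a reducer actually does:

```python
# LangGraph-style state sketch: a typed schema plus reducers that decide how
# each node's partial update merges into shared state.

from typing import TypedDict

class AgentState(TypedDict):
    messages: list[str]   # reducer: append (conversation history accumulates)
    result: str           # reducer: overwrite (last writer wins)

def merge(state: AgentState, update: dict) -> AgentState:
    """Apply a node's partial update to the shared state."""
    new = dict(state)
    if "messages" in update:
        new["messages"] = state["messages"] + update["messages"]  # append reducer
    if "result" in update:
        new["result"] = update["result"]                          # overwrite reducer
    return new

state: AgentState = {"messages": [], "result": ""}
state = merge(state, {"messages": ["user: hi"]})
state = merge(state, {"messages": ["ai: hello"], "result": "greeted"})
```

Persisted checkpointing is just this state serialized after every merge – which is also what makes human-in-the-loop pauses possible: execution stops, a human inspects or edits the state, and the graph resumes from the checkpoint.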
4. Retries and Failure Handling in Agents
Agents fail differently from regular LLM calls. A bad tool call mid-loop can corrupt state, cause infinite loops, or silently produce wrong answers. You need explicit strategies for all of these
Resources:
1. LangGraph: Error Handling and Retries (official docs, free)
Link: https://langchain-ai.github.io/langgraph/how-tos/autofill-tool-errors/
Explains how to add automatic error handling and retry logic at the tool node level in LangGraph
2. OpenAI Practical Agents Guide: Guardrails section (free)
Link: https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf
Covers guardrails as a layered defense, combining LLM-based checks, rules-based filters like regex, and moderation APIs to vet both inputs and outputs at every stage of the agent loop
What to focus on: Maximum iteration limits to prevent infinite loops, per-tool retry with exponential backoff, catching and logging exceptions at the tool execution layer without crashing the agent, and when to surface a failure to the user vs retry silently
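The retry-with-backoff piece fits in a few lines. This sketch returns a result object instead of raising, so a persistently failing tool surfaces a clean error to the agent rather than crashing the loop (delays are tiny here just so it runs fast):

```python
# Per-tool retry with exponential backoff. The tool below is a stub that
# fails twice before succeeding, simulating a transient API error.

import time

def call_with_retry(tool, *args, retries=3, base_delay=0.01):
    """Try a tool up to `retries` times, doubling the delay between attempts."""
    for attempt in range(retries):
        try:
            return {"ok": True, "value": tool(*args)}
        except Exception as exc:
            if attempt == retries - 1:
                # Out of retries: surface the failure instead of raising
                return {"ok": False, "error": str(exc)}
            time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, ...

attempts = {"n": 0}
def flaky_tool(x):
    attempts["n"] += 1
    if attempts["n"] < 3:               # fails twice, then succeeds
        raise RuntimeError("transient API error")
    return x * 2

result = call_with_retry(flaky_tool, 21)
```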
5. When NOT to Use Agents
This is one of the most important and most overlooked skills in AI engineering. Agents are exciting but they're also slow, expensive, unpredictable, and hard to debug. Knowing when to reach for something simpler is a sign of good judgment
Anthropic recommends finding the simplest solution possible and only increasing complexity when needed – this might mean not building agentic systems at all
Agentic systems trade latency and cost for better task performance, and you should carefully consider when this tradeoff makes sense
The decision framework is:
- Use a single LLM call if the task can be solved in one prompt with the right context
- Use a workflow if the steps are fixed and predictable
- Use an agent only if the number of steps is genuinely unpredictable and requires dynamic decision-making
Resources:
1. Anthropic: Building effective agents, when to use agents (official, free)
Link: https://www.anthropic.com/research/building-effective-agents
The most authoritative answer to this question, straight from the team that builds the models
2. Simon Willison: Designing Agentic Loops (free)
Link: https://simonwillison.net/2025/Sep/30/designing-agentic-loops/
A senior engineer's practical take on when agent complexity is justified and how to think about agentic loop design
What to memorize: A chain of 3 fixed LLM calls will always be faster, cheaper, and more debuggable than an agent that could make 3 calls. Reserve agents for genuinely open-ended tasks
6. Multi-Step Workflows
Between "single prompt" and "full agent" there is a vast productive middle ground: workflows. Workflows are ideal when the task can be cleanly decomposed into fixed subtasks – trading off latency for higher accuracy by making each individual LLM call an easier, more focused task
Common patterns include prompt chaining (output of one call is input to the next), routing (classify input and send to specialized handlers), parallelization (run multiple calls simultaneously and aggregate), and orchestrator-subagent (one LLM plans, others execute)
Resources:
1. Anthropic: Workflow Patterns (official, free)
Link: https://www.anthropic.com/research/building-effective-agents#workflow-patterns
Covers all the main patterns with diagrams and code examples. The parallelization and orchestration sections are particularly useful
2. LangGraph: Multi-Agent Networks (official docs, free)
Link: https://langchain-ai.github.io/langgraph/concepts/multi_agent/
Explains how to wire multiple agents together as a network, with supervisor and handoff patterns
Practice project: Build a 3-step content pipeline:
Step 1 – an LLM extracts key facts from an article
Step 2 – another LLM call uses those facts to generate a tweet, a LinkedIn post, and a summary in parallel
Step 3 – a final LLM call scores all three for quality and picks the best
No agent required, pure workflow
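The three steps above can be sketched as a plain function – chaining, parallelization, and a final selection call, with no agent loop anywhere. The `llm()` function is a stub standing in for real API calls, and the "scoring" step is simplified to picking the longest draft:

```python
# Fixed 3-step workflow sketch: chain → parallelize → select.

from concurrent.futures import ThreadPoolExecutor

def llm(task: str, text: str) -> str:
    """LLM stub: real code would call OpenAI/Anthropic with a task-specific prompt."""
    return f"{task}: {text[:30]}"

def pipeline(article: str) -> str:
    # Step 1 – chaining: extract facts, feed them forward
    facts = llm("extract-facts", article)
    # Step 2 – parallelization: three independent calls run simultaneously
    with ThreadPoolExecutor() as pool:
        drafts = list(pool.map(lambda t: llm(t, facts),
                               ["tweet", "linkedin-post", "summary"]))
    # Step 3 – selection: a final call scores the drafts and picks the best
    # (stubbed here as "longest wins")
    return max(drafts, key=len)

best = pipeline("LLM workflows trade latency for accuracy by splitting tasks.")
```

Every step is fixed and predictable, which is exactly why this should be a workflow and not an agent.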
7. Evaluation Harnesses
Evals are how you know if your AI system is actually working — not just on the examples you tested by hand, but systematically across hundreds of inputs
AI agents are powerful but complex to deploy because their probabilistic, multi-step behavior introduces many points of failure
Different parts of an agent – the LLMs, tools, retrievers, and workflows – each need their own evaluation approach
Resources:
1. DeepEval (open source, free)
Link: https://deepeval.com/docs/getting-started
An open-source LLM evaluation framework inspired by pytest. Write test cases with inputs and expected outputs, run them with 50+ built-in metrics including hallucination, answer relevancy, and factual consistency, and catch regressions between versions
2. Promptfoo (open source, free)
Link: https://github.com/promptfoo/promptfoo
A CLI and library for testing and evaluating LLM apps with automated test suites. Supports side-by-side comparison of multiple prompts across multiple models, CI/CD integration, and red teaming for security vulnerabilities
3. LangSmith (free tier)
Link: https://smith.langchain.com/
Tracing, debugging, and evaluation for LangChain and LangGraph apps. The free tier is generous and the tracing UI makes debugging agent loops dramatically easier
4. Ragas (open source, free)
Link: https://docs.ragas.io/
Specialized evaluation framework for RAG pipelines. Measures faithfulness, answer relevancy, context precision, and context recall. Essential if you're evaluating RAG systems from Month 3
What to focus on: Building a golden test set of 20-50 representative inputs with expected outputs or rubrics, writing eval functions that score outputs deterministically (string match, JSON schema validation) or with LLM-as-judge, and running evals automatically when you change a prompt or swap a model
Critical mindset: Evals are not optional polish. Every prompt change, model swap, or retrieval tweak you make without running evals is a gamble. The engineers who ship reliable AI products run evals constantly
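Before reaching for a framework, it helps to see how small an eval harness can be: a golden test set plus deterministic checks. `run_pipeline()` below is a stub standing in for your real RAG or agent pipeline:

```python
# Minimal eval harness sketch: golden cases scored with deterministic checks
# (substring match, JSON validity). LLM-as-judge would slot in as another
# scoring function.

import json

GOLDEN_SET = [
    {"input": "capital of France", "must_contain": "Paris"},
    {"input": "2 + 2 as JSON",     "must_be_json": True},
]

def run_pipeline(query: str) -> str:
    """Stub system under test; replace with your real pipeline."""
    return '{"answer": 4}' if "JSON" in query else "The capital of France is Paris."

def score(case, output: str) -> bool:
    if case.get("must_be_json"):
        try:
            json.loads(output)          # deterministic check: valid JSON?
        except ValueError:
            return False
    if "must_contain" in case and case["must_contain"] not in output:
        return False                    # deterministic check: substring match
    return True

results = [score(c, run_pipeline(c["input"])) for c in GOLDEN_SET]
pass_rate = sum(results) / len(results)
```

Run this after every prompt change or model swap; a drop in `pass_rate` is a regression you caught before your users did.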
8. Task Success Metrics
Beyond automated evals, you need metrics that tell you whether your agent is accomplishing its actual goal
Resources:
1. Hamel Husain: Your AI Product Needs Evals (free)
Link: https://hamel.dev/blog/posts/evals/
One of the most practical pieces written on building eval pipelines for real production AI systems, by someone who has done it at scale
2. OpenAI Evals Framework (open source, free)
Link: https://github.com/openai/evals
OpenAI's own evaluation framework, with a large library of community-contributed eval patterns you can adapt
What to focus on: The difference between process metrics (did the agent call the right tool?) and outcome metrics (did the task succeed?), defining clear success criteria before building anything, and using LLM-as-judge for evaluation of outputs that resist exact matching (like long-form answers or multi-step reasoning traces)
Practice project: Take your RAG pipeline from Month 3 and build a proper eval harness around it. Create 30 question-answer pairs from your documents, run them through your pipeline, and score each answer for relevance, faithfulness, and completeness using DeepEval. Then change one thing (chunk size, model, top-k) and re-run to see if it improved
Month 4 Milestone
By the end of this month you should be able to:
- Explain what an agent loop is and implement one from scratch without a framework
- Write tool descriptions that get selected correctly and reliably
- Manage agent state properly using LangGraph or equivalent
- Handle failures inside agent loops without crashing
- Decide confidently whether a task needs an agent, a workflow, or a single prompt
- Build multi-step workflows that chain, route, and parallelize LLM calls
- Write automated evals that catch regressions when you change prompts or models
- Define and measure task success metrics for any AI system you build
⏩------------------------------------------------------------------------⏪
Month 5: Deployment, Product Thinking, and Reliability
Your goal this month: Take everything you've built and make it production-ready
By the end you should be able to deploy an AI app that handles real users, real traffic, and real failures without falling apart at 2am
This is where most AI engineers stall. They can build a great demo but can't ship a product that survives contact with the real world
The skills here are what companies actually pay for: reliability, security, cost control, and the ability to keep things running when something inevitably breaks
1. FastAPI Production Patterns
You already know how to build a FastAPI app from Month 1. Now you need to make it survive production traffic
The difference between dev and prod is brutal. A single uvicorn process with --reload is fine for building. In production it becomes the bottleneck the moment real traffic arrives
What you actually need: multi-worker ASGI configuration, proper error handling middleware, health check endpoints, and CORS policies
Resources:
1. FastAPI Deployment Docs (official, free)
Link: https://fastapi.tiangolo.com/deployment/
The official guide covering Uvicorn workers, Gunicorn, and Docker deployment. Start here before anything else
2. FastAPI Production Deployment Guide (CYS Docs, free)
Link: https://craftyourstartup.com/cys-docs/fastapi-production-deployment/
Comprehensive production patterns: Gunicorn config, Nginx reverse proxy, health checks, rate limiting. Includes real config files you can adapt
3. FastAPI Best Practices for Production (FastLaunchAPI, free)
Link: https://fastlaunchapi.dev/blog/fastapi-best-practices-production-2026
Covers async database pooling, Redis caching, JWT auth, and background tasks. Production-tested patterns from a real template used by 100+ developers
What to focus on: Running Gunicorn with Uvicorn workers (not bare Uvicorn), setting up health check endpoints, adding CORS middleware, implementing proper async database sessions, and using background tasks for anything that doesn't need to block the response
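As a reference point, a production-style launch command looks something like this. It assumes your FastAPI instance lives in main.py as `app` – adjust the module path and worker count (a common starting point is about 2x your CPU cores) for your setup:

```shell
# Gunicorn as the process manager, Uvicorn as the worker class
gunicorn main:app \
  -w 4 \
  -k uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000 \
  --timeout 120
```

Compare this with `uvicorn main:app --reload`: the dev command is a single process with no supervision, while Gunicorn restarts crashed workers and spreads load across all of them.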
2. Docker
Docker is how you stop saying "it works on my machine" and start shipping consistent deployments
If you're building AI apps, Docker solves dependency conflicts, ensures consistent environments, and makes scaling straightforward
You don't need to become a Docker expert. You need to be able to containerize your FastAPI + LLM app and deploy it anywhere
Resources:
1. Docker Official Getting Started Guide (free)
Link: https://docs.docker.com/get-started/
The canonical starting point. Covers images, containers, Dockerfiles, and Docker Compose
2. freeCodeCamp: How to Build and Deploy a Multi-Agent AI System with Python and Docker (free)
Link: https://www.freecodecamp.org/news/build-and-deploy-multi-agent-ai-with-python-and-docker/
Practical end-to-end tutorial building a real multi-agent pipeline with Docker Compose. Covers separation of concerns, cron scheduling, and security considerations
3. DataCamp: Deploy LLM Applications Using Docker (free)
Link: https://www.datacamp.com/tutorial/deploy-llm-applications-using-docker
Step-by-step guide specifically for LLM apps with RAG pipelines. Covers Dockerfile creation, environment management, and deployment
4. Docker Containerization for LLM Apps (ApXML, free)
Link: https://apxml.com/courses/python-llm-workflows/chapter-10-deployment-operational-practices/containerization-docker-llm-apps
Covers base image selection, dependency management, multi-stage builds, and Docker Compose for multi-service LLM deployments
What to focus on: Writing a Dockerfile for a Python/FastAPI app, using multi-stage builds to keep images small, Docker Compose for multi-service setups (app + database + Redis), environment variables for secrets, and .dockerignore to avoid leaking sensitive files
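Putting those points together, a multi-stage Dockerfile for a FastAPI + LLM app looks roughly like this. File names and the app module path are assumptions – adapt them to your project:

```dockerfile
# Stage 1: install dependencies into an isolated prefix
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Stage 2: slim runtime image – build tools and pip cache stay behind
FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /install /usr/local
COPY . .
# Secrets come from env vars at runtime (docker run -e OPENAI_API_KEY=...),
# never baked into the image. Keep .env out via .dockerignore.
EXPOSE 8000
CMD ["gunicorn", "main:app", "-w", "4", "-k", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:8000"]
```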
Practice project: Containerize your RAG app from Month 3. Create a docker-compose.yml that runs your FastAPI app, a vector database (Chroma or Qdrant), and Redis for caching. Deploy it so that docker compose up starts everything
3. Background Jobs and Queues
LLM calls are slow. If a user asks your app to process a document and you make them wait 30 seconds for a response, they'll leave
Background jobs let you accept the request immediately, process it async, and notify the user when it's done
Resources:
1. Celery Official Getting Started Guide (free)
Link: https://docs.celeryq.dev/en/stable/getting-started/introduction.html
The standard Python task queue. Covers basic setup, task definition, and worker management
2. FastAPI Background Tasks Docs (official, free)
Link: https://fastapi.tiangolo.com/tutorial/background-tasks/
Built-in lightweight background tasks for simple use cases. Use this for quick fire-and-forget tasks, Celery for anything heavier
What to focus on: Understanding when to use FastAPI's built-in BackgroundTasks vs a proper task queue like Celery, setting up Redis as a message broker, handling task failures and retries, and returning job status to the user
4. Auth and API Key Security
If your AI app has an API, it needs authentication. Without it, anyone can use your endpoints, burn through your LLM credits, and you'll wake up to a $5,000 bill
Resources:
1. FastAPI Security Docs (official, free)
Link: https://fastapi.tiangolo.com/tutorial/security/
Covers OAuth2, JWT tokens, API keys, and dependency-based auth patterns. The official reference, work through the full tutorial
2. OWASP API Security Top 10 (free)
Link: https://owasp.org/API-Security/
The authoritative list of API security risks. Understand broken authentication, injection, and mass assignment before shipping anything
3. Auth0: API Auth Best Practices (free)
Link: https://auth0.com/docs/get-started/authentication-and-authorization
Practical guide to implementing authentication and authorization in APIs
What to focus on: JWT tokens for user auth, API key management for service-to-service communication, rate limiting per user/key, never storing secrets in code (use environment variables), and understanding the difference between authentication (who are you) and authorization (what can you do)
5. Logging and Observability
In production, if you can't see what's happening, you can't fix what's broken
LLM apps have a unique challenge: the model can return a 200 status code and still produce a useless or hallucinated answer. Traditional monitoring doesn't catch this. You need LLM-specific observability
Resources:
1. Langfuse (open source, free tier)
Link: https://langfuse.com/docs/observability/overview
Open-source LLM observability platform. Traces every request: prompt sent, response received, token usage, latency, tool calls. Supports prompt versioning, evaluation, and LLM-as-judge scoring. Integrates with OpenAI, Anthropic, LangChain, LlamaIndex
2. LangSmith (free tier)
Link: https://smith.langchain.com/
From the LangChain team. If you're using LangChain/LangGraph, setup is one environment variable. Tracing, debugging, monitoring dashboards, and online evals. The free tier is generous for development and small-scale production
3. Python Structlog (free)
Link: https://www.structlog.org/
Structured logging for Python. Produces JSON logs that are actually searchable and parseable. Far better than print() or basic logging for production apps
What to focus on: Tracing every LLM call (input prompt, output, tokens, latency, cost), structured logging with JSON output, setting up dashboards that show request volume, error rates, and cost per day, and alerting when something breaks or costs spike
6. Prompt and Version Management
In production, your prompts are code. They need version control, testing, and rollback ability
Changing a prompt in production without tracking what you changed is how you break things and can't figure out why
Resources:
1. Langfuse Prompt Management (free)
Link: https://langfuse.com/docs/prompts
Centralized prompt versioning with a built-in playground for testing. Version control your prompts separately from your application code. Deploy prompt changes without redeploying your app
2. Anthropic Prompt Management Best Practices (free)
Link: https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview
Best practices for organizing, iterating, and managing prompts at scale
What to focus on: Storing prompts outside your application code, versioning every prompt change, A/B testing prompt variants in production, and having a rollback strategy when a new prompt performs worse
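The core mechanic is small enough to sketch. In production the registry below would live in Langfuse or a database rather than a dict, but versioning and rollback work the same way:

```python
# Versioned prompt registry sketch: prompts live outside application code,
# deploys are a version bump, and rollback never requires a redeploy.

PROMPTS = {
    "summarize": {
        1: "Summarize the text below:\n\n{text}",
        2: "Summarize the text below in exactly 3 bullet points:\n\n{text}",
    },
}
ACTIVE_VERSION = {"summarize": 2}   # deploy a prompt change by bumping this

def get_prompt(name: str) -> str:
    """Fetch the currently active version of a named prompt."""
    return PROMPTS[name][ACTIVE_VERSION[name]]

def rollback(name: str) -> None:
    """New version performs worse in evals? Flip back instantly."""
    ACTIVE_VERSION[name] -= 1

before = get_prompt("summarize")    # v2, the new prompt
rollback("summarize")
after = get_prompt("summarize")     # v1, the previous prompt
```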
7. Cost Monitoring and Rate Limits
LLM APIs charge per token. Without cost controls, a traffic spike or a bug in your prompt can burn through hundreds of dollars in minutes
Resources:
1. OpenAI Usage Dashboard (official)
Link: https://platform.openai.com/usage
Track spending by model, by day, and set usage limits
2. Anthropic Usage Dashboard (official)
Link: https://console.anthropic.com/
The same for Claude API usage
3. Helicone (free tier)
Link: https://www.helicone.ai/
Proxy-based observability that captures every LLM call with automatic cost tracking. One line of code to set up: just change your base URL
4. LiteLLM (open source, free)
Link: https://github.com/BerriAI/litellm
Unified interface for 100+ LLM providers. Includes budget management, rate limiting, and spend tracking across providers
What to focus on: Setting hard spending limits per day/month, implementing per-user rate limits in your API, using cheaper models for simple tasks (don't use GPT-4/Opus for everything), caching repeated identical requests with Redis, and monitoring cost per request to catch expensive prompts early
8. Caching
If 20% of your users ask similar questions, you're paying for the same LLM call 20 times
Caching is the simplest way to reduce costs and latency simultaneously
Resources:
1. Redis Official Docs (free)
Link: https://redis.io/docs/
The standard in-memory data store. Fast, simple, and works perfectly for LLM response caching
2. GPTCache (open source, free)
Link: https://github.com/zilliztech/GPTCache
Semantic caching specifically designed for LLM applications. Uses embedding similarity to find cached responses for semantically similar (not just identical) queries
What to focus on: Exact-match caching for identical prompts, semantic caching for similar queries, cache invalidation strategies (TTL-based is simplest), and measuring cache hit rates to understand real cost savings
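Exact-match caching is a few lines of code. This sketch uses a plain dict as the store; in production you would swap it for Redis (the key-hashing and TTL logic stay the same), and the model name here is a placeholder:

```python
# Exact-match LLM cache with TTL. Identical model+prompt pairs hash to the
# same key, so repeated requests are served from the cache instead of the API.

import hashlib
import time

CACHE: dict[str, tuple[float, str]] = {}
TTL_SECONDS = 3600
STATS = {"hits": 0, "misses": 0}    # track hit rate to measure real savings

def cache_key(model: str, prompt: str) -> str:
    return hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()

def cached_call(model, prompt, llm_fn, now=None):
    now = time.time() if now is None else now
    key = cache_key(model, prompt)
    entry = CACHE.get(key)
    if entry and now - entry[0] < TTL_SECONDS:   # fresh entry: cache hit
        STATS["hits"] += 1
        return entry[1]
    STATS["misses"] += 1
    response = llm_fn(prompt)                    # pay for the call once
    CACHE[key] = (now, response)
    return response

fake_llm = lambda p: f"answer to: {p}"
a = cached_call("some-model", "what is RAG?", fake_llm)   # miss – calls the model
b = cached_call("some-model", "what is RAG?", fake_llm)   # hit – served from cache
```

Semantic caching (GPTCache-style) replaces the exact hash lookup with an embedding-similarity lookup, so "what is RAG?" and "explain RAG" can share a cached answer.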
Month 5 Milestone
By the end of this month you should be able to:
- Deploy a FastAPI + LLM app in Docker with proper production configuration
- Handle long-running tasks with background jobs and queues
- Secure your API with auth, rate limits, and API key management
- Trace and debug LLM calls using Langfuse or LangSmith
- Manage prompts with version control and rollback capability
- Monitor costs in real time and set spending limits
- Cache LLM responses to reduce latency and cost
⏩------------------------------------------------------------------------⏪
Month 6: Specialize and Become Hireable
The knowledge and skills you've gained can be applied in three directions (at least, these are the ones I see)
You need to choose one of them and focus on practice
Although everything mentioned above is also best learned purely through practice
Direction 1: AI Product Engineer
Best if you want startup jobs fast
This is the most common path. You build AI-powered products that real users interact with
You already have most of the skills from Months 1-5. Now go deeper on the product side
Focus on:
- LLM apps
- RAG
- agents
- deployment
- product UX
What to learn this month:
1. End-to-End Product Building
Stop building tutorials. Build products people can use
Resources:
1. Vercel AI SDK (free)
Link: https://sdk.vercel.ai/docs
The fastest way to build AI-powered UIs with streaming support. React, Next.js, and Vue integrations with built-in streaming UI components
2. Streamlit (free)
Link: https://docs.streamlit.io/
Build data apps and AI demos in pure Python. Ideal for internal tools and MVPs, not production-scale UIs
3. Gradio (free)
Link: https://www.gradio.app/docs
Quick ML/AI interfaces with minimal code. Especially good for demoing models and building prototypes
What to focus on: Building 2-3 complete projects this month that you can demo. A "chat with your docs" app, an AI-powered internal tool, or an agent that automates a real workflow. Ship them. Put them on GitHub. Deploy them somewhere people can try them
2. Product UX for AI
AI products fail when the UX doesn't account for the model's limitations
Resources:
1. Google: People + AI Guidebook (free)
Link: https://pair.withgoogle.com/guidebook/
The best resource on designing human-AI interaction. Covers setting expectations, handling errors, and building trust
2. Nielsen Norman Group: AI UX Guidelines (free)
Link: https://www.nngroup.com/topic/artificial-intelligence/
Research-backed guidelines for AI interfaces
What to focus on: How to handle loading states with streaming, what to show when the model is wrong, how to let users give feedback, and designing for the fact that AI output is probabilistic – it will sometimes be wrong
Direction 2: Applied ML / LLM Engineer
Best if you want deeper technical roles
This direction is for engineers who want to go beyond API calls and understand what's happening under the hood
Focus on:
- fine-tuning
- when to fine-tune vs prompt
- evaluation
- inference optimization
- open-source models
- training pipelines
What to learn this month:
1. When to Fine-tune vs Prompt Engineer
The most important decision in applied ML: do you need to change the model, or just change how you talk to it?
Resources:
1. Google ML Crash Course: Fine-tuning, Distillation, and Prompt Engineering (free)
Link: https://developers.google.com/machine-learning/crash-course/llm/tuning
The clearest explanation of the three approaches and when to use each
2. Codecademy: Prompt Engineering vs Fine-Tuning (free)
Link: https://www.codecademy.com/article/prompt-engineering-vs-fine-tuning
Practical decision framework with clear use cases for each approach
3. IBM: RAG vs Fine-Tuning vs Prompt Engineering (free)
Link: https://www.ibm.com/think/topics/rag-vs-fine-tuning-vs-prompt-engineering
Covers the complete decision space including when to combine approaches
Decision framework to memorize:
- Start with prompt engineering (cheapest, fastest)
- Add RAG if the model needs access to specific data
- Fine-tune only when prompting + RAG can't achieve the required quality, consistency, or latency
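If it helps you memorize it, the decision framework is simple enough to write as code. This is just the prompt → RAG → fine-tune ordering from above expressed as a function – the two boolean inputs are a simplification of what is really a judgment call about your use case

```python
def choose_approach(needs_private_data: bool,
                    prompting_plus_rag_good_enough: bool) -> list:
    """Encode the prompt -> RAG -> fine-tune decision order as code."""
    stack = ["prompt engineering"]   # cheapest, fastest – always start here
    if needs_private_data:
        stack.append("RAG")          # give the model access to your data
    if not prompting_plus_rag_good_enough:
        stack.append("fine-tuning")  # last resort: actually change the model
    return stack
```

Notice that fine-tuning is never the first item – you only pay its cost after the cheaper layers fall short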
2. Fine-tuning in Practice
When you do need to fine-tune, here's how
Resources:
1. OpenAI Fine-tuning Guide (official, free)
Link: https://platform.openai.com/docs/guides/fine-tuning
The easiest way to start fine-tuning. Upload a JSONL dataset, run a job, get a custom model. Good for learning the workflow even if you later move to open-source models
2. HuggingFace Transformers Fine-tuning Tutorial (free)
Link: https://huggingface.co/docs/transformers/training
The standard library for working with open-source models. Covers training, evaluation, and model saving
3. Unsloth (open source, free)
Link: https://github.com/unslothai/unsloth
2x faster fine-tuning with 80% less memory. Supports LoRA and QLoRA out of the box. The fastest path to fine-tuning open-source models on consumer hardware
4. LLaMA-Factory (open source, free)
Link: https://github.com/hiyouga/LLaMA-Factory
Unified framework for fine-tuning 100+ LLMs. Includes a web UI for no-code fine-tuning. Supports LoRA, QLoRA, full fine-tuning, RLHF, and DPO
What to focus on: Preparing training datasets (JSONL format), understanding LoRA and QLoRA (parameter-efficient fine-tuning), running a fine-tuning job on OpenAI or with HuggingFace, evaluating the fine-tuned model against the base model, and knowing when fine-tuning isn't worth the cost
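The JSONL prep step above is where most beginners stumble, so here's a minimal sketch. OpenAI's fine-tuning API expects chat-format JSONL: one JSON object per line with a `messages` array. The system prompt and example pair below are placeholders – swap in your own data

```python
import json

def to_openai_jsonl(examples, path):
    """Write (user, assistant) pairs as chat-format JSONL,
    one training example per line."""
    with open(path, "w", encoding="utf-8") as f:
        for user_msg, assistant_msg in examples:
            record = {
                "messages": [
                    {"role": "system", "content": "You are a support assistant."},
                    {"role": "user", "content": user_msg},
                    {"role": "assistant", "content": assistant_msg},
                ]
            }
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

to_openai_jsonl(
    [("How do I reset my password?", "Go to Settings > Security > Reset.")],
    "train.jsonl",
)
```

Once you have the file, you upload it and start a fine-tuning job through the API or dashboard – the guide linked above walks through that part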
3. Open-Source Models
Not everything needs to go through OpenAI or Anthropic. Open-source models give you full control, no API costs, and the ability to run locally
Resources:
1. Ollama (free)
Link: https://ollama.ai/
Run open-source LLMs locally with one command. Supports Llama, Mistral, Gemma, and dozens of others. The fastest way to experiment with open-source models
2. HuggingFace Model Hub (free)
Link: https://huggingface.co/models
The largest repository of open-source models. Browse, download, and deploy models for any task
3. vLLM (open source, free)
Link: https://github.com/vllm-project/vllm
High-throughput LLM inference engine. 2-4x faster than naive HuggingFace serving. The standard for production serving of open-source models
What to focus on: Running models locally with Ollama for testing, understanding quantization (GGUF, GPTQ, AWQ) and why it matters for deployment, benchmarking open-source models against API models for your use case, and serving models in production with vLLM
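To see why quantization matters for deployment, the back-of-the-envelope math is worth doing once. Weight memory is just parameters × bits per parameter – this sketch ignores KV cache and runtime overhead, which add more on top

```python
def estimate_weights_gb(num_params_billion: float, bits_per_param: int) -> float:
    """Rough memory needed just for model weights
    (ignores KV cache and runtime overhead)."""
    bytes_total = num_params_billion * 1e9 * bits_per_param / 8
    return round(bytes_total / 1e9, 1)  # using 1 GB = 1e9 bytes

fp16 = estimate_weights_gb(7, 16)  # a 7B model at fp16: ~14 GB
int4 = estimate_weights_gb(7, 4)   # the same model 4-bit quantized: ~3.5 GB
```

That 4x drop is the difference between needing a datacenter GPU and running the model on a laptop – which is exactly what formats like GGUF make possible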
4. Inference Optimization
Making models run faster and cheaper in production
Resources:
1. HuggingFace: Optimizing LLM Inference (free)
Link: https://huggingface.co/docs/transformers/llm_optims
Covers KV-cache optimization, quantization, and batching strategies
2. NVIDIA TensorRT-LLM (free)
Link: https://github.com/NVIDIA/TensorRT-LLM
Maximum inference performance on NVIDIA GPUs. Used by most production LLM serving at scale
What to focus on: Batching strategies for throughput, quantization for reducing memory and cost, KV-cache optimization for faster generation, and choosing the right hardware for your inference workload
Direction 3: AI Automation Engineer
Best if you want to build for businesses immediately
This direction is about automating real business workflows with AI. Less about building products, more about solving operational problems
Focus on:
- workflow orchestration
- business process automation
- multi-tool systems
- CRM, docs, email, support, ops use cases
What to learn this month:
1. Workflow Orchestration
Real business automation is almost never one LLM call. It's chains of actions across multiple systems
Resources:
1. n8n (open source, free to self-host)
Link: https://docs.n8n.io/
Visual workflow automation with AI nodes. Connect LLMs to 400+ integrations (Slack, Gmail, Notion, CRMs, etc.). The best no-code/low-code option for AI automation
2. LangGraph: Multi-Agent Workflows (free)
Link: https://langchain-ai.github.io/langgraph/concepts/multi_agent/
Code-first orchestration for complex multi-agent systems. When n8n isn't enough and you need full programmatic control
3. Temporal (open source, free)
Link: https://docs.temporal.io/
Durable workflow engine for long-running, fault-tolerant processes. When your automation needs to survive crashes, retries, and timeouts
What to focus on: Designing workflows that handle failures gracefully, connecting AI to real business tools (email, CRM, databases, spreadsheets), building human-in-the-loop approval steps, and logging every automated action for audit trails
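Two of the patterns above – retries with an audit trail, and a human-in-the-loop gate – are simple enough to sketch in plain Python. Tools like n8n and Temporal give you these for free, but it helps to understand what they're doing under the hood. This is an illustrative sketch, not production code

```python
import time

def run_step(action, max_retries=3, backoff_s=0.0):
    """Run one workflow step with retries, logging every attempt
    so there's an audit trail of what the automation did."""
    audit_log = []
    for attempt in range(1, max_retries + 1):
        try:
            result = action()
            audit_log.append(("ok", attempt))
            return result, audit_log
        except Exception as exc:
            audit_log.append(("error", attempt, str(exc)))
            time.sleep(backoff_s * attempt)  # back off before retrying
    raise RuntimeError(f"step failed after {max_retries} attempts: {audit_log}")

def with_human_approval(draft, approve):
    """Human-in-the-loop gate: nothing is sent until a person signs off."""
    return draft if approve(draft) else None
```

In a real system the audit log would go to a database and the approval step would be a Slack message or dashboard button, but the control flow is the same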
2. Business Process Automation
The money in AI automation is in solving specific, expensive business problems
Resources:
1. Zapier AI Actions (free tier)
Link: https://zapier.com/ai
Connect AI to 6,000+ apps without code. Good for prototyping automations before building custom solutions
2. Make (Integromat) (free tier)
Link: https://www.make.com/
Visual automation platform with advanced logic and AI integrations. More powerful than Zapier for complex workflows
What to focus on: Identifying the highest-ROI automation targets (usually tasks that are repetitive, time-consuming, and rules-based), building automations that augment humans rather than replace them, and measuring the actual time and money saved
3. CRM, Docs, Email, Support Automation
The most common and most valuable AI automation use cases
Resources:
1. OpenAI Cookbook: AI-Powered Email Processing (free)
Link: https://github.com/openai/openai-cookbook
Patterns for classifying, routing, and responding to emails with AI
2. LangChain: Document Processing Pipelines (free)
Link: https://python.langchain.com/docs/how_to/#document-loaders
Ingesting and processing documents from 80+ sources
What to focus on: Building an AI-powered email classifier and auto-responder, creating a document processing pipeline that extracts structured data, building a support chatbot that uses RAG over your knowledge base, and integrating AI into existing CRM workflows (HubSpot, Salesforce, etc.)
Practice project for Direction 3: Build an end-to-end lead qualification system. It should:
Scrape or import leads from a source (CSV, API, or form)
Use an LLM to research each lead (company info, fit assessment)
Score and rank leads based on your ICP
Draft personalized outreach messages
Log everything to a spreadsheet or CRM
This is a real, sellable automation that businesses actually pay for
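The scoring-and-ranking step of that project can start as simple as this. The field names and weights here are made up for illustration – in the full system an LLM would first enrich each lead with researched company info, and you'd adapt the fields to your CRM's schema

```python
def score_lead(lead: dict, icp: dict) -> int:
    """Toy scoring against a hypothetical ICP: weighted points
    per matching field, with industry fit weighted highest."""
    score = 0
    if lead.get("industry") == icp.get("industry"):
        score += 3
    if lead.get("employees", 0) >= icp.get("min_employees", 0):
        score += 2
    if lead.get("country") in icp.get("countries", []):
        score += 1
    return score

def rank_leads(leads, icp):
    """Sort leads best-first so outreach starts with the hottest ones."""
    return sorted(leads, key=lambda l: score_lead(l, icp), reverse=True)
```

Start with hand-tuned rules like these, then let the LLM handle the fuzzy parts (fit assessment, personalized outreach) where rules break down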
⏩------------------------------------------------------------------------⏪
CONCLUSION
What can you expect after these 6 months?
I'm going to be honest with you, without promising you mountains of money
This roadmap will not make you a senior AI engineer in 6 months
But it will make you someone who can build, ship, and deploy real AI systems that solve real problems
And right now, that is exactly what the market is paying for
The demand for AI engineers is not slowing down. Job postings grew 25% year-over-year
PwC found a 56% wage premium for roles that require AI skills vs the same roles without
Only 1% of companies are considered "AI mature" which means 99% still need help. The US Bureau of Labor Statistics projects 26% job growth through 2034
These are not hype numbers. They're real figures based on market analytics (pulled together with Claude's help, kek)
If you go full-time in the US:
Junior AI engineers start at $90,000-$130,000
Mid-level (3-5 years) sits at $155,000-$200,000
Senior roles go $195,000-$350,000+
According to Glassdoor (March 2026), the average is $184,757
The mid-level band is growing the fastest at 9.2% year-over-year because companies desperately need people who can ship production AI without constant supervision
If freelance is more your thing:
AI agent development goes for $175-$300/hour
RAG implementation $150-$250/hour
LLM integration $125-$200/hour
One developer on Reddit built a document summarization tool for a legal firm in two weeks and made $8,000. A freelancer billing 25 hours/week at $150/hour pulls $195,000/year
And if you go the consulting route, which is what I talked about in my earlier post, you can charge:
$300-$5,000 to set up an AI agent for a business
$500-$2,000/month for AI content management
$1,000-$4,000 to automate customer support
$500-$2,000 for cold outreach setup
The service spectrum is even wider, but once you master the skills from this roadmap, you'll already be an in-demand specialist in 2026
These are real numbers from real people doing real work
Now here is what I actually want you to take away from all of this:
Pick one project from each month and build it. Not read about it. Not watch a tutorial. Build it, break it, fix it, deploy it, put it on GitHub. The engineers who get hired are the ones who show what they've built, not what they've studied
Start sharing what you learn. Write about it on X, LinkedIn, anywhere. Teaching is the fastest way to learn and it builds your reputation at the same time. The best opportunities I've seen come from people who were visible, not from people who applied to 500 job listings
And please don't wait until you feel ready. You will never feel ready. The gap between "I'm learning" and "I'm building" is where most people get stuck forever
Start applying, start freelancing, start offering services the moment you have working projects. Even if they're not perfect. The market doesn't reward perfection. It rewards people who can ship
6 months is enough to change everything if you actually put in the work
And I really believe each of you reading this can do it
Just never stop building and never stop learning
Hope this was useful for you my fam ❤️