The AI-Ready Data Engineer
Chapter 5: Resources and Milestones
Curated AI resources for data engineers: top courses from DeepLearning.AI and Fast.ai, essential GitHub repos, YouTube channels, communities, documentation, and progress milestones to track your growth.
Courses worth your time
- Building and Evaluating Advanced RAG — DeepLearning.AI
- LangChain for LLM Application Development — DeepLearning.AI
- Functions, Tools and Agents with LangChain — DeepLearning.AI
- Knowledge Graphs for RAG — DeepLearning.AI
- Preprocessing Unstructured Data for LLM Applications — DeepLearning.AI
- DeepLearning.AI — Andrew Ng's team, consistently excellent
- Fast.ai — Free and better than most paid courses
- Complete Agentic AI Bootcamp — Udemy, solid content
- Coursera AI courses — IBM and Stanford courses are legit
- DataCamp — Good for beginners, nice UI
- LangChain Academy — Free and directly from the source
- Hugging Face AI Agents Course — Free, offers a certificate of completion, covers smolagents, LlamaIndex, and LangGraph
- Hugging Face LLM Course — Free, covers Transformers, fine-tuning, and reasoning models
GitHub repos to star immediately
- AltimateAI/altimate-code — Open-source agentic data engineering harness
- artidoro/qlora — Efficient fine-tuning
- axolotl-ai-cloud/axolotl — Fine-tuning made easy
- confident-ai/deepeval — LLM testing framework
- crewAIInc/crewAI — Multi-agent orchestration
- dair-ai/Prompt-Engineering-Guide — Comprehensive prompting
- explodinggradients/ragas — RAG evaluation
- langchain-ai/langgraph — Stateful agents
- langchain-ai/rag-from-scratch — RAG fundamentals
- microsoft/AI-For-Beginners — Surprisingly comprehensive
- modelcontextprotocol/servers — MCP integrations
- NirDiamant/GenAI_Agents — Production agent examples
- NirDiamant/RAG_Techniques — Every RAG pattern you need
- openai/evals — Evaluation framework
- openai/openai-agents-python — OpenAI Agents SDK
- promptfoo/promptfoo — Prompt testing and red-teaming
- stanfordnlp/dspy — Automated prompt engineering
- unslothai/unsloth — Fast, memory-efficient fine-tuning
YouTube channels that don't waste your time
- Andrej Karpathy — Ex-OpenAI, pure technical content
- Sam Witteveen — LangChain specialist, great energy
- Matthew Berman — Open source focus, daily videos
- Yannic Kilcher — Best paper explanations
- Krish Naik — Practical implementations
- DeepLearning.AI — Official channel with free courses
- freeCodeCamp — Long-form tutorials
- 3Blue1Brown — Best conceptual explanation of how neural networks actually work
Communities where real learning happens
- DataTalks.Club Slack — 13k+ members, super active
- r/LocalLLaMA — Where the open source community lives
- r/MachineLearning — Academic discussions
- Hugging Face Discord — The most active open-source ML community; directly tied to the tools you're using
- Twitter/X — Follow @karpathy, @sama, @ylecun, @emollick
Additional documentation, guides, and newsletters
- OpenAI Cookbook — Practical examples
- Anthropic Claude docs — Excellent MCP documentation
- Altimate Code docs — Agentic data engineering harness
- Hugging Face documentation — Models and datasets
- Google AI documentation — Gemini and more
- AWS AI/ML resources — Comprehensive guides
- Microsoft AI docs — Azure AI services
- Towards Data Science — Community articles
- Ben's Bites — Best daily AI newsletter, period (according to Anand!)
- The Rundown AI — Good alternative to Ben's Bites
- Chip Huyen's blog — Production ML wisdom
- Papers with Code — Implementation paradise
- The Gradient — Thoughtful long-form content
- Latent Space — AI engineering focused
- Sebastian Raschka — Best paper summaries
- Alpha Signal — Daily digest of top repos, papers, and models; strong signal-to-noise for staying current
After a weekend
- You've built a working RAG app
- You understand agents and can build basic ones
- You can contribute meaningfully to AI discussions
- You know the landscape and key concepts
After a month
- You can architect AI systems
- You understand multiple frameworks well
- You can evaluate and debug AI applications effectively
After three months
- You're comfortable leading AI initiatives
- You're contributing to open source projects
- You understand production deployment deeply
- You can design and implement complex AI systems