This isn't a full-time position. We're looking for trusted collaborators we bring in on a per-project basis, when we have work that fits both your profile and your schedule. Collaboration starts with one project, and if that one goes well, the next ones follow naturally.
Realistic scenario: the first call takes 30 minutes. If we click, the second call is technical: we talk about the AI systems you've put into production, how you handle hallucinations, and what your model calls cost. If we bring you onto a real project, we start with a smaller scope (2–4 weeks) so both sides can see how the collaboration works before committing to anything bigger.
The work varies from project to project. Typical tasks include:
- RAG systems over client data, from extraction and chunking to retrieval strategy and evaluation, prioritizing "no hallucinations" over "high accuracy on demo queries"
- Document classification and processing (invoices, contracts, emails, applications): OCR + LLM pipelines with a human-in-the-loop strategy for edge cases
- AI assistants for customer support—from prompt engineering to tool orchestration, with escalation to a human when the model can't deliver
- Technical estimates for new projects—you help us tell the client how much something costs, how long it takes, and whether AI is even the right solution
- API call cost optimization—caching, model routing, batch processing, everything that separates a PoC from a system that can pay for itself in production
We don't do "we'll train our own LLM from scratch" or "add a GPT-4 button to the admin panel because the client heard about AI." Projects are practical, with measurable results within 30 days.
What we expect:
- 2+ years working on AI/ML systems in production, not just Kaggle competitions and not just "I learned LangChain last month"
- Experience with LLM APIs in production—OpenAI, Anthropic, or open-weight models; you understand the trade-off between latency, cost, and quality
- Understanding of RAG architecture—embeddings, vector databases (pgvector, Pinecone, Weaviate), chunking strategies, evaluation
- Understanding of when RAG beats fine-tuning, when both are wrong, and when the problem isn't an AI problem
- Python, plus experience with at least one ML framework: PyTorch, scikit-learn, or Hugging Face transformers
- Ability to produce technical estimates: you can write "this can be done in X days at Y cost per call, with expected accuracy Z"
- Communication in English (Serbian is a plus)—you can clearly explain to non-technical people why AI isn't magic and how to measure success
- Remote discipline—you work independently, show up to weekly demos, escalate problems early
Nice to have:
- Experience fine-tuning open models on domain-specific data (LoRA, QLoRA)
- Experience with orchestration—LangChain, LlamaIndex, or custom solutions when frameworks become the problem
- Experience with MLOps tools—model versioning, drift monitoring, A/B testing models in production
- Experience with OCR and document processing—Tesseract, AWS Textract, Google Document AI
- Domain experience—fintech, medtech, legaltech, or e-commerce
- Contributions to open-source AI projects, or published technical content you can point to
What you get:
- Fair pay, on time: the rate is agreed before the project starts and paid by invoice within the agreed timeframe
- Real AI projects, not PoCs that never make it to production—before each project you know the scope, deadline, rate, and client
- A technical collaborator who gets it—you won't have to explain why 90% accuracy on demo queries is different from 90% in production, or defend the decision to measure before optimizing
- Repeat collaboration—if the first project goes well, the next one usually comes 3–6 months later
- No AI buzzword salad—no "blockchain + AI + Web3," no "this has to use AI because it's a trend," no presentations using the word "revolutionary"