LLM Question Answering Evaluation with MLflow
Learn how to evaluate various LLMs and RAG systems with MLflow, using simple metrics such as toxicity, LLM-judged metrics such as relevance, and custom LLM-judged metrics such as professionalism.

Evaluating 🤗 Hugging Face LLMs with MLflow
Learn how to evaluate various open-source LLMs available on Hugging Face, using MLflow's built-in LLM metrics and experiment tracking to manage models and evaluation results.

RAG Evaluation with MLflow and GPT-4 as Judge
Learn how to evaluate RAG systems with MLflow, using OpenAI's GPT-4 model as a judge.

RAG Evaluation with MLflow and Llama-2-70B as Judge
Learn how to evaluate RAG systems with MLflow, using a Llama 2 70B model hosted on a Databricks serving endpoint as a judge.