Epikurious

Podcast by Alejandro Santamaria Arza

English

News and politics

About Epikurious

Cravings of knowledge around tech, AI and the mind

All episodes

14 episodes

From Bias to Balance: Navigating LLM Evaluations

This research paper explores the challenges of evaluating Large Language Model (LLM) outputs and introduces EvalGen, a new interface designed to improve the alignment between LLM-generated evaluations and human preferences. EvalGen uses a mixed-initiative approach, combining automated LLM assistance with human feedback to generate and refine evaluation criteria and assertions. The study highlights a phenomenon called "criteria drift," where the process of grading outputs helps users define and refine their evaluation criteria. A qualitative user study demonstrates overall support for EvalGen, but also reveals complexities in aligning automated evaluations with human judgment, particularly regarding the subjective nature of evaluation and the iterative process of alignment. The authors conclude by discussing implications for future LLM evaluation assistants.

Dec 5, 2024 - 17 min

The LLM Performance Lab: Testing, Tuning, and Triumphs

Both sources discuss building effective evaluation systems for Large Language Model (LLM) applications. The YouTube transcript details a case study where a real estate AI assistant, initially improved through prompt engineering, plateaued until a comprehensive evaluation framework was implemented, dramatically increasing success rates. The blog post expands on this framework, outlining a three-level evaluation process—unit tests, human and model evaluation, and A/B testing—emphasizing the importance of removing friction from data analysis and iterative improvement. Both sources highlight the crucial role of evaluation in overcoming the challenges of LLM development, advocating for domain-specific evaluations over generic approaches. The blog post further explores leveraging the evaluation framework for fine-tuning and debugging, demonstrating the synergistic relationship between robust evaluation and overall product success.

Dec 5, 2024 - 24 min
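
The first level of the evaluation process described in this episode (unit tests) could look roughly like the sketch below. It is only an illustration: the check names, the listing ID, and the sample replies are hypothetical and do not come from the episode's sources.

```python
import re

# Hypothetical checks for a real-estate assistant reply. Neither the check
# names nor the reply format come from the episode's sources.

def no_unfilled_placeholder(reply: str) -> bool:
    """Pass if no template placeholder such as {first_name} leaked into the reply."""
    return re.search(r"\{[A-Za-z_]+\}", reply) is None

def mentions_listing(reply: str, listing_id: str) -> bool:
    """Pass if the reply refers to the listing the user asked about."""
    return listing_id in reply

def run_unit_checks(reply: str, listing_id: str) -> dict[str, bool]:
    """Run every check on one reply and report the result per check."""
    return {
        "no_unfilled_placeholder": no_unfilled_placeholder(reply),
        "mentions_listing": mentions_listing(reply, listing_id),
    }

def pass_rate(replies: list[str], listing_id: str) -> float:
    """Fraction of replies that pass every check."""
    if not replies:
        return 0.0
    passed = sum(all(run_unit_checks(r, listing_id).values()) for r in replies)
    return passed / len(replies)

# Made-up replies: the second one leaks a template placeholder.
sample_replies = [
    "Listing 4521 has three bedrooms and a garden.",
    "Hi {first_name}, listing 4521 is still available.",
]
print(pass_rate(sample_replies, "4521"))
```

Tracking a pass rate like this over time is one way to notice a plateau that prompt changes alone no longer move.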

RAGified: Smarter AI Conversations

Retrieval-Augmented Generation (RAG) applications, integrating information retrieval with language generation, are examined in this technical document. The paper explores methodologies for improving RAG performance, including iterative refinement and robust evaluation frameworks. Key challenges like context limitations and data quality issues are discussed alongside proposed solutions such as improved prompt engineering and effective data management. Finally, the document provides case studies illustrating RAG applications in various fields, along with a look toward the future directions of the technology.

Dec 5, 2024 - 14 min

Beyond the Benchmark: Crafting the Future of AI Agent Evaluation and Optimization

This research paper assesses the current state of AI agent benchmarking, highlighting critical flaws hindering real-world applicability. The authors identify shortcomings in existing benchmarks, including a narrow focus on accuracy without considering cost, conflation of model and downstream developer needs, inadequate holdout sets leading to overfitting, and a lack of standardization impacting reproducibility. They propose a framework to address these issues, advocating for cost-controlled evaluations, joint optimization of accuracy and cost, distinct benchmarking for model and downstream developers, and standardized evaluation practices to foster the development of truly useful AI agents. Their analysis uses case studies on several prominent benchmarks to illustrate the identified problems and proposed solutions. The ultimate goal is to improve the rigor and reliability of AI agent evaluation.

Dec 3, 2024 - 18 min
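
One common way to reason about accuracy and cost jointly, as this episode's summary advocates, is to keep only the agents that no other agent beats on both axes (a Pareto frontier). The sketch below assumes that reading; the agent names and numbers are invented for illustration and are not results from the paper.

```python
from typing import NamedTuple

class AgentResult(NamedTuple):
    name: str
    accuracy: float   # higher is better
    cost_usd: float   # lower is better

def pareto_frontier(results: list[AgentResult]) -> list[AgentResult]:
    """Return the agents that are not dominated on (accuracy, cost).

    An agent is dominated when another agent is at least as accurate and
    at least as cheap, and strictly better on one of the two.
    """
    frontier = []
    for r in results:
        dominated = any(
            o.accuracy >= r.accuracy
            and o.cost_usd <= r.cost_usd
            and (o.accuracy > r.accuracy or o.cost_usd < r.cost_usd)
            for o in results
        )
        if not dominated:
            frontier.append(r)
    # Sort by cost so the trade-off reads from cheapest to most expensive.
    return sorted(frontier, key=lambda r: r.cost_usd)

# Invented numbers, for illustration only.
runs = [
    AgentResult("A", accuracy=0.70, cost_usd=1.0),
    AgentResult("B", accuracy=0.72, cost_usd=5.0),
    AgentResult("C", accuracy=0.65, cost_usd=2.0),
    AgentResult("D", accuracy=0.72, cost_usd=6.0),
]
for agent in pareto_frontier(runs):
    print(agent.name, agent.accuracy, agent.cost_usd)
```

Here C and D drop out because A and B respectively match or beat them on accuracy at lower cost, which is the kind of comparison an accuracy-only leaderboard would hide.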

From Prompt Engineering to AI Agent Frameworks: A Complete Guide

This text presents a two-level learning roadmap for developing AI agents. Level 1 focuses on foundational knowledge, including generative AI, large language models (LLMs), prompt engineering, data handling, API wrappers, and Retrieval-Augmented Generation (RAG). Level 2 builds upon this foundation by exploring AI agent frameworks like LangChain, constructing simple agents, implementing agentic workflows and memory, evaluating agent performance, and mastering multi-agent collaboration and RAG within an agentic context. The roadmap aims to provide a structured path for learners to acquire the necessary skills in building and deploying AI agents. Free learning resources are offered to aid in the learning process.

Dec 3, 2024 - 6 min
