Voyager: An Open-Ended Embodied Agent with Large Language Models
Abstract: We introduce Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. Voyager consists of three key components: 1) an automatic curriculum that maximizes exploration, 2) an ever-growing skill library of executable code for storing and retrieving complex behaviors, and 3) a new iterative prompting mechanism that incorporates environment feedback, execution errors, and self-verification for program improvement. Voyager interacts with GPT-4 via blackbox queries, which bypasses the need for model parameter fine-tuning. The skills developed by Voyager are temporally extended, interpretable, and compositional, which compounds the agent's abilities rapidly and alleviates catastrophic forgetting. Empirically, Voyager shows strong in-context lifelong learning capability and exhibits exceptional proficiency in playing Minecraft. It obtains 3.3x more unique items, travels 2.3x longer distances, and unlocks key tech tree milestones up to 15.3x faster than prior SOTA. Voyager is able to utilize the learned skill library in a new Minecraft world to solve novel tasks from scratch, while other techniques struggle to generalize. We open-source our full codebase and prompts at https://voyager.minedojo.org/.
Synopsis
Overview
- Keywords: Embodied agents, lifelong learning, large language models, Minecraft, automatic curriculum, skill library
- Objective: Introduce VOYAGER, an LLM-powered embodied agent capable of continuous exploration and skill acquisition in Minecraft without human intervention.
- Hypothesis: VOYAGER can outperform existing LLM-based agents in exploration and skill acquisition by combining an automatic curriculum, a skill library, and an iterative prompting mechanism.
- Innovation: The integration of an automatic curriculum, a skill library for complex behaviors, and an iterative prompting mechanism for real-time feedback and program refinement.
Background
- Preliminary Theories:
  - Lifelong Learning: The ability of an agent to learn continuously over time, adapting to new tasks without forgetting previously acquired knowledge.
  - Embodied AI: AI systems that interact with a physical or simulated environment, requiring decision-making and planning capabilities.
  - Large Language Models (LLMs): Models such as GPT-4, trained on large text corpora, that generate human-like text and can be adapted to many tasks, including writing code.
  - Curriculum Learning: A training strategy in which tasks are presented in a structured order, typically from easier to harder, so that agents build on their knowledge progressively.
 
- Prior Research:
  - Reinforcement learning techniques for game environments, emphasizing exploration and task completion.
  - The use of LLMs for planning and decision-making, showing that they can generate executable policies.
  - General-purpose LLM agents such as ReAct, Reflexion, and AutoGPT, which the paper adapts to Minecraft as baselines; they focus on reasoning and task decomposition but lack effective lifelong learning mechanisms.
 
Methodology
- Key Ideas:
  - Automatic Curriculum: Queries GPT-4 to propose the next task based on the agent's current state and exploration progress, with the goal of maximizing exploration.
  - Skill Library: A growing repository of executable code in which the agent stores and retrieves complex behaviors, making them reusable, interpretable, and composable.
  - Iterative Prompting Mechanism: A feedback loop in which the agent executes generated code, collects environment feedback, execution errors, and a self-verification verdict, and uses them to refine the program until the task succeeds or the agent moves on.
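The skill library can be pictured as a store of programs indexed by a description of what each program does, so that a new task can pull in related skills by similarity. The following is a minimal sketch under loud assumptions: `embed` is a toy bag-of-words counter standing in for a learned text-embedding model, the class and method names are hypothetical, and `k=5` is only an illustrative default. In VOYAGER the stored programs are JavaScript; here the code is kept as an opaque string.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


def embed(text: str) -> Counter[str]:
    """Toy bag-of-words 'embedding'; a stand-in for a learned text-embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter[str], b: Counter[str]) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Skill:
    name: str
    description: str
    code: str  # executable program text, treated as an opaque string in this sketch


class SkillLibrary:
    def __init__(self) -> None:
        self._skills: dict[str, tuple[Skill, Counter[str]]] = {}

    def add(self, skill: Skill) -> None:
        # Index by the description, so retrieval matches on what a skill does, not on its code.
        # Re-adding an existing name replaces the older entry.
        self._skills[skill.name] = (skill, embed(skill.description))

    def retrieve(self, query: str, k: int = 5) -> list[Skill]:
        # Rank every stored skill by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self._skills.values(), key=lambda entry: cosine(q, entry[1]), reverse=True)
        return [skill for skill, _ in ranked[:k]]


lib = SkillLibrary()
lib.add(Skill("mineWoodLog", "mine a wood log from a nearby tree", "..."))
lib.add(Skill("craftWoodenPickaxe", "craft a wooden pickaxe using planks and sticks", "..."))
lib.add(Skill("smeltIronIngot", "smelt raw iron into an iron ingot in a furnace", "..."))
print([s.name for s in lib.retrieve("craft a stone pickaxe", k=1)])
# ['craftWoodenPickaxe']
```

Indexing by description rather than by code is the design choice that makes retrieval work for unseen tasks: a query like "craft a stone pickaxe" shares vocabulary with the pickaxe-crafting skill even though no stone-pickaxe program exists yet.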
 
- Experiments:
  - Evaluated against other LLM-based agents (e.g., ReAct, Reflexion, AutoGPT) in the MineDojo framework.
  - Metrics included the number of unique items discovered, speed of tech tree mastery, and distance traversed in the game.
  - Conducted ablation studies to assess the impact of each component (curriculum, skill library, feedback mechanisms) on performance.
 
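- Implications: Because learned skills are stored as interpretable, compositional programs rather than as updated model weights, VOYAGER can reuse them across tasks and in new worlds. This helps alleviate the catastrophic forgetting seen in traditional RL and addresses the lack of persistent skill accumulation in earlier LLM agents.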
Findings
- Outcomes:
  - VOYAGER obtained 3.3× more unique items, traveled 2.3× longer distances, and unlocked key tech tree milestones up to 15.3× faster than the prior state of the art.
  - Demonstrated zero-shot generalization: using its learned skill library, VOYAGER solved novel tasks from scratch in a new Minecraft world, while other techniques struggled to generalize.
  - Ablations showed that each feedback signal in the iterative prompting mechanism (environment feedback, execution errors, self-verification) contributes to performance, with self-verification being the most important.
 
- Significance: VOYAGER's performance surpasses prior methods, highlighting the effectiveness of combining LLMs with structured learning approaches in open-ended environments. 
- Future Work: Explore multimodal capabilities, enhance the skill library with more complex behaviors, and refine the automatic curriculum for broader applications beyond Minecraft. 
- Potential Impact: Advancements in VOYAGER could lead to more capable generalist agents in various domains, including robotics and complex simulations, fostering autonomous learning and exploration. 
