MM Llama3.2 for ARC-AGI Challenge Demo

The ARC-AGI Challenge is a competition to solve the puzzles from the Abstraction and Reasoning Corpus (ARC), first outlined here. This repo is a small demo project that attempts to show how multimodal (MM) understanding can help large language models (LLMs) improve their performance on these challenges, which are all very visual.

The first approach here is very naive: it converts the puzzles into the images that are shown to human players here, feeds Llama 3.2 Vision the questions in both text and image format, and finetunes it on the answers. The goal is to see whether the MM approach can improve over a purely textual approach for the same-sized Llama model.
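As a rough illustration of the puzzle-to-image step described above, the sketch below upscales an ARC grid of integers 0-9 into RGB pixels and serializes them as a PPM image using only the standard library. It is not the repo's actual conversion code: the names `ARC_PALETTE`, `render_grid`, and `to_ppm`, the cell size, and the exact color values are all assumptions made for this example.

```python
# Assumed palette: colors commonly used to display ARC grid values 0-9.
# The exact RGB values are an assumption, not taken from this repo.
ARC_PALETTE = [
    (0, 0, 0),        # 0: black
    (0, 116, 217),    # 1: blue
    (255, 65, 54),    # 2: red
    (46, 204, 64),    # 3: green
    (255, 220, 0),    # 4: yellow
    (170, 170, 170),  # 5: grey
    (240, 18, 190),   # 6: magenta
    (255, 133, 27),   # 7: orange
    (127, 219, 255),  # 8: sky blue
    (135, 12, 37),    # 9: maroon
]


def render_grid(grid, cell_size=8):
    """Upscale a grid (list of rows of ints 0-9) into rows of RGB tuples."""
    pixels = []
    for row in grid:
        pixel_row = []
        for value in row:
            # Each cell becomes cell_size pixels wide.
            pixel_row.extend([ARC_PALETTE[value]] * cell_size)
        # Each cell becomes cell_size pixels tall.
        for _ in range(cell_size):
            pixels.append(list(pixel_row))
    return pixels


def to_ppm(pixels):
    """Serialize rows of RGB tuples as a binary PPM (P6) image."""
    height, width = len(pixels), len(pixels[0])
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    body = bytes(channel for row in pixels for rgb in row for channel in rgb)
    return header + body


pixels = render_grid([[0, 1], [2, 3]], cell_size=2)
ppm_bytes = to_ppm(pixels)
```

Writing `ppm_bytes` to a file opened in binary mode gives an image most viewers can open. A real pipeline would more likely use an imaging library and the image format the vision model's preprocessor expects.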

Installation

Step 1: Install PyTorch, torchtune, and the project requirements

# Nightly install for latest features
pip install --pre torch torchvision torchao --index-url https://download.pytorch.org/whl/nightly/cu121
pip install --pre torchtune --extra-index-url https://download.pytorch.org/whl/nightly/cpu --no-cache-dir
# Install requirements
pip install -e .

Step 2: Download the models and datasets

# text only baseline model (need hf access)
tune download meta-llama/Meta-Llama-3.1-8B-Instruct --ignore-patterns "original/consolidated.00.pth"
# Vision model
tune download meta-llama/Llama-3.2-11B-Vision-Instruct --ignore-patterns "original/consolidated.00.pth"
# Download data
git clone https://github.com/fchollet/ARC-AGI.git
python format_dataset.py
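For orientation, each ARC task file is a JSON object with `train` and `test` lists of `input`/`output` grid pairs. The sketch below shows one way such a task could be flattened into a text prompt and answer for the text-only baseline. It is a hypothetical illustration, not the contents of `format_dataset.py`: the helper names `grid_to_text` and `task_to_example` and the prompt wording are assumptions.

```python
import json


def grid_to_text(grid):
    """Render a grid as one line per row, with cells separated by spaces."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)


def task_to_example(task):
    """Build a prompt/answer pair from one ARC task (hypothetical format)."""
    parts = []
    for i, pair in enumerate(task["train"], start=1):
        parts.append(f"Example {i} input:\n{grid_to_text(pair['input'])}")
        parts.append(f"Example {i} output:\n{grid_to_text(pair['output'])}")
    # Only the first test pair is used in this sketch.
    test_pair = task["test"][0]
    parts.append(f"Test input:\n{grid_to_text(test_pair['input'])}")
    return {
        "prompt": "\n\n".join(parts),
        "answer": grid_to_text(test_pair["output"]),
    }


# A tiny made-up task in the ARC JSON layout; real tasks live in the cloned repo.
sample_task = json.loads(
    '{"train": [{"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]}],'
    ' "test": [{"input": [[1, 1], [0, 0]], "output": [[0, 0], [1, 1]]}]}'
)
example = task_to_example(sample_task)
```

In the multimodal setting, the same prompt would be paired with rendered images of the grids.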

Running recipes

Run recipes as shown below. For more tune options, see the torchtune docs.

Text Recipe:

tune run full_finetune_single_device --config configs/8B_text.yaml

Text Eval:

tune run eval --config configs/8B_text_eval.yaml