# Video-Pre-Training
Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos
> :page_facing_up: [Read Paper](https://cdn.openai.com/vpt/Paper.pdf) \
:mega: [Blog Post](https://openai.com/blog/vpt) \
:space_invader: [MineRL Environment](https://github.com/minerllabs/minerl) (note version 1.0+ required) \
:checkered_flag: [MineRL BASALT Competition](https://www.aicrowd.com/challenges/neurips-2022-minerl-basalt-competition)
# Running agent models
Install pre-requirements for [MineRL](https://minerl.readthedocs.io/en/latest/tutorials/index.html).
Then install requirements with:
```
pip install git+https://github.com/minerllabs/minerl
pip install -r requirements.txt
```
> :warning: Note: For reproducibility reasons, the PyTorch version is pinned as `torch==1.9.0`, which is incompatible with Python 3.10 or higher. If you are using Python 3.10 or higher, install a [newer version of PyTorch](https://pytorch.org/get-started/locally/) (usually, `pip install torch`). However, note that this *might* subtly change model behaviour (e.g., the agent still acts mostly as expected but does not reach the reported performance).
To run the code, call
```
python run_agent.py --model [path to .model file] --weights [path to .weight file]
```
After loading up, you should see a window of the agent playing Minecraft.
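If you prefer to drive the agent from your own script rather than through `run_agent.py`, the following is a rough sketch of what that script does internally. The `MineRLAgent` class, its `load_weights`/`get_action` methods, and the layout of the pickled `.model` file are assumptions based on `agent.py` and `run_agent.py` in this repository; the environment name is only an example.
```python
# Minimal sketch of loading an agent programmatically (names assumed from agent.py).
import pickle

import gym
import minerl  # noqa: F401  -- importing minerl registers the MineRL environments

from agent import MineRLAgent

MODEL_FILE = "foundation-model-1x.model"      # architecture / hyperparameters
WEIGHTS_FILE = "foundation-model-1x.weights"  # trained parameters

env = gym.make("MineRLBasaltFindCave-v0")

# The .model file is assumed to be a pickled dict of the hyperparameters used to build the policy.
agent_parameters = pickle.load(open(MODEL_FILE, "rb"))
policy_kwargs = agent_parameters["model"]["args"]["net"]["args"]
pi_head_kwargs = agent_parameters["model"]["args"]["pi_head_opts"]

agent = MineRLAgent(env, policy_kwargs=policy_kwargs, pi_head_kwargs=pi_head_kwargs)
agent.load_weights(WEIGHTS_FILE)

obs = env.reset()
done = False
while not done:
    action = agent.get_action(obs)
    obs, reward, done, info = env.step(action)
    env.render()
```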
# Agent Model Zoo
Below are the model files and weights files for various pre-trained Minecraft models.
The 1x, 2x, and 3x model files correspond to models of increasing width; use each with the weights files of the matching width.
* [:arrow_down: 1x Model](https://openaipublic.blob.core.windows.net/minecraft-rl/models/foundation-model-1x.model)
* [:arrow_down: 2x Model](https://openaipublic.blob.core.windows.net/minecraft-rl/models/2x.model)
* [:arrow_down: 3x Model](https://openaipublic.blob.core.windows.net/minecraft-rl/models/foundation-model-3x.model)
### Demonstration Only - Behavioral Cloning
These models are trained on video demonstrations of humans playing Minecraft
using behavioral cloning (BC) and are more general than the later models, which
use reinforcement learning (RL) to further optimize the policy.
Foundational models are trained across all videos in a single training run,
while the house and early-game models further fine-tune the foundational model
of the corresponding width on either the house-building contractor data or the
early-game video subset. See the paper linked above for more details.
#### Foundational Model :chart_with_upwards_trend:
* [:arrow_down: 1x Width Weights](https://openaipublic.blob.core.windows.net/minecraft-rl/models/foundation-model-1x.weights)
* [:arrow_down: 2x Width Weights](https://openaipublic.blob.core.windows.net/minecraft-rl/models/foundation-model-2x.weights)
* [:arrow_down: 3x Width Weights](https://openaipublic.blob.core.windows.net/minecraft-rl/models/foundation-model-3x.weights)
#### Fine-Tuned from House :chart_with_upwards_trend:
* [:arrow_down: 3x Width Weights](https://openaipublic.blob.core.windows.net/minecraft-rl/models/bc-house-3x.weights)
#### Fine-Tuned from Early Game :chart_with_upwards_trend:
* [:arrow_down: 2x Width Weights](https://openaipublic.blob.core.windows.net/minecraft-rl/models/bc-early-game-2x.weights)
* [:arrow_down: 3x Width Weights](https://openaipublic.blob.core.windows.net/minecraft-rl/models/bc-early-game-3x.weights)
### Models With Environment Interactions
These models further refine the demonstration-based models above with a reward
function targeted at obtaining diamond pickaxes. While less general than the behavioral
cloning models, they have the benefit of interacting with the environment
using a reward function and excel at progressing through the tech tree quickly.
See the paper for more information
on how they were trained and the exact reward schedule.
#### RL from Foundation :chart_with_upwards_trend:
* [:arrow_down: 2x Width Weights](https://openaipublic.blob.core.windows.net/minecraft-rl/models/rl-from-foundation-2x.weights)
#### RL from House :chart_with_upwards_trend:
* [:arrow_down: 2x Width Weights](https://openaipublic.blob.core.windows.net/minecraft-rl/models/rl-from-house-2x.weights)
#### RL from Early Game :chart_with_upwards_trend:
* [:arrow_down: 2x Width Weights](https://openaipublic.blob.core.windows.net/minecraft-rl/models/rl-from-early-game-2x.weights)
# Running Inverse Dynamics Model (IDM)
The IDM aims to predict which actions the player is taking in a video recording.
Setup:
* Install requirements: `pip install -r requirements.txt`
* Download the IDM model [.model :arrow_down:](https://openaipublic.blob.core.windows.net/minecraft-rl/idm/4x_idm.model) and [.weight :arrow_down:](https://openaipublic.blob.core.windows.net/minecraft-rl/idm/4x_idm.weights) files
* For demonstration purposes, you can use the contractor recordings shared below. For this demo we use
[this .mp4](https://openaipublic.blob.core.windows.net/minecraft-rl/data/10.0/cheeky-cornflower-setter-02e496ce4abb-20220421-092639.mp4)
and [this associated actions file (.jsonl)](https://openaipublic.blob.core.windows.net/minecraft-rl/data/10.0/cheeky-cornflower-setter-02e496ce4abb-20220421-092639.jsonl).
To run the model, with the above files placed in the root directory of this code:
```
python run_inverse_dynamics_model.py --weights 4x_idm.weights --model 4x_idm.model --video-path cheeky-cornflower-setter-02e496ce4abb-20220421-092639.mp4 --jsonl-path cheeky-cornflower-setter-02e496ce4abb-20220421-092639.jsonl
```
A window should pop up that plays the video frame by frame, showing the predicted and true (recorded) actions side by side on the left.
Note that `run_inverse_dynamics_model.py` is designed to be a demo of the IDM, not code for putting it into practice.
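If you do want to apply the IDM to your own recordings, the following sketch shows one way to do it. The `IDMAgent` class, its `load_weights`/`predict_actions` methods, and the layout of the pickled `.model` file are assumptions based on `inverse_dynamics_model.py` and `run_inverse_dynamics_model.py`; check those files if anything differs.
```python
# Rough sketch of running the IDM on a chunk of video frames (names assumed from this repo).
import pickle

import cv2
import numpy as np

from inverse_dynamics_model import IDMAgent

agent_parameters = pickle.load(open("4x_idm.model", "rb"))
net_kwargs = agent_parameters["model"]["args"]["net"]["args"]
pi_head_kwargs = agent_parameters["model"]["args"]["pi_head_opts"]
agent = IDMAgent(idm_net_kwargs=net_kwargs, pi_head_kwargs=pi_head_kwargs)
agent.load_weights("4x_idm.weights")

# Read a window of RGB frames from a contractor-style 360p recording
# (the IDM predicts actions for a window of frames at once).
cap = cv2.VideoCapture("cheeky-cornflower-setter-02e496ce4abb-20220421-092639.mp4")
frames = []
while len(frames) < 128:
    ret, frame = cap.read()
    if not ret:
        break
    frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
cap.release()

# Assumed to return a dict mapping action names to per-frame predictions.
predicted_actions = agent.predict_actions(np.stack(frames))
print(predicted_actions.keys())
```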
# Using behavioural cloning to fine-tune the models
**Disclaimer:** This code is a rough demonstration only and not an exact recreation of what the original VPT paper did (but it contains some preprocessing steps you will want to be aware of)! As such, do not expect to replicate the original experiments with this code. This code has been designed to be runnable on consumer hardware (e.g., 8 GB of VRAM).
Setup:
* Install requirements: `pip install -r requirements.txt`
* Download the `.weights` and `.model` files for the model you want to fine-tune.
* Download the contractor data (below) and place the `.mp4` and `.jsonl` files in the same directory (e.g., `data`). With default settings, you need at least 12 recordings.
If you downloaded the "1x Width" models and placed some data under the `data` directory, you can perform fine-tuning with
```
python behavioural_cloning.py --data-dir data --in-model foundation-model-1x.model --in-weights foundation-model-1x.weights --out-weights finetuned-1x.weights
```
You can then use `finetuned-1x.weights` when running the agent. You can change the training settings at the top of `behavioural_cloning.py`.
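For example, reusing the `run_agent.py` invocation from above:
```
python run_agent.py --model foundation-model-1x.model --weights finetuned-1x.weights
```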
Major limitations:
- Only trains a single timestep at a time, i.e., errors are not propagated through timesteps.
- Computes gradients one sample at a time to keep memory use low, which also slows down training.
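As a toy illustration of that second point (this is not the repo's training code; the stand-in policy, frame size, and action-space size below are arbitrary placeholders), accumulating gradients one sample at a time with no backpropagation through time looks roughly like this:
```python
import torch
import torch.nn as nn

# Stand-in policy: any module mapping a frame to action logits (sizes are placeholders).
policy = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 121))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-5)

# Fake "batch": (frame, action index) pairs standing in for video frames and recorded actions.
batch = [(torch.rand(1, 3, 64, 64), torch.tensor([7])) for _ in range(16)]

optimizer.zero_grad()
for frame, action in batch:                            # one sample at a time, each timestep independent
    logits = policy(frame)
    loss = nn.functional.cross_entropy(logits, action) / len(batch)
    loss.backward()                                    # gradients accumulate across samples
optimizer.step()
```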
# Contractor Demonstrations
### Versions
Over the course of the project we requested various demonstrations from contractors,
which we release as index files below. In general, major recorder versions change for a new
prompt or recording feature, while bug fixes were represented as minor version changes.
However, for some recorder versions we asked contractors to change their username when recording particular
modalities. Also, because contractors asked questions internally, a clarification given to one contractor may
have resulted in a behavioral change in another contractor. It is intractable to share every contractor's
view for each version, but we've shared the prompts and major clarifications for each recorder
version where the task changed significantly.
The following is a list of the available versions:
* **6.x** Core recorder features subject to change [:arrow_down: index file](https://openaipublic.blob.core.windows.net/minecraft-rl/snapshots/all_6xx_Jun_29.json)
  * 6.9 First feature-complete recorder version
  * 6.10 Fixes mouse scaling on Mac when the GUI is open
  * 6.11 Tracks the hotbar slot
  * 6.13 Sprinting, swap-hands, ... (see commits below)
* **7.x** Prompt changes [:arrow_down: index file](https://openaipublic.blob.core.windows.net/minecraft-rl/snapshots/all_7xx_Apr_6.json)
  * 7.6 Bump version for internal tracking
* **8.x** :clipboard: House Building from Scratch Task [:arrow_down: index](https://openaipublic.blob.core.windows.net/minecraft-rl/snapshots/all_8xx_Jun_29.json)
  * Note that this version introduced a 10-minute timer that ends the episode; it occasionally
    cut experiments short and was fixed in 9.1.
  * 8.0 Simple House
  * 8.2 Update upload script
* **9.x** :clipboard: House Building from Random Starting Materials Task [:arrow_down: index](https://openaipublic.blob.core.windows.net/minecraft-rl/snapshots/all_9xx_Jun_29.json)

  Contractors start each episode in this task with a randomized set of building materials in their hotbar and inventory, generated by recorder logic along the following lines:

  ```java
  List<ItemStack> hotbar = new ArrayList<>();
  List<ItemStack> inventory = new ArrayList<>();
  // Ensure we give the player the basic tools in their hotbar
  hotbar.add(new ItemStack(Items.STONE_AXE));
  hotbar.add(new ItemStack(Items.STONE_PICKAXE));
  hotbar.add(new ItemStack(Items.STONE_SHOVEL));
  hotbar.add(new ItemStack(Items.CRAFTING_TABLE));
  // Add some random items to the player hotbar as well
  addToList(hotbar, inventory, Items.TORCH, random.nextInt(16) * 2 + 2);
  // Next add main building blocks
  if (random.nextFloat() < 0.7) {
      addToList(hotbar, inventory, Items.OAK_FENCE_GATE, random.nextInt(5));
      addToList(hotbar, inventory, Items.OAK_FENCE, random.nextInt(5) * 64);
      addToList(hotbar, inventory, Items.OAK_DOOR, random.nextInt(5));
      addToList(hotbar, inventory, Items.OAK_TRAPDOOR, random.nextInt(2) * 2);
      addToList(hotbar, inventory, Items.OAK_PLANKS, random.nextInt(3) * 64 + 128);
      addToList(hotbar, inventory, Items.OAK_SLAB, random.nextInt(3) * 64);
      addToList(hotbar, inventory, Items.OAK_STAIRS, random.nextInt(3) * 64);
      addToList(hotbar, inventory, Items.OAK_LOG, random.nextInt(2) * 32);
      addToList(hotbar, inventory, Items.OAK_PRESSURE_PLATE, random.nextInt(5));
  } else {
      addToList(hotbar, inventory, Items.BIRCH_FENCE_GATE, random.nextInt(5));
      addToList(hotbar, inventory, Items.BIRCH_FENCE, random.nextInt(5) * 64);
      addToList(hotbar, inventory, Items.BIRCH_DOOR, random.nextInt(5));
      addToList(hotbar, inventory, Items.BIRCH_TRAPDOOR, random.nextInt(2) * 2);
      addToList(hotbar, inventory, Items.BIRCH_PLANKS, random.nextInt(3) * 64 + 128);
      addToList(hotbar, inventory, Items.BIRCH_SLAB, random.nextInt(3) * 64);
      addToList(hotbar, inventory, Items.BIRCH_STAIRS, random.nextInt(3) * 64);
      addToList(hotbar, inventory, Items.BIRCH_LOG, random.nextInt(2) * 32);
      addToList(hotbar, inventory, Items.BIRCH_PRESSURE_PLATE, random.nextInt(5));
  }
  // Now add some random decoration items to the player inventory
  addToList(hotbar, inventory, Items.CHEST, random.nextInt(3));
  addToList(hotbar, inventory, Items.FURNACE, random.nextInt(2) + 1);
  addToList(hotbar, inventory, Items.GLASS_PANE, random.nextInt(5) * 4);
  addToList(hotbar, inventory, Items.WHITE_BED, (int) (random.nextFloat() + 0.2)); // Bed 20% of the time
  addToList(hotbar, inventory, Items.PAINTING, (int) (random.nextFloat() + 0.1)); // Painting 10% of the time
  addToList(hotbar, inventory, Items.FLOWER_POT, (int) (random.nextFloat() + 0.1) * 4); // 4 Flower pots 10% of the time
  addToList(hotbar, inventory, Items.OXEYE_DAISY, (int) (random.nextFloat() + 0.1) * 4); // 4 Oxeye daisies 10% of the time
  addToList(hotbar, inventory, Items.POPPY, (int) (random.nextFloat() + 0.1) * 4); // 4 Poppies 10% of the time
  addToList(hotbar, inventory, Items.SUNFLOWER, (int) (random.nextFloat() + 0.1) * 4); // 4 Sunflowers 10% of the time
  // Shuffle the hotbar slots and inventory slots
  Collections.shuffle(hotbar);
  Collections.shuffle(inventory);
  // Give the player the items
  this.mc.getIntegratedServer().getPlayerList().getPlayers().forEach(p -> {
      if (p.getUniqueID().equals(this.getUniqueID())) {
          hotbar.forEach(p.inventory::addItemStackToInventory);
          inventory.forEach(p.inventory::addItemStackToInventory);
      }
  });
  ```

  * 9.0 First version
  * 9.1 Fixed timer bug
* **10.0** :clipboard: Obtain Diamond Pickaxe Task [:arrow_down: index](https://openaipublic.blob.core.windows.net/minecraft-rl/snapshots/all_10xx_Jun_29.json)
Sometimes we asked the contractors to signify other tasks in ways other than changing the version (for example, by changing their username). This
primarily occurred in versions 6 and 7, as versions 8, 9, and 10 are all task specific.
### Environment
We restrict the contractors to playing Minecraft in windowed mode at 720p, which we downsample to 360p at 20 Hz
to minimize storage space. We also disabled the options screen to prevent the contractors from
changing settings such as brightness or rendering options. We ask contractors not to press keys
such as F3, which shows a debug overlay; however, some contractors may still do this.
### Data format
Demonstrations are broken up into segments of up to 5 minutes, each consisting of a series of
compressed screen observations, actions, environment statistics, and a checkpoint
save file from the start of the segment. Each relative path in the index
has all the files for that segment; however, if a file was dropped while
uploading, the corresponding relative path is not included in the index, so
there may be missing chunks in otherwise continuous demonstrations.
Index files are provided for each version as a json file:
```json
{
    "basedir": "https://openaipublic.blob.core.windows.net/data/",
    "relpaths": [
        "8.0/cheeky-cornflower-setter-74ae6c2eae2e-20220315-122354",
        ...
    ]
}
```
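Given such an index file, downloading the recordings it lists might look like the sketch below. The `.mp4`/`.jsonl` suffix convention is an assumption inferred from the demo recording linked in the IDM section above, not a documented guarantee.
```python
# Sketch: fetch a handful of recordings listed in a version index file.
import json
import os
import urllib.request

INDEX_URL = "https://openaipublic.blob.core.windows.net/minecraft-rl/snapshots/all_10xx_Jun_29.json"
OUT_DIR = "data"
N_DEMOS = 12  # the behavioural cloning demo above needs at least 12 recordings

os.makedirs(OUT_DIR, exist_ok=True)
with urllib.request.urlopen(INDEX_URL) as response:
    index = json.load(response)

basedir = index["basedir"]
for relpath in index["relpaths"][:N_DEMOS]:
    for suffix in (".mp4", ".jsonl"):  # assumed file suffixes for video and actions
        url = basedir + relpath + suffix
        out_path = os.path.join(OUT_DIR, os.path.basename(relpath) + suffix)
        print("Downloading", url)
        urllib.request.urlretrieve(url, out_path)
```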
Relative paths have the following format:
* `/---