Using the CrowdPlay Dataset Engine
The CrowdPlay dataset package provides a powerful metadata engine that allows filtering of episodes by environment, recruitment channel, performance, behavioural statistics and other metadata. The following will simply load all the episodes in the dataset:
from crowdplay_datasets.dataset import get_engine_and_session, EpisodeModel
P0 = 'game_0>player_0' # shorthand for first player in each game
_, session = get_engine_and_session("crowdplay_atari-v0") # start dataset session
all_episodes = session.query(EpisodeModel).all() # get all episodes
You can then filter episodes using a standard Python list comprehension:
[e for e in all_episodes if
e.environment.keyword_data['all']['game'] == 'space_invaders' and
e.environment.keyword_data['all']['task'] == 'insideout' and
e.environment.keyword_data['all']['source'] == 'mturk' and
e.keyword_data[P0]['Score'] >= 100]
This selects all episodes where the game is Space Invaders, participants were asked to follow a specific “insideout” behavior, participants were recruited on MTurk, and the episode score was 100 or more.
It is also possible to filter episodes using SQL syntax, which offers a slight performance benefit. This uses SQLAlchemy ORM syntax; note that the EnvironmentModel and EpisodeKeywordDataModel classes need to be imported as well. The following filters for a specific task and a minimum score:
from crowdplay_datasets.dataset import EnvironmentModel, EpisodeKeywordDataModel

task = "space_invaders"  # task ID to filter for
filtered_episodes = session.query(EpisodeModel)\
    .filter(EnvironmentModel.environment_id == EpisodeModel.environment_id)\
    .filter(EnvironmentModel.task_id == task)\
    .filter(EpisodeKeywordDataModel.episode_id == EpisodeModel.episode_id)\
    .filter(EpisodeKeywordDataModel.key == 'Score')\
    .filter(EpisodeKeywordDataModel.value >= 50)\
    .all()
Loading Trajectories
EpisodeModel objects by themselves are a collection of relational metadata, but they can also be used to access the actual trajectory. We provide two methods for this: get_raw_trajectory() returns the entire trajectory as-is, with all metadata and debug data intact and with observations unprocessed, in CrowdPlay’s multiagent nested Dict format. get_processed_trajectory() returns the trajectory with all observations processed, either in a format similar to the return values of a Gym step() function, or in a format that can be used directly for D3RL training.

get_processed_trajectory() returns the trajectory as a list of numpy arrays, where each array is four stacked, downsampled frames of shape (84, 84, 4). By default it returns the trajectory of the first agent’s observations. get_processed_trajectory(framestack_axis_first=True, stack_n=1) returns the trajectory in a format that can be used directly for D3RL training. It differs from the default in that it does not stack frames (because D3RL provides its own framestacking) and that it puts the framestacking axis first, resulting in observations of shape (1, 84, 84). get_processed_trajectory(agent='game_0>player_1') and get_processed_trajectory(agent='game_0>player_1', framestack_axis_first=True, stack_n=1) return the trajectory of the second agent’s observations in two-agent environments.
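For illustration, here is a minimal sketch that loads both forms of trajectory for one of the episodes filtered earlier. It assumes the filtered_episodes variable from the query above is non-empty and reuses the same get_processed_trajectory() arguments as the d3rlpy script further down:

# Sketch: load trajectories for one of the filtered episodes.
episode = filtered_episodes[0]

# Raw trajectory, in CrowdPlay's multiagent nested Dict format.
raw_trajectory = episode.get_raw_trajectory()

# Processed trajectory in the D3RL-friendly format: observations, actions,
# rewards and terminals, with observations of shape (1, 84, 84).
o, a, r, t = episode.get_processed_trajectory(
    framestack_axis_first=True, downsample_frequency=4, downsample_offset=0, stack_n=1
)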
The raw trajectories contain observations and other information in the same format as is used in the EnvProcess episode loop. Much of this information is only useful for debugging purposes. Notable exceptions are trajectory[step_number]['info']['game_0>agent_0']['RAM'], which contains the emulator RAM state at every frame, and trajectory[step_number]['user_type']['game_0>agent_0'], which is set to 1 if the agent is controlled by a human and to 2 if the agent is controlled by an AI policy. The latter is useful in multiagent environments with fallback AIs, if you wish to distinguish the parts of an episode where an AI took over control from a disconnected human.
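As a small sketch (assuming an EpisodeModel object named episode, as in the snippet above, and that the raw trajectory is a list of per-step dicts as the indexing above suggests), these fields can be read directly:

# Sketch: inspect debug fields of a raw trajectory.
trajectory = episode.get_raw_trajectory()
ram_state = trajectory[0]['info']['game_0>agent_0']['RAM']   # emulator RAM at step 0
controller = trajectory[0]['user_type']['game_0>agent_0']    # 1 = human, 2 = AI policy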
Episode Metadata
In the CrowdPlay Atari dataset, for technical reasons agents are identified as 'game_0>player_0' and 'game_0>player_1'. Per-agent metadata is stored as key-value pairs in the keyword_data[agent_id] field of the EpisodeModel object. For instance, episode.keyword_data['game_0>player_0']['Active playtime'] returns the amount of time the agent was actively playing in the episode. Metadata stored in the keyword_data dictionary corresponds to the realtime statistics calculated using callables when the episode was recorded.
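For example, a minimal sketch that prints these statistics for the episodes filtered earlier (reusing the filtered_episodes variable and the 'Score' and 'Active playtime' keys mentioned in this document):

# Sketch: print per-agent metadata for the filtered episodes.
for episode in filtered_episodes:
    stats = episode.keyword_data['game_0>player_0']
    print(episode.episode_id, stats['Score'], stats['Active playtime'])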
It is possible to calculate metadata offline as well, and data_analysis/data_analysis.ipynb and data_analysis/batch_create_metadata provide some examples of this. Such metadata can also be stored as part of the dataset database for later use.
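As a rough sketch of such an offline calculation (again assuming raw trajectories behave as lists of per-step dicts, and reusing the filtered_episodes variable from earlier), one could compute the fraction of steps in which an agent was human-controlled, using the user_type field described above:

# Sketch: compute a per-episode statistic offline from the raw trajectory.
def human_control_fraction(episode, agent='game_0>agent_0'):
    trajectory = episode.get_raw_trajectory()
    human_steps = sum(1 for step in trajectory if step['user_type'][agent] == 1)
    return human_steps / max(len(trajectory), 1)

offline_stats = {e.episode_id: human_control_fraction(e) for e in filtered_episodes}

Values like these can then be stored in the dataset database in the same key-value form as keyword_data; the referenced data_analysis scripts show complete examples.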
Integrating Into Offline Learning Pipelines
The CrowdPlay dataset integrates easily into downstream offline learning pipelines such as d3rlpy. As a minimal end-to-end example of using d3rlpy to train an agent on the CrowdPlay Atari dataset, consider the following script:
import d3rlpy
import numpy as np
from sklearn.model_selection import train_test_split

from crowdplay_datasets.dataset import (
    EnvironmentModel,
    EpisodeKeywordDataModel,
    EpisodeModel,
    get_engine_and_session,
)
# Load all Space Invaders episodes that have score >= 50
_, session = get_engine_and_session("crowdplay_atari-v0")
episodes = (
    session.query(EpisodeModel)
    .filter(EnvironmentModel.environment_id == EpisodeModel.environment_id)
    .filter(EnvironmentModel.task_id == "space_invaders")
    .filter(EpisodeKeywordDataModel.episode_id == EpisodeModel.episode_id)
    .filter(EpisodeKeywordDataModel.key == "Score")
    .filter(EpisodeKeywordDataModel.value >= 50)
    .all()
)
# Get Gym-formatted trajectory for each episode
obs, acts, rews, term = [], [], [], []
for episode in episodes:
    (o, a, r, t) = episode.get_processed_trajectory(
        framestack_axis_first=True, downsample_frequency=4, downsample_offset=0, stack_n=1
    )
    obs.append(np.array(o))
    acts.append(np.array(a))
    rews.append(np.array(r))
    term.append(np.array(t))
# Convert into d3rlpy dataset
obs = np.concatenate(obs)
acts = np.concatenate(acts)
rews = np.concatenate(rews)
term = np.concatenate(term)
input_data = d3rlpy.dataset.MDPDataset(obs, acts, rews, term)
train_episodes, test_episodes = train_test_split(input_data, test_size=0.1)
# Create Algorithm
algo = d3rlpy.algos.DiscreteBC(
    learning_rate=3e-05,
    n_frames=4,
    q_func_factory="mean",
    batch_size=256,
    target_update_interval=2500,
    scaler="pixel",
    use_gpu=False,  # set to True to train on a GPU
)
# Train!
algo.fit(
    train_episodes,
    eval_episodes=test_episodes,
    n_steps=1000000,
    n_steps_per_epoch=1000,
)
We also provide a sample implementation of testing a variety of dataset and algorithm combinations in offline/offline_atari.py. This script was used to create the benchmarks in the CrowdPlay paper.