Questions tagged [imitation-learning]
For questions related to imitation learning (IL), a technique closely related to reinforcement learning (RL), where a policy is learned from example trajectories of an (optimal, expert) agent's behavior. IL is similar to inverse reinforcement learning (IRL), where a reward function (rather than a policy) is learned from the expert's demonstrations; that reward function can then be used to solve the RL problem (i.e. to find a policy).
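In its simplest form (behaviour cloning), IL reduces to supervised learning on the expert's (state, action) pairs. A minimal sketch, using hypothetical toy trajectories and a 1-nearest-neighbour "policy" in place of a trained model:

```python
# Behaviour cloning in miniature: flatten expert trajectories into a
# supervised dataset of (state, action) pairs, then imitate the expert
# by copying the action recorded at the closest previously seen state.
# The trajectories below are made-up toy data for illustration only.

expert_trajectories = [
    [((0.0, 0.0), "right"), ((1.0, 0.0), "right"), ((2.0, 0.0), "up")],
    [((0.0, 1.0), "right"), ((1.0, 1.0), "up")],
]

# Flatten the trajectories into a supervised-learning dataset.
dataset = [pair for traj in expert_trajectories for pair in traj]

def policy(state):
    """Return the expert action recorded at the nearest known state."""
    def dist2(pair):
        s, _ = pair
        return sum((a - b) ** 2 for a, b in zip(s, state))
    _, action = min(dataset, key=dist2)
    return action

print(policy((0.9, 0.1)))  # nearest expert state is (1, 0) -> "right"
```

In practice the nearest-neighbour lookup would be replaced by a classifier or regressor (e.g. a neural network) trained on the same flattened dataset; the reduction to supervised learning is the same.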
15 questions
3 votes
3 answers
586 views
What are reward networks in reinforcement learning?
I am reading the article linked here: "The goal of both inverse reinforcement learning (IRL) algorithms (e.g. AIRL, GAIL) and preference comparison is to discover a reward function." In ...
1 vote
1 answer
424 views
How can imitation learning data be collected?
How can imitation learning data be collected? Can I use a neural network for that? It might be noisy. Should I collect the data manually instead?
1 vote
1 answer
86 views
Why could there be "information leak" if we do not use fixed horizons?
On the page Limitations on horizon length from the Imitation library, the authors recommend that the user stick to fixed-horizon experiments because there could be "information leak" ...
1 vote
1 answer
307 views
Why not use only expert demonstrations in Imitation Learning approaches?
Some IL approaches train the agents by using some specific ratio of expert demonstrations to trajectories generated using the policy being optimized. In the specific paper I'm reading they say "...
1 vote
0 answers
80 views
How to decide the size of the generated dataset in the DAgger algorithm
In the DAgger algorithm, how does one determine the number of samples required for one iteration of the training loop? Looking at the picture above, I understand initially, during the 1st iteration, ...
3 votes
1 answer
390 views
Is there a standardized method to train a reinforcement learning NN by demonstration?
I'm less familiar with reinforcement learning compared to other neural network learning approaches, so I'm unaware of anything exactly like what I want for an approach. I'm wondering if there are any ...
0 votes
1 answer
354 views
Action selection in Batch-Constrained Deep Q-learning (BCQ)
For simplicity, let's consider the discrete version of BCQ where the paper and the code are available. In the line 5 of Algorithm 1 we have the following: $$ a' = \text{argmax}_{a'|G_{\omega}(a', s')/\...
1 vote
1 answer
421 views
Initialising DQN with weights from imitation learning rather than policy gradient network
In AlphaGo, the authors initialised a policy gradient network with weights trained from imitation learning. I believe this gives it a very good starting policy for the policy gradient network. the ...
2 votes
0 answers
31 views
How do multiple coordinate systems help in capturing invariant features?
I've been reading this paper that formulates invariant task-parametrized HSMMs. The task parameters are represented in $F$ coordinate systems defined by $\{A_j,b_j\}_{j=1}^F$, where $A_j$ denotes the ...
6 votes
1 answer
231 views
What does the number of required expert demonstrations in Imitation Learning depend on?
I just read the following points about the number of required expert demonstrations in imitation learning, and I'd like some clarifications. For the purpose of context, I'll be using a linear reward ...
2 votes
1 answer
420 views
What is the surrogate loss function in imitation learning, and how is it different from the true cost?
I've been reading A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning lately, and I can't understand what they mean by the surrogate loss function. Some relevant ...
1 vote
1 answer
228 views
Is GAIL applicable if the expert's trajectories are for the same task but are in a different environment?
Is GAIL applicable if the expert's trajectories (sample data) are for the same task but are in a different environment (modified, but not completely different)? My gut feeling is, yes, ...
2 votes
0 answers
96 views
Can we use imitation learning for on-policy algorithms?
Imitation learning uses experiences of an (expert) agent to train another agent, in my understanding. If I want to use an on-policy algorithm, for example, Proximal Policy Optimization, because of it'...
6 votes
2 answers
3k views
What is the difference between imitation learning and classification done by experts?
In short, imitation learning means learning from the experts. Suppose I have a dataset with labels based on the actions of experts. I use a simple binary classifier algorithm to assess whether it is ...
7 votes
1 answer
219 views
In imitation learning, do you simply inject optimal tuples of experience $(s, a, r, s')$ into your experience replay buffer?
Due to my RL algorithm having difficulties learning some control actions, I've decided to use imitation learning/apprenticeship learning to guide my RL to perform the optimal actions. I've read a few ...