Questions tagged [imitation-learning]
For questions related to imitation learning (IL), a technique closely related to reinforcement learning (RL), where a policy is learned from example trajectories of an (optimal, expert) agent's behavior. IL is similar to inverse reinforcement learning (IRL), where a reward function (rather than a policy) is learned from the expert's demonstrations; that reward function can then be used to solve the RL problem (i.e. to find a policy).
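In its simplest form (behaviour cloning), IL reduces to supervised learning on the expert's (state, action) pairs. A minimal sketch, using hypothetical toy trajectories and a 1-nearest-neighbour "policy" in place of a trained model:

```python
# Behaviour cloning in miniature: flatten expert trajectories into a
# supervised dataset of (state, action) pairs, then imitate the expert
# by copying the action recorded at the closest previously seen state.
# The trajectories below are made-up toy data for illustration only.

expert_trajectories = [
    [((0.0, 0.0), "right"), ((1.0, 0.0), "right"), ((2.0, 0.0), "up")],
    [((0.0, 1.0), "right"), ((1.0, 1.0), "up")],
]

# Flatten the trajectories into a supervised-learning dataset.
dataset = [pair for traj in expert_trajectories for pair in traj]

def policy(state):
    """Return the expert action recorded at the nearest known state."""
    def dist2(pair):
        s, _ = pair
        return sum((a - b) ** 2 for a, b in zip(s, state))
    _, action = min(dataset, key=dist2)
    return action

print(policy((0.9, 0.1)))  # nearest expert state is (1, 0) -> "right"
```

In practice the nearest-neighbour lookup would be replaced by a classifier or regressor (e.g. a neural network) trained on the same flattened dataset; the reduction to supervised learning is the same.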
15 questions
3 votes
3 answers
586 views
What are reward networks in reinforcement learning?
I am reading the article linked here: "The goal of both inverse reinforcement learning (IRL) algorithms (e.g. AIRL, GAIL) and preference comparison is to discover a reward function." In ...
1 vote
1 answer
424 views
How can imitation learning data be collected?
How can imitation learning data be collected? Can I use a neural network for that? It might be noisy. Should I collect the data manually instead?
1 vote
1 answer
86 views
Why could there be "information leak" if we do not use fixed horizons?
On the page Limitations on horizon length from the Imitation library, the authors recommend that the user stick to fixed-horizon experiments because there could be "information leak" ...
1 vote
1 answer
307 views
Why not use only expert demonstrations in Imitation Learning approaches?
Some IL approaches train the agents by using some specific ratio of expert demonstrations to trajectories generated using the policy being optimized. In the specific paper I'm reading they say "...
1 vote
0 answers
80 views
How to decide the size of the generated dataset in the DAgger algorithm
In the DAgger algorithm, how does one determine the number of samples required for one iteration of the training loop? Looking at the picture above, I understand initially, during the 1st iteration, ...
3 votes
1 answer
390 views
Is there a standardized method to train a reinforcement learning NN by demonstration?
I'm less familiar with reinforcement learning compared to other neural network learning approaches, so I'm unaware of anything exactly like what I want for an approach. I'm wondering if there are any ...
0 votes
1 answer
354 views
Action selection in Batch-Constrained Deep Q-learning (BCQ)
For simplicity, let's consider the discrete version of BCQ where the paper and the code are available. In the line 5 of Algorithm 1 we have the following: $$ a' = \text{argmax}_{a'|G_{\omega}(a', s')/\...
1 vote
1 answer
421 views
Initialising DQN with weights from imitation learning rather than policy gradient network
In AlphaGo, the authors initialised a policy gradient network with weights trained from imitation learning. I believe this gives it a very good starting policy for the policy gradient network. the ...
2 votes
0 answers
31 views
How do multiple coordinate systems help in capturing invariant features?
I've been reading this paper that formulates invariant task-parametrized HSMMs. The task parameters are represented in $F$ coordinate systems defined by $\{A_j,b_j\}_{j=1}^F$, where $A_j$ denotes the ...
6 votes
1 answer
231 views
What does the number of required expert demonstrations in Imitation Learning depend on?
I just read the following points about the number of required expert demonstrations in imitation learning, and I'd like some clarifications. For the purpose of context, I'll be using a linear reward ...
2 votes
1 answer
420 views
What is the surrogate loss function in imitation learning, and how is it different from the true cost?
I've been reading A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning lately, and I can't understand what they mean by the surrogate loss function. Some relevant ...
1 vote
1 answer
228 views
Is GAIL applicable if the expert's trajectories are for the same task but are in a different environment?
Is GAIL applicable if the expert's trajectories (sample data) are for the same task but are in a different environment (modified, but not completely different)? My gut feeling is, yes, ...
2 votes
0 answers
96 views
Can we use imitation learning for on-policy algorithms?
Imitation learning uses experiences of an (expert) agent to train another agent, in my understanding. If I want to use an on-policy algorithm, for example, Proximal Policy Optimization, because of it'...
6 votes
2 answers
3k views
What is the difference between imitation learning and classification done by experts?
In short, imitation learning means learning from the experts. Suppose I have a dataset with labels based on the actions of experts. I use a simple binary classifier algorithm to assess whether it is ...
7 votes
1 answer
219 views
In imitation learning, do you simply inject optimal tuples of experience $(s, a, r, s')$ into your experience replay buffer?
Due to my RL algorithm having difficulties learning some control actions, I've decided to use imitation learning/apprenticeship learning to guide my RL to perform the optimal actions. I've read a few ...