InterPReT: Interactive Policy Restructuring and Training Enable Effective Imitation Learning from Laypersons

Gavin Zhu, Jean Oh, Reid Simmons — Carnegie Mellon University

Overview figure of InterPReT

InterPReT lets non-technical users teach control policies through a multi-turn loop: provide instructions, provide demonstrations, inspect agent behavior, and refine the next round.
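As a rough illustration of this loop, the sketch below pairs a "restructure" step driven by a language instruction with a "train" step that fits the restructured policy to demonstrations. Everything here is hypothetical: the feature names, the keyword-matching restructurer, and the linear least-squares policy are stand-ins for the paper's actual LLM-based restructuring and training pipeline.

```python
import numpy as np

# Hypothetical feature primitives that a user instruction might activate.
# A state is a tuple (distance_to_center, heading_error, speed).
FEATURES = {
    "distance_to_center": lambda s: s[0],
    "heading_error":      lambda s: s[1],
    "speed":              lambda s: s[2],
}

def restructure(active, instruction):
    """Add any feature mentioned in the instruction to the policy structure."""
    for name in FEATURES:
        if name.replace("_", " ") in instruction and name not in active:
            active.append(name)
    return active

def train(active, states, actions):
    """Fit linear weights over the active features to the demonstrations."""
    X = np.array([[FEATURES[f](s) for f in active] for s in states])
    w, *_ = np.linalg.lstsq(X, np.array(actions), rcond=None)
    return w

def act(active, w, state):
    """Policy output (e.g., a steering command) for one state."""
    x = np.array([FEATURES[f](state) for f in active])
    return float(x @ w)

# One teaching round: instruction, then demonstrations, then retraining.
active = ["distance_to_center"]
states  = [(0.5, 0.1, 1.0), (-0.3, -0.2, 1.2), (0.1, 0.4, 0.8)]
actions = [-0.6, 0.5, -0.5]  # demonstrated steering commands

active = restructure(active, "also consider the heading error")
w = train(active, states, actions)
```

After retraining, the user would inspect `act(active, w, state)` on test runs and decide whether to give another instruction or more demonstrations in the next round.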

Abstract

Imitation learning has shown success in many tasks by learning from expert demonstrations. However, most existing work relies on large-scale demonstrations from technical professionals and close monitoring of the training process. Both are challenging for laypersons who want to teach an agent new skills. To lower the barrier to teaching AI agents, we propose Interactive Policy Restructuring and Training (InterPReT), which takes user instructions to continually update the policy structure and optimizes its parameters to fit user demonstrations. This enables end-users to interactively give instructions and demonstrations, monitor the agent's performance, and review the agent's decision-making strategies. A user study (N=34) on teaching an AI agent to drive in a racing game confirms that, compared to a generic imitation learning baseline, our approach yields more robust policies without impairing system usability when a layperson is responsible for both giving demonstrations and deciding when to stop. This shows that our method is better suited for end-users without much technical background in machine learning to train a dependable policy.

Walkthrough: Multi-turn Teaching Session

Study Setup

We conducted an in-person between-subjects user study (N=34) on a car-racing teaching task. Participants used a gamepad interface to provide demonstrations, and the experimental group additionally provided language instructions for restructuring.

Participants first practiced controlling the car, then repeatedly taught and evaluated policies. They could choose starting conditions, provide multiple demonstrations, retrain, and test iteratively before submitting a final policy.

Interface screenshot from appendix

Appendix interface screenshot from the paper source.

Findings (Hypotheses Matched with Figures)

Additional Analysis

Related Work

[IJCAI 25] Sample-Efficient Behavior Cloning Using General Domain Knowledge

Acknowledgements

We would like to thank Shridhula Srinivasan and Justin Ma for their help in prototyping the user interface, experimenting with prompt engineering, and running some pilot user studies. Additional thanks to Justin for running some final studies.

This research has been partially supported by Microsoft Corporation as part of the Keio-CMU partnership. Feiyu is also supported by the SoftBank Group - Arm PhD Fellowship.

This webpage was created by Copilot with GPT-5.3-Codex.

BibTeX

@inproceedings{10.1145/3757279.3785549,
author = {Zhu, Feiyu Gavin and Oh, Jean and Simmons, Reid},
title = {InterPReT: Interactive Policy Restructuring and Training Enable Effective Imitation Learning from Laypersons},
year = {2026},
isbn = {9798400721281},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3757279.3785549},
doi = {10.1145/3757279.3785549},
booktitle = {Proceedings of the 21st ACM/IEEE International Conference on Human-Robot Interaction},
pages = {864–873},
numpages = {10},
keywords = {Adaptable Policy Structure, Interactive Learning, Learning from Demonstrations},
location = {Edinburgh, Scotland, UK},
series = {HRI '26}
}