Learning practically feasible policies for online 3D bin packing

Source: Science China Information Sciences
We tackle the online 3D bin packing problem (3D-BPP), a challenging yet practically useful variant of the classical bin packing problem. In this problem, items are delivered to the agent one at a time, and the full item sequence is not known in advance. The agent must pack each item directly into the target bin in a stable configuration, in arrival order, and no later adjustment is permitted. Online 3D-BPP can be naturally formulated as a Markov decision process (MDP). We adopt deep reinforcement learning, in particular the on-policy actor-critic framework, to solve this MDP with a constrained action space. To learn a practically feasible packing policy, we propose three critical designs. First, we propose an online analysis of packing stability based on a novel stacking tree. It attains high analysis accuracy while reducing the computational complexity from O(N^2) to O(N log N), making it especially suited to reinforcement learning training. Second, we propose decoupled packing policy learning for the different dimensions of placement, which enables high-resolution spatial discretization and hence high packing precision. Third, we introduce a reward function that guides the robot to place items in a far-to-near order, which simplifies collision avoidance in the motion planning of the robotic arm. Furthermore, we provide a comprehensive discussion of several key implementation issues. The extensive evaluation demonstrates that our learned policy significantly outperforms state-of-the-art methods and is practically usable in real-world applications.
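To make the idea of an MDP with a constrained action space more concrete, the sketch below shows one common way to mask infeasible placements on a discretized bin. It is not the method of the paper: the heightmap state, the names `feasible_mask` and `place`, and the `min_support` threshold are all assumptions introduced for illustration, and the crude support-ratio test merely stands in for the stacking-tree stability analysis described in the abstract.

```python
import numpy as np

def feasible_mask(heightmap, item, bin_height, min_support=0.8):
    """Boolean mask over (x, y) corner cells where `item` may be placed.

    heightmap  : 2D integer array, current stack height of each bin cell (assumed state)
    item       : (length, width, height) in grid cells
    min_support: hypothetical threshold; fraction of the item's footprint
                 that must touch the supporting surface
    """
    L, W = heightmap.shape
    l, w, h = item
    mask = np.zeros((L, W), dtype=bool)
    for x in range(L - l + 1):          # footprint must stay inside the bin
        for y in range(W - w + 1):
            patch = heightmap[x:x + l, y:y + w]
            base = patch.max()          # the item rests on the tallest cell below it
            if base + h > bin_height:   # would exceed the bin's height
                continue
            # Crude stability proxy (an assumption, not the stacking tree):
            # share of the footprint that is actually in contact.
            support = np.mean(patch == base)
            mask[x, y] = support >= min_support
    return mask

def place(heightmap, item, x, y):
    """Return the new heightmap after placing `item` with its corner at (x, y)."""
    l, w, h = item
    new_map = heightmap.copy()
    base = new_map[x:x + l, y:y + w].max()
    new_map[x:x + l, y:y + w] = base + h
    return new_map

# Usage: a 4x4 bin of height 3, packing two 2x2x1 items in arrival order.
state = np.zeros((4, 4), dtype=int)
mask0 = feasible_mask(state, (2, 2, 1), bin_height=3)
state = place(state, (2, 2, 1), 0, 0)
mask1 = feasible_mask(state, (2, 2, 1), bin_height=3)
```

In a reinforcement learning setup, a mask like this would typically be applied to the policy's action probabilities so that only feasible placements can be sampled; how the paper constrains its action space in detail is not stated in the abstract.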