Learning practically feasible policies for online 3D bin packing

Source: Science China Information Sciences (English edition)
We tackle the online 3D bin packing problem (3D-BPP), a challenging yet practically useful variant of the classical bin packing problem. In this problem, items are delivered to the agent without the full sequence being known in advance. The agent must pack each item directly and stably into the target bin in its arrival order, and no further adjustment is permitted. Online 3D-BPP can be naturally formulated as a Markov decision process (MDP). We adopt deep reinforcement learning, in particular the on-policy actor-critic framework, to solve this MDP with a constrained action space. To learn a practically feasible packing policy, we propose three critical designs. First, we propose an online analysis of packing stability based on a novel stacking tree. It attains high analysis accuracy while reducing the computational complexity from O(N^2) to O(N log N), making it especially suited to reinforcement learning training. Second, we propose decoupled packing policy learning for the different dimensions of placement, which enables high-resolution spatial discretization and hence high packing precision. Third, we introduce a reward function that encourages the robot to place items in a far-to-near order and thereby simplifies collision avoidance in the motion planning of the robotic arm. Furthermore, we provide a comprehensive discussion of several key implementation issues. Extensive evaluation demonstrates that our learned policy significantly outperforms state-of-the-art methods and is practically usable in real-world applications.
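To make the idea of an MDP with a constrained action space concrete, here is a minimal sketch under stated assumptions. The abstract does not specify the state representation, so the discretized heightmap, the function names `feasible_mask` and `place`, and the flat-footprint support test are all hypothetical. In particular, the flat-footprint test is a crude stand-in for a stability check and is not the stacking-tree analysis described above.

```python
import numpy as np

def feasible_mask(heightmap, item, bin_height):
    """Boolean mask over corner cells (x, y) where the item may be placed.

    heightmap: 2D integer array of current stack heights (assumed state encoding).
    item: (length, width, height) in grid cells, no rotation (assumption).
    """
    L, W = heightmap.shape
    l, w, h = item
    mask = np.zeros((L, W), dtype=bool)
    for x in range(L - l + 1):
        for y in range(W - w + 1):
            footprint = heightmap[x:x + l, y:y + w]
            top = footprint.max()
            # Crude support test (assumption): the area under the item is flat.
            flat = footprint.min() == top
            # The item must also stay below the top of the bin.
            mask[x, y] = flat and (top + h <= bin_height)
    return mask

def place(heightmap, item, x, y):
    """Return the heightmap after placing the item with its corner at (x, y)."""
    l, w, h = item
    new_map = heightmap.copy()
    new_map[x:x + l, y:y + w] = heightmap[x:x + l, y:y + w].max() + h
    return new_map

# Usage: items arrive one at a time; only masked-in actions are allowed.
bin_map = np.zeros((4, 4), dtype=int)
first_mask = feasible_mask(bin_map, (2, 2, 1), bin_height=2)
bin_map = place(bin_map, (2, 2, 1), 0, 0)
second_mask = feasible_mask(bin_map, (2, 2, 2), bin_height=2)
```

In a learned policy, a mask of this kind would typically be applied to the actor's output so that infeasible placements receive zero probability; how the paper actually enforces the constraint is not stated in the abstract.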
An increasing number of deep learning methods is being applied to quantify the perception of urban environments, study the relationship between urban appearance and resident safety, and improve urban appearance. Most advanced methods extract image feature re
Participatory sensing is a promising approach with which people contribute sensory information to form a body of knowledge. In practice, people may have different ways to engage in a participatory sensing campaign. For example, there are several possible rout
This study focuses on state-feedback and output-feedback neural learning control problems for discrete-time nonlinear systems in the pure-feedback form. First, an extended result for the exponential stability for a class of discrete-time linear time-varying
This paper proposes an event-triggered robust nonlinear model predictive control (NMPC) framework for cyber-physical systems (CPS) in the presence of denial-of-service (DoS) attacks and additive disturbances. In the framework, a new robustness constraint
This paper addresses cooperative global robust output regulation for heterogeneous and uncertain multiagent nonlinear systems in the output-feedback normal form. Specifically, we develop a Lyapunov-based dynamic output-feedback law using a nonlinear interna
Multichannel adaptive signal detection uses test and training data jointly to form an adaptive detector to determine whether a target exists. The resulting adaptive detectors typically possess constant false alarm rate (CFAR) properties; thus, no additional CF
Hardware security primitives that preserve secrets are playing a crucial role in the Internet-of-Things (IoT) era. Existing physical unclonable function (PUF) instantiations, exploiting static randomness, generate challenge-response pairings (CRPs) to produce uniq
The von Neumann bottleneck and memory wall have posed fundamental limitations in latency and energy consumption of modern computers based on von Neumann architecture. In-memory computing represents a radical shift in the computer architecture that can addr