Learning practically feasible policies for online 3D bin packing

Source: Science China Information Sciences | Citations: 0


Authors: Hang ZHAO, Chenyang ZHU, Xin XU, Hui HUANG, Kai XU

Affiliations: School of Computer Science, National University of Defense Technology, Changsha 410073, China; College o

Journal: Science China Information Sciences

Published: 2022, Issue 1

Keywords: bin packing problem; online 3D-BPP; reinforcement learning


Abstract

We tackle the online 3D bin packing problem (3D-BPP), a challenging yet practically useful variant of the classical bin packing problem. In this problem, items are delivered to the agent without the full sequence being revealed in advance. The agent must pack each item directly and stably into the target bin in its arrival order, and no further adjustment is permitted. Online 3D-BPP can be naturally formulated as a Markov decision process (MDP). We adopt deep reinforcement learning, in particular the on-policy actor-critic framework, to solve this MDP with a constrained action space. To learn a practically feasible packing policy, we propose three critical designs. First, we propose an online analysis of packing stability based on a novel stacking tree. It attains high analysis accuracy while reducing the computational complexity from O(N²) to O(N log N), making it especially well suited to reinforcement learning training. Second, we propose learning decoupled packing policies for the different dimensions of placement, which enables high-resolution spatial discretization and hence high packing precision. Third, we introduce a reward function that guides the robot to place items in a far-to-near order and therefore simplifies collision avoidance in the motion planning of the robotic arm. Furthermore, we provide a comprehensive discussion of several key implementation issues. Extensive evaluation demonstrates that our learned policy significantly outperforms state-of-the-art methods and is practically usable in real-world applications.
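To make the idea of an MDP "with a constrained action space" more concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. It assumes the bin is discretized into a heightmap grid (`hmap[x][y]` is the stack height of cell `(x, y)`), that an action is the grid position of an item's corner, and that infeasible actions are masked out. The helper names `feasible_mask`, `place` and the threshold `min_support` are hypothetical. The stability test here is a crude support-ratio heuristic substituted for illustration only; it is not the stacking-tree analysis described in the abstract.

```python
from typing import List

# Hypothetical state representation: hmap[x][y] is the current stack height
# (in grid units) of cell (x, y) in a bin with an L x W footprint.
Heightmap = List[List[int]]


def _footprint(hmap: Heightmap, x: int, y: int, l: int, w: int) -> List[int]:
    """Heights of all cells under an l x w item whose corner is at (x, y)."""
    return [hmap[i][j] for i in range(x, x + l) for j in range(y, y + w)]


def feasible_mask(hmap: Heightmap, l: int, w: int, h: int,
                  bin_h: int, min_support: float = 0.8) -> List[List[bool]]:
    """mask[x][y] is True if putting the item's corner at (x, y) keeps it inside
    the bin and at least `min_support` of its base rests on the surface below.
    Simplified support-ratio heuristic, NOT the paper's stacking-tree analysis."""
    L, W = len(hmap), len(hmap[0])
    mask = [[False] * W for _ in range(L)]
    for x in range(L - l + 1):            # corner positions that fit in length
        for y in range(W - w + 1):        # ... and in width
            cells = _footprint(hmap, x, y, l, w)
            base = max(cells)             # item comes to rest on the tallest cell
            if base + h > bin_h:          # item would stick out of the bin
                continue
            support = sum(c == base for c in cells) / len(cells)
            mask[x][y] = support >= min_support
    return mask


def place(hmap: Heightmap, x: int, y: int, l: int, w: int, h: int) -> None:
    """Drop the item with its corner at (x, y) and raise the covered cells."""
    top = max(_footprint(hmap, x, y, l, w)) + h
    for i in range(x, x + l):
        for j in range(y, y + w):
            hmap[i][j] = top


# Usage: an empty 3 x 3 bin of height 4 receives a 2 x 2 x 1 box in one corner;
# a second 2 x 2 x 1 box is then only fully supported directly on top of it.
hmap = [[0] * 3 for _ in range(3)]
place(hmap, 0, 0, 2, 2, 1)
m = feasible_mask(hmap, 2, 2, 1, bin_h=4)
print([(x, y) for x in range(3) for y in range(3) if m[x][y]])  # [(0, 0)]
```

In an actor-critic setup, a mask like this could be applied by setting the logits of infeasible positions to a large negative value before the softmax. The mask has one entry per grid cell, so refining the discretization enlarges the action space quickly; this is presumably part of why decoupling the placement dimensions helps make high-resolution discretization tractable.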
