Customer Stories

Overcoming the Computational Demands of Time Series

R-Based Demand Forecasting with RapidMiner

Improving the Supply Chain with Highly Accurate, Highly Scalable Demand Forecasts

Presented by Ryan Frederick, Data Science Manager at Domino's

For an organization that prides itself on reliable service and fast product delivery, forecasting demand across the entire supply chain is essential. Learn how Domino's data science team took on this challenge and, working through a complex time series forecasting exercise from prototype to delivery, discovered an innovative way to scale R-based time series models for lower error and faster runtimes.

Watch the full video below to learn how Domino's data science team used RapidMiner to improve the supply chain with scalable time series forecasting and scaled R-based models.

Get the Slides

00:04 Okay. A little bit about me. I'm a data science manager at Domino's. This is my first time at the Wisdom conference, so obviously I agreed to present what we're discussing here without any context. A former mentor of mine once told me, "Be afraid of the person who wants the microphone." Clearly, today I'm that person. Afterwards, privately, someone tell me how I did, okay? That's enough about me. At a high level, Domino's is the number one pizza company by market share in the US. The road to number one is filled with technology innovations that I think probably many of you are familiar with. You will have potentially seen some of the marketing around our mobile app. So there were times when the mobile app could do voice recognition and image recognition. We had a project where we would allow a loyalty customer to take a picture of any pizza and earn 10 loyalty points. So that's the kind of innovation we're doing. It could have been a picture of a dog toy, and it would still– a pizza-shaped dog toy, and it would still work. So it's with that kind of focus on innovation that I come to talk to you today about supply chain demand forecasting. Fun.

01:23 So the goal of the projects I do for my customer, the supply chain, is to deliver highly accurate, highly scalable demand forecasts. The problem is that the whole ecosystem shares resources, and the ecosystem is expanding rapidly. Many of us are doing data science with fixed resources. So the solution I'll walk you through is making the time series forecasting tools I use scalable, and thinking creatively to keep a small footprint so I don't impact some of my peers. At the heart of our discussion is the store inventory lifecycle, right? Everything starts with those hungry customers who place orders and deplete the store's inventory. That flows to the store operators, who count stock at the end of the day. They order inventory replenishment through an online tool. Our supply chain systems then come along, fulfill the inventory requests, and restock the stores. It's this store operator process that gives us a wealth of data to analyze. So that's where we'll be mining insights.

02:37 So our goal: highly accurate, highly scalable demand forecasts. A quick example. I can't give anyone information they could use to reverse-engineer my work. This is a plot of cheese demand in pounds, with no axes and no dates. The blue line obviously represents the cheese demand history for this set of stores, and the red dashed line is our forecast. You can see some important data points, marked with grey bars. I'd love to tell you exactly what they mean, but they could be a significant calendar event—certain days of the year when people order more pizza—or they could be a national promotion. So why do we do this? The business value comes from what we get out of the forecast, so that we can give suppliers advance notice of upcoming demand increases. Nobody likes to be shocked with large demand and have to figure out where to source the product from, so we give our suppliers a heads up. It gives us the option to reduce food waste. So in the stores and in the supply chain centers, let's optimize against food waste. And lastly, we can scale supply to meet the demand, right? So if it's going to be a lower-volume week, then maybe we don't need as many folks producing dough.

03:50 So that's where the business value comes from. You all know how important that is for selling the C-suite, for getting your product running. So how do we solve this? We have a lot of resources available at Domino's. The team I'm on is about 50 people. I couldn't possibly manage all 50; I can only handle five. But many people on that team hold advanced degrees. By the way, my point on this slide is that there are many ways we could have solved this problem, and I'm only showing you the one we did. We have advanced degrees in chemistry, computer science, applied statistics, and electrical engineering. One guy has three masters, one of which is in nuclear science—some talented people. And then we have a comprehensive tech stack, which touches on the user's desktop environment, an AI/ML server-side environment where you can run RapidMiner, JupyterHub, and RStudio. We've got a couple of Nvidia GPU servers, and our databases at the bottom there with SQL and Hadoop. Most importantly, the RapidMiner stack. And this is going to be principal to a number of the techniques I talk about. We have three queues—so if you have used RapidMiner Server, you know what a queue is. We have three of them, kind of named after who pays for them, but I have access to use any of them when I need to. And each queue has two machines underneath, with 40 cores on the data science queue, 40 on marketing, and 80 on the memory queue. So these are my tools.

05:16 The prototype we started with—not to make you read every single process—queries a SQL Server database to receive the inputs the model needs; I'll explain those in a moment. It passes the data into the model, runs the model fit and forecast, and then writes the results where they need to go, which is some downstream production systems. Raise your hand if you're a programmer. Maybe you should sit up front. The R script I'll only cover at a high level, because not everyone is a programmer. The idea here is to receive three kinds of information from the database—by the way, I should say we use Prophet, Facebook's open-source time series forecasting tool. So Prophet needs a number of inputs. The SQL query RapidMiner receives passes the example set to the forecast function. We filter it down to a single SKU / supply-center combination. So think Michigan cheese or Georgia pepperoni, filtering down to just one thing. We run fit and forecast, and then we wrap that whole thing up with a parallel process—R's doParallel package—so we can run 16 scenarios concurrently.
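The filter-fit-forecast loop described above can be sketched as follows. The real pipeline is R + Prophet + doParallel; this stdlib-only Python analogue stubs the model with a naive mean forecast, so the dispatch pattern itself is runnable. All names and the sample data here are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

# Demand history keyed by (SKU, supply center) -- e.g. Michigan cheese,
# Georgia pepperoni. Values are toy weekly demand in pounds.
HISTORY = {
    ("cheese", "michigan"): [120.0, 118.0, 131.0, 125.0],
    ("pepperoni", "georgia"): [80.0, 84.0, 79.0, 90.0],
}

def fit_and_forecast(combo, horizon=2):
    """Stand-in for the Prophet fit/predict step on one combination."""
    series = HISTORY[combo]
    level = sum(series) / len(series)          # naive "fit": mean demand
    return combo, [round(level, 1)] * horizon  # flat forecast ahead

def run_all(combos, workers=16):
    # One task per SKU/center combination, 16 at a time, mirroring the
    # 16 concurrent scenarios the talk runs via doParallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(fit_and_forecast, combos))

forecasts = run_all(list(HISTORY))
```

The same pattern scales to thousands of combinations: each task is independent, so throughput grows with the worker count.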

06:32 So on the augmented timeline, this thing is in production, so it survived the huge hurdle that some data science projects run into. Ingo mentioned this morning that fewer than 1% of projects end up in production. Here's how I got there. RapidMiner isn't indispensable at every single step; I'll focus on where it is, with just some high-level thoughts on what each milestone means. So first, we launched our prototype on a single VM—recall the RapidMiner architecture. We asked it to do 200 forecasts, and it took eight hours. Is anyone happy with 200 forecasts in eight hours? The first thing we did was look at why it took so long, and most of the time was simply retrieving data from the database. Data engineering solved that. We got down to one VM, 200 forecasts in 15 minutes. That's a lot more interesting. So then the business said, "Great, you're getting some performance from a runtime perspective. How about model performance?" So we took the original model—and I'll get into this in a minute—we did some grid search and Bayesian optimization to replace the defaults, the Facebook Prophet defaults. That took our MAPE from 6.5% to 6.23%, so we got a nice little boost from simply tuning hyperparameters. And the business said, "Hey, this is great. Okay. You've been doing a pilot set of inventory items. Let's do them all." That meant a 20X increase in terms of what they're asking the workload to do. So my runtime went to a now regrettable one VM, 4,000 forecasts, eight hours again. We're back to eight hours, and the data footprint was over 150 gigabytes on disk. So also not good, because my database is limited in size and I need to shrink the footprint.
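MAPE (mean absolute percentage error) is the metric the business used to judge the tuned model (6.5% down to 6.23% above). A minimal definition, with a toy example:

```python
# MAPE: average of |actual - forecast| / |actual|, expressed as a percent.
def mape(actual, forecast):
    terms = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(terms) / len(terms)

# Toy example: forecasts off by 5% and 10% give a 7.5% MAPE.
error = mape([100.0, 200.0], [105.0, 180.0])
```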

08:25 Back to data engineering. We used clustered columnstore indexes. If you don't know what that is, don't worry. The result was that our data footprint shrank to 5 GB. So we solved the scale-up problem, but we still had the eight-hour runtime. So what can you do besides making it run faster? You can ask it to start earlier. Most of our jobs are scheduled to start at 4 AM, or whenever we expect nobody to be around. I built a small RapidMiner process to make it event-based. It just checks: Are all predecessors done? Are all predecessors done? And the second they are, it kicks off. So I saved myself 15, 20 minutes. Big win there. And then I'll end on two things that I haven't done yet but will do soon, which will get us down to where we're going: we're going to use all six of the VMs, and we're going to do 4,000 forecasts in 27 minutes. Remember where we started: 200 forecasts, eight hours. So we're much faster, with a huge volume more of forecasts to do.

09:30 So now I'll focus your attention on where RapidMiner is indispensable in the solution. First, we wanted to tune those hyperparameters so the business would be comfortable with the accuracy. I took the function you saw earlier and parameterized it, right? I just said, "Let's allow the default variables to shift, and we'll test via what amounts to a randomized grid search over a list of scenarios." If I ran that on a single VM, it would take 60 hours, and I didn't want to wait 60 hours; I wanted to see results the next day. So this is where I used RapidMiner to do what I call parallel parallel processing. That's just my name for it. The idea is that we have a loop, a subprocess, and six schedule processes that simply point at our RapidMiner queues, each of which has two machines underneath. The way assignment works, when the listener gets the first task, it sends it to the first machine; a few milliseconds later the second task arrives at the listener, and it sends that to another machine. So I'm taking my workload from running on one machine to splitting it across six. So that's a little trick there with the Schedule Process.
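The "parallel parallel processing" trick boils down to splitting one big scenario list into mutually exclusive chunks, one per scheduled process / machine pair. A minimal sketch of that partitioning (the queue names and chunk count are illustrative; the real dispatch is done by RapidMiner's Schedule Process):

```python
# Round-robin partition of a scenario list into n mutually exclusive
# chunks, one chunk per RapidMiner queue/machine in the talk's setup.
def split_round_robin(scenarios, n_queues=6):
    chunks = [[] for _ in range(n_queues)]
    for i, scenario in enumerate(scenarios):
        chunks[i % n_queues].append(scenario)
    return chunks

# 20 toy scenarios spread across the six machines.
chunks = split_round_robin(list(range(20)), n_queues=6)
```

Each chunk then runs as an independent job, so wall-clock time drops roughly in proportion to the number of machines, which is how 60 hours became an overnight run.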

10:46 Once the grid search is done, we use the grid search results as the seed for Bayesian optimization. Again, all we do is take the existing function, parameterize parts of it, and call an R package—rBayesianOptimization—which balances the need to find hot spots given the parameters against the need to search unsearched regions. Let me pause here. Did anyone go to the hackathon yesterday? One takeaway I got from it is that low code can be better than this much code. One of my homework assignments is to figure out how to do these things with native RapidMiner functions. So how did we do with grid search plus Bayesian optimization? I already gave you the answer: MAPE went from 6.5% to 6.23%. Using our hack of scheduling subprocesses across machines, it took 10 hours instead of 60. So the next day, I had the results ready for analysis. And for any of you sitting in the front row, you might be able to read the grid on the right, which is nothing more than a list of all the scenarios we tested iterating over those default parameters. And you'll see that rBayesianOptimization did a pretty good job at finding the hot spots out there.
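A sketch of the randomized grid search that seeds the Bayesian optimization step. The two parameter names mirror Prophet's documented hyperparameters; the scoring function is a hypothetical stand-in for a full fit-and-backtest that would return MAPE per scenario.

```python
import random

# Candidate values around the Prophet defaults (illustrative grid).
GRID = {
    "changepoint_prior_scale": [0.001, 0.01, 0.05, 0.1, 0.5],
    "seasonality_prior_scale": [0.01, 0.1, 1.0, 10.0],
}

def sample_scenarios(grid, n, seed=42):
    """Randomized grid search: sample n parameter combinations."""
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in grid.items()} for _ in range(n)]

def score(params):
    # Stand-in objective: in the real pipeline this is a backtest MAPE.
    return (abs(params["changepoint_prior_scale"] - 0.05)
            + abs(params["seasonality_prior_scale"] - 1.0))

scenarios = sample_scenarios(GRID, n=8)
best = min(scenarios, key=score)  # best scenario seeds rBayesianOptimization
```

The best-scoring scenarios become the starting points that the Bayesian optimizer refines, trading off exploitation of known hot spots against exploration of unsearched regions.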

12:11 So back to my little win, event-based processing, right? I don't want to start at 4 AM only to find that not all predecessors finished before 4, so that I'd be using incomplete data. I also don't want my job to wait until 4 when all its predecessors were supposed to finish by 3 AM. So I sort of hacked RapidMiner to search for a token that says, "Everything is done. You can start now." So now I start 15 minutes early—again, 15 minutes or so. Just a quick snapshot of the event-based trigger at the top. It's just a loop that says: how many times do I run this test? The 60 I use is based purely on empirical evidence. Then there's a subprocess here that, if it runs more than 60 times, throws an error and sends me an email. So I get notified that things didn't perform the way they should. At the bottom, I have a SQL query that looks for the token I'm looking for—the one that says everything is done, with a timestamp on it—and I wrap that up with Extract Performance, one of the native operators, and I say, "Is this binary condition met or not? No? Exit." Trip everything else down the line. So what's next, right? One VM, 200 forecasts, eight hours. This part's in grey; I haven't done it yet. I'm going to do it in the next week or two. The idea is to take this process the same way that I handled the hyperparameter grid searching, split it into six mutually exclusive pieces, and hammer each of the VMs. So not everyone's going to think throwing more cores at it is the sexiest solution, but that's how I'm going to do it right now.
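The event-based trigger described above is, in essence, a bounded polling loop. A minimal sketch, where `token_present` is a hypothetical stand-in for the SQL query plus Extract Performance check in the RapidMiner process:

```python
# Poll for an "everything is done" token up to 60 times; kick off the
# forecast run as soon as it appears, and raise if the limit is hit
# (the real process sends an alert email at that point).
def token_present(attempt, appears_at=5):
    # Stand-in: real code would query the status table for the token.
    return attempt >= appears_at

def run_when_ready(max_attempts=60):
    for attempt in range(1, max_attempts + 1):
        if token_present(attempt):
            return attempt  # predecessors done: start the forecasts
    raise RuntimeError("predecessors not done after 60 checks; alerting")

started_after = run_when_ready()
```

The cap of 60 mirrors the empirically chosen limit in the talk: it distinguishes "predecessors are running late" from "something upstream is broken," so failures surface as an alert instead of a silent run on incomplete data.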

13:54 One last thing. So we use Facebook's Prophet for time series forecasting. If you've read the related issues to see where things are headed, there's likely a release coming soon where you can pass this function a true/false statement: "I want you to do Monte Carlo uncertainty sampling." Drawing the uncertainty intervals on the plot at the end is an expensive computation, and we don't need uncertainty intervals at this point, so I'd love to turn it off. I could download the source code and comment that part out, but I don't want to be maintaining code. I remember someone said in a keynote earlier that maintaining code isn't the most fun thing. So I'll wait for Facebook's release where you can simply pass a false to skip the Monte Carlo simulation. The bottom line is that takes you from a 1.3-hour runtime to 27 minutes. So that's where the gains in runtime come from. Lastly, Michael from Forrester said something about using optimization as a skill set on top of prediction, right? It's a complementary skill set. That's where we're going next. Optimization problems, really, is the call-out there. And what was it I wanted to say—he made a funny comment this morning: "Use math to spend cash," something like that. That's what we're going to do there.
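A stdlib-only illustration of why the uncertainty intervals are the expensive part: a point forecast is one pass over the horizon, while intervals simulate many sampled trend paths per step (Prophet's documented `uncertainty_samples` parameter defaults to 1000; setting it to 0/False, once supported as the talk anticipates, skips this entirely). The forecast functions below are toy stand-ins, not Prophet's model.

```python
import random

def point_forecast(horizon):
    # One evaluation per step: a toy linear trend from level 100.
    return [100.0 + 0.5 * h for h in range(horizon)]

def interval_forecast(horizon, n_samples=1000, seed=7):
    # n_samples evaluations per step: simulate trend draws and take
    # empirical 80% interval bounds, roughly how MC intervals work.
    rng = random.Random(seed)
    lo, hi = [], []
    for h in range(horizon):
        draws = sorted(100.0 + rng.gauss(0.5, 0.1) * h for _ in range(n_samples))
        lo.append(draws[int(0.1 * n_samples)])
        hi.append(draws[int(0.9 * n_samples)])
    return lo, hi

yhat = point_forecast(12)       # 12 evaluations
lo, hi = interval_forecast(12)  # 12 * 1000 evaluations for the bands
```

The roughly 1000x extra work per step is what the true/false flag would avoid, which lines up with the 1.3-hour-to-27-minute runtime gain cited above.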

15:25 So I want to close and leave some time for questions about how RapidMiner helped me. It's a low-code interface, right? If you do it right, you don't need any code at all. If you do it the way I did here, you still get fast development and fast testing. Obviously, we integrate with scripting languages. It's great for orchestrating across systems. We have the server, so we do everything server-side; I don't have to tie up my laptop for hours. It's all done with parallel execution on the server side, right? And the last thing is the event-based hack I showed to get my jobs started earlier in the day. So that's how we achieve the goal: highly accurate, highly scalable demand forecasts, with the problem that shared resources are limited. And my peer and partner data scientists are spinning up their own projects and gobbling up all those resources as we speak. So the solution is creative thinking to keep the footprint small. I went faster than I planned, so.

Related Resources