Multivariate time series simulation in RapidMiner?
I'm just getting started with RapidMiner (RM). I'm an R expert but a total newbie to RM. The problem at hand is analysing and forecasting the dispersion of multivariate time series (in finance). The R code we have is very difficult for economists. I see RM as a real alternative because of its visual, "self-documenting" style and its simplicity in ETL tasks. To be honest, I fitted some models to my data as an experiment, and that was also very expressive. What are the experiences of others? Is it possible to SIMULATE a Series model in RM (or in another simple visual tool)? The Looping operators were not enough to do this because (for example in a VAR(1) setup) this kind of model simulation requires perturbing the error terms with some noise that affects the following predictions. In the extreme case it would be necessary to modify the VAR(1) parameters (to modify the base cases, e.g. the point prediction's trajectory). It's possible in R, but not too intuitive, so it would be great to implement this kind of model in RM.
           
(The problem in short: /multivariate historical data/ -> /few factors/ -> /multivariate time series model fitted to these factors with endogenous and exogenous variables and lags/ -> /simulated factors/ -> /simulated data/ -> /transform and visualize dispersion of data/)
           
I know that RM is not well suited to this task by design; it is designed for other things. In any case, I'm sure that we could use it as much more than an (on-site) ETL tool.
           
Thanks, regards
Answers
To be honest, I didn't completely understand what you want to do. What do you mean by "simulating a model"? Can you please be a bit more specific?
Best,
Marius
Sorry if I wasn't precise enough.
Let me show an example: a one-dimensional AR(1) process is given by X(t) = constant + A*X(t-1) + error, where t is the time index and the constant and A are parameters fitted to the data. The error has zero mean and constant variance. For the sake of simplicity, assume that the training data is spaced monthly and I want to know the process's one-year-ahead dispersion (e.g. as a histogram). In this case I would run many (e.g. ten thousand) 12-step-ahead simulations of the process, applying the estimated parameters and error terms drawn with a random number generator, and from the recorded 12th values X(t+12) it is possible to make a histogram. Within each run (one of the ten thousand), the effects of all earlier error term realisations persist in the trajectory of the process.
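To make the idea concrete outside of RM, here is a minimal sketch in Python with NumPy. It is only an illustration under assumptions: the parameter values are made up, the errors are assumed to be normally distributed (the description above only requires zero mean and constant variance), and the function name simulate_ar1_paths is hypothetical.

```python
import numpy as np

def simulate_ar1_paths(constant, a, sigma, x0, n_steps=12, n_paths=10_000, seed=0):
    """Simulate AR(1) paths X(t) = constant + a * X(t-1) + error.

    Returns an array of shape (n_paths, n_steps); column k holds X(t+k+1).
    """
    rng = np.random.default_rng(seed)
    # One independent shock per path and step (normality is an assumption).
    errors = rng.normal(0.0, sigma, size=(n_paths, n_steps))
    paths = np.empty((n_paths, n_steps))
    x_prev = np.full(n_paths, float(x0))
    for step in range(n_steps):
        # Each shock enters the state, so it persists in all later steps.
        x_prev = constant + a * x_prev + errors[:, step]
        paths[:, step] = x_prev
    return paths

# Hypothetical parameter values, chosen only for illustration.
paths = simulate_ar1_paths(constant=0.5, a=0.8, sigma=1.0, x0=2.0)
final_values = paths[:, -1]  # the recorded X(t+12) values
counts, bin_edges = np.histogram(final_values, bins=30)
print(final_values.mean(), final_values.std())
```

The counts and bin_edges arrays are the histogram of the 12-step-ahead values. The point is that the shocks are fed back into the state at every step, which is the part I could not reproduce with the Looping operators.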
Of course, I know that in this simplified case there is an analytic solution as well, but I would like to experiment with more complicated models in a very self-explanatory, visual way (this is why I tried RapidMiner).
Hope I was clearer now.
Many thanks,
Peter
As you know, in RapidMiner all standard operators are based on example sets and only work on one row (i.e. example) at a time. So if you want to apply standard methods to time series data, you have to encode the values of several points in time into one example and set the label to a future value (in your case t+12). The Windowing operator from the Series extension can do this for you, with no need for loops. The window_size specifies how many examples of past data are encoded into each example, and the horizon specifies the amount of time to look ahead (12 in your example).
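For readers who think in code rather than operators, the windowing idea can be sketched roughly as follows in Python with NumPy. This only mimics the concept; it is not the Windowing operator itself, and the function name make_windows and the exact row layout are assumptions for illustration.

```python
import numpy as np

def make_windows(series, window_size, horizon):
    """Encode each run of `window_size` past values as one row, with the
    value `horizon` steps after the end of the window as the label."""
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - window_size - horizon + 1
    if n_rows <= 0:
        raise ValueError("series too short for this window_size and horizon")
    # Row i holds series[i], ..., series[i + window_size - 1].
    features = np.stack([series[i:i + window_size] for i in range(n_rows)])
    # The label for row i is series[i + window_size - 1 + horizon].
    labels = series[window_size + horizon - 1:]
    return features, labels

# Toy monthly series 0, 1, ..., 29, used only to show the shapes.
monthly = np.arange(30.0)
X, y = make_windows(monthly, window_size=3, horizon=12)
print(X.shape, y.shape)
```

Each row of X is then one example in the RapidMiner sense, and y is the label 12 steps after the last value in that row.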
Does this make sense for your task?
Happy mining!
Marius
Thank you for replying. The model is already fitted; I'll paste the XML at the end of this post. The next step is harder: I would like to run many (e.g. ten thousand) 12-step-ahead simulations applying the fitted model. I see that the Predict Series operator would be elegant but, if I'm not mistaken, I cannot use random numbers as error terms with it.
Might this not be possible in RM?
Best regards,
Peter
However, you can use the Add Noise operator for both tasks: for noise prior to model learning, add the noise to the label; if you really want to add artificial noise to the predictions, do it after applying the regression model. Which brings us to the next topic:
Simply pass the output of the regression operator to an Apply Model operator. Additionally pass in the test data to that operator. The output will contain the original data plus a prediction attribute with the values estimated by the model.
To know if your chosen learning algorithm is suited for the data, you should evaluate it with the X-Validation. Additionally, for regression tasks I like to visualize the outcome of the model by plotting the prediction versus the true label. If you don't have a dedicated test set, use X-Prediction to avoid applying the model to the training data.
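As a rough illustration of the X-Prediction idea (every example is predicted by a model that did not see it during training), here is a sketch in Python with NumPy. It is not RapidMiner's implementation: the learner is plain least squares, the data is synthetic, and the function name out_of_fold_predictions is hypothetical. Note that the folds are shuffled, which ignores time ordering and may not be appropriate for series data.

```python
import numpy as np

def out_of_fold_predictions(X, y, n_folds=5, seed=0):
    """Predict each example with a linear model fitted on the other folds."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    indices = rng.permutation(len(y))
    predictions = np.empty(len(y))
    for test_idx in np.array_split(indices, n_folds):
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        # Fit on the training folds only, then predict the held-out fold.
        coef, *_ = np.linalg.lstsq(X1[train_mask], y[train_mask], rcond=None)
        predictions[test_idx] = X1[test_idx] @ coef
    return predictions

# Synthetic data, made up for the example.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - X_demo[:, 1] + rng.normal(0.0, 0.1, size=200)
pred = out_of_fold_predictions(X_demo, y_demo)
rmse = float(np.sqrt(np.mean((pred - y_demo) ** 2)))
print(rmse)
```

Plotting pred against y_demo gives the prediction-versus-true-label picture: points close to the diagonal indicate a model that suits the data.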
Happy Mining!
Marius
I'll do some experiments this weekend, thanks,
regards,
Peter