"Controlling loops (break/continue)"
           Hi,
           
I have found that a process quickly becomes hard to read once it contains nested loops or branches. In some cases I would have been happy if the "Branch" operator were a simple operator instead of a super operator, one that just delivers its input data to either a 'then' or an 'else' output port. That way you could not combine or modify the input data for each port, but in most cases I didn't need that anyway. It would be a simple, clear switch that routes the data depending on a condition, and perhaps an alternative to the usual "Branch" operator for simple decisions. But this is just a thought that came to mind a few times during process design...
           
My actual question is a different one: I don't know how the different loops are translated into Java code internally, but I guess they use one of the language's standard loop constructs. Is there a way to control loops in RapidMiner the way Java allows (continue/break)? Or does this conflict with the process structure of RapidMiner? Before trying to add some (hopefully simple) operators for these tasks, I wanted to make sure it is possible at all.
Maybe you also have some alternative suggestions. In my current case I use "Loop examples" on a list of URLs, retrieve each page via "Get Page", and then do some information extraction. I already had to add a "Branch" after "Get Page" so that the process does not fail when a single page isn't retrieved properly (due to connection problems or something else). Now there are some cases in which invalid XHTML code makes the subsequent XPath interpreter abort the process. In these cases the page doesn't contain useful information, so the current loop iteration could simply stop at that point. Instead of using another super operator and putting the major part of the process inside it, I would prefer a simple single operator (or something similar) that skips or stops the iteration without producing a result. If I only eliminate the error sources (as I have done for now), I end up with mostly empty examples that have to be filtered out later.
I hope my idea and question are understandable. Perhaps I am just thinking in the wrong direction, and someone can point me towards a proper solution.
           
           
Best regards,
Matthias
           
P.S. The only related question I found was in an older topic (http://rapid-i.com/rapidforum/index.php/topic,892.0.html), which didn't provide a real solution.
Answers
Well, there's this nice operator "Handle Exception", which catches any exception thrown by its inner operators and then lets the loop continue. All you have to do is ensure that the outputs are always suitable for the following operators.
Of course the Java control structures are used to implement the loops. But breaking out of these loops isn't as simple as writing a script with "break" in it. That won't work, because at that point you are in a completely different class's method!
Coincidentally, we discussed this problem here, and I think the better place to solve this whole issue is to change the behavior of the "Get Page" operator.
What do you think?
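To illustrate the point in plain Java, here is a minimal sketch. It does not use RapidMiner's actual classes: `LoopControlSketch`, `InnerStep` and `loop` are hypothetical names made up for this example. The `for` loop lives in one method while the inner step runs in another, so the inner step cannot write `break` or `continue` for that loop. All it can do is throw an exception, and a surrounding `try`/`catch` (roughly the role "Handle Exception" plays) decides to move on to the next iteration.

```java
import java.util.ArrayList;
import java.util.List;

public class LoopControlSketch {

    /** Hypothetical stand-in for an inner operator, e.g. an extraction step. */
    interface InnerStep {
        String apply(String input) throws Exception;
    }

    /** Hypothetical stand-in for a loop operator: only this method owns the for loop. */
    static List<String> loop(List<String> inputs, InnerStep step) {
        List<String> results = new ArrayList<>();
        for (String input : inputs) {
            try {
                // The inner step runs in a different method, so it cannot
                // use "break" or "continue" on this loop.
                results.add(step.apply(input));
            } catch (Exception e) {
                // Comparable to "Handle Exception": swallow the failure and
                // skip this iteration without adding a result.
                continue;
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Made-up inner step: rejects one malformed input, upper-cases the rest.
        InnerStep extract = page -> {
            if (page.contains("<broken")) {
                throw new IllegalArgumentException("invalid XHTML");
            }
            return page.toUpperCase();
        };
        List<String> out = loop(List.of("<a/>", "<broken", "<b/>"), extract);
        System.out.println(out); // prints [<A/>, <B/>]
    }
}
```

In this sketch a failed iteration simply contributes no result, so nothing empty has to be filtered out afterwards. Whether a RapidMiner process can be arranged the same way depends on what the operators after the loop expect as input.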
Greetings,
Sebastian