How to group ZIP codes into ranges for a cluster analysis?
Hello,
Sorry for the simple question, but I have not worked with RapidMiner for long and I need it for my studies. I have a simple case that I cannot solve correctly: I have a dataset of 100,000 ZIP codes with customer numbers and want to analyse the best-selling areas in my country, so I decided to use cluster analysis. ZIP codes in Germany run from 00001 to 99999, and I want to build clusters from ranges, for example 00001 to 00500 and 70000 to 75000.
My question: how can I tell RapidMiner to build the clusters from these ranges?
Many thanks for your help.
Answers
Hi @a_trunk,
You can try using the Split Data operator to create partitions of your data, as in this process:
I hope it helps,
Regards,
Lionel
You might also want to create a new attribute (using Generate Attributes) that corresponds to higher-level groupings of postal codes. Using the prefix function, you can create aggregated groups at the 1-digit level, the 2-digit level, and so on. These can then be given to the clustering algorithm instead of the raw ZIP code. The problem with the raw ZIP code is that RapidMiner has no idea it encodes a hierarchical relationship: it just interprets the codes as a set of distinct nominal values.
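To make the grouping logic concrete, here is a minimal sketch in plain Python (outside RapidMiner) of the two ideas in this thread: a prefix-based group and a range-based group. The function names, the range boundaries, and the sample rows are all made up for illustration; in RapidMiner itself you would build the equivalent expression in Generate Attributes.

```python
from bisect import bisect_right
from collections import Counter


def normalize_zip(zip_code):
    """Return the ZIP code as a 5-character string, restoring leading zeros."""
    return str(zip_code).strip().zfill(5)


def zip_prefix(zip_code, digits):
    """Return the first `digits` characters of the ZIP code (the prefix idea)."""
    return normalize_zip(zip_code)[:digits]


# Hypothetical range boundaries: each entry is the lowest ZIP code of a range.
# The list must stay sorted in ascending order for bisect to work.
RANGE_STARTS = [1, 501, 70000, 75001]
RANGE_LABELS = ["00001-00500", "00501-69999", "70000-75000", "75001-99999"]


def zip_range(zip_code):
    """Map a ZIP code to the label of the hand-defined range that contains it."""
    value = int(normalize_zip(zip_code))
    index = bisect_right(RANGE_STARTS, value) - 1
    if index < 0:
        raise ValueError(f"ZIP code {zip_code!r} is below the first range")
    return RANGE_LABELS[index]


# Made-up sample rows: (ZIP code, number of customers).
rows = [("00123", 40), ("70173", 120), ("72070", 75), ("80331", 200), (1067, 30)]

# Sum customers per 1-digit prefix and per custom range.
customers_by_prefix = Counter()
customers_by_range = Counter()
for zip_code, customers in rows:
    customers_by_prefix[zip_prefix(zip_code, 1)] += customers
    customers_by_range[zip_range(zip_code)] += customers

print(dict(customers_by_prefix))
print(dict(customers_by_range))
```

Either derived column (prefix or range label) can then stand in for the raw ZIP code. Note that if the ranges are fixed by hand like this, a simple aggregation per group may already answer the "best-selling areas" question without a clustering step.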
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts