Support Vector Machine Process for Text Mining
I am trying to set up a process that classifies a dataset of movie reviews into two classes, negative and positive, using a support vector machine (SVM). I split the dataset into a "training" set of 700 text reviews and a "test" set of 300 reviews, each containing both positive and negative examples. Whenever I run the process, I get this error about three quarters of the way through. I tried adding a "stopwords" dictionary to fix it, but the model then seemed to reject every other word it came across. Can someone help shed some light on this? I have attached the saved model.

Answers
You need to save the wordlist from your training data and then apply it whenever you process new reviews (using the Wordlist input of the Process Documents operator). The error occurs because the data you are trying to score produces a different wordlist, so attributes that the saved model expects from the original wordlist are not present.
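To illustrate the idea outside RapidMiner, here is a minimal Python sketch of building a wordlist once from the training reviews and reusing it for the test reviews. The function names (`build_wordlist`, `vectorize`) and the sample reviews are made up for this example, and the whitespace tokenizer is a simplifying assumption; it is not how the Process Documents operator is implemented.

```python
from collections import Counter

def tokenize(text):
    # Simplifying assumption: lowercase and split on whitespace.
    return text.lower().split()

def build_wordlist(documents):
    # Collect the sorted set of tokens seen in the training documents.
    return sorted({tok for doc in documents for tok in tokenize(doc)})

def vectorize(document, wordlist):
    # One count per wordlist entry; tokens not in the wordlist are ignored,
    # so every vector has exactly the attributes the model was trained on.
    counts = Counter(tokenize(document))
    return [counts[word] for word in wordlist]

# Hypothetical sample data standing in for the 700/300 review split.
train_reviews = ["great movie great cast", "boring plot bad acting"]
test_reviews = ["great plot but terrible pacing"]

# Build the wordlist from the training data only, then reuse it everywhere.
wordlist = build_wordlist(train_reviews)
train_vectors = [vectorize(r, wordlist) for r in train_reviews]
test_vectors = [vectorize(r, wordlist) for r in test_reviews]

print(wordlist)
print(test_vectors[0])
```

Because the test review is vectorized against the training wordlist, unseen words such as "terrible" are simply dropped and the attribute columns line up with what the model expects. Building a second wordlist from the test reviews instead would yield a different set of attributes, which is the mismatch described above.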
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts