"getting started with text-mining"
 Legacy User
           Hi,
           
I want to take a look at the text-mining part of RapidMiner. I am evaluating several products as part of a case study on text mining.
So what I want to do is:
- get data input from a file or the web (I did this with the input and the crawler)
- get the feature extraction running; I can extract some entities there, right?
- get the English stop word filter running
Thanks!
Benjamin
Answers
Many operators like classification and regression methods or the PerformanceEvaluator require the input example sets to have a label or class attribute. If this is not the case, applying these operators is pointless. If you read the data using an ExampleSource, you can specify the label attribute by using a 'label' tag in the attribute description file.
Any suggestions?
I must admit I do not yet completely understand what you are trying to do. Could you please post the XML representation of your RapidMiner process?
By the way, the error you mention occurs when no label is defined but you try to do supervised learning.
Regards,
Tobias
We're doing a study on text-mining software at my university and are trying to compare the tools. That means we're looking at three different programs, and we also want to take a look at a "free" program like RapidMiner. For that I want to determine what RapidMiner is capable of.
So let's get to the technical part.
I installed the program and selected the feature selection in the wizard. As the input file I took a .txt file with some random text copied from Wikipedia. I saved the output file under a random name. When I try to run that, I get the error from the learner. I don't know if I'm missing anything, but the program is able to read the text from the file. I'm on Linux right now and will post later how I did it. But perhaps you already have some suggestions.
Regards,
Tobias
I can't get a simple run of the feature selection to work. I would just like to run it once to see what it is capable of. Since I am probably just operating it incorrectly, I took a few screenshots.
Procedure: First, the file I give it to read. It is something very simple: 10 words, each separated by a space.
Then I start the wizard and choose Feature Selection. In the Make Settings window I choose Start Configuration Wizard. There I load the file and select "use first row Column names". In the next window I leave the attribute value types as nominal. See here:
In the next window I again do not select anything new. I have also tried defining one element as the label, but that did not help.
As the filename for the attribute file I simply choose an arbitrary one and save to that file. After that, my tree looks like this:
The corresponding XML is:
If I now run the whole thing, I get this:
The error says something about labels being missing, but it would be great if you could give me more precise feedback, since I want to read in larger amounts of data as a test and analyze them.
In any case, thanks for the feedback so far.
ok, thanks for the detailed description of your problem. Now I understand what you do ... wrong!
The general reason your process does not work is properly explained by the error message you observe: you simply did not define a label. This can be done in the wizard, where you are asked to specify special attributes. Alternatively, you can finish the wizard and then put a [tt]ChangeAttributeRole[/tt] operator in your operator tree between the [tt]ExampleSource[/tt] operator and the [tt]FeatureSelection[/tt] operator. You mentioned that the first way did not solve your problem. To be honest, I doubt that it resulted in the same error. So please try again or use the second way.
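To illustrate why the missing label stops the process, here is a minimal Python sketch. It is not RapidMiner code: the `ExampleSet` class, the `set_role` method, the `require_label` function, and the sample attributes are all hypothetical and only mimic the idea of marking one attribute as the label before a supervised step runs.

```python
from dataclasses import dataclass, field


@dataclass
class ExampleSet:
    """Toy stand-in for an example set: rows plus a role per attribute."""
    rows: list
    roles: dict = field(default_factory=dict)  # attribute name -> role

    def set_role(self, attribute, role):
        # Similar in spirit to marking one attribute as the label (assumption).
        if self.rows and attribute not in self.rows[0]:
            raise KeyError(f"unknown attribute: {attribute}")
        self.roles[attribute] = role

    def label(self):
        # Return the name of the label attribute, or None if none is set.
        for attribute, role in self.roles.items():
            if role == "label":
                return attribute
        return None


def require_label(example_set):
    """A supervised step would check this first and fail early without a label."""
    name = example_set.label()
    if name is None:
        raise ValueError("input example set has no label attribute")
    return name


data = ExampleSet(rows=[
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
])

# Without a label, the supervised step refuses to run.
try:
    require_label(data)
    error_without_label = None
except ValueError as exc:
    error_without_label = str(exc)

# After assigning the label role, the same check passes.
data.set_role("play", "label")
label_name = require_label(data)
```

The point of the sketch is only the order of events: the role has to be assigned before the supervised step sees the data, whether that happens in the wizard or in a separate step in between.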
Another remark on your so-called test data. As I tried to tell you in the previous posts, data in text mining is normally not simply put into an example set (as nominal attributes) so that a learner can be applied to it afterwards. You should therefore not use the [tt]ExampleSource[/tt] operator but an input operator from the text plugin.
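As a rough picture of what such a text input step typically does instead, here is a small Python sketch under stated assumptions. It does not use the text plugin or any RapidMiner API: the `tokenize` and `word_vector` helpers, the tiny stop word list, and the two sample documents are made up for illustration. Each document becomes a vector of word counts rather than a row of nominal values.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (assumption); real filters use longer lists.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to", "in"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_vector(text, stop_words=STOP_WORDS):
    """Count how often each non-stop-word token occurs in one document."""
    return Counter(t for t in tokenize(text) if t not in stop_words)


documents = {
    "doc1": "The miner extracts the features of a text.",
    "doc2": "Text mining is the analysis of text.",
}

# One count vector per document.
vectors = {name: word_vector(text) for name, text in documents.items()}

# Shared vocabulary: every remaining term becomes one numeric attribute.
vocabulary = sorted(set().union(*vectors.values()))

# Document-term table: one row per document, one column per term.
table = {name: [vec[term] for term in vocabulary] for name, vec in vectors.items()}
```

In this form every term is a numeric attribute, which is the kind of example set a learner or a feature selection can then work on, provided a label is defined as well.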
If you only want to see the feature selection in action, I would recommend applying it to a simple, normal data set, e.g. some of the data that comes in the samples directory with RapidMiner.
Regards,
Tobias