Text Tokenization Using Regular Expression For Text Mining
Hello,
I have a problem and I need your help, please.
I want to tokenize an unstructured document using regular expressions. I have a text file in which each row contains a sentence such as:
           
1.String1 String2 String3 String4 String5
2.String6 - String7 - -
...
n.String8 - String9 String10 - (a "-" marks a missing string; here the strings in positions 2 and 5 don't exist.)
           
What I want is for the tokenization to extract each word and output the results as a table in Excel format, such as:
           
           
S1 S2 S3 S4 S5
1. String1 String2 String3 String4 String5
2. String6 - String7 - -
3.
..
n. String8 - String9 String10 -
           
           
Which operators and which regular expression structure can I use in RapidMiner?
Thank you in advance for your help.
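
For illustration only, here is a minimal sketch of the splitting logic in Python rather than RapidMiner. It assumes each row starts with a row label followed by a dot, that tokens are separated by whitespace, and that "-" marks a missing string. The names `SAMPLE`, `ROW_PATTERN`, `tokenize_rows`, and `to_csv` are made up for this example, and the output is CSV text (which Excel can open) rather than a native Excel file.

```python
import csv
import io
import re

# Hypothetical sample rows in the format described above.
SAMPLE = """1.String1 String2 String3 String4 String5
2.String6 - String7 - -
3.String8 - String9 String10 -"""

# A row label (digits or letters), a dot, then the rest of the line.
ROW_PATTERN = re.compile(r"^\s*(\w+)\.\s*(.*)$")


def tokenize_rows(text, n_columns=5):
    """Split each labelled row into exactly n_columns tokens."""
    table = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line)
        if not match:
            continue  # skip lines without a "label." prefix, e.g. "..."
        row_id, rest = match.groups()
        tokens = re.findall(r"\S+", rest)  # whitespace-separated tokens
        # Pad short rows with "-" and cut long rows to n_columns.
        tokens = (tokens + ["-"] * n_columns)[:n_columns]
        table.append([row_id + "."] + tokens)
    return table


def to_csv(table, n_columns=5):
    """Render the table as CSV text with an S1..Sn header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([""] + [f"S{i}" for i in range(1, n_columns + 1)])
    writer.writerows(table)
    return buffer.getvalue()


rows = tokenize_rows(SAMPLE)
print(to_csv(rows))
```

The same two patterns (one for the row label, one for the whitespace-separated tokens) should carry over to any tool that accepts regular expressions, though the operator setup in RapidMiner itself is not shown here.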
           
          
Answers
Best regards,
Marius