"Similarity between two documents using Cross Distances"
- I am using RapidMiner to compare the similarity between two text fields in two sheets of the same Excel file using Cross Distances. I want to compare one request with all references and return the similarity value computed by cosine similarity. The problem is that the distance is returned as a question mark '?', and I do not know the reason.
<parameter key="transform_to" value="lower case"/>
<parameter key="directory" value="/Users/computer/Downloads/WordNet-3.0/dict"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
<parameter key="transform_to" value="lower case"/>
<parameter key="directory" value="/Users/computer/Downloads/WordNet-3.0/dict"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
Read Requirements Document
Read Requirements Change Requests



Best Answers
- 
         lionelderkrikor
           Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
Hi @asafwat,
I think I found part of the answer (the calculated distances/similarities now have numerical values):
The documentation of the Cross-Distances operator says: "Please note that both input ExampleSets should have the same attributes and in the same order". So you have to use a Superset operator (cf. the documentation of this operator) to feed the req and ref ports of the Cross-Distances operator with two datasets that have strictly the same attributes.
Moreover, I made some modifications to your process:
- in the Process Documents from Data operators: vector creation -> Term Occurrences.
- in the Tokenize operators: mode -> non letters.
The process:
<parameter key="directory" value="/Users/computer/Downloads/WordNet-3.0/dict"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
<parameter key="directory" value="/Users/computer/Downloads/WordNet-3.0/dict"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
Read Requirements Document
Read Requirements Change Requests
I hope it helps,
Regards,
Lionel
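The attribute-alignment idea in the answer above can be illustrated outside RapidMiner. The following is only a minimal Python sketch, not the RapidMiner process itself: it tokenizes on non-letters, counts term occurrences, projects both sets onto one shared vocabulary (the role the Superset operator plays in the answer), and then computes the cosine similarity of each request against every reference. All function names and the sample texts are hypothetical and are not taken from the thread's dataset.

```python
import math
import re
from collections import Counter


def tokenize_non_letters(text):
    """Lower-case the text and split it on runs of non-letter characters.

    Assumption: only ASCII letters a-z count as letters in this sketch.
    """
    return [tok for tok in re.split(r"[^a-z]+", text.lower()) if tok]


def term_occurrences(texts):
    """Count how often each token occurs in each text."""
    return [Counter(tokenize_non_letters(t)) for t in texts]


def to_vectors(counts, vocabulary):
    """Project every count table onto the same ordered vocabulary (0 if absent)."""
    return [[c.get(term, 0) for term in vocabulary] for c in counts]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; NaN if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return float("nan")  # the cosine is undefined for a zero vector
    return dot / (norm_a * norm_b)


def cross_similarities(requests, references):
    """Return a matrix: one row per request, one column per reference."""
    req_counts = term_occurrences(requests)
    ref_counts = term_occurrences(references)
    # Shared, sorted vocabulary: both sets get the same attributes in the same order.
    vocabulary = sorted(set().union(*req_counts, *ref_counts))
    req_vecs = to_vectors(req_counts, vocabulary)
    ref_vecs = to_vectors(ref_counts, vocabulary)
    return [[cosine_similarity(q, r) for r in ref_vecs] for q in req_vecs]


# Hypothetical sample texts for illustration only.
requests = ["The system shall export reports as PDF"]
references = [
    "The system shall export reports as PDF",
    "Users can reset their password by email",
]
scores = cross_similarities(requests, references)
```

Here `scores[0]` holds the similarity of the single request to each reference. Because both vector sets are built from the same sorted vocabulary, position i means the same term in every vector, which is the property the quoted documentation asks for.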
- 
         lionelderkrikor
           Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
Hi (one more time...) @asafwat,
Just a (last?) little piece of advice: you don't need to specify that an attribute is "regular" in the Set Role operator. By default, RapidMiner automatically sets an attribute as "regular".
Regards,
Lionel

 
          
Answers
Hi @asafwat,
Are your attributes "numerical"?
Can you share your dataset(s) so that we can reproduce what you observe?
Regards,
Lionel
Sure, here it is. I have converted it to CSV in order to attach it.
Hi again @asafwat,
I am having difficulties with your CSV file. Can you send me your original Excel file by either:
- zipping it and attaching it to this post, or
- uploading it to Google Drive and sharing the link here in the forum?
Regards,
Lionel
Hello again (one more time) @asafwat,
Can you send me your WordNet dictionary too (by zipping it, for example)?
Regards,
Lionel
@lionelderkrikor wow, it works! Great effort, you really made my day. Much appreciated.
Thanks a lot