Information Filtering (IF): A Survey

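Wang Bin (王斌), Software Laboratory, Institute of Computing Technology, Chinese Academy of Sciences
wangbin@ict.ac.cn
2001.12.10

Main topics
  • Basic concepts of IF
  • Classification of IF systems
  • Components of IF systems
  • Evaluation of IF systems
  • Current state and trends of IF

I. Basic concepts

Definition
  • IF: selecting, from a dynamic information stream, the items that match a user's interests. The user's interests generally do not change over a fairly long period (they are static).
  • Related terms: Selective Dissemination of Information (SDI), from the library field; Routing, from Message Understanding; Current Awareness; Data Mining.

IF vs. IR / Categorization / IE
  • IF & IR
    • Broadly speaking, IF is a part of IR.
    • IF: the database is dynamic and the need is static. IR: the database is static and the need is dynamic.
    • IF works from a user profile; IR works from a query.
    • IF users need some understanding of the system; IR users do not.
    • IF involves user modeling and social issues such as personal privacy.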

  • IF & Categorization
    • In categorization, the categories do not change often. By comparison, a user profile changes dynamically.
  • IF & IE
    • IF is concerned with relevance; IE is concerned only with the parts to be extracted, regardless of relevance.

IF applications
  • Internet search results filter
  • Personal email filter
  • List server / newsgroup filter
  • Browser filter
  • Filter for children
  • Filter for customers: recommendation

II. Classification of IF systems

[Figure: IF classification diagram]

Initiative of operation
  • Active IF systems
    • Collect relevant information and send it to users
    • Push to users
    • Risk information overload, so accurate user profiles are needed
  • Passive IF systems
    • Do not collect information for users
    • Examples: email or Usenet news

Location of operation
  • At the information source
    • Users post profiles to the information provider
    • Clipping service
    • Usually for a fee
  • At a filtering server
    • Information providers send information to the server
    • The server distributes information to users
  • At the user site
    • Local filtering system
    • Examples: Outlook, Netscape Email, Foxmail

Filtering approach
  • Cognitive filtering
    • Content-based filtering
    • Document content vs. user profiles
  • Sociological filtering
    • Collaborative filtering, or properties-based filtering
    • Similarity between users
    • Recommendation systems
    • User modeling and user clustering
    • Complement to content-based systems

Methods of acquiring knowledge about users
  • Explicit approach
    • User interrogation
    • Filling in forms
  • Implicit approach
    • Recording user behavior
    • Time, number of times, context, activity (save / discard / print / browse / click), etc.
  • Combined explicit and implicit approach
    • Document space (case-based)
    • Stereotypic inference (a predefined default profile that then changes during scanning)

III. Components of IF systems

General structure
[Figure: the User and the Information Provider connected through four components: (a) Data-analyzer component, (b) Filtering component, (c) User-model component, (d) Learning component. Arrow labels: data items, represented data items, relevant data items, personal details, user profile, feedback, updates.]

Data-analyzer component
  • Sits close to the information provider
  • Obtains or collects data from the information provider
  • Analyzes and represents documents (e.g., Boolean model, VSM)
  • Passes the representation to the filtering component

User-model component
  • Gathers information about users (explicitly and/or implicitly)
  • Constructs the user profiles or other user models (rules, VSM, document centers)
  • Passes the user models to the filtering component
  • User models must be compatible with the document representation

Filtering component
  • The heart of the IF system
  • Matches the user profiles against the represented data items
  • The decision may be binary or probabilistic (ordered by rank)
  • The relevance of the selected items can be judged by the user
  • The relevance information can be sent to the learning component (feedback information)

Learning component
  • Improves further filtering
  • Detects shifts in users' interests
  • Updates the user model

Two concepts used in IF systems
  • Systems based on the statistical concept
  • Systems based on the knowledge-based concept

Statistical concept
  • User-model component: the profile is a weighted vector of index terms (e.g., VSM, LSI)
  • Filtering component
    • Correlation, cosine measure
    • Robertson & Sparck Jones formula (PRM)
    • (Naïve) Bayesian classifier
  • Learning component: feedback, query reconstruction (e.g., Rocchio)

Knowledge-based concept
  • Rule-based and semantic-net filtering systems
    • Rules (if ... then take action); obsolescence problem
    • User profile represented by a semantic net (WordNet)
  • Neural-network filtering systems
  • Genetic-based filtering systems

User modeling for IF systems
  • Acquisition of the data for the model
    • Implicit approach: observation of user behavior
    • Explicit approach: filling in forms, interaction (feedback)
  • Data included in the model
    • Shallow semantics: keywords
    • Enhanced user model: high-level knowledge about the user (background, past experience)

    • Semantic networks / stereotypic inference / statistical inference on the relationships between words in documents
  • Underlying architecture
    • Agents / neural networks for automatically inferred models
    • VSM / LSI for explicit inference
    • Concept model for intelligent systems
    • Keyword system for statistically based systems

Learning in IF systems
  • Methods of learning
    • Learning by observation
    • Learning by feedback
    • User-training learning
  • Frequency of learning
    • Critical learning
    • Periodic learning

IV. Evaluation of IF systems

Methods & Measures

Evaluation methods of IF systems
  • Evaluation by experiments
  • Evaluation by simulation, such as TREC
  • Analytical evaluation

Measures of evaluation of IF systems
  • Simple precision and recall
  • Statistical measurements
    • Correlation (user evaluation vs. system evaluation): rank vector
  • Set-based measurements
    • Utility = (A * R+) + (B * N+) + (C * R-) + (D * N-), then normalized. R+ and N+ count the relevant and non-relevant documents that were retrieved; R- and N- count the relevant and non-relevant documents that were not retrieved.
    • ASP (average set precision) = P * R. ASP is not suitable if P or R = 0.
  • User-oriented measures
    • Coverage ratio = |Rk| / |U| = |A ∩ U| / |U|, where A is the answer set, U is the set of relevant documents known to the user, and Rk = A ∩ U is the set of retrieved documents already known to the user
    • Novelty = |Ru| / (|Ru| + |Rk|), where Ru is the set of retrieved relevant documents previously unknown to the user

V. Current state and trends of IF

Current situation
  • IF systems are indispensable
  • But IF systems are unreliable
    • The relevance of commercial IF systems' output is about 50%
    • Results of the TREC experiments are poor
    • Users prefer to read non-relevant information rather than risk losing important information
  • Much remains to be done to improve the effectiveness of IF systems

User modeling
  • Integrate several methods to model users (not only keywords, but also user properties and other parameters)
  • Profile updating and update timing
  • Include a learning module
  • Query formulation and tracking of query changes over time

Filtering techniques
  • Goal: retrieve more relevant documents, even at the cost of some non-relevant ones
  • Combine several methods
  • Research directions
    • Intelligent agents: decentralized, trust-based, able to evolve, compete, and collaborate
    • Visualization techniques: maps
    • Multiple implicit sources of information on user behavior: open profiling standard
    • Filtering of multimedia repositories: VOD, not text-based

    • Others, such as multilingual filtering

Evaluation standardization
  • Analytical evaluation: formalism
  • TREC filtering track
  • Diagnostic simulated evaluation

Technical and infrastructure architecture
  • Simple and object-oriented
  • Distributed and client-server based
  • Robust and secure
  • Architecture-neutral
  • Portable and scalable
  • High-performance
  • Multithreaded and multitasking
  • Dynamic

Commercial IF systems (1)

Commercial IF systems (2)

References
URI HAN
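Worked example (illustrative only). The statistical concept in Section III (a weighted term-vector profile, cosine matching, Rocchio-style feedback) and the set-based measures in Section IV can be tied together in a few lines of Python. This is a minimal sketch, not the method of any particular system: the function names, the whitespace tokenizer, the raw term-frequency weights, the threshold of 0.2, the Rocchio coefficients, and the utility coefficients A=2, B=-1, C=-1, D=0 are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Represent a text as a sparse term-frequency vector (a very simple VSM)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine measure between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rocchio_update(profile, relevant, non_relevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style feedback: move the profile toward the mean of the
    relevant documents and away from the mean of the non-relevant ones.
    The coefficient values are assumed, not taken from the slides."""
    updated = {t: alpha * w for t, w in profile.items()}
    for docs, coeff in ((relevant, beta), (non_relevant, -gamma)):
        for doc in docs:
            for t, w in doc.items():
                updated[t] = updated.get(t, 0.0) + coeff * w / len(docs)
    # Terms whose weight drops to zero or below are discarded.
    return {t: w for t, w in updated.items() if w > 0.0}


def filter_stream(profile, stream, threshold=0.2):
    """Binary filtering with feedback after every selected item.

    `stream` yields (text, is_relevant) pairs; is_relevant stands in for
    the user's judgment. Returns the final profile and the list of
    (selected, relevant) decisions."""
    decisions = []
    for text, is_relevant in stream:
        doc = vectorize(text)
        selected = cosine(profile, doc) >= threshold
        decisions.append((selected, is_relevant))
        if selected:  # the user only sees, and judges, selected items
            if is_relevant:
                profile = rocchio_update(profile, [doc], [])
            else:
                profile = rocchio_update(profile, [], [doc])
    return profile, decisions


def evaluate(decisions, a=2.0, b=-1.0, c=-1.0, d=0.0):
    """Precision, recall, and the linear utility A*R+ + B*N+ + C*R- + D*N-."""
    r_plus = sum(1 for s, r in decisions if s and r)
    n_plus = sum(1 for s, r in decisions if s and not r)
    r_minus = sum(1 for s, r in decisions if not s and r)
    n_minus = sum(1 for s, r in decisions if not s and not r)
    precision = r_plus / (r_plus + n_plus) if r_plus + n_plus else 0.0
    recall = r_plus / (r_plus + r_minus) if r_plus + r_minus else 0.0
    utility = a * r_plus + b * n_plus + c * r_minus + d * n_minus
    return precision, recall, utility


if __name__ == "__main__":
    start = vectorize("information filtering user profile")
    stream = [
        ("adaptive filtering of news with a user profile", True),
        ("football match results", False),
    ]
    final_profile, log = filter_stream(start, stream)
    print(evaluate(log))
```

The sketch maps onto the component slides as follows: vectorize plays the role of the data-analyzer, the profile vector is the user model, the cosine threshold is the filtering component, and rocchio_update is the learning component. A real system would likely replace raw term frequencies with a weighting scheme such as TF-IDF and tune the threshold on held-out data.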
