Christian Böhm
University for Health Informatics and Technology
Powerful Database Primitives to Support High Performance Data Mining
Tutorial, IEEE Int. Conf. on Data Mining, Dec/09/2002

Motivation
High Performance Data Mining
• Marketing
• Fraud Detection
• CRM
• Online Scoring
• OLAP
Fast decisions require knowledge just in time

Previous Approaches to Fast Data Mining
• Sampling
• Approximations (grid)
• Loss of quality
• Dimensionality reduction
• Expensive & complex
• Parallelism
All approaches combinable with DB primitives
KDD applications get parallelism for free

Feature Based Similarity

Simple Similarity Queries
• Specify query object and
- Find similar objects – range query
- Find the k most similar objects – nearest neighbor query

Multidimensional Index Structure (R-Tree)
Directory Page:
Rectangle_1, Address_1
Rectangle_2, Address_2
Rectangle_3, Address_3
Rectangle_4, Address_4
Data Page:
Point_1: x_11, x_12, x_13, ...
Point_2: x_21, x_22, x_23, ...
Point_3: x_31, x_32, x_33, ...

Similarity – Range Queries
• Given: Query point q
Maximum distance ε
• Formal definition:
• Cardinality of the result set is difficult to control:
ε too small → no results
ε too large → complete DB

Index Based Processing of Range Queries

Similarity – Nearest Neighbor Queries
• Given: Query point q
• Formal definition:
• Ties must be handled:
- Result ...
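
The two simple similarity queries named above can be sketched as a plain linear scan, without any index. This is only an illustration under stated assumptions, not the tutorial's own formulation: it assumes Euclidean distance on feature vectors, uses the common textbook reading of a range query (all objects o with dist(q, o) <= ε), and resolves nearest neighbor ties by returning every object at the k-th distance, which is one possible policy among several. The names euclidean, range_query and knn_query are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_query(db: List[Point], q: Point, eps: float) -> List[Point]:
    """Return every object within distance eps of the query point q."""
    return [o for o in db if euclidean(q, o) <= eps]


def knn_query(db: List[Point], q: Point, k: int) -> List[Point]:
    """Return the k objects closest to q, sorted by distance.

    Assumed tie policy: objects tied with the k-th neighbor are all
    included, so the result may contain more than k objects.
    """
    if k <= 0 or not db:
        return []
    ranked = sorted(db, key=lambda o: euclidean(q, o))
    if k >= len(ranked):
        return ranked
    kth_dist = euclidean(q, ranked[k - 1])
    return [o for o in ranked if euclidean(q, o) <= kth_dist]


if __name__ == "__main__":
    db = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 4.0)]
    q = (0.0, 0.0)
    print(range_query(db, q, 0.5))   # small eps: few results
    print(range_query(db, q, 10.0))  # large eps: the complete DB
    print(knn_query(db, q, 2))       # a tie at the 2nd neighbor
```

The example run shows why the result cardinality of a range query is hard to control through ε alone, and why a k-nearest-neighbor query needs an explicit rule for ties. A linear scan reads every object; a multidimensional index such as the R-tree is meant to avoid that by pruning pages whose rectangles cannot contain results.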