High Performance Data Mining
Chapter 4: Association Rules

Vipin Kumar
Army High Performance Computing Research Center
Department of Computer Science
University of Minnesota
http://www.cs.umn.edu/~kumar

© R. Grossman, C. Kamath, V. Kumar, Data Mining for Scientific and Engineering Applications

Chapter 4: Algorithms for Association Rules Discovery

Outline

Serial Association Rule Discovery
  – Definition and Complexity
  – Apriori Algorithm
Parallel Algorithms
  – Need
  – Count Distribution, Data Distribution
  – Intelligent Data Distribution, Hybrid Distribution
  – Experimental Results

Association Rule Discovery: Support and Confidence

TID   Items
1     Bread, Milk
2     Beer, Diaper, Bread, Eggs
3     Beer, Coke, Diaper, Milk
4     Beer, Bread, Diaper, Milk
5     Coke, Bread, Diaper, Milk

Association Rule: $X \Rightarrow_{s,\alpha} y$

Example: $\{\text{Diaper}, \text{Milk}\} \Rightarrow_{s,\alpha} \text{Beer}$

Here $\sigma(\cdot)$ is the number of transactions that contain the given itemset, and $|T|$ is the total number of transactions.

Support: $s = \frac{\sigma(X \cup y)}{|T|} = \frac{\sigma(\text{Diaper}, \text{Milk}, \text{Beer})}{\text{Total Number of Transactions}} = \frac{2}{5} = 0.4 \quad (s = P(X, y))$

Confidence: $\alpha = \frac{\sigma(X \cup y)}{\sigma(X)} = \frac{\sigma(\text{Diaper}, \text{Milk}, \text{Beer})}{\sigma(\text{Diaper}, \text{Milk})} = \frac{2}{3} \approx 0.67 \quad (\alpha = P(y \mid X))$

Handling Exponential Complexity

Given n transactions and m different items:
  – number of possible association rules: $O(m 2^{m-1})$
  – ... $O(n m 2^{m})$
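
The support and confidence values for the rule {Diaper, Milk} ⇒ Beer in the five-transaction example can be checked with a short script. This is a minimal sketch, not part of the original slides: the names `transactions`, `support_count`, `support`, and `confidence` are hypothetical helpers chosen for illustration, and the counting is a plain scan over all transactions rather than the Apriori algorithm listed in the outline.

```python
# Transactions from the example table (TID -> set of items).
transactions = {
    1: {"Bread", "Milk"},
    2: {"Beer", "Diaper", "Bread", "Eggs"},
    3: {"Beer", "Coke", "Diaper", "Milk"},
    4: {"Beer", "Bread", "Diaper", "Milk"},
    5: {"Coke", "Bread", "Diaper", "Milk"},
}


def support_count(itemset, transactions):
    """Count the transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for items in transactions.values() if itemset <= items)


def support(antecedent, consequent, transactions):
    """Fraction of all transactions containing antecedent and consequent."""
    both = set(antecedent) | set(consequent)
    return support_count(both, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Fraction of antecedent-containing transactions that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support_count(both, transactions) / support_count(antecedent, transactions)


s = support({"Diaper", "Milk"}, {"Beer"}, transactions)
c = confidence({"Diaper", "Milk"}, {"Beer"}, transactions)
print(f"support = {s:.2f}, confidence = {c:.2f}")  # support = 0.40, confidence = 0.67
```

Each call scans every transaction once, so evaluating a single rule costs time proportional to n. Repeating that scan for every candidate rule is what makes the exhaustive approach impractical as m grows.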