Define: Data, Information and Knowledge. [3 marks]
Give a feature-wise comparison between Business Intelligence and Data Warehouse. [4 marks]
Define KDD. Explain the KDD process in detail. [7 marks]
What is Market Basket Analysis? Explain Association Rules with two measures: Support and Confidence. [3 marks]
Differentiate between OLTP and OLAP. [4 marks]
Discuss research issues in Data Mining. [7 marks]
Explain Mean, Median, Mode, Variance, Standard Deviation and the five-number summary with a suitable database example. [7 marks]
Define data cube and explain three operations on it. [3 marks]
Define noise. Explain binning methods for data smoothing. [4 marks]
Explain three-tier Data Warehouse Architecture. [7 marks]
Define: Numerosity Reduction, Data Integration and Data Transformation. [3 marks]
What is a Concept Hierarchy? List and explain the types of Concept Hierarchy. [4 marks]
Define the Apriori Property. Generate candidate itemsets, frequent itemsets and association rules using the Apriori algorithm on the following data set, with a minimum support count of 2 and a minimum confidence of 60%. [7 marks]
    Transaction ID   Items
    T1               Hot Dogs, Buns, Ketchup
    T2               Hot Dogs, Buns
    T3               Hot Dogs, Coke, Chips
    T4               Chips, Coke
    T5               Chips, Ketchup
    T6               Hot Dogs, Coke, Chips
Explain Spatial mining using an example. [3 marks]
Define Schema. Explain the following schemas with a suitable example. 1) Star 2) Snowflake 3) Fact Constellation [4 marks]
Calculate 2 clusters using the k-means clustering algorithm. Use Euclidean distance for finding the distance.
    Subject   A     B
    1         1.0   1.0
    2         1.5   2.0
    3         3.0   4.0
    4         5.0   7.0
    5         3.5   5.0
    6         4.5   5.0
    7         3.5   4.5
Assume subject 1 as mean 1 and subject 4 as mean 2. [7 marks]
Explain cluster analysis and outlier analysis with an example. [3 marks]
Discuss the applications of data warehousing and data mining. [4 marks]
Generate a decision tree using the CART algorithm for the following dataset. [7 marks]
    Sr. No.   Outlook    Temperature   Humidity   Windy   Class
    1         Sunny      Hot           High       False   N
    2         Sunny      Hot           High       True    N
    3         Overcast   Hot           High       False   Y
    4         Rain       Mild          High       False   Y
    5         Rain       Cool          Normal     False   Y
    6         Rain       Cool          Normal     True    N
    7         Overcast   Cool          Normal     True    Y
    8         Sunny      Mild          High       False   N
    9         Sunny      Cool          Normal     False   Y
    10        Rain       Mild          Normal     False   Y
    11        Sunny      Mild          Normal     True    Y
    12        Overcast   Mild          High       True    Y
    13        Overcast   Hot           Normal     False   Y
    14        Rain       Mild          High       True    N
Discuss Big Data. [3 marks]
Define sampling methods for data reduction. [4 marks]
What is the need for data preprocessing? Explain the data cleaning process for treating missing values and noisy data. [7 marks]
Differentiate between Classification and Clustering. [3 marks]
The minimum salary is Rs. 35,000 and the maximum salary is Rs. 2,10,000. Map the salary Rs. 1,30,000 to the new range Rs. 70,000 to Rs. 3,10,000 using the min-max normalization method. [4 marks]
Draw and explain the Hadoop architecture. [7 marks]
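For the Apriori question above (six transactions, minimum support count 2, minimum confidence 60%), a hand-worked answer can be cross-checked with a short script. The following is only a minimal sketch, not a reference solution: the function names `support_count`, `apriori` and `association_rules` are illustrative, and the join step here unions any two frequent (k-1)-itemsets and relies on the Apriori-property prune, which is a simplification of the textbook prefix-based join.

```python
from itertools import combinations

# Transactions T1..T6 copied from the question.
transactions = [
    {"Hot Dogs", "Buns", "Ketchup"},
    {"Hot Dogs", "Buns"},
    {"Hot Dogs", "Coke", "Chips"},
    {"Chips", "Coke"},
    {"Chips", "Ketchup"},
    {"Hot Dogs", "Coke", "Chips"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def apriori(transactions, min_count):
    """Return {frequent itemset: support count}, built level by level."""
    items = {i for t in transactions for i in t}
    counts = {frozenset([i]): support_count({i}, transactions) for i in items}
    current = {c: n for c, n in counts.items() if n >= min_count}
    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        counts = {c: support_count(c, transactions) for c in candidates}
        current = {c: n for c, n in counts.items() if n >= min_count}
    return frequent

def association_rules(frequent, min_conf):
    """Return (antecedent, consequent, confidence) for rules meeting min_conf."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = count / frequent[lhs]  # subsets of a frequent set are frequent
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules

frequent = apriori(transactions, min_count=2)
rules = association_rules(frequent, min_conf=0.6)
for lhs, rhs, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), f"confidence={conf:.2f}")
```

Under these assumptions the script reports ten frequent itemsets (five single items, four pairs and the triple Hot Dogs, Coke, Chips) and eight rules that meet the 60% threshold.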
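For the k-means question above (seven subjects, two clusters, initial means at subjects 1 and 4), the iterations can be traced with a small script. This is a sketch under stated assumptions: it uses the batch form of k-means (reassign all points, then recompute both means), and a point that is equally distant from both means is given to the lower-numbered cluster. The function name `k_means` and its parameters are illustrative. A hand solution that updates the means after every single assignment may pass through different intermediate steps.

```python
from math import dist  # Euclidean distance, available from Python 3.8

# Subject -> (A, B), copied from the question.
points = {
    1: (1.0, 1.0), 2: (1.5, 2.0), 3: (3.0, 4.0), 4: (5.0, 7.0),
    5: (3.5, 5.0), 6: (4.5, 5.0), 7: (3.5, 4.5),
}

def k_means(points, centroids, max_iter=100):
    """Batch k-means: returns (clusters as lists of subjects, final centroids)."""
    centroids = list(centroids)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every subject.
        new_assignment = {
            s: min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for s, p in points.items()
        }
        if new_assignment == assignment:
            break  # no subject changed cluster, so the means are stable
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for s, p in points.items() if assignment[s] == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    clusters = [
        sorted(s for s in points if assignment[s] == k)
        for k in range(len(centroids))
    ]
    return clusters, centroids

clusters, centroids = k_means(points, [points[1], points[4]])
print(clusters)   # [[1, 2], [3, 4, 5, 6, 7]]
print(centroids)
```

Subject 3 is exactly equidistant from the two initial means, so the tie-breaking rule decides its first assignment; with either choice the procedure settles on clusters {1, 2} and {3, 4, 5, 6, 7}, with means near (1.25, 1.5) and (3.9, 5.1).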
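For the min-max normalization question above, the arithmetic can be checked with the standard formula v' = (v - min) / (max - min) * (new_max - new_min) + new_min. The helper name `min_max_normalize` below is illustrative, and the sketch assumes the new range runs from Rs. 70,000 to Rs. 3,10,000 as stated in the question.

```python
def min_max_normalize(v, old_min, old_max, new_min, new_max):
    """Linearly map v from [old_min, old_max] to [new_min, new_max]."""
    return (v - old_min) / (old_max - old_min) * (new_max - new_min) + new_min

# Salary figures from the question, written without Indian digit grouping.
mapped = min_max_normalize(130_000, 35_000, 210_000, 70_000, 310_000)
print(round(mapped, 2))  # 200285.71
```

The scaled fraction is 95,000 / 1,75,000 = 19/35, so the salary maps to roughly Rs. 2,00,285.71 in the new range.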