In real-world data, tuples with missing values for some attributes are a common occurrence. Describe various methods for handling this problem.
[3 marks] A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data. Justify.
[4 marks] Briefly discuss the steps of the KDD process.
[7 marks] Define: Support, Confidence, Confusion Matrix.
[3 marks] Explain the Star, Snowflake, and Fact Constellation schemas for a multidimensional database.
[4 marks] With the help of a neat diagram, explain the 3-tier architecture of a data warehouse.
[7 marks] List the common tasks involved in data pre-processing. Briefly explain any four methods of data pre-processing with suitable examples.
[7 marks] How does classification differ from prediction? Explain the phases of classification.
[3 marks] What is a decision tree? Explain how classification is done using decision tree induction.
[4 marks] Consider the following set of transactions. Let min_sup = 30% and min_conf = 60%.

TID    Items bought
T1     pen, pencil
T2     book, eraser, pencil
T3     book, chalk, eraser, pen
T4     chalk, eraser, pen
T5     book, pen, pencil
T6     book, eraser, pen, pencil
T7     ink, pen
T8     book, pen, pencil
T9     eraser, pen, pencil
T10    book, chalk, pencil

1. Find all frequent itemsets using the Apriori algorithm.
2. Generate strong association rules.

OR
[7 marks] Describe the different classifications of Association Rule Mining.
[3 marks] Apply Min-Max normalization to scale the following age values into the range [1, 10]: 10, 30, 45, 23, 57, 63, 72, 27, 37, 55, 15, 32. The normalized value for age = 30 is _________________. If the mean sales value is Rs. 65,500 and the standard deviation is Rs. 17,000, then the Z-score of a sales value of Rs. 82,200 is ____________.
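As a cross-check for the normalization question above, the following is a minimal Python sketch. It assumes the usual textbook definitions, v' = (v - min) / (max - min) * (new_max - new_min) + new_min for Min-Max scaling and z = (v - mean) / std_dev for the Z-score. The function and variable names are illustrative and not part of the question paper.

```python
def min_max_scale(value, data_min, data_max, new_min, new_max):
    """Linearly map value from [data_min, data_max] to [new_min, new_max]."""
    return (value - data_min) / (data_max - data_min) * (new_max - new_min) + new_min


def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev


# Age values and target range taken from the question.
ages = [10, 30, 45, 23, 57, 63, 72, 27, 37, 55, 15, 32]
scaled_age_30 = min_max_scale(30, min(ages), max(ages), 1, 10)

# Mean, standard deviation, and sales value taken from the question.
sales_z = z_score(82_200, 65_500, 17_000)

print(f"Min-Max scaled age 30: {scaled_age_30:.4f}")
print(f"Z-score of Rs. 82,200: {sales_z:.4f}")
```

The minimum and maximum are read from the listed ages rather than hard-coded, so the same sketch can be reused with a different value list.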
[4 marks] Construct a decision tree using the ID3 classifier for the given training dataset.

Temperature    Outlook     Humidity    Windy    Played
Mild           Sunny       80          No       Yes
Hot            Sunny       75          Yes      No
Hot            Overcast    77          No       Yes
Cool           Rainy       70          No       Yes
Cool           Overcast    72          Yes      Yes
Mild           Sunny       77          No       No
Cool           Sunny       70          No       Yes
Mild           Rainy       69          No       Yes
Mild           Sunny       65          Yes      Yes
Mild           Overcast    77          Yes      Yes
Hot            Overcast    74          No       Yes
Mild           Rainy       77          Yes      No
Cool           Rainy       73          Yes      Yes
Mild           Rainy       78          No       Yes
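The first step of ID3 on the dataset above, choosing the root attribute by information gain, can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses the standard entropy and information-gain definitions, it covers only the three categorical attributes (Humidity is numeric and would first need to be discretized, which the question does not specify), and all names in the code are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2

# Rows copied from the question: (Temperature, Outlook, Humidity, Windy, Played).
ROWS = [
    ("Mild", "Sunny", 80, "No", "Yes"), ("Hot", "Sunny", 75, "Yes", "No"),
    ("Hot", "Overcast", 77, "No", "Yes"), ("Cool", "Rainy", 70, "No", "Yes"),
    ("Cool", "Overcast", 72, "Yes", "Yes"), ("Mild", "Sunny", 77, "No", "No"),
    ("Cool", "Sunny", 70, "No", "Yes"), ("Mild", "Rainy", 69, "No", "Yes"),
    ("Mild", "Sunny", 65, "Yes", "Yes"), ("Mild", "Overcast", 77, "Yes", "Yes"),
    ("Hot", "Overcast", 74, "No", "Yes"), ("Mild", "Rainy", 77, "Yes", "No"),
    ("Cool", "Rainy", 73, "Yes", "Yes"), ("Mild", "Rainy", 78, "No", "Yes"),
]

# Column index of each categorical attribute (Humidity, index 2, is skipped).
CATEGORICAL = {"Temperature": 0, "Outlook": 1, "Windy": 3}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attr_index):
    """Entropy of the class label minus the weighted entropy after splitting."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr_index]].append(row[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - remainder


base_entropy = entropy([row[-1] for row in ROWS])
gains = {name: information_gain(ROWS, idx) for name, idx in CATEGORICAL.items()}
root_candidate = max(gains, key=gains.get)

print(f"Entropy of Played: {base_entropy:.4f}")
for name, gain in gains.items():
    print(f"Gain({name}) = {gain:.4f}")
print("Highest-gain categorical attribute:", root_candidate)
```

ID3 would then repeat the same gain computation on each subset produced by the chosen attribute until every branch is pure or no attributes remain.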
Briefly explain Linear and Non-linear regression.
[3 marks] Explain Bayesian learning and inference with a suitable example.
[4 marks] What are the limitations of the Apriori approach for mining? Briefly describe the techniques to improve the efficiency of the Apriori algorithm.
[7 marks] Differentiate between OLTP and OLAP.
[3 marks] What is Big Data? Describe the 4 Vs of Big Data.
[4 marks] Explain the K-Means clustering algorithm. How does K-Means clustering differ from the K-Medoids clustering method?
[7 marks] Explain NameNode and DataNode in HDFS.
[3 marks] With the help of a suitable example, illustrate the OLAP operations: ‘drill-down’, ‘roll-up’, ‘slice’ and ‘dice’.
[4 marks] How is data mining useful for web mining? Discuss any four web mining applications.
[7 marks] Explain click-stream mining.
[3 marks] Briefly explain spatial data mining and temporal mining.
[4 marks] Draw and explain the Hadoop architecture.
[7 marks]