Explain the difference between OLTP and OLAP systems with suitable examples. [3 marks]
Describe the architecture and key components of a Data Warehouse. [4 marks]
Discuss the role of Metadata in a Data Warehouse and explain how it supports Business Intelligence operations. [7 marks]
Define KDD and explain its relationship with Data Mining. [3 marks]
List and explain various data preprocessing techniques with examples. [4 marks]
Design a data preprocessing flow for handling missing and noisy data. [7 marks]
Briefly explain Mean, Median, Mode, Variance, and Standard Deviation. [7 marks]
What are frequent itemsets? Explain support and confidence with an example. [3 marks]
Explain the Apriori algorithm in brief. [3 marks]
Consider the following set of transactions. Let min_sup = 40% and min_conf = 60%.
1. Find all frequent itemsets using the Apriori algorithm.
2. Generate strong association rules.

Transaction ID   Items Purchased
1                {Milk, Bread, Diapers}
2                {Bread, Coffee, Diapers}
3                {Milk, Diapers, Eggs}
4                {Milk, Bread, Coffee}
5                {Bread, Diapers, Eggs}

[7 marks]
Define support and confidence. Why are these measures important in Association Rule Mining? [3 marks]
Describe the steps in the FP-Growth algorithm. [4 marks]
i) Suppose that the data for analysis includes the attribute age. The age values for the data tuples are (in increasing order): 13, 15, 16, 16, 19, 20, 23, 29, 35, 41, 44, 53, 62, 69, 72. Use min-max normalization to transform the value 45 for age onto the range [0.0, 1.0].
ii) Suppose that the mean and standard deviation of the values for the attribute income are Rs. 54,000 and Rs. 16,000, respectively. Find the z-score of a salary of Rs. 73,600. [7 marks]
Explain the difference between classification and prediction with examples. [3 marks]
Describe Bayesian classification with a suitable example. [4 marks]
Explain the Decision Tree algorithm (ID3) for classifying a dataset. [7 marks]
What is clustering? Differentiate between partitioning and hierarchical clustering. [3 marks]
Explain the K-means clustering algorithm with a suitable example. [4 marks]
Define outlier analysis. Why is outlier mining important? Briefly describe the different approaches for outlier detection. [3 marks]
What is Big Data? Explain the characteristics and importance of Big Data Analytics. [3 marks]
Explain how Hadoop handles distributed storage and processing using HDFS and MapReduce. [4 marks]
Discuss real-world applications of Data Mining in Business Intelligence, such as fraud detection and customer segmentation. [7 marks]
Explain any four applications of Business Intelligence. [3 marks]
Discuss the Hadoop ecosystem components Pig, Hive, and HBase. [4 marks]
Explain how MapReduce can be used to perform Market Basket Analysis on a large dataset. [3 marks]