DATA MINING
1.The apriori property means
Select one:
a. To improve the efficiency, do level-wise generation of frequent item sets Incorrect
b. If a set can pass a test, its supersets will fail the same test
c. If a set cannot pass a test, its supersets will also fail the same test
d. To decrease the efficiency, do level-wise generation of frequent item sets
The correct answer is: If a set cannot pass a test, its supersets will also fail the same test
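The anti-monotone behaviour described in this answer can be checked on a small example. The sketch below uses made-up transactions and a hypothetical `support_count` helper; it only illustrates that a superset can never occur more often than any of its subsets.

```python
def support_count(itemset, transactions):
    """Count the transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

# Hypothetical toy transactions, invented for illustration only.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk", "butter"}),
    frozenset({"bread", "milk", "butter"}),
]
min_support = 3  # assumed minimum support count

subset = frozenset({"bread", "milk"})
superset = frozenset({"bread", "milk", "butter"})

subset_count = support_count(subset, transactions)
superset_count = support_count(superset, transactions)

# The subset fails the support test, so the superset must fail it too.
print(subset_count, superset_count, min_support)
```

Because every transaction containing the superset also contains the subset, the superset's count is bounded by the subset's count, which is what lets Apriori discard it without counting.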
2.Given a frequent itemset L, if |L| = k, then there are
Select one:
a. 2^k candidate association rules
b. 2^k - 1 candidate association rules
c. 2^k - 2 candidate association rules
d. 2k - 2 candidate association rules Incorrect
Feedback
The correct answer is: 2^k - 2 candidate association rules
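The count follows from splitting the itemset into a non-empty antecedent and a non-empty consequent: there are 2^k subsets, minus the empty set and the full set. A minimal sketch, with `candidate_rules` as a hypothetical helper name:

```python
from itertools import combinations

def candidate_rules(itemset):
    """Enumerate every rule X --> Y where X and Y are non-empty and partition `itemset`."""
    items = sorted(itemset)
    rules = []
    # Antecedent sizes 1 .. k-1 exclude the empty set and the full itemset.
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            rules.append((antecedent, consequent))
    return rules

rules = candidate_rules({"A", "B", "C", "D"})
print(len(rules))  # 2**4 - 2
```

Note that the antecedent and consequent never share an item, so a rule whose right-hand side repeats an item from the left-hand side is not a candidate.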
3.With Bayes' theorem, the probability of hypothesis H, specified by P(H), is referred to as
Select one:
a. a conditional probability
b. a posterior probability
c. an a priori probability Correct
d. a bidirectional probability
Feedback
The correct answer is: an a priori probability
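To make the distinction between the a priori probability P(H) and the posterior P(H|X) concrete, here is a small sketch of Bayes' theorem. All numbers are hypothetical and chosen only for illustration.

```python
# Hypothetical values, not taken from the quiz.
prior = 0.01              # P(H): a priori probability of the hypothesis
likelihood = 0.90         # P(X | H): probability of the evidence if H holds
likelihood_not_h = 0.05   # P(X | not H)

# Total probability of the evidence X.
evidence = likelihood * prior + likelihood_not_h * (1 - prior)

# Bayes' theorem: P(H | X) = P(X | H) * P(H) / P(X)
posterior = likelihood * prior / evidence
print(prior, posterior)
```

The prior is fixed before any evidence is seen; the posterior is what it becomes after conditioning on X.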
4.In _________ clusterings, points may belong to multiple clusters
Select one:
a. Partial
b. Fuzzy
c. Exclusive
d. Non-exclusive Incorrect
Feedback
The correct answer is: Fuzzy
5.Which statement about outliers is true?
Select one:
a. Outliers should be part of the test dataset but should not be present in the training data.
b. The nature of the problem determines how outliers are used Correct
c. Outliers should be identified and removed from a dataset.
d. Outliers should be part of the training dataset but should not be present in the test data.
Feedback
The correct answer is: The nature of the problem determines how outliers are used
6.Which is not part of the categories of clustering methods?
Select one:
a. Partitioning methods
b. Rule-based methods Correct
c. Hierarchical methods
d. Density based methods
Feedback
The correct answer is: Rule-based methods
7.The most general form of distance is
Select one:
a. Mean
b. Manhattan
c. Minkowski
d. Euclidean Incorrect
Feedback
The correct answer is: Minkowski
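Minkowski distance is the general form because the order parameter p selects the other metrics: p = 1 gives Manhattan distance and p = 2 gives Euclidean distance. A minimal sketch, assuming plain Python tuples as points:

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

x, y = (0.0, 0.0), (3.0, 4.0)

manhattan = minkowski(x, y, 1)  # |3| + |4|
euclidean = minkowski(x, y, 2)  # sqrt(3^2 + 4^2)
print(manhattan, euclidean)
```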
8.The correlation coefficient for two real-valued attributes is –0.85. What does this value tell you?
Select one:
a. The attributes are not linearly related.
b. As the value of one attribute increases the value of the second attribute also increases.
c. The attributes show a linear relationship
d. As the value of one attribute decreases the value of the second attribute increases. Correct
Feedback
The correct answer is: As the value of one attribute decreases the value of the second attribute increases.
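A negative coefficient close to -1 means the attributes move in opposite directions. The sketch below computes Pearson's correlation on made-up data; the `pearson` helper and the sample values are assumptions for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical data: y falls as x rises.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 7, 4, 1]
r = pearson(xs, ys)
print(r)
```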
9.The set of frequent item sets is a
Select one:
a. Subset of maximal frequent item sets
b. Superset of only closed frequent item sets
c. Superset of only maximal frequent item sets Incorrect
d. Superset of both closed frequent item sets and maximal frequent item sets
Feedback
The correct answer is: Superset of both closed frequent item sets and maximal frequent item sets
10.In the Apriori algorithm, if there are 100 frequent 1-itemsets, then the number of candidate 2-itemsets is
Select one:
a. 200 Incorrect
b. 100
c. 4950
d. 5000
Feedback
The correct answer is: 4950
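The figure is the number of unordered pairs that can be formed from 100 items, C(100, 2) = 100 × 99 / 2. A quick check:

```python
from math import comb

n_frequent_1_itemsets = 100

# Every unordered pair of frequent 1-itemsets is a candidate 2-itemset.
n_candidate_2_itemsets = comb(n_frequent_1_itemsets, 2)
print(n_candidate_2_itemsets)
```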
11.A significant bottleneck in the Apriori algorithm is
Select one:
a. Number of iterations
b. Finding frequent itemsets
c. Pruning
d. Candidate generation Correct
Feedback
The correct answer is: Candidate generation
12.Which statement is true about neural network and linear regression models?
Select one:
a. Both models require numeric attributes to range between 0 and 1.
b. The output of both models is a categorical attribute value.
c. Both techniques build models whose output is determined by a linear sum of weighted input attribute values.
d. Both models require input attributes to be numeric. Correct
Feedback
The correct answer is: Both models require input attributes to be numeric.
13.If {A,B,C,D} is a frequent itemset, which candidate rule is not possible?
Select one:
a. D --> ABCD
b. B --> ADC Incorrect
c. C --> A
d. A --> BC
Feedback
The correct answer is: D --> ABCD
14.What is the final resultant cluster size in the divisive algorithm, which is one of the hierarchical clustering approaches?
Select one:
a. singleton
b. Three
c. Zero Incorrect
d. Two
Feedback
The correct answer is: singleton
15.Given a rule of the form IF X THEN Y, rule confidence is defined as the conditional probability that
Select one:
a. X is false when Y is known to be false.
b. Y is false when X is known to be false.
c. X is true when Y is known to be true
d. Y is true when X is known to be true. Correct
Feedback
The correct answer is: Y is true when X is known to be true.
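In transaction terms, confidence is the fraction of records satisfying X that also satisfy Y. A minimal sketch with made-up transactions and a hypothetical `confidence` helper:

```python
def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) from a list of transactions."""
    covered = [t for t in transactions if antecedent <= t]
    if not covered:
        raise ValueError("antecedent never occurs; confidence is undefined")
    hits = sum(1 for t in covered if consequent <= t)
    return hits / len(covered)

# Hypothetical toy transactions, invented for illustration only.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk", "butter"}),
    frozenset({"bread", "milk", "butter"}),
]

# IF bread THEN milk: bread occurs 3 times, bread with milk 2 times.
conf = confidence(frozenset({"bread"}), frozenset({"milk"}), transactions)
print(conf)
```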
17.This data transformation technique works well when minimum and maximum values for a real-valued attribute are known.
Select one:
a. decimal scaling
b. z-score normalization
c. min-max normalization Correct
d. logarithmic normalization
Feedback
The correct answer is: min-max normalization
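Min-max normalization maps each value linearly from the known range [min, max] onto a target range, usually [0, 1]. A minimal sketch; the function name, default range, and sample values are assumptions:

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale `values` so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; the range is zero")
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

scaled = min_max_normalize([20, 30, 40, 60])
print(scaled)
```

The method depends on the minimum and maximum being known in advance; a later value outside that range would fall outside the target interval.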
18.Which statement is true about the K-Means algorithm?
Select one:
a. The output attribute must be categorical.
b. All attributes must be numeric Correct
c. All attribute values must be categorical.
d. Attribute values may be either categorical or numeric
Feedback
The correct answer is: All attributes must be numeric
19.Clustering is ___________ and is an example of ____________ learning
Select one:
a. Predictive and unsupervised
b. Descriptive and supervised
c. Predictive and supervised
d. Descriptive and unsupervised Correct
Feedback
The correct answer is: Descriptive and unsupervised
20.Find the odd one out
Select one:
a. K medoid
b. PAM Incorrect
c. DBSCAN
d. K mean
Feedback
The correct answer is: DBSCAN
21. _________ is an example of case-based learning
Select one:
a. Neural networks
b. K-nearest neighbor
c. Decision trees Incorrect
d. Genetic algorithm
Feedback
The correct answer is: K-nearest neighbor
22.The _______ step eliminates the extensions of (k-1)-itemsets that are not found to be frequent from being considered for support counting
Select one:
a. Partitioning
b. Pruning Correct
c. Itemset eliminations
d. Candidate generation
Feedback
The correct answer is: Pruning
23. Assume that we have a dataset containing information about 200 individuals. A supervised data mining session has discovered the following rule:
IF age < 30 & credit card insurance = yes THEN life insurance = yes
Rule Accuracy: 70% and Rule Coverage: 63%
How many individuals in the class life insurance = no have credit card insurance and are less than 30 years old?
Select one:
a. 63
b. 30
c. 70 Incorrect
d. 38
Feedback
The correct answer is: 38
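The answer can be reproduced with a short calculation. The sketch assumes that rule coverage means the fraction of all 200 individuals who satisfy the rule's antecedent, and that rule accuracy means the fraction of those covered individuals for whom the consequent holds; this reading is an assumption, chosen because it yields the stated answer.

```python
population = 200
coverage = 0.63   # assumed: fraction of the population matching the IF part
accuracy = 0.70   # assumed: fraction of covered individuals with life insurance = yes

# Individuals under 30 with credit card insurance.
covered = population * coverage

# Covered individuals for whom the rule is wrong, i.e. life insurance = no.
not_insured = covered * (1 - accuracy)

answer = round(not_insured)
print(covered, not_insured, answer)
```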
24.A good clustering method will produce high quality clusters with
Select one:
a. no inter class similarity
b. low intra class similarity
c. high inter class similarity
d. high intra class similarity Correct
Feedback
The correct answer is: high intra class similarity
25.If an item set ‘XYZ’ is a frequent item set, then all subsets of that frequent item set are
Select one:
a. Not frequent
b. Frequent Correct
c. Undefined
d. Cannot say
Feedback
The correct answer is: Frequent
26.The number of iterations in apriori ___________
Select one:
a. decreases with the increase in size of the data
b. increases with the size of the data
c. decreases with increase in size of the maximum frequent set Incorrect
d. increases with the size of the maximum frequent set
Feedback
The correct answer is: increases with the size of the maximum frequent set
27.Arbitrary shaped clusters can be found by using
Select one:
a. Density methods
b. Agglomerative
c. Hierarchical methods Incorrect
d. Partitional methods
Feedback
The correct answer is: Density methods
28.Which two parameters are needed for DBSCAN?
Select one:
a. Min sup and min confidence
b. Number of centroids
c. Min points and eps Correct
d. Min threshold
Feedback
The correct answer is: Min points and eps
29.Given desired class C and population P, lift is defined as
Select one:
a. the probability of class C given a sample taken from population P divided by the probability of C within the entire population P. Correct
b. the probability of population P given a sample taken from P
c. the probability of class C given a sample taken from population P.
d. the probability of class C given population P divided by the probability of C given a sample taken from the population
Feedback
The correct answer is: the probability of class C given a sample taken from population P divided by the probability of C within the entire population P.
30.Which association rule would you prefer?
Select one:
a. High support and low confidence
b. Low support and high confidence
c. Low support and low confidence
d. High support and medium confidence Incorrect
Feedback
The correct answer is: Low support and high confidence