Sunday 1 April 2018

DATA MINING



1.The apriori property means

Select one:
a. To improve the efficiency, do level-wise generation of frequent item sets Incorrect
b. If a set can pass a test, its supersets will fail the same test
c. If a set cannot pass a test, its supersets will also fail the same test
d. To decrease the efficiency, do level-wise generation of frequent item sets

The correct answer is: If a set cannot pass a test, its supersets will also fail the same test
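
The property can be checked on a small example. The sketch below is only an illustration: the transactions, the threshold and the helper name support_count are invented here, not taken from the quiz. It shows that a superset never has a higher support count than any of its subsets, so a set that fails the minimum-support test drags its supersets down with it.

```python
# Toy transactions, invented purely for illustration.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A"},
]

def support_count(itemset, transactions):
    """Count the transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

min_support = 3  # hypothetical threshold

ab = support_count({"A", "B"}, transactions)        # 2, below the threshold
abc = support_count({"A", "B", "C"}, transactions)  # 1, cannot exceed ab

print(ab, abc)
print(ab < min_support and abc < min_support)  # True
```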


2.Given a frequent itemset L, if |L| = k, then there are

Select one:
a. 2^k candidate association rules
b. 2^k - 1 candidate association rules
c. 2^k - 2 candidate association rules
d. 2k - 2 candidate association rules Incorrect
Feedback
The correct answer is: 2^k - 2 candidate association rules
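
The count can be verified by brute force. The sketch below (the function name candidate_rules is made up for this example) splits an itemset into every non-empty antecedent and non-empty consequent. The empty set and the full set are excluded as antecedents, which is where the "- 2" comes from.

```python
from itertools import combinations

def candidate_rules(itemset):
    """Yield (antecedent, consequent) for every split of itemset into two non-empty parts."""
    items = sorted(itemset)
    for size in range(1, len(items)):  # antecedent sizes 1 .. k-1
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            yield antecedent, consequent

rules = list(candidate_rules({"A", "B", "C", "D"}))
print(len(rules))  # 14, i.e. 2**4 - 2
```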


3.With Bayes' theorem, the probability of hypothesis H, specified by P(H), is referred to as

Select one:
a. a conditional probability
b. a posterior probability
c. an a priori probability Correct
d. a bidirectional probability
Feedback
The correct answer is: an a priori probability

4.In _________ clusterings, points may belong to multiple clusters

Select one:
a. Partial
b. Fuzzy
c. Exclusive
d. Non-exclusive Incorrect
Feedback
The correct answer is: Fuzzy


5.Which statement about outliers is true?

Select one:
a. Outliers should be part of the test dataset but should not be present in the training data.
b. The nature of the problem determines how outliers are used Correct
c. Outliers should be identified and removed from a dataset.
d. Outliers should be part of the training dataset but should not be present in the test data.
Feedback
The correct answer is: The nature of the problem determines how outliers are used


6.Which is not part of the categories of clustering methods?


Select one:
a. Partitioning methods
b. Rule-based methods Correct
c. Hierarchical methods
d. Density based methods
Feedback
The correct answer is: Rule-based methods


7.The most general form of distance is

Select one:
a. Mean
b. Manhattan
c. Minkowski
d. Euclidean Incorrect
Feedback
The correct answer is: Minkowski
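
Minkowski is the general form because Manhattan and Euclidean distance are its special cases for order p = 1 and p = 2. A minimal sketch, with a helper name chosen here for illustration:

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length numeric sequences."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

x, y = (0.0, 0.0), (3.0, 4.0)
manhattan = minkowski(x, y, 1)  # |3| + |4| = 7
euclidean = minkowski(x, y, 2)  # sqrt(9 + 16) = 5
print(manhattan, euclidean)
```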


8.The correlation coefficient for two real-valued attributes is –0.85. What does this value tell you?

Select one:
a. The attributes are not linearly related.
b. As the value of one attribute increases the value of the second attribute also increases.
c. The attributes show a linear relationship
d. As the value of one attribute decreases the value of the second attribute increases. Correct
Feedback
The correct answer is: As the value of one attribute decreases the value of the second attribute increases.


9.The set of frequent item sets is a

Select one:
a. Subset of maximal frequent item sets
b. Superset of only closed frequent item sets
c. Superset of only maximal frequent item sets Incorrect
d. Superset of both closed frequent item sets and maximal frequent item sets
Feedback
The correct answer is: Superset of both closed frequent item sets and maximal frequent item sets

10.In the Apriori algorithm, if there are 100 frequent 1-itemsets, then the number of candidate 2-itemsets is

Select one:
a. 200 Incorrect
b. 100
c. 4950
d. 5000
Feedback
The correct answer is: 4950
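
The figure is the number of unordered pairs that can be formed from 100 items, C(100, 2) = 100 * 99 / 2. As a quick check (variable names are illustrative):

```python
from math import comb

n_frequent_1_itemsets = 100
n_candidate_2_itemsets = comb(n_frequent_1_itemsets, 2)  # unordered pairs
print(n_candidate_2_itemsets)  # 4950
```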


11.A significant bottleneck in the Apriori algorithm is

Select one:
a. Number of iterations
b. Finding frequent itemsets
c. Pruning
d. Candidate generation Correct
Feedback
The correct answer is: Candidate generation

12.Which statement is true about neural network and linear regression models?

Select one:
a. Both models require numeric attributes to range between 0 and 1.
b. The output of both models is a categorical attribute value.
c. Both techniques build models whose output is determined by a linear sum of weighted input attribute values.
d. Both models require input attributes to be numeric. Correct
Feedback
The correct answer is: Both models require input attributes to be numeric.



13.If {A,B,C,D} is a frequent itemset, the candidate rule that is not possible is

Select one:
a. D --> ABCD
b. B --> ADC Incorrect
c. C --> A
d. A --> BC
Feedback
The correct answer is: D --> ABCD


14.What is the final cluster size in the divisive algorithm, one of the hierarchical clustering approaches?

Select one:
a. singleton
b. Three
c. Zero Incorrect
d. Two
Feedback
The correct answer is: singleton


15.Given a rule of the form IF X THEN Y, rule confidence is defined as the conditional probability that

Select one:
a. X is false when Y is known to be false.
b. Y is false when X is known to be false.
c. X is true when Y is known to be true
d. Y is true when X is known to be true. Correct
Feedback
The correct answer is: Y is true when X is known to be true.
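
In market-basket terms, this conditional probability is estimated as the share of transactions containing X that also contain Y. A minimal sketch on invented toy data (the transactions and the function name are assumptions made for this example):

```python
# Toy transactions, invented purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def confidence(x, y, transactions):
    """Estimate P(Y | X): fraction of transactions containing x that also contain y."""
    with_x = [t for t in transactions if x <= t]
    if not with_x:
        raise ValueError("antecedent never occurs in the data")
    return sum(1 for t in with_x if y <= t) / len(with_x)

# bread appears in 3 transactions, 2 of which also contain milk.
conf = confidence({"bread"}, {"milk"}, transactions)
print(conf)
```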


17.This data transformation technique works well when minimum and maximum values for a real-valued attribute are known.

Select one:
a. decimal scaling
b. z-score normalization
c. min-max normalization Correct
d. logarithmic normalization
Feedback
The correct answer is: min-max normalization
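
Min-max normalization needs the attribute's bounds up front, which is why it suits attributes whose minimum and maximum are known. A minimal sketch, assuming a target range of [0, 1] by default; the function and parameter names are illustrative:

```python
def min_max_normalize(value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly map value from [old_min, old_max] onto [new_min, new_max]."""
    if old_max <= old_min:
        raise ValueError("old_max must be greater than old_min")
    return (value - old_min) / (old_max - old_min) * (new_max - new_min) + new_min

# Example: an age attribute assumed to range from 0 to 100.
print(min_max_normalize(25, 0, 100))  # 0.25
```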

18.Which statement is true about the K-Means algorithm?

Select one:
a. The output attribute must be categorical.
b. All attributes must be numeric Correct
c. All attribute values must be categorical.
d. Attribute values may be either categorical or numeric
Feedback
The correct answer is: All attributes must be numeric


19.Clustering is ___________ and is an example of ____________ learning

Select one:
a. Predictive and unsupervised
b. Descriptive and supervised
c. Predictive and supervised
d. Descriptive and unsupervised Correct
Feedback
The correct answer is: Descriptive and unsupervised


20.Find the odd one out

Select one:
a. K medoid
b. PAM Incorrect
c. DBSCAN
d. K mean
Feedback
The correct answer is: DBSCAN


21. _________ is an example of case-based learning

Select one:
a. Neural networks
b. K-nearest neighbor
c. Decision trees Incorrect
d. Genetic algorithm
Feedback
The correct answer is: K-nearest neighbor


22.The _______ step eliminates extensions of (k-1)-itemsets that are not frequent from being considered for support counting

Select one:
a. Partitioning
b. Pruning Correct
c. Itemset eliminations
d. Candidate generation
Feedback
The correct answer is: Pruning


23. Assume that we have a dataset containing information about 200 individuals. A supervised data mining session has discovered the following rule:

                IF  age < 30 & credit card insurance = yes   THEN life insurance = yes
                        Rule Accuracy:    70%   and  Rule Coverage:   63%

How many individuals in the class life insurance = no have credit card insurance and are less than 30 years old?
Select one:
a. 63
b. 30
c. 70 Incorrect
d. 38
Feedback
The correct answer is: 38
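
The arithmetic behind 38 can be reproduced as follows. This sketch assumes that coverage is the fraction of all 200 individuals who satisfy the rule's IF part, and that accuracy is the fraction of those for whom the THEN part holds; the quiz does not spell out these definitions.

```python
population = 200
coverage = 0.63  # assumed: share of the dataset matching the IF part
accuracy = 0.70  # assumed: share of matching individuals with life insurance = yes

covered = population * coverage            # about 126 match the IF part
not_insured = covered * (1 - accuracy)     # about 37.8 of them have life insurance = no
print(round(not_insured))  # 38
```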


24.A good clustering method will produce high quality clusters with

Select one:
a. no inter class similarity
b. low intra class similarity
c. high inter class similarity
d. high intra class similarity Correct
Feedback
The correct answer is: high intra class similarity


25.If an item set ‘XYZ’ is a frequent item set, then all subsets of that frequent item set are


Select one:
a. Not frequent
b. Frequent Correct
c. Undefined
d. Cannot say
Feedback
The correct answer is: Frequent


26.The number of iterations in apriori ___________

Select one:
a. decreases with the increase in size of the data
b. increases with the size of the data
c. decreases with increase in size of the maximum frequent set Incorrect
d. increases with the size of the maximum frequent set
Feedback
The correct answer is: increases with the size of the maximum frequent set


27.Arbitrary shaped clusters can be found by using

Select one:
a. Density methods
b. Agglomerative
c. Hierarchical methods Incorrect
d. Partitional methods
Feedback
The correct answer is: Density methods


28.Which two parameters are needed for DBSCAN?

Select one:
a. Min sup and min confidence
b. Number of centroids
c. Min points and eps Correct
d. Min threshold
Feedback
The correct answer is: Min points and eps


29.Given desired class C and population P, lift is defined as

Select one:
a. the probability of class C given a sample taken from population P divided by the probability of C within the entire population P. Correct
b. the probability of population P given a sample taken from P
c. the probability of class C given a sample taken from population P.
d. the probability of class C given population P divided by the probability of C given a sample taken from the population
Feedback
The correct answer is: the probability of class C given a sample taken from population P divided by the probability of C within the entire population P.
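
As a ratio, lift compares the concentration of class C in a selected sample with its concentration in the whole population. A minimal sketch with invented counts; the function and parameter names are illustrative:

```python
def lift(hits_in_sample, sample_size, hits_in_population, population_size):
    """Return P(C | sample) / P(C | population)."""
    p_c_given_sample = hits_in_sample / sample_size
    p_c_in_population = hits_in_population / population_size
    return p_c_given_sample / p_c_in_population

# Hypothetical numbers: 30 of 100 sampled individuals are in class C,
# against 100 of 1000 in the population, so the lift is about 3.
value = lift(30, 100, 100, 1000)
print(value)
```

A lift above 1 means the sample contains proportionally more members of C than the population does.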

30.Which association rule would you prefer?

Select one:
a. High support and low confidence
b. Low support and high confidence
c. Low support and low confidence
d. High support and medium confidence Incorrect
Feedback
The correct answer is: Low support and high confidence









