Thursday, May 7, 2009

DMDW FAQS with solutions for JNTU BTECH and MCA students unit 3

1. Explain various Data mining primitives.

Data mining task primitives describe a data mining query, which is the input to a data mining system. Every data mining task is specified in the form of such a query. The primitives let the user communicate with the system interactively during the discovery stage, since data mining should be an interactive process.
Users must therefore be provided with a set of primitives for communicating with the data mining system. Incorporating these primitives in a data mining query language provides:
More flexible user interaction
A foundation for the design of graphical user interfaces
Standardization of the data mining industry and practice

Data mining primitives:

Task-relevant data
Type of knowledge to be mined
Background knowledge
Pattern interestingness measurements
Visualization of discovered patterns

2. What is a concept hierarchy in data mining? Discuss the different types of hierarchies in data mining.

A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts.

Types of concept hierarchy:
1. Schema hierarchy: A schema hierarchy is a total or partial order among attributes in the database schema. Schema hierarchies may formally express existing semantic relationships between attributes. Typically, a schema hierarchy specifies a data warehouse dimension.
2. Set-grouping hierarchies: A set-grouping hierarchy organizes values for a given attribute or dimension into groups of constants or range values. A total or partial order can be defined among groups. Set-grouping hierarchies can be used to refine or enrich schema-defined hierarchies, when the two types of hierarchies are combined. They are typically used for defining small sets of object relationships.
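
A set-grouping hierarchy can be pictured as a lookup from raw attribute values to named groups. The following is a minimal Python sketch of that idea, not part of DMQL. The group names young, middle_aged and senior come from the age_hierarchy example later in this post; the numeric ranges, the sample ages and the function name generalize_age are assumptions made for illustration.

```python
from collections import Counter

# Level-1 groups for the attribute "age". The group names follow the post's
# age_hierarchy example; the numeric ranges are assumed for illustration.
AGE_GROUPS = [
    (range(20, 40), "young"),
    (range(40, 60), "middle_aged"),
    (range(60, 90), "senior"),
]


def generalize_age(age):
    """Map a raw age (low-level concept) to its group (higher-level concept)."""
    for ages, group in AGE_GROUPS:
        if age in ages:
            return group
    return "all"  # fall back to the most general concept


# Hypothetical customer ages.
ages = [23, 31, 45, 67]
groups = [generalize_age(a) for a in ages]

# Rolling up: count customers per group instead of per individual age.
counts = Counter(groups)
```

Counting at the group level instead of at the level of individual ages is the kind of generalization a concept hierarchy makes possible.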

3. Define the confidence and support interestingness measures with formulas.
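
For an association rule A => B over a set of transactions, the two measures are usually defined as:

support(A => B) = P(A ∪ B) = (number of transactions containing both A and B) / (total number of transactions)
confidence(A => B) = P(B | A) = support(A ∪ B) / support(A)

A minimal Python sketch of these two formulas follows. The transactions, the item names and the function names are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(A and B together) / support(A); 0.0 if A never occurs."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base > 0 else 0.0


# Hypothetical transactions, invented for this example.
transactions = [
    {"computer", "software"},
    {"computer", "printer"},
    {"computer", "software", "printer"},
    {"printer"},
]

# Rule: computer => software
rule_support = support(transactions, {"computer", "software"})
rule_confidence = confidence(transactions, {"computer"}, {"software"})
```

Here 2 of the 4 transactions contain both items, so the support is 0.5; 3 transactions contain computer, and 2 of those also contain software, so the confidence is 2/3.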

4. List and discuss the different types of coupling schemes.

1. No coupling: No coupling means that a DM system will not utilize any function of a DB or DW system. It may fetch data from a particular source (such as a file system), process the data using some data mining algorithms, and then store the mining results in another file.
2. Loose coupling: Loose coupling means that a DM system will use some facilities of a DB or DW system, fetching data from a data repository managed by these systems, performing data mining, and then storing the mining results either in a file or in a designated place in a database or data warehouse.
3. Semi-tight coupling: Semi-tight coupling means that besides linking a DM system to a DB/DW system, efficient implementations of a few essential data mining primitives can be provided in the DB/DW system.
4. Tight coupling: Tight coupling means that a DM system is smoothly integrated into the DB/DW system.
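
The loose coupling scheme (fetch data through the database, mine outside it, store the results back) can be sketched in Python as below. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the DB/DW system, a simple item frequency count stands in for a real mining algorithm, and the table and column names are hypothetical.

```python
import sqlite3
from collections import Counter

# An in-memory SQLite database stands in for the DB/DW system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (trans_id INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, "computer"), (1, "software"), (2, "computer"), (3, "printer")],
)

# Step 1: fetch data through the database's query facility.
rows = conn.execute("SELECT trans_id, item FROM purchases").fetchall()

# Step 2: mine outside the database (here: count how often each item occurs).
item_counts = Counter(item for _, item in rows)

# Step 3: store the mining results in a designated table in the database.
conn.execute("CREATE TABLE mining_results (item TEXT, frequency INTEGER)")
conn.executemany("INSERT INTO mining_results VALUES (?, ?)", item_counts.items())
conn.commit()

# Read the stored results back to confirm they landed in the database.
stored = dict(conn.execute("SELECT item, frequency FROM mining_results"))
```

With no coupling, steps 1 and 3 would read from and write to flat files instead. With semi-tight or tight coupling, part or all of step 2 would run inside the database system itself.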

5. Give an example for each type of concept hierarchy.

Schema hierarchy
E.g., street < ...

6. Write the DMQL syntax for association and classification.

Association
Mine_Knowledge_Specification ::= mine associations [as pattern_name]

Classification
Mine_Knowledge_Specification ::= mine classification [as pattern_name] analyze classifying_attribute_or_dimension

7. Define the various ways of carrying out the presentation and visualization specification.

Different backgrounds and usages may require different forms of representation, e.g., rules, tables, crosstabs, and pie or bar charts.
Concept hierarchies are also important: discovered knowledge may be easier to understand when represented at a high level of abstraction.
Interactive drill-up/drill-down, pivoting, slicing, and dicing provide different perspectives on the data.
Different kinds of knowledge require different representations: association, classification, clustering, etc.

9. Write DMQL syntax for prediction and association.

Prediction
Mine_Knowledge_Specification ::= mine prediction [as pattern_name] analyze prediction_attribute_or_dimension {set {attribute_or_dimension_i= value_i}}

Association
Mine_Knowledge_Specification ::= mine associations [as pattern_name]

10, 11. Give an example of a location hierarchy and an age hierarchy with DMQL syntax.

use database AllElectronics_db
use hierarchy location_hierarchy for B.address
mine characteristics as customerPurchasing
analyze count%
in relevance to C.age, I.type, I.place_made
from customer C, item I, purchases P, items_sold S, works_at W, branch B
where I.item_ID = S.item_ID and S.trans_ID = P.trans_ID
and P.cust_ID = C.cust_ID and P.method_paid = "AmEx"
and P.empl_ID = W.empl_ID and W.branch_ID = B.branch_ID
and B.address = "Canada" and I.price >= 100
with noise threshold = 0.05
display as table

define hierarchy age_hierarchy for age on customer as
{age_category(1), ..., age_category(5)} := cluster(default, age, 5) < ...

12. Explain rule-based hierarchies with an example.

rule-based hierarchies
define hierarchy profit_margin_hierarchy on item as
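level_1: low_profit_margin < level_0: all if (price - cost) < $50
level_1: medium_profit_margin < level_0: all if ((price - cost) > $50) and ((price - cost) <= $250)
level_1: high_profit_margin < level_0: all if (price - cost) > $250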
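
The rules in this hierarchy assign each item a level from the value of (price - cost), with $50 and $250 as the thresholds. A minimal Python sketch of the same idea follows. The function name and the treatment of a margin of exactly $50 are assumptions, as is the medium_profit_margin label for the middle band.

```python
def profit_margin_level(price, cost):
    """Assign an item to a level_1 concept using rules on (price - cost)."""
    margin = price - cost
    if margin > 250:
        return "high_profit_margin"
    if margin > 50:
        return "medium_profit_margin"  # assumed label; 50 < margin <= 250
    return "low_profit_margin"  # assumed to cover every margin <= 50


# Hypothetical (price, cost) pairs.
levels = [
    profit_margin_level(100, 80),    # margin 20
    profit_margin_level(400, 200),   # margin 200
    profit_margin_level(1000, 500),  # margin 500
]
```

Unlike a set-grouping hierarchy, the level is computed from a rule over several attributes (price and cost) rather than looked up from the values of a single attribute.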

13. Explain set-grouping hierarchies with an example.

set-grouping hierarchies
define hierarchy age_hierarchy for age on customer as
level1: {young, middle_aged, senior} < ...

14. Discuss the various classifications of data mining systems.

15. Write a DMQL query to compare the students majoring in science with students majoring in arts.

The following sketch uses the mine comparison syntax. The database University_db, the relation student, and its attributes major, gender, birth_place and gpa are hypothetical names chosen for illustration.

use database University_db
mine comparison as science_vs_arts_students
in relevance to gender, birth_place, gpa
for science_students where major = "science"
versus arts_students where major = "arts"
analyze count%
from student
display as table

16. State the DMQL syntax for task-relevant data specification and for specification of the kind of knowledge to be mined.

use database database_name, or use data warehouse data_warehouse_name
from relation(s)/cube(s) [where condition]
in relevance to att_or_dim_list
order by order_list
group by grouping_list
having condition

Characterization
Mine_Knowledge_Specification ::= mine characteristics [as pattern_name] analyze measure(s)
Discrimination
Mine_Knowledge_Specification ::= mine comparison [as pattern_name] for target_class where target_condition {versus contrast_class_i where contrast_condition_i} analyze measure(s)
Association
Mine_Knowledge_Specification ::= mine associations [as pattern_name]
