DATA MINING AND WAREHOUSING
Quiz by Albert Bandol
Process of discovering patterns, trends, and useful insights from large datasets.
Assessing the discovered patterns or models to determine whether they are useful and valid. This involves statistical and business evaluation criteria.
Presenting the discovered knowledge in a way that is comprehensible and actionable (e.g., reports, visualizations, decision trees).
Gathering and aggregating data from various sources. This may involve database systems, APIs, and more.
Addressing missing values, outliers, noise, and inconsistencies to ensure the data is of high quality.
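A minimal Python sketch of these cleaning steps, assuming each record is a dict with hypothetical `customer_id`, `membership`, and `amount` fields. It normalizes inconsistent membership labels, drops exact duplicates, and filters outliers in `amount` with the common 1.5 * IQR rule, which is one of several possible outlier tests rather than the only one.

```python
from statistics import quantiles

def clean(records):
    """Normalize labels, drop duplicate records, and filter IQR outliers in 'amount'."""
    seen = set()
    deduped = []
    for record in records:
        # Fix inconsistent spellings such as " Gold" vs "gold"
        row = {**record, "membership": record["membership"].strip().lower()}
        key = (row["customer_id"], row["membership"], row["amount"])
        if key not in seen:  # keep only the first copy of each record
            seen.add(key)
            deduped.append(row)

    # Quartiles of the purchase amounts; values far outside them are treated as noise
    q1, _, q3 = quantiles([r["amount"] for r in deduped], n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [r for r in deduped if low <= r["amount"] <= high]

# Hypothetical sample data
raw = [
    {"customer_id": 1, "membership": " Gold", "amount": 100},
    {"customer_id": 1, "membership": "gold", "amount": 100},   # duplicate after normalization
    {"customer_id": 2, "membership": "Silver", "amount": 120},
    {"customer_id": 3, "membership": "silver ", "amount": 110},
    {"customer_id": 4, "membership": "Gold", "amount": 130},
    {"customer_id": 5, "membership": "bronze", "amount": 115},
    {"customer_id": 6, "membership": "Gold", "amount": 5000},  # outlier
]
cleaned = clean(raw)  # 5 records: duplicate and outlier removed
```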
Converting data into a consistent format for mining, such as scaling attributes, encoding categorical variables, or aggregating features.
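As an illustration of encoding categorical variables, here is a small one-hot encoding sketch in plain Python. The membership labels are hypothetical, and sorting the categories is simply one way to get a stable column order.

```python
def one_hot(values):
    """Encode a list of category labels as one-hot (0/1) vectors."""
    categories = sorted(set(values))  # fixed, reproducible column order
    index = {category: i for i, category in enumerate(categories)}
    vectors = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column for this record's category
        vectors.append(row)
    return categories, vectors

categories, encoded = one_hot(["gold", "silver", "gold", "bronze"])
# categories -> ['bronze', 'gold', 'silver']
# encoded    -> [[0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```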
Choosing the most relevant features and, if necessary, creating new ones.
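One simple way to sketch this step, assuming numeric features and a numeric target: create a derived feature, then keep the features with the largest absolute Pearson correlation to the target. Correlation filtering is only one of many selection methods, and the feature names and values below are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def select_features(features, target, k):
    """Keep the k features whose |correlation| with the target is largest."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical customer data
features = {
    "visits": [1, 2, 3, 4, 5],
    "age": [30, 25, 40, 35, 28],
    "tenure_years": [1, 1, 2, 3, 3],
}
# Feature creation: a derived interaction feature
features["visits_x_tenure"] = [v * t for v, t in zip(features["visits"], features["tenure_years"])]
spend = [100, 210, 290, 410, 500]

selected = select_features(features, spend, k=2)  # ['visits', 'visits_x_tenure']
```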
Applying machine learning algorithms to train models. This step can include techniques like regression, classification, and clustering.
Evaluating model performance using validation techniques (e.g., k-fold cross-validation, confusion matrices, ROC curves) to measure accuracy, precision, recall, and related metrics.
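A minimal sketch of two of these techniques: generating k-fold train/test index splits and computing accuracy, precision, and recall from confusion-matrix counts. The function names are hypothetical, the folds are contiguous (real pipelines usually shuffle first), and ROC curves are not covered here.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds of n samples."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if not start <= i < start + size]
        yield train, test
        start += size

def confusion_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts plus accuracy, precision, and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

folds = list(k_fold_indices(10, k=3))  # test folds of size 4, 3, 3
metrics = confusion_metrics([1, 0, 1, 1, 0, 0, 1, 0], [1, 1, 0, 1, 0, 1, 1, 0])
# tp=3, fp=2, fn=1, tn=2 -> accuracy 0.625, precision 0.6, recall 0.75
```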
Pushing the model into production, where it can make predictions on new data. This could involve integrating it with business applications.
Ensuring the model continues to perform well over time and updating it as needed.
Preparing the data for analysis through cleaning (removing errors or inconsistencies), transformation (changing formats), and normalization (scaling features). This ensures data quality before data mining algorithms are applied.
A statistical technique used to predict continuous variables. Example: Predicting a customer's "Last Purchase Amount" based on their age, gender, or membership status.
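A minimal ordinary-least-squares sketch with a single predictor (age). This is a simplification: using gender or membership status as well would require encoding them and fitting a multiple regression. The ages and purchase amounts are hypothetical.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return slope, intercept

ages = [20, 30, 40, 50]
last_purchase = [110, 160, 210, 260]  # hypothetical amounts
slope, intercept = fit_simple_linear(ages, last_purchase)  # 5.0, 10.0
predicted = slope * 35 + intercept  # estimate for a 35-year-old customer: 185.0
```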
Used when each record must be assigned to one of a set of predefined classes. Example: Classifying customers as "High Spend" vs. "Low Spend."
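As one illustrative classifier, here is a k-nearest-neighbors sketch that labels a customer by majority vote among the most similar labeled customers. The two features (visits per month, average basket size) and the training data are hypothetical; in practice the features would usually be normalized first, since they sit on different scales.

```python
from collections import Counter
from math import dist

def knn_predict(training, query, k=3):
    """Label `query` by majority vote among its k nearest labeled neighbors."""
    nearest = sorted(training, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical features: (visits per month, average basket size)
training = [
    ((1, 20), "Low Spend"), ((2, 25), "Low Spend"), ((1, 30), "Low Spend"),
    ((8, 90), "High Spend"), ((9, 110), "High Spend"), ((7, 95), "High Spend"),
]
frequent_buyer = knn_predict(training, (8, 100))  # "High Spend"
occasional_buyer = knn_predict(training, (2, 22))  # "Low Spend"
```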
Techniques such as Market Basket Analysis are used to identify relationships between variables. Example: "Customers who buy electronics often also buy home appliances."
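The strength of a rule like "electronics → home appliances" is commonly measured with support and confidence. This sketch only scores one given rule over hypothetical transactions; it does not search for rules the way an algorithm such as Apriori would.

```python
def count(transactions, items):
    """Number of transactions containing every item in `items`."""
    return sum(1 for t in transactions if set(items) <= t)

def support(transactions, items):
    """Fraction of transactions containing all of `items`."""
    return count(transactions, items) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction also containing the consequent."""
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / count(transactions, antecedent)

# Hypothetical shopping baskets
baskets = [
    {"electronics", "home appliances"},
    {"electronics", "home appliances", "groceries"},
    {"electronics", "groceries"},
    {"home appliances"},
    {"electronics", "home appliances"},
]
rule_support = support(baskets, {"electronics", "home appliances"})         # 3/5 = 0.6
rule_confidence = confidence(baskets, {"electronics"}, {"home appliances"})  # 3/4 = 0.75
```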
Grouping similar data points together. Example: Using K-means clustering to group customers based on age, location, and membership status.
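A minimal K-means sketch in plain Python. Location and membership status are categorical, so this sketch assumes two numeric features instead (age and a hypothetical yearly visit count). It uses the naive initialization of taking the first k points; library implementations typically use smarter seeding such as k-means++.

```python
from math import dist

def k_means(points, k, iterations=100):
    """Alternate assignment and centroid-update steps until centroids stop moving."""
    centroids = [points[i] for i in range(k)]  # naive initialization: first k points
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (age, visits per year)
customers = [(22, 2), (25, 3), (60, 20), (23, 1), (58, 22), (62, 18)]
centroids, clusters = k_means(customers, k=2)
```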
Refers to unprocessed data collected from different sources. Needs to be cleaned, transformed, and structured before it can be used.
Converting continuous data into discrete categories. Example: Binning ages into "18-25," "26-35," "36-50," etc.
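A small binning sketch using the age ranges from the example. It assumes whole-number ages, and the "other" label for ages outside the listed ranges is a hypothetical catch-all standing in for the "etc." bins.

```python
AGE_BINS = [(18, 25, "18-25"), (26, 35, "26-35"), (36, 50, "36-50")]

def bin_age(age, bins=AGE_BINS):
    """Map a whole-number age to its range label."""
    for low, high, label in bins:
        if low <= age <= high:
            return label
    return "other"  # hypothetical catch-all for ages outside the listed ranges

labels = [bin_age(a) for a in [19, 26, 35, 42, 70]]
# ['18-25', '26-35', '26-35', '36-50', 'other']
```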
Occurs when certain values are not recorded. It can be handled by:
*Removing rows with missing data.
*Imputing missing values using mean, median, or other techniques.
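Both strategies can be sketched in a few lines, assuming missing values are stored as `None` in hypothetical customer records. `impute` accepts any summary function, so `mean` and `median` from the standard library can be swapped in.

```python
from statistics import mean, median

rows = [
    {"age": 25, "amount": 120.0},
    {"age": 31, "amount": None},
    {"age": None, "amount": 80.0},
    {"age": 40, "amount": 100.0},
]

def drop_missing(rows):
    """Keep only rows with no missing (None) values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def impute(rows, column, strategy=mean):
    """Fill missing values in `column` with a summary of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = strategy(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

complete = drop_missing(rows)                    # 2 rows remain
filled = impute(rows, "amount")                  # missing amount -> mean 100.0
filled = impute(filled, "age", strategy=median)  # missing age -> median 31
```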
Scaling data within a specific range (e.g., 0 to 1).
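Min-max scaling is one common way to map values into a range such as 0 to 1: each value becomes (v - min) / (max - min), then is stretched to the target range. Mapping a constant column to the lower bound is an assumption made here to avoid dividing by zero.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]  # assumption: a constant column maps to new_min
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

scaled = min_max_scale([20, 35, 50])  # [0.0, 0.5, 1.0]
```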
Handling missing or incorrect data by replacing values. Example: Replacing missing purchase amounts with the average of other customers.
The overall process of cleaning, transforming, and organizing data into a suitable format for analysis.
It is the key objective of data mining, focusing on uncovering trends, patterns, and relationships hidden in raw data.
It is a structured process that moves sequentially through data preparation, model evaluation, and deployment.
A ____ _________ is a central repository for structured data used for analysis and reporting.