Ana Gomez, a data analyst at Cha-Ching Bank, has compiled data on 500 past customers to whom Cha-Ching Bank marketed its Home Equity Line of Credit (HELOC) product. The data includes the age, sex, income, and whether or not the customer responded to the HELOC offer. Ana would like to team up with you to accomplish two data mining tasks:

(a) Develop a k-NN model for predicting whether or not a bank customer will respond to a HELOC offer.
(b) For each of the 20 new customers, identify whether they are likely to respond to a HELOC offer.

Follow the k-NN optimization (with normalization) process as shown in the example process 07-01-RidingMowers k-NN Optimized Normalized.rmp with some changes as described below:

Make a copy of the RidingMowers process mentioned above. Rename the copy by right-clicking it, then double-click it to load it onto the RapidMiner canvas and start making changes.

Import the HELOC.csv and HELOC-score.csv data into the RapidMiner repository.

Load both files in the process, connecting them in place of the existing data files.

Remove the Nominal to Binominal operator from the original process.

Instead, use the Numerical to Binominal operator to convert the HELOC outcome variable to a binominal attribute.
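As a rough illustration of what this conversion does, here is a minimal Python sketch. It assumes the operator's default bounds (min = max = 0.0), under which a value inside the bounds becomes false and any other value becomes true; the function name and the sample codes are hypothetical, and the actual coding of the HELOC column should be checked in the data.

```python
def numerical_to_binominal(value, min_bound=0.0, max_bound=0.0):
    """Return False when value lies in [min_bound, max_bound], otherwise True.

    Assumption: this mirrors the default behaviour of the Numerical to
    Binominal operator (only 0.0 maps to false).
    """
    return not (min_bound <= value <= max_bound)


# Hypothetical 0/1 response codes, not taken from HELOC.csv
heloc_raw = [0, 1, 1, 0]
heloc_label = [numerical_to_binominal(v) for v in heloc_raw]
print(heloc_label)
```

With this mapping, "true" is the class that later serves as the positive class in the performance operator.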

Use the Set Role operator to set HELOC as the label role.

In the Edit Parameter Settings panel of the Optimize Parameters (Grid) operator, set k to range from a minimum of 1 to a maximum of 50 in 25 steps on a linear scale.
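To see roughly which k values such a grid covers, the following Python sketch generates them. It assumes that 25 steps yields 26 evenly spaced points including both endpoints and that fractional values are rounded to the nearest integer; RapidMiner's exact rounding may differ, and `linear_grid` is a hypothetical helper name.

```python
def linear_grid(minimum, maximum, steps):
    """Evenly spaced integer candidates with both endpoints (steps + 1 points)."""
    width = (maximum - minimum) / steps
    # Rounding to the nearest integer is an assumption about how the grid
    # handles an integer parameter such as k.
    return [round(minimum + i * width) for i in range(steps + 1)]


k_values = linear_grid(1, 50, 25)
print(k_values)
```

Because the spacing is just under 2, the grid skips roughly every other k, so the reported optimum is the best of the candidates tried rather than of every value from 1 to 50.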

Inside the Optimize Parameters (Grid) operator, set the split ratio of the Validation (Split Validation) operator to 0.75 and use stratified sampling.
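The idea behind a stratified 0.75 split can be sketched in plain Python as follows. This is only an illustration of the concept, not RapidMiner's implementation; the function name, the seed, and the 100/400 class counts are made up for the example.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_of, ratio=0.75, seed=42):
    """Split rows into (train, validation), keeping class proportions equal."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)

    rng = random.Random(seed)
    train, valid = [], []
    for members in by_class.values():
        rng.shuffle(members)                 # random order within each class
        cut = round(len(members) * ratio)    # 75% of this class goes to training
        train.extend(members[:cut])
        valid.extend(members[cut:])
    return train, valid


# Hypothetical data: 500 customers, 100 responders and 400 non-responders
rows = [(i, i % 5 == 0) for i in range(500)]
train, valid = stratified_split(rows, label_of=lambda r: r[1])
print(len(train), len(valid))
```

Stratifying matters here because the f-measure depends on the positive class; a purely random split could leave the validation partition with too few responders.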

In the k-NN operator, change the measure types to MixedMeasures and the mixed measure to MixedEuclideanDistance, since the data has two numeric attributes (age and income) and one categorical attribute (sex).
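A minimal Python sketch of a mixed Euclidean distance is given below. It assumes that numeric attributes contribute their squared difference and that a categorical attribute contributes 0 when the two values match and 1 otherwise; this reflects the usual definition and has not been checked against RapidMiner's source. The two sample customers are hypothetical.

```python
import math


def mixed_euclidean(a, b):
    """Euclidean distance where categorical mismatches count as 1."""
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            total += (x - y) ** 2              # numeric attribute
        else:
            total += 0.0 if x == y else 1.0    # categorical attribute
    return math.sqrt(total)


# Hypothetical customers: (normalized age, normalized income, sex)
p = (0.5, -1.0, "F")
q = (0.5, 1.0, "M")
d = mixed_euclidean(p, q)
print(d)
```

The sketch also shows why normalization comes first: without it, raw income differences in the thousands would swamp both age and the 0/1 contribution of sex.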

In the Performance (Binominal Classification) operator, set the positive class to true and the main criterion for optimization to f-measure.
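For interpreting the results, it may help to recall how these criteria follow from the confusion matrix. The Python sketch below uses the standard definitions with "true" as the positive class; the counts are made up for illustration and are not results from the HELOC data.

```python
def binomial_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # f-measure is the harmonic mean of precision and recall
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Made-up counts for a 125-row validation partition
m = binomial_metrics(tp=20, fp=10, fn=5, tn=90)
print(m)
```

Optimizing on f-measure rather than accuracy is a sensible choice when responders are the minority class, since a model that predicts "false" for everyone can still score a high accuracy.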

Run the process. Report the following results and provide your interpretation (important):

What is the optimal k value obtained?
What is the optimal f-measure value for the validation partition?
What is the AUC of your model?
What are the precision, recall, and accuracy of the model?
Provide screenshots of the following:
a. Confusion matrix obtained from the Performance operator
b. Result from Optimize Parameters (Grid) showing the optimal k-value selected
c. Result table showing all the k-values and performance metrics, sorted by f-measure in descending order
d. Scored data for the 20 new customers, clearly showing the confidence (true), confidence (false), and prediction (HELOC) columns

Question

09-06-2024
Business

Answered

