
a). What is the entropy of this collection of training examples with respect to the positive class?

b). What are the information gains of A1 and A2 relative to the training data set? For A3, which is a continuous attribute, compute the information gain for every possible split.

c). What is the best split (among A1, A2, and A3) according to the information gain?

Answer:

The data set is not shown in the question; it is given in the attachment.

Solution :

a). In the table, there are four positive examples and five negative examples.

Therefore,

[tex]$P(+) = \frac{4}{9}$[/tex]   and

[tex]$P(-) = \frac{5}{9}$[/tex]

The entropy of the training examples is given by :

[tex]$ -\frac{4}{9}\log_2\left(\frac{4}{9}\right)-\frac{5}{9}\log_2\left(\frac{5}{9}\right)$[/tex]

= 0.9911
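This entropy calculation can be sketched in a few lines of Python (the function name `entropy` is just an illustrative helper, not from the original solution):

```python
from math import log2

def entropy(counts):
    """Shannon entropy (in bits) of a class-count distribution."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# 4 positive and 5 negative examples
print(round(entropy([4, 5]), 4))  # → 0.9911
```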

b). For the attribute [tex]$a_1$[/tex], the associated class counts are:

  [tex]$a_1$[/tex]   +   -
  T                  3   1
  F                  1   4

The entropy for the split on [tex]$a_1$[/tex] is given by:

[tex]$\frac{4}{9}\left[ -\frac{3}{4}\log_2\left(\frac{3}{4}\right)-\frac{1}{4}\log_2\left(\frac{1}{4}\right)\right]+\frac{5}{9}\left[ -\frac{1}{5}\log_2\left(\frac{1}{5}\right)-\frac{4}{5}\log_2\left(\frac{4}{5}\right)\right]$[/tex]

= 0.7616

Therefore, the information gain for [tex]$a_1$[/tex]  is

  0.9911 - 0.7616 ≈ 0.2294
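The same computation can be expressed as a small sketch (the helper names `entropy` and `info_gain` are illustrative, not from the original solution):

```python
from math import log2

def entropy(counts):
    """Shannon entropy (in bits) of a class-count distribution."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def info_gain(parent_counts, child_counts):
    """Information gain of a split: parent entropy minus the
    size-weighted average entropy of the child nodes."""
    n = sum(parent_counts)
    weighted = sum(sum(c) / n * entropy(c) for c in child_counts)
    return entropy(parent_counts) - weighted

# a1: the T branch holds (3+, 1-) and the F branch holds (1+, 4-)
print(round(info_gain([4, 5], [[3, 1], [1, 4]]), 4))  # → 0.2294
```

The same call with the counts for [tex]$a_2$[/tex], `info_gain([4, 5], [[2, 3], [2, 2]])`, reproduces the 0.0072 gain computed below.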

Similarly, for the attribute [tex]$a_2$[/tex], the associated class counts are:

  [tex]$a_2$[/tex]   +   -
  T                  2   3
  F                  2   2

The entropy for the split on [tex]$a_2$[/tex] is given by:

[tex]$\frac{5}{9}\left[ -\frac{2}{5}\log_2\left(\frac{2}{5}\right)-\frac{3}{5}\log_2\left(\frac{3}{5}\right)\right]+\frac{4}{9}\left[ -\frac{2}{4}\log_2\left(\frac{2}{4}\right)-\frac{2}{4}\log_2\left(\frac{2}{4}\right)\right]$[/tex]

= 0.9839

Therefore, the information gain for [tex]$a_2$[/tex] is

  0.9911 - 0.9839 = 0.0072

  [tex]$a_3$[/tex]   class   split point   entropy   info gain
  1.0                +       2.0           0.8484    0.1427
  3.0                -       3.5           0.9885    0.0026
  4.0                +       4.5           0.9183    0.0728
  5.0                -
  5.0                -       5.5           0.9839    0.0072
  6.0                +       6.5           0.9728    0.0183
  7.0                +
  7.0                -       7.5           0.8889    0.1022
  8.0                -

The best split for [tex]$a_3$[/tex] is observed at the split point 2.0, with information gain 0.1427.
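The split search above can be reproduced with a short sketch. The (value, class) pairs are read off the table; note that the ninth example, (8.0, −), is an assumption implied by the 7.5 split's entropy of 8/9 = 0.8889 (its right branch must be a single pure negative example):

```python
from math import log2

def entropy(counts):
    """Shannon entropy (in bits) of a class-count distribution."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# (a3 value, class) pairs; the final (8.0, '-') row is an assumption
# reconstructed from the attached data set as explained above.
data = [(1.0, '+'), (3.0, '-'), (4.0, '+'), (5.0, '-'), (5.0, '-'),
        (6.0, '+'), (7.0, '+'), (7.0, '-'), (8.0, '-')]

n = len(data)
parent_h = entropy([sum(c == '+' for _, c in data),
                    sum(c == '-' for _, c in data)])

# Candidate split points are midpoints between consecutive distinct values.
values = sorted({v for v, _ in data})
results = {}
for lo, hi in zip(values, values[1:]):
    split = (lo + hi) / 2
    left = [c for v, c in data if v <= split]
    right = [c for v, c in data if v > split]
    h = (len(left) / n * entropy([left.count('+'), left.count('-')]) +
         len(right) / n * entropy([right.count('+'), right.count('-')]))
    results[split] = (round(h, 4), round(parent_h - h, 4))
    print(f"split={split}: entropy={h:.4f}, gain={parent_h - h:.4f}")
```

The printed gains match the table row by row, with the maximum at split point 2.0.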

c). Comparing the information gains from part (b) — 0.2294 for [tex]$a_1$[/tex], 0.0072 for [tex]$a_2$[/tex], and 0.1427 for the best split of [tex]$a_3$[/tex] — we can say that [tex]$a_1$[/tex] produces the best split.
