20230227115732-A1A
To generate potential nodes for the root, first each numerical feature is ordered lowest to highest. The average based off the current row and the row below is also calculated.
Age | Height | Average for Age | Average for Height |
---|---|---|---|
5 | 100 | 7.5 | 125 |
10 | 150 | 15 | 175 |
20 | 200 | - | - |
Then each root can be generated like so:
flowchart TD
A[Age < 7.5\nGini impurity: 0.32]
A --> |True| B[Loves the song]
A --> |False| C[Deos not love the song]
flowchart TD
A[Age < 15\nGini impurity: 0.2]
A --> |True| B[Loves the song]
A --> |False| C[Deos not love the song]
flowchart TD
A[Height < 125\nGini impurity: 0.48]
A --> |True| B[Loves the song]
A --> |False| C[Deos not love the song]
flowchart TD
A[Height < 175\nGini impurity: 0.41]
A --> |True| B[Loves the song]
A --> |False| C[Deos not love the song]
The root with the lowest gini impurity can then be selected. In this case, it would be age < 15
.