{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# # Classifiers introduction\n", "\n", "In the following program we introduce the basic steps of classification of a dataset in a matrix" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Import the package for learning and modeling trees" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "scrolled": true }, "outputs": [], "source": [ "from sklearn import tree" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define the matrix containing the data (one example per row)\n", "and the vector containing the corresponding target value" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "X = [[0, 0, 0], [1, 1, 1], [0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1]]\n", "Y = [1, 0, 0, 0, 1, 1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Declare the classification model you want to use and then fit the model to the data" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "clf = tree.DecisionTreeClassifier()\n", "clf = clf.fit(X, Y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Predict the target value (and print it) for the passed data, using the fitted model currently in clf" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0]\n" ] } ], "source": [ "print(clf.predict([[0, 1, 1]]))" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 0]\n" ] } ], "source": [ "print(clf.predict([[1, 0, 1],[0, 0, 1]]))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "Tree\n", "\n", "\n", "\n", "0\n", "\n", "X[2] <= 0.5\n", "gini = 0.5\n", "samples = 6\n", "value = [3, 3]\n", "\n", "\n", "\n", "1\n", "\n", "X[0] <= 
0.5\n", "gini = 0.444\n", "samples = 3\n", "value = [1, 2]\n", "\n", "\n", "\n", "0->1\n", "\n", "\n", "True\n", "\n", "\n", "\n", "6\n", "\n", "X[0] <= 0.5\n", "gini = 0.444\n", "samples = 3\n", "value = [2, 1]\n", "\n", "\n", "\n", "0->6\n", "\n", "\n", "False\n", "\n", "\n", "\n", "2\n", "\n", "X[1] <= 0.5\n", "gini = 0.5\n", "samples = 2\n", "value = [1, 1]\n", "\n", "\n", "\n", "1->2\n", "\n", "\n", "\n", "\n", "\n", "5\n", "\n", "gini = 0.0\n", "samples = 1\n", "value = [0, 1]\n", "\n", "\n", "\n", "1->5\n", "\n", "\n", "\n", "\n", "\n", "3\n", "\n", "gini = 0.0\n", "samples = 1\n", "value = [0, 1]\n", "\n", "\n", "\n", "2->3\n", "\n", "\n", "\n", "\n", "\n", "4\n", "\n", "gini = 0.0\n", "samples = 1\n", "value = [1, 0]\n", "\n", "\n", "\n", "2->4\n", "\n", "\n", "\n", "\n", "\n", "7\n", "\n", "gini = 0.0\n", "samples = 1\n", "value = [1, 0]\n", "\n", "\n", "\n", "6->7\n", "\n", "\n", "\n", "\n", "\n", "8\n", "\n", "X[1] <= 0.5\n", "gini = 0.5\n", "samples = 2\n", "value = [1, 1]\n", "\n", "\n", "\n", "6->8\n", "\n", "\n", "\n", "\n", "\n", "9\n", "\n", "gini = 0.0\n", "samples = 1\n", "value = [0, 1]\n", "\n", "\n", "\n", "8->9\n", "\n", "\n", "\n", "\n", "\n", "10\n", "\n", "gini = 0.0\n", "samples = 1\n", "value = [1, 0]\n", "\n", "\n", "\n", "8->10\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import os\n", "os.environ[\"PATH\"] += os.pathsep + 'C:/Users/galat/.conda/envs/aaut/Library/bin/graphviz'\n", "import graphviz\n", "dot_data = tree.export_graphviz(clf, out_file=None) \n", "graph = graphviz.Source(dot_data) \n", "graph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the following we start using a dataset (from UCI Machine Learning repository)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "from sklearn.datasets import load_iris\n", "iris = load_iris()" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "# Declare the type of prediction model and the working criteria for the model induction algorithm" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "clf = tree.DecisionTreeClassifier(criterion=\"entropy\",random_state=300,min_samples_leaf=5,class_weight={0:1,1:1,2:1})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Split the dataset in training and test set" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# Generate a random permutation of the indices of examples that will be later used \n", "# for the training and the test set\n", "import numpy as np\n", "np.random.seed(0)\n", "indices = np.random.permutation(len(iris.data))\n", "\n", "# We now decide to keep the last 10 indices for test set, the remaining for the training set\n", "indices_training=indices[:-10]\n", "indices_test=indices[-10:]\n", "\n", "iris_X_train = iris.data[indices_training] # keep for training all the matrix elements with the exception of the last 10 \n", "iris_y_train = iris.target[indices_training]\n", "iris_X_test = iris.data[indices_test] # keep the last 10 elements for test set\n", "iris_y_test = iris.target[indices_test]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Fit the learning model on training set" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# fit the model to the training data\n", "clf = clf.fit(iris_X_train, iris_y_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Obtain predictions" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Predictions:\n", "[1 2 1 0 0 0 2 1 2 0]\n", "True classes:\n", "[1 1 1 0 0 0 2 1 2 0]\n", "['setosa' 'versicolor' 'virginica']\n" ] } ], "source": [ "# apply fitted model \"clf\" to the test set \n", "predicted_y_test = 
clf.predict(iris_X_test)\n", "\n", "# print the predictions (class numbers associated to the class names in target_names)\n", "print(\"Predictions:\")\n", "print(predicted_y_test)\n", "print(\"True classes:\")\n", "print(iris_y_test)\n", "print(iris.target_names)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Print the index of the test instances and the corresponding predictions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Look at the specific examples" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Instance # 88: \n", "sepal length (cm)=5.6, sepal width (cm)=3.0, petal length (cm)=4.1, petal width (cm)=1.3\n", "Predicted: versicolor\t True: versicolor\n", "\n", "Instance # 70: \n", "sepal length (cm)=5.9, sepal width (cm)=3.2, petal length (cm)=4.8, petal width (cm)=1.8\n", "Predicted: virginica\t True: versicolor\n", "\n", "Instance # 87: \n", "sepal length (cm)=6.3, sepal width (cm)=2.3, petal length (cm)=4.4, petal width (cm)=1.3\n", "Predicted: versicolor\t True: versicolor\n", "\n", "Instance # 36: \n", "sepal length (cm)=5.5, sepal width (cm)=3.5, petal length (cm)=1.3, petal width (cm)=0.2\n", "Predicted: setosa\t True: setosa\n", "\n", "Instance # 21: \n", "sepal length (cm)=5.1, sepal width (cm)=3.7, petal length (cm)=1.5, petal width (cm)=0.4\n", "Predicted: setosa\t True: setosa\n", "\n", "Instance # 9: \n", "sepal length (cm)=4.9, sepal width (cm)=3.1, petal length (cm)=1.5, petal width (cm)=0.1\n", "Predicted: setosa\t True: setosa\n", "\n", "Instance # 103: \n", "sepal length (cm)=6.3, sepal width (cm)=2.9, petal length (cm)=5.6, petal width (cm)=1.8\n", "Predicted: virginica\t True: virginica\n", "\n", "Instance # 67: \n", "sepal length (cm)=5.8, sepal width (cm)=2.7, petal length (cm)=4.1, petal width (cm)=1.0\n", "Predicted: versicolor\t True: versicolor\n", "\n", "Instance # 117: \n", "sepal length (cm)=7.7, sepal width (cm)=3.8, petal length (cm)=6.7, petal width (cm)=2.2\n", "Predicted: virginica\t True: virginica\n", "\n", "Instance # 47: \n", "sepal length (cm)=4.6, sepal width (cm)=3.2, petal length (cm)=1.4, petal width (cm)=0.2\n", "Predicted: setosa\t True: setosa\n", "\n" ] } ], "source": [ "for i in range(len(iris_y_test)):\n", "    print(\"Instance # \"+str(indices_test[i])+\": \")\n", "    s = \"\"\n", "    for j in range(len(iris.feature_names)):\n", "        s = s+iris.feature_names[j]+\"=\"+str(iris_X_test[i][j])\n", "        if (j < len(iris.feature_names)-1):\n", "            s = s+\", \"\n", "    print(s)\n", "    print(\"Predicted: \"+iris.target_names[predicted_y_test[i]]+\"\\t True: \"+iris.target_names[iris_y_test[i]]+\"\\n\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# import the evaluation metrics used in the cells below\n", "from sklearn.metrics import accuracy_score, f1_score" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[SVG rendering of the fitted tree: root split on petal length (cm) <= 2.45, entropy = 1.585, samples = 150]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dot_data = tree.export_graphviz(clf, out_file=None,\n", "                                feature_names=iris.feature_names,\n", "                                class_names=iris.target_names,\n", "                                filled=True, rounded=True,\n", "                                special_characters=True)\n", "graph = graphviz.Source(dot_data)\n", "graph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 1. Artificial inflation" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Generate a random permutation of the indices of the examples; it will be used below\n", "# to build the training and the test set\n", "import numpy as np\n", "np.random.seed(42)\n", "indices = np.random.permutation(len(iris.data))\n", "\n", "# We now decide to keep the last 10 indices for the test set and the remaining ones for the training set\n", "indices_training = indices[:-10]\n", "indices_test = indices[-10:]\n", "\n", "iris_X_train = iris.data[indices_training] # all the examples except the last 10\n", "iris_y_train = iris.target[indices_training]\n", "iris_X_test = iris.data[indices_test] # the last 10 examples\n", "iris_y_test = iris.target[indices_test]" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "samples_x = []\n", "samples_y = []\n", "for i in range(0, len(iris_y_train)):\n", "    if iris_y_train[i] == 1:\n", "        for _ in range(9):\n", "            samples_x.append(iris_X_train[i])\n", "            
samples_y.append(1)\n", "    elif iris_y_train[i] == 2:\n", "        for _ in range(9):\n", "            samples_x.append(iris_X_train[i])\n", "            samples_y.append(2)\n", "\n", "# Inflate the training set with the duplicated samples\n", "iris_X_train = np.append(iris_X_train, samples_x, axis=0)\n", "iris_y_train = np.append(iris_y_train, samples_y, axis=0)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy: 0.9\n", "F1: 0.9153439153439153\n" ] } ], "source": [ "clf = tree.DecisionTreeClassifier(criterion=\"entropy\", random_state=300, min_samples_leaf=10, class_weight={0:1, 1:1, 2:1})\n", "clf = clf.fit(iris_X_train, iris_y_train)\n", "predicted_y_test = clf.predict(iris_X_test)\n", "acc_score = accuracy_score(iris_y_test, predicted_y_test)\n", "f1 = f1_score(iris_y_test, predicted_y_test, average='macro')\n", "print(\"Accuracy: \", acc_score)\n", "print(\"F1: \", f1)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "[SVG rendering of the tree learned on the inflated training set: root split on petal length (cm) <= 4.75, entropy = 1.235, samples = 968]\n", 
"\n" ], "text/plain": [ "" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dot_data = tree.export_graphviz(clf, out_file=None, \n", " feature_names=iris.feature_names, \n", " class_names=iris.target_names, \n", " filled=True, rounded=True, \n", " special_characters=True) \n", "graph = graphviz.Source(dot_data) \n", "graph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 2. Class weights" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "# Generate a random permutation of the indices of examples that will be later used \n", "# for the training and the test set\n", "import numpy as np\n", "np.random.seed(1231)\n", "indices = np.random.permutation(len(iris.data))\n", "\n", "# We now decide to keep the last 10 indices for test set, the remaining for the training set\n", "indices_training=indices[:-10]\n", "indices_test=indices[-10:]\n", "\n", "iris_X_train = iris.data[indices_training] # keep for training all the matrix elements with the exception of the last 10 \n", "iris_y_train = iris.target[indices_training]\n", "iris_X_test = iris.data[indices_test] # keep the last 10 elements for test set\n", "iris_y_test = iris.target[indices_test]" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy: 0.8\n", "F1: 0.5\n" ] } ], "source": [ "clf = tree.DecisionTreeClassifier(criterion=\"entropy\",random_state=300,min_samples_leaf=5,class_weight={0:1,1:10,2:10})\n", "clf = clf.fit(iris_X_train, iris_y_train)\n", "predicted_y_test = clf.predict(iris_X_test)\n", "acc_score = accuracy_score(iris_y_test, predicted_y_test)\n", "f1 = f1_score(iris_y_test, predicted_y_test, average='macro')\n", "print(\"Accuracy: \", acc_score)\n", "print(\"F1: \", f1)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", 
"\n", "Tree\n", "\n", "\n", "\n", "0\n", "\n", "petal length (cm) ≤ 4.85\n", "entropy = 1.211\n", "samples = 140\n", "value = [43, 480, 490]\n", "class = virginica\n", "\n", "\n", "\n", "1\n", "\n", "petal length (cm) ≤ 2.45\n", "entropy = 0.648\n", "samples = 90\n", "value = [43, 450, 20]\n", "class = versicolor\n", "\n", "\n", "\n", "0->1\n", "\n", "\n", "True\n", "\n", "\n", "\n", "8\n", "\n", "petal width (cm) ≤ 1.75\n", "entropy = 0.327\n", "samples = 50\n", "value = [0, 30, 470]\n", "class = virginica\n", "\n", "\n", "\n", "0->8\n", "\n", "\n", "False\n", "\n", "\n", "\n", "2\n", "\n", "entropy = 0.0\n", "samples = 43\n", "value = [43, 0, 0]\n", "class = setosa\n", "\n", "\n", "\n", "1->2\n", "\n", "\n", "\n", "\n", "\n", "3\n", "\n", "petal width (cm) ≤ 1.45\n", "entropy = 0.254\n", "samples = 47\n", "value = [0, 450, 20]\n", "class = versicolor\n", "\n", "\n", "\n", "1->3\n", "\n", "\n", "\n", "\n", "\n", "4\n", "\n", "entropy = 0.0\n", "samples = 35\n", "value = [0, 350, 0]\n", "class = versicolor\n", "\n", "\n", "\n", "3->4\n", "\n", "\n", "\n", "\n", "\n", "5\n", "\n", "sepal length (cm) ≤ 6.1\n", "entropy = 0.65\n", "samples = 12\n", "value = [0, 100, 20]\n", "class = versicolor\n", "\n", "\n", "\n", "3->5\n", "\n", "\n", "\n", "\n", "\n", "6\n", "\n", "entropy = 0.863\n", "samples = 7\n", "value = [0, 50, 20]\n", "class = versicolor\n", "\n", "\n", "\n", "5->6\n", "\n", "\n", "\n", "\n", "\n", "7\n", "\n", "entropy = 0.0\n", "samples = 5\n", "value = [0, 50, 0]\n", "class = versicolor\n", "\n", "\n", "\n", "5->7\n", "\n", "\n", "\n", "\n", "\n", "9\n", "\n", "entropy = 0.985\n", "samples = 7\n", "value = [0, 30, 40]\n", "class = virginica\n", "\n", "\n", "\n", "8->9\n", "\n", "\n", "\n", "\n", "\n", "10\n", "\n", "entropy = 0.0\n", "samples = 43\n", "value = [0, 0, 430]\n", "class = virginica\n", "\n", "\n", "\n", "8->10\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } 
], "source": [ "dot_data = tree.export_graphviz(clf, out_file=None, \n", " feature_names=iris.feature_names, \n", " class_names=iris.target_names, \n", " filled=True, rounded=True, \n", " special_characters=True) \n", "graph = graphviz.Source(dot_data) \n", "graph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 3. Avoid overfitting" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "# Generate a random permutation of the indices of examples that will be later used \n", "# for the training and the test set\n", "import numpy as np\n", "np.random.seed(42)\n", "indices = np.random.permutation(len(iris.data))\n", "\n", "# We now decide to keep the last 10 indices for test set, the remaining for the training set\n", "indices_training=indices[:-10]\n", "indices_test=indices[-10:]\n", "\n", "iris_X_train = iris.data[indices_training] # keep for training all the matrix elements with the exception of the last 10 \n", "iris_y_train = iris.target[indices_training]\n", "iris_X_test = iris.data[indices_test] # keep the last 10 elements for test set\n", "iris_y_test = iris.target[indices_test]" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy: 0.9\n", "F1: 0.9153439153439153\n" ] } ], "source": [ "clf = tree.DecisionTreeClassifier(criterion=\"entropy\",random_state=300,min_samples_leaf=3,class_weight={0:1,1:10,2:10}, min_impurity_decrease = 0.005, max_depth = 4, max_leaf_nodes = 6)\n", "clf = clf.fit(iris_X_train, iris_y_train)\n", "predicted_y_test = clf.predict(iris_X_test)\n", "acc_score = accuracy_score(iris_y_test, predicted_y_test)\n", "f1 = f1_score(iris_y_test, predicted_y_test, average='macro')\n", "print(\"Accuracy: \", acc_score)\n", "print(\"F1: \", f1)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "Tree\n", "\n", 
"\n", "\n", "0\n", "\n", "petal length (cm) ≤ 4.85\n", "entropy = 1.211\n", "samples = 140\n", "value = [43, 480, 490]\n", "class = virginica\n", "\n", "\n", "\n", "1\n", "\n", "petal length (cm) ≤ 2.45\n", "entropy = 0.648\n", "samples = 90\n", "value = [43, 450, 20]\n", "class = versicolor\n", "\n", "\n", "\n", "0->1\n", "\n", "\n", "True\n", "\n", "\n", "\n", "2\n", "\n", "petal width (cm) ≤ 1.75\n", "entropy = 0.327\n", "samples = 50\n", "value = [0, 30, 470]\n", "class = virginica\n", "\n", "\n", "\n", "0->2\n", "\n", "\n", "False\n", "\n", "\n", "\n", "3\n", "\n", "entropy = 0.0\n", "samples = 43\n", "value = [43, 0, 0]\n", "class = setosa\n", "\n", "\n", "\n", "1->3\n", "\n", "\n", "\n", "\n", "\n", "4\n", "\n", "petal width (cm) ≤ 1.65\n", "entropy = 0.254\n", "samples = 47\n", "value = [0, 450, 20]\n", "class = versicolor\n", "\n", "\n", "\n", "1->4\n", "\n", "\n", "\n", "\n", "\n", "7\n", "\n", "entropy = 0.0\n", "samples = 44\n", "value = [0, 440, 0]\n", "class = versicolor\n", "\n", "\n", "\n", "4->7\n", "\n", "\n", "\n", "\n", "\n", "8\n", "\n", "entropy = 0.918\n", "samples = 3\n", "value = [0, 10, 20]\n", "class = virginica\n", "\n", "\n", "\n", "4->8\n", "\n", "\n", "\n", "\n", "\n", "5\n", "\n", "petal length (cm) ≤ 5.05\n", "entropy = 0.985\n", "samples = 7\n", "value = [0, 30, 40]\n", "class = virginica\n", "\n", "\n", "\n", "2->5\n", "\n", "\n", "\n", "\n", "\n", "6\n", "\n", "entropy = 0.0\n", "samples = 43\n", "value = [0, 0, 430]\n", "class = virginica\n", "\n", "\n", "\n", "2->6\n", "\n", "\n", "\n", "\n", "\n", "9\n", "\n", "entropy = 0.918\n", "samples = 3\n", "value = [0, 20, 10]\n", "class = versicolor\n", "\n", "\n", "\n", "5->9\n", "\n", "\n", "\n", "\n", "\n", "10\n", "\n", "entropy = 0.811\n", "samples = 4\n", "value = [0, 10, 30]\n", "class = virginica\n", "\n", "\n", "\n", "5->10\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ 
"dot_data = tree.export_graphviz(clf, out_file=None, \n", " feature_names=iris.feature_names, \n", " class_names=iris.target_names, \n", " filled=True, rounded=True, \n", " special_characters=True) \n", "graph = graphviz.Source(dot_data) \n", "graph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 4. Confusion Matrix" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "array([[2, 0, 0],\n", " [0, 4, 0],\n", " [0, 1, 3]])" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# initializes the confusion matrix\n", "confusion = np.zeros([3, 3], dtype = int)\n", "\n", "# print the corresponding instances indexes and class names\n", "for i in range(len(iris_y_test)): \n", " #increments the indexed cell value\n", " confusion[iris_y_test[i], predicted_y_test[i]]+=1\n", "confusion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 5. ROC Curves" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[(0.0, 48.0), (30.0, 0.0), (30.0, 0.0), (60.0, 0.0), (400.0, 0.0), (400.0, 0.0)], [(0.0, 400.0), (0.0, 30.0), (20.0, 10.0), (40.0, 20.0), (48.0, 0.0), (400.0, 0.0)], [(0.0, 400.0), (10.0, 20.0), (20.0, 40.0), (30.0, 0.0), (48.0, 0.0), (400.0, 0.0)]]\n" ] }, { "data": { "text/plain": [ "[[[0, 0.0, 30.0, 60.0, 120.0, 520.0, 920.0],\n", " [0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0]],\n", " [[0, 0.0, 0.0, 20.0, 60.0, 108.0, 508.0],\n", " [0, 400.0, 430.0, 440.0, 460.0, 460.0, 460.0]],\n", " [[0, 0.0, 10.0, 30.0, 60.0, 108.0, 508.0],\n", " [0, 400.0, 420.0, 460.0, 460.0, 460.0, 460.0]]]" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Calculates the ROC curves (x, y)\n", "leafs = []\n", "class_pairs = [[],[],[]]\n", "roc_curves = [[[0], [0]], [[0], [0]], [[0], [0]]]\n", "for i in range(clf.tree_.node_count):\n", " if 
(clf.tree_.feature[i] == -2):\n", " leafs.append(i)\n", "\n", "# c = class index\n", "for leaf in leafs:\n", " for c in range(3):\n", " #pairs(neg, pos)\n", " class_pairs[c].append((clf.tree_.value[leaf][0].sum() - clf.tree_.value[leaf][0][c], clf.tree_.value[leaf][0][c]))\n", "\n", "#pairs sorting\n", "for c in range(3):\n", " class_pairs[c] = sorted(class_pairs[c], key=lambda t: t[0]/max(1,t[1]))\n", "print(class_pairs)\n", "\n", "for i in range(1, len(leafs) + 1):\n", " for c in range(3):\n", " roc_curves[c][0].append(class_pairs[c][i - 1][0] + roc_curves[c][0][i - 1])\n", " roc_curves[c][1].append(class_pairs[c][i - 1][1] + roc_curves[c][1][i - 1])\n", "\n", "roc_curves" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXAAAAD7CAYAAABzGc+QAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAANSklEQVR4nO3dX4xc9XmH8edbm0BLqgBha7kY10RYRKgSJlohELlIIbQ0jQIXCIGi1heWfJOqpI2UQnsVqRdBqgJUqqJYIa1VhQAlpCAUJaUOqKpUOVkXSgBDMQQSLIOXFJK0F22dvL2YY7wstne8nvX69T4fabTn3+z85vj48dmzM55UFZKkfn5puQcgSVocAy5JTRlwSWrKgEtSUwZckpoy4JLU1OpxNkryMvAz4OfAgaqaTnIOcB+wAXgZuLGq3lyaYUqS5juWM/DfqqpNVTU9zN8K7KiqjcCOYV6SdIJknDfyDGfg01X1xpxlzwMfqap9SdYCj1fVRUf7Pueee25t2LDh+EYsSSvMrl273qiqqfnLx7qEAhTwj0kK+FJVbQPWVNW+Yf1rwJqFvsmGDRuYmZkZd8ySJCDJK4dbPm7AP1xVe5P8GvBokufmrqyqGuJ+uAfeCmwFWL9+/TEMWZJ0NGNdA6+qvcPX/cA3gMuA14dLJwxf9x/hvtuqarqqpqem3vUTgCRpkRYMeJIzk/zqwWngt4GngYeBzcNmm4GHlmqQkqR3G+cSyhrgG0kObn9PVX0ryfeA+5NsAV4Bbly6YUqS5lsw4FX1EnDJYZb/GLh6KQYlSVqY78SUpKYMuCQ1Ne7LCJfXtm1wzz3LPQpJWpxNm+DOOyf+bXucgd9zDzz55HKPQpJOKj3OwGH0L9jjjy/3KCTppNHjDFyS9C4GXJKaMuCS1JQBl6SmDLgkNWXAJakpAy5JTRlwSWrKgEtSUwZckpoy4JLUlAGXpKYMuCQ1ZcAlqSkDLklNGXBJasqAS1JTBlySmjLgktSUAZekpgy4JDVlwCWpKQMuSU0ZcElqyoBLUlMGXJKaMuCS1NTYAU+yKskTSR4Z5i9IsjPJniT3JXnP0g1TkjTfsZyB3wLsnjN/O3BHVV0IvAlsmeTAJElHN1bAk6wDfg/48jAf4CrggWGT7cD1SzFASdLhjXsGfifwWeAXw/z7gbeq6sAw/ypw3oTHJkk6ig
UDnuTjwP6q2rWYB0iyNclMkpnZ2dnFfAtJ0mGMcwZ+JfCJJC8D9zK6dHIXcFaS1cM264C9h7tzVW2rqumqmp6amprAkCVJMEbAq+q2qlpXVRuAm4DvVNUngceAG4bNNgMPLdkoJUnvcjyvA/9T4E+S7GF0TfzuyQxJkjSO1QtvckhVPQ48Pky/BFw2+SFJksbhOzElqSkDLklNGXBJasqAS1JTBlySmjLgktSUAZekpgy4JDVlwCWpKQMuSU0ZcElqyoBLUlMGXJKaMuCS1JQBl6SmDLgkNWXAJakpAy5JTRlwSWrKgEtSUwZckpoy4JLUlAGXpKYMuCQ1ZcAlqSkDLklNGXBJasqAS1JTBlySmjLgktSUAZekpgy4JDVlwCWpqQUDnuSMJN9N8u9JnknyuWH5BUl2JtmT5L4k71n64UqSDhrnDPx/gKuq6hJgE3BtksuB24E7qupC4E1gy9INU5I034IBr5H/GmZPG24FXAU8MCzfDly/JCOUJB3WWNfAk6xK8iSwH3gUeBF4q6oODJu8Cpx3hPtuTTKTZGZ2dnYSY5YkMWbAq+rnVbUJWAdcBnxw3Aeoqm1VNV1V01NTU4scpiRpvmN6FUpVvQU8BlwBnJVk9bBqHbB3wmOTJB3FOK9CmUpy1jD9y8A1wG5GIb9h2Gwz8NBSDVKS9G6rF96EtcD2JKsYBf/+qnokybPAvUn+AngCuHsJxylJmmfBgFfVU8Clh1n+EqPr4ZKkZeA7MSWpKQMuSU0ZcElqyoBLUlMGXJKaMuCS1JQBl6SmDLgkNWXAJakpAy5JTRlwSWrKgEtSUwZckpoy4JLUlAGXpKYMuCQ1ZcAlqSkDLklNGXBJasqAS1JTBlySmjLgktSUAZekpgy4JDVlwCWpKQMuSU0ZcElqyoBLUlMGXJKaMuCS1JQBl6SmDLgkNbVgwJOcn+SxJM8meSbJLcPyc5I8muSF4evZSz9cSdJB45yBHwA+U1UXA5cDn0pyMXArsKOqNgI7hnlJ0gmyYMCral9V/dsw/TNgN3AecB2wfdhsO3D9Ug1SkvRux3QNPMkG4FJgJ7CmqvYNq14D1kx0ZJKkoxo74EneC3wd+HRV/XTuuqoqoI5wv61JZpLMzM7OHtdgJUmHjBXwJKcxivdXq+rBYfHrSdYO69cC+w9336raVlXTVTU9NTU1iTFLkhjvVSgB7gZ2V9UX5qx6GNg8TG8GHpr88CRJR7J6jG2uBH4f+H6SJ4dlfwZ8Hrg/yRbgFeDGpRmiJOlwFgx4Vf0LkCOsvnqyw5Ekjct3YkpSUwZckpoy4JLUlAGXpKYMuCQ1ZcAlqSkDLklNGXBJasqAS1JTBlySmjLgktSUAZekpgy4JDVlwCWpKQMuSU0ZcElqyoBLUlMGXJKaMuCS1JQBl6SmDLgkNWXAJakpAy5JTRlwSWrKgEtSUwZckpoy4JLUlAGXpKYMuCQ1ZcAlqSkDLklNGXBJamrBgCf5SpL9SZ6es+ycJI8meWH4evbSDlOSNN84Z+B/C1w7b9mtwI6q2gjsGOYlSSfQggGvqn8G/nPe4uuA7cP0duD6CY9LkrSAxV4DX1NV+4bp14A1ExqPJGlMx/1LzKoqoI60PsnWJDNJZmZnZ4/34SRJg8UG/PUkawGGr/uPtGFVbauq6aqanpqaWuTDSZLmW2zAHwY2D9ObgYcmMxxJ0rjGeRnh14B/BS5K8mqSLcDngWuSvAB8dJiXJJ1AqxfaoKpuPsKqqyc8FknSMfCdmJLUlAGXpKYMuCQ1ZcAlqSkDLklNGXBJasqAS1JTBlySmjLgktSUAZekpgy4JDVlwCWpKQMuSU0ZcElqyoBLUlMGXJKaMuCS1JQBl6SmDLgkNWXAJakpAy5JTRlwSWrKgEtSUwZckpoy4JLUlAGXpKYMuCQ1ZcAlqSkDLklNGXBJasqAS1JTBlySmjqugCe5NsnzSfYkuXVSg5IkLWzRAU+yCvhr4HeBi4Gbk1w8qYFJko7ueM
7ALwP2VNVLVfW/wL3AdZMZliRpIccT8POAH82Zf3VYJkk6AVYv9QMk2QpsBVi/fv3ivsmmTRMckSSdGo4n4HuB8+fMrxuWvUNVbQO2AUxPT9eiHunOOxd1N0k6lR3PJZTvARuTXJDkPcBNwMOTGZYkaSGLPgOvqgNJ/hD4NrAK+EpVPTOxkUmSjuq4roFX1TeBb05oLJKkY+A7MSWpKQMuSU0ZcElqyoBLUlMGXJKaStXi3luzqAdLZoFXFnn3c4E3JjicrtwPI+6HQ9wXI6fyfviNqpqav/CEBvx4JJmpqunlHsdycz+MuB8OcV+MrMT94CUUSWrKgEtSU50Cvm25B3CScD+MuB8OcV+MrLj90OYauCTpnTqdgUuS5mgR8JX04clJzk/yWJJnkzyT5JZh+TlJHk3ywvD17GF5kvzVsG+eSvKh5X0Gk5NkVZInkjwyzF+QZOfwXO8b/htjkpw+zO8Z1m9YznFPWpKzkjyQ5Lkku5NcsUKPhz8e/k48neRrSc5YqcfEQSd9wFfghycfAD5TVRcDlwOfGp7vrcCOqtoI7BjmYbRfNg63rcAXT/yQl8wtwO4587cDd1TVhcCbwJZh+RbgzWH5HcN2p5K7gG9V1QeBSxjtkxV1PCQ5D/gjYLqqfpPRf2F9Eyv3mBipqpP6BlwBfHvO/G3Abcs9rhP4/B8CrgGeB9YOy9YCzw/TXwJunrP929t1vjH6hKcdwFXAI0AYvUlj9fzjgtH/SX/FML162C7L/RwmtB/eB/xg/vNZgcfDwc/gPWf4M34E+J2VeEzMvZ30Z+Cs4A9PHn7suxTYCaypqn3DqteANcP0qbp/7gQ+C/ximH8/8FZVHRjm5z7Pt/fBsP4nw/angguAWeBvhstJX05yJivseKiqvcBfAj8E9jH6M97Fyjwm3tYh4CtSkvcCXwc+XVU/nbuuRqcVp+zLh5J8HNhfVbuWeywngdXAh4AvVtWlwH9z6HIJcOofDwDDNf7rGP2D9uvAmcC1yzqok0CHgI/14cmnkiSnMYr3V6vqwWHx60nWDuvXAvuH5afi/rkS+ESSl4F7GV1GuQs4K8nBT5Ga+zzf3gfD+vcBPz6RA15CrwKvVtXOYf4BRkFfSccDwEeBH1TVbFX9H/Ago+NkJR4Tb+sQ8BX14clJAtwN7K6qL8xZ9TCweZjezOja+MHlfzC8+uBy4CdzfrRuqapuq6p1VbWB0Z/3d6rqk8BjwA3DZvP3wcF9c8Ow/SlxRlpVrwE/SnLRsOhq4FlW0PEw+CFweZJfGf6OHNwPK+6YeIflvgg/5i8wPgb8B/Ai8OfLPZ4lfq4fZvTj8FPAk8PtY4yu3+0AXgD+CThn2D6MXqXzIvB9Rr+lX/bnMcH98RHgkWH6A8B3gT3A3wOnD8vPGOb3DOs/sNzjnvA+2ATMDMfEPwBnr8TjAfgc8BzwNPB3wOkr9Zg4ePOdmJLUVIdLKJKkwzDgktSUAZekpgy4JDVlwCWpKQMuSU0ZcElqyoBLUlP/DwNqiysvGuY6AAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAD4CAYAAAAXUaZHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAPIElEQVR4nO3dfYydZZmA8eu2BaqySwUGxLbZwVijJC7VVKxREoG4FlBKtBIQtTFNmhg2wUjiAia7MVkTPxLxIxsCWYx1+RKQTRskYbttje4fVAeptdBFRgNpm2pHPuoSglq494/3KTnUlpnOnJnTuXv9ksm87/O+M+d5yuHq6TvnzInMRJJUy2sGPQFJUv8Zd0kqyLhLUkHGXZIKMu6SVNDcQU8A4NRTT83h4eFBT0OSZpWHHnroD5k5dKhjR0Xch4eHGRkZGfQ0JGlWiYgnD3fMyzKSVJBxl6SCjLskFWTcJakg4y5JBRl3SSrIuEtSQUfF89xL+MtfYMMG2LIF/DXKkibqIx+Bd7+779/WuE9FJvz853DrrXDnnTA21o1HDHZekmaPN73JuB81Rkfhttu6qI+Owrx5cMkl8MlPwoc+BMcfP+gZSjrGGfeJGhuDu+7qgv7gg92j8/POg+uvh49+FE46adAzlKSXGfdX8/zzsH59F/QHHoD9++Hss+HrX4crroAFCwY9Q0k6JON+sBdfhE2buqDfey889xwsXAjXXANXXgnveMegZyhJ4zLu0P1gdOvWLuh33AF79nSXWS6/vLuOfu658BqfNSpp9ji24/7EE3D77V3Ud+yA446Diy/ugn7xxd0PSiVpFjr24v7003DPPV3Qf/rTbuzcc+Gmm2DlSjj55MHOT5L64NiI+wsvwI9+1AX9/vvhz3+Gt78dvvxl+MQnwHeBklRM3bi/9BL85Cfd89Hvvhv27YM3vhGuugo+9SlYssQXG0kqq17ct2/vHqHffjvs3Aknnggf+1h3Hf2882DOnEHPUJKmXY2479rVPcvl1lth27Yu4MuXw9e+1r1y9HWvG/QMJWlGze64//rX8NnPwubN3dMZ3/Me+M534LLL4LTTBj07SRqY2R33deu6Fxx98YuwahUsXjzoGUnSUWF2x/2A666D179+0LOQpKOGL7uUpIKMuyQVZNwlqSDjLkkFGXdJKsi4S1JBxl2SCjLuklTQhOMeEXMi4uGIuK/tnxkRWyJiNCJ+EBHHt/ET2v5oOz48PVOXJB3OkTxyvxrY0bP/VeCGzHwL8Aywuo2vBp5p4ze08yRJM2hCcY+IhcDFwL+3/QDOB+5pp6wFLm3bK9o+7fgF7XxJ0gyZ6CP3bwJfAF5q+6cAz2bm/ra/C1jQthcAOwHa8X3t/FeIiDURMRIRI2NjY5OcviTpUMaNe0R8GNibmQ/184Yz8+bMXJqZS4eGhvr5rSXpmDeR3wr5PuCSiLgImAf8LfAtYH5EzG2PzhcCu9v5u4FFwK6ImAucBDzV95lLkg5r3EfumXldZi7MzGHgcmBTZl4JbAZWttNWAeva9vq2Tzu+KTOzr7OWJL2qqTzP/Z+Az0fEKN019Vva+C3AKW3888C1U5uiJOlIHdGbdWTmj4Eft+3fAucc4pwXgI/3YW6SpEnyFaqSVJBxl6SCjLskFWTcJakg4y5JBRl3SSrIuEtSQcZdkgoy7pJUkHGXpIKMuyQVZNwlqSDjLkkFGXdJKsi4S1JBxl2SCjLuklSQcZekgoy7JBVk3CWpIOMuSQUZd0kqyLhLUkHGXZIKMu6SVJBxl6SCjLskFWTcJakg4y5JBRl3SSrIuEtSQcZdkgoy7pJUkHGXpILGjXtEzIuIn0XELyPikYj4Uhs/MyK2RMRoRPwgIo5v4
ye0/dF2fHh6lyBJOthEHrn/CTg/M88GlgDLI2IZ8FXghsx8C/AMsLqdvxp4po3f0M6TJM2gceOenefa7nHtI4HzgXva+Frg0ra9ou3Tjl8QEdG3GUuSxjWha+4RMScitgJ7gQ3Ab4BnM3N/O2UXsKBtLwB2ArTj+4BTDvE910TESESMjI2NTW0VkqRXmFDcM/PFzFwCLATOAd421RvOzJszc2lmLh0aGprqt5Mk9TiiZ8tk5rPAZuC9wPyImNsOLQR2t+3dwCKAdvwk4Km+zFaSNCETebbMUETMb9uvBT4I7KCL/Mp22ipgXdte3/ZpxzdlZvZz0pKkVzd3/FM4A1gbEXPo/jK4KzPvi4hHgTsj4l+Bh4Fb2vm3AP8REaPA08Dl0zBvSdKrGDfumbkNeOchxn9Ld/394PEXgI/3ZXaSpEnxFaqSVJBxl6SCjLskFWTcJakg4y5JBRl3SSrIuEtSQcZdkgoy7pJUkHGXpIKMuyQVZNwlqSDjLkkFGXdJKsi4S1JBxl2SCjLuklSQcZekgoy7JBVk3CWpIOMuSQUZd0kqyLhLUkHGXZIKMu6SVJBxl6SCjLskFWTcJakg4y5JBRl3SSrIuEtSQcZdkgoy7pJUkHGXpILGjXtELIqIzRHxaEQ8EhFXt/GTI2JDRDzePr+hjUdEfDsiRiNiW0S8a7oXIUl6pYk8ct8PXJOZZwHLgKsi4izgWmBjZi4GNrZ9gAuBxe1jDXBj32ctSXpV48Y9M/dk5i/a9v8BO4AFwApgbTttLXBp214BfD87DwLzI+KMvs9cknRYR3TNPSKGgXcCW4DTM3NPO/Q74PS2vQDY2fNlu9rYwd9rTUSMRMTI2NjYEU5bkvRqJhz3iDgR+CHwucz8Y++xzEwgj+SGM/PmzFyamUuHhoaO5EslSeOYUNwj4ji6sN+Wmfe24d8fuNzSPu9t47uBRT1fvrCNSZJmyESeLRPALcCOzPxGz6H1wKq2vQpY1zP+6fasmWXAvp7LN5KkGTB3Aue8D/gU8KuI2NrGrge+AtwVEauBJ4HL2rH7gYuAUeB54DN9nbEkaVzjxj0z/weIwxy+4BDnJ3DVFOclSZoCX6EqSQUZd0kqyLhLUkHGXZIKMu6SVJBxl6SCjLskFWTcJakg4y5JBRl3SSrIuEtSQcZdkgoy7pJUkHGXpIKMuyQVZNwlqSDjLkkFGXdJKsi4S1JBxl2SCjLuklSQcZekgoy7JBVk3CWpIOMuSQUZd0kqyLhLUkHGXZIKMu6SVJBxl6SCjLskFWTcJakg4y5JBRl3SSpo3LhHxHcjYm9EbO8ZOzkiNkTE4+3zG9p4RMS3I2I0IrZFxLumc/KSpEObyCP37wHLDxq7FtiYmYuBjW0f4EJgcftYA9zYn2lKko7EuHHPzJ8ATx80vAJY27bXApf2jH8/Ow8C8yPijH5NVpI0MZO95n56Zu5p278DTm/bC4CdPeftamN/JSLWRMRIRIyMjY1NchqSpEOZ8g9UMzOBnMTX3ZyZSzNz6dDQ0FSnIUnqMdm4//7A5Zb2eW8b3w0s6jlvYRuTJM2gycZ9PbCqba8C1vWMf7o9a2YZsK/n8o0kaYbMHe+EiLgD+ABwakTsAv4F+ApwV0SsBp4ELmun3w9cBIwCzwOfmYY5S5LGMW7cM/OKwxy64BDnJnDVVCclSZoaX6EqSQUZd0kqyLhLUkHGXZIKMu6SVJBxl6SCjLskFWTcJakg4y5JBRl3SSrIuEtSQcZdkgoy7pJUkHGXpIKMuyQVZNwlqSDjLkkFGXdJKsi4S1JBxl2SCjLuklSQcZekgoy7JBVk3CWpIOMuSQUZd0kqyLhLUkHGXZIKMu6SVJBxl6SCjLskFWTcJakg4y5JBRl3SSpoWuIeEcsj4rGIGI2Ia6fjNiRJh9f3uEfEHODfgAuBs4ArIuKsft+OJOnwpuOR+znAaGb+NjP/DNwJrJiG25EkHcZ0xH0BsLNnf1cbe4WIWBMRIxExMjY2N
rlbeutbYeVKmDNncl8vSUUN7AeqmXlzZi7NzKVDQ0OT+yYrVsDdd8O8ef2dnCTNctMR993Aop79hW1MkjRDpiPuPwcWR8SZEXE8cDmwfhpuR5J0GHP7/Q0zc39E/CPwADAH+G5mPtLv25EkHV7f4w6QmfcD90/H95Ykjc9XqEpSQcZdkgoy7pJUkHGXpIIiMwc9ByJiDHhykl9+KvCHPk7naOd6a3O9tfV7vX+XmYd8FehREfepiIiRzFw66HnMFNdbm+utbSbX62UZSSrIuEtSQRXifvOgJzDDXG9trre2GVvvrL/mLkn6axUeuUuSDmLcJamgWR33im/EHRHfjYi9EbG9Z+zkiNgQEY+3z29o4xER327r3xYR7xrczI9cRCyKiM0R8WhEPBIRV7fxquudFxE/i4hftvV+qY2fGRFb2rp+0H5VNhFxQtsfbceHBzn/yYqIORHxcETc1/bLrjcinoiIX0XE1ogYaWMDuT/P2rgXfiPu7wHLDxq7FtiYmYuBjW0furUvbh9rgBtnaI79sh+4JjPPApYBV7X/hlXX+yfg/Mw8G1gCLI+IZcBXgRsy8y3AM8Dqdv5q4Jk2fkM7bza6GtjRs199vedl5pKe57MP5v6cmbPyA3gv8EDP/nXAdYOeV5/WNgxs79l/DDijbZ8BPNa2bwKuONR5s/EDWAd88FhYL/A64BfAe+hesTi3jb98v6Z7T4T3tu257bwY9NyPcJ0L6YJ2PnAfEMXX+wRw6kFjA7k/z9pH7kzwjbiLOD0z97Tt3wGnt+0yfwbtn+DvBLZQeL3tEsVWYC+wAfgN8Gxm7m+n9K7p5fW24/uAU2Z2xlP2TeALwEtt/xRqrzeB/4qIhyJiTRsbyP15Wt6sQ9MnMzMiSj1/NSJOBH4IfC4z/xgRLx+rtt7MfBFYEhHzgf8E3jbgKU2biPgwsDczH4qIDwx6PjPk/Zm5OyJOAzZExP/2HpzJ+/NsfuR+LL0R9+8j4gyA9nlvG5/1fwYRcRxd2G/LzHvbcNn1HpCZzwKb6S5LzI+IAw+0etf08nrb8ZOAp2Z4qlPxPuCSiHgCuJPu0sy3qLteMnN3+7yX7i/vcxjQ/Xk2x/1YeiPu9cCqtr2K7tr0gfFPt5+6LwP29fzz76gX3UP0W4AdmfmNnkNV1zvUHrETEa+l+/nCDrrIr2ynHbzeA38OK4FN2S7OzgaZeV1mLszMYbr/Pzdl5pUUXW9EvD4i/ubANvAPwHYGdX8e9A8gpvjDi4uAX9Ndt/zioOfTpzXdAewB/kJ3DW413XXHjcDjwH8DJ7dzg+4ZQ78BfgUsHfT8j3Ct76e7RrkN2No+Liq83r8HHm7r3Q78cxt/M/AzYBS4Gzihjc9r+6Pt+JsHvYYprP0DwH2V19vW9cv28ciBJg3q/uyvH5CkgmbzZRlJ0mEYd0kqyLhLUkHGXZIKMu6SVJBxl6SCjLskFfT/ZVxDnwNir8YAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAD4CAYAAAAXUaZHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAPPUlEQVR4nO3dfYydZZnH8e9ly4u7ulTaWYJt42CsURIRSWWrmMhCVGDV8gcYjUKjTeofGEEwLrCJK3E1EhKLxA1KFqSsRl58g5AmyraY9SWWHaQi0FXGF6QN2BELrK9QvPaPc5ccastMZ86Zh7nm+0lOzvPc933Oc93T01+f3uc5cyIzkSTV8ryuC5AkDZ7hLkkFGe6SVJDhLkkFGe6SVNDCrgsAWLJkSY6OjnZdhiTNKXfeeedvMnNkX33PiXAfHR1lbGys6zIkaU6JiAf21+eyjCQVZLhLUkGGuyQVZLhLUkGGuyQVZLhLUkGGuyQV9Jy4zr1z27fDNdfA7t1dVyJpvnnb2+C1rx340xruTzzR++Fu3QoRXVcjab558YsN96H42Md6wf6Nb8Dq1V1XI0kDMb/X3L/7Xbj0Uli71mCXVMr8DffHH4ezzoLRUVi/vutqJGmg5u+yzHnnwa9+Bd/5DrzwhV1XI0kDNT/P3L/+dfjCF+Cii+D1r++6GkkauPkX7g8/DOvWwXHHwUc/2nU1kjQU8yvcM3tvnv7ud/DFL8LBB3ddkSQNxfxac//852HjRrjiCnjlK7uuRpKGZv6cuf/0p3DBBfDmN8M553RdjSQN1fwI9yef7F32eMghvV8z8Lz5MW1J89f8WJb55Cfhjjvgxhth6dKuq5Gkoat/CrtlC3z84/Ce98CZZ3ZdjSTNitrh/vvf95Zjli6Fz36262okadbUXpb58IdhfBw2b4bDDuu6GkmaNXXP3DduhM99Ds4/H048setqJGlW1Qz3iQl43/vgVa+CT3yi62okadbVW5bJhPe/H3btgm99q3f5oyTNM/XC/dpre78Y7LLL4Jhjuq5GkjpRa1nmF7+AD34Q3vhG+NCHuq5GkjpTJ9yfegrOPrv36dMNG2DBgq4rkqTO1FmWueyy3tfmXXcdvOQlXVcjSZ2qceZ+1129381+5pm9T6JK0jw35XCPiAURcVdE3Nr2j4qILRExHhE3RMTBrf2Qtj/e+keHU3rzxz/2An3JErjySogY6uEkaS44kDP3c4FtffuXAusz82XALmBta18L7Grt69u44bn4Yrjvvt7X5i1ePNRDSdJcMaVwj4hlwD8B/9H2AzgJ+EobsgE4vW2vbvu0/pPb+MH73vfg8svhAx+At7xlKIeQpLloqmfulwMfAf7S9hcDj2bm7ra/Hdjzu3SXAg8CtP7H2vhniIh1ETEWEWMTExPTq/773+/dX3LJ9B4vSUVNGu4R8VZgZ2beOcgDZ+ZVmbkyM1eOjIzM7Mn8FKokPcNULoU8AXh7RJwGHAr8HfAZYFFELGxn58uAHW38DmA5sD0iFgKHAY8MvHJJ0n5NeuaemRdl5rLMHAXeCWzOzHcDtwNntGFrgJvb9i1tn9a/OTNzoFVLkp7VTK5z/2fg/IgYp7emfnVrvxpY3NrPBy6cWYmSpAN1QJ9QzcxvA99u2z8Hjt/HmD8Bfp+dJHWoxidUJUnPYLhLUkGGuyQVZLhLUkGGuyQVZLhLUkGGuyQVZLhLUkGGuyQVZLhLUkGGuyQVZLhLUkGGuyQVZLhLUkGGuyQVZLhLUkGGuyQVZLhLUkGGuyQVZLhLUkGGuyQVZLhLUkGGuyQVZLhLUkGGuyQVZLhLUkGGuyQVZLhLUkGGuyQVZLhLUkGGuyQVZLhLUkGGuyQVZLhLUkGThntEH
BoRd0TEjyLi3oi4pLUfFRFbImI8Im6IiINb+yFtf7z1jw53CpKkvU3lzP3PwEmZ+WrgWOCUiFgFXAqsz8yXAbuAtW38WmBXa1/fxkmSZtGk4Z49v2u7B7VbAicBX2ntG4DT2/bqtk/rPzkiYmAVS5ImNaU194hYEBFbgZ3AbcDPgEczc3cbsh1Y2raXAg8CtP7HgMX7eM51ETEWEWMTExMzm4Uk6RmmFO6Z+VRmHgssA44HXjHTA2fmVZm5MjNXjoyMzPTpJEl9Duhqmcx8FLgdeB2wKCIWtq5lwI62vQNYDtD6DwMeGUi1kqQpmcrVMiMRsahtPx94E7CNXsif0YatAW5u27e0fVr/5szMQRYtSXp2CycfwpHAhohYQO8fgxsz89aIuA+4PiL+DbgLuLqNvxr4z4gYB34LvHMIdUuSnsWk4Z6ZdwOv2Uf7z+mtv+/d/ifgzIFUJ0maFj+hKkkFGe6SVJDhLkkFGe6SVJDhLkkFGe6SVJDhLkkFGe6SVJDhLkkFGe6SVJDhLkkFGe6SVJDhLkkFGe6SVJDhLkkFGe6SVJDhLkkFGe6SVJDhLkkFGe6SVJDhLkkFGe6SVJDhLkkFGe6SVJDhLkkFGe6SVJDhLkkFGe6SVJDhLkkFGe6SVJDhLkkFGe6SVJDhLkkFGe6SVNCk4R4RyyPi9oi4LyLujYhzW/vhEXFbRNzf7l/U2iMiroiI8Yi4OyKOG/YkJEnPNJUz993ABZl5NLAKOCcijgYuBDZl5gpgU9sHOBVY0W7rgCsHXrUk6VlNGu6Z+VBm/rBt/x+wDVgKrAY2tGEbgNPb9mrguuz5AbAoIo4ceOWSpP06oDX3iBgFXgNsAY7IzIda18PAEW17KfBg38O2t7a9n2tdRIxFxNjExMQBli1JejZTDveIeAHwVeC8zHy8vy8zE8gDOXBmXpWZKzNz5cjIyIE8VJI0iSmFe0QcRC/Yv5SZX2vNv96z3NLud7b2HcDyvocva22SpFkylatlArga2JaZn+7rugVY07bXADf3tZ/drppZBTzWt3wjSZoFC6cw5gTgLODHEbG1tV0MfAq4MSLWAg8A72h9G4HTgHHgD8B7B1qxJGlSk4Z7Zn4XiP10n7yP8QmcM8O6JEkz4CdUJakgw12SCjLcJakgw12SCjLcJakgw12SCjLcJakgw12SCjLcJakgw12SCjLcJakgw12SCjLcJakgw12SCjLcJakgw12SCjLcJakgw12SCjLcJakgw12SCjLcJakgw12SCjLcJakgw12SCjLcJakgw12SCjLcJakgw12SCjLcJakgw12SCjLcJakgw12SCjLcJakgw12SCpo03CPimojYGRH39LUdHhG3RcT97f5FrT0i4oqIGI+IuyPiuGEWL0nat6mcuV8LnLJX24XApsxcAWxq+wCnAivabR1w5WDKlCQdiEnDPTP/G/jtXs2rgQ1tewNwel/7ddnzA2BRRBw5qGIlSVMz3TX3IzLzobb9MHBE214KPNg3bntr+ysRsS4ixiJibGJiYpplSJL2ZcZvqGZmAjmNx12VmSszc+XIyMhMy5Ak9ZluuP96z3JLu9/Z2ncAy/vGLWttkqRZNN1wvwVY07bXADf3tZ/drppZBTzWt3wjSZolCycbEBFfBk4ElkTEduBfgU8BN0bEWuAB4B1t+EbgNGAc+APw3iHULEmaxKThnpnv2k/XyfsYm8A5My1KkjQzfkJVkgoy3CWpIMNdkgoy3CWpIMNdkgoy3CWpIMNdkgoy3CWpIMNdkgoy3CWpIMNdkgoy3CWpIMNdkgoy3CWpIMNdkgoy3CWpIMNdkgoy3CWpIMNdkgoy3CWpIMNdkgoy3CWpIMNdkgoy3CWpIMNdkgoy3CWpIMNdkgoy3CWpIMNdkgoy3CWpIMNdkgoy3CWpIMNdkgoy3CWpoKGEe0ScEhE/iYjxiLhwGMeQJO3fwMM9IhYA/w6cChwNvCsijh70cSRJ+zeMM/fjgfHM/HlmP
gFcD6wewnEkSfsxjHBfCjzYt7+9tT1DRKyLiLGIGJuYmJjekV7+cjjjDFiwYHqPl6SiOntDNTOvysyVmblyZGRkek+yejXcdBMceuhgi5OkOW4Y4b4DWN63v6y1SZJmyTDC/X+AFRFxVEQcDLwTuGUIx5Ek7cfCQT9hZu6OiA8A3wQWANdk5r2DPo4kaf8GHu4AmbkR2DiM55YkTc5PqEpSQYa7JBVkuEtSQYa7JBUUmdl1DUTEBPDANB++BPjNAMt5rnO+tTnf2gY935dk5j4/BfqcCPeZiIixzFzZdR2zxfnW5nxrm835uiwjSQUZ7pJUUIVwv6rrAmaZ863N+dY2a/Od82vukqS/VuHMXZK0F8Ndkgqa0+Fe8Yu4I+KaiNgZEff0tR0eEbdFxP3t/kWtPSLiijb/uyPiuO4qP3ARsTwibo+I+yLi3og4t7VXne+hEXFHRPyozfeS1n5URGxp87qh/apsIuKQtj/e+ke7rH+6ImJBRNwVEbe2/bLzjYhfRsSPI2JrRIy1tk5ez3M23At/Efe1wCl7tV0IbMrMFcCmtg+9ua9ot3XAlbNU46DsBi7IzKOBVcA57c+w6nz/DJyUma8GjgVOiYhVwKXA+sx8GbALWNvGrwV2tfb1bdxcdC6wrW+/+nz/MTOP7buevZvXc2bOyRvwOuCbffsXARd1XdeA5jYK3NO3/xPgyLZ9JPCTtv154F37GjcXb8DNwJvmw3yBvwF+CPwDvU8sLmztT7+u6X0nwuva9sI2Lrqu/QDnuYxeoJ0E3ApE8fn+EliyV1snr+c5e+bOFL+Iu4gjMvOhtv0wcETbLvMzaP8Ffw2whcLzbUsUW4GdwG3Az4BHM3N3G9I/p6fn2/ofAxbPbsUzdjnwEeAvbX8xteebwLci4s6IWNfaOnk9D+XLOjQ8mZkRUer61Yh4AfBV4LzMfDwinu6rNt/MfAo4NiIWAV8HXtFxSUMTEW8FdmbmnRFxYtf1zJI3ZOaOiPh74LaI+N/+ztl8Pc/lM/f59EXcv46IIwHa/c7WPud/BhFxEL1g/1Jmfq01l53vHpn5KHA7vWWJRRGx50Srf05Pz7f1HwY8MsulzsQJwNsj4pfA9fSWZj5D3fmSmTva/U56/3gfT0ev57kc7vPpi7hvAda07TX01qb3tJ/d3nVfBTzW99+/57zonaJfDWzLzE/3dVWd70g7Yycink/v/YVt9EL+jDZs7/nu+TmcAWzOtjg7F2TmRZm5LDNH6f393JyZ76bofCPibyPihXu2gTcD99DV67nrNyBm+ObFacBP6a1b/kvX9QxoTl8GHgKepLcGt5beuuMm4H7gv4DD29igd8XQz4AfAyu7rv8A5/oGemuUdwNb2+20wvM9Brirzfce4KOt/aXAHcA4cBNwSGs/tO2Pt/6Xdj2HGcz9RODWyvNt8/pRu927J5O6ej376wckqaC5vCwjSdoPw12SCjLcJakgw12SCjLcJakgw12SCjLcJamg/weUVTeZzfbL2wAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "\n", "# Not ordered\n", "for c in range(3):\n", " plt.plot(roc_curves[c][0], roc_curves[c][1], color = \"red\")\n", " plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 1 }