{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Coverage plots"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import numpy as np\n",
"from matplotlib import pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us consider the following function, which applies a linear model to the given data.\n",
"Specifically, given a \"model\" vector containing the model coefficients $(a,b)$ and an $n \\times 2$ \"data\" matrix containing the data points to be classified, the function outputs a vector $\\mathbf{z}$, $|\\mathbf{z}| = n$, of booleans where $z_i$ is `True` if $a \\cdot x_{i,1} + b \\cdot x_{i,2} > 0$ and `False` otherwise."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"def apply_linear_model(model, data):\n",
"    # True where a * x_1 + b * x_2 > 0 for each row (x_1, x_2) of data\n",
"    return np.dot(data, np.transpose(model)) > 0"
]
},
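{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, the following sketch applies the function to three hand-picked points with the model $(4, -1)$. The array `toy_points` is an illustrative assumption, not part of the exercise data, and the cell assumes the cells above have been run. The expected labels are `True`, `False`, `False`: the third point lies exactly on the line $4x_1 - x_2 = 0$ and is labeled `False` because the comparison is strict."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Three illustrative points: one on each side of the line 4*x1 - x2 = 0, and one on it\n",
"toy_points = np.array([[1, 0], [0, 1], [1, 4]])\n",
"print(apply_linear_model([4., -1.], toy_points))"
]
},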
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us define `data` by generating $1000$ points with integer coordinates drawn uniformly from $\\mathcal{X} = [-100,100]^2$."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 63,  42],\n",
"       [ 77, -65],\n",
"       [ 24, -27],\n",
"       ...,\n",
"       [ 47,  20],\n",
"       [-55, -72],\n",
"       [-58, -23]])"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# randint excludes the upper bound, hence 100 + 1 to include 100\n",
"data = np.random.randint(-100, 100 + 1, [1000, 2])\n",
"data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let `target_labels` be the labeling produced by applying `apply_linear_model` with our target model, $4x - y > 0$:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"target_model = [4.,-1.]\n",
"target_labels = apply_linear_model(target_model, data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using the `matplotlib.pyplot` module, it is easy to draw the generated points on a 2D scatter plot:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.collections.PathCollection at 0x7f9753a8a510>"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"colors = ['r' if l else 'b' for l in target_labels]\n",
"plt.scatter(data[:,0], data[:,1], color=colors)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, let us generate 100 random linear models with coefficients drawn uniformly from $[-5,5)$:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"array([[-3.56708359, 4.2173679 ],\n",
" [-2.24819841, -4.78033511],\n",
" [-2.65217433, -0.6063707 ],\n",
" [ 2.22522724, -1.3737868 ],\n",
" [ 1.14192194, -2.89068754],\n",
" [-2.57784496, -0.48320328],\n",
" [-1.98567784, -4.32466545],\n",
" [ 0.18087823, 3.4818154 ],\n",
" [-1.29875835, 1.51557613],\n",
" [ 4.81972953, -0.42049176],\n",
" [ 0.50982555, 4.85988498],\n",
" [-4.65192691, 4.7090496 ],\n",
" [ 1.76978104, -1.68074194],\n",
" [ 3.27116811, -2.24224267],\n",
" [-4.23830139, -3.18612366],\n",
" [-3.8511695 , -3.53659876],\n",
" [ 4.17130749, -3.99493944],\n",
" [-1.77612232, -2.98547622],\n",
" [ 2.60284283, -3.54389893],\n",
" [ 1.94474263, 0.87574031],\n",
" [-0.77056305, 0.69232701],\n",
" [ 3.34291539, 4.50561588],\n",
" [-2.72563297, 3.53671823],\n",
" [ 4.90984365, 1.68029481],\n",
" [ 3.19955049, 3.70221394],\n",
" [ 1.92131467, 3.118134 ],\n",
" [-1.3720329 , -2.8185771 ],\n",
" [-2.10756157, -3.29110329],\n",
" [ 1.23355252, -4.78030807],\n",
" [-1.43163348, -2.53721308],\n",
" [ 2.31520897, -1.3516775 ],\n",
" [-3.76663448, 0.31355621],\n",
" [ 4.85577408, 0.71145002],\n",
" [-4.3294729 , 1.68046326],\n",
" [-2.59917913, 4.08500999],\n",
" [ 3.67943457, 1.29892268],\n",
" [ 1.20118785, -0.04765288],\n",
" [-3.00013655, -4.55005222],\n",
" [ 4.2866447 , -0.31240087],\n",
" [-1.22791008, -2.15250488],\n",
" [-3.95873121, 2.13560657],\n",
" [ 1.40466773, 1.03243411],\n",
" [ 3.40406949, -0.88692212],\n",
" [ 0.62708997, 3.38264686],\n",
" [-3.48823227, -0.89532811],\n",
" [-4.34004124, 4.93494885],\n",
" [ 3.30806754, -3.92628874],\n",
" [-2.67753461, 0.36983349],\n",
" [-4.17195968, -1.85452959],\n",
" [-4.77495175, -3.33001656],\n",
" [-1.56106315, 4.34240146],\n",
" [ 4.10238174, -4.49656149],\n",
" [ 3.49311758, -1.39028564],\n",
" [-0.40272994, 4.22102051],\n",
" [ 4.50566149, 1.10356814],\n",
" [-2.3856706 , -3.97857028],\n",
" [ 3.72462319, 3.22213904],\n",
" [ 4.25264231, -1.68128306],\n",
" [ 0.73249504, -3.71816993],\n",
" [ 1.81677967, 4.73196931],\n",
" [-1.21442479, -2.84759613],\n",
" [-1.31382345, -4.09759844],\n",
" [-4.77299951, 4.85379267],\n",
" [-3.55287405, 0.95910989],\n",
" [ 0.18951388, 4.43661861],\n",
" [ 4.72987013, -4.09163053],\n",
" [ 0.8149054 , 4.31898901],\n",
" [-0.87583647, 1.16675956],\n",
" [-0.64637859, 3.40167131],\n",
" [-0.62658074, 4.58651024],\n",
" [-1.78645632, 3.85215258],\n",
" [ 1.15926923, -1.75726543],\n",
" [-2.00600541, 4.92990747],\n",
" [ 1.35433823, -2.32709303],\n",
" [-2.73621106, -4.0425546 ],\n",
" [ 3.83111626, 1.8644197 ],\n",
" [-1.22410301, 0.86166692],\n",
" [-1.51047856, -0.90824993],\n",
" [ 4.89797894, -2.64889207],\n",
" [-2.59993126, 4.9208619 ],\n",
" [-4.23181733, 2.23064469],\n",
" [-4.76666042, -3.66915997],\n",
" [-4.88905088, -0.44233631],\n",
" [-2.44930748, 0.56878184],\n",
" [-1.09013812, -2.24695652],\n",
" [ 0.49901451, -2.84495088],\n",
" [ 0.46975656, -0.99109978],\n",
" [-3.190664 , 2.63920782],\n",
" [-2.79980528, -4.23234307],\n",
" [ 0.13665919, -2.78904176],\n",
" [-3.56727479, 0.55472856],\n",
" [-3.59152304, 0.07804187],\n",
" [ 4.38128857, -3.87300022],\n",
" [ 3.84261658, 1.82166838],\n",
" [-3.44899234, -3.68102876],\n",
" [ 1.03133548, 3.09634947],\n",
" [-2.30690512, 0.22837024],\n",
" [ 1.35170795, -0.80041915],\n",
" [-2.70097645, 2.7137504 ],\n",
" [ 4.12379905, 4.98983271]])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"models = (np.random.rand(100,2) - 0.5) * 10\n",
"models"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercise"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Write a function that, given two lists of labelings, builds the corresponding confusion matrix [[1](#hint1)];\n",
"1. For each model in `models`, plot the [FP,TP] pair on a scatter plot;\n",
"1. Just by looking at the plot: which is the best model in the pool?\n",
"1. Find the model with the best accuracy [[2](#hint2)] and compare it with the target model. Is it close? Is it the model you would have picked visually from the scatter plot?\n",
"1. If everything is ok, you should have found a pretty good model for our data: it fits the data quite well and it is quite close to the target model. Did you expect this? If so, why? If not, why not?\n",
"\n",
"<a name=\"hint1\">Hint 1:</a> it may be helpful to map `True` to 0 and `False` to 1, and to use these values as indices into the confusion matrix. \n",
"\n",
"<a name=\"hint2\">Hint 2:</a> one way to proceed is to build a function `accuracy`, use the `map` function to calculate the accuracies of all the models, and then apply `numpy.argmax` to retrieve the index of the best model."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Ex. 1"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"def build_confusion_matrix(labels1, labels2):\n",
"    # Rows index labels1, columns index labels2; True maps to 0, False to 1\n",
"    confusion_matrix = np.zeros((2,2))\n",
"    for l1, l2 in zip(labels1, labels2):\n",
"        confusion_matrix[1 - int(l1), 1 - int(l2)] += 1\n",
"    return confusion_matrix"
]
},
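{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same table can also be built without an explicit loop. The sketch below is one possible vectorized variant, not the only way to do it; the name `build_confusion_matrix_vectorized` and the two short labelings are made up for this illustration. It keeps the convention of Hint 1: `True` maps to index 0 and `False` to index 1, with rows for the first labeling and columns for the second. It assumes `numpy` has been imported as `np` in the first cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def build_confusion_matrix_vectorized(labels1, labels2):\n",
"    # Map True -> 0 and False -> 1, as suggested in Hint 1\n",
"    rows = 1 - np.asarray(labels1, dtype=int)\n",
"    cols = 1 - np.asarray(labels2, dtype=int)\n",
"    confusion_matrix = np.zeros((2, 2))\n",
"    # np.add.at accumulates correctly even when an index pair repeats\n",
"    np.add.at(confusion_matrix, (rows, cols), 1)\n",
"    return confusion_matrix\n",
"\n",
"# Tiny illustrative labelings (not the exercise data)\n",
"actual = [True, True, False, False, True]\n",
"predicted = [True, False, False, True, True]\n",
"print(build_confusion_matrix_vectorized(actual, predicted))"
]
},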
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[124. 372.]\n",
" [386. 118.]]\n"
]
}
],
"source": [
"print(build_confusion_matrix(apply_linear_model(target_model, data), apply_linear_model(models[0], data)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Ex. 2\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.collections.PathCollection at 0x7f977f72ad50>"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fp = []\n",
"tp = []\n",
"\n",
"for model in models:\n",
"    # Rows follow the target labels, columns the model's predictions\n",
"    confusion = build_confusion_matrix(target_labels, apply_linear_model(model, data))\n",
"    fp.append(confusion[1,0])\n",
"    tp.append(confusion[0,0])\n",
"plt.scatter(fp, tp)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Ex. 3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The best model is the one closest to the top-left corner of the plot (max TP/FP)."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 3.40406949, -0.88692212])"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"models[np.argmax([t / f for t, f in zip(tp, fp)])]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Ex. 4\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"def accuracy(tp, tn, total):\n",
"    return (tp + tn) / total"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"model: [ 3.40406949 -0.88692212] accuracy: 0.998\n"
]
}
],
"source": [
"accuracies = []\n",
"\n",
"for model in models:\n",
"    confusion = build_confusion_matrix(target_labels, apply_linear_model(model, data))\n",
"    accuracies.append(accuracy(confusion[0,0], confusion[1,1], 1000))\n",
"\n",
"print(\"model: \", models[np.argmax(accuracies)], \" accuracy: \", accuracies[np.argmax(accuracies)])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The model is the same one picked from the plot."
]
},
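{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to see why a visual choice and an accuracy-based choice can agree: with $Pos$ and $Neg$ the numbers of positive and negative examples, $TN = Neg - FP$, so $acc = (TP + TN)/(Pos + Neg) = (TP - FP + Neg)/(Pos + Neg)$. Since $Pos$ and $Neg$ are fixed by the data, maximizing accuracy is the same as maximizing $TP - FP$, that is, moving as far as possible toward the top-left corner along lines of slope 1 in the coverage plot. The ratio TP/FP is a different criterion, so it is not guaranteed to select the same model in general.\n",
"\n",
"The sketch below checks the equivalence on made-up counts, following Hint 2 (`map` plus `numpy.argmax`). The names `toy_fp`, `toy_tp`, `toy_accuracy` and all the numbers are illustrative assumptions, not results from this notebook; both selected indices should come out equal."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Made-up coverage points for four hypothetical models (40 positives, 60 negatives)\n",
"n_pos, n_neg = 40, 60\n",
"toy_fp = np.array([30, 5, 10, 50])\n",
"toy_tp = np.array([20, 35, 38, 40])\n",
"\n",
"def toy_accuracy(counts):\n",
"    fp, tp = counts\n",
"    tn = n_neg - fp  # negatives that were not misclassified\n",
"    return (tp + tn) / (n_pos + n_neg)\n",
"\n",
"toy_accuracies = list(map(toy_accuracy, zip(toy_fp, toy_tp)))\n",
"best_by_accuracy = np.argmax(toy_accuracies)\n",
"best_by_difference = np.argmax(toy_tp - toy_fp)\n",
"print(best_by_accuracy, best_by_difference)"
]
},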
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Ex. 5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I expected to find a model with a high accuracy, but not as high as the one found (0.998). With 100 models whose two coefficients are drawn uniformly at random between -5 and 5, I expect some of them to have values close to the target ones."
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
}
},
"nbformat": 4,
"nbformat_minor": 1
}