{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "# Tutorial" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "This Python notebook demonstrates how OASIS can be used to efficiently evaluate a classifier, based on an example dataset from the entity resolution domain." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "We begin by loading the required packages (including OASIS) and setting the random seeds for reproducibility." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "import numpy as np\n", "import random\n", "import oasis\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n", "np.random.seed(319158)\n", "random.seed(319158)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Example dataset" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "The dataset we shall use for this tutorial is derived from the `Amazon-GoogleProducts` dataset available [here](http://dbs.uni-leipzig.de/en/research/projects/object_matching/fever/benchmark_datasets_for_entity_resolution). It is described in the following publication:\n", "\n", "> H. Köpcke, A. Thor, and E. Rahm. \"Evaluation of entity resolution approaches on real-world match problems.\" *Proceedings of the VLDB Endowment* 3.1-2 (2010): 484-493.\n", "\n", "The dataset consists of product listings from two e-commerce websites: *Amazon* and *Google Products* (which no longer exists as of 2017). Our goal is to train a classifier to identify pairs of records across the two data sources that refer to the same product. This involves forming the cross join of the two data sources and classifying each pair of records as a \"match\" or \"non-match\". 
Since the focus of this notebook is evaluation, we shall not demonstrate how to build the classifier here. Instead, we shall load the predictions and scores of a classifier we prepared earlier." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "### Loading the data" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Using our pre-trained classifier, we calculated predictions and scores on a test set containing 676,267 record pairs. The data is stored in HDF5 format and is available in the GitHub repository.\n", "\n", "Below, we make use of the ``Data`` class in the OASIS package to read the HDF5 file into memory." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "data = oasis.Data()\n", "data.read_h5('Amazon-GoogleProducts-test.h5')\n", "data.calc_true_performance() #: calculate true precision, recall, F1-score" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "deletable": true, "editable": true }, "source": [ "## Evaluating the classifier" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Our goal is to estimate the F1-score of the classifier by sequentially labelling items in the test set. This example is somewhat contrived, since we already know the ground truth labels (they are included with the test set). 
However, we can simulate the labelling by defining an oracle that looks up the ground truth labels as follows:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "def oracle(idx):\n", "    return data.labels[idx]" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "In the following experiments, we shall adopt the parameter settings below:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "alpha = 0.5 #: corresponds to F1-score\n", "n_labels = 5000 #: stop sampling after querying this number of labels\n", "max_iter = 1e6 #: maximum number of iterations that can be stored" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "### OASIS" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Here we use the OASIS method to estimate the F1-score. The first step is to initialise the sampler." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Automatically setting n_bins = 2398.\n", "Automatically setting goal_n_strata = 63.\n" ] } ], "source": [ "smplr = oasis.OASISSampler(alpha, data.preds, data.scores, oracle, max_iter=max_iter)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Next, we sample items sequentially until ``n_labels`` distinct items have been labelled by the oracle." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "smplr.sample_distinct(n_labels)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Finally, we plot the history of estimates to check for convergence. 
Since we already know the true value of the F1-score for this example (because we were given all of the labels in advance), we have indicated it on the plot using a red line. " ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEKCAYAAAD9xUlFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmcHHWd//HXp3vOTGYymWRyJ+QgQhIgAYZwiogcAQUU\nFwVUwCsu6sL+WBHwQFBXF0VddVGJrK6uCALqGgUMGMEgEpKQECAJISEJOQjkvidzdH9+f1R10zOZ\noyeZmk73vJ+Pxzymqrq6+lOTTn/6W99vfb7m7oiIiADEch2AiIgcPpQUREQkTUlBRETSlBRERCRN\nSUFERNKUFEREJC2ypGBmPzOzTWb2UjuPm5n9wMxWmtkLZnZCVLGIiEh2omwp/A8wrYPHLwDGhz/T\ngR9HGIuIiGQhsqTg7nOAbR3scgnwSw/MBarNbGhU8YiISOeKcvjaw4F1Gevrw20bW+9oZtMJWhNU\nVFScePTRR3f5xbbva2T99vqDizRLo2r60K+8ONLXEBE5GM8999wWd6/tbL9cJoWsufsMYAZAXV2d\nL1iwoMvH+PWza/nC71/k/z5zOoOrSrs1vvXb67nsJ89wxwcmc+kJI7r12CIi3cHMXstmv1wmhQ3A\nyIz1EeG2SA3tV8bgqrJuPWZjc7Jbjycikiu5HJI6E7gqHIV0CrDT3Q+4dCQiIj0nspaCmd0HnAUM\nNLP1wFeAYgB3/wnwCHAhsBLYB3w0qlhERCQ7kSUFd7+ik8cd+ExUry8iIl2nO5pFRCRNSaEbab4i\nEcl3SgoiIpKmpNANDMt1CCIi3UJJQURE0pQUREQkTUlBRETSek1ScDQ0SESkM70mKaRE2SWstCMi\n+a7XJYUomAYfiUiBUFIQEZE0JQUREUlTUhARkTQlBRERSVNSEBGRNCWFbuQqkyoieU5JQURE0pQU\nREQkTUlBRETSlBRERCRNSUFERNJ6TVLQwCARkc71mqSQFmHxOuUdEcl3vS8pREBVUkWkUCgpiIhI\nmpKCiIikKSmIiEiakoKIiKQpKYiISJqSQnfSmFQRyXNKCt3ANCZVRAqEkoKIiKQpKYiISJqSgoiI\npCkpiIhIWqRJwcymmdlyM1tpZje38fgoM3vCzBaZ2QtmdmFUsWhgkIhI5yJLCmYWB+4CLgAmAleY\n2cRWu30JeMDdjwcuB34UVTzpuCIsk+pKPSKS56JsKUwFVrr7KndvBO4HLmm1jwNV4XI/4PUI44mM\nBqSKSKGIMikMB9ZlrK8Pt2W6Dfiwma0HHgH+pa0Dmdl0M1tgZgs2b94cRawiIkLuO5qvAP7H3UcA\nFwL/a2YHxOTuM9y9zt3ramtrezxIEZHeIsqksAEYmbE+ItyW6ePAAwDu/gxQBgyMMCYREelAlElh\nPjDezMaYWQlBR/LMVvusBd4FYGYTCJKCrg+JiORIZEnB3ZuBzwKzgGUEo4yWmNlXzezicLd/Az5p\nZouB+4Br3F1DeEREcqQoyoO7+yMEHciZ227NWF4KnB5lDD1J6UxE8l2uO5oLgoqkikihUFIQEZE0\nJQUREUlTUhARkbTekxTUCywi0qnekxRC6hQWEWlfr0sKUVJbRETynZJCN4iyHLeISE9SUhARkbSs\nk4KZ9YkyEBERyb1Ok4KZnWZ
mS4GXw/XJZhb5DGkiItLzsmkpfA84H9gK4O6LgTOjDEpERHIjq8tH\n7r6u1aZEBLGIiEiOZVMldZ2ZnQa4mRUD1xOUwpZWdH+ciOS7bFoK/wx8hmB+5Q3AlHBdQrohTkQK\nRYctBTOLAx9x9w/1UDwiIpJDHbYU3D0BXNlDsYiISI5l06fwdzP7L+A3wN7URndfGFlUIiKSE9kk\nhSnh769mbHPg7O4PJzrqAxYR6VynScHd39kTgfSUKPuEXalHRPJcNnc09zOz75rZgvDnO2bWryeC\nExGRnpXNkNSfAbuBD4Q/u4CfRxlUvtGIVBEpFNn0KYxz9/dnrN9uZs9HFZCIiORONi2FejM7I7Vi\nZqcD9dGFJCIiuZJNS+Fa4BcZ/QjbgWsii0hERHImm9FHzwOTzawqXN8VeVQiIpIT2Yw++oaZVbv7\nLnffZWb9zezrPRFcvlFBPBHJd9n0KVzg7jtSK+6+HbgwupBERCRXskkKcTMrTa2YWTlQ2sH+vY/G\npIpIgcimo/leYLaZpe5N+Cjwi+hCEhGRXMmmo/kOM1sMnBNu+pq7z4o2LBERyYVOk4KZVQCPufuf\nzewo4CgzK3b3pujD6z7qBBYR6Vw2fQpzgDIzGw78GfgI8D9RBhUl0zRpIiLtyiYpmLvvAy4Ffuzu\nlwGTog0rP6kxIiL5LqukYGanAh8CHg63xbM5uJlNM7PlZrbSzG5uZ58PmNlSM1tiZr/OLmwREYlC\nNqOPrgduAX7v7kvMbCzwRGdPCud3vgs4F1gPzDezme6+NGOf8eGxT3f37WY26GBOItdMY1JFpEBk\nM/poDkG/AmY2xN1XAddlceypwMpwf8zsfuASYGnGPp8E7gpviMPdN3UtfBER6U7ZXD7K9EgX9h0O\nrMtYXx9uy/Q24G1m9rSZzTWzaW0dyMympyb52bx5c9ciFhGRrHU1KXT3dZIiYDxwFnAF8FMzq269\nk7vPcPc6d6+rra3t5hBERCSlq0nhp13YdwMwMmN9RLgt03pgprs3uftq4BWCJCEiIjnQpaTg7j8C\nMLO+Wew+HxhvZmPMrAS4HJjZap//I2glYGYDCS4nrepKTIcV3SEnInmuqy2FlKWd7eDuzcBngVnA\nMuCBcPTSV83s4nC3WcBWM1tKMKLpRnffepAxiYjIIWp39JGZ3dDeQ0A2LQXc/RFadU67+60Zyw7c\nEP7kLd0kLSKFoqOWwjeA/kBlq5++nTxPRETyVEf3KSwE/s/dn2v9gJl9IrqQREQkVzpKCh8F2ru+\nXxdBLJFydQKLiHSqo6TwathZfAB3fzOieCKny/8iIu3rqG9gXmrBzH7YA7HkPbVFRCTfdZQUMr9U\nnx51ICIiknsdJQV98c2SLkmJSKHoqE/haDN7geAzb1y4TLju7n5c5NGJiEiP6igpTOixKERE5LDQ\nblJw99d6MhAREck93ZksIiJpSgrdSPfHiUi+azcpmNns8PcdPReOiIjkUkcdzUPN7DTg4nB+5RYj\nL919YaSR5RFTmVQRKRAdJYVbgS8TzJj23VaPOXB2VEGJiEhudDT66CHgITP7srt/rQdjEhGRHOmo\npQCAu38tnCntzHDTk+7+p2jD6n7qAxYR6Vyno4/M7JvA9QRTcC4Frjezb0QdWFR0+V9EpH2dthSA\ndwNT3D0JYGa/ABYBX4gysHykORtEJN9le59CdcZyvygCERGR3MumpfBNYJGZPUEwLPVM4OZIo8oz\nuiIlIoUim47m+8zsSeCkcNNN7v5GpFGJiEhOZNNSwN03AjMjjkVERHJMtY9ERCRNSUFERNKySgpm\ndoaZfTRcrjWzMdGGlZ80IFVE8l02N699BbgJuCXcVAz8KsqgREQkN7JpKbwPuBjYC+DurwOVUQaV\nb3SXtIgUimySQqMHt+o6gJlVRBuSiIjkSjZJ4QEzuxuoNrNPAn8B7ok2rO6nChQiIp3L5ua1O
83s\nXGAXcBRwq7s/HnlkETHdfywi0q5Ok4KZ3eHuNwGPt7FNREQKSDaXj85tY9sF3R1IIdAlKhHJd+22\nFMzsWuDTwFgzeyHjoUrg6agDExGRntfR5aNfA48SVEnNrIq62923RRpVnlE/hYgUinYvH7n7Tndf\n4+5XuPtrQD3BsNS+ZjYqm4Ob2TQzW25mK82s3XLbZvZ+M3Mzq+vyGYiISLfJ5o7mi8xsBbAa+Buw\nhqAF0dnz4sBdBP0PE4ErzGxiG/tVEkz3+WyXIhcRkW6XTUfz14FTgFfcfQzwLmBuFs+bCqx091Xu\n3gjcD1zSxn5fA+4A9mcXsoiIRCWbpNDk7luBmJnF3P0JIJvLPMOBdRnr68NtaWZ2AjDS3R/u6EBm\nNt3MFpjZgs2bN2fx0iIicjCymWRnh5n1BeYA95rZJsI6SIfCzGLAd4FrOtvX3WcAMwDq6uoO24Gf\nh21gIiJZyqalcAlBJ/P/A/4MvApclMXzNgAjM9ZHhNtSKoFjgCfNbA3BJaqZednZrMFHIlIgsilz\nsRfAzKqAP3bh2POB8eHcCxuAy4ErM467ExiYWg/ngf6cuy/owmuIiEg3yqbMxaeA2wk6gpME34sd\nGNvR89y92cw+C8wC4sDP3H2JmX0VWODumvNZROQwk02fwueAY9x9S1cP7u6PAI+02nZrO/ue1dXj\ndymWKA8uIlIgsulTeBXYF3UgPUbX/0VE2pVNS+EW4B9m9izQkNro7tdFFpWIiORENknhbuCvwIsE\nfQrSDleZVBHJc9kkhWJ3vyHySPKY5mgWkUKRTZ/Co+EdxUPNrCb1E3lkIiLS47JpKVwR/r4lY1un\nQ1JFRCT/ZHPz2pieCERERHKvo5nXznb3v5rZpW097u6/iy4sERHJhY5aCu8gGHXUVp0jB5QUREQK\nTLtJwd2/Ei5+1d1XZz4W1jMSEZECk83oo9+2se2h7g4kn2lEqogUio76FI4GJgH9WvUrVAFlUQcm\nIiI9r6M+haOA9wDVtOxX2A18MsqgoqC7jUVEOtdRn8IfgD+Y2anu/kwPxhQp3X0sItK+bPoU3mdm\nVWZWbGazzWyzmX048shERKTHZZMUznP3XQSXktYARwI3RhmUiIjkRjZJoTj8/W7gwXAaTWmDui1E\nJN9lU/voj2b2MlAPXGtmtQRTc0rI1FEhIlmY88pmHnpuPaVFMcqK4wzpV8agylIuPHYoFaXZfBxH\nz7IZlRNWRd3p7gkzqwAq3f2NyKNrQ11lpS848cQuP2/jzv28tnUvdaNrKIp174d4IunMX7ONUQMq\nGNZPo3VF5C3usH1fI69t20dDUwKAkqIYiSQkkuEUNWZUlxdTEo9hFpSMSCadvmVFlBbFKC+OY2YU\nxYz4QX5+2d/+9py713W2X0f3KXze3b8Vrr7L3R8MTtD3mtkXgS8cVGQiInmkOek0NCeJGxTFY21+\nqWxoTpJIOqXFMeLhlYM9Dc2s317Pjn2N6f1SrYMhVcGXx0TSWbN1H/ubEjQ2J9nT0By8ZiJIFlv2\nNLR4nTEDKxhcFe0Xz3ZbCma20N1PaL3c1npPqqur8wULFnT5efc8tYqvP7yMF247j6qy4s6f0AV7\nGpo55iuz+OKFE/jkmaooLlIo3J1p//kUy9/cnd42qLKUwVVlFMWN4lgMDOat3pZ+fMLQKvqWxpm/\nZjsA8ZjxkVOO4NIThnPciOqsXnfxuh3MXbWViyYP47Wt+1i7bS/uUDe6P0cOqjyoczGzQ2sp0LJ6\nQ+vUqIvoIlJQ9jY0s7O+CYDn1+3gL8ve5JlXt7Jx537OmTCYC48dwrpt9by6eQ+79zfRnHSaEkma\nE86xw/uxbW8jG3bUs2zjLgCGV5dz1alH8Kl3jOtyLJNHVjN5ZJBAhlWXc+q4Ad13op3oKCl4O8tt\nrYuIpN03by1PLt/EiUf058Qj+lNWHAegqqyYkTV9eiyOZ
NKZ/fImEskkpUVxSotjDKosoyQeIx43\n4hZco0+6c/I3Zh/w/JE15Ywf1Jdv/9Nx9K8o6fC13J3Nu4PLPbWVpXk7AKWjpDDZzHYRtArKw2XC\ndfWmtsGVK6WXmLXkDT71v88BbVcJSF2VnrXkzQMeGz+oL49e/3aK4geOiN9Z38Q1P5/HvoYExUVG\nSTxGcTxGbWUp5cVxrj5tNMcM7wfA6i172bijnhMykk6mZNIZ+4VHunReZxw5kIsmDwXgmOH9mDSs\nX9bPNTMGRXy9vyd0VObiwL+ytCk/vw8cXhau3c7lM+bS2JxssX3q6BrePn5gi21nHTWIY0dk/59V\nuqYpkeTvK7bQ0JykojTO7GWbmLXkDQb0LaF/nxIampLMWxNcQ7948jBGD2jjm78ZFx03lIbmJBt2\n1ANBovjGI8tYsWkP2/c1UVtZyitv7uaNnfupqShheHU5P//HGhat3cEpY2voU1JEY3OSfY3NLFq7\ngw076nnwufVMHVNDZWkRs1/eBMCX3j2BcyYM5qw7n6Q4bpQVx+lTEmdT+K29T0mc+6efQtKDpLNj\nXyPNCSeRdBLuNCedRCJJWXGcy+pGHvTonkJxeAyMlV5v5aY9NDYn+djpY6gsC96W//XESuat2Zb+\nAEpZuHY7P//o1FyEWfC++egy7v7bqjYfO3JQXzbvbmDT7gZKi2L88IrjOW/SkE6PmfpmD9DQnOD6\n+5/nvXc9TZ+SOCs27WnzOTOuqjtgQMjvFq7nieWbWbt1L3v2NzNxaBVLN+7ie4+/wtcfXgbASaNr\nOGpIJdv2NrK3oZmG5iQ/u+YkittolUjblBQkp/6+YgsLXttGedj8/8w7xzGgbykA/3rOeBLJlpfk\nPnD3MzQldJmuO23Z08AjL25kV31TOiEMrirlp1fVsWVPA00JZ8KQKka11SLoorePr+Wa00azs76J\n+sYEfUrinH/MEIZXl7N1T/BBfvr4gW2OELz0hBFcesKIFtt+/exaFq/bAcCUUdVcMXXUIcfY2/W6\npNC7G4aHn5t++0L68gJAn5K33pJmRlG85b9YUSxGc7LlJSZp2/w121i9eW96/c7HlpNIOsePqqa0\n6K2rww+/uLHF8773wcm87/iWH77dpaaihNsuntRtx7vy5FFcebISQXfqdUlBDi8lRUGz/vPTjmJk\n/z6Ul3TclRWP2QGth95qZ30TDy5YR/8+JZSXxBnYt5SaiuAb9lMrtnD7H5e2+bw1W/e1WD9yUF8m\nDavi9osnURSP0fcwKbcguaF/fcmZe55axeotezl+VDWfPuvIrJ4TjxkNzYmIIzv8rdmyl7PufLLT\n/b79T8dx2pFBR/3arfsYWxv9HbGS35QUupGqpHbNC+uDgrutrxN3JB4zct2lMOeVzdQ3JTh+VDWD\nKrv2AZtMOt+fvYLKsiLG1falpqKEyrIiFqzZzuqte7lp2tHtPu8Hf11BaVGc6j7F3PK7F4HgBqkv\nvXsC1967kCFVZXzx3RMA2LCjntEDKph2zFsdwcOryw/yjKU3UVLoBnl6j0pOuTu79zdxxIA+fOSU\nI7J+XnD5KOhTaGxOcudjy9kV3oU6ZWQ1l0wZTmlRjFhEwwrdnat+Ni+9/uMPncAFxw7N+vm/WbCO\n789e0e7js156o82Or4079lPf1LKFdOXJo/j6JccQixkr//0C4jHL2xum5PChpCA58YPZK3li+WaO\nGV7VpefFY0Zz2FR46fWdzJiziv59itm+r4n756/j5vAb9OfOexv//I5xbd4g1ZHXtu6ltrK0RYf3\ngjXb2LCjnlE1fdjfFCSk40dVs2jtDq69dyHXnDaaqvJiLjpuKK9u3sO0Y4ayP/wAz7ypau3Wfelv\n+Hd/5ESqyorZtHs/DU1JHlq4nnmrtzFxWNt/j8GVZcRicP273kZVeRHlxXGOGFCRfryr5ynSHiUF\nyYlVW4Lx6V9/77Fde
l7cjA076rnhgefTJQXuufokRvQv53cLN/DKm7v5/aIN3PnYK5wxvpYpI1sW\nIHtj534uu/sf9CkuYnC/MqrLizluRD+GVZczZmAFF3z/KU4ZW8O5E4fw2JI3mDisip8/veaAOK46\n9QiOGlzJ40vf5L55a2loTvKDsAXw/cuncNNvX2B/U5KKkjglRTHGDKxg4dpg6ORn3jmO81uN7//A\nSSO79HcQiUqkScHMpgHfB+LAPe7+H60evwH4BNAMbAY+5u6vRRmT5J67s/T1XRw9pPKAD+3OnHbk\nAF56fWe6KuXEoVWMq62guk8J154VFB67/KSRfHDGXPaFZYgzzX75TdZtqydm0JxMMueVvcxc/HqL\nfeau2sbcVcHxnw1f59Ljh3PBsUPZsqeBvQ3NnDtxCO87fgT/8f5gFNDUf/8LDeHd2Nff/zwQjPV/\n97HDeGNXPeu21XP0kEomDK3ixvPb7jcQORxElhTMLA7cBZwLrAfmm9lMd88cJ7cIqHP3fWZ2LfAt\n4INRxSTdr7E5yZ+XvMGUEdVZ39x0//x1rNi0h6mja7r8eledOpqrTh3d4T6pYa2L1++kbnRNetgr\nkL7RacGXzqWmogR3Z9WWvWza1cCS13em74ztV17MZ945jpPHDGDd9n2cM2Fwm/V1UvvO+8I57Nrf\nRENzgkVrd5B056LJw1pchhLJB1G+Y6cCK919FYCZ3Q9cAqSTgrs/kbH/XODDEcYjEXh65Rauu28R\nANedfeQBve7lxXE+debYFh2/b+wMZnP99mXHRRLTwPCO6Dv+/DL3Pvsaf7vxncRjxrpt+3hgwXqG\nVJVRE1a8NDPG1fZlXG1fTh03gE+8/cD5MCZn0Zrp16eYfn2CewQOtt69yOEgyqQwHFiXsb4eOLmD\n/T8OPNrWA2Y2HZgOMGrU4Xv3Ym8ckbprf1N6+Qd/XdnmPt95bDmnjhsQlDSYNIQXN+w8oKO0Ow2r\nLufh687ght8sZvmbu9lZ30Qi6Xzw7mcA+MTbx0TyuiKF4LBo25rZh4E64B1tPe7uM4AZEMy81oOh\nZcV6cfGM+sZglM3TN599wDj4RNK59lfPsWVPA/NWb6OhOZkupTy2NpqEkDJpWD+mnzmWf3twMR+4\n+xniZry+cz+javrwQXXqirQryqSwAcj83zci3NaCmZ0DfBF4h7s3tH5cDm+/ejYYF1DRRnmKeMyY\ncVUw+18y6WzYUU8yvMMvdYknSmeMH8g5Ewbz95Wb2d+UpKQoxp+uO4PKbp6OVaSQRJkU5gPjzWwM\nQTK4HLgycwczOx64G5jm7psijEV3G0ekrChOWXGM6j4dz0oVi1mPzrgFMLiqjHuursPd2bKnEce7\nfX5ukUIT2R0v7t4MfBaYBSwDHnD3JWb2VTO7ONzt20Bf4EEze97MZkYVT4ru+Dw0v3xmDd99/BU8\nzLL1TQlOHzew4yflmJlRW1na5ZIUIr1RpH0K7v4I8EirbbdmLJ8T5etL97v1D0sA+OmcVVT3KWbz\n7gZGD4y2f0BEes5h0dEs+Sc1jy3A+7tQ0E5EDm9KCt2o0PstkuE8Bte/azz/79y35TgaEYmCqmh1\ng97STdGYCMo4lBbrbSNSqPS/W7KWqu1TooqcIgVLl4+kUys37Wbxup3sCQvMlbZTA0hE8p+SgnTq\nxodeYFFY9hlgiKZzFClYSgrSqZ31TZx99CBuu2gSxUXG0H6a1lGkUCkpSAt3PbGSH/51BdXlJRwz\nvB8Aqzbv5YRR/bMujS0i+UtJoRt5AdRJXfjadvY3JakZWMLrO+rZujcoRzV+UN8cRyYiPUFJIWKJ\npPOtWS9z5dRRkZWKPhTuzszFr9OccE48oj/1TQnqjujPQ9eelt6noTlBaZE6l0V6g16TFHL1LX7l\npj3c/bdVzH11K9/5wGQgqMUzZkBFi4lncmX99vr09JEpZ76ttsW6EoJI79FrkkJKT38
Mx8MP/sXr\nd3LOd+ekt994/lF8+qxxOS/Ql5ok57ITRzB6YAX1jQnOnjAopzGJSO70uqTQ01LVRM+fNJj3HDcM\ngNtmLuHbs5bz0oad3H7xJAblcIjnum31AFw0edgBLQQR6X2UFCIWlgvikinDufDYoIhcbWUpl8+Y\ny6MvvcGjL73BdWcfyfXnvC3dquhJq7bsAWBoP917ICIqcxG5RJgVMj/vTxk7gL/deBZffs9EIJjb\neMGabZHF4O5s2rW/xXzKKY1h6YpxtRpdJCJqKXSrtqqkpqafjLXqOzhiQAUfP2MMQ/uV8el7F7Kv\nKRFZXL+a+xpfDudBqCorYlh1OaVFwfeB13fup6w4dlh0eotI7ikpdIOO+orbSwopqcnuU2Wpo7Bu\nez3xmDH9zLGs3bqPvY3N6cf6V5RwXHiTmoiIkkLEUp/17fUXpLYnuiEp/Pzp1fzoyVfT65VlRZw2\nbgAvbdhF/z7F3DTt6EN+DREpbEoKEUt92LfXmki1ILqjofCPV7fSlEhywTFDaWhO8Nxr2/nV3LUA\nnDZuwKG/gIgUPCWFiKWGpLbXUoiFXf3JLKdtW7NlLw+/uBGA0qIYtZWl6XsdHl/6JmMHVvDNS49N\n79/QHPRVFMc0pkBEOqekELG3Rh+1c/ko3VLILin89KlV3Pvs2nYfHz+45Sgi3Y0sIl2hpBChjTvr\nueKnc4H2k0LqW362fQp7G5oZVdOHv9zwDrbsaWBf41ujlpqTSc11ICKHREkhQi9t2JXuK6gsa/tP\nnbqslGVDgf1NScqL45QUxRhWrXkNRKR76UJzN7B2KiolksGNYXdeNplJw6ra3CfV1ZBtS6G+KUFZ\nsf7ZRCQavaalkO038e7UHH7QTx7Rr93Cd7F2+hRu+d2LrN6yh7LiOAMqSikviVEUi7Hizd2MqNFk\nNyISjV6TFFJ6sihp6tt/RzWNUncSZyaFpkSS++atZXh1Of0rilm2cRdNCacpkaQpkeQ9k4dFG7iI\n9Fq9Lin0pOZE8EFf1MFw0NToozv+vJzn1+2gKBZLl6D4+Blj+NgZY6IPVEQkpKQQoXRLId5+S6Gm\nooRTxtawbls9s5dtoimRZEd9E2Ywpvbwm6lNRAqbkkIEduxrZF9jgi3h/MZFHVw+KimKcf/0U1ts\nSyadxkSSsmLdYyAiPUtJoRu5O2u27OXs7zzZomxF6nJQtmIxoyymhCAiPU9JoRtkdl6/uWs/SYdP\nvn0MRw7qy6CqMqr7lOQuOBGRLlBS6GaNieDehPMmDeGk0TU5jkZEpGt0F1Q3S81kVhLXn1ZE8o9a\nCt3ozsdeSS+XdLEfQUTkcKCk0M3OmTCIkqIYYwZqOKmI5J9Ik4KZTQO+D8SBe9z9P1o9Xgr8EjgR\n2Ap80N3XRBlT1O65+qRchyAictAiu8ZhZnHgLuACYCJwhZlNbLXbx4Ht7n4k8D3gjqjiiVLcjHG1\nFXztvcfkOhQRkUMSZUthKrDS3VcBmNn9wCXA0ox9LgFuC5cfAv7LzMw9F+XrDl4sZsz+t7NyHYaI\nyCGLMikMB9ZlrK8HTm5vH3dvNrOdwABgS+ZOZjYdmB6u7jGz5QcZ08CKO1oeuxcYCDrnXkDn3Dsc\nyjkfkc27w4ZXAAAHUUlEQVROedHR7O4zgBmHehwzW+Dudd0QUt7QOfcOOufeoSfOOcpxkxuAkRnr\nI8Jtbe5jZkVAP4IOZxERyYEok8J8YLyZjTGzEuByYGarfWYCV4fL/wT8Nd/6E0RECklkl4/CPoLP\nArMIhqT+zN2XmNlXgQXuPhP4b+B/zWwlsI0gcUTpkC9B5SGdc++gc+4dIj9n0xdzERFJUS0GERFJ\nU1IQEZG0XpMUzGyamS03s5VmdnOu4zkUZvYzM9tkZi9lbKsxs8fNbEX4u3+43czsB+F5v2BmJ2Q8\n5+pw/xVmdnVbr3U4MLORZvaEmS01syVmdn24vZD
PuczM5pnZ4vCcbw+3jzGzZ8Nz+004iAMzKw3X\nV4aPj8441i3h9uVmdn5uzih7ZhY3s0Vm9qdwvaDP2czWmNmLZva8mS0It+Xuve3uBf9D0NH9KjAW\nKAEWAxNzHdchnM+ZwAnASxnbvgXcHC7fDNwRLl8IPAoYcArwbLi9BlgV/u4fLvfP9bm1c75DgRPC\n5UrgFYLSKYV8zgb0DZeLgWfDc3kAuDzc/hPg2nD508BPwuXLgd+EyxPD93spMCb8fxDP9fl1cu43\nAL8G/hSuF/Q5A2uAga225ey93VtaCumSG+7eCKRKbuQld59DMFor0yXAL8LlXwDvzdj+Sw/MBarN\nbChwPvC4u29z9+3A48C06KPvOnff6O4Lw+XdwDKCu+EL+Zzd3feEq8XhjwNnE5SEgQPPOfW3eAh4\nl5lZuP1+d29w99XASoL/D4clMxsBvBu4J1w3Cvyc25Gz93ZvSQptldwYnqNYojLY3TeGy28Ag8Pl\n9s49L/8m4SWC4wm+ORf0OYeXUZ4HNhH8J38V2OHuzeEumfG3KBkDpErG5NU5A/8JfB5IhusDKPxz\nduAxM3vOgpI+kMP3dl6UuZCucXc3s4Iba2xmfYHfAv/q7rssY3LsQjxnd08AU8ysGvg9cHSOQ4qU\nmb0H2OTuz5nZWbmOpwed4e4bzGwQ8LiZvZz5YE+/t3tLSyGbkhv57s2wGUn4e1O4vb1zz6u/iZkV\nEySEe939d+Hmgj7nFHffATwBnEpwuSD1ZS4z/vZKxuTTOZ8OXGxmawgu8Z5NMB9LIZ8z7r4h/L2J\nIPlPJYfv7d6SFLIpuZHvMkuGXA38IWP7VeGohVOAnWGzdBZwnpn1D0c2nBduO+yE14n/G1jm7t/N\neKiQz7k2bCFgZuXAuQR9KU8QlISBA8+5rZIxM4HLw5E6Y4DxwLyeOYuucfdb3H2Eu48m+D/6V3f/\nEAV8zmZWYWaVqWWC9+RL5PK9neue9576Iei1f4XguuwXcx3PIZ7LfcBGoIng2uHHCa6lzgZWAH8B\nasJ9jWCyo1eBF4G6jON8jKATbiXw0VyfVwfnewbBddcXgOfDnwsL/JyPAxaF5/wScGu4fSzBB9xK\n4EGgNNxeFq6vDB8fm3GsL4Z/i+XABbk+tyzP/yzeGn1UsOccntvi8GdJ6rMpl+9tlbkQEZG03nL5\nSEREsqCkICIiaUoKIiKSpqQgIiJpSgoiIpKmpCAFxcz2dL5Xet/bzOxzh3p8MxttGRVrD0ZYKXPg\nQT73vWY28VBeXyRFSUEk/72XoDKoyCFTUpCCZ2YXhfX2F5nZX8xscMbDk83smbAG/ScznnOjmc0P\na9bfnsXLFJnZvWa2zMweMrM+4XHSLQAzqzOzJ8PlAWb2mAVzJdxDcFNS6rW/bME8AH83s/tSrRkz\nG2dmfw4Lpz1lZkeb2WnAxcC3LajHP+5Q/17SuykpSG/wd+AUdz+eoKbO5zMeO46gxs6pwK1mNszM\nziMojTAVmAKcaGZndvIaRwE/cvcJwC6CWv8d+Qrwd3efRFDvZhSAmZ0EvB+YDFwA1GU8ZwbwL+5+\nIvC58PX+QVD64EZ3n+Lur3byuiIdUpVU6Q1GAL8JC4uVAKszHvuDu9cD9Wb2BEEiOIOgdsyicJ++\nBEliTgevsc7dnw6XfwVcB9zZwf5nApcCuPvDZrY93H56GNN+YL+Z/RHSFWJPAx7MqA5b2uFZixwE\nJQXpDX4IfNfdZ4YlmW/LeKx1nRcnuJTzTXe/uwuv0dZxAJp5q0Ve1oXjtRYjmFdgyiEcQ6RTunwk\nvUE/3ioj3Hru2kssmA95AEERtvkE1SU/Fn47x8yGh7XuOzLKzE4Nl68kuGQFwVSLJ4bL78/Yf064\nH2Z2AcEUigBPAxeFMfUF3gPg7ruA1WZ2WfgcM7PJ4XN2E0xTKnLIlBSk0PQxs/UZPzcQtAweNLPn\ngC2t9n+BoDT
zXOBr7v66uz9GMEfwM2b2IsFUj5196C4HPmNmywg+4H8cbr8d+L4FE7InMva/HTjT\nzJYQXEZaC+Du8wn6CF4gmIv3RYIZxQA+BHzczFIVNVNTyt4P3Bh2pKujWQ6JqqSKHGbMrK+77wlH\nMM0Bpns4R7VI1NSnIHL4mRHejFYG/EIJQXqSWgoiIpKmPgUREUlTUhARkTQlBRERSVNSEBGRNCUF\nERFJ+//nuSE7TpH17AAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "def plt_estimates(smplr, true_value):\n", " plt.plot(smplr.estimate_[smplr.queried_oracle_])\n", " plt.axhline(y=true_value, color='r')\n", " plt.xlabel(\"Label budget\")\n", " plt.ylabel(\"Estimate of F1-score\")\n", " plt.ylim(0,1)\n", " plt.show()\n", "\n", "plt_estimates(smplr, data.F1_measure)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "### Other samplers" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "For comparison, we repeat the evaluation using two alternative sampling methods available in the OASIS package." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "First, we test the basic passive sampling method. It performs poorly due to the extreme class imbalance. Of the 5000 labels queried, none of them correspond to a true positive, yielding an incorrect estimate for the F1-score of 0.0." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEKCAYAAAD9xUlFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFoVJREFUeJzt3XuUZWV95vHvY3NTQEDpOA6NEQwR2yxBLBkVF2G8AioY\nzQXURI0jM0ajM4wXGEcUyNKFZsjkgpeOMpJERSQXW0URFYfRiNLIRWlsbS6GRhNaRPCCF/A3f+xd\n22NRdeoU3bt2d9X3s9ZZtfd73jrn99aq7qf2fvd5d6oKSZIA7jN0AZKkbYehIEnqGAqSpI6hIEnq\nGAqSpI6hIEnq9BYKSc5OckuSr87xfJL8RZKNSa5OckhftUiSJtPnkcJ7gSPHPH8UcED7OAF4R4+1\nSJIm0FsoVNUlwHfHdDkW+JtqXArsmeTBfdUjSZrfDgO+9z7ATSP7m9q2b8/smOQEmqMJdt1118cc\neOCBi1KgJC0Vl19++XeqauV8/YYMhYlV1RpgDcDU1FStW7du4IokafuS5JuT9Bvy6qObgX1H9le1\nbZKkgQwZCmuBP2ivQnoccHtV3ePUkSRp8fR2+ijJB4AjgL2TbALeCOwIUFXvBC4AjgY2Aj8CXtxX\nLZKkyfQWClV1/DzPF/Dyvt5fkrRwfqJZktQxFCRJHUNBktQxFCRJHUNBktQxFCRJHUNBktQxFCRJ\nHUNBktQxFCRJHUNBktQxFCRJHUNBktQxFCRJHUNBktQxFCRJHUNBktQxFCRJHUNBktQxFCRJHUNB\nktQxFCRJHUNBktQxFCRJHUNBktQxFCRJHUNBktQxFCRJHUNBktQxFCRJHUNBktQxFCRJHUNBktQx\nFCRJHUNBktTpNRSSHJlkQ5KNSU6a5fmHJLk4yRVJrk5ydJ/1SJLG6y0UkqwAzgKOAlYDxydZPaPb\n/wTOq6pHA8cBb++rHknS/Po8UjgU2FhV11fVT4FzgWNn9Cng/u32HsC3eqxHkjSPPkNhH+Cmkf1N\nbduoNwEvSLIJuAD449leKMkJSdYlWbd58+Y+apUkMfxE8/HAe6tqFXA08LdJ7lFTVa2pqqmqmlq5\ncuWiFylJy0WfoXAzsO/I/qq2bdRLgPMAquoLwC7A3j3WJEkao89QuAw4IMl+SXaimUheO6PPvwBP\nBkjyCJpQ8PyQJA2kt1CoqruAVwAXAtfSXGV0TZLTkhzTdvvvwEuTXAV8AHhRVVVfNUmSxtuhzxev\nqgtoJpBH204Z2V4PHNZnDZKkyQ090SxJ2oYYCpKkjqEgSeoYCpKkjqEgSeoYCpKkjqEgSeoYCpKk\njqEgSepMHApJ7tdnIZKk4c0bCkmekGQ98LV2/6Ak3iFNkpagSY4U/gx4OnArQFVdBRzeZ1GSpGFM\ndPqoqm6a0XR3D7VIkgY2ySqpNyV5AlBJdgReRbMUtiRpiZnkSOG/AC+nub/yzcDB7b4kaYkZe6SQ\nZAXw+1X1/EWqR5I0oLFHClV1N/C8RapFkjSwSeYUPpfkr4APAj+cbqyqL/dWlSRpEJOEwsHt19NG\n2gp40tYvR5I0pHlDoar+42IUIkka3iSfaN4jyZlJ1rWP/5Vkj8UoTpK0uCa5JPVs4PvA77aPO4D/\n02dRkqRhTDKn8LCqeu7I/qlJruyrIEnScCY5UrgzyROnd5IcBtzZX0mSpKFMcqTwMuCckXmE24AX\n9VaRJGkwk1x9dCVwUJL7t/t39F6VJGkQk1x99OYke1bVHVV1R5K9kvzJYhQnSVpck8wpHFVV35ve\nqarbgKP7K0mSNJRJQmFFkp2nd5LcF9h5TH9J0nZqkonm
9wGfTjL92YQXA+f0V5IkaSiTTDSfkeQq\n4Clt0+lVdWG/ZUmShjBvKCTZFfhkVX0iycOBhyfZsap+1n95kqTFNMmcwiXALkn2AT4B/D7w3j6L\nkiQNY5JQSFX9CHgO8I6q+h3gkf2WJUkawkShkOTxwPOBj7VtKyZ58SRHJtmQZGOSk+bo87tJ1ie5\nJsn7JytbktSHSa4+ehVwMvCPVXVNkv2Bi+f7pvb+zmcBTwU2AZclWVtV60f6HNC+9mFVdVuSX7k3\ng5AkbR2TXH10Cc28Akn+XVVdD7xygtc+FNjY9ifJucCxwPqRPi8Fzmo/EEdV3bKw8iVJW9Mkp49G\nXbCAvvsAN43sb2rbRv068OtJPp/k0iRHzvZCSU6YvsnP5s2bF1axJGliCw2FbOX33wE4ADgCOB74\n6yR7zuxUVWuqaqqqplauXLmVS5AkTVtoKPz1AvreDOw7sr+qbRu1CVhbVT+rqhuAr9OEhCRpAAsK\nhap6O0CS3SbofhlwQJL9kuwEHAesndHnn2iOEkiyN83ppOsXUpMkaetZ6JHCtPXzdaiqu4BXABcC\n1wLntVcvnZbkmLbbhcCtSdbTXNH0mqq69V7WJEnaQnNefZTkxLmeAiY5UqCqLmDG5HRVnTKyXcCJ\n7UOSNLBxRwpvBvYCdp/x2G2e75MkbafGfU7hy8A/VdXlM59I8p/6K0mSNJRxofBiYK7z+1M91CJJ\nGti4ULiunSy+h6r6t57qkSQNaNzcwJemN5L85SLUIkka2LhQGP308mF9FyJJGt64UKhFq0KStE0Y\nN6dwYJKraY4YHtZu0+5XVT2q9+okSYtqXCg8YtGqkCRtE+YMhar65mIWIkkanp9MliR1DAVJUmfO\nUEjy6fbrGYtXjiRpSOMmmh+c5AnAMe39lX/prmtV9eVeK5MkLbpxoXAK8AaaO6adOeO5Ap7UV1GS\npGGMu/rofOD8JG+oqtMXsSZJ0kDGHSkAUFWnt3dKO7xt+mxVfbTfsiRJQ5j36qMkbwFeRXMLzvXA\nq5K8ue/CJEmLb94jBeAZwMFV9XOAJOcAVwD/o8/CJEmLb9LPKew5sr1HH4VIkoY3yZHCW4ArklxM\nc1nq4cBJvVYlSRrEJBPNH0jyWeCxbdPrqupfe61KkjSISY4UqKpvA2t7rkWSNDDXPpIkdQwFSVJn\nolBI8sQkL263VybZr9+yJElDmOTDa28EXgec3DbtCPxdn0VJkoYxyZHCbwHHAD8EqKpvAbv3WZQk\naRiThMJPq6poVkYlya79liRJGsokoXBekncBeyZ5KfAp4N39liVJGsIkH1770yRPBe4AHg6cUlUX\n9V6ZJGnRzRsKSc6oqtcBF83SJklaQiY5ffTUWdqO2tqFSJKGN+eRQpKXAX8E7J/k6pGndgc+33dh\nkqTFN+700fuBj9Oskjq6Kur3q+q7vVYlSRrEnKePqur2qrqxqo6vqm8Cd9JclrpbkodM8uJJjkyy\nIcnGJHMut53kuUkqydSCRyBJ2mom+UTzs5J8A7gB+L/AjTRHEPN93wrgLJr5h9XA8UlWz9Jvd5rb\nfX5xQZVLkra6SSaa/wR4HPD1qtoPeDJw6QTfdyiwsaqur6qfAucCx87S73TgDODHk5UsSerLJKHw\ns6q6FbhPkvtU1cXAJKd59gFuGtnf1LZ1khwC7FtVHxv3QklOSLIuybrNmzdP8NaSpHtjkpvsfC/J\nbsAlwPuS3EK7DtKWSHIf4EzgRfP1rao1wBqAqamp2tL3liTNbpIjhWNpJpn/G/AJ4DrgWRN8383A\nviP7q9q2absDvwF8NsmNNKeo1jrZLEnDmWSZix8CJLk/8JEFvPZlwAHtvRduBo4DnjfyurcDe0/v\nt/eBfnVVrVvAe0iStqJJlrn4z8CpNBPBPwdCc2nq/uO+r6ruSvIK4EJgBXB2VV2T5DRgXVV5z2dJ\n2sZMMqfwauA3quo7
C33xqroAuGBG2ylz9D1ioa8vSdq6JplTuA74Ud+FSJKGN8mRwsnAPyf5IvCT\n6caqemVvVUmSBjFJKLwL+AzwFZo5BUnSEjVJKOxYVSf2XokkaXCTzCl8vP1E8YOTPGD60XtlkqRF\nN8mRwvHt15NH2ua9JFWStP2Z5MNr+y1GIZKk4Y2789qTquozSZ4z2/NV9Q/9lSVJGsK4I4XfpLnq\naLZ1jgowFCRpiZkzFKrqje3maVV1w+hz7XpGkqQlZpKrj/5+lrbzt3YhkqThjZtTOBB4JLDHjHmF\n+wO79F2YJGnxjZtTeDjwTGBPfnle4fvAS/ssSpI0jHFzCh8GPpzk8VX1hUWsSZI0kEnmFH4ryf2T\n7Jjk00k2J3lB75VJkhbdJKHwtKq6g+ZU0o3ArwGv6bMoSdIwJgmFHduvzwA+1N5GU5K0BE2y9tFH\nknwNuBN4WZKVNLfmlCQtMZOsfXRSkrcCt1fV3Ul+BBzbf2lz2LABjjhisLeXpKVsztNHSV47svvk\nqroboKp+CHjXNUlaglJVsz+RfLmqDpm5Pdv+Ypqamqp169YN8daStN1KcnlVTc3Xb9xEc+bYnm1f\nkrQEjAuFmmN7tn1J0hIwbqL5oCR30BwV3Lfdpt137SNJWoLGLXOxYjELkSQNb5IPr0mSlglDQZLU\nMRQkSR1DQZLUMRQkSR1DQZLUMRQkSR1DQZLUMRQkSZ1eQyHJkUk2JNmY5KRZnj8xyfokV7f3f/7V\nPuuRJI3XWygkWQGcBRwFrAaOT7J6RrcrgKmqehRwPvDWvuqRJM2vzyOFQ4GNVXV9Vf0UOJcZd2yr\nqour6kft7qXAqh7rkSTNo89Q2Ae4aWR/U9s2l5cAH5/tiSQnJFmXZN3mzZu3YomSpFHbxERzkhcA\nU8DbZnu+qtZU1VRVTa1cuXJxi5OkZWTc/RS21M3AviP7q9q2X5LkKcDrgd+sqp/0WI8kaR59Hilc\nBhyQZL8kOwHHAWtHOyR5NPAu4JiquqXHWiRJE+gtFKrqLuAVwIXAtcB5VXVNktOSHNN2exuwG/Ch\nJFcmWTvHy0mSFkGfp4+oqguAC2a0nTKy/ZQ+31+StDDbxESzJGnbYChIkjqGgiSpYyhIkjqGgiSp\nYyhIkjqGgiSpYyhIkjqGgiSpYyhIkjqGgiSpYyhIkjqGgiSpYyhIkjqGgiSpYyhIkjqGgiSpYyhI\nkjqGgiSpYyhIkjqGgiSpYyhIkjqGgiSpYyhIkjqGgiSpYyhIkjqGgiSpYyhIkjqGgiSpYyhIkjqG\ngiSpYyhIkjqGgiSpYyhIkjqGgiSp02soJDkyyYYkG5OcNMvzOyf5YPv8F5M8tM96JEnj9RYKSVYA\nZwFHAauB45OsntHtJcBtVfVrwJ8BZ/RVjyRpfjv0+NqHAhur6nqAJOcCxwLrR/ocC7yp3T4f+Ksk\nqara2sW853M3cOYnN2ztl5WkRXPKs1bze499SK/v0Wco7APcNLK/CfgPc/WpqruS3A48EPjOaKck\nJwAntLs/SDL9v/veM/suI459+VrO41/OY+e409n7uHs//l+dpFOfobDVVNUaYM3M9iTrqmpqgJIG\n59iX59hheY9/OY8dFmf8fU403wzsO7K/qm2btU+SHYA9gFt7rEmSNEafoXAZcECS/ZLsBBwHrJ3R\nZy3wwnb7t4HP9DGfIEmaTG+nj9o5glcAFwIrgLOr6pokpwHrqmot8B7gb5NsBL5LExwLcY9TSsuI\nY1++lvP4l/PYYRHGH/8wlyRN8xPNkqSOoSBJ6myXoTDf8hnbqyRnJ7klyVdH2h6Q5KIk32i/7tW2\nJ8lftD+Dq5McMvI9L2z7fyPJC2d7r21Nkn2TXJxkfZJrkryqbV/y40+yS5IvJbmqHfupbft+7fIv\nG9vlYHZq2+dcHibJyW37hiRPH2ZEC5dkRZIrkny03V9OY78xyVeSXJlkXds23O99VW
1XD5pJ6+uA\n/YGdgKuA1UPXtZXGdjhwCPDVkba3Aie12ycBZ7TbRwMfBwI8Dvhi2/4A4Pr2617t9l5Dj22CsT8Y\nOKTd3h34Os3yKEt+/O0Ydmu3dwS+2I7pPOC4tv2dwMva7T8C3tluHwd8sN1e3f572BnYr/13smLo\n8U34MzgReD/w0XZ/OY39RmDvGW2D/d5vj0cK3fIZVfVTYHr5jO1eVV1CcxXWqGOBc9rtc4Bnj7T/\nTTUuBfZM8mDg6cBFVfXdqroNuAg4sv/qt0xVfbuqvtxufx+4luYT70t+/O0YftDu7tg+CngSzfIv\ncM+xT/9MzgeenCRt+7lV9ZOqugHYSPPvZZuWZBXwDODd7X5YJmMfY7Df++0xFGZbPmOfgWpZDA+q\nqm+32/8KPKjdnuvnsN3/fNpTAo+m+Yt5WYy/PX1yJXALzT/o64DvVdVdbZfRcfzS8jDA9PIw2+XY\ngf8NvBb4ebv/QJbP2KH5A+CTSS5Ps6QPDPh7v10sc6FGVVWSJX0NcZLdgL8H/mtV3dH8EdhYyuOv\nqruBg5PsCfwjcODAJS2KJM8Ebqmqy5McMXQ9A3liVd2c5FeAi5J8bfTJxf693x6PFCZZPmMp+bf2\n8JD26y1t+1w/h+3255NkR5pAeF9V/UPbvGzGD1BV3wMuBh5Pc2pg+g+30XHMtTzM9jj2w4BjktxI\ncyr4ScCfszzGDkBV3dx+vYXmD4JDGfD3fnsMhUmWz1hKRpcCeSHw4ZH2P2ivRngccHt7uHkh8LQk\ne7VXLDytbdumteeF3wNcW1Vnjjy15MefZGV7hECS+wJPpZlTuZhm+Re459hnWx5mLXBce4XOfsAB\nwJcWZxT3TlWdXFWrquqhNP+WP1NVz2cZjB0gya5Jdp/epvl9/SpD/t4PPfN+bx40M/Bfpznv+vqh\n69mK4/oA8G3gZzTnBF9Cc77008A3gE8BD2j7huYmRtcBXwGmRl7nD2km2jYCLx56XBOO/Yk051av\nBq5sH0cvh/EDjwKuaMf+VeCUtn1/mv/YNgIfAnZu23dp9ze2z+8/8lqvb38mG4Cjhh7bAn8OR/CL\nq4+WxdjbcV7VPq6Z/v9syN97l7mQJHW2x9NHkqSeGAqSpI6hIEnqGAqSpI6hIEnqGApaUpL8YP5e\nXd83JXn1lr5+kodmZGXbe6NdKXPve/m9z06yekveX5pmKEjbv2fTrBIqbTFDQUtekme1a+9fkeRT\nSR408vRBSb7QrkH/0pHveU2Sy9o160+d4G12SPK+JNcmOT/J/drX6Y4Akkwl+Wy7/cAkn0xz/4R3\n03woafq935DmngCfS/KB6aOZJA9L8ol24bT/l+TAJE8AjgHelmY9/odt6c9Ly5uhoOXgc8DjqurR\nNOvrvHbkuUfRrLfzeOCUJP8+ydNolkk4FDgYeEySw+d5j4cDb6+qRwB30Kz7P84bgc9V1SNp1rt5\nCECSxwLPBQ4CjgKmRr5nDfDHVfUY4NXt+/0zzdIHr6mqg6vqunneVxrLVVK1HKwCPtguLLYTcMPI\ncx+uqjuBO5NcTBMET6RZO+aKts9uNCFxyZj3uKmqPt9u/x3wSuBPx/Q/HHgOQFV9LMltbfthbU0/\nBn6c5CPQrR77BOBDIyvH7jx21NK9YChoOfhL4MyqWtsuz/ymkedmrvNSNKdy3lJV71rAe8z2OgB3\n8Ysj8l0W8Hoz3YfmHgMHb8FrSPPy9JGWgz34xTLCM+9de2yaeyQ/kGZBtstoVpf8w/avc5Ls0651\nP85Dkjy+3X4ezSkraG61+Jh2+7kj/S9p+5HkKJpbKAJ8HnhWW9NuwDMBquoO4IYkv9N+T5Ic1H7P\n92luYSptMUNBS839kmwaeZxIc2TwoSSXA9+Z0f9qmmWaLwVOr6pvVdUnae4X/IUkX6G57eN8/+lu\nAF6e5Fqa/+Df0bafCvx5mhuy3z3S/1Tg8CTX0J
xG+heAqrqMZo7gapp78X6F5u5iAM8HXpJkekXN\n6dvQngu8pp1Id6JZW8RVUqVtTJLdquoH7RVMlwAnVHv/aqlvzilI25417YfRdgHOMRC0mDxSkCR1\nnFOQJHUMBUlSx1CQJHUMBUlSx1CQJHX+Py6MaJ8rNfudAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "pass_smplr = oasis.PassiveSampler(alpha, data.preds, oracle, max_iter=max_iter)\n", "pass_smplr.sample_distinct(n_labels)\n", "plt_estimates(pass_smplr, data.F1_measure)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "The non-adaptive importance sampling method fares better, yielding a decent estimate after consuming 5000 labels. However, it takes longer to converge than OASIS." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEKCAYAAAD9xUlFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xl8XXWd//HXJ3vSrG3TvaWLhbLYFigIgqyC7HUZlUVH\nHUYcl9EZFAd+jqjo6Oi4juKCysi4gMq4VGXXAoJsZWmhQEsohS60TduszX7z+f1xzr29SXOT2zYn\nJ8l9Px+PPHK2e+7ntDf3c77L+X7N3REREQHIizsAEREZPZQUREQkRUlBRERSlBRERCRFSUFERFKU\nFEREJCWypGBmN5rZDjN7JsN+M7P/NrM6M1tjZsdEFYuIiGQnypLCT4BzBtl/LrAw/LkC+F6EsYiI\nSBYiSwrufj+we5BDlgP/64GHgWozmx5VPCIiMrSCGN97JrApbX1zuO3V/gea2RUEpQkmTJhw7KJF\ni0YkQBGR8eLxxx/f6e61Qx0XZ1LImrvfANwAsGzZMl+1alXMEYmIjC1m9nI2x8XZ+2gLMDttfVa4\nTUREYhJnUlgB/H3YC+kEoMnd96k6EhGRkRNZ9ZGZ3QycBkw2s83AZ4BCAHf/PnAbcB5QB7QB74sq\nFhERyU5kScHdLxlivwMfjur9RURk/+mJZhERSVFSEBGRFCUFERFJUVIQEZGUnEsKz2xpYk9nT9xh\niIiMSjmVFFo6urng2w/wpdufizsUEZFRKaeSQn1LJwArn6+PORIRkdEpp5JCe3cCgNKi/JgjEREZ\nnXIqKfQkHIA8izkQEZFRKreSQm8yKSgriIgMJKeSQkJJQURkUDmVFHoSvQAoJ4iIDCy3koJKCiIi\ng8qppJCqPlJLs4jIgHIqKXSH1UfKCSIiA8uppKCGZhGRweVUUki2KeQrKYiIDCinkkKypKCcICIy\nsJxKCnvbFJQVREQGklNJYW/vo5gDEREZpXLq67GpvRtQSUFEJJOcSgqrNzcCUFygUVJFRAaSU0lh\nQlEBAGUaOltEZEA5lRQSHrQp9Ia/RUSkr5xKCslcoJwgIjKwnEoKyd5Hyd8iItJXbiWFsIjwxCsN\nMUciIjI65VRS6A1LCBMnFMUciYjI6JRbSSEsKTy/rY
W2rp6YoxERGX1yKimEo1wA8ItHXokvEBGR\nUSqnkkJ6V9R8TaogIrKPnEoK6b2OigvyeXpzU2reZhERybGksH57S2q5oa2LC7/zAF+7e32MEYmI\njC45lRRqK4pTy2u3NgHw9OamuMIRERl1cioppD/JfNvT24C+pQcRkVwXaVIws3PMbJ2Z1ZnZ1QPs\nn2NmK83sSTNbY2bnRRmPs++TzEtmV0f5liIiY0pkScHM8oHrgXOBI4BLzOyIfof9O/Ardz8auBj4\nblTxwMBjHqmhWURkryhLCscDde6+wd27gFuA5f2OcaAyXK4CtkYYD+4wo6qkz7buhMZBEhFJijIp\nzAQ2pa1vDrel+yzwLjPbDNwG/PNAJzKzK8xslZmtqq+vP+CAet3Jz+/7fEK3SgoiIilxNzRfAvzE\n3WcB5wE/NbN9YnL3G9x9mbsvq62tPag33NOZ6LPeoxFTRURSokwKW4DZaeuzwm3pLgd+BeDuDwEl\nwOSoAnKH7p69JQMztSmIiKSLMik8Biw0s3lmVkTQkLyi3zGvAGcCmNnhBEnhwOuHhuD9+h9VlhSq\nTUFEJE1kScHde4CPAHcCzxH0MlprZteZ2UXhYR8H3m9mq4Gbgfe6RzcvmntQOkhaNK2Cnl6VFERE\nkgqiPLm730bQgJy+7dq05WeBk6KMoc97AyctmMwda4MH12oriqlv6RyptxcRGfXibmgeUf17HxXm\n59GtkoKISEqkJYVRxyHPjOVLZ9DT6xTmG909alOQA9fS0c2azU28sL2F9Tta2bhzDycvnMx7TpxL\nYX4e67a1cNszr7K7tYuEO0+80sCG+j0sXzqDw6ZV0NjWzabdbZQU5rO5oQ3DuPwN83jTkdPivjTJ\nUTmVFBww4JvvXArAp373TMY2hY079/D4yw287dhZIxegxObVpnae2dJMnsEhk8q4b/1O6na0sLmh\nHXeYO7mMVRsb6OrpZWZNKY1t3QA8vWXfARX/9uIuvnLHuj7bSgvzqSotZFtzB0tnV3Pvunp+/9RW\n8gwmTihmZ2snUyuL2d7cSVeiV0lBYpNbScEdM7CwtbkwzzL2PnrLdx+koa2bs46cSmVJ4UiGGYuu\nnl7+9PRW/vWXq1Pb3nr0TC47YQ7HzKlJ/ZsdiNbOHp7Z0sTh0yupKo3m37I70UthflAb2tvr5OUZ\n3Yle7ly7jTkTy9ja2MH25g427trDy7va2NLQzoIpE1g4pYKf/G0jTe3dA563MD/4jDxQF6xPnFBE\nwp2Xd7Uxq6aUS46fw6yaUk49tJb5tRMoKcjnD2u2sqWxnU272zlsajnnL57RZ4ReCLpCd/b0UlKY\nT55Bc0cPVaWFvPMHD0Xy7yOSrdxKCgQlhaSC/LwBn1PoTvTSEN4JLv7sXfzHW47istcdMjJBjqD2\nrgQP1u3kP+94nrodrantVaWFNLV385snt/CbJ/c+WnLrP53IwikVFBfmcfOjr+AOpxxaS31LJ09t\naqSqtJBXm9r56cMv09jWzdxJZWzc1dbnPU89tJavvn0JRfl5dPYkmFK5d9iRju4EL+9q44UdLbzp\nyGmpL/mkV3a18chLu7h3XT0PbdjF3ElltHUlaGzrZntLR5+xrZJ33YNZt70FCDodnDB/Ih8+/TWp\neE+YN5GFUyuA4POQb0ZelrP1LV/a/8H9fRXk51GQdn1RJUuR/ZVTSaHXvc8db0G+sacrQVdPL0UF\ne/9AX9je2ud1n/rtM7Elhe5EL3lmwzZ9aHtXgu/dW8fmhvY+X/hJN/3D8Zx6aC0bd+5he3MHP334\nZf645lUA/u77+3cXm/yC/cCp89nR3Mlvn9zCfevrOe4/7kkdc+aiKTy0YRdtXYl9Xr9kVhWrNzex\ndHY1Te3dvLRzT5/9s2pKKSnMp6yot09COGxqBXl5xlEzqigqyOOomVW8YeFkJhQXcMjEMh55aTdl\nRfkUFeThDodOrU
j9/79h4b7X0T85RU2tXBKnnEoK/Z9TKAi/aDfsbGXRtMrU9vrWfe8wm9q7R+xu\nbtPuNu5cu40TF0zi/P9+ILX9yBmV/O7DJ+3zJfWjv27grrXb+dF7l5FvxkMv7kp90Xb2JFg4pYL2\n7gT1LR3c/OimPq8977XTePPSmZy+aEqf886dPIG5kyfwuvmT+M6lcP/6en7yt41sb+7gmDk1HDKp\njD2dCRraujjz8Cm0dyW46aGNLF8yk7ccM5PC/Dw27W6jozuRuuP+2tuX8KenX+Vrd62juaOH3Xu6\n+PPzO1LvefnJ82hu7+bpLU08v62F1eEESE9taqSiuIArTpnP4dMrWDilgqNmVh3wv+9Jr4nsofmD\nZjbwaL4iIyX3kkJaBdLS2TUAbGvqoCfhqS+a3XuCpLDyE6dx+lfvBeDZrc2cuGBSZLHtau1kUnkx\ntzz6Clf/5ukBj1m7tZmFn7odgHedMIeq0kKuX/liav/iz96V9ft9+PQFHD9vEqcemt1YUqccWssp\nQxx7dr/G0dkTy/qs5+UZFy6ZwYVLZgBB4+796+t54+FTmTihaJ92i+5ELwV5hpmF7UHDU1oSkcxy\nKilA35JCssrgvf/zGABPfPosigryaOnoAaCypICK4gJaOnv6VC8Nlxe2tzClsoRbHn2FL93+/IDH\nTJpQxINXn8GO5k5O/9q9JMIB/H728CuDnvsjp7+Gty+bxTfuXs/siWU0tXezeFY17d0Jzj1qGpPL\niwd9/UiYXlXKO4+bk3F/eslFCUFkZGSdFMyszN3bhj5y9HL3Pg3NRf2qYT7+q6dYua6e979hHgDl\nJQV857JjeM+Nj/LjBzZw7CHHHvD77t7TRUtHD1+583mOmVPDynU7eLBu14DH//GfT+aomVXsaO5I\nNcTOmVTGi188j86eBC9sb+WCbz9AUX4ev/nQ61k0rYJeh/vW13PGoil92h++efHRBxSzxEjVRxKj\nIZOCmb0e+BFQDswxsyXAB9z9Q1EHN9x6w4fXkooK+t59rlwXjMX3w7++BEBxQT5Lw+k6k3M6Z/KP\nN63i7COn8o5ls/fZ96GfP8Htz+x9faZzfeCU+Vxz3uGp9fSeOUnFBfkcNbOKjf95/j77zjpi6qAx\nyugXVG8qK0h8sikpfAN4E+EIp+6+2sxOiTSqiDjet/ooPz/jsTOrS4Hsugp+654XuOe57dzz3HYq\nSwo556i9detN7d19EkK6z1x4BJ/7w7MU5hsv/Eek01OLiGQlq+ojd9/Ur0533/6DY0D/3keFBZnr\nqV+b1rvlyrMO5et3r091Xd3W1EFZcT6VJYW8938e5d51e0f7/qefPQ7ATy8/njcsrOWH928A4ILF\n0/mXNx5KZUkB67a3MLO6lPm15Vz2ukMozFd9uezlKilIjLJJCpvCKiQ3s0LgYwRDYY85wZ9aWvXR\nIP3Pp6XN5VxTFpQWGtu7mFJRwglf+jMAD19zZp+EkO7dP36Ux//9jdzyWNAF9LMXHZlq3E2vFoqi\nAVvGLrWnS9yySQr/BHyLYH7lLcBdwIejDCoKiV6nvqVzwN5HA3nXCXsfVqsuKwLgvnX1XHXrmtT2\nZHL49iVHc8whNTz+cgP3PLudFau3AnDsF4KHtJYvnTEqevuIiAxl0KRgZvnAu939shGKJzK7wmcP\n0oe1mFCU+fJnVO+9m68OSwrpCSHd2UdOpbggn5nVpVy0ZAZfe8eS1PMEAHMnTTio2EVERsqgScHd\nE2Z2KUFj87iweFZ1ajn5ZQ9w/aXHsKG+la/dvR6AkoK9jdAlhfs2SH/6giP4/B+f5Z4rT6G4oO/+\nwvw8fvDuY2nrCp53eHMWY+GIJOmJZolTNtVHD5jZd4BfAqnBZ9z9iciiGiHpjeeTyos4f/FCTlgw\niYdf3NVn8LNkt1SAD562gAsWT+fIGVVcfvK8jOfW0MdyINSmIHHLJiksDX9fl7bN
gTOGP5z4lIal\ngePmTuS4uRP77CvMz+PPHz+Vlc/v4PKT5+npWhEZt4ZMCu5++kgEErkMRfJvvnMpL+3cwxEzKgc+\nILSgtpwFteURBCbSl2qPJE7ZPNFcBXwGSD6wdh9wnbvvO+XUGND/Jv/NR6u+X0YPQ6VQiVc2neRv\nBFqAd4Q/zcD/RBmUiIjEI5s2hQXu/ra09c+Z2VNRBSSS61zdjyRG2ZQU2s3s5OSKmZ0EtEcXkkju\nUh8GiVs2JYUPAjeFbQsADcB7I4soIrr3EhEZWja9j54ClphZZbjeHHlUEVJDnohIZkNWH5nZF82s\n2t2b3b3ZzGrM7AsjEZxILlKpVuKUTZvCue7emFxx9wZAg/+LiIxD2SSFfDNLDfFpZqWAhvwUERmH\nsmlo/jnwZzNLPpvwPuCm6EKKhnr5yVihz6rEKZuG5i+b2WrgjeGmz7v7ndGGFR11+ZPRTONqSdyy\nGeZiAnCXu99hZocBh5lZobt3Rx+eiIiMpGzaFO4HSsxsJnAH8G7gJ1EGJSIi8cgmKZi7twFvBb7n\n7m8Hjow2LJHcpSYFiVNWScHMTgQuA/4Ubtt3KrKBX3iOma0zszozuzrDMe8ws2fNbK2Z/SK7sPef\n609NxgC1KEjcsul99DHgGuC37r7WzOYDK4d6UTi/8/XAWcBm4DEzW+Huz6YdszA890nu3mBmUw7k\nIvaH/uhERDLLpvfR/QTtCpjZNHffAHw0i3MfD9SFx2NmtwDLgWfTjnk/cH34QBzuvmP/whcZh9Qn\nVWKUTfVRutv249iZwKa09c3htnSHAoea2YNm9rCZnTPQiczsCjNbZWar6uvr9y9ikTFEPVIlbvub\nFIb7I1sALAROAy4Bfmhm1f0Pcvcb3H2Zuy+rra09oDfSzZeIyND2Nyn8cD+O3QLMTlufFW5LtxlY\n4e7d7v4SsJ4gSURGd2IiIpntV1Jw9+8CmFk2M9g/Biw0s3lmVgRcDKzod8zvCEoJmNlkguqkDfsT\nk8h4o0KtxGl/SwpJzw51gLv3AB8B7gSeA34V9l66zswuCg+7E9hlZs8S9Gi6yt13HWBMImOeCrIS\nt4y9j8zsyky7gGxKCrj7bfRrnHb3a9OWHbgy/BERkZgNVlL4IlADVPT7KR/idaOSiuQyVqhThMRp\nsOcUngB+5+6P999hZv8YXUjR0nScMppplFSJ22BJ4X1Apvr9ZRHEIiIiMRssKbwYNhbvw923RxSP\niIjEaLC2gUeTC2b27RGIRUTQ4I0Sr8GSQnrl5klRBxI1V+udjAFqUZC4DZYUxue3qP7qREQyGqxN\nYZGZrSH4Gl0QLhOuu7svjjw6kRykQq3EabCkcPiIRSEigMbmkvhlTAru/vJIBiIiIvEbc08mHygV\nyUVEhpYzSSFJpXMZ7XQDI3HKmBTM7M/h7y+PXDgiuU63LRKvwRqap5vZ64GLwvmV+3xa3f2JSCMT\nEZERN1hSuBb4NMGMaV/vt8+BM6IKSiSXqfZI4jRY76NbgVvN7NPu/vkRjClSGoVSRjN9PCVug5UU\nAHD3z4czpZ0SbrrX3f8YbVgiIhKHIXsfmdmXgI8RTMH5LPAxM/ti1IGJiMjIG7KkAJwPLHX3XgAz\nuwl4Evh/UQYmkqs0eKPEKdvnFKrTlquiCCRq+juTsUBNChK3bEoKXwKeNLOVBJ/ZU4CrI40qQvqj\nExHJLJuG5pvN7F7guHDTv7n7tkijEhGRWGRTUsDdXwVWRByLSM5Tl1SJW86NfSQiIpnlTFLQvLci\nIkPLKimY2clm9r5wudbM5kUbVnRUPJfRTj3lJE7ZPLz2GeDfgGvCTYXAz6IMSiRXmfrHScyyKSm8\nBbgI2APg7luBiiiDEhGReGSTFLo8eMTSAcxsQrQhieQ2tX9JnLJJCr8ysx8A1Wb2fuAe4EfRhjX8\nVE8rY4HavCRu2Ty89lUzOwtoBg4DrnX3uyOP
LCL6oxMRyWzIpGBmX3b3fwPuHmCbiIiMI9lUH501\nwLZzhzsQEQmoqlPilLGkYGYfBD4EzDezNWm7KoAHow5MJBepelPiNlj10S+A2wlGSU0fFbXF3XdH\nGlUEdPMlIjK0jNVH7t7k7hvd/RJ3fxloJ/huLTezOdmc3MzOMbN1ZlZnZhmH2zazt5mZm9my/b6C\n/aSHg2S00w2MxCmbJ5ovNLMXgJeA+4CNBCWIoV6XD1xP0P5wBHCJmR0xwHEVBNN9PrJfkYuMQ7pp\nkbhl09D8BeAEYL27zwPOBB7O4nXHA3XuvsHdu4BbgOUDHPd54MtAR3Yhi4hIVLJJCt3uvgvIM7M8\nd18JZFPNMxPYlLa+OdyWYmbHALPd/U+DncjMrjCzVWa2qr6+Pou33pfmvRURGVo2k+w0mlk5cD/w\nczPbQTgO0sEwszzg68B7hzrW3W8AbgBYtmzZQX27q3eHjHa6gZE4ZVNSWE7QyPyvwB3Ai8CFWbxu\nCzA7bX1WuC2pAjgKuNfMNhJUUa0YicZmkVFLNy0Ss2yGudgDYGaVwB/249yPAQvDuRe2ABcDl6ad\ntwmYnFwP54H+hLuv2o/3EBGRYZTNMBcfAD5H0BDcS3Av48D8wV7n7j1m9hHgTiAfuNHd15rZdcAq\nd9eczyIDUOWRxCmbNoVPAEe5+879Pbm73wbc1m/btRmOPW1/z79fsUR5cpFhotojiVs2bQovAm1R\nByIiIvHLpqRwDfA3M3sE6ExudPePRhaViIjEIpuk8APgL8DTBG0KIhIl1XVKjLJJCoXufmXkkYgI\npgdpJGbZtCncHj5RPN3MJiZ/Io9smOl5IBGRoWVTUrgk/H1N2rYhu6SOVroTk9FO9y8Sp2weXps3\nEoGIiLqkSvwGm3ntDHf/i5m9daD97v6b6MISEZE4DFZSOJWg19FA4xw5oKQgIjLOZEwK7v6ZcPE6\nd38pfV84ntEYo5paGRs0SqrEKZveR/83wLZbhzuQkaI6WxnN1A9C4jZYm8Ii4Eigql+7QiVQEnVg\nIiIy8gZrUzgMuACopm+7Qgvw/iiDEsllqjySOA3WpvB74PdmdqK7PzSCMUVC1bQyFqj2SOKWTZvC\nW8ys0swKzezPZlZvZu+KPLKIqM5WRCSzbJLC2e7eTFCVtBF4DXBVlEGJ5DKVaiVO2SSFwvD3+cCv\nw2k0RSQCGoZF4pbN2Ed/MLPngXbgg2ZWSzA1p4iIjDOWzYMy4aioTe6eMLMJQIW7b4s8ugEsq6jw\nVcceu9+va+tKsGZzIwunVjBpQlEEkYkcvLodrbR09nD07Oq4Q5Fxxu6773F3XzbUcRmrj8zsk2mr\nZ7p7AsDd9wCadU0kJr3utHUl2NbcQV19K89va6E7oYYIGR4ZSwpm9oS7H9N/eaD1kbRs2TJftWrV\nfr9u/fYWzv7G/Vx/6TGcv3h6BJGJHLwrf/kUj728m79+8ozUtkSv82DdTh6o28kdz2zjld37Tpl+\n8/tP4MQFk0YyVBljzCyrksJgbQqWYXmgdREZJt09zks797B6UyMv7GhhxeqtbNrdjhmcOH8Sy5fO\nYN7kCRx7SA1bGtu59IeP4HrkTYbJYEnBMywPtC4iw6DXnW3NHZz+1XsByDM4ek4NV71pEaceWktV\naWGf47c1hX0+9Bcpw2SwpLDEzJoJSgWl4TLh+pgb+0h9v2UseOsxsyjIz+PoOdW8dmYVh02roLgg\nP+PxyS6svfp8yzAZbJiLzJ/EMUzdwGU0O+XQWk45tDbr45OfZ1UfyXDJ5uE1ERml8pJJQTlBhomS\ngsiYlqw+UlaQ4aGkIDKGpUoK8YYh40jOJAXVucp4lGxo1hSeMlxyJikkqZ1ZxpPk51k5QYZLziUF\nkfEkL1VSiDkQGTeUFETGsGSXVDU0y3BRUhAZw0wNzTLMciYp6EZKxiNDDc0yvCJNCmZ2jpmtM7M6\nM7t6gP1X
mtmzZrYmnP/5kCjjCd4z6ncQGTmmh9dkmEWWFMwsH7geOBc4ArjEzI7od9iTwDJ3Xwzc\nCnwlqnhExqNUQ3PMccj4EWVJ4Xigzt03uHsXcAuwPP0Ad1/p7snB4R8GZkUYj8i4o4ZmGW5RJoWZ\nwKa09c3htkwuB24faIeZXWFmq8xsVX19/QEFo78ZGY/0nIIMt8GGzh4xZvYuYBlw6kD73f0G4AYI\nZl47yHc7uJeLjCJ7h84+sD+Lju4Emxva2NHcyZbGdlZtbGBbcwc3vvc48vP0t5KLokwKW4DZaeuz\nwm19mNkbgU8Bp7p7Z4TxiIw72Xac6OhOsKF+D2s2N7K9uZPtLR3ct66eLY3tAx7f2tmzz4Q+Er2u\nnl72dPYE82/vaGVD/R42NbSxob6Vlo4ePnrmQi5cMiPSGKJMCo8BC81sHkEyuBi4NP0AMzsa+AFw\njrvviDAWkXGp/xPNnT0J6ls62d7cyZrNjaza2MDqzY282tRBIm0mnqrSQo49pIZLXzeHGdUlTKss\nZVpVCXet3caXbn+eXs3ac9Dcnca2bl5t6qC5o5vWjh5aOrup29HK1sYOGtu6aO9O0N6VYGdrF41t\nXezpSuxznsnlxSycUs7UKSUjkqgjSwru3mNmHwHuBPKBG919rZldB6xy9xXAfwHlwK/DYvAr7n5R\nVDGJjDfJgsJVt67mh3/dQN2OVjp7elP7Z1aXcti0Ct68dCaHTavgyBmVzKopo6hg4ObE0qJgbq2E\nGin24e40tHVT39JJfUsnje1dvNrYwc49nXR299LZk6CpvZvtzZ2pO/ueAZJrQZ4xtbKEmgmFlBUV\nUF1WxILacmomFFFdWkh5SQG1FcVMqyzhqJlVlBSO7HxnkbYpuPttwG39tl2btvzGKN+/z/uq056M\nQzVlRZQV5dPRnaCsKJ+3HTuLxTOrmFJZzPSqUhZNq0i1O2QjWfIY7yUFd6c74XQneulO9NKV6KWz\nu5em9m42N7Tx0s42Gtq6aG7vZltzBw17uti4q42m9u59zlWUn0dJYR7FhflUlhQwubyYc187nZqy\nQmrKiphRXZr6si8vLmBaVQllRaOiOXdAozeyiOjhNRlPqsoKWf2ZsynIs/368s8k2bg8HnKCu7O9\nuZNNDW1sb+5ge3Mnr+zaw/rtrTy/rZmGtn2/4NOVFOZRXlzItKpiasqKOH/xdOZPnsDUyhJqK4qp\nLitkemUpVWXjq+0l55KCyHhTmD98PcuTHY5Ge/WRu7Nh5x62NrbT2tFDW1eCjbv2sK2pg/rWTjbu\n3MOu1i5aOnv6vG5CUT6HTqvgzMOncsjEoBqtMD+PwoI8ivPzqC4rZFpVCfMmT6CiZHx92WdLSUFE\nUuKsPmrvSrCjpYOdrUG1TWN7F83tPbwcVtvsbO2ksb2bprYudrZ20drvCz8/z5hcXkRtRTGHTatg\nUnkxi6ZVMGdiGVMrS5hWWUJ1WeGwlKjGMyUFEUnZW300fEnB3elK9FLf0skru9t4dmszO1o62dEc\n3NU37Olm0+62fe7qk0oK85hYVsTkimKqSguZM7GMSROKOHx6BXMnTaCytJCSwnxmVJdQXDCyjbLj\nUc4khVFeGhYZFZIlhcR+lBTcnab2bjbtbufF+lbqdrTyyu42nt/WzM7WLprau/c5X1FBHrXlxUyt\nLGZaVQnHza1hSmUJUyqKqa0oprK0MNWIPrm8WA/SjaCcSQpJ+miJZJY3QEmhO9FLw54utjS2s6mh\nnbodrbza2M625g7qWzrZtLutT//6/DxjRnUJ8yeXs2zuRKpLC1Nf7lMrS1gyu5oaVeOMWjmXFEQk\ns/zwi/rjv1pNdVkRTe3dPLu1ma7E3mcfzKC2vJjp1aXMqinl+HkTmTOxjBnVpSycUs6smrLU8w4y\n9igpiEhKbUUxAE9vaeK1M6soKyrg7088hLmTJzCtsoSZNaUsqC3P+PCbjH
1KCiKScvy8iTz+72+k\nuqxI9fg5SklBRPqYVF4cdwgSo5wrA6pxS0Qks5xLCiIikpmSgoiIpORMUtDDayIiQ8uZpJCkFgUR\nkcxyLimIiEhmSgoiIpKipCAiIik5kxQ0HaeIyNByJikk6dk1EZHMci4piIhIZkoKIiKSoqQgIiIp\nOZMU9ERcs0SuAAAIQ0lEQVSziMjQciYpJKmhWUQks5xLCiIikpmSgoiIpCgpiIhISs4kBbUzi4gM\nLWeSQpJp8GwRkYxyLimIiEhmSgoiIpKipCAiIik5kxRcjzSLiAwpZ5JCitqZRUQyijQpmNk5ZrbO\nzOrM7OoB9heb2S/D/Y+Y2dwo4xERkcFFlhTMLB+4HjgXOAK4xMyO6HfY5UCDu78G+Abw5ajiERGR\noUVZUjgeqHP3De7eBdwCLO93zHLgpnD5VuBMs2iGrPu/JzYDkKcR8UREMiqI8NwzgU1p65uB12U6\nxt17zKwJmATsTD/IzK4ArghXW81s3QHGNPm0L/c9dw6YDLrmHKBrzg0Hc82HZHNQlElh2Lj7DcAN\nB3seM1vl7suGIaQxQ9ecG3TNuWEkrjnK6qMtwOy09VnhtgGPMbMCoArYFWFMIiIyiCiTwmPAQjOb\nZ2ZFwMXAin7HrADeEy7/HfAX1wMFIiKxiaz6KGwj+AhwJ5AP3Ojua83sOmCVu68Afgz81MzqgN0E\niSNKB10FNQbpmnODrjk3RH7NphtzERFJyr0nmkVEJCMlBRERScmJpDDUcBtjiZndaGY7zOyZtG0T\nzexuM3sh/F0Tbjcz++/wuteY2TFpr3lPePwLZvaegd5rtDCz2Wa20syeNbO1ZvaxcPu4vW4zKzGz\nR81sdXjNnwu3zwuHhKkLh4gpCrdnHDLGzK4Jt68zszfFc0XZM7N8M3vSzP4Yro/razazjWb2tJk9\nZWarwm3xfbbdfVz/EDRyvwjMB4qA1cARccd1ENdzCnAM8Ezatq8AV4fLVwNfDpfPA24nGAbwBOCR\ncPtEYEP4uyZcron72ga55unAMeFyBbCeYOiUcXvdYezl4XIh8Eh4Lb8CLg63fx/4YLj8IeD74fLF\nwC/D5SPCz3wxMC/8W8iP+/qGuPYrgV8AfwzXx/U1AxuByf22xfbZzoWSQjbDbYwZ7n4/QU+tdOnD\nhdwEvDlt+/964GGg2symA28C7nb33e7eANwNnBN99AfG3V919yfC5RbgOYKn4cftdYext4arheGP\nA2cQDAkD+17zQEPGLAducfdOd38JqCP4mxiVzGwWcD7wo3DdGOfXnEFsn+1cSAoDDbcxM6ZYojLV\n3V8Nl7cBU8PlTNc+Zv9NwiqCownunMf1dYfVKE8BOwj+yF8EGt29JzwkPf4+Q8YAySFjxtQ1A98E\nPgn0huuTGP/X7MBdZva4BUP6QIyf7TExzIVkz93dzMZlP2MzKwf+D/gXd2+2tMENx+N1u3sCWGpm\n1cBvgUUxhxQpM7sA2OHuj5vZaXHHM4JOdvctZjYFuNvMnk/fOdKf7VwoKWQz3MZYtz0sQhL+3hFu\nz3TtY+7fxMwKCRLCz939N+HmcX/dAO7eCKwETiSoLkjezKXHn2nImLF0zScBF5nZRoJq3jOAbzG+\nrxl33xL+3kGQ/I8nxs92LiSFbIbbGOvShwt5D/D7tO1/H/ZYOAFoCoukdwJnm1lN2Kvh7HDbqBTW\nE/8YeM7dv562a9xet5nVhiUEzKwUOIugLWUlwZAwsO81DzRkzArg4rCnzjxgIfDoyFzF/nH3a9x9\nlrvPJfg7/Yu7X8Y4vmYzm2BmFcllgs/kM8T52Y675X0kfgha7NcT1Ml+Ku54DvJabgZeBboJ6g0v\nJ6hH/TPwAnAPMDE81ggmOnoReBpYlnaefyBogKsD3hf3dQ1xzScT1LuuAZ4Kf84bz9cNLAaeDK/5\nGeDacPt8gi+4OuDXQHG4vSRcrwv3z0
8716fCf4t1wLlxX1uW138ae3sfjdtrDq9tdfizNvn9FOdn\nW8NciIhISi5UH4mISJaUFEREJEVJQUREUpQUREQkRUlBRERSlBRkXDGz1qGPSh37WTP7xMGe38zm\nWtqotQciHClz8gG+9s1mdsTBvL9IkpKCyNj3ZoKRQUUOmpKCjHtmdmE43v6TZnaPmU1N273EzB4K\nx6B/f9prrjKzx8Ix6z+XxdsUmNnPzew5M7vVzMrC86RKAGa2zMzuDZcnmdldFsyV8COCh5KS7/1p\nC+YBeMDMbk6WZsxsgZndEQ6c9lczW2RmrwcuAv7LgvH4Fxzsv5fkNiUFyQUPACe4+9EEY+p8Mm3f\nYoIxdk4ErjWzGWZ2NsHQCMcDS4FjzeyUId7jMOC77n440Eww1v9gPgM84O5HEox3MwfAzI4D3gYs\nAc4FlqW95gbgn939WOAT4fv9jWDog6vcfam7vzjE+4oMSqOkSi6YBfwyHFisCHgpbd/v3b0daDez\nlQSJ4GSCsWOeDI8pJ0gS9w/yHpvc/cFw+WfAR4GvDnL8KcBbAdz9T2bWEG4/KYypA+gwsz9AaoTY\n1wO/ThsdtnjQqxY5AEoKkgu+DXzd3VeEQzJ/Nm1f/3FenKAq50vu/oP9eI+BzgPQw94Secl+nK+/\nPIJ5BZYexDlEhqTqI8kFVewdRrj/3LXLLZgPeRLBIGyPEYwu+Q/h3TlmNjMc634wc8zsxHD5UoIq\nKwimWjw2XH5b2vH3h8dhZucSTKEI8CBwYRhTOXABgLs3Ay+Z2dvD15iZLQlf00IwTanIQVNSkPGm\nzMw2p/1cSVAy+LWZPQ7s7Hf8GoKhmR8GPu/uW939LoI5gh8ys6cJpnoc6kt3HfBhM3uO4Av+e+H2\nzwHfsmBC9kTa8Z8DTjGztQTVSK8AuPtjBG0Eawjm4n2aYEYxgMuAy80sOaJmclrZW4CrwoZ0NTTL\nQdEoqSKjjJmVu3tr2IPpfuAKD+eoFoma2hRERp8bwofRSoCblBBkJKmkICIiKWpTEBGRFCUFERFJ\nUVIQEZEUJQUREUlRUhARkZT/D7B7GBi+oxdQAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "is_smplr = oasis.ImportanceSampler(alpha, data.preds, data.scores, oracle, max_iter=max_iter)\n", "is_smplr.sample_distinct(n_labels)\n", "plt_estimates(is_smplr, data.F1_measure)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.3" } }, "nbformat": 4, "nbformat_minor": 0 }