dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Feelders, AJ
dc.contributor.author: Soons, P.A.
dc.date.accessioned: 2018-07-19T17:04:35Z
dc.date.available: 2018-07-19T17:04:35Z
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/29564
dc.description.abstract: If one wants to build a classifier but the available data is not classified, active learning can be a good solution. If the data in question has the monotonicity property, we can classify much more data using fewer queries. The data is said to be monotone if the class label is known to be increasing or decreasing in the attributes. The goal of active learning with monotonicity constraints is to use queried and monotonically inferred data to build a classifier. A central problem in active learning is finding the best query points. It has been suggested to use the concept of lower sets to search in the partially ordered sets (posets) that represent the dataset [3], [2], [8]. We show that this concept can also be used to select good query points in active learning with monotonicity constraints. Because counting lower sets quickly becomes very costly, it is hard to test this hypothesis on general posets. We provide a dynamic programming algorithm that counts all lower sets in linear time for the special case of a matrix order. We then show that this query strategy is at most a logarithmic factor worse than the optimal query strategy at classifying the whole dataset. We contrast this strategy with a simple but effective greedy heuristic known from the literature [1]. Testing shows that the lower-set-count strategy outperforms the greedy heuristic at classifying the whole dataset, but performs worse when the maximum number of queries is fixed in advance. Tests are performed on artificial and real datasets, and the heuristics are judged on both the quantity of inferred class labels and their quality, by means of building a classifier. These tests lead us to conclude that the greedy heuristic is very good and that the lower-set-count strategy is not a promising one for general posets. We reach the latter conclusion by reasoning that approximating lower set counts, which general posets would require, would nullify whatever marginal edge the strategy might have over the greedy one. The most challenging part of creating artificial data is generating random monotone classification functions. We show how to do this with the Propp-Wilson algorithm, without generating all such functions. Furthermore, for datasets with more than two class labels, we show that when building a classifier after a restricted number of queries, the classifier can be improved substantially by using partially classified data entries in addition to the fully classified data.
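
The lower-set-count query strategy the abstract describes can be illustrated for the matrix-order special case it mentions. The following is a minimal Python sketch, not the thesis's actual algorithm: it assumes an m x n grid poset in which (i,j) <= (k,l) iff i <= k and j <= l, represents each lower set by its non-increasing row-prefix lengths, counts lower sets with cumulative-sum dynamic programming in O(mn) time (linear in the dataset size, matching the abstract's claim), and then queries the cell whose membership count is closest to half the total, so that either answer roughly halves the set of monotone labelings still consistent with the observed labels. All function names are hypothetical.

    def lower_set_counts(m, n):
        # A lower set of the m x n matrix order is determined by its
        # row-prefix lengths c_0 >= c_1 >= ... >= c_{m-1}, each in 0..n.
        # pref[i][c] = number of valid c_0 .. c_i with c_i = c
        pref = [[0] * (n + 1) for _ in range(m)]
        pref[0] = [1] * (n + 1)
        for i in range(1, m):
            s = 0
            for c in range(n, -1, -1):   # running suffix sum of row i-1
                s += pref[i - 1][c]
                pref[i][c] = s
        # suff[i][c] = number of valid c_i .. c_{m-1} with c_i = c
        suff = [[0] * (n + 1) for _ in range(m)]
        suff[m - 1] = [1] * (n + 1)
        for i in range(m - 2, -1, -1):
            s = 0
            for c in range(n + 1):       # running prefix sum of row i+1
                s += suff[i + 1][c]
                suff[i][c] = s
        total = sum(pref[0][c] * suff[0][c] for c in range(n + 1))
        # cnt[i][j] = number of lower sets containing cell (i, j),
        # i.e. those labelings whose row-i prefix length satisfies c_i >= j + 1
        cnt = [[0] * n for _ in range(m)]
        for i in range(m):
            s = 0
            for j in range(n - 1, -1, -1):
                s += pref[i][j + 1] * suff[i][j + 1]
                cnt[i][j] = s
        return total, cnt

    def best_query(m, n):
        # Lower-set-count strategy: pick the cell whose membership count is
        # closest to half the total, so that either query answer eliminates
        # roughly half of the consistent monotone labelings.
        total, cnt = lower_set_counts(m, n)
        return min(((i, j) for i in range(m) for j in range(n)),
                   key=lambda p: abs(2 * cnt[p[0]][p[1]] - total))

As a sanity check, a 3 x 3 grid has C(6,3) = 20 lower sets, and by symmetry the centre cell lies in exactly 10 of them, so best_query(3, 3) returns the centre cell, the intuitively best first query.
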
dc.description.sponsorship: Utrecht University
dc.format.extent: 845651
dc.format.mimetype: application/pdf
dc.language.iso: en_US
dc.title: Exploiting Monotonicity Constraints in Active Learning
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: Datamining, Active Learning, Posets, Monotonicity
dc.subject.courseuu: Applied Computing Science