Given a data set D, such that (Xi,yi) ∈ D, yi ∈ ℝ, we are interested in first dividing the range of yi, i.e. (ymax − ymin), (where ymax is the maximum of all yi corresponding to (Xi,yi) ∈ D and ymin is the minimum of all yi corresponding to (Xi,yi) ∈ D), into contiguous ranges which can be thought of as classes and then for a new point, Xj, predicting which range (class) it falls into. The problem is difficult, because neither the size of each range nor the number of ranges, is known a-priori.
This was a practical problem that arose when we wanted to predict the execution time of a query in a database. For our purposes, an accurate prediction was not required, while a time range was sufficient and the time ranges were unknown a-priori.
To solve this problem we introduce a binary tree structure called Class Discovery Tree. We have used this technique successfully for predicting the execution times of a query and this is slated for incorporation into a commercial, enterprise level Database Management System.
In this paper, we discuss our solution and validate it on two more real life data sets. In the first one, we compare our result with a naive approach and in the second, with the published results. In both cases, our approach is superior.