Faculty Sponsor: Pavel Oleinikov
Live Poster Session: Zoom Link
Abstract: A binarized document-term matrix (DTM) can be treated as a black-and-white image. Because the ordering of the terms (columns) and documents (rows) in DTMs is arbitrary, clustering can be viewed as swapping (permutations) of rows and columns, aimed to place the black pixels together in larger blocks. Traditional clustering algorithms rely on sequential convergence and do not parallelize well. Many algorithms require a fixed number of clusters as input. However, a Monte Carlo Tree Search is agnostic to the number of clusters, and as the search tree grows automatically, the Monte Carlo Tree Search can guarantee convergence to the optimal solution (minimax). Based on its design, a Monte Carlo Tree Search will focus on exploring “more promising moves”. If it finds a good move, the Monte Carlo Tree Search will explore it deeply. This can be seen as a combination of breadth-first search and depth-first search, similar to heuristic search. This project will experiment with the Monte Carlo Tree Search for the generation of permutation matrices (random draws vs incremental improvements), heuristics for handling the high dimensionality of DTMs, and methods of parallelizing the search for the best solution. We are interested in parallel clustering algorithms like an image that can utilize GPUs.
YifeiPavel_final