Faculty Sponsor: Professor Pavel Oleinikov
Abstract: Pairwise comparison, a technique originally used to detect homology, plots the matches between two sequences so that similar regions appear as a diagonal line of matches. This work extends the method to text similarity detection in program source code as well as in large collections of texts. Tokenizing the texts allows dynamic programming algorithms such as Global Alignment to detect similar texts while running on either the CPU or the GPU. An alternative approach combines this visual analysis method with the tokenization of texts: an image-processing pipeline that also uses Convolutional Neural Networks to detect similar texts in large batches on the GPU. When the Global Alignment model and the image-processing pipeline model were each run on both GPU and CPU, the image-processing pipeline model ran faster while still returning all of the string matches.
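The two ideas named in the abstract, the diagonal of matches from a pairwise comparison and a Global Alignment score over tokens, can be illustrated with a minimal CPU-only sketch. It is not the project's implementation: the word-level tokenizer, the function names (tokenize, dot_matrix, longest_diagonal, global_alignment_score), and the scoring values (match = 1, mismatch = -1, gap = -1) are all assumptions chosen for illustration, and the alignment shown is the standard Needleman-Wunsch recurrence.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (an assumed, simplified tokenizer)."""
    return re.findall(r"\w+", text.lower())


def dot_matrix(a, b):
    """Pairwise comparison: cell [i][j] is 1 when token a[i] equals token b[j]."""
    return [[1 if x == y else 0 for y in b] for x in a]


def longest_diagonal(matrix):
    """Length of the longest unbroken run of matches along any diagonal."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    # run[i + 1][j + 1] holds the length of the diagonal run ending at (i, j).
    run = [[0] * (cols + 1) for _ in range(rows + 1)]
    best = 0
    for i in range(rows):
        for j in range(cols):
            if matrix[i][j]:
                run[i + 1][j + 1] = run[i][j] + 1
                best = max(best, run[i + 1][j + 1])
    return best


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score for two token lists.

    The scoring values are illustrative assumptions, not the project's.
    Only two rows of the dynamic programming table are kept in memory.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    left = tokenize("The quick brown fox")
    right = tokenize("A quick brown fox jumps")
    print(longest_diagonal(dot_matrix(left, right)))
    print(global_alignment_score(left, right))
```

A long diagonal run in the match matrix marks a shared passage, which is the visual pattern an image-based detector would look for, while the alignment score summarizes similarity over the full length of both token lists.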
RomeDuongQAC