Follow the Diagonals: Finding String Matches through Matrix Operations

Faculty Sponsor: Professor Pavel Oleinikov

Rome Duong

Rome Duong is a rising senior (’23) majoring in Computer Science. He grew up in Siem Reap, Cambodia, and came to the US to finish high school at Culver Military Academy in Culver, IN. His interests are cycling, doodling, and binge-watching motorsports. After Wesleyan, he hopes to pursue a career as a Software Engineer or a Front-End Engineer.
 

Abstract: Pairwise comparison of two sequences produces diagonal lines of matches, a technique originally used to study homology. This method is extended here to text similarity detection in programming languages as well as in large collections of texts. Tokenizing the texts allows dynamic programming methods such as Global Alignment to detect similar texts while running on either the CPU or the GPU. An alternative approach combines the visual analysis method with tokenization of the texts. This image-processing pipeline also uses Convolutional Neural Networks to detect matching texts in large batches, drawing on GPU resources. When the Global Alignment model and the image-processing pipeline model were each run on both GPU and CPU, the image-processing pipeline ran faster while still returning all of the string matches.
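To illustrate the idea of diagonal matches, here is a minimal sketch in Python. It assumes a simple whitespace tokenizer and a plain 0/1 comparison matrix. The function names (tokenize, match_matrix, diagonal_runs) and the min_len parameter are hypothetical, chosen for illustration only. The sketch does not reproduce the Global Alignment model or the Convolutional Neural Network pipeline described above; it only shows how a shared run of tokens appears as a run of matches along one diagonal of the matrix.

```python
def tokenize(text):
    """Split text into lowercase tokens (an assumed, simplistic tokenizer)."""
    return text.lower().split()


def match_matrix(a, b):
    """Build a len(a) x len(b) matrix; cell [i][j] is 1 when a[i] == b[j]."""
    return [[1 if x == y else 0 for y in b] for x in a]


def diagonal_runs(matrix, min_len=2):
    """Follow every diagonal and collect runs of consecutive matches.

    Returns (start_i, start_j, length) tuples for runs of at least min_len.
    """
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    runs = []
    # Each diagonal is identified by its offset d = j - i.
    for d in range(-(rows - 1), cols):
        i = max(0, -d)
        j = i + d
        length = 0
        while i < rows and j < cols:
            if matrix[i][j]:
                length += 1
            else:
                if length >= min_len:
                    runs.append((i - length, j - length, length))
                length = 0
            i += 1
            j += 1
        # A run may extend to the edge of the matrix.
        if length >= min_len:
            runs.append((i - length, j - length, length))
    return runs


a = tokenize("the quick brown fox jumps")
b = tokenize("a quick brown fox sleeps")
runs = diagonal_runs(match_matrix(a, b))
print(runs)  # [(1, 1, 3)]
```

In this example the shared phrase "quick brown fox" starts at token 1 in both texts and spans 3 tokens, so it shows up as a single run on the main diagonal. A pure-Python loop like this is only practical for short inputs; the point of the sketch is the diagonal structure, not performance.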
