Ph.D. Defense | Similarity Algorithms for Embeddable Objects

Sep 24 2019 03:00 PM - Sep 24 2019 05:00 PM


The need to measure similarity between two objects is everywhere. Similarity between two objects is generally defined as an inverse function to the distance between them.
First, we propose the first non brute force algorithm to calculate the Gromov hyperbolicity constant. We present several approximate and exact algorithms to solve this problem. For example, we  provide an exact algorithm to compute the hyperbolicity constant $\delta$ in time $\bigO{n^{3.688}}$ for a discrete metric space. We also show that hyperbolicity at a fixed base-point cannot be computed in $O(n^{2.05})$ time, unless there exists a faster algorithm for (max,min) matrix multiplication than
currently known.
Then, we present a new system to find proteins similar in functionality. We employ text mining techniques to map text similarity to similarity in functionality. We use manually curated data from Swiss-Prot to train and build our system. The result is a search engine that given a query protein, reports the top similar proteins in functionality with 99\% accuracy. The system is tested extensively using GO annotations. We used this system, that predicts similarity in function, to enhance protein annotations. In particular, we were able to predict that some GO annotations should be added to some proteins. After careful literature reviews we were able to confirm many of those predictions, for example, in one case, we have 96\% prediction accuracy.
We also present a new algorithm for measuring the similarity between GPS traces. Our algorithm is robust against subsampling and supersampling. We perform experiments to compare this new similarity measure with the two main approaches that have been used so far: Dynamic Time Warping (DTW) and the Euclidean distance and our algorithm outperforms both of them in most of the cases.