MPS: improving exact string matching through pattern character frequency

Sahota, V., Maozhen, L. and Bayford, R. (2013) MPS: improving exact string matching through pattern character frequency. Journal of Data Processing, 3 (3). pp. 127-137. ISSN 2278-6481.

Full text not available from this repository.

Abstract

One of the initial hurdles in taking advantage of big data is the ability to quickly analyze and establish its relevance. Intrinsic to this big data analysis problem is the need for tools to scale well in proportion to the growth of data and have the flexibility to operate across disparate datasets. Exact string matching is a fundamental tool used to search through data, though many existing algorithms do not scale well since their computational cost occurs and grows during reading of data. A proof of concept is presented in this paper which transfers all the required calculations to a pre-processing stage, removing all calculations during the reading stage; creating a search trie which incorporates the statistical distribution of characters in the search pattern to reduce overall calculation and lookups in a search. The resulting algorithm produces a total calculation costs which is independent of data (source) size. Preliminary results shows the new algorithm out performing existing general algorithms, as the search pattern becomes large for natural English text and when searching a small alphabet source (DNA).

Item Type: Article
Subjects: Q Science > QA Mathematics > QA0075 Electronic computers. Computer science
Divisions: Faculty of Social and Applied Sciences > School of Law, Criminal Justice and Computing
Related URLs:
Depositing User: Vijay Sahota
Date Deposited: 07 Jul 2015 10:22
Last Modified: 07 Jul 2015 10:23
URI: https://create.canterbury.ac.uk/id/eprint/13533

Actions (login required)

Update Item (CReaTE staff only) Update Item (CReaTE staff only)

Downloads

Downloads per month over past year

View more statistics

Share

Connect with us

Last edited: 29/06/2016 12:23:00