Please use this identifier to cite or link to this item:
http://bura.brunel.ac.uk/handle/2438/23010
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Ayad, LAK | - |
dc.contributor.author | Badkobeh, G | - |
dc.contributor.author | Fici, G | - |
dc.contributor.author | Héliou, A | - |
dc.contributor.author | Pissis, SP | - |
dc.date.accessioned | 2021-07-28T14:40:38Z | - |
dc.date.available | 2021-07-01 | - |
dc.date.available | 2021-07-28T14:40:38Z | - |
dc.date.issued | 2021-12-14 | - |
dc.identifier.citation | Ayad, L.A., Badkobeh, G., Fici, G. et al. Constructing Antidictionaries of Long Texts in Output-Sensitive Space. Theory Comput Syst 65, 777–797 (2021). | en_US |
dc.identifier.issn | 1432-4350 | - |
dc.identifier.issn | 1433-0490 | - |
dc.identifier.uri | http://bura.brunel.ac.uk/handle/2438/23010 | - |
dc.description.abstract | A word x that is absent from a word y is called minimal if all its proper factors occur in y. Given a collection of k words y1, … , yk over an alphabet Σ, we are asked to compute the set M{y1,…,yk}ℓ of minimal absent words of length at most ℓ of the collection {y1, … , yk}. The set M{y1,…,yk}ℓ contains all the words x such that x is absent from all the words of the collection while there exist i,j, such that the maximal proper suffix of x is a factor of yi and the maximal proper prefix of x is a factor of yj. In data compression, this corresponds to computing the antidictionary of k documents. In bioinformatics, it corresponds to computing words that are absent from a genome of k chromosomes. Indeed, the set Myℓ of minimal absent words of a word y is equal to M{y1,…,yk}ℓ for any decomposition of y into a collection of words y1, … , yk such that there is an overlap of length at least ℓ − 1 between any two consecutive words in the collection. This computation generally requires Ω(n) space for n = |y| using any of the plenty available O(n) -time algorithms. This is because an Ω(n)-sized text index is constructed over y which can be impractical for large n. We do the identical computation incrementally using output-sensitive space. This goal is reasonable when ∥M{y1,…,yN}ℓ∥=o(n), for all N ∈ [1,k], where ∥S∥ denotes the sum of the lengths of words in set S. For instance, in the human genome, n ≈ 3 × 109 but ∥M{y1,…,yk}12∥≈106. We consider a constant-sized alphabet for stating our results. We show that allMy1ℓ,…,M{y1,…,yk}ℓ can be computed in O(kn+∑N=1k∥M{y1,…,yN}ℓ∥) total time using O(MaxIn+MaxOut) space, where MaxIn is the length of the longest word in {y1, … , yk} and MaxOut=max{∥M{y1,…,yN}ℓ∥:N∈[1,k]}. Proof-of-concept experimental results are also provided confirming our theoretical findings and justifying our contribution. | en_US |
dc.description.sponsorship | Università degli Studi di Palermo | en_US |
dc.format.extent | 777 - 797 | - |
dc.language.iso | en | en_US |
dc.publisher | Springer | en_US |
dc.subject | Absent word | en_US |
dc.subject | Antidictionary | en_US |
dc.subject | String algorithm | en_US |
dc.subject | Output sensitive algorithm | en_US |
dc.subject | Data compression | en_US |
dc.title | Constructing Antidictionaries of Long Texts in Output-Sensitive Space | en_US |
dc.type | Article | en_US |
dc.identifier.doi | http://dx.doi.org/10.1007/s00224-020-10018-5 | - |
dc.relation.isPartOf | Theory of Computing Systems | - |
pubs.issue | 5 | - |
pubs.publication-status | Published | - |
pubs.volume | 65 | - |
dc.identifier.eissn | 1433-0490 | - |
Appears in Collections: | Dept of Computer Science Research Papers |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
FullText.pdf | 948.51 kB | Adobe PDF | View/Open |
Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.