Regular coding partitions

icon

8

pages

icon

English

icon

Documents

Le téléchargement nécessite un accès à la bibliothèque YouScribe Tout savoir sur nos offres

icon

8

pages

icon

English

icon

Documents

Le téléchargement nécessite un accès à la bibliothèque YouScribe Tout savoir sur nos offres

Regular coding partitions
y yMarie-Pierre Beal Fabio Burderi Antonio Restivo
June 11, 2006
Abstract
The canonical coding partition of a set of words is the nest partition such that the words contained
in at least two factorizations of a same sequence belong to a same class. In the case the set is not
uniquely decipherable, it partitions the set into one unambiguous class and other parts that localize
the ambiguities in the factorizations of nite sequences.
We prove that the canonical coding partition of a regular set contains a nite number of regular
classes. We give an algorithm for computing this partition.
1 Introduction
In this paper, we call code a set of nite words. An important class of codes is the class of uniquely
decipherable codes. This property allows the decoding of a sequence of concatenated codewords. Never-
theless, some classes of codes are used in information theory although they are not uniquely decipherable
(see for instance [6], [7] and [8]). The condition of unique decipherability can also be weakened by consid-
ering that it applies only to codes with constraints (see [1]) or to codes with a constraint source (see [4],
[5]). In [5], the classi cation of ambiguities of codes is investigated in the study of natural languages.
From a combinatorial point of view, the study of ambiguities helps to understand the structure of a code.
To this purpose, the notions of coding partition and canonical coding partition of a code were intro-
duced in [3] to ...
Voir icon arrow

Publié par

Langue

English

1
Regular coding partitions
 Marie-PierreBeal
Fabio Burderi
June 11, 2006
Antonio Restivo
Abstract Thecanonicalcodingpartitionofasetofwordsisthe nestpartitionsuchthatthewordscontained in at least two factorizations of a same sequence belong to a same class. In the case the set is not uniquely decipherable, it partitions the set into one unambiguous class and other parts that localize theambiguitiesinthefactorizationsof nitesequences. Weprovethatthecanonicalcodingpartitionofaregularsetcontainsa nitenumberofregular classes. We give an algorithm for computing this partition.
Introduction
Inthispaper,wecallcodeasetof nitewords.Animportantclassofcodesistheclassofuniquely decipherable codes. This property allows the decoding of a sequence of concatenated codewords. Never-theless, some classes of codes are used in information theory although they are not uniquely decipherable (see for instance [6], [7] and [8]). The condition of unique decipherability can also be weakened by consid-ering that it applies only to codes with constraints (see [1]) or to codes with a constraint source (see [4], [5]).In[5],theclassi cationofambiguitiesofcodesisinvestigatedinthestudyofnaturallanguages. From a combinatorial point of view, the study of ambiguities helps to understand the structure of a code. To this purpose, the notions of coding partition and canonical coding partition of a code were intro-duced in [3] to study some decipherability conditions for codes weaker than the unique decipherability. In the case the code is not uniquely decipherable, the code is partitioned into one unambiguous class, andotherpartsthatlocalizetheambiguitiesinfactorizationsof nitesequences.Inthecasethecodeis uniquely decipherable, the canonical coding partition is trivial and contains a unique class equal to the code itself. In [3] is also given a Sardinas-Patterson like algorithm for computing the canonical coding partitionofa nitecode. Inthispaper,weprovethatthecanonicalcodingpartitionofaregularcodehasa nitenumberof classes, each one being regular. This result was conjectured in [3]. We give an exponential time algorithm for computing all classes of the partition which is based on automata constructions. In Section 2 we recall the notions of coding partition and canonical coding partition of a code intro-duced in[3]. The main result is proved in Section 3. We end the article with some concluding remarks.
2
Partitions of a code
+ LetAybeto ainebabphalteened.WetAalphabetosevtrehinetowdrt fotesehA, and byAthe  setofnonempty nitewords.AcodeXis here a subset ofAelements are called. Its codewords.  InstitutGaspard-Monge,LaboratoiredinformatiqueUMR8049,UniversitedeMarne-la-Vallee,77454Marne-la-Vallee Cedex 2, France. Dipartimento di Mathematica ed Applicazioni, Universita degli studi di Palermo, Via Archira 34, 90123 Palermo, Italy.
33
34
Regular coding partitions
A codeXis said to beuniquely decipherable(UD) if every word has at most one factorization into codewords,i.e.the equality x1x2. . . xn=y1y2. . . ym, xi, yjX, impliesn=mandxi=yifor 1inrefer to [2] for the theory of uniquely. We decipherable codes. In [3] is introduced the notion of coding partition of a codeXgroups the codewords contained in. It at least two factorizations of a same word in the case whereXis not UD. LetIdeneithotesin ehtreteset{1,2, . . . , r}, whereris a positive integer, orNpartition. A + + P= (Xi)iIof a codeXisconcatenatively independentif, fori6=j,XX=. IfPis concatenatively i j + + independent, aP-factorizationof a wordzXis a factorizationz=z1z2. . . znwhere eachziX k + + for somekI, andziXimplieszi+1/ Xifi < n. k k Acoding partitionof a codeXis a concatenatively independent partition such that each wordzin + Xhas at exactly oneP-factorization,i.e.the equality
z=x1x2. . . xn=y1y2. . . ym,
wherexi, yjareP-factorizations ofz, impliesn=mandxi=yifor 1in. + We say that a wordzXhas aminimal 2-factorizationin codewordsx1, x2, . . . xn, y1, y2, . . . , ym 0 ifz=x1x2. . . xn=y1y2. . . ymwith (xi)1in6= (yj)1jm, and there are not positive integersn < n, 0 0 andmm < , such thatx1x2. . . xn=y1y2. . . ym. 0 0 Thecanonical coding partitionofXtsehitpar nesontitiP= (Xi)iIsuch that whenever a word + zXhas a minimal 2-factorization in codewordsx1, x2, . . . xn, y1, y2, . . . , ym, these codewords belong to the same set of the partition, and such that all words not contained in any minimal 2-factorization belong to the same set of the partition. Thecanonicalcodingpartitioniswellde ned,uptoabijectionofI. We can chooseIsuch thatX0 + is the set of all codewords that never appear in any factorization of a word ofXwhich has at least two factorizations. It is straightforward to see that the canonical coding partition is a coding partition (see [3]). In [3] is given a Sardinas-Patterson like algorithm for computing the canonical coding partition of a nite codeX. Example1.We consider the codeX={00,0010,1000,11}. Clearly the words 00,0010 and 1000 belong to a same set in the canonical coding partition since
001000 = 00
1000 = 0010
00
is a minimal 2-factorization. The canonical partition ofXis{00,0010,1000},{11}. Note thatXis not a UD code. Example2.The canonical coding partition of a UD code has a single set, the code itself.
3
Coding partition of a regular code
In this section, we consider a regular codeX. We say that a coding partition of a code is ntienidoghtyacatatssees.Wmbnuoferinetsa aifhsi partition of a code isregularif all sets of the partitions are regular. Proposition1.nacecinohTnoiorefapaalitrtinetie scrdougalar.egulandr Inordertoprovethisproposition,wegiveanalgorithmforcomputingthe niteautomataaccepting thesetsofthepartitionfroma niteautomatonacceptingthecodeX. A niteautomatonA= (Q, I , E, Tatstestaesofteafoetin i)damsQ, a set of edgesElabelled on an alphabetA, a set of initial statesIf tosedaansatetanslT. We shall also consider automata
M-P.Beal,F.Burderi,A.Restivo
35
 labelled inA. Asuccessful pathis a path going from a state ofIto a state ofT. The set of labels of successful paths is thelanguage acceptedby the automaton. An automaton isunambiguousif for any wordz, any statesp, q, there is at most one path going from ptoqand labelled byz. 0 0 0 0 LetA= (Q, I , E, Teb)in auaetotamehuaotnton.toma netWedeA  A= (I , E , Q , T) called a a a 0 0 0 0 0 0 thesquareofA, whereQ=QQ,E={(p, q)(qp , )|ppandqqE}set of initial. The   0 0p stateIesehtdnastetalsnaf toTstate (later. A will be speci ed p, q.) will be also denoted by q Proof of Proposition 1.LetA= (Q, I , E, Tcode)eb ainetgubiamunomutsaoueccanotaehtgnitpX such thatI={i},T={t}, and which has no edge coming iniand no edge going out oft. Such an automaton, called a normalized automaton, can be obtained by standard constructions (see for instance [2]). By mergingiandtinto a single state denoted by 0, we get an automatonB= (Q,0, E,0) accepting  the setXthat. Note Bis no more unambiguous unlessXis UD.       0 0 0 We build the square automatonBB  and such thatand replace the state by two states 0 0s0t           0 0 0 0 0 the edges going out of go out of and the edges coming in come in . Note that has 0 0s0 0t0s   0 no incoming edges and has no outgoing edges. We only keep inBB  the states belonging to paths 0t       0 0p from to and going at least one time through a state withp= 0, q6= 0 orp6= 0, qBy= 0. 0s0t q   p using the state-elimination technique (see for instance [9]), we remove the states withpandqdistinct q       0 0p from 0 and get an automatonClabelled in regular subsets ofAand with, , whose states are 0s0t q p= 0, q6= 0 orp6= 0, qThere is at most one edge between two states and each label is a regular= 0.  non-empty subset ofA.     p p States withp= 0 are called upper-zero states while states withq= 0 are called lower-zero q q     0 0 states. Hence and are both upper and lower-zero states. 0s0t  0   0p p p p We denote byE0the regular set label of the edge0a slight abuse of language,. With q q q q   0p p we sometimes say that there is an edge labelled by a wordwto statefrom a state 0whenever q q  0p p wE0. q q Letpi, qi, pj, qjbe states inQwithqiandqjdistinct from 0. Lete, fbe the edges         0qiqj0 e=andf=pi0 0pj
(i.e.respectively an edge from an upper-zero state to a lower-zero state and an edge from lower-zero state to an upper-zero state). We denote by        qiqjqiqj Lto withthe regular set of labels of paths from all its states being lower-zero states. 0 0 0 0        qiqjqiqj Sto withthe union of the labels of all edges contained in a path from all its states 0 0 0 0 being lower-zero states.       qiqjqiqj Note that we may haveqi=qj. In this case,Lcontains the empty word andSmay be the 0 0 0 0 empty set. Wede netheregularsets             0qiqiqjqj0qiqj Y=ELE+S , pi0 0 0 0pj0 0 Yifpi= 0, pj= 0,    0qi Y+Eifpi= 0, pj6= 0, 0s0 Sef=  qj0 Y+Eifpi6= 0, pj= 0, 0 0t        0qiqj0 Y+E+Eifpi=pj= 0, 0s0 0 0t
36
Regular coding partitions
where the symbol + is the union symbol and the dot symbol is the concatenation symbol. Letpi, qi, pj, qj, pk, qkbe states inQwithqi, qj, pj, pkLetdistinct from 0. e, f, gbe the edges             0qiqj0 0qk e=, f=andg=. pi0 0pjpk0
Wede ne
the regular set    0qi Sef g=E pi0
   qiqj L 0 0
      qj0qj0 E+E 0pj0pj
   0 0 L pjpk
   0qk E . pk0
Wede nesimilarsetsSefandSef gwhene, gare edges from a lower-zero state to an upper-zero state andfis an edge from an upper-zero state to a lower-zero state, by exchanging the roles played by the upper and lower states. Wegeta nitenumberofregularsubsetsofX. Some of these states may have a nonempty intersection. Wereplacetwopartshavinganon-emptyintersectionbytheirunion.Aftera nitenumberofstepswe geta nitenumberofregularsubsetsofXwhose two by two intersections are empty. We denote these S r sets byX1, X2, . . . , Xr.tseheten edeWX0=XXiclaim that (. We Xi)0iris the canonical i=1 coding partition ofX, which proves the proposition. To prove our claim, we show that any two words which belong to a same minimal 2-factorization belong to a same setXi. Letz=x1x2. . . xn=y1y2. . . ymbe a minimal 2-factorization wherexi, yjare codewords. The existence of such a factorization is equivalent to the existence of a path inC:  (e1)   (e2)   (e3)   (e4)  0q01q0j0 0q11q1j0 0 1  →. . . →. . . →. . . →. . . 0s0 0p11p1i0 0p21 1 (ek2)   (ek1)   (ek)  0 0qr1qrj0 r  →. . . →. . . →. pr1pri0 0 0t r In this path, we denote byeithe edges going from an upper-zero state to a lower-zero one or the converse. Note that this path encodes two paths in the automatonAis read on the upper track, the other one. One on the lower track. The label of any path read on the upper (or lower track) going from 0 to 0 without going through 0 inbetween belongs toX. Hence
i1+  +ir+ 1 =n j0+j1+  +jr+ 1 =m.
Byrenumberingthelowercoecientspijof the upper-zero states of this pathp1topn, and the upper coecien tsqijof the lower-zero states of this pathq1toqm, the label of each part of this path going     0 0 from a state to a state is labelled byxi. The label of each part of this path going from a state pi1pi     qj1qj to a state is labelled byyj. 0 0 Bythede nitionofthesetsSeiei+1and the setsSeiei+1ei+2, we get that allxiand allyjbelong to a same part of the canonical coding partition. Conversely, we prove that if two wordsxandybelong to a same set of the partition, then there isa nitechainofwordsx=w0, w1, . . . , wn=ysuch thatwiandwi+1belong to a same minimal 2-factorization for 0i < n.    0q1q2 Letq1,q2be two non null states inQdrowowtfstsrse .Wtihawthoy, yS, then there is 0 0 0 a nitechainofwordsy=w0, w1, . . . , wn=ysuch thatwiandwi+1such thatwiandwi+1belong to a same minimal 2-factorization for 0i < n.    0q1q20 0 0 0 0 Sincey, yS, there are inCtwo paths labelledxyzandx y z, with, z,x, x A, containing 0 0 0 respectively an edge labelled byyand an edge labelled byy, with the following form:     q x y q2 z 1q11q12 → → →, 0 0 0 0       0 0 0 q x0y0z q2 1q q 11 12 → → →. 0 0 0 0
Voir icon more
Alternate Text