Lexicon
Since our approach relies mainly on lexical cues, we need a lexicon of word pairs, which is used during the alignment process. The lexicon can be prepared manually in a CSV or XLS file and updated with new word pairs as necessary. However, in order to generate as many word pairs as possible automatically, we selected a collection of reasonably well-aligned bitexts and proceeded as follows:
1- Index all the words in the source and target segments within each document
2- Create an inverted index for each word in the source and target
3- For each source word, identify a sample of the segments where it occurs, and the words in the corresponding target segments
4- Identify the most likely corresponding target word as follows: