The genetic algorithm calculated the weights w_ij for each term. Initially, I assigned random real-valued weights between 0 and 1.
The GA computed the best weights using the training dataset. Then I classified each test page by computing the cosine similarity between the weight vector learned by the GA and the normalized tf vector of that test page.
If the similarity was above the threshold, I labeled the page as positive; otherwise, I labeled it as negative.
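The classification step above can be sketched in Python as follows. This is a minimal illustration, not the code from the paper: the function names, the L2 normalization of the tf vector, the example weights, and the threshold of 0.5 are all assumptions chosen for the example.

```python
import math

def normalize_tf(tf):
    """Scale a raw term-frequency vector to unit (L2) length; assumed normalization."""
    norm = math.sqrt(sum(x * x for x in tf))
    return [x / norm for x in tf] if norm > 0 else list(tf)

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def classify(weights, tf, threshold=0.5):
    """Label a page positive if its similarity to the learned weights is above the threshold."""
    similarity = cosine_similarity(weights, normalize_tf(tf))
    return "positive" if similarity > threshold else "negative"

# Hypothetical learned weights for a 3-term vocabulary and two test pages.
weights = [0.9, 0.1, 0.0]
print(classify(weights, [3, 0, 1]))  # similarity about 0.94 -> positive
print(classify(weights, [0, 2, 5]))  # similarity about 0.04 -> negative
```

Cosine similarity does not change when either vector is multiplied by a positive constant, so any per-page normalization that divides the tf vector by a scalar (its L2 norm, its maximum tf, or the page length) produces the same label in this sketch.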
The fitness of a chromosome is its predictive accuracy, that is, the fraction of training pages it classifies correctly.
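The sketch below shows one possible way a GA could search for the weights using accuracy as fitness. Only the random initialization in [0, 1), the cosine-similarity classifier, and the accuracy-based fitness come from the description above; the operators (tournament selection, uniform crossover, reset mutation, elitism), the parameter values, the training data, and all names are hypothetical choices for illustration and may differ from what the paper uses.

```python
import math
import random

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def fitness(chromosome, training_pages, threshold=0.5):
    """Predictive accuracy: fraction of (tf_vector, is_positive) pairs classified correctly."""
    correct = sum(
        (cosine_similarity(chromosome, tf) > threshold) == is_positive
        for tf, is_positive in training_pages
    )
    return correct / len(training_pages)

def evolve(training_pages, num_terms, pop_size=20, generations=30,
           mutation_rate=0.1, seed=0):
    """Hypothetical GA: tournament selection, uniform crossover,
    reset mutation, and elitism (the best chromosome always survives)."""
    rng = random.Random(seed)
    # Each chromosome holds one random real-valued weight in [0, 1) per term.
    population = [[rng.random() for _ in range(num_terms)] for _ in range(pop_size)]

    def score(c):
        return fitness(c, training_pages)

    def tournament():
        # Pick two random chromosomes from the current population and keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if score(a) >= score(b) else b

    best = max(population, key=score)
    for _ in range(generations):
        next_gen = [best[:]]  # elitism: carry the current best over unchanged
        while len(next_gen) < pop_size:
            p1, p2 = tournament(), tournament()
            # Uniform crossover: each gene comes from either parent with probability 0.5.
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            # Mutation: replace a gene with a fresh random weight in [0, 1).
            child = [rng.random() if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
        best = max(population, key=score)
    return best, score(best)

# Hypothetical training data: (raw tf vector, is_positive) for a 3-term vocabulary.
training_pages = [([3, 0, 1], True), ([0, 2, 5], False), ([1, 1, 0], True), ([0, 0, 4], False)]
weights, accuracy = evolve(training_pages, num_terms=3)
print([round(w, 3) for w in weights], accuracy)
```

Because the best chromosome is copied into every new generation, the best training accuracy never decreases from one generation to the next. Since fitness is measured on training pages only, the separate test pages are still what tell you how well the learned weights generalize.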
To understand my code, I advise you to read some introductory books or papers on the vector space model from the information retrieval domain and on classification from the data mining domain. Everything needed for the computations is explained in detail in the paper:
Özel, Selma Ayşe. “A web page classification system based on a genetic algorithm using tagged-terms as features.” Expert Systems with Applications 38.4 (2011): 3407-3415.
I think your site will help many students do their projects better.
It was professional, and I appreciate your work.