Dealing with Sparse Data. In column 3 we exclude all words that occur in fewer than 20 reviews (1% of the total), thereby eliminating all words for which there is insufficient data to estimate the correlation with sentiment accurately. Excluding very rare words also makes sense if we are restricted to a limited number of feature words for the sentiment analysis. (At this stage we also excluded all tokens shorter than three characters in order to remove punctuation marks and very short words; this change had no impact on these lists.) Most of the words in this list seem sensible in the context of sentiment analysis. Only 'seagal' (making a reappearance) and 'freddie' stand out in the negative list, and it might be argued that the allegedly poor quality of Steven Seagal and Freddie Kruger films is such that
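
The sparse-data filter described in this paragraph can be sketched as follows. This is only an illustration under stated assumptions: each review is taken to be already tokenized into a list of strings, and the names `document_frequency` and `filter_vocabulary` and the toy corpus are hypothetical, not taken from the original work. The defaults mirror the thresholds in the text (at least 20 reviews, at least three characters).

```python
from collections import Counter


def document_frequency(reviews):
    """Count, for each token, the number of reviews in which it occurs."""
    df = Counter()
    for tokens in reviews:
        # set() ensures a token is counted once per review,
        # however often it is repeated inside that review.
        df.update(set(tokens))
    return df


def filter_vocabulary(reviews, min_reviews=20, min_length=3):
    """Return the tokens that occur in at least `min_reviews` reviews
    and are at least `min_length` characters long."""
    df = document_frequency(reviews)
    return {
        word
        for word, n_reviews in df.items()
        if n_reviews >= min_reviews and len(word) >= min_length
    }


# Toy corpus (hypothetical data); the threshold is lowered to 2
# because the corpus holds only three reviews.
toy_reviews = [
    ["great", "film", "!"],
    ["great", "plot", "ok"],
    ["bad", "film", "ok"],
]
kept = filter_vocabulary(toy_reviews, min_reviews=2)
```

In the toy corpus, 'ok' occurs in two reviews but is dropped by the length filter, while 'plot' and 'bad' are dropped because each occurs in only one review. Counting reviews rather than raw occurrences matches the criterion in the text, which is stated in terms of the number of reviews a word appears in.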