[Sem3] Generic feature extraction for text categorization

Student paper for Seminar 3

Related work review of “Generic feature extraction for text categorization”.

Abstract

Extracting more informative text features improves the performance of text mining tasks such as text categorization and sentiment analysis. Nevertheless, researchers usually focus on the main part of their problem and simply preprocess texts with well-known linguistic and custom methods that rely on background knowledge about a given language, domain, and specific task. To improve and automate this initial preprocessing phase, we propose a generic feature extraction method inspired by inductive programming. Starting with almost no natural language processing knowledge, the method heuristically defines and combines elemental feature extractors until more complex and informative features are found.
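
To make the idea of combining elemental feature extractors more concrete, the following is a minimal sketch and not the method of the reviewed paper. It assumes a handful of hypothetical elemental extractors (`split_words`, `lowercase`, `strip_punct`, `bigrams`), scores a combined extractor by the best information gain of any single feature it produces, and uses a small beam search as a stand-in for the heuristic search. All names, the scoring function, and the toy data are assumptions made for illustration only.

```python
import math

# Hypothetical elemental extractors: each maps a list of strings to a list of strings.
def split_words(items):
    return [w for s in items for w in s.split()]

def lowercase(items):
    return [s.lower() for s in items]

def strip_punct(items):
    return [s.strip(".,!?") for s in items if s.strip(".,!?")]

def bigrams(items):
    return [a + " " + b for a, b in zip(items, items[1:])]

ELEMENTAL = {"split": split_words, "lower": lowercase,
             "strip": strip_punct, "bigrams": bigrams}

def apply_pipeline(names, text):
    """Apply the named extractors in order and return the set of features."""
    items = [text]
    for name in names:
        items = ELEMENTAL[name](items)
    return set(items)

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def best_information_gain(names, texts, labels):
    """Highest information gain of any single binary feature of the pipeline."""
    feature_sets = [apply_pipeline(names, t) for t in texts]
    vocabulary = set().union(*feature_sets)
    base = entropy(labels)
    best = 0.0
    for f in vocabulary:
        with_f = [y for fs, y in zip(feature_sets, labels) if f in fs]
        without_f = [y for fs, y in zip(feature_sets, labels) if f not in fs]
        cond = (len(with_f) * entropy(with_f)
                + len(without_f) * entropy(without_f)) / len(labels)
        best = max(best, base - cond)
    return best

def beam_search(texts, labels, max_len=3, beam_width=3):
    """Grow pipelines step by step, keeping only the best few at each depth."""
    best = ((), best_information_gain((), texts, labels))
    beam = [()]
    for _ in range(max_len):
        candidates = [p + (n,) for p in beam for n in ELEMENTAL]
        scored = [(best_information_gain(p, texts, labels), p) for p in candidates]
        # Sort by descending score; ties are broken by pipeline name order.
        scored.sort(key=lambda sp: (-sp[0], sp[1]))
        for score, pipeline in scored:
            if score > best[1]:
                best = (pipeline, score)
        beam = [p for _, p in scored[:beam_width]]
    return best

# Toy data, invented for this illustration.
texts = ["Great movie!", "great acting", "Terrible plot.", "terrible movie"]
labels = ["pos", "pos", "neg", "neg"]
pipeline, score = beam_search(texts, labels)
print(pipeline, score)
```

On this toy data no single elemental extractor improves on using the raw text as a feature, but lowercasing followed by word splitting yields the features "great" and "terrible", which separate the two classes. This is the kind of gain from combining simple extractors that the abstract refers to, although a realistic search would need a far richer extractor set and a score estimated on held-out data.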