Course Content
|
Intro. to data & text mining, why text mining, issues and difficulties in text mining
Basic text processing commands in Unix-like operating systems & regular expressions
Recap of basic probability, n-gram language models, perplexity, and smoothing techniques
n-gram model interpolation and backoff, Naïve Bayes algorithm for text classification
Introduction to named entity recognition, information extraction
Conditional vs generative models, maximum entropy models for named entity recognition
Part-of-speech tagging using maxent models, relation extraction (supervised, distant supervision)
Intro. to parsing, PCFGs, CNF, CKY algorithm & issues with PCFGs, Lexicalized PCFG
Dependency parsing, arc-eager parser, MaltParser, relation extraction through dependency structure
Lexical semantics, synonymy/homonymy/polysemy, word sense disambiguation
Word similarity, term-document matrices, tf-idf weighting, vector space model
Intro. to open-source text mining libraries (NLTK, spaCy in Python), building prediction models
|