Course Details

TEXT MINING

ECE544

Course Information
Semester  Course Unit Code  Course Unit Title  T+P+L  Credit  Number of ECTS Credits
1         ECE544            TEXT MINING        3+0+0  3       7.5

Course Details
Language of Instruction English
Level of Course Unit Master's Degree
Department / Program ELECTRICAL AND COMPUTER ENGINEERING
Type of Program Formal Education
Type of Course Unit Elective
Course Delivery Method Face To Face
Objectives of the Course
(1) Recall basic probability concepts
(2) Learn the fundamental notions of data & text mining
(3) Discuss essential NLP approaches for free-text data structures
(4) Apply machine learning algorithms to text mining problems
Course Content
Intro. to Data & Text Mining, Why Text Mining, Issues and difficulties in Text Mining
Basic text processing commands in Unix-like operating systems & regular expressions
Recap of basic probabilities, n-gram language models, perplexity, and smoothing techniques
n-gram model interpolation and backoff, Naïve Bayes algorithm for text classification
Introduction to named entity recognition, information extraction
Conditional vs generative models, maximum entropy models for named entity recognition
Part-of-speech tagging using maxent models, relation extraction (supervised, distant supervision)
Intro. to parsing, PCFGs, CNF, CKY algorithm & issues with PCFGs, Lexicalized PCFG
Dependency parsing, arc-eager parser, Malt parser, relation extraction through dependency structure
Lexical semantics, synonymy/homonymy/polysemy, word sense disambiguation
Word similarity, term-document matrices, tf-idf weighting, vector space model
Intro. to open-source text mining libraries (NLTK, spaCy, in Python), building a prediction model
Course Methods and Techniques
Prerequisites and co-requisites None
Course Coordinator None
Name of Lecturers Asst. Prof. Dr. MEHMET GÖKHAN BAKAL gokhan.bakal@agu.edu.tr
Assistants None
Work Placement(s) No

Recommended or Required Reading
Resources


Planned Learning Activities and Teaching Methods
Activities are given in detail in the section of "Assessment Methods and Criteria" and "Workload Calculation"

Assessment Methods and Criteria
No data

 
ECTS Allocated Based on Student Workload
Activities Quantity Duration Total Work Load
Homework 1 2 2
Preparation for Presentation 1 10 10
Preparation for Submission 1 5 5
Presentation 1 3 3
Project 1 5 5
Quiz 1 1 1
Reading 1 3 3
Research 1 5 5
Individual Study 1 10 10
Face-to-Face Lecture 1 3 3
Asynchronous Lecture 2 2 4
Total Work Load 51
Number of ECTS Credits 1.5

Course Learning Outcomes: Upon the successful completion of this course, students will be able to:
No  Learning Outcomes
1 Express foundations of data mining and text mining concepts
2 Discuss computational text mining tasks including document classification & clustering, sentiment analysis, document summarization, and information extraction
3 Apply text processing experiments such as tokenization, named entity recognition, part-of-speech tagging for text classification
4 Perform sentence-level mining, including parsing, different parser approaches, lexical semantics, and other methodologies
5 Use open-source NLP libraries (NLTK, spaCy, etc.) for building an NLP system to solve a real-world problem and to present the designed model


Weekly Detailed Course Contents
Week  Topics  Study Materials  Materials
1 Intro. to Data & Text Mining, Why Text Mining, Issues and difficulties in Text Mining
2 Basic text processing commands in Unix-like operating systems & regular expressions
3 Introduction to probabilities, n-gram language models, perplexity, smoothing techniques
4 n-gram model interpolation and backoff, Naïve Bayes algorithm for text classification
5 Introduction to named entity recognition, relation extraction, and other forms of information extraction
6 Conditional vs generative models, maximum entropy models for named entity recognition
7 Part-of-speech tagging using maxent models
8 Introduction to parsing, PCFGs, CNF, CKY algorithm
9 Dependency parsing, arc-eager parser, Malt parser, relation extraction through dependency structure
10 Lexical semantics, synonymy/homonymy/polysemy, word sense disambiguation
11 Word similarity, term-document matrices, tf-idf weighting, vector space model
12 Intro. to open-source text mining libraries (NLTK, spaCy, in Python) for text mining tasks
13 Overall Semester Recap
14 Final Project Presentations
15
16


Contribution of Learning Outcomes to Programme Outcomes
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11
C1
C2
C3
C4
C5

Contribution: 1: Very Slight, 2: Slight, 3: Moderate, 4: Significant, 5: Very Significant


https://sis.agu.edu.tr/oibs/bologna/progCourseDetails.aspx?curCourse=77735&lang=en