Syllabus: Introduction to Stringology
Instructor: Guan-Shieng Huang (黃光璿)
Office: Room 314, Science and Technology Building 3
Office Hours: 2:00~5:00 pm, Friday
Web Site: http://staffweb.ncnu.edu.tw/shieng/
Email: shieng@ncnu.edu.tw
Time and Location:
Room 119, Science and Technology Building 3
2:00~5:00 pm, Thursday
Teaching Assistant: C. S. Wu (吳展碩)
Office: Room 106, Science and Technology Building 3
Email: s2321504@ncnu.edu.tw
Text Book:
Jewels of Stringology, M. Crochemore and W. Rytter, World Scientific, 2003, ISBN: 9810248970. (Local Regional Office)
References:
Algorithms on Strings, Trees, and Sequences, Dan Gusfield, Cambridge University Press, 1997.
Prerequisites:
Data Structures and Algorithms
Discrete Mathematics
Description:
This course emphasizes essential string processing algorithms and techniques. We introduce it from both practical and theoretical viewpoints. Classical string processing algorithms such as KMP and Boyer-Moore algorithms will first be introduced. The theoretical analyses of these algorithms will come together, too. Then we will turn to the most powerful data structure, the suffix tree, which has been demonstrated to have many important applications in string processing and in bioinformatics. We will introduce Ukkonen’s and McCreight’s versions of linear-time algorithms to construct suffix trees. Basic combinatorics on strings will also be discussed in this course. Parts of the automata approach will also be included. The following is the outline of the course:
1. Stringology
2. Basic string searching algorithms
3. Preprocessing for basic searchings
4. On-line construction of suffix trees
5. More on suffix trees
6. Subword graphs
7. Test algorithms related to sorting
8. Symmetries and repetitions in texts
9. Constant-space searchings
10. Text compression techniques
11. Automata-theoretic approach
12. Approximate pattern matching
13. Matching by duel and sampling
14. Two-dimensional pattern matching
15. Two-dimensional periodicities
16. Parallel text algorithms
17. Miscellaneous
Grading Policy:
Every student taken this course should follow three rules: hand in your homework on time, do it by yourself, and code programs by your own effort. Cheating is absolutely disallowed and no late homework will be accepted. The distribution of grading is as follows:
50% Homework (about 8 assignments)
50% Paper Reports
There are about eight homework assignments, including paper work and programming assignments. As for the programming languages, C, C++ and Java are preferred. Before the end of the course, each student should have read two regular papers assigned by the instructor and then summarize them as two reports.