Syllabus: Introduction to Stringology

 

Instructor: Guan-Shieng Huang (黃光璿)

    Office: Room 314, Science and Technology Building 3

    Office Hours: 2:00~5:00 pm, Friday

    Web Site: http://staffweb.ncnu.edu.tw/shieng/

    Email: shieng@ncnu.edu.tw

Time and Location:

    Room 119, Science and Technology Building 3

    2:00~5:00 pm, Thursday

Teaching Assistant: C. S. Wu (吳展碩)

    Office: Room 106, Science and Technology Building 3

    Email: s2321504@ncnu.edu.tw

Text Book

    Jewels of Stringology, M. Crochemore and W. Rytter, World Scientific, 2003, ISBN: 9810248970. (Local Regional Office)

References:

    Algorithms on Strings, Trees, and Sequences, Dan Gusfield, Cambridge University Press, 1997.

Prerequisites:

    Data Structures and Algorithms

    Discrete Mathematics

Description:

        This course emphasizes essential string processing algorithms and techniques. We introduce it from both practical and theoretical viewpoints. Classical string processing algorithms such as KMP and Boyer-Moore algorithms will first be introduced. The theoretical analyses of these algorithms will come together, too. Then we will turn to the most powerful data structure, the suffix tree, which has been demonstrated to have many important applications in string processing and in bioinformatics. We will introduce Ukkonen’s and McCreight’s versions of linear-time algorithms to construct suffix trees. Basic combinatorics on strings will also be discussed in this course. Parts of the automata approach will also be included. The following is the outline of the course:

               1.      Stringology

               2.      Basic string searching algorithms

               3.      Preprocessing for basic searchings

               4.      On-line construction of suffix trees

               5.      More on suffix trees

               6.      Subword graphs

               7.      Test algorithms related to sorting

               8.      Symmetries and repetitions in texts

               9.      Constant-space searchings

           10.      Text compression techniques

           11.      Automata-theoretic approach

           12.      Approximate pattern matching

           13.      Matching by duel and sampling

           14.      Two-dimensional pattern matching

           15.      Two-dimensional periodicities

           16.      Parallel text algorithms

           17.      Miscellaneous

Grading Policy:

        Every student taken this course should follow three rules: hand in your homework on time, do it by yourself, and code programs by your own effort. Cheating is absolutely disallowed and no late homework will be accepted. The distribution of grading is as follows:

                50% Homework (about 8 assignments)

                50% Paper Reports

There are about eight homework assignments, including paper work and programming assignments. As for the programming languages, C, C++ and Java are preferred. Before the end of the course, each student should have read two regular papers assigned by the instructor and then summarize them as two reports.