## About the course:

## Catalog Description:
## A More Honest Description:
## Course Objectives:

This is one of the four proseminars in the Information Science Doctoral program and a crucial foundational seminar, especially for those proposing to major in the Organisation of Knowledge Records specialisation. The objectives of this seminar are:

- *To provide a broad perspective on information organisation and knowledge representation.*
- *To develop in-depth knowledge of the statistical analysis of unstructured data (text) to facilitate retrieval.*
- *To gain an understanding of, and develop competence in, modeling in the context of corpus based research.*
- *To develop competence in conducting corpus based empirical research in information organisation and retrieval.*
By the end of the semester, you should be able to:

- *Interpret seminal journal articles in the field.*
- *Build simple statistical models for text.*
- *Statistically analyse corpus data.*
- *Demonstrate familiarity with well-known corpora, their tagging schemes, etc.*
## Administrivia

- **Course homepage:** https://www.albany.edu/acc/courses/inf703.spring98.html
- **Course Newsgroup:** sunya.class.inf703
- **Course E-Mail:** inf703@cnsunix.albany.edu
- **Meeting Time:** W 4:15 - 7:05
- **Meeting Room:** BA 363
## About the Instructor:

## Texts:

The main texts for the course are:

- **Foundations of Statistical Natural Language Processing,** *Christopher D. Manning* and *Hinrich Schütze* (The MIT Press, 1999).
- **Knowledge Representation: Logical, Philosophical, and Computational Foundations,** *John F. Sowa* (Brooks/Cole Thomson Learning, 2000).
- **Modern Applied Statistics with S-Plus, Third Edition,** *W.N. Venables* and *B.D. Ripley* (Springer-Verlag, 1999).
I shall be using primarily MS (Manning and Schütze) and JS (Sowa), but you may like to refer to the Venables and Ripley book for the projects/homework. I shall be introducing S-Plus as we go along. If you are more comfortable with other statistical software such as SPSS or SAS, please feel free to use it, but you will be on your own (i.e., no tech support on my part).

## Class Conduct:

Due to attrition (retirements and sabbaticals), the Inf703 team has been reduced to just one (me) this time. The topical coverage and its orientation in the course therefore reflect my own interests. However, I will have the support of guest faculty from the School of Information Science & Policy as well as from the Computer Science and Geography departments, so you will get a broader perspective on the subject matter of the course. I will update (or amend) the schedule as the guest lecture dates become certain.

## Course Evaluation & Grading:

The final course grade will depend on the following components:

## TENTATIVE SCHEDULE
**Approaches to Language:** *Language & cognition as probabilistic phenomena -- Ambiguity of language -- Zipf's laws -- Collocations & Concordances.*

**Probability:** *Elementary Probability theory -- random variables -- joint & conditional distributions -- some standard distributions (Binomial, Poisson, and Normal) -- Bayesian decision theory.*

**Information Theory:** *Entropy -- joint & conditional entropy -- mutual information -- the Noisy Channel Model -- Cross entropy -- Perplexity.*

**Reading Assignments:** **MS** *Ch.1 and 2.*

**Do:** **MS** *Ch.2.1, 2.3, 2.4, 2.6, 2.9, 2.10, 2.12, 2.13, 2.14, 2.15.*
**Linguistics Preliminaries:** *Parts of speech & Morphology -- Phrase Structure Grammar -- Dependency (Arguments and Adjuncts) -- Phrase structure ambiguity -- Semantics & Pragmatics.*

**Corpus Based Work:** *Corpora -- Tokenisation -- Markup Schemes -- Grammatical Tagging.*

**Reading Assignments:** **MS** *Ch.3, 4.*

**Do:** **MS** *Ch.3.1--12, Ch.4.3.*
**Topics:** *Likelihood ratios -- Relative frequency ratios -- Mutual information -- Reliability & Discrimination -- Statistical estimators (maximum likelihood, Laplace's law, Lidstone's law, Jeffreys-Perks law) -- Held-out estimation -- Cross-validation.*

**Reading Assignments:** **MS** *Ch.5, 6.*

**Do:** **MS** *Ch.5.1, 5.4, 5.5, 5.6, 5.7, 5.9, 5.12, 5.13, 5.17.*
**Supervised & Unsupervised Disambiguation:** *Bayesian Classification -- Information theoretic disambiguation -- Dictionary based disambiguation -- Disambiguation based on sense definitions -- Thesaurus based disambiguation -- Unsupervised disambiguation.*

**Lexical Acquisition:** *Evaluation measures (Precision & Recall) -- Verb sub-categorisation -- Attachment ambiguities -- PP attachment -- Selectional preferences -- Semantic similarity -- Vector space measures -- Probabilistic measures -- The role of lexical acquisition in Statistical NLP.*

**Reading Assignments:** **MS** *Ch.7, 8.*
**Clustering:** *Hierarchical clustering (Agglomerative & Divisive clustering, single-link and complete-link) -- Non-hierarchical clustering (K-means, EM algorithms).*

**Information Retrieval:** *Evaluation measures -- the Probability Ranking Principle -- the Vector Space Model -- Term Weighting -- Term Distribution Models (Poisson and Two-Poisson models) -- Inverse Document Frequency -- Latent Semantic Indexing -- Singular Value Decomposition -- Discourse segmentation (Text Tiling).*

**Text Categorisation:** *Decision Trees -- Maximum Entropy Modeling -- Generalised Iterative Scaling -- Perceptrons -- k-nearest neighbour classification.*

**Reading Assignments:** **MS** *Ch.14, 15, 16.*
**Logic Preliminaries:** *Propositional & Predicate logic -- Boolean operators -- Formation rules -- Rules of inference -- Quantification and rules for quantifiers -- Varieties of logic -- Typed Predicate Logic -- Conceptual graphs -- Knowledge Interchange Format (KIF).*

**Logic & Knowledge Representation:** *Conceptual graphs -- Names, Types and Measures.*

**Reading Assignments:** **JS** *Ch.1.*

**Do:** **JS** *Ch.1.1, 1.4, 1.7, 1.8.*
**Ontological Categories:** *Quine's criterion -- CYC categories -- Approaches to categorisation (Aristotle, Kant, Peirce, Husserl, Whitehead, Heidegger).*

**Categories Analysis & Synthesis, etc.:** *Contrasts, Distinctions & Categories -- Lattice of categories -- Describing physical entities -- Defining abstractions -- Sets, Collections, Types, and Categories.*

**Reading Assignments:** **JS** *Ch.2.*

**Do:** **JS** *Ch.2.1, 2.2, 2.3, 2.6, 2.8.*
**Knowledge Engineering:** *Informal specifications -- Formalisation -- Knowledge representation principles -- Ontological commitments -- Representing structure in frames -- Mapping frames to logic -- Frames and Syllogisms -- Multiple inheritance -- Rules and data -- Object-oriented systems -- Natural language semantics -- Levels of representation.*

**Reading Assignments:** **JS** *Ch.3.*

**Do:** **JS** *Ch.3.1, 3.3, 3.6, 3.7.*
Updated March 6, 2000.