
CS886 | Natural Language Computing Spring 2007

Organizational Meeting:
Tuesday 1 May 2007 3:00-4:00
DC2306C (AI Lab Conference Room)


All sessions will be held in the AI Lab conference room (DC2306C).
Copies of each text will be available for short-term loan at the DC Library Circulation Desk.


Course Overview

Natural language systems have evolved tremendously in the past few years, from handling only small handcrafted examples to supporting extremely large, real-world applications.
This course is intended for any Computer Science graduate student interested in gaining an in-depth understanding of the main theoretical paradigms (symbolic and statistical) and applications being worked on by Natural Language researchers and system developers.

Participants will be expected to read widely and in depth. There are no formal requirements other than an interest in the topic and the ability to read and analyze technical material. Some knowledge of linguistics or a second language would be helpful but is not necessary.

Grading will be based on class participation (40% for presentations of the readings and leading discussions; 20% for participating in discussions) and a term paper on a topic of your choice (40%). Auditors are welcome but will be expected to lead at least one discussion and to participate regularly in discussions.

If you are interested in this course and would like to do some background preparation on your own, the following textbook is recommended:

Daniel Jurafsky and James H. Martin
Speech and language processing: An introduction to natural language processing,
computational linguistics and speech recognition

Prentice Hall, 2000

DC Library Short-term loan call number: UWD 1488

 


Course Outline


SESSIONS 1 and 2: Psycholinguistics (Group presentations)

Session 1

Thursday 3 May 10:00-12:00 DC2306C

Readings:

Steven Pinker,
The language instinct: How the mind creates language,
Perennial Classics, 2000.

DC Library Short-term loan call number: UWD XXXX


Chapter 3 Mentalese
Chapter 4 How Language Works


Session 2

Thursday 10 May 2:30-4:00 DC2306C

Readings:

Pinker (continued)
Chapter 10 Language Organs and Grammar Genes
Chapter 11 The Big Bang
Chapter 13 Mind Design


SESSIONS 3 and 4: Ontologies for Natural Language Computing

Session 3: Standard Linguistic Ontologies

Thursday 17 May 2:30-4:00 DC2306C

Readings:

Christiane Fellbaum (editor),
WordNet: An electronic lexical database,
The MIT Press, 1998.

DC Library Short-term loan call number: UWD 1408


Chapter 1 Nouns in WordNet

Selected Papers from FrameNet Project:
http://framenet.icsi.berkeley.edu/

Collin F. Baker, Charles J. Fillmore, and John B. Lowe,
"The Berkeley FrameNet project",
Proceedings of COLING-ACL, Montreal, Canada, 1998.


Charles J. Fillmore, Charles Wooters, and Collin F. Baker,
"Building a large lexical databank which provides deep semantics",
Proceedings of the Pacific Asian Conference on Language, Information and Computation, Hong Kong, 2001.



Session 4: Automated Ontology Learning

Thursday 24 May 2:30-4:00 DC2306C

Readings:

M. Shamsfard and A.A. Barforoush,
"The state of the art in ontology learning: A framework for comparison",
The Knowledge Engineering Review, 18(4):293-316, 2003.

M. Shamsfard and A.A. Barforoush,
"Ontology learning from natural language texts",
International Journal of Human-Computer Studies, 60(1):17-63, 2004.


SESSIONS 5 to 7: Statistical Natural Language Processing Basics

Session 5

Thursday 31 May 2:30-4:00

Readings:

Christopher Manning and Hinrich Schütze,
Foundations of Statistical Natural Language Processing,
The MIT Press, 1999.

DC Library Short-term loan call number: UWD 1413


Chapter 6 (6.1 6.2) N-gram models
Chapter 10 (10.1 to p.353, 10.3) Part-of-speech tagging

Eric Brill,
"A simple rule-based part of speech tagger",
DARPA Workshop, 1992.



Sessions 6 and 7 ***DATE CHANGES

Thursday 14 June 2:30-4:00
Monday 18 June 2:30-4:00

Readings:

Manning and Schütze (continued)
Chapter 12.1 Probabilistic parsing

Michael Collins,
"A new statistical parser based on bigram lexical dependencies",
Proceedings of the 34th Annual Meeting of the ACL, Santa Cruz, 1996.


***DATE CHANGE: Tuesday June 26 2:30-4:30
Daniel Gildea and Daniel Jurafsky,
"Automatic labeling of semantic roles",
Computational Linguistics, 28(3):245-288, 2002.


NO Additional Paper (class choice): Hidden Markov Models, Statistical Chart Parsing, Treebank Grammars



SESSIONS 8 and 9: Applications of Statistical NLP

Text Classification and Text Summarization


Session 8

***DATE CHANGE: Tuesday June 26 2:30-4:30

Readings:

Manning and Schütze (continued)
Chapter 16 (first few sections)


Session 9

Thursday 28 June 2:30-4:00

Readings:

Inderjeet Mani,
Automatic summarization,
John Benjamins, 2001. (Reference only)

DC Library Short-term loan call number: UWD XXXX

Eduard Hovy,
"Text summarization",
The Oxford handbook of computational linguistics,
Oxford University Press, 2005.


Selected Papers from 2006 Document Understanding Conference:
http://duc.nist.gov/

K.C. Litkowski,
"CL Research Summarization in DUC 2006: An Easier Task, An Easier Method?".



SESSIONS 10 to 13: Current Topics and Applications

Sessions 10 and 11: Lexical Semantics

Thursday 5 July 2:30-4:00
Thursday 12 July 2:30-4:00

Reference:

Patrick Saint-Dizier and Evelyn Viegas (editors),
Computational lexical semantics,
Cambridge University Press, 2005.

DC Library Short-term loan call number: UWD XXXX


Readings:

Thomas Landauer, P.W. Foltz, and D. Laham,
"Introduction to Latent Semantic Analysis",
Discourse Processes, 25, 259-284, 1998.

Patrick Pantel and Dekang Lin,
"Discovering word senses from text",
Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2002.


Peter D. Turney,
"Word sense disambiguation by Web mining for word co-occurrence probabilities",
Proceedings of the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text (SENSEVAL-3), Barcelona, Spain, 2004.


Peter D. Turney,
"Similarity of semantic relations",
Computational Linguistics, 32(3), 379-416, 2006.

P.D. Turney and M.L. Littman,
"Corpus-based learning of analogies and semantic relations",
Machine Learning, 60(1-3), 251-278, 2005.


Sessions 12 and 13: The Semantic Web

Thursday 19 July 2:30-4:00
Thursday 26 July 2:30-4:00

References:

Tim Berners-Lee, James Hendler and Ora Lassila,
"The Semantic Web",
Scientific American, May 2001.

Isabel Cruz et al. (editors),
The Semantic Web - ISWC 2006: 5th International Semantic Web Conference, ISWC 2006, Athens, GA, USA, November 5-9, 2006,
Springer, 2006.

DC Library Short-term loan call number: UWD XXXX


Tim Berners-Lee, Dieter Fensel, James A. Hendler, and Henry Lieberman,
Spinning the Semantic Web: Bringing the World Wide Web to its full potential,
The MIT Press, 2005.

DC Library Short-term loan call number: UWD XXXX