My Projects

Named-Entity Recognition with Structured Perceptrons

NER as sequence tagging: NER can be formulated as a sequence tagging problem using BIO encoding, where the beginning word of a named entity is marked with ‘B’, the following words of a named entity are marked with ‘I’ (i.e., inside), and words that are not part of any entity are marked with ‘O’ (i.e., outside). These labels are further combined with the types of named entities (e.g., ORG, PER, LOC), as in the example below.
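
For concreteness, the short Python sketch below shows a hypothetical sentence with BIO-encoded tags of the kind described above; the tokens and entity types are illustrative, not drawn from the actual project data.

```python
# Hypothetical example of BIO-encoded NER labels (illustrative only).
# "B-X" marks the first word of an entity of type X, "I-X" continues it,
# and "O" marks words outside any entity.
sentence = ["John", "Smith", "works", "for", "Acme", "Corp", "in", "Paris", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG", "O", "B-LOC", "O"]

for token, tag in zip(sentence, tags):
    print(f"{token}\t{tag}")
```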

Part-of-Speech Tagging (Hidden Markov Models)

In this project, I design and implement Hidden Markov Models (HMMs) for part-of-speech (POS) tagging. Given a natural language sentence s = x_1 x_2 ... x_n, the task is to find the most likely sequence of POS tags ŷ = y_1 y_2 ... y_n.
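
To make the decoding step concrete, here is a minimal Viterbi sketch in Python, assuming start, transition, and emission probabilities have already been estimated (and smoothed) from tagged training data; the dictionary-based interface (`start`, `trans`, `emit`) is a hypothetical simplification, not the project's actual code.

```python
import math

def viterbi(words, tags, start, trans, emit):
    """Most likely tag sequence for `words` under a first-order HMM.
    `start[t]`, `trans[(t_prev, t)]`, and `emit[(t, w)]` are assumed to be
    probabilities estimated elsewhere; missing entries get a tiny floor."""
    floor = 1e-12
    # Log-scores and backpointers for the first word.
    V = [{t: (math.log(start.get(t, floor)) + math.log(emit.get((t, words[0]), floor)), None)
          for t in tags}]
    for w in words[1:]:
        col = {}
        for t in tags:
            prev = max(tags, key=lambda p: V[-1][p][0] + math.log(trans.get((p, t), floor)))
            score = (V[-1][prev][0]
                     + math.log(trans.get((prev, t), floor))
                     + math.log(emit.get((t, w), floor)))
            col[t] = (score, prev)
        V.append(col)
    # Backtrace from the best final tag.
    best = max(tags, key=lambda t: V[-1][t][0])
    path = [best]
    for col in reversed(V[1:]):
        path.append(col[path[-1]][1])
    return list(reversed(path))
```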

Sentiment Classification with Feed-Forward Neural Network Classifier

As training and development data, I use a set of random English tweets that have been categorized into two classes: positive and negative sentiment. As test data, I use a set of tweets that come from either an iPhone or an Android phone, which the classifier labels automatically as positive or negative.
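
A minimal sketch of a feed-forward classifier for this kind of binary task, written in PyTorch over bag-of-words features; the vocabulary size, hidden width, and training-step helper are hypothetical placeholders rather than the actual project configuration.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions; the real model depends on the feature extractor used.
VOCAB_SIZE, HIDDEN = 5000, 64

model = nn.Sequential(
    nn.Linear(VOCAB_SIZE, HIDDEN),  # bag-of-words features -> hidden layer
    nn.ReLU(),
    nn.Linear(HIDDEN, 1),           # single logit: positive vs. negative
)
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(x, y):
    """One gradient step on a batch of bag-of-words vectors `x`
    (float tensor, shape [batch, VOCAB_SIZE]) and binary labels `y`
    (float tensor, shape [batch, 1])."""
    optimizer.zero_grad()
    logits = model(x)
    loss = loss_fn(logits, y)
    loss.backward()
    optimizer.step()
    return loss.item()
```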

Analogy using Distributed Semantic Vectors

In this project, I consider how to evaluate the quality of distributed semantic vectors, or embeddings, through a concrete semantic task: word analogy.
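
The usual analogy evaluation (e.g., man : king :: woman : ?) reduces to vector arithmetic plus a nearest-neighbour search under cosine similarity; the tiny embedding table in the sketch below is made up purely to illustrate the computation.

```python
import numpy as np

# Hypothetical, made-up embeddings; real evaluations use pretrained vectors
# (e.g., word2vec or GloVe) with hundreds of dimensions.
emb = {
    "king":  np.array([0.8, 0.6, 0.1]),
    "queen": np.array([0.8, 0.1, 0.6]),
    "man":   np.array([0.3, 0.9, 0.1]),
    "woman": np.array([0.3, 0.1, 0.9]),
}

def analogy(a, b, c, emb):
    """Solve a : b :: c : ? by returning the word whose vector is most
    cosine-similar to (b - a + c), excluding the query words themselves."""
    target = emb[b] - emb[a] + emb[c]
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    candidates = {w: v for w, v in emb.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cos(candidates[w], target))

print(analogy("man", "king", "woman", emb))  # expected: "queen" on real vectors
```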

Language Models, Smoothing, Authorship Attribution

Here I implement n-gram language models (LMs), specifically unigram, bigram, and trigram models, examine how different design choices, including smoothing techniques, influence the quality of the trained LMs, and explore authorship attribution as one of the cool applications of LMs.
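
As a rough sketch of how such a model supports authorship attribution, the Python below trains add-k smoothed bigram models on two toy corpora and attributes a test text to the author whose model assigns it the higher log-probability; the corpora and the smoothing constant are hypothetical.

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Collect unigram and bigram counts from a list of tokens."""
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))

def log_prob(tokens, unigrams, bigrams, vocab_size, k=0.1):
    """Add-k smoothed bigram log-probability of `tokens`:
    P(w_i | w_{i-1}) = (count(w_{i-1}, w_i) + k) / (count(w_{i-1}) + k * V)."""
    lp = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        num = bigrams[(prev, word)] + k
        den = unigrams[prev] + k * vocab_size
        lp += math.log(num / den)
    return lp

# Hypothetical toy corpora standing in for two candidate authors.
author_a = "the cat sat on the mat the cat slept".split()
author_b = "ships sailed the stormy sea at night".split()
vocab = set(author_a) | set(author_b)

test = "the cat sat".split()
scores = {}
for name, corpus in [("A", author_a), ("B", author_b)]:
    uni, bi = train_bigram(corpus)
    scores[name] = log_prob(test, uni, bi, vocab_size=len(vocab))

# Attribute the test text to the author whose model scores it higher.
print(max(scores, key=scores.get))
```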