Our Research

Our research focuses on the following problems:

  • Language identification (LID) for multilingual documents:
    • Creating accurate LID at the word level using a combination of rules, n-grams, and machine learning
  • Quantifying and visualizing code-switching:
    • Using multiple measures to model patterns of code-switching within and across corpora
  • Part-of-speech (POS) tagging for multilingual documents:
    • Improving POS tagging using information from patterns of code-switching
  • Exploring linguistic constraints on code-switching
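
To make "quantifying code-switching" concrete, here is a minimal sketch of two simple measures computed from word-level language tags. It is an illustration only and not necessarily one of the measures used in this research. The function names (switch_rate, language_shares) and the example tag sequence are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def switch_rate(lang_tags: Sequence[str]) -> float:
    """Fraction of adjacent token pairs whose language tags differ.

    Illustrative measure only; assumes one language tag per token.
    """
    if len(lang_tags) < 2:
        return 0.0  # no adjacent pairs, so no switch points
    switches = sum(a != b for a, b in zip(lang_tags, lang_tags[1:]))
    return switches / (len(lang_tags) - 1)


def language_shares(lang_tags: Sequence[str]) -> Dict[str, float]:
    """Proportion of tokens assigned to each language."""
    total = len(lang_tags)
    if total == 0:
        return {}
    return {lang: n / total for lang, n in Counter(lang_tags).items()}


# Hypothetical word-level LID output for a five-token utterance.
tags = ["en", "en", "es", "es", "en"]
print(switch_rate(tags))      # 2 switches over 4 adjacent pairs
print(language_shares(tags))  # share of tokens per language
```

The switch rate captures how often the language changes, while the language shares capture how balanced the languages are. Two corpora can have the same shares but very different switch rates, which is one reason to use more than one measure.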