BA/app
Primary sources
Computer linguistics: CL intro
Genetic Algorithms: An introduction to genetic algorithms
Linguistics: Dissertation partly about interference. Has a nice error classification and taxonomy, and covers borrowing, transfer, etc. Seems like a nice intro to "what exists".
CL/ML resources
Text classification
Natural language classification with Python: book, especially the part on learning to classify text
With machine learning:
- with TensorFlow and neural networks in general: [1]
- Machine Learning, NLP: Text Classification using scikit-learn, python and NLTK
- Working with scikit and text data
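To have something concrete next to the tutorials above, here is a minimal sketch of bag-of-words text classification with a multinomial Naive Bayes model and add-one smoothing. It is written in plain Python rather than scikit-learn or NLTK so that it runs without dependencies; the class name `NaiveBayes`, the labels `de`/`ru`, and the four training sentences are all made up for illustration and are not taken from any of the linked resources.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes over unigram counts (illustrative only)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            # Add-one (Laplace) smoothing over the shared vocabulary.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for w in tokenize(text):
                if w in self.vocab:  # unknown words are ignored
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data: label = assumed native language of the writer.
train_texts = [
    "i have seen him yesterday",
    "we meet us tomorrow",
    "i went to shop",
    "she is good doctor",
]
train_labels = ["de", "de", "ru", "ru"]

model = NaiveBayes().fit(train_texts, train_labels)
prediction_a = model.predict("he is good teacher")
prediction_b = model.predict("we have seen us")
```

With real data the same structure maps onto a vectorizer plus classifier pipeline in a library; the point here is only to show which counts the model actually needs.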
Error Detection
- Error detection using local word bigrams and trigrams + some others
- Automatic error analysis of machine translation output -- more about possible errors and ways to classify them
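A rough sketch of what n-gram based error detection could look like: collect the bigrams and trigrams seen in a reference corpus of correct sentences, then flag every n-gram in a new sentence that was never seen. This is an assumption about the general idea, not the method of the linked paper; the function names and the three-sentence reference corpus are invented for illustration, and a real system would use a large corpus and frequency thresholds instead of a plain seen/unseen check.

```python
def ngrams(tokens, n):
    # All contiguous n-token windows, as tuples.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def pad(sentence):
    # Sentence boundary markers so that errors at the edges are visible too.
    return ["<s>"] + sentence.lower().split() + ["</s>"]


def build_model(sentences, orders=(2, 3)):
    # Map each n-gram order to the set of n-grams seen in the reference corpus.
    seen = {n: set() for n in orders}
    for sentence in sentences:
        tokens = pad(sentence)
        for n in orders:
            seen[n].update(ngrams(tokens, n))
    return seen


def suspicious_ngrams(sentence, seen):
    # Return, per order, the n-grams of the sentence that never occurred.
    tokens = pad(sentence)
    return {
        n: [g for g in ngrams(tokens, n) if g not in known]
        for n, known in seen.items()
    }


# Invented reference corpus of "correct" sentences.
reference = ["i saw him yesterday", "i saw her today", "she saw him today"]
model = build_model(reference)

flags = suspicious_ngrams("i have saw him yesterday", model)
clean = suspicious_ngrams("i saw him yesterday", model)
```

The unseen n-grams cluster around "have saw", which is the kind of local signal a bigram/trigram check can pick up; errors that depend on long-range context would be missed.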
Somewhat similar problems being solved
Cross-cultural Deception Detection. It uses unigrams + LIWC (which is more psychological and less relevant)
- Deception detection -- has examples of extracted features which I might use
- [2] -- lie detector
- Linguistic Cues to Deception Assessed by Computer Programs: A Meta-Analysis -- also ideas of possible features that might be interesting to look into.
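As a note to self on what "extracted features" might mean in practice, here is a small sketch that turns a text into unigram counts plus a few hand-made cue ratios. The word lists `FIRST_PERSON` and `NEGATIONS` are tiny illustrative stand-ins chosen by me; they are not LIWC categories and not the feature sets from the papers above.

```python
import re
from collections import Counter

# Hypothetical cue lists, for illustration only.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
NEGATIONS = {"no", "not", "never", "nothing", "nobody"}


def extract_features(text):
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens) or 1  # avoid division by zero on empty input
    features = {
        "n_tokens": len(tokens),
        "first_person_ratio": sum(t in FIRST_PERSON for t in tokens) / n,
        "negation_ratio": sum(t in NEGATIONS for t in tokens) / n,
        "avg_word_len": sum(map(len, tokens)) / n,
    }
    # Unigram counts, prefixed so they cannot clash with the cue features.
    for word, count in Counter(tokens).items():
        features["uni=" + word] = count
    return features


example = extract_features("I did not take my keys")
```

A feature dictionary like this can be fed to most classifiers after vectorizing; which cues are actually useful would have to come from the literature and from the data.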
Linguistics
Typical errors
Russian
- Similar-sounding and semantically non-identical words + idioms
- Grammar: articles, connecting verbs, future tenses, negative sentences, commas, etc. -- really nice.
German
- list of sentences
- also examples, hard to generalize
- examples, a bit better ones?
- German reflexive verbs list which could be used to see differences between English and German reflexive verbs.
Indian
- Wikipedia - Indian English -- I think this could be done just statistically?
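One way the "just statistically" idea could work: compare smoothed relative word frequencies between a target corpus and a reference corpus and rank words by log-ratio, so that words typical of the target variety float to the top. This is only a sketch under that assumption; the function names and the two tiny corpora are invented, and real corpora would be needed to get meaningful markers.

```python
import math
from collections import Counter


def word_freqs(sentences):
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.lower().split())
    return counts


def log_ratio(target, reference, smoothing=1.0):
    # Smoothed log relative-frequency ratio for every word in either corpus.
    vocab = set(target) | set(reference)
    t_total = sum(target.values()) + smoothing * len(vocab)
    r_total = sum(reference.values()) + smoothing * len(vocab)
    return {
        w: math.log((target[w] + smoothing) / t_total)
        - math.log((reference[w] + smoothing) / r_total)
        for w in vocab
    }


def top_markers(target_sents, reference_sents, k=3):
    scores = log_ratio(word_freqs(target_sents), word_freqs(reference_sents))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Invented toy corpora; only the word "prepone" vs. "move" differs.
target_corpus = ["please prepone the meeting", "we will prepone the call"]
reference_corpus = ["please move the meeting", "we will move the call"]

markers = top_markers(target_corpus, reference_corpus, k=1)
```

This only captures lexical differences; grammatical patterns would need n-grams or part-of-speech sequences on top.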
Italian
???
Random
Natural Language Annotation for Machine Learning ebook, seems to cover quite a lot
downloads and demos -- datasets for CL lying detection -- generally interesting
Classification-as-a-service with free examples (gender, MBTI, etc.), pretty nice