Difference between revisions of "BA/app"
Revision as of 14:17, 26 November 2017
Primary sources
Computational linguistics: CL intro
Genetic Algorithms: An introduction to genetic algorithms
Linguistics: Dissertation partly about interference. Has a nice error classification and error taxonomy (borrowing, transfer, etc.). Seems like a nice intro to "what exists".
CL/ML resources
Text classification
Natural language classification with Python: Book, especially learning to classify text
With machine learning:
- with TensorFlow and neural networks in general: [1]
- Machine Learning, NLP: Text Classification using scikit-learn, python and NLTK
- Working with scikit and text data
Error Detection
- error detection using local word bigram and trigram + some others
- Automatic error analysis of machine translation output -- more about possible errors and ways to classify them
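A minimal sketch of the local bigram/trigram idea, not the method of the linked paper: count n-grams in a reference corpus of correct text, then flag any n-gram in a new sentence that was never (or rarely) seen. The function names, the `min_count` threshold and the tiny reference corpus are all assumptions made up for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def train_counts(sentences, orders=(2, 3)):
    """Count bigrams and trigrams over a reference corpus of correct text."""
    counts = {n: Counter() for n in orders}
    for sentence in sentences:
        # Sentence boundary markers let the model see start/end context.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for n in orders:
            counts[n].update(ngrams(tokens, n))
    return counts

def suspicious_ngrams(sentence, counts, min_count=1):
    """Return the n-grams of a sentence seen fewer than min_count times."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    flagged = []
    for n, table in counts.items():
        for gram in ngrams(tokens, n):
            if table[gram] < min_count:
                flagged.append(gram)
    return flagged

# Toy reference corpus (hypothetical); a real one would be far larger.
reference = [
    "i am going to the shop",
    "she is going to the park",
    "i am a student",
    "she is a teacher",
]
counts = train_counts(reference)

# A missing article shows up as unseen n-grams around "to shop".
flags = suspicious_ngrams("i am going to shop", counts)
print(flags)
```

With a realistic corpus the raw "unseen" test would be too noisy, so a smoothed probability or a frequency threshold would probably be needed instead.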
Somewhat similar problems being solved
Cross-cultural Deception Detection. It uses unigrams + LIWC (which is more psychological and less relevant)
- Deception detection -- has examples of extracted features which I might use
- [2] -- lie detector
- Linguistic Cues to Deception Assessed by Computer Programs: A Meta-Analysis -- also ideas of possible features that might be interesting to look into.
Think about sentiment detection etc.
Most of these problems are solved with Bag of Words, which I think won't be enough for me.
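A small sketch of why Bag of Words alone may fall short here: it discards word order, so a sentence with transferred (e.g. German-like) word order gets exactly the same features as the native version. Bigram counts, as one possible alternative, do tell them apart. The two example sentences and the function names are assumptions for illustration.

```python
from collections import Counter

def bag_of_words(text):
    """Unordered word counts: the classic Bag of Words representation."""
    return Counter(text.lower().split())

def bag_of_bigrams(text):
    """Counts of adjacent word pairs, which keep some local word order."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))

native = "i have seen him yesterday"
transfer = "i have yesterday him seen"  # hypothetical word-order transfer

# Same words, so the Bag of Words features are identical.
same_bow = bag_of_words(native) == bag_of_words(transfer)

# Different adjacent pairs, so the bigram features differ.
same_bigrams = bag_of_bigrams(native) == bag_of_bigrams(transfer)

print(same_bow, same_bigrams)
```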
Linguistics
Typical errors
Russian
- Similar-sounding and semantically non-identical words + idioms
- Grammar. Articles, connecting verbs, future tense, negative sentences, commas, etc. -- really nice.
German
- list of sentences
- also examples, hard to generalize
- examples, a bit better ones?
- German reflexive verbs list which could be used to see differences between English and German reflexive verbs.
Indian
- Wikipedia - Indian English -- I think this could be done just statistically?
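One way "just statistically" could look, as a rough sketch: compare smoothed relative word frequencies between two corpora and rank words by log-ratio, so that words typical of one variety float to the top. The two toy corpora below are made-up sentences, and the add-one smoothing and function names are assumptions, not a tested method.

```python
import math
from collections import Counter

def word_counts(texts):
    """Return word counts and the total token count for a list of texts."""
    counts = Counter(w for t in texts for w in t.lower().split())
    return counts, sum(counts.values())

def log_ratio(word, counts_a, total_a, counts_b, total_b, vocab_size):
    """Log of the add-one-smoothed relative frequency ratio (A over B)."""
    p_a = (counts_a[word] + 1) / (total_a + vocab_size)
    p_b = (counts_b[word] + 1) / (total_b + vocab_size)
    return math.log(p_a / p_b)

# Toy corpora (hypothetical); real ones would need thousands of sentences.
variety_a = ["please do the needful and revert", "we will prepone the meeting"]
variety_b = ["please take care of it and reply", "we will move the meeting forward"]

counts_a, total_a = word_counts(variety_a)
counts_b, total_b = word_counts(variety_b)
vocab = set(counts_a) | set(counts_b)

# Positive score: more typical of variety A; negative: more typical of B.
scores = {
    w: log_ratio(w, counts_a, total_a, counts_b, total_b, len(vocab))
    for w in vocab
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[:5])
```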
Italian
???
Random
Natural Language Annotation for Machine Learning ebook, seems to cover quite a lot
downloads and demos -- datasets for CL lying detection -- generally interesting
Classification-as-a-service with free examples. Gender, MBTI, etc etc etc, pretty nice