BA/app
Revision as of 14:16, 26 November 2017
Primary sources
Computational linguistics: CL intro
Genetic Algorithms: An introduction to genetic algorithms
Linguistics: Dissertation partly about interferences (https://is.muni.cz/th/180075/ff_b/Thesis_2nd_draft.txt). Has a nice error classification and taxonomy, and covers borrowing, transfer, etc. Seems like a good intro to what already exists.
CL/ML resources
Text classification
Natural language classification with Python: Book, especially the part on learning to classify text
With machine learning:
- with TensorFlow and neural networks in general: [1]
- Machine Learning, NLP: Text Classification using scikit-learn, python and NLTK
- Working with scikit-learn and text data
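Several of the resources above start from the same baseline: represent each text by its word counts and fit a probabilistic classifier. The scikit-learn text tutorial, for instance, uses multinomial naive Bayes on word counts. Below is a minimal, dependency-free sketch of that idea in plain Python. It deliberately does not use the NLTK or scikit-learn API. The labels `de`/`ru` and the four training sentences are made-up toy data standing in for texts written by German and Russian native speakers.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive lowercase whitespace tokenization; a real setup would use a proper tokenizer.
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes over unigram counts with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per label, used for the prior
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def log_score(self, text, label):
        # log P(label) + sum over tokens of log P(token | label), add-one smoothed
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for tok in tokenize(text):
            score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, text):
        # Pick the label with the highest log score
        return max(self.class_counts, key=lambda c: self.log_score(text, c))


# Toy, made-up training data: "de" = German L1, "ru" = Russian L1.
texts = [
    "I must concentrate myself on the work",
    "I become a coffee every morning",
    "I went to shop and bought book",
    "She is very good specialist",
]
labels = ["de", "de", "ru", "ru"]

clf = NaiveBayes().fit(texts, labels)
print(clf.predict("I concentrate myself on my work"))  # de
print(clf.predict("She bought good book"))  # ru
```

In scikit-learn the same pipeline would be `CountVectorizer` followed by `MultinomialNB`. The hand-written version only makes the counting and smoothing explicit.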
Error detection
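Error detection using local word bigrams and trigrams (http://www.aclweb.org/anthology/O13-1022) + some others
Automatic error analysis of machine translation output (http://www.mitpressjournals.org/doi/pdf/10.1162/COLI_a_00072) -- more about possible errors and ways to classify them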
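The basic intuition behind local n-gram error detection can be sketched like this: a word sequence that never (or rarely) occurs in a corpus of correct text is a candidate error. This is a simplified sketch using bigrams and raw counts only. It is not the method of the cited paper, and the three-sentence reference corpus is made up. In practice the reference would be a large corpus of native English.

```python
from collections import Counter


def bigrams(tokens):
    # Adjacent word pairs: ["a", "b", "c"] -> [("a", "b"), ("b", "c")]
    return list(zip(tokens, tokens[1:]))


def build_bigram_counts(reference_sentences):
    # Count bigrams in a reference corpus assumed to contain correct English.
    counts = Counter()
    for sentence in reference_sentences:
        counts.update(bigrams(sentence.lower().split()))
    return counts


def flag_rare_bigrams(sentence, counts, min_count=1):
    # A bigram seen fewer than min_count times in the reference is a candidate error.
    return [bg for bg in bigrams(sentence.lower().split()) if counts[bg] < min_count]


# Tiny made-up reference corpus.
reference = [
    "I have to concentrate on my work",
    "I get a coffee every morning",
    "I went to the shop",
]
counts = build_bigram_counts(reference)
print(flag_rare_bigrams("I have to concentrate myself on my work", counts))
# [('concentrate', 'myself'), ('myself', 'on')]
```

Trigrams work the same way with `zip(tokens, tokens[1:], tokens[2:])`. With a realistic corpus, `min_count` (or a smoothed probability threshold) would need tuning, because many correct bigrams are simply rare.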
Somewhat similar problems being solved
Cross-cultural Deception Detection. Uses unigrams + LIWC (Linguistic Inquiry and Word Count), which is more psychological and less relevant here.
- Deception detection -- has examples of extracted features which I might use
- http://web.eecs.umich.edu/~mihalcea/papers/mihalcea.acl09.pdf -- lie detector
- Linguistic Cues to Deception Assessed by Computer Programs: A Meta-Analysis (http://delivery.acm.org/10.1145/2390000/2388617/p1-hauch.pdf) -- also has ideas for possible features worth looking into.
Linguistics
Typical errors
Russian
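- Similar-sounding and semantically non-identical words + idioms (http://www.simonf.com/lang/mistakes_russian_win.html)
- Grammar (http://www.study.ru/support/lib/note281.html): articles, connecting verbs, future tenses, negative sentences, commas, etc. -- really nice.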
German
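- List of sentences (https://www.englishwithnick.de/resources-for-germans/typical-grammar-mistakes-made-by-germans/)
- Also examples, hard to generalize (http://londonschool.de/top-english-mistakes-made-german-learners-volume-1/)
- Examples, maybe somewhat better ones? (https://englishwithkirsty.com/2014/07/15/10-typical-mistakes-made-by-german-speakers-who-are-learning-english/)
- German reflexive verbs list (http://www.jabbalab.com/blog/966/how-and-when-to-use-german-reflexive-verbs), which could be used to compare English and German reflexive verbs.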
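One way to use a reflexive-verb list is as a transfer-error detector. It flags an English verb that is reflexive in German but normally not in English when that verb is directly followed by a reflexive pronoun (e.g. "concentrate myself" from *sich konzentrieren*). This is a minimal sketch. The verb set is a small hand-picked, hypothetical sample, not extracted from the linked list, and only uninflected verb forms are matched because there is no lemmatization.

```python
REFLEXIVE_PRONOUNS = {"myself", "yourself", "himself", "herself", "itself",
                      "ourselves", "yourselves", "themselves"}

# Hypothetical hand-picked sample: reflexive in German (sich konzentrieren,
# sich entspannen, sich erinnern) but normally not reflexive in English.
GERMAN_REFLEXIVE_ONLY = {"concentrate", "relax", "remember"}


def flag_reflexive_transfer(sentence):
    # Strip simple punctuation and lowercase; no lemmatization, so "concentrated" is missed.
    tokens = [t.strip(".,!?;:").lower() for t in sentence.split()]
    hits = []
    for verb, nxt in zip(tokens, tokens[1:]):
        # Flag a listed verb immediately followed by a reflexive pronoun
        if verb in GERMAN_REFLEXIVE_ONLY and nxt in REFLEXIVE_PRONOUNS:
            hits.append((verb, nxt))
    return hits


print(flag_reflexive_transfer("I must concentrate myself on the exam."))
# [('concentrate', 'myself')]
print(flag_reflexive_transfer("She hurt herself."))
# []
```

Counting such hits per text would give one feature for a native-language classifier. The heuristic will also produce false positives where English does allow the reflexive.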
Indian
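- Wikipedia: Indian English, morphology and syntax section (https://en.wikipedia.org/wiki/Indian_English#Morphology_and_syntax). I think this could be done purely statistically?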
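One reading of "purely statistically" is to count how often a candidate marker phrase occurs per 1000 tokens in Indian English text versus a reference corpus, then compare the rates. This is a minimal sketch. The marker phrase and both two-sentence corpora are hypothetical placeholders. With real data, a rate difference would also need a significance test before it counts as a feature.

```python
def rate_per_1000(texts, marker):
    # Occurrences of a (possibly multi-word) marker per 1000 whitespace tokens.
    marker_tokens = marker.lower().split()
    n = len(marker_tokens)
    hits = 0
    total = 0
    for text in texts:
        tokens = text.lower().split()
        total += len(tokens)
        hits += sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == marker_tokens)
    return 1000 * hits / total if total else 0.0


def compare_markers(corpus_a, corpus_b, markers):
    # marker -> (rate in corpus_a, rate in corpus_b)
    return {m: (rate_per_1000(corpus_a, m), rate_per_1000(corpus_b, m)) for m in markers}


# Hypothetical toy corpora.
corpus_a = ["we will finish it today itself", "i was there only"]
corpus_b = ["we will finish it today", "i was only there"]
print(compare_markers(corpus_a, corpus_b, ["today itself"]))
# {'today itself': (100.0, 0.0)}
```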
Italian
???
Random
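Natural Language Annotation for Machine Learning ebook (https://www.safaribooksonline.com/library/view/natural-language-annotation/9781449332693/), seems to cover quite a lot
Downloads and demos (http://lit.eecs.umich.edu/downloads.html#Cross-Cultural%20Deception) -- datasets for CL deception detection -- generally interesting
Classification-as-a-service with free examples (https://www.uclassify.com/browse/uclassify/): gender, MBTI, etc. Pretty nice.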