Add files via upload #8

Open
wants to merge 1 commit into
base: master
335 changes: 335 additions & 0 deletions 2019-spring/Assignment-02.ipynb
@@ -0,0 +1,335 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Assignment-02, Probability Model, A First Look: An Introduction to Language Models"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Assignment\n",
"\n",
"1. Review the programming code from the online course; \n",
"2. Review the main questions; \n",
"3. Use the Wikipedia corpus to build a language model. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Review the programming code from the online course. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*In this part, you should re-implement the programming task from our online course.*\n",
"\n",
"> \n",
"> \n",
"\n",
"> \n",
"> \n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Review the main points of this lesson. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### 1. How do we use GitHub, and why do we use Jupyter and PyCharm? "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ans: {*Put your answer here*}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### 2. What is a probability model?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ans:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### 3. Can you come up with some scenarios in which we could use a probability model?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ans:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### 4. Why do we use probability, and what are the difficulties of programming based on parsing and pattern matching? \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ans:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### 5. What is a language model?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ans: "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### 6. Can you come up with some scenarios in which we could use a language model?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ans:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### 7. What is the 1-gram language model?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ans:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### 8. What are the advantages and disadvantages of the 1-gram language model?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ans:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### 9. What is the 2-gram model? "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ans:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### 10. What is a web crawler, and can you implement a simple crawler? "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ans:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### 11. Some issues can make programming our crawler difficult. What are they, and how do we solve them?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ans:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### 12. What is a regular expression, and how do we use it?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ans:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Use the Wikipedia dataset to build the language model. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Step 1: You need to download the corpus from Wikipedia:\n",
"> https://dumps.wikimedia.org/zhwiki/20190401/\n",
"\n",
"Step 2: You may need the help of wiki-extractor:\n",
"\n",
"> https://github.com/attardi/wikiextractor\n",
"\n",
"Step 3: Use the techniques and methods you have learned to build the language model. \n",
"> \n",
"\n",
"Step 4: Try some interesting sentence pairs, and check whether your model fits them.\n",
"\n",
"> \n",
"\n",
"Step 5: If we need to solve the following problems, how can a language model help us? \n",
"\n",
"+ Voice recognition.\n",
"+ Sogou *pinyin* input.\n",
"+ Auto-correction in search engines. \n",
"+ Anomaly detection."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Compared with the parsing and pattern-matching approaches learned previously, what are the advantages and disadvantages of probability-based methods? "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ans: "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## (Optional) How to solve the *OOV* problem?\n",
"\n",
"Some words may not appear in our dictionary or corpus. When we use a language model, we need to overcome this `out-of-vocabulary` (OOV) problem. Many researchers have worked on solving it. \n",
"\n",
"-- \n",
"\n",
"The first question is: \n",
"\n",
"**Q1: How did you solve this problem in your programming task?**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ans: "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, the second question is: "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Q2: Read about the 'Good-Turing estimator'. Can you explain the main points of this method? You may also implement it in your programming task.**\n",
"\n",
"Reference: \n",
"+ https://www.wikiwand.com/en/Good%E2%80%93Turing_frequency_estimation\n",
"+ https://github.com/Computing-Intelligence/References/blob/master/NLP/Natural-Language-Processing.pdf, Page-46"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> Write your code here"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}