Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update 10_nlp.ipynbto say 'spaces' #613

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion 04_mnist_basics.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -2239,7 +2239,7 @@
"\n",
"Next in `mnist_distance` we see `abs`. You might be able to guess now what this does when applied to a tensor. It applies the method to each individual element in the tensor, and returns a tensor of the results (that is, it applies the method \"elementwise\"). So in this case, we'll get back 1,010 matrices of absolute values.\n",
"\n",
"Finally, our function calls `mean((-1,-2))`. The tuple `(-1,-2)` represents a range of axes. In Python, `-1` refers to the last element, and `-2` refers to the second-to-last. So in this case, this tells PyTorch that we want to take the mean ranging over the values indexed by the last two axes of the tensor. The last two axes are the horizontal and vertical dimensions of an image. After taking the mean over the last two axes, we are left with just the first tensor axis, which indexes over our images, which is why our final size was `(1010)`. In other words, for every image, we averaged the intensity of all the pixels in that image.\n",
"Finally, our function calls `mean((-1,-2))`. The tuple `(-1,-2)` represents a list of axes. In Python, `-1` refers to the last element, and `-2` refers to the second-to-last. So in this case, this tells PyTorch that we want to take the mean ranging over the values indexed by the last two axes of the tensor. The last two axes are the horizontal and vertical dimensions of an image. After taking the mean over the last two axes, we are left with just the first tensor axis, which indexes over our images, which is why our final size was `(1010)`. In other words, for every image, we averaged the intensity of all the pixels in that image.\n",
"\n",
"We'll be learning lots more about broadcasting throughout this book, especially in <<chapter_foundations>>, and will be practicing it regularly too.\n",
"\n",
Expand Down
2 changes: 1 addition & 1 deletion 10_nlp.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -132,7 +132,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"When we said \"convert the text into a list of words,\" we left out a lot of details. For instance, what do we do with punctuation? How do we deal with a word like \"don't\"? Is it one word, or two? What about long medical or chemical words? Should they be split into their separate pieces of meaning? How about hyphenated words? What about languages like German and Polish where we can create really long words from many, many pieces? What about languages like Japanese and Chinese that don't use bases at all, and don't really have a well-defined idea of *word*?\n",
"When we said \"convert the text into a list of words,\" we left out a lot of details. For instance, what do we do with punctuation? How do we deal with a word like \"don't\"? Is it one word, or two? What about long medical or chemical words? Should they be split into their separate pieces of meaning? How about hyphenated words? What about languages like German and Polish where we can create really long words from many, many pieces? What about languages like Japanese and Chinese that don't use spaces at all, and don't really have a well-defined idea of *word*?\n",
"\n",
"Because there is no one correct answer to these questions, there is no one approach to tokenization. There are three main approaches:\n",
"\n",
Expand Down