diff --git a/Gemfile b/Gemfile new file mode 100644 index 0000000..b1abdf6 --- /dev/null +++ b/Gemfile @@ -0,0 +1 @@ +gem "github-pages", group: :jekyll_plugins diff --git a/README.md b/README.md new file mode 100644 index 0000000..284b79f --- /dev/null +++ b/README.md @@ -0,0 +1,55 @@ +# Lunch Time Python + +[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) +[![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/ssciwr/lunch-time-python/ci.yml?branch=main)](https://github.com/ssciwr/lunch-time-python/actions/workflows/ci.yml) +[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ssciwr/lunch-time-python/main) +[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ssciwr/lunch-time-python/blob/main) + +Welcome to Lunch Time Python - an event series organized by the [Scientific Software Center](https://ssc.iwr.uni-heidelberg.de) at Heidelberg University. + +## What it is + +Python is a very popular - maybe even *the* most popular - programming language among scientific software developers. One of the reasons for this success story is the rich standard library and the rich ecosystem of available (scientific) libraries. To fully leverage this ecosystem, developers need to stay up to date and explore new libraries. *Lunch Time Python* aims at providing a communication platform between Pythonistas to learn about new libraries in an informal setting. Sessions take roughly 30 minutes, one library is presented per session and the code will be made available afterwards. Come by, enjoy your lunch with us and step up your Python game! 
+ +## Sessions + +Next installment of Lunch Time Python: tba + +Here is a list of past sessions that you can reproduce in a cloud environment by clicking the [Binder](https://mybinder.org) or [Colab](https://colab.research.google.com/) (requires a Google account) links below: + +* Dash: [slides](https://ssciwr.github.io/lunch-time-python/lunchtime12/lunchtime12.slides.html), [binder](https://mybinder.org/v2/gh/ssciwr/lunch-time-python.git/HEAD?labpath=lunchtime12%2Flunchtime12.ipynb), [colab](https://colab.research.google.com/github/ssciwr/lunch-time-python/blob/main/lunchtime12/lunchtime12.ipynb), [notebook](https://ssciwr.github.io/lunch-time-python/lunchtime12/lunchtime12.ipynb) (Session 12, May 25th 2022, 12:00pm) + +* spaCy: [slides](https://ssciwr.github.io/lunch-time-python/lunchtime11/lunchtime11.slides.html), [binder](https://mybinder.org/v2/gh/ssciwr/lunch-time-python.git/HEAD?labpath=lunchtime11%2Flunchtime11.ipynb), [colab](https://colab.research.google.com/github/ssciwr/lunch-time-python/blob/main/lunchtime11/lunchtime11.ipynb), [notebook](https://ssciwr.github.io/lunch-time-python/lunchtime11/lunchtime11.ipynb) (Session 11, November 25th 2022, 11:30am) + +* PyTorch: [slides](https://ssciwr.github.io/lunch-time-python/lunchtime10/lunchtime10.slides.html), [binder](https://mybinder.org/v2/gh/ssciwr/lunch-time-python.git/HEAD?labpath=lunchtime10%2Flunchtime10.ipynb), [colab](https://colab.research.google.com/github/ssciwr/lunch-time-python/blob/main/lunchtime10/lunchtime10.ipynb), [notebook](https://ssciwr.github.io/lunch-time-python/lunchtime10/lunchtime10.ipynb) (Session 10, October 28th 2022, 12pm) + +* mypy: [slides](https://ssciwr.github.io/lunch-time-python/lunchtime9/lunchtime9.slides.html), [binder](https://mybinder.org/v2/gh/ssciwr/lunch-time-python.git/HEAD?labpath=lunchtime9%2Flunchtime9.ipynb), [colab](https://colab.research.google.com/github/ssciwr/lunch-time-python/blob/main/lunchtime9/lunchtime9.ipynb), 
[notebook](https://ssciwr.github.io/lunch-time-python/lunchtime9/lunchtime9.ipynb) (Session 9, September 30th 2022, 12pm) + +* ipywidgets: [slides](https://ssciwr.github.io/lunch-time-python/lunchtime8/lunchtime8.slides.html), [binder](https://mybinder.org/v2/gh/ssciwr/lunch-time-python.git/HEAD?labpath=lunchtime8%2Flunchtime8.ipynb), [colab](https://colab.research.google.com/github/ssciwr/lunch-time-python/blob/main/lunchtime8/lunchtime8.ipynb), [notebook](https://ssciwr.github.io/lunch-time-python/lunchtime8/lunchtime8.ipynb) (Session 8, July 29th 2022, 12pm) + +* matplotlib: [slides](https://ssciwr.github.io/lunch-time-python/lunchtime7/lunchtime7.slides.html), [binder](https://mybinder.org/v2/gh/ssciwr/lunch-time-python.git/HEAD?labpath=lunchtime7%2Flunchtime7.ipynb), [colab](https://colab.research.google.com/github/ssciwr/lunch-time-python/blob/main/lunchtime7/lunchtime7.ipynb), [notebook](https://ssciwr.github.io/lunch-time-python/lunchtime7/lunchtime7.ipynb) (Session 7, June 24th 2022, 12pm) + +* numba: [slides](https://ssciwr.github.io/lunch-time-python/lunchtime6/lunchtime6.slides.html), [binder](https://mybinder.org/v2/gh/ssciwr/lunch-time-python.git/HEAD?labpath=lunchtime6%2Flunchtime6.ipynb), [colab](https://colab.research.google.com/github/ssciwr/lunch-time-python/blob/main/lunchtime6/lunchtime6.ipynb), [notebook](https://ssciwr.github.io/lunch-time-python/lunchtime6/lunchtime6.ipynb) (Session 6, April 29th 2022, 12pm) + +* pillow: [slides](https://ssciwr.github.io/lunch-time-python/lunchtime5/lunchtime5.slides.html), [binder](https://mybinder.org/v2/gh/ssciwr/lunch-time-python.git/HEAD?labpath=lunchtime5%2Flunchtime5.ipynb), [colab](https://colab.research.google.com/github/ssciwr/lunch-time-python/blob/main/lunchtime5/lunchtime5.ipynb), [notebook](https://ssciwr.github.io/lunch-time-python/lunchtime5/lunchtime5.ipynb) (Session 5, March 25th 2022, 12pm) + +* pytest: 
[slides](https://ssciwr.github.io/lunch-time-python/lunchtime4/lunchtime4.slides.html), [binder](https://mybinder.org/v2/gh/ssciwr/lunch-time-python.git/HEAD?labpath=lunchtime4%2Flunchtime4.ipynb), [colab](https://colab.research.google.com/github/ssciwr/lunch-time-python/blob/main/lunchtime4/lunchtime4.ipynb), [notebook](https://ssciwr.github.io/lunch-time-python/lunchtime4/lunchtime4.ipynb) (Session 4, February 25th 2022, 12pm) + +* Click: [slides](https://ssciwr.github.io/lunch-time-python/lunchtime3/lunchtime3.slides.html), [binder](https://mybinder.org/v2/gh/ssciwr/lunch-time-python.git/HEAD?labpath=lunchtime3%2Flunchtime3.ipynb), [colab](https://colab.research.google.com/github/ssciwr/lunch-time-python/blob/main/lunchtime3/lunchtime3.ipynb), [notebook](https://ssciwr.github.io/lunch-time-python/lunchtime3/lunchtime3.ipynb) (Session 3, January 28th 2022, 12pm) + +* SymPy: [slides](https://ssciwr.github.io/lunch-time-python/lunchtime2/lunchtime2.slides.html), [binder](https://mybinder.org/v2/gh/ssciwr/lunch-time-python.git/HEAD?labpath=lunchtime2%2Flunchtime2.ipynb), [colab](https://colab.research.google.com/github/ssciwr/lunch-time-python/blob/main/lunchtime2/lunchtime2.ipynb), [notebook](https://ssciwr.github.io/lunch-time-python/lunchtime2/lunchtime2.ipynb) (Session 2, November 26th 2021, 12pm) + +* requests: [colab](https://colab.research.google.com/github/ssciwr/lunch-time-python/blob/main/lunchtime1/lunchtime1.ipynb), [binder](https://mybinder.org/v2/gh/ssciwr/lunch-time-python.git/HEAD?labpath=lunchtime1%2Flunchtime1.ipynb), [notebook](https://ssciwr.github.io/lunch-time-python/lunchtime1/lunchtime1.ipynb) (Session 1, October 29th 2021, 12pm) + +## Registration + +Registration for Lunch Time Python sessions is not required. You can still register using [this form](https://ssc.iwr.uni-heidelberg.de/form/lunch-time-python-registration) to + +* give us a better idea about the potential audience and what libraries we should present. 
+* have us send you a reminder the day before the session + +Suggestions for libraries to present are very welcome as a [GitHub issue](https://github.com/ssciwr/lunch-time-python/issues/new/choose) or as an [email to the SSC developers](mailto:ssc@iwr.uni-heidelberg.de). + +## Licensing + +The example code is [available on GitHub](https://github.com/ssciwr/lunch-time-python) and is provided under the permissive MIT license, giving you a lot of freedom to reuse and redistribute the code in your projects! diff --git a/_config.yml b/_config.yml new file mode 100644 index 0000000..f980e76 --- /dev/null +++ b/_config.yml @@ -0,0 +1 @@ +theme: jekyll-theme-slate diff --git a/_layouts/default.html b/_layouts/default.html new file mode 100644 index 0000000..5dfb179 --- /dev/null +++ b/_layouts/default.html @@ -0,0 +1,75 @@ + + + + + + + + + +{% seo %} + + + + + +
+
+ {% if site.github.is_project_page %} + View on GitHub + {% endif %} + +

+ + + + + + + + + + + + SSC Lunch Time Python +

+
+
+ + +
+
+ {{ content }} +
+
+ + + + + + diff --git a/assets/css/style.scss b/assets/css/style.scss new file mode 100644 index 0000000..ff0c121 --- /dev/null +++ b/assets/css/style.scss @@ -0,0 +1,29 @@ +--- +--- + +@import "{{ site.theme }}"; + +#navigation ul { + margin: 0; + padding: 0; + border: 0; + font-size: 20px; +} + +#navigation ul li { + list-style-type: none; + display: inline; +} + +#navigation li a { + display: block; + float: left; + padding: 0px 10px; + color: #fff; + text-decoration: none; + border-right: 1px solid #fff; +} + +#navigation li a:hover { + background-color: #3a9ebf; +} \ No newline at end of file diff --git a/assets/images/favicon.ico b/assets/images/favicon.ico new file mode 100644 index 0000000..af97f26 Binary files /dev/null and b/assets/images/favicon.ico differ diff --git a/assets/images/ssc-logo.svg b/assets/images/ssc-logo.svg new file mode 100644 index 0000000..af8fa23 --- /dev/null +++ b/assets/images/ssc-logo.svg @@ -0,0 +1,3 @@ + + + diff --git a/lunchtime1/README.md b/lunchtime1/README.md new file mode 100644 index 0000000..dd06831 --- /dev/null +++ b/lunchtime1/README.md @@ -0,0 +1,3 @@ +# Lunchtime #1: requests library (Oct. 21) + +The [requests library](https://docs.python-requests.org/en/latest/) provides an elegant and simple way to send HTTP requests. Connect to the server of your choice, and download websites, stream data or upload content. Requests is [one of the most downloaded Python packages](https://pypi.org/project/requests/) with about 14 million downloads per week and half a million repositories that depend on it. 
\ No newline at end of file diff --git a/lunchtime1/lunchtime1.ipynb b/lunchtime1/lunchtime1.ipynb new file mode 100644 index 0000000..55f62b7 --- /dev/null +++ b/lunchtime1/lunchtime1.ipynb @@ -0,0 +1,627 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "0", + "metadata": {}, + "source": [ + "# Lunch Time Python\n", + "## Lunch 1: Requests\n", + "*Scientific Software Center, Heidelberg University* \n", + "*October 2021* \n", + "*Visit on [GitHub](https://github.com/ssciwr/lunch-time-python)* \n", + "\n", + "Welcome to Lunch Time Python! This is the notebook for [session 1](https://ssciwr.github.io/lunch-time-python/lunchtime1/) - the [requests](https://docs.python-requests.org/en/latest/) library.\n", + "\n", + "The requests library provides an elegant and simple way to send HTTP requests. Connect to the server of your choice, and download websites, stream data or upload content. Requests is [one of the most downloaded Python packages](https://pypi.org/project/requests/) with about 14 million downloads per week and half a million repositories that depend on it as of October 2021." + ] + }, + { + "cell_type": "markdown", + "id": "1", + "metadata": {}, + "source": [ + "# Requests: HTTP for humans\n", + "\n", + "Carry out HTTP/1.1 requests using Python! An HTTP request is made by a client to a server. 
For example, when you open a web page in your browser, your device sends a GET request to the web server hosting the page.\n", + "\n", + "The HTTP request contains three elements in the start line: An HTTP method; the request target; and the HTTP version.\n", + "\n", + "For example, when you open the page [ssc.iwr.uni-heidelberg.de](https://ssc.iwr.uni-heidelberg.de/), this is the message that is sent from the client to the server:\n", + "\n", + "GET https://ssc.iwr.uni-heidelberg.de/ HTTP/1.1\n", + "\n", + "The above request contains the request method, GET, the URI of the target, https://ssc.iwr.uni-heidelberg.de/, and the protocol version, HTTP/1.1.\n", + "\n", + "**These are the [main methods](https://www.tutorialspoint.com/http/http_methods.htm) for HTTP/1.1:**\n", + "1. GET \n", + "The GET method is used to retrieve information from the given server using a given URI. Requests using GET should only retrieve data and should have no other effect on the data.\n", + "\n", + "1. HEAD \n", + "Same as GET, but transfers the status line and header section only.\n", + "\n", + "1. POST \n", + "A POST request is used to send data to the server, for example, customer information, file upload, etc. using HTML forms.\n", + "\n", + "1. PUT \n", + "Replaces all current representations of the target resource with the uploaded content.\n", + "\n", + "1. DELETE \n", + "Removes all current representations of the target resource given by a URI.\n", + "\n", + "1. CONNECT \n", + "Establishes a tunnel to the server identified by a given URI.\n", + "\n", + "1. OPTIONS \n", + "Describes the communication options for the target resource.\n", + "\n", + "1. TRACE \n", + "Performs a message loop-back test along the path to the target resource.\n", + "\n", + "*Let's start requesting! \n", + "To install requests on your local machine, simply use* `python -m pip install requests`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2", + "metadata": {}, + "outputs": [], + "source": [ + "import requests as rq\n", + "import json # to pretty-print JSON responses" + ] + }, + { + "cell_type": "markdown", + "id": "3", + "metadata": {}, + "source": [ + "We will start with the above example - \n", + "GET https://ssc.iwr.uni-heidelberg.de/ HTTP/1.1" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4", + "metadata": {}, + "outputs": [], + "source": [ + "targetURI = \"https://ssc.iwr.uni-heidelberg.de/\"\n", + "r = rq.get(url=targetURI)" + ] + }, + { + "cell_type": "markdown", + "id": "5", + "metadata": {}, + "source": [ + "This did something! Let's check the object that we obtained." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6", + "metadata": {}, + "outputs": [], + "source": [ + "r.status_code" + ] + }, + { + "cell_type": "markdown", + "id": "7", + "metadata": {}, + "source": [ + "There are a couple of status codes that are important. You are probably familiar with 404 Not Found. Status codes starting with 2 stand for successful requests; codes starting with 3 stand for redirections; codes starting with 4 stand for client-side errors; codes starting with 5 stand for server-side errors." + ] + }, + { + "cell_type": "raw", + "id": "8", + "metadata": {}, + "source": [ + "# Tell me some sites that you would like to GET!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "10", + "metadata": {}, + "outputs": [], + "source": [ + "targetURI = \"https://en.wikipedia.org/wiki/Monty_Python\"\n", + "r = rq.get(url=targetURI)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "11", + "metadata": {}, + "outputs": [], + "source": [ + "r.text" + ] + }, + { + "cell_type": "markdown", + "id": "12", + "metadata": {}, + "source": [ + "## The HTTP response\n", + "The response that you receive from the server contains the status line (as per `r.status_code`), the HTTP headers and a body. \n", + "\n", + "### The response header" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "13", + "metadata": {}, + "outputs": [], + "source": [ + "r.headers" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "14", + "metadata": {}, + "outputs": [], + "source": [ + "r.headers[\"content-type\"] # the dictionary is case-insensitive!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "15", + "metadata": {}, + "outputs": [], + "source": [ + "r.encoding # the character encoding used to decode r.text" + ] + }, + { + "cell_type": "markdown", + "id": "16", + "metadata": {}, + "source": [ + "The headers comprise response headers (such as the server), general headers (e.g. information about the connection), and representation headers (e.g. content length).\n", + "You can also see what cookies were sent back, and how much time elapsed for the processing of the request." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "17", + "metadata": {}, + "outputs": [], + "source": [ + "r.cookies # the cookies that the server sent back" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "18", + "metadata": {}, + "outputs": [], + "source": [ + "r.elapsed # time between sending the request and receiving the response" + ] + }, + { + "cell_type": "markdown", + "id": "19", + "metadata": {}, + "source": [ + "### The response body\n", + "Not all responses come with a body (the payload) - if for example you PUT data on a server, the response does not necessarily contain a body. You can look at the response body using `r.text` (the body decoded to a string using `r.encoding`) or `r.content` (the raw bytes, which also works for non-text response content)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "20", + "metadata": {}, + "outputs": [], + "source": [ + "r.text" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "21", + "metadata": {}, + "outputs": [], + "source": [ + "r.content" + ] + }, + { + "cell_type": "markdown", + "id": "22", + "metadata": {}, + "source": [ + "### Side note\n", + "This doesn't look too pretty - you can use BeautifulSoup (`pip install beautifulsoup4`) to improve its appearance, but that library could fill a whole other lunch time." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "23", + "metadata": {}, + "outputs": [], + "source": [ + "from bs4 import BeautifulSoup" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "24", + "metadata": {}, + "outputs": [], + "source": [ + "soup = BeautifulSoup(r.content, \"html.parser\")\n", + "print(soup.prettify())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "25", + "metadata": {}, + "outputs": [], + "source": [ + "print(soup.text)" + ] + }, + { + "cell_type": "markdown", + "id": "26", + "metadata": {}, + "source": [ + "### Back to requests\n", + "Requests also has a built-in JSON decoder." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "27", + "metadata": {}, + "outputs": [], + "source": [ + "r = rq.get(\"https://api.github.com/events\")\n", + "r.json()" + ] + }, + { + "cell_type": "markdown", + "id": "28", + "metadata": {}, + "source": [ + "# GET request with parameters\n", + "Now let's try to get something useful using requests (besides crawling the web and downloading pages!). Let's find out the geographic position of Heidelberg University using [Google's geocoding API](https://developers.google.com/maps/documentation/geocoding/overview?_gl=1*oagjnc*_ga*MTk0NjcwNTg2Ni4xNjM1MTUzNjc5*_ga_NRWSTWS78N*MTYzNTE1MzY3OC4xLjAuMTYzNTE1MzY3OC4w). For this, you can create a trial account on Google's website to obtain an API key." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "29", + "metadata": {}, + "outputs": [], + "source": [ + "# api-endpoint\n", + "URI = \"https://maps.googleapis.com/maps/api/geocode/json\"\n", + "# API key\n", + "key = \"XXXXXXXXXXXXXXXXXXX\"" + ] + }, + { + "cell_type": "markdown", + "id": "30", + "metadata": {}, + "source": [ + "The better practice is to store the key securely outside of the notebook (and to add the configuration file to .gitignore)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "31", + "metadata": {}, + "outputs": [], + "source": [ + "import yaml\n", + "\n", + "with open(\"config.yml\", \"r\") as ymlfile:\n", + " cfg = yaml.safe_load(ymlfile)\n", + "key = cfg[\"google_api\"][\"secret_code\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "32", + "metadata": {}, + "outputs": [], + "source": [ + "# location to geocode\n", + "location = \"university of heidelberg\"\n", + "country = \"germany\"\n", + "# defining a params dict for the parameters to be sent to the API\n", + "parameters = {\"key\": key, \"address\": location, \"country\": country}\n", + "# sending get request and saving the response as response object\n", + "r = rq.get(url=URI, params=parameters)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "33", + "metadata": {}, + "outputs": [], + "source": [ + "r.status_code" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "34", + "metadata": {}, + "outputs": [], + "source": [ + "# extracting data in json format\n", + "data = r.json()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "35", + "metadata": {}, + "outputs": [], + "source": [ + "print(data)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "36", + "metadata": {}, + "outputs": [], + "source": [ + "# print this a little prettier\n", + "print(json.dumps(data, indent=4, sort_keys=True))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "37", + "metadata": {}, + "outputs": [], + "source": [ + "address_out = data[\"results\"][0][\"formatted_address\"]\n", + "# printing the output\n", + "print(\"Address is {}.\".format(address_out))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "38", + "metadata": {}, + "outputs": [], + "source": [ + "latitude = data[\"results\"][0][\"geometry\"][\"location\"][\"lat\"]\n", + "longitude = 
data[\"results\"][0][\"geometry\"][\"location\"][\"lng\"]\n", + "# printing the output\n", + "print(\"Latitude is {} and longitude {}.\".format(latitude, longitude))" + ] + }, + { + "cell_type": "markdown", + "id": "39", + "metadata": {}, + "source": [ + "# Making a POST request\n", + "Again we need an account for this example. This time, we are using the service [pastebin](https://pastebin.com/). You can send text to this address and it will be publicly visible. It serves as a storage for textual data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "40", + "metadata": {}, + "outputs": [], + "source": [ + "# defining the api-endpoint\n", + "api_endpoint = \"https://pastebin.com/api/api_post.php\"\n", + "# API key\n", + "key = \"XXXXXXXXXXXXXXXXXXXXXXXXXXXX\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "41", + "metadata": {}, + "outputs": [], + "source": [ + "key = cfg[\"pastebin_api\"][\"secret_code\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "42", + "metadata": {}, + "outputs": [], + "source": [ + "# the API option\n", + "option = \"paste\"\n", + "# name/title of your paste\n", + "api_paste_name = \"lunch time python\"\n", + "# syntax highlighting\n", + "api_format = \"python\"\n", + "# this makes a paste public, unlisted or private, public = 0, unlisted = 1, private = 2\n", + "private = 0\n", + "# the text you want to paste, for example, a code snippet in python\n", + "text = \"\"\"\n", + "print(\"Hello, lunch time!\")\n", + "x = 'my lunch'\n", + "y = 'your lunch'\n", + "print('{} {}'.format(x, y))\n", + "\"\"\"\n", + "# data dictionary, to be sent to api\n", + "data = {\n", + " \"api_dev_key\": key,\n", + " \"api_option\": option,\n", + " \"api_paste_code\": text,\n", + " \"api_paste_format\": api_format,\n", + " \"api_paste_private\": private,\n", + "}\n", + "\n", + "# sending post request and saving response as response object\n", + "r = rq.post(url=api_endpoint, data=data)" + 
] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "43", + "metadata": {}, + "outputs": [], + "source": [ + "r.status_code" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "44", + "metadata": {}, + "outputs": [], + "source": [ + "# extracting response text\n", + "pastebin_url = r.text\n", + "print(\"The pastebin URL is {}\".format(pastebin_url))" + ] + }, + { + "cell_type": "markdown", + "id": "45", + "metadata": {}, + "source": [ + "# Making a PUT request\n", + "A PUT request is similar to a POST request, but it is *idempotent*. This means that sending the same PUT request several times has the same effect as sending it once: the target is replaced. Repeating a POST request, in contrast, can create the resource multiple times. In the above example from pastebin, a POST request generates a new paste, while a PUT request would replace/alter a paste. For the differences between HTTP methods, see [here](https://www.w3schools.com/tags/ref_httpmethods.asp).\n", + "\n", + "For the PUT example, we will use [httpbin](https://httpbin.org/). This is an open service that allows you to test API calls and authentication methods." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "46", + "metadata": {}, + "outputs": [], + "source": [ + "# the api-endpoint\n", + "api_endpoint = \"https://httpbin.org/put\"\n", + "# the data to send - we want to receive a JSON response\n", + "data_type = \"application/json\"\n", + "# storing in a dictionary\n", + "data = {\"accept\": data_type}\n", + "# Making a PUT request\n", + "r = rq.put(url=api_endpoint, data=data)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "47", + "metadata": {}, + "outputs": [], + "source": [ + "# print the response object and its status code\n", + "print(r)\n", + "print(\"*************************\")\n", + "print(r.status_code)\n", + "print(\"*************************\")\n", + "# print the raw content of the response\n", + "print(r.content)\n", + "print(\"*************************\")\n", + "# decode the JSON body of the response and print it\n", + "print(r.json())\n", + "print(\"*************************\")\n", + "# print this a little prettier\n", + "print(json.dumps(r.json(), indent=4, sort_keys=True))" + ] + }, + { + "cell_type": "markdown", + "id": "48", + "metadata": {}, + "source": [ + "# Advanced topics\n", + "There is so much more you can do with requests - for example:\n", + "- [sessions](https://docs.python-requests.org/en/latest/user/advanced/#session-objects) which allow you to re-use the connection to the server (through connection pooling, leading to faster requests); \n", + "- [SSL certificate verification](https://docs.python-requests.org/en/latest/user/advanced/#ssl-cert-verification) which allows you to verify the identity of the server;\n", + "- [streaming](https://docs.python-requests.org/en/latest/user/advanced/#streaming-requests); \n", + "- and [much more](https://docs.python-requests.org/en/latest/user/advanced/)!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "49", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/lunchtime10/README.md b/lunchtime10/README.md new file mode 100644 index 0000000..56d8cd4 --- /dev/null +++ b/lunchtime10/README.md @@ -0,0 +1,5 @@ +# Lunchtime 10: PyTorch (October 28th) + +[PyTorch](https://pytorch.org/) is a free and open-source machine learning framework that was originally developed by engineers at Facebook, but is now part of the Linux Foundation. +The two main features of PyTorch are its tensor computation framework (similar to NumPy) with great support for GPU acceleration and its support for neural networks via autograd. 
+ diff --git a/lunchtime10/data/MNIST/raw/t10k-images-idx3-ubyte b/lunchtime10/data/MNIST/raw/t10k-images-idx3-ubyte new file mode 100644 index 0000000..1170b2c Binary files /dev/null and b/lunchtime10/data/MNIST/raw/t10k-images-idx3-ubyte differ diff --git a/lunchtime10/data/MNIST/raw/t10k-images-idx3-ubyte.gz b/lunchtime10/data/MNIST/raw/t10k-images-idx3-ubyte.gz new file mode 100644 index 0000000..5ace8ea Binary files /dev/null and b/lunchtime10/data/MNIST/raw/t10k-images-idx3-ubyte.gz differ diff --git a/lunchtime10/data/MNIST/raw/t10k-labels-idx1-ubyte b/lunchtime10/data/MNIST/raw/t10k-labels-idx1-ubyte new file mode 100644 index 0000000..d1c3a97 Binary files /dev/null and b/lunchtime10/data/MNIST/raw/t10k-labels-idx1-ubyte differ diff --git a/lunchtime10/data/MNIST/raw/t10k-labels-idx1-ubyte.gz b/lunchtime10/data/MNIST/raw/t10k-labels-idx1-ubyte.gz new file mode 100644 index 0000000..a7e1415 Binary files /dev/null and b/lunchtime10/data/MNIST/raw/t10k-labels-idx1-ubyte.gz differ diff --git a/lunchtime10/data/MNIST/raw/train-images-idx3-ubyte b/lunchtime10/data/MNIST/raw/train-images-idx3-ubyte new file mode 100644 index 0000000..bbce276 Binary files /dev/null and b/lunchtime10/data/MNIST/raw/train-images-idx3-ubyte differ diff --git a/lunchtime10/data/MNIST/raw/train-images-idx3-ubyte.gz b/lunchtime10/data/MNIST/raw/train-images-idx3-ubyte.gz new file mode 100644 index 0000000..b50e4b6 Binary files /dev/null and b/lunchtime10/data/MNIST/raw/train-images-idx3-ubyte.gz differ diff --git a/lunchtime10/data/MNIST/raw/train-labels-idx1-ubyte b/lunchtime10/data/MNIST/raw/train-labels-idx1-ubyte new file mode 100644 index 0000000..d6b4c5d Binary files /dev/null and b/lunchtime10/data/MNIST/raw/train-labels-idx1-ubyte differ diff --git a/lunchtime10/data/MNIST/raw/train-labels-idx1-ubyte.gz b/lunchtime10/data/MNIST/raw/train-labels-idx1-ubyte.gz new file mode 100644 index 0000000..707a576 Binary files /dev/null and 
b/lunchtime10/data/MNIST/raw/train-labels-idx1-ubyte.gz differ diff --git a/lunchtime10/lunchtime10.ipynb b/lunchtime10/lunchtime10.ipynb new file mode 100644 index 0000000..fee436b --- /dev/null +++ b/lunchtime10/lunchtime10.ipynb @@ -0,0 +1,922 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "SEOtINgnsNVw", + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Lunch Time Python\n", + "\n", + "## 28.10.2022: PyTorch\n", + "\n", + "\n", + "[PyTorch](https://pytorch.org/) is a free and open-source machine learning framework that was originally developed by engineers at Facebook, but is now part of the Linux Foundation. \n", + "The two main features of PyTorch are its tensor computation framework (similar to NumPy) with great support for GPU acceleration and its support for neural networks via autograd.\n", + "\n", + "*Press `Spacebar` to go to the next slide (or `?` to see all navigation shortcuts)*\n", + "\n", + "[Lunch Time Python](https://ssciwr.github.io/lunch-time-python/), [Scientific Software Center](https://ssc.iwr.uni-heidelberg.de), [Heidelberg University](https://www.uni-heidelberg.de/)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# 0 Why use PyTorch? 
" + ] + }, + { + "attachments": { + "image.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "![image.png](attachment:image.png)\n", + "\n", + "Source: Twitter" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "![Framework comparison](https://www.assemblyai.com/blog/content/images/2021/12/Fraction-of-Papers-Using-PyTorch-vs.-TensorFlow.png)\n", + "\n", + "Source: Assembly AI" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "q-K33cgaoY_s", + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "# first imports\n", + "import torch\n", + "from torch import nn # model\n", + "from torch import optim # optimizer\n", + "from torchvision import datasets, transforms # data and data transforms\n", + "from torch.utils.data import random_split, DataLoader # utilities\n", + "\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "n4094EXrlsmY", + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# 1 Tensors" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "earp3FM4lg53", + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "# directly from data\n", + "data = [[1, 2], [3, 4]]\n", + "x_data = torch.tensor(data)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "T8VPI6Rglwbi", + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "# from numpy array\n", + "np_array = np.array(data)\n", + "x_np = torch.from_numpy(np_array)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "hd-_NAsDlzxi", + "outputId": 
"a3929eb9-d37d-417e-c45a-8119a46565c6", + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "# from another tensor\n", + "x_ones = torch.ones_like(x_data) # retains the properties of x_data\n", + "print(f\"Ones Tensor: \\n {x_ones} \\n\")\n", + "\n", + "x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data\n", + "print(f\"Random Tensor: \\n {x_rand} \\n\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "KOmRiBhYmCef", + "outputId": "11a4a70b-f59e-4160-8dc3-7f80287e546b", + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "# use tuples to determine tensor dimensions\n", + "shape = (\n", + " 2,\n", + " 3,\n", + ")\n", + "rand_tensor = torch.rand(shape)\n", + "ones_tensor = torch.ones(shape)\n", + "zeros_tensor = torch.zeros(shape)\n", + "\n", + "print(f\"Random Tensor: \\n {rand_tensor} \\n\")\n", + "print(f\"Ones Tensor: \\n {ones_tensor} \\n\")\n", + "print(f\"Zeros Tensor: \\n {zeros_tensor}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ZIe8tIYbmHqW", + "outputId": "82df9f81-d0dc-49f1-b928-6c0f086de49d", + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "# tensor attributes\n", + "tensor = torch.rand(3, 4)\n", + "\n", + "print(f\"Shape of tensor: {tensor.shape}\")\n", + "print(f\"Datatype of tensor: {tensor.dtype}\")\n", + "print(f\"Device tensor is stored on: {tensor.device}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vkzYCe5qmMAO", + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "# by default, tensors are created on CPU\n", + "# We move our tensor to the GPU if available\n", + "if torch.cuda.is_available():\n", + " tensor = 
tensor.to(\"cuda\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "KYUOfiXAmUj3", + "outputId": "14572a5c-62a0-4d8e-d78c-5be716af2c1c", + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "# indexing like numpy\n", + "tensor = torch.ones(4, 4)\n", + "print(f\"First row: {tensor[0]}\")\n", + "print(f\"First column: {tensor[:, 0]}\")\n", + "print(f\"Last column: {tensor[..., -1]}\")\n", + "tensor[:, 1] = 0\n", + "print(tensor)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "N_PqtgNXmR4T", + "outputId": "694ff4c8-381e-4bc4-a5ff-a43ee0892209", + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "# joining tensors\n", + "t1 = torch.cat([tensor, tensor, tensor], dim=1)\n", + "print(t1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ppYSqJGnmewf", + "outputId": "7720cf9e-eafa-4d73-aabe-2c5f8a07b359", + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "# This computes the matrix multiplication between two tensors. y1, y2, y3 will have the same value\n", + "y1 = tensor @ tensor.T\n", + "y2 = tensor.matmul(tensor.T)\n", + "\n", + "y3 = torch.rand_like(y1)\n", + "torch.matmul(tensor, tensor.T, out=y3)\n", + "\n", + "\n", + "# This computes the element-wise product. 
z1, z2, z3 will have the same value\n", + "z1 = tensor * tensor\n", + "z2 = tensor.mul(tensor)\n", + "\n", + "z3 = torch.rand_like(tensor)\n", + "torch.mul(tensor, tensor, out=z3)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "PWuFy1EwGs0H", + "outputId": "85d8845a-c691-47e9-8e93-26315c469bf3", + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "# GPU via CUDA\n", + "# torch.randn(5).cuda()\n", + "# better (more flexible):\n", + "device = torch.device(\"cuda\") if torch.cuda.is_available() else torch.device(\"cpu\")\n", + "torch.randn(5).to(device)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CX_XMFuTmlzi", + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# 2 Datasets and DataLoaders" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zryt62ZKmyVp", + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "- datasets: stores the samples and their corresponding labels\n", + "- DataLoader: wraps an iterable around the Dataset to enable easy access to the samples" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "lZT2YnVMu96w", + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "# import and split data\n", + "train_data = datasets.MNIST(\n", + " \"data\", train=True, download=True, transform=transforms.ToTensor()\n", + ")\n", + "train, val = random_split(train_data, [55000, 5000])\n", + "train_loader = DataLoader(train, batch_size=32)\n", + "val_loader = DataLoader(val, batch_size=32)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 466 + }, + "id": "6BhOxZy-nHP0", + "outputId": "1e07d5b6-0c13-4366-8da0-54f9f1a07881", + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + 
"figure = plt.figure(figsize=(8, 8))\n", + "cols, rows = 3, 3\n", + "for i in range(1, cols * rows + 1):\n", + " sample_idx = torch.randint(len(train_data), size=(1,)).item()\n", + " img, label = train_data[sample_idx]\n", + " figure.add_subplot(rows, cols, i)\n", + " plt.axis(\"off\")\n", + " plt.imshow(img.squeeze(), cmap=\"gray\")\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 319 + }, + "id": "EsmxozRsnXA9", + "outputId": "43701d04-f650-49ec-d29a-18a3cbbf006e", + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "train_features, train_labels = next(iter(train_loader))\n", + "print(f\"Feature batch shape: {train_features.size()}\")\n", + "print(f\"Labels batch shape: {train_labels.size()}\")\n", + "img = train_features[0].squeeze()\n", + "label = train_labels[0]\n", + "plt.imshow(img, cmap=\"gray\")\n", + "plt.show()\n", + "print(f\"Label: {label}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bZqXqXe5n33K", + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# 3 Coding a neural network" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ZQqu_MGNM9mE", + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "# in theory easy via stateless approach\n", + "# import torch.nn.functional as F\n", + "\n", + "# loss_func = F.cross_entropy\n", + "\n", + "# def model(xb):\n", + "# return xb @ weights + bias\n", + "\n", + "# print(loss_func(model(xb), yb), accuracy(model(xb), yb))\n", + "# gets messy quickly!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "HigdC6uGHfdQ", + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "# define model via explicit nn.Module class\n", + "class MyModel(nn.Module):\n", + " def __init__(self):\n", + " super().__init__()\n", + " self.l1 = nn.Linear(28 * 28, 64)\n", + " self.l2 = nn.Linear(64, 64)\n", + " self.l3 = nn.Linear(64, 10)\n", + " self.do = nn.Dropout(0.1)\n", + "\n", + " def forward(self, x):\n", + " h1 = nn.functional.relu(self.l1(x))\n", + " h2 = nn.functional.relu(self.l2(h1))\n", + " do = self.do(h2 + h1) # residual connection\n", + " logits = self.l3(do)\n", + " return logits" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OEdxKC9goNxP", + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "[nn.Sequential](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html) is an ordered container of modules; good for easy and quick networks. No need to specify forward method!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "RM9PGgqQoqeo", + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "# defining model via sequential\n", + "# shorthand, no need for forward method\n", + "model_seq = nn.Sequential(\n", + " nn.Linear(28 * 28, 64),\n", + " nn.ReLU(),\n", + " nn.Linear(64, 64),\n", + " nn.ReLU(),\n", + " nn.Dropout(0.1), # often helps with overfitting\n", + " nn.Linear(64, 10),\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "IyB6DSiANTkf", + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "# move model to GPU/device memory\n", + "model = model_seq.to(device)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "A9pgmNmaofHI", + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "Many layers inside a neural network are parameterized, i.e. have associated weights and biases that are optimized during training. Subclassing nn.Module automatically tracks all fields defined inside your model object, and makes all parameters accessible using your model’s parameters() or named_parameters() methods." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "GyC1MegLofzP", + "outputId": "ae35c0ea-3a89-4b1c-dc9f-ce26a628fade", + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "print(f\"Model structure: {model}\\n\\n\")\n", + "\n", + "for name, param in model.named_parameters():\n", + " print(f\"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \\n\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "YONSCevnqZUi", + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "# define loss function\n", + "loss = nn.CrossEntropyLoss() # softmax + neg. 
log" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wvofYk8GqSeD", + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# 4 Backpropagation via Autograd" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0M40WT7jqEt4", + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "![Computational graph](https://pytorch.org/tutorials/_images/comp-graph.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZyWtRqDYpSaz", + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "In a forward pass, autograd does two things simultaneously:\n", + "\n", + "- runs the requested operation to compute a resulting tensor\n", + "\n", + "- maintains the operation’s gradient function in the DAG.\n", + "\n", + "The backward pass kicks off when .backward() is called on the DAG root. autograd then:\n", + "\n", + "- computes the gradients from each .grad_fn,\n", + "\n", + "- accumulates them in the respective tensor’s .grad attribute\n", + "\n", + "- using the chain rule, propagates all the way to the leaf tensors." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ra6xARRFqbbD", + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# 5 Optimization of model parameters (training)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DXpp_QiCqi3Z", + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "We define the following hyperparameters for training:\n", + "\n", + "- Number of Epochs - the number of times to iterate over the dataset\n", + "- Batch Size - the number of data samples propagated through the network before the parameters are updated (defined in train_loader)\n", + "- Learning Rate - how much to update the model’s parameters at each batch/epoch. Smaller values yield slow learning speed, while large values may result in unpredictable behavior during training." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JgTXf_DbqlZP", + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "lr = 1e-2\n", + "epochs = 5" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "62yFqEQ-p9qc", + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "# defining optimizer\n", + "params = model.parameters()\n", + "optimiser = optim.SGD(params, lr=lr) # use the learning rate defined above" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "I0YUXfJnq7n9", + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "Inside the training loop, optimization happens in three steps:\n", + "\n", + "- Call **optimizer.zero_grad()** to reset the gradients of model parameters. Gradients by default add up; to prevent double-counting, we explicitly zero them at each iteration.\n", + "- Backpropagate the prediction loss with a call to **loss.backward()**. PyTorch deposits the gradients of the loss w.r.t. each parameter.\n", + "- Once we have our gradients, we call **optimizer.step()** to adjust the parameters by the gradients collected in the backward pass." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "5XwqYKq8ql5G", + "outputId": "2e780ec9-8f22-4c97-a575-eb9f00f5a787", + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "# define training and validation loop\n", + "# training loop\n", + "for epoch in range(epochs):\n", + " losses = list()\n", + " accuracies = list()\n", + " model.train() # training mode: dropout active, batchnorm uses batch statistics\n", + " for batch in train_loader:\n", + " x, y = batch\n", + " batch_size = x.size(0)\n", + " # x: b x 1 x 28 x 28\n", + " x = x.view(batch_size, -1).to(device)\n", + "\n", + " # 5 steps to train network\n", + " # 1 forward\n", + " l = model(x) # l: logits\n", + "\n", + " # 2 compute objective function\n", + " J = loss(l, y.to(device))\n", + "\n", + " # 3 cleaning the gradients (could also call this on optimiser)\n", + " model.zero_grad()\n", + " # optimiser.zero_grad() is equivalent here\n", + " # manually: params.grad.zero_()\n", + "\n", + " # 4 accumulate the partial derivatives of J wrt params\n", + " J.backward()\n", + " # manually: params.grad.add_(dJ/dparams)\n", + "\n", + " # 5 step in the opposite direction of the gradient\n", + " optimiser.step()\n", + " # could have done manual gradient update (in place):\n", + " # with torch.no_grad():\n", + " # params -= lr * params.grad\n", + " losses.append(J.item())\n", + " accuracies.append(y.eq(l.detach().argmax(dim=1).cpu()).float().mean())\n", + "\n", + " print(f\"epoch {epoch + 1}\", end=\", \")\n", + " print(f\"training loss: {torch.tensor(losses).mean():.2f}\", end=\", \")\n", + " print(\n", + " f\"training accuracy: {torch.tensor(accuracies).mean():.2f}\"\n", + " ) # print two decimals\n", + "\n", + " # validation loop\n", + " losses = list()\n", + " accuracies = list()\n", + " model.eval() # evaluation mode: dropout off, batchnorm uses running statistics\n", + " for batch in val_loader:\n", + " x, y = batch\n", + " batch_size = x.size(0)\n", + " # x: b x 1 x 
28 x 28\n", + " x = x.view(batch_size, -1).to(device)\n", + "\n", + " # validation: only steps 1 and 2, no parameter update\n", + " # 1 forward\n", + " with torch.no_grad(): # more efficient: no computational graph is built\n", + " l = model(x) # l: logits\n", + "\n", + " # 2 compute objective function\n", + " J = loss(l, y.to(device))\n", + " losses.append(J.item())\n", + " accuracies.append(y.eq(l.detach().argmax(dim=1).cpu()).float().mean())\n", + "\n", + " print(f\"epoch {epoch + 1}\", end=\", \")\n", + " print(f\"validation loss: {torch.tensor(losses).mean():.2f}\", end=\", \")\n", + " print(\n", + " f\"validation accuracy: {torch.tensor(accuracies).mean():.2f}\"\n", + " ) # print two decimals" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BQye3VF6rlMp", + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# 6 Store models\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "MQbKGqSVrm6E", + "outputId": "1f3eb8f5-3ce4-46fa-999b-df745dc24319", + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "# just save model weights without structure\n", + "torch.save(model.state_dict(), \"model_weights.pth\")\n", + "model.load_state_dict(torch.load(\"model_weights.pth\"))\n", + "model.eval()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "njC3UgJer4Zw", + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "# save whole model (recent PyTorch versions may need weights_only=False in torch.load)\n", + "torch.save(model, \"model.pth\")\n", + "new_model = torch.load(\"model.pth\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cKqbikpVsccx", + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# 7 Material sources/more resources:\n", + "- [PyTorch Tutorial Page](https://pytorch.org/tutorials/beginner/basics/)\n", + "- [YouTube Series for PyTorch Lightning](https://www.youtube.com/watch?v=OMDn66kM9Qc)\n", + 
"- [PyTorch common mistakes video from Alladin Persson](https://www.youtube.com/watch?v=O2wJ3tkc-TU&list=PLhhyoLH6IjfxeoooqP9rhU3HJIAVAJ3Vz&index=14)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "I6HFBPPrrF6A", + "slideshow": { + "slide_type": "skip" + } + }, + "source": [ + "Extra: interactive debugging during training loop" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3DcFOjNJrJNc", + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [], + "source": [ + "# import pdb; pdb.set_trace() (put these commands after logit computation in training loop)\n", + "# interactive debugger, leave via q\n", + "# you can call the following commands interactively during the training loop\n", + "# p l.size()\n", + "# p l[0]\n", + "# p l[0].detach().argmax()\n", + "# p l[0].detach().softmax(dim=0)\n", + "# p [f\"{prob:.2f}\" for prob in l[0].detach().softmax(dim=0)\n", + "# p y[:4]\n", + "# p l.detach().argmax(dim=1)[:4]\n", + "# p y[:4].eq(l.detach().argmax(dim=1)[:4])\n", + "# p y.eq(l.detach().argmax(dim=1)).float().mean()" + ] + } + ], + "metadata": { + "accelerator": "GPU", + "celltoolbar": "Slideshow", + "colab": { + "collapsed_sections": [], + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3.10.6 64-bit ('3.10.6')", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.6" + }, + "vscode": { + "interpreter": { + "hash": "ebbca54691d61843c0a04253fcd790a2bc545e11985b7cc4dd8a14aab0b5083b" + } + } + }, + "nbformat": 4, + "nbformat_minor": 1 +} diff --git a/lunchtime10/lunchtime10.slides.html b/lunchtime10/lunchtime10.slides.html new file mode 100644 index 0000000..4712f24 --- /dev/null +++ b/lunchtime10/lunchtime10.slides.html @@ -0,0 +1,16729 @@ + + + 
+ + + + + + +lunchtime10 slides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + diff --git a/lunchtime10/model.pth b/lunchtime10/model.pth new file mode 100644 index 0000000..7a7a6d2 Binary files /dev/null and b/lunchtime10/model.pth differ diff --git a/lunchtime10/model_weights.pth b/lunchtime10/model_weights.pth new file mode 100644 index 0000000..713da7a Binary files /dev/null and b/lunchtime10/model_weights.pth differ diff --git a/lunchtime11/lunchtime11.ipynb b/lunchtime11/lunchtime11.ipynb new file mode 100644 index 0000000..f2865ac --- /dev/null +++ b/lunchtime11/lunchtime11.ipynb @@ -0,0 +1,949 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "SEOtINgnsNVw", + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Lunch Time Python\n", + "\n", + "## 25.11.2022: spaCy\n", + "\n", + "\n", + "[spaCy](https://spacy.io/) is an open-source natural language processing library written in Python and Cython.\n", + "\n", + "spaCy focuses on production usage and is very fast and efficient. It also supports deep learning workflows through interfacing with [TensorFlow](https://www.tensorflow.org/) or [PyTorch](https://pytorch.org/), as well as the transformer model library [Hugging Face](https://github.com/huggingface).\n", + "\n", + "*Press `Spacebar` to go to the next slide (or `?` to see all navigation shortcuts)*\n", + "\n", + "[Lunch Time Python](https://ssciwr.github.io/lunch-time-python/), [Scientific Software Center](https://ssc.iwr.uni-heidelberg.de), [Heidelberg University](https://www.uni-heidelberg.de/)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# 0 What to do with spaCy" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "spaCy is very powerful for text annotation:\n", + "- sentencize and tokenize\n", + "- POS (part-of-speech) and lemma\n", + "- NER (named entity recognition)\n", + "- dependency parsing\n", + "- text 
classification\n", + "- morphological analysis\n", + "- pattern matching\n", + "- ...\n", + "\n", + "spaCy can also learn new tasks through integration with your machine learning stack. It also provides multi-task learning with pretrained transformers like [BERT](https://arxiv.org/abs/1810.04805). \n", + "(BERT is used in the Google search engine.)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "import spacy\n", + "from spacy import displacy\n", + "\n", + "if \"google.colab\" in str(get_ipython()):\n", + " spacy.cli.download(\"en_core_web_md\")\n", + "nlp = spacy.load(\"en_core_web_md\")\n", + "doc = nlp(\n", + " \"The Scientific Software Center offers lunch-time Python - an informal way to learn about new Python libraries.\"\n", + ")\n", + "displacy.render(doc, style=\"dep\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "displacy.render(doc, style=\"ent\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "n4094EXrlsmY", + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# 1 Install spaCy\n", + "You can install spaCy using `pip`:\n", + "\n", + "`pip install spacy`\n", + "\n", + "It is also available via `conda-forge`:\n", + "\n", + "`conda install -c conda-forge spacy`\n", + "\n", + "After installing spaCy, you also need to download the language model. 
For a medium-sized English model, you would do this using\n", + "\n", + "`python -m spacy download en_core_web_md`\n", + "\n", + "The available models are listed on the spaCy website: https://spacy.io/usage/models" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Install spaCy with CUDA support\n", + "\n", + "`pip install -U spacy[cuda]`\n", + "\n", + "You can also explore the [online tool](https://spacy.io/usage) for installation instructions." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "earp3FM4lg53", + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "# 2 Let's try it out!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "nlp = spacy.load(\"en_core_web_md\")\n", + "nlp(\"This is lunch-time Python.\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "doc = nlp(\"This is lunch-time Python.\")\n", + "print(type(doc))\n", + "[i for i in doc]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "t = doc[0]\n", + "type(t)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "t.ent_id_" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "displacy.render(doc)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "spacy.explain(\"AUX\")" + ] + }, + { + "cell_type": "code", + 
"execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "for t in doc:\n", + " print(t.text, t.pos_, t.dep_, t.lemma_)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# 3 Pipelines\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "![pipeline](https://spacy.io/pipeline-fde48da9b43661abcdf62ab70a546d71.svg)\n", + "\n", + "[source: spaCy 101]\n", + "\n", + "The capabilities of the processing pipeline dependes on the components, their models and how they were trained." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "nlp.pipe_names" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "nlp.tokenizer" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "text = \"Python is a very popular - maybe even the most popular - programming language among scientific software developers. One of the reasons for this success story is the rich standard library and the rich ecosystem of available (scientific) libraries. To fully leverage this ecosystem, developers need to stay up to date and explore new libraries. Lunch Time Python aims at providing a communication platform between Pythonistas to learn about new libraries in an informal setting. Sessions take roughly 30 minutes, one library is presented per session and the code will be made available afterwards. 
Come by, enjoy your lunch with us and step up your Python game!\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(text)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "doc = nlp(text)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "for i, sent in enumerate(doc.sents):\n", + " print(i, sent)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "for i, sent in enumerate(doc.sents):\n", + " for j, token in enumerate(sent):\n", + " print(i, j, token.text, token.pos_)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Adding custom components\n", + "You can add custom pipeline components, for example rule-based or phrase matchers, and add the custom attributes to the `doc`, `token` and `span` objects." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Processing batches of texts\n", + "You can process batches of texts using the `nlp.pipe()` command.\n", + "\n", + "`docs = list(nlp.pipe(LOTS_OF_TEXTS))`" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Disabling pipeline components\n", + "To achieve higher efficiency, it is possible to disable pipeline components.\n", + "\n", + "`nlp.select_pipes(disable=[\"ner\"])`" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# 4 Rule-based matching" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "# Import the Matcher\n", + "from spacy.matcher import Matcher\n", + "\n", + "# Initialize the matcher with the shared vocab\n", + "matcher = Matcher(nlp.vocab)\n", + "\n", + "# Add the pattern to the matcher\n", + "python_pattern = [{\"TEXT\": \"Python\", \"POS\": \"PROPN\"}]\n", + "matcher.add(\"PYTHON_PATTERN\", [python_pattern])\n", + "\n", + "doc = nlp(text)\n", + "\n", + "# Call the matcher on the doc\n", + "matches = matcher(doc)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "# Iterate over the matches\n", + "for match_id, start, end in matches:\n", + " # Get the matched span\n", + " matched_span = doc[start:end]\n", + " print(matched_span.text)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# 5 Phrase matching\n", + "More efficient than the rule-based matching, can be used for finding sequences of words, and also gives you access to the tokens in context.\n", + "\n", + "- Rule-based 
matching: find patterns in the tokens (token-based matching)\n", + "- Phrase matching: find exact string; useful for names and if there are several options of tokenizing the string" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "doc = nlp(\n", + " \"The Scientific Software Center supports researchers in developing scientific software.\"\n", + ")\n", + "\n", + "# Import the PhraseMatcher and initialize it\n", + "from spacy.matcher import PhraseMatcher\n", + "\n", + "matcher = PhraseMatcher(nlp.vocab)\n", + "# you can also pass in attributes, for example attr=\"LOWER\" or attr=\"POS\"\n", + "\n", + "# Create pattern Doc objects and add them to the matcher\n", + "term = \"Scientific Software Center\"\n", + "pattern = nlp(term)\n", + "# or use pattern = nlp.make_doc(term) to only invoke tokenizer - more efficient!\n", + "matcher.add(\"SSC\", [pattern])\n", + "\n", + "# Call the matcher on the test document and print the result\n", + "matches = matcher(doc)\n", + "\n", + "for match_id, start, end in matches:\n", + " span = doc[start:end]\n", + " print(span.text)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# 6 Word vectors and semantic similarity\n", + "spaCy can compare two objects and predict similarity:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "text1 = \"I like Python.\"\n", + "text2 = \"I like snakes.\"\n", + "\n", + "\n", + "doc1 = nlp(text1)\n", + "doc2 = nlp(text2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "print(doc1.similarity(doc2))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + 
"slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "token1 = doc1[2]\n", + "token2 = doc2[2]\n", + "print(token1.text, token2.text)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "print(token1.similarity(token2))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "The similarity score is generated from word vectors." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "print(token1.vector)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "Similarity can be used to predict similar texts to users, or to flag duplicate content. \n", + "\n", + "But: Similarity always depends on the context." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "text3 = \"I hate snakes.\"\n", + "doc3 = nlp(text3)\n", + "print(doc2.similarity(doc3))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "These come out similar as both statements express a sentiment." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# 7 Internal workings\n", + "spaCy stores all strings as hash values and creates a lookup table. This way, a word that occurs several times only needs to be stored once." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "nlp.vocab.strings.add(\"python\")\n", + "python_hash = nlp.vocab.strings[\"python\"]\n", + "python_string = nlp.vocab.strings[python_hash]\n", + "print(python_hash, python_string)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "- Lexemes are entries in the vocabulary and contain context-independent information (the text, hash, lexical attributes).\n", + "![data structure](https://course.spacy.io/vocab_stringstore.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# 8 Train your own model\n", + "![training_scheme](https://course.spacy.io/training.png)\n", + "[source: spaCy online course]\n", + "\n", + "Training data: Annotated text \n", + "Text: The input text that the model should label \n", + "Label: The label that the model should predict \n", + "Gradient: How to change the weights" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## The training data\n", + "- Examples in context\n", + "- Update existing model: a few hundred to a few thousand examples\n", + "- Train a new category: a few thousand to a million examples\n", + "- Created manually by human annotators\n", + "- Use the matcher to semi-automate the annotation\n", + "\n", + "You also need evaluation data." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Create a training corpus" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "from spacy.tokens import Span\n", + "\n", + "nlp = spacy.blank(\"en\")\n", + "\n", + "# Create a Doc with entity spans\n", + "doc1 = nlp(\"iPhone X is coming\")\n", + "doc1.ents = [Span(doc1, 0, 2, label=\"GADGET\")]\n", + "# Create another doc without entity spans\n", + "doc2 = nlp(\"I need a new phone! Any tips?\")\n", + "\n", + "docs = [doc1, doc2] # and so on..." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Configuring the training\n", + "The training `config.cfg` contains the settings for the training, such as configuration of the pipeline and setting of hyperparameters." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "```\n", + "[nlp]\n", + "lang = \"en\"\n", + "pipeline = [\"tok2vec\", \"ner\"]\n", + "batch_size = 1000\n", + "\n", + "[nlp.tokenizer]\n", + "@tokenizers = \"spacy.Tokenizer.v1\"\n", + "\n", + "[components]\n", + "\n", + "[components.ner]\n", + "factory = \"ner\"\n", + "\n", + "[components.ner.model]\n", + "@architectures = \"spacy.TransitionBasedParser.v2\"\n", + "hidden_width = 64\n", + "...\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "Use the [quickstart-widget](https://spacy.io/usage/training#quickstart) to initialize a config." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## That's it! 
All you need is the training and evaluation data and the config.\n", + "`python -m spacy train ./config.cfg --output ./output --paths.train train.spacy --paths.dev dev.spacy`\n", + "\n", + "After you have completed the training, the model can be loaded and used with `spacy.load()`.\n", + "\n", + "You can also package and deploy your pipeline so others can use it." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## A few notes on training\n", + "- If you update existing models, previously predicted categories can be unlearned (\"catastrophic forgetting\")!\n", + "- Labels need to be consistent and not too specific" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# 9 spaCy transformers\n", + "You can load in transformer models using `spacy-transformers`:\n", + "\n", + "`pip install spacy-transformers`\n", + "\n", + "Remember that transformer models work with context, so if you have a list of terms with no context around them (say, titles of blog posts), a transformer model may not be the best choice.\n", + "\n", + "![transformer_pipeline](https://spacy.io/pipeline_transformer-3464b402cf7b19c3dd1efe1c0b4336dd.svg)\n", + "[source: spaCy documentation]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "transformer-based pipelines end in `_trf`:\n", + "\n", + "`python -m spacy download en_core_web_trf`" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# 10 Further information" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "# spaCy demos\n", + "- You can explore spaCy using [online tools](https://explosion.ai/software)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + 
"slide_type": "fragment" + } + }, + "source": [ + "For example, the [rule-based matcher explorer](https://demos.explosion.ai/matcher) -" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "- or the [spaCy online course](https://course.spacy.io/en/).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "# Example use cases\n", + "- [Detection of programming language in stackoverflow posts](https://github.com/koaning/spacy-youtube-material)\n", + "- take a look at [spaCy projects](https://spacy.io/usage/projects)!\n" + ] + } + ], + "metadata": { + "accelerator": "GPU", + "celltoolbar": "Slideshow", + "colab": { + "collapsed_sections": [], + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.1" + }, + "vscode": { + "interpreter": { + "hash": "ebbca54691d61843c0a04253fcd790a2bc545e11985b7cc4dd8a14aab0b5083b" + } + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/lunchtime11/lunchtime11.slides.html b/lunchtime11/lunchtime11.slides.html new file mode 100644 index 0000000..011de27 --- /dev/null +++ b/lunchtime11/lunchtime11.slides.html @@ -0,0 +1,17243 @@ + + + + + + + + + +lunchtime11 slides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + diff --git a/lunchtime12/lunchtime12.ipynb b/lunchtime12/lunchtime12.ipynb new file mode 100644 index 0000000..c5c67e5 --- /dev/null +++ b/lunchtime12/lunchtime12.ipynb @@ -0,0 +1,855 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "notes" + } + }, + "source": [ + "# Relevant Links:\n", + "\n", + "\n", + "- Docs: https://dash.plotly.com/dash-core-components \n", + "- Style-sheets: https://dash.plotly.com/external-resources\n", + "- callbacks: https://dash.plotly.com/pattern-matching-callbacks\n", + "- dash enrich: https://www.dash-extensions.com/getting_started/enrich\n", + "- examples: https://dash.gallery/Portal/" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Welcome to Dash\n", + "\n", + "### UI elements for both websites and jupyter notebooks\n", + "\n", + "Dash is a comparatively easy and stable way to build standalone or jupyter based widgets/apps.
 \n", + "It is based on `flask` and `react.js`. Combined with Dash's HTML components, this lets you write interactive and scalable web pages without knowing JavaScript or HTML.
 \n", + "Since it is Flask-based, Dash servers can be deployed in the same way as any standalone web service.
 \n", + "Dash was originally designed for web applications, but with the help of the `jupyter-dash` library it also works very well in notebooks.\n", + "\n", + "\n", + "All components listed here are documented in more depth at https://dash.plotly.com/dash-core-components\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "notes" + } + }, + "source": [ + "Plain strings are automatically shown side by side:
\n", + "`html.Div([\"some nested text \", \"some parallel text\"]),`
 \n", + "while nested Divs are stacked on top of each other:
\n", + "`html.Div([html.Div(\"some nested text \"), html.Div(\"some parallel text\")])`
\n", + "Unless you use the style property to change the display property to inline-block:
\n", + "`html.Div([html.Div(\"some nested text \", style={\"display\":\"inline-block\"}), html.Div(\"some parallel text\", style={\"display\":\"inline-block\"})],),`\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, + "outputs": [], + "source": [ + "if \"google.colab\" in str(get_ipython()):\n", + " !pip install dash jupyter_dash dash_extensions -qqq\n", + "import plotly.express as px\n", + "import numpy as np" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "from dash import Dash, dcc, html\n", + "import jupyter_dash" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "The UI is defined as a hierarchy of HTML components inside the app.layout.\n", + "Many dash objects have a `children` attribute that we can put more dash objects into. \n", + "The most commonly used structuring tool is `html.Div`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "basic_layout = html.Div(\n", + " children=[\n", + " html.Div([\"some nested text \", \"some parallel text\"]),\n", + " html.Br(), # this is just a line break\n", + " html.Div([html.Div(\"some nested text \"), html.Div(\"some parallel text\")]),\n", + " html.Br(),\n", + " html.Div(\n", + " [\n", + " html.Div(\n", + " \"some nested text \",\n", + " style={\"display\": \"inline-block\"},\n", + " ),\n", + " html.Div(\n", + " \"some parallel text\",\n", + " style={\"display\": \"inline-block\"},\n", + " ),\n", + " ],\n", + " ),\n", + " ]\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "# standalone dash server\n", + "app1 = Dash(\"app1\")\n", + "\n", + "app1.layout = basic_layout\n", + "# app1.run_server(debug=True, port=5050, use_reloader=False)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "This would run a dash server and provide an ip address to open the app in a new browser tab.\n", + "However this does not really work in jupyter notebooks as the cell never actually finishes.\n", + "\n", + "A better solution inside Notebooks is the `jupyter-dash` library. This enables us to run the entire notebook while either calling dash in a standalone mode or `inline`. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + }, + "tags": [ + "nbval-ignore-output" + ] + }, + "outputs": [], + "source": [ + "app2 = jupyter_dash.JupyterDash(\"app2\")\n", + "app2.layout = basic_layout\n", + "app2.run_server(debug=True, port=8071, mode=\"inline\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Styling and components\n", + "With the `style` argument most dash components can be changed according to the css standard.\n", + "\n", + "Most dash components are found under `dcc`, though some are in `html`.
\n", + "With just these we can generate a UI that can't really do anything yet.
\n", + "For data visualization Dash works very well with the `plotly` library.\n", + "\n", + "\n", + "Note: Dash also supports css style sheets. See: https://dash.plotly.com/external-resources\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "external_stylesheets = [\"https://codepen.io/chriddyp/pen/bWLwgP.css\"]\n", + "my_style = {\"width\": \"30%\", \"margin-top\": \"20px\", \"margin-bottom\": \"20px\"}\n", + "\n", + "app3 = jupyter_dash.JupyterDash(\"app3\", external_stylesheets=external_stylesheets)\n", + "\n", + "app3.layout = html.Div(\n", + " [\n", + " \"Choosing and displaying a function:\",\n", + " dcc.Dropdown(\n", + " options=[\"x^2\", \"2x\", \"e^x\"],\n", + " value=\"x^2\",\n", + " style=my_style,\n", + " ),\n", + " html.Div(\n", + " dcc.RangeSlider(min=0, max=20, step=1, value=[5, 15]),\n", + " style={\"width\": \"50%\"},\n", + " ),\n", + " html.Button(\n", + " \"Click_me\",\n", + " style=my_style,\n", + " ),\n", + " dcc.Graph(),\n", + " ]\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + }, + "tags": [ + "nbval-ignore-output" + ] + }, + "outputs": [], + "source": [ + "app3.run_server(debug=True, port=8072, mode=\"inline\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Callbacks\n", + "With the use of callbacks we can now add functionality to all our elements.\n", + "\n", + "In this example I want to be able to choose a function type, set the x limits for the calculation and show the graph upon clicking the button.\n", + "\n", + "The Dash callbacks allow us to access and monitor each object variable.
\n", + "For this to work we first need to assign IDs to every object we want to interact with.
 \n", + "Many of the `html.Div` containers, for example, don't need a specific ID.
 \n", + "\n", + "\n", + "Note that even though the `n_clicks` value of the button is not used, it must still be the first function argument since it is the value we want to observe." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "app4 = jupyter_dash.JupyterDash(\"app4\")\n", + "\n", + "app4.layout = html.Div(\n", + "    [\n", + "        \"Choosing and displaying a function:\",\n", + "        dcc.Dropdown(\n", + "            options=[\"x^2\", \"2x\", \"e^x\"],\n", + "            value=\"x^2\",\n", + "            style=my_style,\n", + "            id=\"dropdown\",\n", + "        ),\n", + "        html.Div(\n", + "            dcc.RangeSlider(\n", + "                min=0,\n", + "                max=20,\n", + "                step=1,\n", + "                value=[5, 15],\n", + "                id=\"slider\",\n", + "            ),\n", + "            style={\"width\": \"50%\"},\n", + "        ),\n", + "        html.Button(\n", + "            \"Click_me\",\n", + "            style=my_style,\n", + "            id=\"button\",\n", + "        ),\n", + "        dcc.Graph(id=\"graph\"),\n", + "    ]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "Callbacks can have as many inputs and outputs as needed.
\n", + "Any component provided as `Input` will trigger the callback, while `State` can be used to obtain certain variables without triggering the function.
\n", + "Basically every property of the selected object can be interacted with.
 \n", + "E.g.: One can give an ID to an `html.Div` and append to or rewrite its `children` attribute, thus potentially rewriting the entire app within one callback.\n", + "\n", + "\n", + "Lastly, `Output` defines which object the return value will be assigned to.\n", + "The order of the function arguments and return values follows the order in the decorator.\n", + "`Output`, `Input` and `State` must always be used in exactly this order." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "from dash import Input, Output, State\n", + "\n", + "\n", + "@app4.callback(\n", + "    Output(\"graph\", \"figure\"),\n", + "    Input(\"button\", \"n_clicks\"),\n", + "    State(\"dropdown\", \"value\"),\n", + "    State(\"slider\", \"value\"),\n", + ")\n", + "def update_graph(n_clicks, dropdown_value, slider_value):\n", + "    def _plot_function(x, function_name):\n", + "        if function_name == \"x^2\":\n", + "            return x**2\n", + "        elif function_name == \"2x\":\n", + "            return 2 * x\n", + "        elif function_name == \"e^x\":\n", + "            return np.exp(x)\n", + "        else:\n", + "            raise ValueError(f\"Unknown function_name: {function_name}\")\n", + "\n", + "    x_range = np.linspace(slider_value[0], slider_value[1], 100)\n", + "    y = _plot_function(x_range, dropdown_value)\n", + "    figure = px.line(x=x_range, y=y, title=dropdown_value)\n", + "\n", + "    return figure" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "notes" + } + }, + "outputs": [], + "source": [ + "# what happens if I deselect the functions?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + }, + "tags": [ + "nbval-ignore-output" + ] + }, + "outputs": [], + "source": [ + "app4.run_server(debug=True, port=8073, mode=\"inline\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Dynamically add more widgets\n", + "\n", + "So far we have only considered static IDs, which is fine for many use cases. However, sometimes it is necessary to add widgets inside callbacks.\n", + "An example could be the creation of a new tab with its own button and text inside.\n", + "\n", + "\n", + "For these callbacks Dash provides three patterns: `MATCH`, `ALL` and `ALLSMALLER`.\n", + "Here I will only go over `MATCH`; for more information see https://dash.plotly.com/pattern-matching-callbacks" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "from dash.dependencies import MATCH\n", + "\n", + "app5 = jupyter_dash.JupyterDash(\"app5\", external_stylesheets=external_stylesheets)\n", + "\n", + "app5.layout = html.Div(\n", + "    [\n", + "        html.Button(\"Add Tab\", id=\"button_add_tab\"),\n", + "        dcc.Tabs(id=\"tabs\", children=[]),\n", + "    ]\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "@app5.callback(\n", + "    Output(\"tabs\", \"children\"),\n", + "    Input(\"button_add_tab\", \"n_clicks\"),\n", + "    State(\"tabs\", \"children\"),\n", + "    prevent_initial_call=True,\n", + ")\n", + "def add_tab(n_clicks, tabs_children):\n", + "    new_tab = dcc.Tab(\n", + "        label=f\"Tab {n_clicks}\",\n", + "        children=[\n", + "            html.Div(\n", + "                [\n", + "                    html.Button(\n", + "                        f\"Button {n_clicks}\",\n", + "                        id={\"type\": \"button_tab\", \"index\": 
n_clicks},\n", + "                    ),\n", + "                    html.Div(\n", + "                        f\"Button {n_clicks} clicked 0 times. \",\n", + "                        id={\"type\": \"div_tab\", \"index\": n_clicks},\n", + "                    ),\n", + "                ]\n", + "            )\n", + "        ],\n", + "    )\n", + "    tabs_children.append(new_tab)\n", + "    return tabs_children" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "@app5.callback(\n", + "    Output({\"type\": \"div_tab\", \"index\": MATCH}, \"children\"),\n", + "    Input({\"type\": \"button_tab\", \"index\": MATCH}, \"n_clicks\"),\n", + "    State({\"type\": \"button_tab\", \"index\": MATCH}, \"id\"),\n", + "    prevent_initial_call=True,\n", + ")\n", + "def tabs_button_click(n_clicks, button_id):\n", + "    return f\"Button {button_id['index']} clicked {n_clicks} times. \"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + }, + "tags": [ + "nbval-ignore-output" + ] + }, + "outputs": [], + "source": [ + "app5.run_server(debug=True, port=8074, mode=\"inline\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Using dash inside a class\n", + "\n", + "Unfortunately, the decorator style of Dash's callbacks that we have used so far does not work when the Dash app is encapsulated inside a class. \n", + "Normally the app itself is meant to be used across the whole module.
 \n", + "`@self.app.callback()` or similar constructs don't work.\n", + "However, we can simply register any function as a callback, as seen here:\n", + "\n", + "##### A word of warning though: \n", + "The [dash website](https://dash.plotly.com/sharing-data-between-callbacks) advises against using a callback to access out-of-scope data or variables. As far as I can tell, this is only relevant when deploying the Dash server in a way that multiple users access the same instance, and it should not be a problem for local or cloud-hosted Python environments.\n", + "\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "class App6:\n", + "    def __init__(self):\n", + "        external_stylesheets = [\"https://codepen.io/chriddyp/pen/bWLwgP.css\"]\n", + "        self.app6 = jupyter_dash.JupyterDash(\n", + "            \"app6\", external_stylesheets=external_stylesheets\n", + "        )\n", + "        my_style = {\n", + "            \"width\": \"50%\",\n", + "            \"margin-top\": \"20px\",\n", + "            \"margin-bottom\": \"20px\",\n", + "        }\n", + "        self.app6.layout = html.Div(\n", + "            [\n", + "                \"Choosing and displaying a function:\",\n", + "                dcc.Dropdown(\n", + "                    options={\"x^2\": \"quadratic\", \"2x\": \"linear\", \"e^x\": \"exponential\"},\n", + "                    value=\"x^2\",\n", + "                    style=my_style,\n", + "                    id=\"dropdown\",\n", + "                ),\n", + "                html.Div(\n", + "                    dcc.RangeSlider(\n", + "                        min=0,\n", + "                        max=20,\n", + "                        step=1,\n", + "                        value=[5, 15],\n", + "                        id=\"slider\",\n", + "                    ),\n", + "                    style={\"width\": \"50%\"},\n", + "                ),\n", + "                html.Button(\n", + "                    \"Click_me\",\n", + "                    style=my_style,\n", + "                    id=\"button\",\n", + "                ),\n", + "                dcc.Graph(id=\"graph\"),\n", + "            ]\n", + "        )\n", + "        self.app6.callback(\n", + "            Output(\"graph\", \"figure\"),\n", + "            Input(\"button\", \"n_clicks\"),\n", + "            State(\"dropdown\", \"value\"),\n", + "            State(\"slider\", \"value\"),\n", + "        )(self.update_graph)\n", + "\n", + "    def 
update_graph(self, n_clicks, dropdown_value, slider_value):\n", + "        def _plot_function(x, function_name):\n", + "            if function_name == \"x^2\":\n", + "                return x**2\n", + "            elif function_name == \"2x\":\n", + "                return 2 * x\n", + "            elif function_name == \"e^x\":\n", + "                return np.exp(x)\n", + "            elif function_name is None:\n", + "                return None\n", + "            else:\n", + "                raise ValueError(\n", + "                    f\"Unknown function_name: {function_name}, type: {type(function_name)}\"\n", + "                )\n", + "\n", + "        x_range = np.linspace(slider_value[0], slider_value[1], 100)\n", + "        y = _plot_function(x_range, dropdown_value)\n", + "        if y is not None:\n", + "            figure = px.line(x=x_range, y=y, title=dropdown_value)\n", + "        else:\n", + "            figure = px.line()\n", + "\n", + "        return figure\n", + "\n", + "    def run(self, port=8081):\n", + "        self.app6.run_server(debug=True, port=port, mode=\"inline\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + }, + "tags": [ + "nbval-ignore-output" + ] + }, + "outputs": [], + "source": [ + "app_6 = App6()\n", + "app_6.run(port=8811)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Additional Dash Components\n", + "\n", + "- [Download button](https://dash.plotly.com/dash-core-components/download)\n", + "- [Upload button](https://dash.plotly.com/dash-core-components/upload)\n", + "- [Data Tables from pandas](https://dash.plotly.com/datatable)\n", + "- [Bio and molecule viewer](https://dash.plotly.com/dash-bio)\n", + "- [Many more](https://dash.plotly.com/)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "# Extended dash functionality \n", + "\n", + "\n", + "- [Dash Extensions - Enrich](https://www.dash-extensions.com/getting_started/enrich)\n", + "- [Dash json viewer](https://github.com/ghandic/dash_renderjson)" + ] + }, + { + "cell_type": 
"markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Dash extensions DashBlueprint\n", + "\n", + "Blueprints can be used to create and plan Dash layouts and callbacks. Because these blueprints do not call the Dash app directly, they can be created in different scopes, files or libraries and imported later when needed. \n", + "This can help keep the actual code much cleaner." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "from dash_extensions.enrich import DashBlueprint, DashProxy  # , html, Output, Input\n", + "\n", + "bp = DashBlueprint()\n", + "\n", + "bp.layout = html.Div(\n", + "    [\n", + "        \"Choosing and displaying a function:\",\n", + "        dcc.Dropdown(\n", + "            options=[\"x^2\", \"2x\", \"e^x\"],\n", + "            value=\"x^2\",\n", + "            style=my_style,\n", + "            id=\"dropdown\",\n", + "        ),\n", + "        html.Div(\n", + "            dcc.RangeSlider(\n", + "                min=0,\n", + "                max=20,\n", + "                step=1,\n", + "                value=[5, 15],\n", + "                id=\"slider\",\n", + "            ),\n", + "            style={\"width\": \"50%\"},\n", + "        ),\n", + "        html.Button(\n", + "            \"Click_me\",\n", + "            style=my_style,\n", + "            id=\"button\",\n", + "        ),\n", + "        dcc.Graph(id=\"graph\"),\n", + "    ]\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "@bp.callback(\n", + "    Output(\"graph\", \"figure\"),\n", + "    Input(\"button\", \"n_clicks\"),\n", + "    State(\"dropdown\", \"value\"),\n", + "    State(\"slider\", \"value\"),\n", + ")\n", + "def update_graph2(n_clicks, dropdown_value, slider_value):\n", + "    def _plot_function(x, function_name):\n", + "        if function_name == \"x^2\":\n", + "            return x**2\n", + "        elif function_name == \"2x\":\n", + "            return 2 * x\n", + "        elif function_name == \"e^x\":\n", + "            return np.exp(x)\n", + "        else:\n", + "            raise ValueError(f\"Unknown 
function_name: {function_name}\")\n", + "\n", + "    x_range = np.linspace(slider_value[0], slider_value[1], 100)\n", + "    y = _plot_function(x_range, dropdown_value)\n", + "    figure = px.line(x=x_range, y=y, title=dropdown_value)\n", + "\n", + "    return figure" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "app7 = DashProxy(blueprint=bp)\n", + "# app7.run_server()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "The problem here is that DashProxy and JupyterDash are not compatible.\n", + "If you run `DashProxy.run_server()` in a notebook, the cell will never finish. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "# Personal grievances\n", + "\n", + "I have two problems with Dash that I have not found a good solution for.\n", + "\n", + "- First is that Dash code can get very convoluted and messy.
\n", + " Extracting part of the layout into individual functions can help a lot, but it still mostly looks messy.\n", + "\n", + "- Second is Dash's tendency to swallow error messages, especially inside a notebook.
 \n", + "This can be partly worked around by running Dash in a browser tab, as that at least shows you some of the messages. But mostly it's just annoying.
 \n", + "Printing and logging don't always work either.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "And to finish off, a few more comprehensive examples:\n", + "- https://dash.gallery/named-entity-recognition/\n", + "- https://dash.gallery/dash-opioid-epidemic/\n", + "- https://dash.gallery/Portal/" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "celltoolbar": "Slideshow", + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.1" + }, + "rise": { + "scroll": true + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/lunchtime12/lunchtime12.slides.html b/lunchtime12/lunchtime12.slides.html new file mode 100644 index 0000000..6659545 --- /dev/null +++ b/lunchtime12/lunchtime12.slides.html @@ -0,0 +1,16428 @@ + + + + + + + + + +lunchtime12 slides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + diff --git a/lunchtime2/README.md b/lunchtime2/README.md new file mode 100644 index 0000000..34a9f70 --- /dev/null +++ b/lunchtime2/README.md @@ -0,0 +1,7 @@ +# Lunchtime #2: SymPy (Nov. 26) + +[SymPy](https://www.sympy.org/) is a Python library for symbolic mathematics. +It can parse mathematical expressions, substitute, differentiate, integrate and evaluate them, as well as solve algebraic and differential equations. +It is also itself written entirely in Python, with a focus on keeping the code comprehensible and easily extensible. +There is also a related project [SymEngine](https://github.com/symengine/symengine.py), +which is written in C++ with a focus on speed and offers a much faster implementation of a subset of SymPy's functionality. \ No newline at end of file diff --git a/lunchtime2/lunchtime2.ipynb b/lunchtime2/lunchtime2.ipynb new file mode 100644 index 0000000..24fd164 --- /dev/null +++ b/lunchtime2/lunchtime2.ipynb @@ -0,0 +1,1158 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "0", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Lunch Time Python\n", + "\n", + "## Lunch 2: SymPy\n", + "\n", + "\n", + "\n", + "[SymPy](https://www.sympy.org/) is a Python library for symbolic mathematics.\n", + "It can parse mathematical expressions, substitute, differentiate, integrate and evaluate them, as well as solve algebraic and differential equations.\n", + "It is also itself written entirely in Python, with a focus on keeping the code comprehensible and easily extensible.\n", + "There is also a related project [SymEngine](https://github.com/symengine/symengine.py),\n", + "which is written in C++ with a focus on speed and offers a much faster implementation of a subset of SymPy's functionality.\n", + "\n", + "*Press `Spacebar` to go to the next slide (or `?` to see all navigation shortcuts)*\n", + "\n", + "[Lunch Time Python](https://ssciwr.github.io/lunch-time-python/), [Scientific Software 
Center](https://ssc.iwr.uni-heidelberg.de), [Heidelberg University](https://www.uni-heidelberg.de/)" + ] + }, + { + "cell_type": "markdown", + "id": "1", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "# SymPy Installation\n", + "\n", + "- Anaconda: pre-installed\n", + "- Conda: `conda install sympy`\n", + "- Pip: `python -m pip install sympy`\n", + "\n", + "Or try it out online:\n", + "\n", + "- [live.sympy.org](https://live.sympy.org/)\n", + " - online python shell with SymPy installed\n", + "- [sympygamma.com](https://www.sympygamma.com/input/?i=cos%28x%29)\n", + " - SymPy powered web app similar to [Wolfram|Alpha](https://www.wolframalpha.com/)\n", + "- [this notebook on binder](https://mybinder.org/v2/gh/ssciwr/lunch-time-python.git/HEAD?labpath=lunchtime2%2Flunchtime2.ipynb)\n", + " - interactive version of this notebook on binder\n" + ] + }, + { + "cell_type": "markdown", + "id": "2", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Symbols\n", + "\n", + "- basic building block of expressions\n", + "- `sympy.symbols`\n", + " - takes a string of variable names separated by spaces\n", + " - returns a tuple of Symbol objects, one for each name\n", + " - usually a good idea to use the same name for the Python object, but you don't have to" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "import sympy as sp\n", + "\n", + "# Define a single symbol x named 'x':\n", + "\n", + "x = sp.symbols(\"x\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "type(x)" + ] 
+ }, + { + "cell_type": "code", + "execution_count": null, + "id": "6", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "# Define multiple symbols at once:\n", + "\n", + "y, z, t, nu = sp.symbols(\"y z t nu\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "# Can use different name for symbol & object (not recommended!)\n", + "\n", + "confusing = sp.symbols(\"alpha\")\n", + "print(confusing)" + ] + }, + { + "cell_type": "markdown", + "id": "8", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Expressions\n", + "\n", + "- Symbols can be combined with the usual arithmetic operations (`+`, `-`, `*`, `/`)\n", + "- Result is an expression\n", + "- The type of an expression depends on the operation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "expr = x + y - 2 * z" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "10", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "expr" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "11", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "type(expr)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "12", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "expr.args" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "13", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "type(expr.args[2])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "14", + "metadata": { + 
"slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "expr.args[2].args" + ] + }, + { + "cell_type": "markdown", + "id": "15", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Numbers\n", + "\n", + "- will automatically convert to SymPy Numbers when required\n", + "- but take care with math only involving Python numbers!\n", + "- sometimes an explicit conversion is helpful" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "16", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "x + 1" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "17", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "type((1 + x).args[0])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "18", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "type((123 + x).args[0])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "19", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "# be careful with unexpected Python maths!\n", + "x + 1 / 2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "20", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "# explicit construction of Rational\n", + "x + sp.Rational(1, 2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "21", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "# explicit construction of Integer\n", + "x + sp.Integer(1) / 2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "22", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "# equivalent but now all operations involve a 
sympy expr:\n", + "(2 * x + 1) / 2" + ] + }, + { + "cell_type": "markdown", + "id": "23", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Functions\n", + "\n", + "- `sympy.functions` contains many built-in analytic functions" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "24", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "expr = sp.sin(x) * sp.exp(y) + sp.gamma(t)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "25", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "expr" + ] + }, + { + "cell_type": "markdown", + "id": "26", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Substitution\n", + "\n", + "- evaluate an expression by substituting numbers for symbols\n", + "- substitute sub-expressions for sub-expressions\n", + "- `.subs` takes a list of `(old, new)` pairs" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "27", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "(x + 1).subs(x, 1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "28", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "(x * sp.cos(y)).subs([(x, 1), (y, 0.2)])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "29", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "(x * y).subs(x, y**3)" + ] + }, + { + "cell_type": "markdown", + "id": "30", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Parsing\n", + "\n", + "- `sympify` converts a string into a SymPy expression\n", + "- existing symbols are used if the name matches\n", + "- otherwise new symbols will be created as necessary" + ] + }, + { + "cell_type": 
"code", + "execution_count": null, + "id": "31", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "expr = sp.sympify(\"x\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "32", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "type(expr)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "33", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "expr == x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "34", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "expr = sp.sympify(\"cos(x) + a\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "35", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "expr" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "36", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "expr.args[0]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "37", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "expr.args[1].args[0] == x" + ] + }, + { + "cell_type": "markdown", + "id": "38", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Evaluating\n", + "\n", + "- `evalf` for one off numerical evaluation of an expression\n", + " - can take a dict of `Symbol : number` pairs\n", + "- `lambdify` for efficient repeated evaluation of an expression\n", + " - replaces sympy functions like `sin`, `cos` with numpy equivalents" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "39", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "sp.sqrt(2)" + ] + }, + { + 
"cell_type": "code", + "execution_count": null, + "id": "40", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "sp.sqrt(2).evalf()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "41", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "expr = sp.sqrt(1 + x)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "42", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "expr" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "43", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "expr.evalf(subs={x: 1})" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "44", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "f = sp.lambdify(x, expr)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "45", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "f(1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "46", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "a = np.array([1, 2, 3, 5.21])\n", + "f(a)" + ] + }, + { + "cell_type": "markdown", + "id": "47", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Simplification\n", + "\n", + "- `simplify` applies various simplifications to an expression\n", + "- `expand` expands a polynomial to a sum of monomials\n", + "- `factor` looks for common terms to factorize a polynomial\n", + "- `collect` collects common powers of a term\n", + "- as well as [many more](https://docs.sympy.org/latest/tutorial/simplification.html)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + 
"id": "48", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "sp.cos(x) ** 2 + sp.sin(x) ** 2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "49", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "sp.simplify(sp.cos(x) ** 2 + sp.sin(x) ** 2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "50", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "sp.gamma(x) / sp.gamma(x - 3)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "51", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "sp.simplify(sp.gamma(x) / sp.gamma(x - 3))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "52", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "(z + t) ** 2 * (x + 2 * y) ** 6" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "53", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "print(sp.expand((z + t) ** 2 * (x + 2 * y) ** 6))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "54", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "sp.factor(\n", + " t**2 * x**6\n", + " + 12 * t**2 * x**5 * y\n", + " + 60 * t**2 * x**4 * y**2\n", + " + 160 * t**2 * x**3 * y**3\n", + " + 240 * t**2 * x**2 * y**4\n", + " + 192 * t**2 * x * y**5\n", + " + 64 * t**2 * y**6\n", + " + 2 * t * x**6 * z\n", + " + 24 * t * x**5 * y * z\n", + " + 120 * t * x**4 * y**2 * z\n", + " + 320 * t * x**3 * y**3 * z\n", + " + 480 * t * x**2 * y**4 * z\n", + " + 384 * t * x * y**5 * z\n", + " + 128 * t * y**6 * z\n", + " + x**6 * z**2\n", + " + 12 * x**5 * y * z**2\n", + " + 60 * x**4 * y**2 * z**2\n", + " + 160 * x**3 * 
y**3 * z**2\n", + " + 240 * x**2 * y**4 * z**2\n", + " + 192 * x * y**5 * z**2\n", + " + 64 * y**6 * z**2\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "55", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Differentiation\n", + "\n", + "- `sympy.diff` differentiates expressions\n", + " - `diff(f, x)` : $df/dx$\n", + " - `diff(f, x, 3)` : $d^3f/dx^3$\n", + " - `diff(f, x, y, z)` : $d^3f/dxdydz$" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "56", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "expr" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "57", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "sp.diff(expr, x)" + ] + }, + { + "cell_type": "markdown", + "id": "58", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Integration\n", + "\n", + "- `sympy.integrate` integrates expressions\n", + " - indefinite by default, without the integration constant\n", + " - definite if tuple of `(Symbol, lower_limit, upper_limit)` is provided" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "59", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "expr" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "60", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "sp.integrate(expr, x)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "61", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "sp.integrate(expr, (x, 0, 1))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "62", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "sp.diff(sp.integrate(expr, x), 
x)" + ] + }, + { + "cell_type": "markdown", + "id": "63", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Limits\n", + "\n", + "- `sympy.limit` takes limit of expression at singular point" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "64", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "sinc = sp.sin(x) / x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "65", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "sinc.subs(x, 0)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "66", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "sinc.limit(x, 0)" + ] + }, + { + "cell_type": "markdown", + "id": "67", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Series expansion\n", + "\n", + "- `sympy.series` expands an expression around a point" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "68", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "sinc.series(x, 0, 10)" + ] + }, + { + "cell_type": "markdown", + "id": "69", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# SymEngine\n", + "\n", + "- Fast C++ implementation of (some of) SymPy" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "70", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "expr = sp.cos(x + 7 * x**2) + sp.sin(x - 2 * x**3)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "71", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "%time _ = (sp.series(expr, n=100))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "72", + 
"metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "if \"google.colab\" in str(get_ipython()):\n", + " !pip install symengine -qqq\n", + "import symengine as se\n", + "\n", + "%time _ = se.series(expr, n=100)" + ] + }, + { + "cell_type": "markdown", + "id": "73", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# More at [docs.sympy.org](https://docs.sympy.org/)\n", + "\n", + "- Algebraic equation solving\n", + "- Differential equation solving\n", + "- Matrices\n", + "- Assumptions\n", + "- Printing\n", + "- Code generation\n", + "\n", + "## [ssciwr.github.io/lunch-time-python](https://ssciwr.github.io/lunch-time-python/)" + ] + } + ], + "metadata": { + "celltoolbar": "Slideshow", + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/lunchtime2/lunchtime2.slides.html b/lunchtime2/lunchtime2.slides.html new file mode 100644 index 0000000..4dedd04 --- /dev/null +++ b/lunchtime2/lunchtime2.slides.html @@ -0,0 +1,17663 @@ + + + + + + + + + +lunchtime2 slides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + diff --git a/lunchtime3/README.md b/lunchtime3/README.md new file mode 100644 index 0000000..bb22d0b --- /dev/null +++ b/lunchtime3/README.md @@ -0,0 +1,4 @@ +# Lunchtime #3: Click (Jan. 22) + +[Click](https://click.palletsprojects.com/en/8.0.x/) is a Python package for creating beautiful command line interfaces +in a composable way with as little code as necessary. diff --git a/lunchtime3/click1.py b/lunchtime3/click1.py new file mode 100644 index 0000000..60e7ebf --- /dev/null +++ b/lunchtime3/click1.py @@ -0,0 +1,13 @@ +import click + + +@click.command() +@click.argument("inputfile", type=click.Path(exists=True)) +def stats(inputfile): + """Read data from the given INPUTFILE and calculate useful statistics""" + print(f"Calculating statistics from {inputfile}") + + +# This allows use of this Python file both for imports and for the CLI +if __name__ == "__main__": + stats() diff --git a/lunchtime3/click2.py b/lunchtime3/click2.py new file mode 100644 index 0000000..cf5d5ed --- /dev/null +++ b/lunchtime3/click2.py @@ -0,0 +1,22 @@ +import click + + +@click.command() +@click.option( + "--input", + type=click.Path(exists=True), + default="input.txt", + help="The data file to read from", +) +@click.option( + "--verbose/--no-verbose", type=bool, help="Whether to output intermediate results" +) +def stats(verbose, input): + """Read data and calculate useful statistics""" + if verbose: + print("Started the CLI script") + print(f"Calculating statistics from {input}") + + +if __name__ == "__main__": + stats() diff --git a/lunchtime3/click3.py b/lunchtime3/click3.py new file mode 100644 index 0000000..a646f80 --- /dev/null +++ b/lunchtime3/click3.py @@ -0,0 +1,35 @@ +import click + + +@click.group() +def main(): + pass + + +@click.command() +@click.option( + "--input", + type=click.Path(exists=True), + default="input.txt", + help="The data file to read from", +) +@click.option( + "--verbose/--no-verbose", type=bool, help="Whether to output intermediate 
results" +) +def stats(verbose, input): + """Read data and calculate useful statistics""" + if verbose: + print("Started the CLI script") + print(f"Calculating statistics from {input}") + + +@click.command() +def preprocess(): + print("Apply preprocessing") + + +main.add_command(stats) +main.add_command(preprocess) + +if __name__ == "__main__": + main() diff --git a/lunchtime3/lunchtime3.ipynb b/lunchtime3/lunchtime3.ipynb new file mode 100644 index 0000000..2bbe991 --- /dev/null +++ b/lunchtime3/lunchtime3.ipynb @@ -0,0 +1,422 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "0", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Lunchtime Python #3: Click" + ] + }, + { + "cell_type": "markdown", + "id": "1", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "Presenter: Dominic Kempf, Scientific Software Center" + ] + }, + { + "cell_type": "markdown", + "id": "2", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "[Click](https://click.palletsprojects.com/en/8.0.x/) is a Python package for creating beautiful command line interfaces in a composable way with as little code as necessary." + ] + }, + { + "cell_type": "markdown", + "id": "3", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Command Line Interfaces: Why?" + ] + }, + { + "cell_type": "markdown", + "id": "4", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "Natural evolution of a piece of research software in Python:\n", + "* Starts on a Jupyter notebook playground\n", + "* At some point freezes into a script for long term use\n", + "* (Automated) Application to a wider range of usage scenarios" + ] + }, + { + "cell_type": "markdown", + "id": "5", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "For the last step, a good Command Line Interface (CLI) is very helpful. 
Important aspects:\n", + "\n", + "* Easy to add to existing code\n", + "* Few lines of code to ease maintenance and focus on the scientific part\n", + "* Good help text generation" + ] + }, + { + "cell_type": "markdown", + "id": "6", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## How not to do it\n", + "\n", + "Similar to C, command line arguments can be accessed through `sys.argv`:" + ] + }, + { + "cell_type": "markdown", + "id": "7", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "```python\n", + "# DON'T DO THIS!\n", + "import sys\n", + "\n", + "inputfile = sys.argv[1]\n", + "print(f\"Calculating statistics from {inputfile}\")\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "8", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "Why is this bad?\n", + "* No help text generation\n", + "* No validation of arguments\n", + "* Prone to errors in argument indexing\n", + "* Logic for non-string and optional arguments quickly becomes unwieldy" + ] + }, + { + "cell_type": "markdown", + "id": "9", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "The standard library modules `argparse` and `optparse` are better options, but there is an even better one." 
+ ] + }, + { + "cell_type": "markdown", + "id": "10", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Click: A beautiful, opinionated approach" + ] + }, + { + "cell_type": "markdown", + "id": "11", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "```python\n", + "import click\n", + "\n", + "@click.command()\n", + "@click.argument(\"inputfile\", type=click.Path(exists=True))\n", + "def stats(inputfile):\n", + " \"\"\"Read data from the given INPUTFILE and calculate useful statistics\"\"\"\n", + " click.echo(f\"Calculating statistics from {inputfile}\")\n", + "\n", + "# This allows use of this Python file both for imports and for the CLI\n", + "if __name__ == \"__main__\":\n", + " stats()\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "12", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Arguments vs. Options" + ] + }, + { + "cell_type": "markdown", + "id": "13", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "Arguments are positional and only to a small extent optional or defaultable. Use them only for absolutely essential, self-explanatory input. 
To customize your script's behaviour, *options* are the better choice:" + ] + }, + { + "cell_type": "markdown", + "id": "14", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "```python\n", + "import click\n", + "\n", + "@click.command()\n", + "@click.option(\n", + " \"--input\",\n", + " type=click.Path(exists=True),\n", + " default=\"input.txt\",\n", + " help=\"The data file to read from\",\n", + ")\n", + "@click.option(\n", + " \"--verbose/--no-verbose\", type=bool, help=\"Whether to output intermediate results\"\n", + ")\n", + "def stats(verbose, input):\n", + " \"\"\"Read data and calculate useful statistics\"\"\"\n", + " if verbose:\n", + " click.echo(\"Started the CLI script\")\n", + " click.echo(f\"Calculating statistics from {input}\")\n", + "\n", + "if __name__ == \"__main__\":\n", + " stats()\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "15", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Composability of commands\n", + "\n", + "Add subcommand structure by reusing previously defined commands:" + ] + }, + { + "cell_type": "markdown", + "id": "16", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "```python\n", + "@click.group()\n", + "def main():\n", + " pass\n", + "\n", + "@click.command()\n", + "def preprocess():\n", + " click.echo(\"Apply preprocessing\")\n", + "\n", + "main.add_command(stats)\n", + "main.add_command(preprocess)\n", + "\n", + "if __name__ == \"__main__\":\n", + " main()\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "17", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "This mechanism is very powerful: arbitrary nesting, runtime extension e.g. through plugins etc." 
+ ] + }, + { + "cell_type": "markdown", + "id": "18", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Setuptools integration\n", + "\n", + "As software becomes more mature, it is also advisable to package and distribute it as a Python package. Click also integrates easily with `setuptools` via the entry points mechanism:" + ] + }, + { + "cell_type": "markdown", + "id": "19", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "```python\n", + "# In setup.py\n", + "setup(entry_points={\"console_scripts\": [\"myscript = mypackage.mymodule:myclifunction\"]})\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "20", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "```ini\n", + "# In setup.cfg\n", + "[options.entry_points]\n", + "console_scripts =\n", + " myscript = mypackage.mymodule:myclifunction\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "21", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Further information about click" + ] + }, + { + "cell_type": "markdown", + "id": "22", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "Today's presentation can be found on the Lunch Time Python website: [https://ssciwr.github.io/lunch-time-python/](https://ssciwr.github.io/lunch-time-python/)" + ] + }, + { + "cell_type": "markdown", + "id": "23", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "For further information, see also the (very good) [Click documentation](https://click.palletsprojects.com) " + ] + }, + { + "cell_type": "markdown", + "id": "24", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "If you have questions for the Scientific Software Center, please write to us at [ssc@iwr.uni-heidelberg.de](mailto:ssc@iwr.uni-heidelberg.de)" + ] + }, + { + "cell_type": "markdown", + "id": "25", + 
"metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Lunch Time Python Session #4\n", + "\n", + "Library options, please vote now:\n", + "\n", + "* [itertools](https://docs.python.org/3/library/itertools.html) is a standard library that implements a number of iterator building blocks inspired by functional programming languages.\n", + "* [pytest](https://docs.pytest.org/en/6.2.x/) The pytest framework makes it easy to write small tests, yet scales to support complex functional testing for applications and libraries.\n", + "* [matplotlib](https://matplotlib.org/) Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python" + ] + } + ], + "metadata": { + "celltoolbar": "Slideshow", + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/lunchtime3/lunchtime3.slides.html b/lunchtime3/lunchtime3.slides.html new file mode 100644 index 0000000..a9c0ad6 --- /dev/null +++ b/lunchtime3/lunchtime3.slides.html @@ -0,0 +1,15693 @@ + + + + + + + + + +lunchtime3 slides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + diff --git a/lunchtime3/sysargv.py b/lunchtime3/sysargv.py new file mode 100644 index 0000000..c54d697 --- /dev/null +++ b/lunchtime3/sysargv.py @@ -0,0 +1,7 @@ +# DON'T DO THIS! +import sys + +inputfile = sys.argv[1] +outputfile = sys.argv[2] + +print(f"Doing something with {inputfile} and {outputfile}") diff --git a/lunchtime4/README.md b/lunchtime4/README.md new file mode 100644 index 0000000..a8f8cba --- /dev/null +++ b/lunchtime4/README.md @@ -0,0 +1,6 @@ +# Lunchtime #4: pytest (Feb. 25) + +[pytest](https://docs.pytest.org/) is a widely used Python test framework, +which makes it easy to write small and readable tests, +and also offers more advanced features such as fixtures and mocks. +There is also a large ecosystem of plugins providing additional functionality. \ No newline at end of file diff --git a/lunchtime4/lunchtime4.ipynb b/lunchtime4/lunchtime4.ipynb new file mode 100644 index 0000000..a7ed648 --- /dev/null +++ b/lunchtime4/lunchtime4.ipynb @@ -0,0 +1,707 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "0", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Lunch Time Python\n", + "\n", + "## Lunch 4: pytest\n", + "\n", + "\n", + "\n", + "[pytest](https://docs.pytest.org/) is a widely used Python test framework, which makes it easy to write small and readable tests, and also offers more advanced features such as fixtures and mocks. 
There is also a large ecosystem of plugins providing additional functionality.\n", + "\n", + "*Press `Spacebar` to go to the next slide (or `?` to see all navigation shortcuts)*\n", + "\n", + "[Lunch Time Python](https://ssciwr.github.io/lunch-time-python/), [Scientific Software Center](https://ssc.iwr.uni-heidelberg.de), [Heidelberg University](https://www.uni-heidelberg.de/)" + ] + }, + { + "cell_type": "markdown", + "id": "1", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Why write tests?\n", + "\n", + "- ensure correctness\n", + "- maintain correctness\n", + "- find bugs earlier and more easily\n", + "- allow refactoring without fear\n", + "- allow others to contribute without unknowingly breaking stuff\n", + "- can complement documentation as examples of use\n", + "- give others confidence in your code" + ] + }, + { + "cell_type": "markdown", + "id": "2", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "# pytest installation\n", + "\n", + "- Conda: `conda install pytest`\n", + "- Pip: `python -m pip install pytest`" + ] + }, + { + "cell_type": "markdown", + "id": "3", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# First steps\n", + "\n", + "## Create a test\n", + "\n", + "1. create a file that starts with `test_`, e.g. `test_math.py`\n", + "2. add a function to it that starts with `test_` and asserts things, e.g.\n", + "```python\n", + "# in file: test_math.py\n", + "def test_add():\n", + " assert 1 + 1 == 2\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "4", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Run the tests\n", + "\n", + "3. 
run `pytest -v` or `python -m pytest -v`\n", + "\n", + "```bash\n", + "================ test session starts =======================\n", + "platform linux -- Python 3.10.2, pytest-7.0.0, pluggy-1.0.0\n", + "rootdir: /home/liam/test\n", + "plugins: anyio-3.5.0\n", + "collected 1 item \n", + "\n", + "test_math.py::test_add PASSED \n", + " [100%]\n", + "\n", + "====================== 1 passed in 0.00s ====================\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "5", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## What just happened?\n", + "\n", + "- pytest looks for all files that start with `test_`\n", + "- it collects all functions in these files that start with `test_`\n", + "- it runs them all, and reports PASS/FAIL for each test function (a test fails at its first failing assertion)\n", + "\n", + "## Some things we didn't do\n", + "\n", + "- import a test library\n", + "- inherit from some base test class\n", + "- use some special `assertEqual` function\n", + "- register our test file or test cases" + ] + }, + { + "cell_type": "markdown", + "id": "6", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Simple pytest test strategy\n", + "\n", + "- for each file `abc.py`, add a `test_abc.py`\n", + "- for each function `foo()` in `abc.py`, add a `test_foo()` to `test_abc.py`\n", + "- inside `test_foo()` assert things involving `foo()` that should be true \n", + "- that's it" + ] + }, + { + "cell_type": "markdown", + "id": "7", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# ipytest" + ] + }, + { + "cell_type": "markdown", + "id": "8", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "## disclaimer\n", + "\n", + "- For demonstration purposes, these tests will be written inside this Jupyter notebook\n", + "- We'll use a helper library `ipytest` to call pytest on them from inside the notebook\n", + "- But generally I would recommend 
putting functions and tests into files and running pytest directly" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "if \"google.colab\" in str(get_ipython()):\n", + " !pip install ipytest -qqq\n", + "import ipytest\n", + "import pytest\n", + "\n", + "ipytest.autoconfig()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "10", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "%%ipytest -vv\n", + "\n", + "\n", + "def test_math():\n", + " assert 1 + 1 == 2" + ] + }, + { + "cell_type": "markdown", + "id": "11", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Failing tests" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "12", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "%%ipytest -q\n", + "\n", + "\n", + "def f(x):\n", + " return 2 * x\n", + "\n", + "\n", + "def g(x):\n", + " return f(x) + 3\n", + "\n", + "\n", + "def test_math():\n", + " a = 2\n", + " b = 3\n", + " assert f(a) * g(b) == 36" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "13", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "%%ipytest -q\n", + "\n", + "\n", + "def test_list():\n", + " a = [1, 2, 5, 8]\n", + " b = [1, 2, 5, 8]\n", + " assert a == b" + ] + }, + { + "cell_type": "markdown", + "id": "14", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Exceptions" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "15", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "%%ipytest -q\n", + "\n", + "\n", + "def test_exception():\n", + " my_list = [1, 2, 3]\n", + " with pytest.raises(IndexError):\n", + " 
my_list[5]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "16", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "%%ipytest -q\n", + "\n", + "\n", + "def test_exception():\n", + " my_list = [1, 2, 3]\n", + " with pytest.raises(Exception) as e:\n", + " my_list[5]\n", + " assert e.type == IndexError\n", + " assert \"out of range\" in str(e.value)" + ] + }, + { + "cell_type": "markdown", + "id": "17", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Temporary files\n", + "\n", + "- often need to write to a temporary file in a test\n", + "- simply add `tmp_path` as an argument to your test function\n", + "- pytest will provide a unique temporary path object for each test\n", + "- this is an example of a *fixture*" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "18", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "%%ipytest -qs\n", + "\n", + "\n", + "def test_write(tmp_path):\n", + " print(tmp_path)\n", + " assert str(tmp_path) != \"\"" + ] + }, + { + "cell_type": "markdown", + "id": "19", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Monkey-patching\n", + "\n", + "- a fixture to temporarily modify an object, dict or environment variable\n", + "- all modifications are undone after the test is finished\n", + "- add `monkeypatch` as an argument to your test function\n", + "- provides various methods, e.g.\n", + " - `monkeypatch.setattr(obj, name, value)`\n", + " - `monkeypatch.setenv(name, value)`\n", + " - `monkeypatch.syspath_prepend(path)`\n", + " - `monkeypatch.chdir(path)`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "20", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "%%ipytest -qs\n", + "\n", + "import os\n", + "\n", + "\n", + "def 
test_env(monkeypatch):\n", + " assert os.getenv(\"TEST_API_KEY\") == None\n", + " monkeypatch.setenv(\"TEST_API_KEY\", \"abc123\")\n", + " assert os.getenv(\"TEST_API_KEY\") == \"abc123\"" + ] + }, + { + "cell_type": "markdown", + "id": "21", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Fixtures\n", + "\n", + "- a way to provide context (e.g. data or environment) to a test\n", + "- test \"requests\" a fixture by declaring it as an argument\n", + "- various built-in fixtures (`tmp_path`, `monkeypatch`, ...)\n", + "- you can create your own with the `@pytest.fixture` decorator\n", + "- for each test function argument, pytest looks for a fixture with the same name\n", + "- fixtures can themselves request other fixtures" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "22", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "%%ipytest -vv\n", + "\n", + "# a fixture to provide some data to a test\n", + "@pytest.fixture\n", + "def colours():\n", + " return [\"red\", \"green\", \"blue\"]\n", + "\n", + "\n", + "def test_colours(colours):\n", + " assert colours[0] == \"red\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "23", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "%%ipytest -vv\n", + "\n", + "\n", + "@pytest.fixture\n", + "def colours():\n", + " return [\"red\", \"green\", \"blue\"]\n", + "\n", + "\n", + "# a fixture that itself requests another fixture\n", + "@pytest.fixture\n", + "def sorted_colours(colours):\n", + " return sorted(colours)\n", + "\n", + "\n", + "def test_colours(sorted_colours):\n", + " assert sorted_colours[0] == \"blue\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "24", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "%%ipytest -vv\n", + "\n", + "# a fixture that uses 
monkeypatch to set an environment variable\n", + "@pytest.fixture\n", + "def api_key(monkeypatch):\n", + " monkeypatch.setenv(\"TEST_API_KEY\", \"abc123\")\n", + "\n", + "\n", + "def test_missing_api_key():\n", + " assert os.getenv(\"TEST_API_KEY\") == None\n", + "\n", + "\n", + "def test_api_key(api_key):\n", + " assert os.getenv(\"TEST_API_KEY\") == \"abc123\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "25", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "%%ipytest -vv\n", + "\n", + "# a parameterized fixture: test will be repeated for each parameter\n", + "@pytest.fixture(params=[\"red\", \"green\", \"blue\", \"yellow\"])\n", + "def colour(request):\n", + " return request.param\n", + "\n", + "\n", + "def test_colour(colour):\n", + " assert len(colour) >= 3" + ] + }, + { + "cell_type": "markdown", + "id": "26", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Test grouping\n", + "\n", + "- put functions into a class whose name begins with `Test`\n", + "- a class can request a fixture, all member functions then have this fixture" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "27", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "%%ipytest -vv\n", + "\n", + "\n", + "class TestMath:\n", + " def test_add(self):\n", + " assert 1 + 1 == 2\n", + "\n", + " def test_mul(self):\n", + " assert 2 * 2 == 4" + ] + }, + { + "cell_type": "markdown", + "id": "28", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Marking tests\n", + "\n", + "- mark tests with attributes using `@pytest.mark` decorator\n", + "- common use cases\n", + " - `skipif` to conditionally skip a test\n", + " - e.g. depending on python version or platform\n", + " - `xfail` to mark a test that is expected to fail\n", + " - e.g. 
a test that documents a known bug that is not yet fixed\n", + "- can also mark a test class to mark all tests within that class\n", + "- can also create your own custom markers" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "29", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "%%ipytest -vv\n", + "\n", + "import sys\n", + "\n", + "\n", + "@pytest.mark.xfail(reason=\"bug from issue #123\")\n", + "def test_add():\n", + "    assert 1 + 1 == 3\n", + "\n", + "\n", + "@pytest.mark.skipif(not sys.platform.startswith(\"win\"), reason=\"windows only test\")\n", + "def test_mul():\n", + "    assert 2 * 2 == 4" + ] + }, + { + "cell_type": "markdown", + "id": "30", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Parameterizing tests\n", + "\n", + "- can parameterize tests using the `@pytest.mark.parametrize` decorator\n", + "- takes a comma-delimited list of argument names as a string\n", + "- followed by a list of tuples of argument values" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "31", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "%%ipytest -vv\n", + "\n", + "\n", + "@pytest.mark.parametrize(\"n\", [1, 2, 3])\n", + "def test_n(n):\n", + "    assert n > 0" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "32", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "%%ipytest -vv\n", + "\n", + "\n", + "@pytest.mark.parametrize(\"n,n_squared\", [(1, 1), (2, 4), (3, 9)])\n", + "def test_n(n, n_squared):\n", + "    assert n * n == n_squared" + ] + }, + { + "cell_type": "markdown", + "id": "33", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Summary\n", + "\n", + "- pytest is very easy to get started with and use\n", + "- just writing test functions with assertions already provides a lot of 
value\n", + "- fixtures allow you to provide context to your test functions\n", + "- parameterizing fixtures and/or tests can turn a single test into many test cases\n", + "- (many) more features at [docs.pytest.org](https://docs.pytest.org/)" + ] + } + ], + "metadata": { + "celltoolbar": "Slideshow", + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.4" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/lunchtime4/lunchtime4.slides.html b/lunchtime4/lunchtime4.slides.html new file mode 100644 index 0000000..56fea87 --- /dev/null +++ b/lunchtime4/lunchtime4.slides.html @@ -0,0 +1,16431 @@ + + + + + + + + + +lunchtime4 slides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
diff --git a/lunchtime5/README.md b/lunchtime5/README.md new file mode 100644 index 0000000..a71eac0 --- /dev/null +++ b/lunchtime5/README.md @@ -0,0 +1,5 @@ +# Lunchtime 5: pillow (March 25th) + +[Pillow](https://pillow.readthedocs.io/en/stable/) is the friendly fork of the +Python Imaging Library (PIL). It provides extensive file format support, +an efficient internal representation, and fairly powerful image processing capabilities. diff --git a/lunchtime5/final.jpg b/lunchtime5/final.jpg new file mode 100644 index 0000000..683f129 Binary files /dev/null and b/lunchtime5/final.jpg differ diff --git a/lunchtime5/lunchtime5.ipynb b/lunchtime5/lunchtime5.ipynb new file mode 100644 index 0000000..8e3ffaa --- /dev/null +++ b/lunchtime5/lunchtime5.ipynb @@ -0,0 +1,584 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "0", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Lunchtime #5: Pillow" + ] + }, + { + "cell_type": "markdown", + "id": "1", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "\n", + "## Why use Python for image processing?\n", + "\n", + "* Easy automation of image processing (compared to GUIs)\n", + "* High-level interfaces usable without very deep knowledge of image formats\n", + "* Easy integration with other Python-based tools, e.g. for\n", + "  * Image Analysis\n", + "  * Web Scraping\n", + "  * ML/AI" + ] + }, + { + "cell_type": "markdown", + "id": "2", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Pillow is *the friendly fork* of PIL, the Python Imaging Library:\n", + "\n", + "* PIL is/was the state-of-the-art library for image processing in Python\n", + "* Had several issues caused by the project maintenance:\n", + "  * Compatibility issues with standard installation procedures\n", + "  * Missing open community work for issues, contributions etc.\n", + "  * Sustainability issues due to missing 
Continuous Integration\n", + "* Pillow has stepped in, PIL had its last release in 2009" + ] + }, + { + "cell_type": "markdown", + "id": "3", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## The Basics: Loading an image\n", + "We import `Image` from the `PIL` package:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "from PIL import Image" + ] + }, + { + "cell_type": "markdown", + "id": "5", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "We can open an image from disk by using `open`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "if \"google.colab\" in str(get_ipython()):\n", + " !wget https://ssciwr.github.io/lunch-time-python/lunchtime5/thingstaette.png -q\n", + "img = Image.open(\"thingstaette.png\")" + ] + }, + { + "cell_type": "markdown", + "id": "7", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "The image is represented using the `Image` class from PIL (or one of its specialized subclasses). Images can be created by loading from file, from other images or programmatically." 
+ ] + }, + { + "cell_type": "markdown", + "id": "8", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "The `img` object can be queried for a number of metadata fields of the image:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "img.format" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "10", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "img.size" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "11", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "img.width, img.height" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "12", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "img.mode" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "13", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "img.getbands()" + ] + }, + { + "cell_type": "markdown", + "id": "14", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Visualizing images in Jupyter" + ] + }, + { + "cell_type": "markdown", + "id": "15", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "To display the image directly in Jupyter notebooks, we can use the `IPython`'s rich display system. Alternatively, `img.show()` can open the image in an external viewer." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "16", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "img" + ] + }, + { + "cell_type": "markdown", + "id": "17", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Modifying images\n", + "\n", + "Image modifications operate on one image and return a new image which is a copy of the original with the applied modifications. This is common (good) practice in object-oriented programming." + ] + }, + { + "cell_type": "markdown", + "id": "18", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Cropping" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "19", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "cropped = img.crop([330, 100, 650, 550])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "20", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "cropped" + ] + }, + { + "cell_type": "markdown", + "id": "21", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Resizing" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "22", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "resized = cropped.reduce(2)\n", + "# resized = cropped.resize((150, 100))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "23", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "resized" + ] + }, + { + "cell_type": "markdown", + "id": "24", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Transforming" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "25", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } 
+ }, + "outputs": [], + "source": [ + "rotated = resized.rotate(180)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "26", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "rotated" + ] + }, + { + "cell_type": "markdown", + "id": "27", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Applying filters" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "28", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "from PIL import ImageFilter" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "29", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "blurred = rotated.filter(ImageFilter.BLUR)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "30", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "blurred" + ] + }, + { + "cell_type": "markdown", + "id": "31", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Merging\n", + "\n", + "Merging is done as an in-place operation on the `Image` object:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "32", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "img.paste(rotated, (100, 100))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "33", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "img" + ] + }, + { + "cell_type": "markdown", + "id": "34", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Saving Images\n", + "\n", + "After successful transformation, we can save the result.\n", + "\n", + "* Output format deduced from given file extension\n", + "* Alternatively passed explicitly\n", 
+ "* Format conversion implemented by `PIL`\n", + "* Some formats require certain conditions on the data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "35", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "converted = img.convert(\"RGB\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "36", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "converted.getbands()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "37", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "converted.save(\"final.jpg\")" + ] + }, + { + "cell_type": "markdown", + "id": "38", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Further information\n", + "\n", + "Pillow has much more functionality than shown here today, check the examples and references in its documentation:\n", + "\n", + "[https://pillow.readthedocs.io](https://pillow.readthedocs.io)\n" + ] + }, + { + "cell_type": "markdown", + "id": "39", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "Thanks for joining!" 
+ ] + } + ], + "metadata": { + "celltoolbar": "Slideshow", + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.4" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/lunchtime5/lunchtime5.slides.html b/lunchtime5/lunchtime5.slides.html new file mode 100644 index 0000000..8bcdcab --- /dev/null +++ b/lunchtime5/lunchtime5.slides.html @@ -0,0 +1,16194 @@ + + + + + + + + + +lunchtime5 slides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
diff --git a/lunchtime5/thingstaette.png b/lunchtime5/thingstaette.png new file mode 100644 index 0000000..06eea95 Binary files /dev/null and b/lunchtime5/thingstaette.png differ diff --git a/lunchtime6/README.md b/lunchtime6/README.md new file mode 100644 index 0000000..963b167 --- /dev/null +++ b/lunchtime6/README.md @@ -0,0 +1,6 @@ +# Lunchtime #6: numba + +[numba](https://numba.pydata.org/) is a just-in-time (JIT) compiler for Python. +With a few simple annotations, array-oriented and math-heavy Python code can be +just-in-time optimized to performance similar to C, C++ and Fortran, +without having to switch languages or Python interpreters. diff --git a/lunchtime6/build/lib.linux-x86_64-cpython-310/pybind11_7993cdc.cpython-310-x86_64-linux-gnu.so b/lunchtime6/build/lib.linux-x86_64-cpython-310/pybind11_7993cdc.cpython-310-x86_64-linux-gnu.so new file mode 100755 index 0000000..0e9046c Binary files /dev/null and b/lunchtime6/build/lib.linux-x86_64-cpython-310/pybind11_7993cdc.cpython-310-x86_64-linux-gnu.so differ diff --git a/lunchtime6/lunchtime6.ipynb b/lunchtime6/lunchtime6.ipynb new file mode 100644 index 0000000..0264471 --- /dev/null +++ b/lunchtime6/lunchtime6.ipynb @@ -0,0 +1,865 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "0", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Lunch Time Python\n", + "\n", + "## Lunch 6: numba\n", + "\n", + "\n", + "\n", + "[numba](https://numba.pydata.org/) is a just-in-time (JIT) compiler for Python. 
With a few simple annotations, array-oriented and math-heavy Python code can be just-in-time optimized to performance similar to C, C++ and Fortran, without having to switch languages or Python interpreters.\n", + "\n", + "*Press `Spacebar` to go to the next slide (or `?` to see all navigation shortcuts)*\n", + "\n", + "[Lunch Time Python](https://ssciwr.github.io/lunch-time-python/), [Scientific Software Center](https://ssc.iwr.uni-heidelberg.de), [Heidelberg University](https://www.uni-heidelberg.de/)" + ] + }, + { + "cell_type": "markdown", + "id": "1", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Motivation\n", + "\n", + "- Many reasons to use Python, but performance is not one of them\n", + "- What to do when a Python function is too slow?\n", + "- Ideally, find a library (e.g. numpy) with an equivalent function\n", + "- Otherwise:\n", + "  - use PyPy instead of CPython (if all your libraries are available)\n", + "  - write a Fortran function and compile with f2py or fortranmagic\n", + "  - write a C function and compile with Cython\n", + "  - write a C++ function and compile using pybind11 or ipybind\n", + "  - magically make your slow Python function faster (numba)" + ] + }, + { + "cell_type": "markdown", + "id": "2", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "# numba installation\n", + "\n", + "- Conda: `conda install numba`\n", + "- Pip: `python -m pip install numba`" + ] + }, + { + "cell_type": "markdown", + "id": "3", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "# Vector reduction example\n", + "\n", + "Toy example: implement a vector reduction operation:\n", + "\n", + "$r(x,y) = \\sum_i \\cos(x_i) \\sin(y_i)$\n", + "\n", + "Some random vectors to benchmark our functions:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + 
"source": [ + "import numpy as np\n", + "\n", + "x = np.random.uniform(low=-1, high=1, size=5000000)\n", + "y = np.random.uniform(low=-1, high=1, size=5000000)" + ] + }, + { + "cell_type": "markdown", + "id": "5", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "# Python" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "import math\n", + "\n", + "\n", + "def r_python(x_vec, y_vec):\n", + " s = 0\n", + " for x, y in zip(x_vec, y_vec):\n", + " s += math.cos(x) * math.sin(y)\n", + " return s" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "r_python(x, y)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "%timeit r_python(x,y)" + ] + }, + { + "cell_type": "markdown", + "id": "9", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "# numpy" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "10", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "def r_numpy(x_vec, y_vec):\n", + " return np.dot(np.cos(x_vec), np.sin(y_vec))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "11", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "r_numpy(x, y)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "12", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "%timeit r_numpy(x,y)" + ] + }, + { + "cell_type": "markdown", + "id": "13", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": 
[ + "# Cython" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "14", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "# pip install cython\n", + "%load_ext cython" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "15", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "%%cython\n", + "\n", + "import math\n", + "\n", + "def r_cython(x_vec, y_vec):\n", + " s = 0\n", + " for x,y in zip(x_vec, y_vec):\n", + " s += math.cos(x) * math.sin(y)\n", + " return s" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "16", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "r_cython(x, y)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "17", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "%timeit r_cython(x,y)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "18", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "%%cython\n", + "\n", + "import math\n", + "# use C math functions\n", + "from libc.math cimport sin, cos\n", + "\n", + "# use C types instead of Python types\n", + "def r_cython(double[:] x_vec, double[:] y_vec):\n", + " cdef double s = 0\n", + " cdef int i\n", + " for i in range(len(x_vec)):\n", + " s += cos(x_vec[i])*sin(y_vec[i])\n", + " return s" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "19", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "r_cython(x, y)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "20", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "%timeit r_cython(x,y)" + ] + }, + { + "cell_type": "markdown", + "id": "21", + 
"metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "# Fortran" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "22", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "if \"google.colab\" in str(get_ipython()):\n", + " !pip install fortran-magic -qqq\n", + "%load_ext fortranmagic" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "23", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "%%fortran\n", + "\n", + "subroutine r_fortran(x_vec, y_vec, res)\n", + " real, intent(in) :: x_vec(:), y_vec(:)\n", + " real, intent(out) :: res\n", + " integer :: i, n\n", + " n = size(x_vec)\n", + " res = 0\n", + " do i=1,n\n", + " res = res + cos(x_vec(i))*sin(y_vec(i))\n", + " enddo\n", + "endsubroutine r_fortran" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "24", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "r_fortran(x, y)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "25", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "%timeit r_fortran(x,y)" + ] + }, + { + "cell_type": "markdown", + "id": "26", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "# C++ / pybind11" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "27", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "if \"google.colab\" in str(get_ipython()):\n", + " !pip install git+https://github.com/aldanor/ipybind.git -qqq\n", + "%load_ext ipybind" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "28", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "%%pybind11\n", + "\n", + "#include \n", + "#include 
\n", + "PYBIND11_PLUGIN(example) {\n", + " py::module m(\"example\");\n", + " m.def(\"r_pybind\", [](const py::array_t& x, const py::array_t& y) {\n", + " double sum{0};\n", + " auto rx{x.unchecked<1>()};\n", + " auto ry{y.unchecked<1>()};\n", + " for (py::ssize_t i = 0; i < rx.shape(0); i++){\n", + " sum += std::cos(rx[i])*std::sin(ry[i]);\n", + " }\n", + " return sum;\n", + " });\n", + " return m.ptr();\n", + "}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "29", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "r_pybind(x, y)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "30", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "%timeit r_pybind(x, y)" + ] + }, + { + "cell_type": "markdown", + "id": "31", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "# numba" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "32", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "from numba import jit\n", + "\n", + "\n", + "@jit\n", + "def r_numba(x_vec, y_vec):\n", + " s = 0\n", + " for x, y in zip(x_vec, y_vec):\n", + " s += math.cos(x) * math.sin(y)\n", + " return s" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "33", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "r_numba(x, y)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "34", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "# pure python with numba JIT\n", + "%timeit r_numba(x,y)" + ] + }, + { + "cell_type": "markdown", + "id": "35", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Numba compilation\n", + "\n", + "Two compilation modes\n", + "\n", + "- `nopython` 
mode (default)\n", + " - Fast because it doesn't access the Python C API\n", + " - Needs to be able to infer the native (C) types of all values\n", + "- `object` mode (fallback)\n", + " - Slow because it uses Python objects and the Python C API\n", + " - Only used if `nopython` mode is not possible\n", + " - To raise an error instead of falling back to this, set `nopython=True` or use `@njit`" + ] + }, + { + "cell_type": "markdown", + "id": "36", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Numba function signatures\n", + "\n", + "You can optionally explicitly specify the function signature. Use cases:\n", + "\n", + "- you want the function to be compiled when it is defined rather than when it is first called\n", + "- you need fine-grained control over types (e.g. if you want 32-bit floats)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "37", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "from numba import float32\n", + "\n", + "\n", + "@jit(float32(float32, float32))\n", + "def sum(a, b):\n", + " return a + b" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "38", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "sum(1, 0.99999999)" + ] + }, + { + "cell_type": "markdown", + "id": "39", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Numba options\n", + "\n", + "- `nopython=True` disable Object mode fallback\n", + "- `nogil=True` release the Python Global Interpreter Lock (GIL)\n", + "- `cache=True` cache the compiled functions on disk\n", + "- `parallel=True` enable automatic parallelization" + ] + }, + { + "cell_type": "markdown", + "id": "40", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "# Parallelization\n", + "\n", + "- set `parallel=True` option to enable\n", + "- use `prange` to 
explicitly parallelize a loop over a `range`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "41", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "from numba import jit, prange\n", + "\n", + "\n", + "@jit(parallel=True)\n", + "def r_numba(x_vec, y_vec):\n", + " s = 0\n", + " for i in prange(len(x_vec)):\n", + " s += math.cos(x_vec[i]) * math.sin(y_vec[i])\n", + " return s" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "42", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "r_numba(x, y)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "43", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "%timeit r_numba(x,y)" + ] + }, + { + "cell_type": "markdown", + "id": "44", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "# NumPy universal functions\n", + "\n", + "- a numpy `ufunc` applies a function written for scalars element-wise to arrays\n", + "- you can create one using `@numba.vectorize` and use it like the built-in numpy ufuncs" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "45", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "from numba import vectorize, float64\n", + "\n", + "\n", + "@vectorize([float64(float64, float64)], target=\"parallel\")\n", + "def r(x, y):\n", + " return np.cos(x) * np.sin(y)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "46", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "r(2, 3)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "47", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "r(x, y)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "48", + "metadata": 
{ + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "np.sum(r(x, y))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "49", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "%timeit np.sum(r(x,y))" + ] + }, + { + "cell_type": "markdown", + "id": "50", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Advanced features\n", + "\n", + "- Ahead of Time (AoT) compilation\n", + " - the compiled module only depends on NumPy\n", + "- Flexible specializations\n", + " - `@generated_jit` decorator for compile-time logic, e.g. type specializations\n", + "- Stencil\n", + " - `@stencil` decorator for creating a stencil to apply to an array\n", + "- C callbacks\n", + " - `@cfunc` decorator to generate a C-callback (e.g. to pass to scipy.integrate)\n", + "- CUDA support\n", + " - compile CUDA kernels to run on a GPU\n", + "- see [numba.readthedocs.io](https://numba.readthedocs.io/) for more" + ] + } + ], + "metadata": { + "celltoolbar": "Slideshow", + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.4" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/lunchtime6/lunchtime6.slides.html b/lunchtime6/lunchtime6.slides.html new file mode 100644 index 0000000..2ac998d --- /dev/null +++ b/lunchtime6/lunchtime6.slides.html @@ -0,0 +1,16767 @@ + + + + + + + + + +lunchtime6 slides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + diff --git a/lunchtime7/lunchtime7.ipynb b/lunchtime7/lunchtime7.ipynb new file mode 100644 index 0000000..4cf5f66 --- /dev/null +++ b/lunchtime7/lunchtime7.ipynb @@ -0,0 +1,836 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Lunch Time Python\n", + "\n", + "## Lunch 7: matplotlib\n", + "\n", + "\n", + "\n", + "[matplotlib](https://matplotlib.org/) is a plotting library for Python and the NumPy library. It is easy to use and can be used to generate scatter or bar plots, density maps and even 3D plots in publication quality.\n", + "\n", + "*Press `Spacebar` to go to the next slide (or `?` to see all navigation shortcuts)*\n", + "\n", + "[Lunch Time Python](https://ssciwr.github.io/lunch-time-python/), [Scientific Software Center](https://ssc.iwr.uni-heidelberg.de), [Heidelberg University](https://www.uni-heidelberg.de/)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Advantages of matplotlib:\n", + "\n", + "- matplotlib is mostly used in conjunction with [pyplot](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.html) - a matplotlib module - that provides an easy-to-use interface\n", + "- Very versatile, works with many types of data\n", + "- Directly plot NumPy functions and arrays\n", + "- Many export options\n", + "- Customizable \n", + "- Can be used with additional toolkits that extend the functionality, like [seaborn](https://seaborn.pydata.org/)\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## matplotlib installation\n", + "\n", + "Available via pip: \n", + "`python -m pip install -U matplotlib` \n", + "Or install via conda: \n", + "`conda install matplotlib`" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + 
"source": [ + "# Basic plots\n", + "Plot the sin and cos over a range of angles. For this, we also need numpy." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "xvals = np.arange(0, 2 * np.pi, 0.1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "plt.plot(xvals, np.sin(xvals))\n", + "plt.plot(xvals, np.cos(xvals))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "The default settions already look quite nice!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Running inside a script\n", + "Or more generally, if you want to suppress the output (return) of the `plt()` function." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "plt.plot(xvals, np.sin(xvals))\n", + "plt.plot(xvals, np.cos(xvals))\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "`plt.show()` closes the plot; if you plot using a script and not a notebook, place one `plt.show()` command at the end of your script." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Running inside a notebook: static images\n", + "You can plot static images inside your notebook using the `%matplotlib inline` magic. You only need to run this once. It is not always necessary to include it, but it makes clear which Matplotlib backend should be used." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "%matplotlib inline" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "plt.plot(xvals, np.sin(xvals))\n", + "plt.plot(xvals, np.cos(xvals))\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Scatter plot" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "xvals = np.random.randint(10, size=10)\n", + "yvals = np.random.randint(10, size=10)\n", + "plt.scatter(xvals, yvals)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Bar plot" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "xvals = np.linspace(1, 10, 10)\n", + "yvals = np.random.randint(10, size=10)\n", + "plt.bar(xvals, yvals)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Customizing plots" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + 
"outputs": [], + "source": [ + "plt.bar(xvals, yvals, label=\"Random series\")\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Customizing plots" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "plt.bar(xvals, yvals, label=\"Random series\")\n", + "plt.legend(fontsize=16, loc=\"upper right\")\n", + "plt.xlabel(\"Integer\", fontsize=18)\n", + "plt.ylabel(\"Magnitude\", fontsize=22, color=\"red\")\n", + "plt.title(\"My custom plot\", fontsize=22)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Customizing plots" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "xvals = np.arange(0, 2 * np.pi, 0.1)\n", + "plt.plot(xvals, np.sin(xvals), marker=\"x\", markevery=10, color=\"blue\")\n", + "plt.plot(xvals, np.cos(xvals), marker=\"<\", color=\"black\", alpha=0.5)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "You can set x- and y-limits, annotate points on the plot, change the axis position and intersection, add a grid in the background, ... \n", + "A list of available markers is found [here](https://matplotlib.org/2.0.2/api/markers_api.html) and named colors [here](https://matplotlib.org/stable/gallery/color/named_colors.html)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Figure class: Advanced plots with the artist layer\n", + "A `Figure` in matplotlib is the whole plot (or window in the user interface) and can contain multiple plots. 
By accessing the **Artist layer** (\"object-based plotting\"), you get more customization options than with the basic `plt.xxx` **Scripting layer** (\"procedural plotting\") (see https://matplotlib.org/1.5.1/faq/usage_faq.html#parts-of-a-figure).\n", + "\n", + "This also allows you to include multiple plots in one Figure.\n", + "\n", + "" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "fig = plt.figure(figsize=(12, 10)) # width = 12 inches and height = 10 inches\n", + "# create one axes object\n", + "ax1 = fig.add_subplot(211) # (2, 1, 1): number of rows, number of columns, index of this subplot\n", + "ax1.plot(xvals, np.sin(xvals), marker=\"x\", color=\"blue\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "It is also possible to use `add_axes()` instead of `add_subplot()`, but this is not recommended: with `add_subplot()`, matplotlib takes care of the exact position of the axes in the figure." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Several subplots\n", + "Using `add_subplot()`, we can add one `axes` object at a time:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "fig = plt.figure(figsize=(12, 10))\n", + "ax1 = fig.add_subplot(221)\n", + "ax1.plot(xvals, np.sin(xvals), marker=\"x\", color=\"blue\")\n", + "ax2 = fig.add_subplot(222)\n", + "ax2.plot(xvals, np.cos(xvals), marker=\"x\", color=\"blue\")\n", + "ax3 = fig.add_subplot(223)\n", + "ax3.plot(xvals, np.tan(xvals), marker=\"x\", color=\"blue\")\n", + "ax4 = fig.add_subplot(224)\n", + "ax4.plot(xvals, np.tanh(xvals), marker=\"x\", color=\"blue\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## A more practical way to add multiple subplots" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "fig, ax = plt.subplots(figsize=(12, 10), nrows=2, ncols=2)\n", + "ax[0, 0].plot(xvals, np.sin(xvals), marker=\"x\", color=\"blue\")\n", + "ax[0, 1].plot(xvals, np.cos(xvals), marker=\"x\", color=\"blue\")\n", + "ax[1, 0].plot(xvals, np.tan(xvals), marker=\"x\", color=\"blue\")\n", + "ax[1, 1].plot(xvals, np.tanh(xvals), marker=\"x\", color=\"blue\")\n", + "plt.savefig(\"my_figure.pdf\", bbox_inches=\"tight\")\n", + "plt.savefig(\"my_figure.jpg\", dpi=300, bbox_inches=\"tight\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# 2D plots" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "def 
myfunc(x, y):\n", + " return np.sin(np.sqrt(5) + x) * y\n", + "\n", + "\n", + "mf = 16\n", + "# the x and y values need to be spanned on a 2D mesh\n", + "num1 = np.arange(-5, 5, 0.1)\n", + "num2 = np.arange(-5, 5, 0.1)\n", + "X, Y = np.meshgrid(num1, num2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "print(X[0])\n", + "print(Y[0])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "fig, ax = plt.subplots(figsize=(15, 5), nrows=1, ncols=3)\n", + "ax[0].pcolor(X, Y, myfunc(X, Y), cmap=\"viridis\")\n", + "ax[0].set_title(\"pcolor\", fontsize=mf)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "`pcolor` is the slowest plotting method of the three, but is more flexible in terms of the data mesh." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "fig, ax = plt.subplots(figsize=(15, 5), nrows=1, ncols=3)\n", + "\n", + "ax[0].pcolor(X, Y, myfunc(X, Y), cmap=\"viridis\")\n", + "ax[0].set_title(\"pcolor\", fontsize=mf)\n", + "\n", + "ax[1].pcolormesh(X, Y, myfunc(X, Y), cmap=\"viridis\")\n", + "ax[1].set_title(\"pcolormesh\", fontsize=mf)\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "`pcolormesh` is basically identical to `pcolor` but faster." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "fig, ax = plt.subplots(figsize=(15, 5), nrows=1, ncols=3)\n", + "\n", + "ax[0].pcolor(X, Y, myfunc(X, Y), cmap=\"viridis\")\n", + "ax[0].set_title(\"pcolor\", fontsize=mf)\n", + "\n", + "ax[1].pcolormesh(X, Y, myfunc(X, Y), cmap=\"viridis\")\n", + "ax[1].set_title(\"pcolormesh\", fontsize=mf)\n", + "\n", + "ax[2].imshow(myfunc(X, Y))\n", + "ax[2].set_title(\"imshow\", fontsize=mf)\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "`imshow` is the fastest of the three methods. Note that the image is flipped compared to `pcolor` and `pcolormesh` and that the axis range is specified differently." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "fig, ax = plt.subplots(figsize=(15, 5), nrows=1, ncols=3)\n", + "\n", + "ax[0].pcolor(X, Y, myfunc(X, Y), cmap=\"viridis\")\n", + "ax[0].set_title(\"pcolor\", fontsize=mf)\n", + "\n", + "ax[1].pcolormesh(X, Y, myfunc(X, Y), cmap=\"viridis\")\n", + "ax[1].set_title(\"pcolormesh\", fontsize=mf)\n", + "\n", + "ax[2].imshow(myfunc(X, Y), extent=[-5, 5, -5, 5], origin=\"lower\", aspect=\"auto\")\n", + "ax[2].set_title(\"imshow\", fontsize=mf)\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "Here, we have repositioned the origin of `imshow` to match `pcolor` and `pcolormesh`, and further adjusted the aspect ratio and tick labels." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "## Imshow with colorbar" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "fig, ax = plt.subplots(figsize=(5, 5))\n", + "\n", + "c = ax.imshow(myfunc(X, Y), extent=[-5, 5, -5, 5], origin=\"lower\", aspect=\"auto\")\n", + "cbar = plt.colorbar(c)\n", + "cbar.set_ticks([-5, -2.5, 0, 2.5, 5])\n", + "cbar.set_label(\"my colors\", rotation=90, fontsize=mf)\n", + "\n", + "ax.set_title(\"imshow\", fontsize=mf)\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# 3D plots\n", + "For 3D plots, you will need to import the `mplot3d` [module](https://matplotlib.org/stable/tutorials/toolkits/mplot3d.html). Then, when the axes are created, passing the `projection=\"3d\"` keyword creates 3D axes." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "from mpl_toolkits.mplot3d import Axes3D\n", + "from matplotlib import cm" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "fig = plt.figure(figsize=(12, 12))\n", + "# fig, ax = plt.subplots(figsize=(12,12),subplot_kw=dict(projection='3d'))\n", + "\n", + "ax = fig.add_subplot(projection=\"3d\")\n", + "ax.plot_surface(X, Y, myfunc(X, Y), cmap=cm.viridis)\n", + "\n", + "\n", + "ax.set_title(\"3D plot\", fontsize=mf)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "The 3D plots of matplotlib are quite powerful but have their limitations: when plotting more than one set of data (e.g. two surfaces in one figure), the result will likely be rendered incorrectly." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "fig = plt.figure(figsize=(12, 12))\n", + "\n", + "ax = fig.add_subplot(projection=\"3d\")\n", + "ax.plot_surface(X, Y, X + Y)\n", + "ax.plot_surface(X, Y, X + 0.1 * Y)\n", + "\n", + "\n", + "ax.set_title(\"3D plot\", fontsize=mf)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "source": [ + "For more advanced 3D plots, resort to [`Mayavi`](https://docs.enthought.com/mayavi/mayavi/)." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Interactive plots" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "%matplotlib notebook" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "fig, ax = plt.subplots(nrows=2, ncols=2)\n", + "ax[0, 0].plot(xvals, np.sin(xvals), marker=\"x\", color=\"blue\")\n", + "ax[0, 1].plot(xvals, np.cos(xvals), marker=\"x\", color=\"blue\")\n", + "ax[1, 0].plot(xvals, np.tan(xvals), marker=\"x\", color=\"blue\")\n", + "ax[1, 1].plot(xvals, np.tanh(xvals), marker=\"x\", color=\"blue\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Further references\n", + "\n", + "- Anatomy of matplotlib: https://nbviewer.org/github/matplotlib/AnatomyOfMatplotlib/tree/master/\n", + "- Matplotlib tutorial by Nicolas Rougier: https://github.com/rougier/matplotlib-tutorial\n", + "- Creating animations with Matplotlib: https://matplotlib.org/stable/api/animation_api.html" + ] + } + ], + "metadata": { + "celltoolbar": "Slideshow", + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/lunchtime7/lunchtime7.slides.html b/lunchtime7/lunchtime7.slides.html new file mode 100644 index 0000000..71450e3 --- /dev/null +++ b/lunchtime7/lunchtime7.slides.html @@ -0,0 +1,17752 @@ + + + + + + + + + 
+lunchtime7 slides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + diff --git a/lunchtime7/my_figure.jpg b/lunchtime7/my_figure.jpg new file mode 100644 index 0000000..ed7364e Binary files /dev/null and b/lunchtime7/my_figure.jpg differ diff --git a/lunchtime7/my_figure.pdf b/lunchtime7/my_figure.pdf new file mode 100644 index 0000000..7c8d8c0 Binary files /dev/null and b/lunchtime7/my_figure.pdf differ diff --git a/lunchtime8/README.md b/lunchtime8/README.md new file mode 100644 index 0000000..f2d6e85 --- /dev/null +++ b/lunchtime8/README.md @@ -0,0 +1,5 @@ +# Lunchtime 8: ipywidgets (July 29th) + +[ipywidgets](https://ipywidgets.readthedocs.io/) is a widget library that provides +interactive UI controls (*widgets*) to Jupyter notebooks. They allow the generation +of frontend controls in pure Python for both demonstrator notebooks and Python libraries. diff --git a/lunchtime8/lunchtime8.ipynb b/lunchtime8/lunchtime8.ipynb new file mode 100644 index 0000000..05bf9f8 --- /dev/null +++ b/lunchtime8/lunchtime8.ipynb @@ -0,0 +1,742 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "0", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Lunch Time Python #8: ipywidgets" + ] + }, + { + "cell_type": "markdown", + "id": "1", + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "source": [ + "*Jupyter Notebooks* are a perfect fit for scientific work with Python. They combine the following elements:\n", + "\n", + "* Code\n", + "* Documentation\n", + "* Visualization\n", + "* **UI Controls**" + ] + }, + { + "cell_type": "markdown", + "id": "2", + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "source": [ + "This allows us to write scientifically meaningful, executable documents that contain results, their interpretation and their provenance. They are a key element for reproducible research." + ] + }, + { + "cell_type": "markdown", + "id": "3", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## What are widgets?" 
+ ] + }, + { + "cell_type": "markdown", + "id": "4", + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "source": [ + "Jupyter has a so-called *rich display system*. If Python code returns an object, Jupyter accesses special methods on the object to decide how to display it. This can involve pretty printing, HTML, images, video, sounds etc:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5", + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [], + "source": [ + "from PIL import Image\n", + "from io import BytesIO\n", + "import requests" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6", + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [], + "source": [ + "response = requests.get(\n", + " \"https://ssciwr.github.io/lunch-time-python/lunchtime5/thingstaette.png\"\n", + ")\n", + "img = Image.open(BytesIO(response.content))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7", + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [], + "source": [ + "?img._repr_png_" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8", + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [], + "source": [ + "img" + ] + }, + { + "cell_type": "markdown", + "id": "9", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "`ipywidgets` provides a number of widgets that are Python objects that display as HTML. The interactive behaviour of this HTML snippet is implemented in JavaScript and uses callback functions in Python. This way, you write interactive notebooks with pure Python." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "10", + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [], + "source": [ + "import ipywidgets" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "11", + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [], + "source": [ + "button = ipywidgets.Button(description=\"Click Me!\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "12", + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [], + "source": [ + "button" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "13", + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [], + "source": [ + "def handler(change):\n", + " button.description = \"Thanks!\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "14", + "metadata": { + "slideshow": { + "slide_type": "-" + } + }, + "outputs": [], + "source": [ + "button.on_click(handler)" + ] + }, + { + "cell_type": "markdown", + "id": "15", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Input widgets (I)" + ] + }, + { + "cell_type": "markdown", + "id": "16", + "metadata": {}, + "source": [ + "We can create simple input fields that allow users to put in data. 
We can then read and write that data from Python:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "17", + "metadata": {}, + "outputs": [], + "source": [ + "widget = ipywidgets.Text()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "18", + "metadata": {}, + "outputs": [], + "source": [ + "widget" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "19", + "metadata": {}, + "outputs": [], + "source": [ + "widget.value" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "20", + "metadata": {}, + "outputs": [], + "source": [ + "widget.value = \"Test\"" + ] + }, + { + "cell_type": "markdown", + "id": "21", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Input widgets (II)" + ] + }, + { + "cell_type": "markdown", + "id": "22", + "metadata": {}, + "source": [ + "Many variants that work the same way exist (for a full list see [the docs](https://ipywidgets.readthedocs.io/en/latest/examples/Widget%20List.html)):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "23", + "metadata": {}, + "outputs": [], + "source": [ + "ipywidgets.FloatText(value=42.0, step=0.01)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "24", + "metadata": {}, + "outputs": [], + "source": [ + "ipywidgets.IntSlider(min=-10, max=10)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "25", + "metadata": {}, + "outputs": [], + "source": [ + "ipywidgets.Checkbox(value=True, description=\"Some Option\")" + ] + }, + { + "cell_type": "markdown", + "id": "26", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Selection widgets" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "27", + "metadata": {}, + "outputs": [], + "source": [ + "widget = ipywidgets.Dropdown(options=[\"Model A\", \"Model B\", \"Model C\"])" + ] + }, + { + "cell_type": "code", +
"execution_count": null, + "id": "28", + "metadata": {}, + "outputs": [], + "source": [ + "widget" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "29", + "metadata": {}, + "outputs": [], + "source": [ + "widget.value" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "30", + "metadata": {}, + "outputs": [], + "source": [ + "ipywidgets.RadioButtons(options=[\"Model A\", \"Model B\", \"Model C\"])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "31", + "metadata": {}, + "outputs": [], + "source": [ + "ipywidgets.Select(options=[\"Linux\", \"Windows\", \"macOS\"], description=\"OS:\")" + ] + }, + { + "cell_type": "markdown", + "id": "32", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Container widgets" + ] + }, + { + "cell_type": "markdown", + "id": "33", + "metadata": {}, + "source": [ + "If multiple widgets should be placed together, possibly applying some styling, they can be grouped into container widgets. 
In contrast to other widgets, these do not have an accessible `value`, but some have `selected_index`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "34", + "metadata": {}, + "outputs": [], + "source": [ + "widgets = [ipywidgets.Text(value=f\"#{i}\") for i in range(4)]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "35", + "metadata": {}, + "outputs": [], + "source": [ + "ipywidgets.HBox(children=widgets)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "36", + "metadata": {}, + "outputs": [], + "source": [ + "ipywidgets.VBox(children=widgets)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "37", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "ipywidgets.Accordion(children=widgets, titles=tuple(f\"Tab #{i}\" for i in range(4)))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "38", + "metadata": {}, + "outputs": [], + "source": [ + "tab = ipywidgets.Tab(children=widgets, titles=tuple(f\"Tab #{i}\" for i in range(4)))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "39", + "metadata": {}, + "outputs": [], + "source": [ + "tab" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "40", + "metadata": {}, + "outputs": [], + "source": [ + "tab.selected_index" + ] + }, + { + "cell_type": "markdown", + "id": "41", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Putting things together" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "42", + "metadata": {}, + "outputs": [], + "source": [ + "import io\n", + "\n", + "\n", + "def img_to_widget(i):\n", + " membuf = io.BytesIO()\n", + " i.save(membuf, format=\"png\")\n", + " return ipywidgets.Image(value=membuf.getvalue(), format=\"png\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "43", + "metadata": {}, + "outputs": [], + 
"source": [ + "img_widget = img_to_widget(img)\n", + "cropped_widget = img_to_widget(img)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "44", + "metadata": {}, + "outputs": [], + "source": [ + "x0 = ipywidgets.IntText(value=0, layout=ipywidgets.Layout(width=\"100px\"))\n", + "y0 = ipywidgets.IntText(value=0, layout=ipywidgets.Layout(width=\"100px\"))\n", + "x1 = ipywidgets.IntText(value=img.size[0], layout=ipywidgets.Layout(width=\"100px\"))\n", + "y1 = ipywidgets.IntText(value=img.size[1], layout=ipywidgets.Layout(width=\"100px\"))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "45", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "controls = ipywidgets.VBox(\n", + " children=[\n", + " ipywidgets.VBox(children=[ipywidgets.Label(\"Upper left:\"), x0, y0]),\n", + " ipywidgets.VBox(children=[ipywidgets.Label(\"Lower right:\"), x1, y1]),\n", + " ]\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "46", + "metadata": {}, + "outputs": [], + "source": [ + "def crop_handler(_):\n", + " cropped_widget.value = img_to_widget(\n", + " img.crop([x0.value, y0.value, x1.value, y1.value])\n", + " ).value\n", + "\n", + "\n", + "x0.observe(crop_handler, names=\"value\")\n", + "y0.observe(crop_handler, names=\"value\")\n", + "x1.observe(crop_handler, names=\"value\")\n", + "y1.observe(crop_handler, names=\"value\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "47", + "metadata": {}, + "outputs": [], + "source": [ + "app = ipywidgets.AppLayout(\n", + " left_sidebar=controls,\n", + " center=img_widget,\n", + " right_sidebar=cropped_widget,\n", + " pane_widths=(1, 2, 2),\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "48", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "app" + ] + }, + { + "cell_type": "markdown", + "id": "49", + 
"metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## A simple alternative - interact" + ] + }, + { + "cell_type": "markdown", + "id": "50", + "metadata": {}, + "source": [ + "`ipywidgets` contains a much simpler interface that automatically creates widgets for you. You simply need to annotate (\"decorate\") a function that does something and you will get a continuously updated interactive version:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "51", + "metadata": {}, + "outputs": [], + "source": [ + "@ipywidgets.interact(x=(0, 100), y=(0, 100))\n", + "def add(x, y):\n", + " return x + y" + ] + }, + { + "cell_type": "markdown", + "id": "52", + "metadata": {}, + "source": [ + "Notably, this does not change the function nature of `add`. It is merely displaying a UI as a side effect of the function definition:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "53", + "metadata": {}, + "outputs": [], + "source": [ + "?add" + ] + }, + { + "cell_type": "markdown", + "id": "54", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "`ipywidgets.interact` has many more options and flavors. 
Here are some:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "55", + "metadata": {}, + "outputs": [], + "source": [ + "@ipywidgets.interact(\n", + " operation=[(\"add\", 1.0), (\"subtract\", -1.0)],\n", + " rounding=False,\n", + " x=(0, 100, 0.1),\n", + " y=(0, 100, 0.1),\n", + ")\n", + "def op(operation, rounding, x, y):\n", + " val = x * operation + y\n", + " if rounding:\n", + " return round(val)\n", + " else:\n", + " return val" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "56", + "metadata": {}, + "outputs": [], + "source": [ + "import time" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "57", + "metadata": {}, + "outputs": [], + "source": [ + "@ipywidgets.interact_manual(x=(0, 100), y=(0, 100))\n", + "def slow_add(x, y):\n", + " time.sleep(1)\n", + " return x + y" + ] + }, + { + "cell_type": "markdown", + "id": "58", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## More information\n", + "\n", + "For more information, see the `ipywidgets` documentation:\n", + "\n", + "[https://ipywidgets.readthedocs.io](https://ipywidgets.readthedocs.io)" + ] + } + ], + "metadata": { + "celltoolbar": "Slideshow", + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.0" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/lunchtime8/lunchtime8.slides.html b/lunchtime8/lunchtime8.slides.html new file mode 100644 index 0000000..d039d00 --- /dev/null +++ b/lunchtime8/lunchtime8.slides.html @@ -0,0 +1,16850 @@ + + + + + + + + + +lunchtime8 slides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + + + diff --git a/lunchtime9/README.md b/lunchtime9/README.md new file mode 100644 index 0000000..329a722 --- /dev/null +++ b/lunchtime9/README.md @@ -0,0 +1,8 @@ +# Lunchtime #9: mypy + +[mypy](https://mypy.readthedocs.io/en/stable/) is a static type checker for Python. +By adding type annotations to your code mypy can find a variety of bugs. +These type annotations also act as machine-checked documentation of your code, +and your IDE can make use of them to improve its code completion. +They don't affect how your program runs, as the Python interpreter ignores +these type annotations at run-time \ No newline at end of file diff --git a/lunchtime9/duck0.png b/lunchtime9/duck0.png new file mode 100644 index 0000000..c5046cb Binary files /dev/null and b/lunchtime9/duck0.png differ diff --git a/lunchtime9/duck1.png b/lunchtime9/duck1.png new file mode 100644 index 0000000..df48903 Binary files /dev/null and b/lunchtime9/duck1.png differ diff --git a/lunchtime9/duck2.png b/lunchtime9/duck2.png new file mode 100644 index 0000000..f968dcb Binary files /dev/null and b/lunchtime9/duck2.png differ diff --git a/lunchtime9/duck3.png b/lunchtime9/duck3.png new file mode 100644 index 0000000..732214b Binary files /dev/null and b/lunchtime9/duck3.png differ diff --git a/lunchtime9/lunchtime9.ipynb b/lunchtime9/lunchtime9.ipynb new file mode 100644 index 0000000..b85ffae --- /dev/null +++ b/lunchtime9/lunchtime9.ipynb @@ -0,0 +1,1051 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "0", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Lunch Time Python\n", + "\n", + "## Lunch 9: mypy\n", + "\n", + "\n", + "[mypy](https://mypy.readthedocs.io/en/stable/) is a static type checker for Python.\n", + "By adding type annotations to your code mypy can find a variety of bugs.\n", + "These type annotations also act as machine-checked documentation of your code,\n", + "and your IDE can make use of them to improve its code 
completion.\n", + "They don't affect how your program runs, as the Python interpreter ignores\n", + "these type annotations at run-time\n", + "\n", + "*Press `Spacebar` to go to the next slide (or `?` to see all navigation shortcuts)*\n", + "\n", + "[Lunch Time Python](https://ssciwr.github.io/lunch-time-python/), [Scientific Software Center](https://ssc.iwr.uni-heidelberg.de), [Heidelberg University](https://www.uni-heidelberg.de/)" + ] + }, + { + "cell_type": "markdown", + "id": "1", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Motivation\n", + "\n", + "- Python is a **dynamically typed** language\n", + " - It infers the type of objects automatically\n", + " - \"Duck typing\" - if something looks like a duck and quacks like a duck, it's a duck\n", + "- This makes Python very flexible and convenient compared to a **statically typed** language like C\n", + " - But this flexibility also leaves room for bugs, makes static analysis difficult\n", + " - Harder to maintain and understand large complicated projects without type safety\n", + "- Solution: add optional **type annotations** (or hints) to your code\n", + " - mypy can use these to check the code for bugs\n", + " - they are ignored by the Python interpreter at run-time" + ] + }, + { + "cell_type": "markdown", + "id": "2", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "# How does this help?\n", + "\n", + "Imagine you call a function with the wrong type of object\n", + "\n", + "- This may be a runtime error, e.g. 
`str` + `int`\n", + " - This is a good bug: the program stops at the place where the bug is!\n", + " - Assuming your test suite executes this line, you find it by running the tests\n", + " - But even better would be to have your IDE point this out as you write the code\n", + "- This may not be an error, just do something undesirable\n", + " - Eventually this may cause a runtime error or incorrect output\n", + " - Hopefully some test fails, but could be very far away from the original bug\n", + " - This is a bad bug: can be hard to trace back to the root cause\n", + " - If your IDE points out the root cause as you type it this is a big win" + ] + }, + { + "cell_type": "markdown", + "id": "3", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "\"DALL·E" + ] + }, + { + "cell_type": "markdown", + "id": "4", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "\"DALL·E" + ] + }, + { + "cell_type": "markdown", + "id": "5", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "\"DALL·E" + ] + }, + { + "cell_type": "markdown", + "id": "6", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "\"DALL·E" + ] + }, + { + "cell_type": "markdown", + "id": "7", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# mypy installation\n", + "\n", + "- Conda: `conda install mypy`\n", + "- Pip: `python -m pip install mypy`\n", + "\n", + "# mypy use\n", + "\n", + "- `mypy my_file_to_check.py`" + ] + }, + { + "cell_type": "markdown", + "id": "8", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "### jupyter notebook use\n", + "\n", + "These slides also use the [nb-mypy](https://pypi.org/project/nb-mypy/) extension:\n", + "\n", + "- `python -m pip install nb-mypy`\n", + "\n", + "This automatically runs mypy on every cell before it is executed.\n", + "\n", + "This is just for convenience to 
show the mypy output for this talk.\n", + "\n", + "In general I would recommend running mypy separately or as a pre-commit hook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "if \"google.colab\" in str(get_ipython()):\n", + " !pip install nb-mypy -qqq\n", + "%load_ext nb_mypy" + ] + }, + { + "cell_type": "markdown", + "id": "10", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "# Hello world" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "11", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "def greet(thing):\n", + " return f\"Hello {thing}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "12", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "print(greet(\"world\"))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "13", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "print(greet(True))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "14", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "print(greet(greet))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "15", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "def greet(thing: str) -> str:\n", + " return f\"Hello {thing}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "16", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "print(greet(\"world\"))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "17", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + 
}, + "outputs": [], + "source": [ + "print(greet(True))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "18", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "print(greet(greet))" + ] + }, + { + "cell_type": "markdown", + "id": "19", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Basic types\n", + "\n", + "- to annotate an object, add its type after a `:`\n", + " - `my_string: str`\n", + "- to annotate the return type of a function, add the type after `->`\n", + " - `def hello() -> str:`\n", + "- intrinsic types like `None`, `int`, `float`, `str`, `bool` can be used directly\n", + "- other types like `list`, `dict`, `tuple`, `set`\n", + " - import a capitalized version from typing: `from typing import List`\n", + " - use `list` directly (only with Python >= 3.9)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "20", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "def mul(a, b):\n", + " return a * b\n", + "\n", + "\n", + "print(mul(2, 2))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "21", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "print(mul(2, [0]))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "22", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "print(mul(2, \"really?\"))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "23", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "def mul(a: float, b: float) -> float:\n", + " return a * b\n", + "\n", + "\n", + "print(mul(2, 2))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "24", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + 
"source": [ + "print(mul(2, [0]))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "25", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "print(mul(2, \"really?\"))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "26", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "# does operation & returns result\n", + "def append1(l):\n", + " return l + [1]\n", + "\n", + "\n", + "# does operation in place\n", + "def append2(l):\n", + " l.append(1)\n", + "\n", + "\n", + "l0 = [0]\n", + "print(append1(l0))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "27", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "print(append2(l0))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "28", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "from typing import List\n", + "\n", + "\n", + "def append1(l: List[int]) -> List[int]:\n", + " return l + [1]\n", + "\n", + "\n", + "def append2(l: List[int]) -> None:\n", + " l.append(1)\n", + "\n", + "\n", + "l0 = [0]\n", + "print(append1(l0))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "29", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "print(append2(l0))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "30", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "def count(l: List[str]):\n", + " return len(l)\n", + "\n", + "\n", + "a = []\n", + "# a: List[str] = []\n", + "# a.append(\"ok\")\n", + "# a = [\"ok\"]\n", + "print(count(a))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "31", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], 
+ "source": [ + "from typing import List, Tuple\n", + "\n", + "\n", + "def count(coords: List[Tuple[float, float]]) -> int:\n", + " return len(coords)\n", + "\n", + "\n", + "print(count([(0, 0), (1, 1), (2, 2)]))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "32", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "from typing import Dict\n", + "\n", + "\n", + "def invert(ages: Dict[str, int]) -> Dict[int, str]:\n", + " return {key: value for value, key in ages.items()}\n", + "\n", + "\n", + "ages = {\"bob\": 2, \"joe\": 7}\n", + "names = invert(ages)\n", + "print(names)" + ] + }, + { + "cell_type": "markdown", + "id": "33", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Generic collections\n", + "\n", + "- if you can do \"for\" to iterate over the object\n", + " - `my_obj: Iterable`\n", + "- if it's like a list\n", + " - `my_obj: Sequence`\n", + "- if it's like a read-only dict\n", + " - `my_obj: Mapping`\n", + "- if it's a dict we can modify\n", + " - `my_obj: MutableMapping`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "34", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "from typing import Iterable\n", + "\n", + "\n", + "def count(items: Iterable):\n", + " i = 0\n", + " for item in items:\n", + " i += 1\n", + " return i\n", + "\n", + "\n", + "print(count([\"a\", \"b\", \"c\"]))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "35", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "from typing import Sequence\n", + "\n", + "\n", + "def last(items: Sequence):\n", + " return items[-1]\n", + "\n", + "\n", + "print(last([1, 2, 3]))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "36", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + 
"source": [ + "print(count([\"a\", \"b\", \"c\"]))\n", + "from typing import Dict, Mapping\n", + "\n", + "\n", + "def invert(ages: Mapping[str, int]) -> Dict[int, str]:\n", + " # ages[\"simon\"] = 12\n", + " return {key: value for value, key in ages.items()}\n", + "\n", + "\n", + "ages = {\"bob\": 2, \"joe\": 7}\n", + "names = invert(ages)\n", + "print(names)" + ] + }, + { + "cell_type": "markdown", + "id": "37", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "\n", + "# Flexible types\n", + "\n", + "- if an object could have several types, use typing.Union\n", + " - `my_obj: Union[str, float]`\n", + "- if an object can be either a dict or None\n", + " - `my_obj: Union[Dict, None]`\n", + " - `my_obj: Optional[Dict]`\n", + "- if an object can be anything\n", + " - `my_obj: Any`\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "38", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "def hi(name=None):\n", + " if name is None:\n", + " name = \"you\"\n", + " print(f\"hello {name}\")\n", + "\n", + "\n", + "hi()\n", + "hi(\"joe\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "39", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "from typing import Union\n", + "\n", + "\n", + "def hi(name: Union[str, None] = None) -> None:\n", + " if name is None:\n", + " name = \"you\"\n", + " print(f\"hello {name}\")\n", + "\n", + "\n", + "hi()\n", + "hi(\"joe\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "40", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "from typing import Optional\n", + "\n", + "\n", + "def hi(name: Optional[str] = None) -> None:\n", + " if name is None:\n", + " name = \"you\"\n", + " print(f\"hello {name}\")\n", + "\n", + "\n", + "hi()\n", + "hi(\"joe\")" + ] + }, + { + "cell_type": 
"markdown", + "id": "41", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "\n", + "# Generics\n", + "\n", + "- TypeVar specifies a generic type\n", + " - Possible types can optionally be constrained\n", + " - Within a scope it represents a single type\n", + " - For any c++ programmers it's a bit like a `template`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "42", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "# concatenation of lists of a single type\n", + "def add(x, y):\n", + " return x + y\n", + "\n", + "\n", + "# desired use\n", + "print(add([1, 2], [3]))\n", + "print(add([\"A\"], [\"B\"]))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "43", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "# don't want to allow e.g.\n", + "print(add([1.0], [\"B\"])) # this shouldn't be allowed\n", + "print(add([[0]], [{1, 2}])) # this shouldn't be allowed" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "44", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "from typing import TypeVar, List\n", + "\n", + "T = TypeVar(\"T\")\n", + "\n", + "\n", + "def add(x: List[T], y: List[T]) -> List[T]:\n", + " return x + y\n", + "\n", + "\n", + "print(add([1, 2], [3]))\n", + "print(add([\"A\"], [\"B\"]))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "45", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "add([1.0], [\"B\"]) # this shouldn't be allowed\n", + "add([[0]], [{1, 2}]) # this shouldn't be allowed" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "46", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "from typing import TypeVar\n", + "\n", + "IntOrStr = 
TypeVar(\"IntOrStr\", str, int)\n", + "\n", + "\n", + "def add(x: List[IntOrStr], y: List[IntOrStr]) -> List[IntOrStr]:\n", + " return x + y\n", + "\n", + "\n", + "print(add([1, 2], [3]))\n", + "print(add([\"A\"], [\"B\"]))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "47", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "print(add([1.0, 2.0], [3.0]))" + ] + }, + { + "cell_type": "markdown", + "id": "48", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Other tricks\n", + "\n", + "- to have mypy ignore a line, append\n", + " - `# type: ignore`\n", + "- to see what type mypy infers for an object\n", + " - `reveal_type(my_obj)`\n", + "- to override the inferred type\n", + " - `cast(str, my_obj)`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "49", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "outputs": [], + "source": [ + "def add1(x: int) -> int:\n", + " return x + 1\n", + "\n", + "\n", + "f = 0.1\n", + "\n", + "reveal_type(f)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "50", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "# (mis)use of type ignore:\n", + "print(add1(f)) # type: ignore" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "51", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [], + "source": [ + "from typing import cast\n", + "\n", + "# (mis)use of cast:\n", + "print(add1(cast(int, f)))" + ] + }, + { + "cell_type": "markdown", + "id": "52", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# 3rd party libraries\n", + "\n", + "- some libraries include type information and will just work with mypy\n", + "- type information for many other libraries is available from [typeshed](https://github.com/python/typeshed)\n", + " - 
e.g. for the requests library `pip install types-requests`\n", + " - or run mypy with `--install-types` to automatically download them as needed\n", + "- a few libraries offer a separate stubs package to install instead\n", + "\n", + "If no type information is available, you can add `# type: ignore` to the end of the line where you import the package to suppress mypy error messages related to this package." + ] + }, + { + "cell_type": "markdown", + "id": "53", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "# Advanced features\n", + "\n", + "- define your own [protocols](https://mypy.readthedocs.io/en/stable/protocols.html)\n", + " - an example of a built-in protocol is `Iterable[T]`\n", + "- define your own [generic classes](https://mypy.readthedocs.io/en/stable/generics.html)\n", + " - an example of a similar built-in class is `list[X]`\n", + "- use [mypyc](https://mypyc.readthedocs.io/) to compile type-annotated Python to C extensions\n", + " - similar to Cython but code remains valid python & get run-time type checks\n", + " - note: still alpha and not competitive for numeric code" + ] + }, + { + "cell_type": "markdown", + "id": "54", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# TLDR\n", + "\n", + "Add type annotations and run mypy to catch bugs earlier and more easily\n", + "\n", + "# Strategy\n", + "\n", + "- you don't need to annotate everything\n", + "- start with just a few functions in a single file\n", + "- mypy infers types where possible\n", + " - e.g. 
`a = [1, 2, 3]` is automatically inferred to be a `List[int]`\n", + "- a few annotations can go a long way" + ] + }, + { + "cell_type": "markdown", + "id": "55", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "# More information\n", + "\n", + "- mypy is well documented\n", + "- a good starting point: [getting started](https://mypy.readthedocs.io/en/stable/getting_started.html)\n", + "- basic summary: [cheat sheet](https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html)\n", + "- full documentation: [mypy.readthedocs.io](https://mypy.readthedocs.io/)\n", + "- beyond that try github issues: [github.com/python/mypy/issues](https://github.com/python/mypy/issues)\n" + ] + } + ], + "metadata": { + "celltoolbar": "Slideshow", + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.4" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/lunchtime9/lunchtime9.slides.html b/lunchtime9/lunchtime9.slides.html new file mode 100644 index 0000000..a44a8c5 --- /dev/null +++ b/lunchtime9/lunchtime9.slides.html @@ -0,0 +1,17264 @@ + + + + + + + + + +lunchtime9 slides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +