"Group By" - lesson improvements #498

Open
mcdperera opened this issue Sep 30, 2020 · 4 comments
Labels
help wanted Looking for Contributors type:enhancement Propose enhancement to the lesson type:feedback Issue to provide feedback on lesson

Comments

@mcdperera

Hello dear maintainers,

I recently had the chance to teach a Python session for beginners. While doing so, I found that the existing “Group By” lesson is a bit hard to understand.

So I rewrote the section using the same data set, but from a different point of view. Here is my version of the “Group By” section.

*************************************** Start ***********************************

Group By: split-apply-combine

  • Any groupby operation involves the following operations on the original object:

    • Splitting the Object
    • Applying a function
    • Combining the results
  • Split-apply-combine technique

Figure: Split-Apply-Combine

Source: https://cmdlinetips.com/2018/02/introduction-to-split-apply-combine-with-pandas/
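A minimal sketch of the three steps, done once by hand and once with `groupby()`. It assumes a small made-up DataFrame with hypothetical `continent`, `country` and `gdp` columns (not the gapminder file), so the numbers are illustrative only.

```python
import pandas as pd

# Toy data for illustration only; these are NOT gapminder values.
data = pd.DataFrame(
    {
        "continent": ["Africa", "Africa", "Asia", "Asia", "Asia"],
        "country": ["A1", "A2", "B1", "B2", "B3"],
        "gdp": [1.0, 3.0, 2.0, 4.0, 6.0],
    }
)

# Split: one sub-DataFrame per continent, selected with a boolean mask.
pieces = {
    name: data[data["continent"] == name] for name in data["continent"].unique()
}

# Apply: compute the mean "gdp" within each piece.
results = {name: piece["gdp"].mean() for name, piece in pieces.items()}

# Combine: gather the per-continent results into a single Series.
combined = pd.Series(results, name="gdp")
combined.index.name = "continent"

# groupby() performs the same split, apply and combine in one expression.
direct = data.groupby("continent")["gdp"].mean()

print(combined)
print(direct)
```

Doing the steps by hand first may help learners predict what the one-line `groupby()` version will return.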

  • Use the "gapminder_all.csv" file.
    • Use the continent column as the index.
    • Group the rows with the pandas "groupby()" method on the continent column.
import pandas as pd
data = pd.read_csv('data/gapminder_all.csv', index_col="continent")
subset = data.groupby('continent')
print(subset)
  • The above output has duplicate values.
  • Using the 'count()' method in pandas, we can get the exact record count (the number of countries per continent).
print(subset["country"].count())
  • Use the pandas describe() method to compute summary statistics for each continent.
print(subset.groupby('continent').describe())

*************************************** End ***********************************
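For anyone trying the steps above without `data/gapminder_all.csv`, here is a sketch that feeds a tiny invented CSV to `pd.read_csv` through `io.StringIO`. The `gdp` column and all values are placeholders; only the `continent` and `country` column names come from the proposal. It also illustrates that `groupby()` accepts an index level name, which is why `data.groupby('continent')` works after the continent column has been made the index.

```python
import io

import pandas as pd

# Invented stand-in for data/gapminder_all.csv; "gdp" and all values are placeholders.
csv_text = """continent,country,gdp
Africa,A1,1.0
Africa,A2,3.0
Asia,B1,2.0
Asia,B2,4.0
Asia,B3,6.0
"""

# Use the continent column as the index, as in the proposal.
data = pd.read_csv(io.StringIO(csv_text), index_col="continent")

# The index repeats: there is one row per country, so continents appear several times.
print(data.index.is_unique)

# groupby() accepts an index level name as well as a column name.
subset = data.groupby("continent")

# Number of countries recorded for each continent.
country_counts = subset["country"].count()
print(country_counts)
```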

Conclusions

If you think the above suggestions could help, please make the necessary changes to the lesson.

@mcdperera mcdperera changed the title "Group By" - lesson improvements "Group By" - lesson improvements Sep 30, 2020
@eldobbins
Contributor

I have also had trouble teaching groupby, mainly because I didn't understand it well myself.

The problems I had with the example that calculates the wealth index:

  • I can't quickly see the meaning of the second line. I have an easier time understanding groupby when I can predict what the answer will be
  • The numbers output are a jumble (might be better sorted)
  • The output is long (it takes up a lot of screen real estate)

I like this new example because it solves those problems. I love the figure. However, I don't understand a couple of the outputs.

  • print(subset) returns <pandas.core.groupby.generic.DataFrameGroupBy object at 0x120a86a30>. I don't mind that. It was helpful to me to see that I couldn't look at a groupby object directly. But is that what you meant?
  • print(subset.groupby('continent').describe()) returned an AttributeError for me

'DataFrameGroupBy' object has no attribute 'groupby'

  • the output of 'describe' overwhelms me. I would like to change the last line to a command that returns an easily understood DataFrame

subset.mean()

I will write a PR if I can get clarification about those things.

@alee alee added help wanted Looking for Contributors type:enhancement Propose enhancement to the lesson type:feedback Issue to provide feedback on lesson labels Dec 7, 2020
@vinisalazar
Contributor

vinisalazar commented Apr 26, 2021

I agree with the arguments presented by @eldobbins that the example that @mcdperera provided is better than the current one.

  • print(subset) returns <pandas.core.groupby.generic.DataFrameGroupBy object at 0x120a86a30>. I don't mind that. It was helpful to me to see that I couldn't look at a groupby object directly. But is that what you meant?

I guess it would be nice to include a sentence or two about DataFrameGroupBy objects, and perhaps a link to the Pandas docs entry on these.
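As a possible starting point for that sentence or two: printing a `DataFrameGroupBy` only shows its type and address, but the GroupBy attributes `ngroups` and `groups`, the `get_group()` method, and plain iteration let learners look inside. The toy DataFrame below is hypothetical.

```python
import pandas as pd

# Toy data for illustration only.
data = pd.DataFrame(
    {
        "continent": ["Africa", "Africa", "Asia", "Asia", "Asia"],
        "country": ["A1", "A2", "B1", "B2", "B3"],
        "gdp": [1.0, 3.0, 2.0, 4.0, 6.0],
    }
)
subset = data.groupby("continent")

# Printing the object only shows its type and memory address, not the data.
print(subset)

# Ways to look inside a DataFrameGroupBy:
print(subset.ngroups)            # number of groups
print(list(subset.groups))       # group labels
print(subset.get_group("Asia"))  # rows that belong to one group

# Iterating yields (group label, sub-DataFrame) pairs.
for continent, frame in subset:
    print(continent, len(frame))
```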

  • print(subset.groupby('continent').describe()) returned an AttributeError for me

I'm guessing that OP meant subset.describe() instead, since the object is already grouped by continent.
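A quick sketch of that reading, on a hypothetical toy DataFrame: the grouped object has no `groupby` attribute, which matches the `AttributeError` reported above, while calling `describe()` on it (here on a single selected column, to keep the table narrow) returns one row of summary statistics per continent.

```python
import pandas as pd

# Toy data for illustration only.
data = pd.DataFrame(
    {
        "continent": ["Africa", "Africa", "Asia", "Asia", "Asia"],
        "country": ["A1", "A2", "B1", "B2", "B3"],
        "gdp": [1.0, 3.0, 2.0, 4.0, 6.0],
    }
)
subset = data.groupby("continent")

# The object is already grouped, so it has no .groupby() method of its own.
print(hasattr(subset, "groupby"))

# describe() on the grouped object: one row of statistics per continent.
# Selecting a single column keeps the output narrow.
summary = subset["gdp"].describe()
print(summary)
```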

  • the output of 'describe' overwhelms me. I would like to change the last line to a command that returns an easily understood DataFrame

subset.mean()

I will write a PR if I can get clarification about those things.

I agree that subset.mean() would be more illustrative than .describe() in this example.
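One caveat worth checking against the pandas version the lesson targets: as far as I understand, since pandas 2.0 `mean()` on a grouped DataFrame no longer silently drops text columns such as `country` and raises a `TypeError` instead, so the example may need `numeric_only=True` (or an explicit column selection). A sketch on hypothetical toy data:

```python
import pandas as pd

# Toy data for illustration only; "country" is a text column, as in the proposal.
data = pd.DataFrame(
    {
        "continent": ["Africa", "Africa", "Asia", "Asia", "Asia"],
        "country": ["A1", "A2", "B1", "B2", "B3"],
        "gdp": [1.0, 3.0, 2.0, 4.0, 6.0],
    }
)
subset = data.groupby("continent")

# numeric_only=True skips text columns such as "country"; without it, recent
# pandas versions are expected to raise a TypeError here.
means = subset.mean(numeric_only=True)
print(means)
```

The result is a small DataFrame with one row per continent, which is easier to read than the wide `describe()` table.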

@eldobbins if you are still inclined to write a PR, I will be happy to review it or assist you (except for actually merging, that's up to @alee 😊 )

Best,
V

@eldobbins
Contributor

I'd be obliged if you could do it. That was a long time ago for me! And you seem to understand what I was getting at.

Liz

souravsingh pushed a commit that referenced this issue Sep 10, 2021
* bin/lesson_check.py: avoid inconsistent grammar

Pull Request: carpentries/styles#396

* util.py: hot fix for YAML loader

* add data-checker-ignore attributes to links that only work on GitHub

* add data-checker-ignore attributes to footer links

* make title show episode title first

* use ndash as separator

display it only if both `page.title` and `site.title` are defined.

* upgrade to bootstrap 3.4.1

* update coc incident reporting link

* Update PULL_REQUEST_TEMPLATE.md

* Update ISSUE_TEMPLATE.md

* links.md: fix lesson-setup link

* links.md: include base_path.html

* aio.md: multiple improvements

Pull Request: carpentries/styles#406

* change all-in-one file to have the same 'depth' as episode files

* move the core script of aio.md into _includes/aio-script.md

* include comment to inform maintainers to not edit the file

* move aio.md to the root of styles repository

* [fix #408] remove aio from files that need to be initialized

* lesson-check: exclude aio.md, fix read_references

* bump jekyll version to match github pages version

https://pages.github.com/versions/

* Callout (and other) blocks: proper font size, margins

* Use darker purple for code blocks

* Use 1px borders: fix Google Chrome & Edge

See swcarpentry/git-novice#662 (comment)

* specify language based on engine

- fix swcarpentry/r-novice-inflammation#436

* Ignore Jekyll 4's cache

See https://jekyllrb.com/news/2019/08/20/jekyll-4-0-0-released/#cache-all-the-things- & https://github.com/jekyll/jekyll/releases/tag/v4.0.0

It appears during `make site` & `make serve`.

* assets/css/lesson.scss: add proper padding to the top of paragraphs in blockquotes (#425)

* restore lost CSS settings

* Manual ordering of episodes and extras

Co-authored-by: stamper <[email protected]>

* [fix carpentries/workshop-template#513] remove site.title outside lessons

* update survey links

* Switch to Liquid comments

HTML comments end up in the generated HTML pages: they're not displayed by the browsers but they're still present there. Liquid comments do not end up in the generated HTML pages

* bin/boilerplate/README.md: fix typo

carpentries/styles#441

* Use Jekyll to generate the 'all-in-one' page (carpentries/styles#438)

* manual_episode_order.html: fix typo in a comment

* Enable 'Sponsor' button on GitHub repos

* remove Jekyll command markers from comment block

fix carpentries/lesson-example#281

* Forced re-encoding of text to UTF-8, to avoid issues on Windows

* Refactored paths to make use of OS agnostic methods

* Modified shebang to use python, not python3.

* Reverted change of permissiveness on lesson_check

* Cleaned leftover debug code and old implementations.

* Added PYTHON variable to define executable to run python scripts

* Reverted variable names

* Removed shebang lines from Python scripts to avoid cross-OS problems

* Fixes encoding problem on Windows systems, with minimal changes to existing code.

* Silenced output of PYTHON calls

* Makefile: detect Python 3

* util.py: remove empty line

* Undo optimizations to read_all_markdown

These will be submitted in a separate PR

* Remove executable bits from Python scripts

We can't use a single shebang:
* on some platforms `python` may mean Python 2, on others - Python 3
* on some platforms `python3` does not exist at all

Therefore, we're removing the shebangs altogether.

* lesson_initialize: windows compatibility

* lesson_check.py: Windows-compatible regular expression pattern

* repo_check.py: enforce utf-8 encoding

... for compatibility with Windows

* clarify comment on python check block

* Makefile: Windows does not like single quotes

* test_lesson_check.py: skip unnecessary steps

* Makefile: suppress error message on Windows

* fix typo in lesson_initialize.py

* Makefile: suppress another error message on Windows

These '2>/dev/null' are important on Windows because without them
a mere `make` stalls.

* Makefile: handle MS Store's Python 3

* Makefile: fix syntax in conditional

* Makefile: fix two more syntax errors in conditionals

* fix urls in _config.yml

* use bundler to render lessons

* install gems locally

* refactor use of docker

Co-authored-by: Allen Lee <[email protected]>

* add @maxim-belkin suggestions

* add .bundle to .gitignore

* Makefile: Specify shell. Don't include commands.mk

* Makefile: use Python to execute repo_check.py

* Makefile: improve commands target and commands categories (#450)

* Makefile: improve commands target and commands categories

* replace 'files' with 'website'

* Update R install in .travis.yml (#430)

* Update R install in .travis.yml

See https://cran.r-project.org/bin/linux/ubuntu/README.html

* update .travis.yml

* specify YAML loader

Co-authored-by: Daniel McCloy <[email protected]>

* add warning hook + CSS class for Rmd-based lessons (#455)

* Update PyPI link

* Use carpentries/lesson-docker for docker-serve make rule (#461)

* Use renv (#462)

* Improve issue template (#463)

Co-authored-by: Sarah Brown <[email protected]>

* lesson.scss: style tab panels on setup pages (#464)

* Improve pull request template (#465)

* OS stripe color (#468)

* bump ruby version

* Upgrade jQuery to 3.5.1 (#469)

```
cd assets/js
wget https://code.jquery.com/jquery-3.5.1.min.js
wget https://code.jquery.com/jquery-3.5.1.min.map
mv -f jquery-3.5.1.min.js jquery.min.js
mv -f jquery-3.5.1.min.map jquery.min.map
```

Fixes carpentries/styles#460

* assets/js/lesson.js: use .length instead of .size()

.size() was deprecated in jQuery 3.0 in favor of .length attribute.
https://jquery.com/upgrade-guide/3.0/#breaking-change-deprecated-size-removed

Co-Authored-By: Thomas Green <[email protected]>

* License is not a copyright. License info ID (#472)

* fix AMY's URL

* Deprecated use of --path when installing bundle (#473)

* add warning blockquote style, carpentries/styles#49 (#475)

* Make links in code tags distinguishable (#478)

* _config.yml: link to Lesson Life Cycle chapter of CDH

To help users of the lesson template understand how to choose/when to update the value of life_cycle

* repo_check.py: allow URLs not ending with .git (#482)

* Makefile: fix comment in front of `lesson-check` (#481)

Comments prepended with `##` appear in the output of `make commands`.
This commit changes `#` to `##` in front of `lesson-check` so that it
appears in the output of `make commands`.

* repo_check.py: match https repositories (#483)

* Makefile: don't use /bin/bash shell (#484)

* [fix carpentries/styles#477] rewrite travis script

* add vendor folder to gitignore and _config.yml

* OS stripe: adjust line height (#490)

* Makefile: fix 'lesson-fixme' target for Windows (#486)

Git for Windows doesn't provide fgrep, which is a shortcut
to call `grep -F` on Mac and Linux. Instead, we have to use
full arguments.

* Fix Python scripts for Windows: UTF-8 encoding (#485)

To avoid problems with various symbols, we have to specify the encoding
when we read files.
The actual codec name is `utf_8` but aliases like `utf8`, `utf-8`, etc
are accepted. Here, I'm using `utf-8` alias.
https://docs.python.org/3.8/library/codecs.html#standard-encodings

This fixes `make lesson-check` when running under 'Git for Windows' for
lessons that have non-cp1252 characters.

* only display Episodes drop-down if we have episodes to show (#491)

* fix variable name

* GitHub Actions: check lesson template (#489)

Co-authored-by: Maxim Belkin <[email protected]>
Co-authored-by: François Michonneau <[email protected]>

* fix: tighten definitions of highlighter (#496)

* add three more common languages (#497)

* remove unused code highlight classes (#498)

* Sync styles first (#494)

Co-authored-by: Maxim Belkin <[email protected]>

* GitHub Actions: website (#488)

* GitHub Actions: better workflow and job names (#500)

* GitHub Action: website.yml: don't run in forks (#501)

* GH Website action: rename + don't use lesson directory (#504)

When 'Website' action tests a lesson, it checks out repositories into the current working directory: 'lesson' directory doesn't exist.
As a result, steps that use "lesson" as the working directory fail.

* lesson.scss: styling for DIVs for embedding Youtube videos (#503)

* update link to discuss mailing list (#507)

* add default repository to install_required_packages() (#509)

* add warning blockquote

* Revert "Merge branch 'gh-pages' of github.com:carpentries/styles into gh-pages"

This reverts commit 2c6b97e, reversing
changes made to 53e9913.

* add warning code block

* update expected reference filename (#508)

* lesson.scss: no borders around unrecognized code (#510)

* bump ruby version (as specified in github-pages v209)

* drop patch version, fix at v1

* removing contractions from CONTRIBUTING (#512)

* add image-with-shadow class

* _config.yml: mention Carpentries Incubator

* Matlab -> MATLAB

* set CRAN url if default is "@cran@"

This will fix #526

* Fix Ruby style

* accept any base filename for Rmd episodes

* run R-based lessons in forks

This is a modification for #501

* add control structure

* 404 page for better learner experience

* lesson.scss: wildcard selectors for code blocks

* Ignore .jekyll-metadata

* Speed up builds of R-based lessons

R-based lessons might take a while to build because packages need to be compiled from source. RStudio Package Manager has compiled versions of packages for ubuntu distros starting with 16.04: https://packagemanager.rstudio.com/client/#/repos/1/overview

I've added the necessary magic in the actions yaml to make it work.

* No need for User Agent string

* permissive checks for pre-alpha lessons

This will fix #533

* bin/lesson_check.py: allow 'language-*' code blocks (#532)

* bin/lesson_check.py: allow 'caution' blockquote

* avoid ansi color characters from being printed

* deploy from "website" action

* deploy R-based lessons without using another action

* also delete _site

* include @zkamvar suggestions

Co-authored-by: Zhian N. Kamvar <[email protected]>

* pin ubuntu version to 20.04 (#540)

Co-authored-by: Zhian N. Kamvar <[email protected]>

* GitHub Actions: cache required R packages (#534)

* add missing parenthesis

* Add incubator option for carpentry field. (#542)

Closes #541

* .editorconfig: don't trim trailing spaces in markdown

* lesson.scss: HTML block

carpentries/styles#519

* add patch to clean gh-pages before committing (#545)

This will address #544

* Fix Kramdown parser crash

... by using GFM (GitHub-flavored Markdown) parser (`kramdown-parser-gfm`)
instead of the default one (`kramdown`).

The default one fails to produce an AST (Abstract Syntax Tree) when
there is no blank line before the line with the opening code fence.

Related:
 - gettalong/kramdown#530
 - Python-Markdown/markdown#807

Fixes: carpentries/styles#543

* bin/util.py: Change ruby executable to "bundle exec ruby"

Closes: carpentries/styles#547

* Change link colours (#549)

...to make them distinguishable from regular text. And for accessibility!

* bin/workshop_check.py: update default contact email address

* Gemfile: add 'webrick' dependency for Ruby 3.0.0 and above

Fixes carpentries/styles#552

* lesson_check.py allow for missing life_cycle

This will fix carpentries/styles#556

* update with Maxim's suggestion

* Add catch for None type code block in lesson_check

There are times when the AST is malformed and does not emit a class for
the code element. We do not want the parser to crash when this happens,
but we also want to notify ourselves that the AST is malformed.

This should not result in an error because as we saw in
carpentries/styles#543, the parser itself can
cause these malformations when the lesson itself renders well. Even
though we fixed the previous issue with an updated parser, problems
still persist:
swcarpentry/r-novice-gapminder#696 (comment)

I fully admit that this is a kludge.

* fix syntax

I've removed the print condition, because it will just result in an error no matter what (sigh)

* Makefile: fix 'bundle config' command flags

* Makefile: clean target: remove .vendor, .bundle, Gemfile.lock

Clean up:
1. `.vendor` directory where Bundler installs all the gems.
2. `.bundle` directory where Bundler stores its settings.
3. `Gemfile.lock` file generated by the Bundler.

* Makefile: silence Docker commands

* use Ruby's official GH Actions

* Makefile: use SHELL to call bin/knit_lesson.sh

* Makefile: fix up PHONY targets

* Fix GitHub actions for lessons in Rmarkdown

Specifically, set CRAN repository to https://cran.rstudio.com

* apply single shadow to image class

* use grey shadow instead of transparent black

* expand image-with-shadow selection

* Makefile: don't fail when Python isn't found

* bin/dependencies.R: handle 'no packages were specified' error

Fixes the following issue:

```
$ make site
lib paths: /Library/Frameworks/R.framework/Versions/3.5/Resources/library
Error in install.packages(missing_pkgs, lib = lib, repos = repos) :
  no packages were specified
Calls: install_required_packages -> install.packages
Execution halted
make: *** [install-rmd-deps] Error 1
```

* Don't check links.md in lessons that use remote theme

Fixes carpentries/styles#570

* add link references to code_of_conduct.md (#572)

* Update links.md

* add source_dir argument

This will fix carpentries/styles#576

* Improved relative_root_path

* update contributing guide

* add further languages for box titles (#580)

Will be useful for HPC-Carpentry lessons, GPU programming lesson as well as Julia lessons which are currently in the incubator.

* bin/lesson_check.py: allow comments and empty lines in links.md

* bin/lesson_check.py: one more fix for using_remote_theme()

* Template workflow: add two more lessons

* add make lesson-check-all step

* Set working directory for the 'make site' step

* Template workflow: smarter syncing with the styles repo

The current syncing procedure used in the Template workflow fails for:

1. Lessons that are, in fact, in sync with the styles repo.
2. For lessons that use The Carpentries' remote theme and have deleted
   some of the files.

This PR makes this step a little bit more intelligent and takes into
account the above two scenarios.

* Apply Zhian's suggestions

Co-authored-by: Zhian N. Kamvar <[email protected]>

* add math support with katex (#573)

* bin/util.py: remove unused 'IMAGE_FILE_SUFFIX' var (#590)

It should've been removed in 7e835fd.

* bin/lesson_check.py: use proper function

* bin/lesson_check.py: allow exceptions to line length limit

Allow lines that contain a single image or a single link
to go over the suggested line length limit.

* lesson_check.py: harden single-line image/link pattern

This change hardens the pattern that matches single-line
image or link:

1. It extends the pattern to be matched in a heading
2. It allows the line to contain {: ...} customizations
3. It allows the line to end with \

* lesson_check.py: relax P_LINK_IMAGE_LINE pattern

This PR allows up to 3 non-word (`\W` in Python's `re`-speak) characters
in the beginning and end of the pattern that matches links and images.
This is to allow lesson developers to place punctuation marks, parentheses,
or other symbols before or after the link or image on the same line in
Markdown.

* bin/util.py: Factor out reporter class. Define __all__

* Don't force hostname into relative_root_path

* lesson_check.py: add a comment about importing * from a package

Co-authored-by: Zhian N. Kamvar <[email protected]>

* lesson.scss: define 'inline' class for images

Define `inline` class for images that should not be displayed as block
elements.

By appending `{:class="inline"}` or `{: .inline}` to the image definition
in Markdown, one can create an inline image that doesn't break the
current line and is embedded in the paragraph. Useful for showing
special symbols and hieroglyphs that we can't display by other means.

* Fix Reporter class imports

* update R dependency search; Allow Bioconductor packages (#600)

* Automatically add deep anchor links using AnchorJS

* Makefile: require index.md (#607)

* Makefile: docker-serve target: ensure Docker is installed (#608)

* Fix broken "How to contribute" link

* lesson_check.py: report check status at the end

* util.py: load_yaml: Don't fail when it's not necessary

Also, make 'require()' function not fail by default.
The only case where we really need to fail is when 'kramdown' parser is
not specified. This is a highly unlikely scenario, tbh (because
arguments to `lesson_check.py` are set in the Makefile), but we can
think about reworking/optimizing this part later.

* lesson_check.py: fix error message for the 'defaults' check

* lesson_check.py: use proper regex for matching episode files

* lesson-check.py: read `config.yml` only once

1. Read `config.yml` file only once and store the contents in a global
   variable `CONFIG`. We're currently reading this file twice.
2. Detect lesson life cycle in `main` instead of making `check_config`
   return it.
3. Replace `using_remote_theme` function with a single test that checks
   that `remote_theme` keyword is present in `_config.yml`
4. CheckEpisode class: `check` method: move "remote theme test" inside the `check_reference_inclusion`.
5. main: move "remote theme test" inside the `read_references`

* 404.md: expand contractions (#620)

* Syllabus: add conditional instructor training pre-/post- surveys (#618)

Port of carpentries/instructor-training@f488d24

Co-authored-by: François Michonneau <[email protected]>

Co-authored-by: François Michonneau <[email protected]>

* Navigation bar: allow excluding extras (#617)

Port of carpentries/instructor-training@dc1b52a

Co-authored-by: maneesha sane <[email protected]>

Co-authored-by: maneesha sane <[email protected]>

* add styling for lesson in The Carpentries Lab

Co-authored-by: Toby Hodges <[email protected]>

* remove contractions

* remove contraction

* remove contraction

* remove funding file to use organization's

Co-authored-by: Katrin Leinweber <[email protected]>
Co-authored-by: Maxim Belkin <[email protected]>
Co-authored-by: maneesha sane <[email protected]>
Co-authored-by: Rayna M Harris <[email protected]>
Co-authored-by: Allen Lee <[email protected]>
Co-authored-by: K.E. Koziar <[email protected]>
Co-authored-by: stamper <[email protected]>
Co-authored-by: Michael Joseph <[email protected]>
Co-authored-by: Joao Rodrigues <[email protected]>
Co-authored-by: Sarah Brown <[email protected]>
Co-authored-by: Anthony Gitter <[email protected]>
Co-authored-by: Zhian N. Kamvar <[email protected]>
Co-authored-by: Thomas Green <[email protected]>
Co-authored-by: ocaisa <[email protected]>
Co-authored-by: Joseph Stachelek <[email protected]>
Co-authored-by: Renato Alves <[email protected]>
Co-authored-by: Henry Schreiner <[email protected]>
Co-authored-by: Christina K <[email protected]>
Co-authored-by: Kilian <[email protected]>
Co-authored-by: Trevor Keller <[email protected]>
Co-authored-by: Andrew Reid <[email protected]>
Co-authored-by: Bailey Harrington <[email protected]>
Co-authored-by: Alan O'Callaghan <[email protected]>
Co-authored-by: Benson Muite <[email protected]>
@gabrielesh

More than this, a colleague has recently suggested that we not teach groupby at all in this lesson. It's a much more advanced concept. I agree with her that we should start with more basic concepts.

Projects
None yet
Development

No branches or pull requests

5 participants