fixup! Update spanish usepackages
sogladev committed Feb 5, 2024
1 parent e2f3d20 commit 0770360
Showing 14 changed files with 16 additions and 99 deletions.
7 changes: 5 additions & 2 deletions .github/workflows/main.yml
@@ -205,9 +205,12 @@ jobs:
uses: softprops/action-gh-release@v1
with:
body: |
in this Release
- extracted data/ found in `.pkl`, `.csv`, `.json` format
English:
- extracted data/ in `.pkl`, `.csv`, `.json` format
- formatted/styled output/ in `.pdf` and `.html` format.
Spanish:
- extracted data/ in `.pkl` format
- formatted/styled output/ in `.pdf` and `.html` format
files: |
english-data.zip
english-output.zip
67 changes: 11 additions & 56 deletions english/README.md
@@ -1,4 +1,9 @@
# English vocabulary + pronunciation + definition
# Spanish vocabulary + definition + examples translation

```
latexmk -pdfxe -cd format/spanish_5000_two_column_alphabetical_by_rank_with_example.tex -outdir=../output
```


This project aims to provide easy-to-read and printable vocabulary list of the
most common words of the English language with their meaning.
@@ -7,10 +12,10 @@ The lists are mostly based on data gathered from the oxford 3000, 5000 and 5000

The word lists contain the following points of data
* Spelling (text)
* Pronunciation (audio)
* Lexical spelling (text)
* Meaning (text)
* Example (text)
* Example translation (text)

This project contains scripts to extract data and formatting.

@@ -19,15 +24,12 @@ see `scraping` below
see `formatting` below

## Data
Extracted data is hosted separately on mediafire and can be
found in formats `.pkl`, `.csv`, `.json`
Audio consists of around 10,000 *.mp3 files totalling 200MB

Formatted lists in `/output` are formatted alphabetically, by CEFR rating, random and viewable in
`.pdf` and `.html` format.

## Sample outputs

To be updated

1. grouped by CEFR alphabetical order
![by_cefr_img_sample](./img/oxford_5000_exclusive_by_cefr_sample.jpg)
[by_cefr_pdf_sample](./img/oxford_5000_exclusive_by_cefr_sample.pdf)
@@ -38,55 +40,17 @@ Formatted lists in `/output` are formatted alphabetically, by CEFR rating, rando

## Folder structure
```
├── audio
│   ├── *_uk.mp3
│   ├── *_us.mp3
│   ├── ...
├── data
│   ├── df_concat.pkl
│   ├── df_definition.pkl
│   ├── df.pkl
│   ├── oxford_3000.csv
│   ├── oxford_3000.json
│   ├── oxford_3000.pkl
│   ├── oxford_5000.csv
│   ├── oxford_5000_exclusive.csv
│   ├── oxford_5000_exclusive.json
│   ├── oxford_5000_exclusive.pkl
│   ├── oxford_5000.json
│   └── oxford_5000.pkl
├── output
│   ├── oxford_3000_alphabetical.html
│   ├── oxford_3000_alphabetical.pdf
│   ├── oxford_3000_by_cefr.html
│   ├── oxford_3000_by_cefr.pdf
│   ├── oxford_3000_two_column_alphabetical.pdf
│   ├── oxford_3000_two_column_by_cefr.pdf
│   ├── oxford_5000_alphabetical.html
│   ├── oxford_5000_alphabetical.pdf
│   ├── oxford_5000_by_cefr.html
│   ├── oxford_5000_by_cefr.pdf
│   ├── oxford_5000_exclusive_alphabetical.html
│   ├── oxford_5000_exclusive_alphabetical.pdf
│   ├── oxford_5000_exclusive_by_cefr.html
│   ├── oxford_5000_exclusive_by_cefr.pdf
│   ├── oxford_5000_exclusive_two_column_alphabetical.pdf
│   ├── oxford_5000_exclusive_two_column_by_cefr.pdf
│   ├── oxford_5000_two_column_alphabetical.pdf
│   └── oxford_5000_two_column_by_cefr.pdf
│   └── *pdf / *html
├── format.ipynb
└── scrape.ipynb
```
## Scraping
Selenium, beautifulsoup4, requests, pandas
and geckodriver
beautifulsoup4, requests, pandas

https://github.com/mozilla/geckodriver/releases
```
$ tar -xf geckodriver-v0.30.0-linux64.tar.gz
$ chmod +x geckodriver
$ mv geckodriver /usr/local/bin

```
See `scrape.ipynb`
@@ -114,12 +78,3 @@ flowchart LR
See `format.ipynb`

## Resources and credit
Oxford 5000 list, online interface to lookup words, filter by CEFR level,
listen pronunciation (US,UK)
also shows meaning but only after clicking to a new page.
https://www.oxfordlearnersdictionaries.com/wordlists/oxford3000-5000

dictionary by tusharlock10
https://github.com/tusharlock10/Dictionary
with relevant stackoverflow thread
https://stackoverflow.com/questions/41768215/english-json-dictionary-with-word-word-type-and-definition
3 changes: 0 additions & 3 deletions spanish/format/spanish_3000_two_column_alphabetical.tex
@@ -1,10 +1,7 @@
\documentclass{article}
\usepackage[a4paper,left=1cm,right=1cm,top=1cm,bottom=1cm]{geometry}
\usepackage[utf8]{inputenx}
\usepackage[T1]{fontenc}
\usepackage[spanish]{babel}
\usepackage{supertabular}
\usepackage{array}
\usepackage{helvet}
\renewcommand{\familydefault}{\sfdefault}
\makeatletter
@@ -1,10 +1,7 @@
\documentclass{article}
\usepackage[a4paper,left=1cm,right=1cm,top=1cm,bottom=1cm]{geometry}
\usepackage[utf8]{inputenx}
\usepackage[T1]{fontenc}
\usepackage[spanish]{babel}
\usepackage{supertabular}
\usepackage{array}
\usepackage{helvet}
\renewcommand{\familydefault}{\sfdefault}
\makeatletter
@@ -1,10 +1,7 @@
\documentclass{article}
\usepackage[a4paper,left=1cm,right=1cm,top=1cm,bottom=1cm]{geometry}
\usepackage[utf8]{inputenx}
\usepackage[T1]{fontenc}
\usepackage[spanish]{babel}
\usepackage{supertabular}
\usepackage{array}
\usepackage{helvet}
\renewcommand{\familydefault}{\sfdefault}
\makeatletter
3 changes: 0 additions & 3 deletions spanish/format/spanish_3000_two_column_shuffle.tex
@@ -1,10 +1,7 @@
\documentclass{article}
\usepackage[a4paper,left=1cm,right=1cm,top=1cm,bottom=1cm]{geometry}
\usepackage[utf8]{inputenx}
\usepackage[T1]{fontenc}
\usepackage[spanish]{babel}
\usepackage{supertabular}
\usepackage{array}
\usepackage{helvet}
\renewcommand{\familydefault}{\sfdefault}
\makeatletter
3 changes: 0 additions & 3 deletions spanish/format/spanish_3000_two_column_shuffle_by_rank.tex
@@ -1,10 +1,7 @@
\documentclass{article}
\usepackage[a4paper,left=1cm,right=1cm,top=1cm,bottom=1cm]{geometry}
\usepackage[utf8]{inputenx}
\usepackage[T1]{fontenc}
\usepackage[spanish]{babel}
\usepackage{supertabular}
\usepackage{array}
\usepackage{helvet}
\renewcommand{\familydefault}{\sfdefault}
\makeatletter
@@ -1,11 +1,3 @@
\documentclass{article}
\usepackage[a4paper,left=1cm,right=1cm,top=1cm,bottom=1cm]{geometry}
\usepackage[utf8]{inputenx}
\usepackage[T1]{fontenc}
\usepackage[spanish]{babel}
\usepackage{supertabular}
\usepackage{array}
\usepackage{helvet}
\renewcommand{\familydefault}{\sfdefault}
\makeatletter
\author{}
3 changes: 0 additions & 3 deletions spanish/format/spanish_5000_two_column_alphabetical.tex
@@ -1,10 +1,7 @@
\documentclass{article}
\usepackage[a4paper,left=1cm,right=1cm,top=1cm,bottom=1cm]{geometry}
\usepackage[utf8]{inputenx}
\usepackage[T1]{fontenc}
\usepackage[spanish]{babel}
\usepackage{supertabular}
\usepackage{array}
\usepackage{helvet}
\renewcommand{\familydefault}{\sfdefault}
\makeatletter
@@ -1,10 +1,7 @@
\documentclass{article}
\usepackage[a4paper,left=1cm,right=1cm,top=1cm,bottom=1cm]{geometry}
\usepackage[utf8]{inputenx}
\usepackage[T1]{fontenc}
\usepackage[spanish]{babel}
\usepackage{supertabular}
\usepackage{array}
\usepackage{helvet}
\renewcommand{\familydefault}{\sfdefault}
\makeatletter
@@ -1,10 +1,7 @@
\documentclass{article}
\usepackage[a4paper,left=1cm,right=1cm,top=1cm,bottom=1cm]{geometry}
\usepackage[utf8]{inputenx}
\usepackage[T1]{fontenc}
\usepackage[spanish]{babel}
\usepackage{supertabular}
\usepackage{array}
\usepackage{helvet}
\renewcommand{\familydefault}{\sfdefault}
\makeatletter
3 changes: 0 additions & 3 deletions spanish/format/spanish_5000_two_column_shuffle.tex
@@ -1,10 +1,7 @@
\documentclass{article}
\usepackage[a4paper,left=1cm,right=1cm,top=1cm,bottom=1cm]{geometry}
\usepackage[utf8]{inputenx}
\usepackage[T1]{fontenc}
\usepackage[spanish]{babel}
\usepackage{supertabular}
\usepackage{array}
\usepackage{helvet}
\renewcommand{\familydefault}{\sfdefault}
\makeatletter
3 changes: 0 additions & 3 deletions spanish/format/spanish_5000_two_column_shuffle_by_rank.tex
@@ -1,10 +1,7 @@
\documentclass{article}
\usepackage[a4paper,left=1cm,right=1cm,top=1cm,bottom=1cm]{geometry}
\usepackage[utf8]{inputenx}
\usepackage[T1]{fontenc}
\usepackage[spanish]{babel}
\usepackage{supertabular}
\usepackage{array}
\usepackage{helvet}
\renewcommand{\familydefault}{\sfdefault}
\makeatletter
@@ -1,10 +1,7 @@
\documentclass{article}
\usepackage[a4paper,left=1cm,right=1cm,top=1cm,bottom=1cm]{geometry}
\usepackage[utf8]{inputenx}
\usepackage[T1]{fontenc}
\usepackage[spanish]{babel}
\usepackage{supertabular}
\usepackage{array}
\usepackage{helvet}
\renewcommand{\familydefault}{\sfdefault}
\makeatletter