This is a repo of projects completed during 8-month DS/ML/NLP/CV/DL training program at "Yandex Practicum".
Each project is assigned with its own designated folder containing all related files. Due to the exclusivity of the materials provided during the course, it is forbidden to publish datasets used in the projects but, however, all Jupyter Notebook files which contain the solution of the projects do include all explanations as well as data processing results in the cells executed.
The project folders are generally characterized by the following structure:
|-- [project_folder_name]
|-- README.md
|-- [project_name].ipynb
README.md
- Markdown file containing the description of the project;*.ipynb
- Jupyter Notebook file storing the solution of the project.
Project name | Description | Libraries used |
---|---|---|
Big Cities Music | Comparison of preferences of "Yandex.Music" users from Moscow and Saint-Peterburg depending on time of day (morning and evening) and weekday (Monday, Wednesday and Friday). | matplotlib numpy pandas seaborn IPython |
Borrowers Solvency Study | Analysis of factors affecting the creditworthiness of clients of some bank: number of children, family status, total income and loan purpose. | matplotlib numpy pandas seaborn IPython |
Real Estate Ads Study | Exploratory data analysis of real estate objects advertisements in Saint-Petersburg and its neighbouring localities. | warnings matplotlib numpy pandas seaborn IPython |
Preferred Tariff Choice | Optimal choice of the most preferable tariff plan from the menu offered by a mobile network operator based on its users behavior patterns. | matplotlib numpy pandas seaborn IPython scipy |
Computer Games Market Analysis | Identification of profit-enhancing patterns in data and making product-oriented forecasts. | warnings math matplotlib numpy pandas seaborn IPython scipy |
Tariff Recommendation System | Building a recommendation system that would suggest tariffs to clients of a mobile operator. | collections matplotlib numpy pandas seaborn IPython joblib scipy sklearn |
Customer Churn | Building a system capable of predicting whether a client will churn from the bank or not in the near future. | re collections copy matplotlib numpy pandas seaborn imblearn IPython joblib scipy sklearn |
Oil Well Location Choice | Building an ML model capable of determining the most optimal location for drilling a new oil well. | matplotlib numpy pandas seaborn IPython sklearn |
Gold Recovery Prediction[Real project] | Developing an ML model prototype for predicting recovery rate of gold from gold-bearing ore. | functools itertools matplotlib numpy pandas seaborn IPython sklearn tqdm |
Clients' Personal Data Protection | Developing a data obfuscation algorithm such that it would make it difficult to recover personal information from it. | matplotlib numpy pandas seaborn IPython sklearn |
Car Prices Prediction | Building an optimal ML model capable of determining the prices of automobile vehicles. | re time warnings pprint matplotlib numpy pandas seaborn catboost IPython joblib lightgbm sklearn xgboost |
Forecasting Taxi Orders | Developing a time-series model that is capable of forecasting hourly taxi orders to the airport. | itertools matplotlib numpy pandas seaborn catboost IPython lightgbm sklearn statsmodels xgboost |
Transformers-based Sentiment Analysis[GPU] | Classification of commentaries into positive and toxic ones using BERT language model along with GPU support. | pprint matplotlib numpy pandas seaborn torch transformers catboost imblearn lightgbm sklearn tqdm xgboost |
Startup Investments | Writing queries of different levels of complexity to the database containing information about the venture capital and startup companies. | SQL/Postgres |
CV-based People's Age Determination[GPU] | Building a neural net model capable of determining a person's age based on their photos. | os typing matplotlib numpy pandas seaborn IPython PIL tensorflow.keras |
Production Costs Optimization[Diploma project] | Developing a prototype of an ML model that will predict a temperature of steel. | os copy pprint joblib matplotlib numpy pandas seaborn catboost IPython lightgbm sklearn xgboost |
-
Module 1: Introduction to Data Analysis
- Topics: Basic Python, Data Preprocessing, Exploratory Data Analysis, Statistical Data Analysis
- Libraries:
pandas
numpy
scipy
matplotlib
seaborn
-
Module 2: Basics of Machine Learning
- Topics: Introduction to Machine Learning, Supervised Learning, Machine Learning in Business
- Libraries:
sklearn
imblearn
-
Module 3: Advanced Machine Learning
- Topics: Transformers, Natural Language Processing, Gradient Boosting/Descent, Time Series, Linear Algebra
- Libraries:
catboost
lightgbm
xgboost
statsmodels
re
pymystem3
nltk
transformers
torch
tqdm
-
Module 4: Machine Learning for Big Data
- Topics: SQL (Postgres), PySpark, Unsupervised Learning, Computer Vision, Deep Learning
- Libraries:
tensorflow.keras
pyspark
PIL
cv2
pyod