Skip to content

Latest commit

 

History

History
95 lines (76 loc) · 3.22 KB

README.md

File metadata and controls

95 lines (76 loc) · 3.22 KB

Suppose you were running an Online Electronic site and you need to get some insights from 2019 monthly sales reports. Main idea of the project is coming from this video.

Features:

See how it works below

Generate data


The sample dataset is randomly added by this script. We can adjust the weight of each element to get different versions.

Data format

# Data header
columns = ['Order ID', 'Product', 'Quantity', 'Price', 'Order Date', 'Purchase Address']

# Location
street_names = ['1st', ' 2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th', '11th', '12th', '13th', '14th', '15th', '16th', '17th', '18th', '19th', '20th', '21st', '22nd', '23rd', '24th', '25th']
districts = ['01', '02', '03', '04', '05', '06', '07', '08', '10', '11']
districts_weights = [9, 3, 4, 3, 5, 2, 6, 0.5, 3, 1]  

# Product (Price and Frequency)
products = {
  'iPhone': [700, 10],
  'Google Phone': [600, 3],
  'Samsung Phone': [500, 8],
  '21in Monitor': [110, 6],
  '27in Monitor': [150, 11],
  '34in 2k Monitor': [380, 9],
  '27in 4K Gaming Monitor': [400, 9],
  'Dell Laptop': [800, 7],
  'ThinkPad Laptop': [1000, 6],
  'Macbook Pro Laptop': [1700, 7],
  'Power Bank': [60, 30],
  'USB-C Charging Cable': [12, 30],
  'Lightning Charging Cable': [15, 30],
  'Normal Headphones': [12, 30],
  'Sony Headphones': [100, 19],
  'Airpods Headphones': [150, 15]
  }

Result

Analyze data


The sample analysis file is performed by this script.

Concatenate csv files

os.chdir("./Sales Data Examples")
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension), recursive=True)]
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames])
combined_csv.to_csv( "Sales_2019.csv", index=False, encoding='utf-8-sig')

Clear outliers

sales = sales.dropna(how='all')
sales = sales[sales['Order Date'].str[0:4] != 'Null']

Sales by Month

Sales by Territory

Top 5 bundles

sales = sales.dropna(how='all')
sales = sales[sales['Order Date'].str[0:4] != 'Null']

from itertools import combinations
from collections import Counter

count = Counter()

for row in df_pair['Bundle']:
    row_list = row.split(', ')
    count.update(Counter(combinations(row_list, 3)))

for key, value in count.most_common(5):
    print(key, '-', value, 'times')

Feedback & Suggestions


Please feel free to fork, comment or give feedback to [email protected]