Skip to content

GitHub Repository containing code samples for Sales Analysis project

Notifications You must be signed in to change notification settings

ngtridung97/Sales-Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Suppose you were running an Online Electronic site and you need to get some insights from 2019 monthly sales reports. Main idea of the project is coming from this video.

Features:

See how it works below

Generate data


The sample dataset is randomly added by this script. We can adjust the weight of each element to get different versions.

Data format

# Data header
columns = ['Order ID', 'Product', 'Quantity', 'Price', 'Order Date', 'Purchase Address']

# Location
street_names = ['1st', ' 2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th', '11th', '12th', '13th', '14th', '15th', '16th', '17th', '18th', '19th', '20th', '21st', '22nd', '23rd', '24th', '25th']
districts = ['01', '02', '03', '04', '05', '06', '07', '08', '10', '11']
districts_weights = [9, 3, 4, 3, 5, 2, 6, 0.5, 3, 1]  

# Product (Price and Frequency)
products = {
  'iPhone': [700, 10],
  'Google Phone': [600, 3],
  'Samsung Phone': [500, 8],
  '21in Monitor': [110, 6],
  '27in Monitor': [150, 11],
  '34in 2k Monitor': [380, 9],
  '27in 4K Gaming Monitor': [400, 9],
  'Dell Laptop': [800, 7],
  'ThinkPad Laptop': [1000, 6],
  'Macbook Pro Laptop': [1700, 7],
  'Power Bank': [60, 30],
  'USB-C Charging Cable': [12, 30],
  'Lightning Charging Cable': [15, 30],
  'Normal Headphones': [12, 30],
  'Sony Headphones': [100, 19],
  'Airpods Headphones': [150, 15]
  }

Result

Analyze data


The sample analysis file is performed by this script.

Concatenate csv files

os.chdir("./Sales Data Examples")
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension), recursive=True)]
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames])
combined_csv.to_csv( "Sales_2019.csv", index=False, encoding='utf-8-sig')

Clear outliers

sales = sales.dropna(how='all')
sales = sales[sales['Order Date'].str[0:4] != 'Null']

Sales by Month

Sales by Territory

Top 5 bundles

sales = sales.dropna(how='all')
sales = sales[sales['Order Date'].str[0:4] != 'Null']

from itertools import combinations
from collections import Counter

count = Counter()

for row in df_pair['Bundle']:
    row_list = row.split(', ')
    count.update(Counter(combinations(row_list, 3)))

for key, value in count.most_common(5):
    print(key, '-', value, 'times')

Feedback & Suggestions


Please feel free to fork, comment or give feedback to [email protected]

About

GitHub Repository containing code samples for Sales Analysis project

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published