Overview

This project (a work in progress) analyzes Crunchbase data to learn about gender equity in start-ups and VC funding.

All other datasets are derived from one main dataset that relates start-ups, founder gender, and funding data, where 1 row = 1 company. I only pulled companies founded after 1989 to visualize a more relevant period. Each row contains the company's industry sector, the genders of its founders, and its funding history. Note: only U.S. companies were queried, but the founders of those companies are not necessarily from the U.S.

The remaining datasets bin the data in the following ways: by investor, by education level of the founder, by industry (to find where the disparities are), by U.S. state, and by total funding for each year from 1990 to 2021.

Collecting Start-Up, Gender & Funding Info

Data Visualizations

"Diverse" refers to founders who were identified as female, non-binary, or unspecified.

Data for the above graph: year_funded.csv & code from bin_year_funded.py

Data for the above graph: year_funded.csv & code from rates.py

Data for the above graph: industries_count.csv & code from bin_industries.py

Data for the above graph: org_funding.csv & code from post_process.py

Data for the above graph: industries_count.csv & code from bin_industries.py

Data for the above graph: industries_fraction.csv & code from bin_industries.py

Data for the above graph: state_count.csv & code from bin_states.py

Data for the above graph: state_fraction.csv & code from bin_states.py

Data for the above graph: degrees_fraction.csv & code from process_degrees.py

How Datasets Were Created

To produce the organizations.csv:

crunch_library.py
organizations.py
people.py

organizations.csv: where 1 row = 1 company with gender info

To produce the org_funding.csv:

crunch_library.py
organizations.py
people.py
funding_rounds.py
post_process.py

org_funding.csv: where 1 row = 1 company with gender info and the money raised at each funding round (see last column)

More info: I used 3 queries to get this data from 3 relevant collections: organizations, funding rounds, and people. I pulled the gender of each founder from the people collection and associated each founder with their respective organization. I pulled each company's funding history from the funding rounds collection and added it as a list in a column called moneyRaised.
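The join described above can be sketched roughly as follows. This is a minimal illustration, not the code in organizations.py, people.py, or funding_rounds.py: the field names (org_id, gender, announced_on, money_raised), the sample records, and the join_collections helper are all hypothetical. Only the moneyRaised column name comes from this README.

```python
from collections import defaultdict

# Hypothetical records standing in for the results of the three queries.
organizations = [
    {"org_id": "a1", "name": "Acme", "industry": "Software"},
    {"org_id": "b2", "name": "Bolt", "industry": "Health Care"},
]
people = [
    {"org_id": "a1", "gender": "female"},
    {"org_id": "a1", "gender": "male"},
    {"org_id": "b2", "gender": "male"},
]
funding_rounds = [
    {"org_id": "a1", "announced_on": "2017-06-15", "money_raised": 2_000_000},
    {"org_id": "a1", "announced_on": "2015-03-01", "money_raised": 500_000},
]


def join_collections(organizations, people, funding_rounds):
    """Return one row per company with founder genders and funding history."""
    # Group founder genders by the organization they belong to.
    genders = defaultdict(list)
    for person in people:
        genders[person["org_id"]].append(person["gender"])

    # Group round amounts by organization, oldest round first
    # (ISO dates sort correctly as plain strings).
    raised = defaultdict(list)
    for rnd in sorted(funding_rounds, key=lambda r: r["announced_on"]):
        raised[rnd["org_id"]].append(rnd["money_raised"])

    rows = []
    for org in organizations:
        row = dict(org)  # copy so the input records stay untouched
        row["genders"] = genders.get(org["org_id"], [])
        row["moneyRaised"] = raised.get(org["org_id"], [])
        rows.append(row)
    return rows


rows = join_collections(organizations, people, funding_rounds)
```

A company with no recorded rounds keeps an empty moneyRaised list, so every company still occupies exactly 1 row.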

To produce the binned data of total funding for each year since 1990:

bin_year_funded.py

year_funded.csv: binned by year starting from 1990, where 1 row = aggregate data over 1 year
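As a rough sketch of what binning by year could look like (not the actual contents of bin_year_funded.py): the example below assumes each funding round has already been reduced to a (year, amount) pair and that the year is the year of the round. Both of those, along with the bin_by_year helper, are assumptions made for illustration.

```python
FIRST_YEAR, LAST_YEAR = 1990, 2021  # range stated in this README


def bin_by_year(rounds):
    """Sum funding amounts per year; years outside the range are skipped."""
    # Pre-fill every year so years with no funding still get a row of 0.
    totals = {year: 0 for year in range(FIRST_YEAR, LAST_YEAR + 1)}
    for year, amount in rounds:
        if FIRST_YEAR <= year <= LAST_YEAR:
            totals[year] += amount
    return totals


# Hypothetical (year, amount) pairs.
rounds = [(1995, 100), (1995, 250), (2020, 1_000), (1985, 999)]
year_totals = bin_by_year(rounds)
```

Pre-filling the dictionary keeps the output at 1 row per year even when a year has no rounds, which makes the result easier to plot as a continuous series.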

To produce the binned data by industry:

bin_industries.py

industries_fraction.csv: contains binned data for fraction of gender within each industry where 1 row = aggregate data for all companies for a given industry

industries_count.csv: contains binned data for total count of genders of founders within each industry where 1 row = aggregate data for all companies for a given industry
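The relationship between the count and fraction files can be illustrated with a short sketch. It assumes each company row carries an industry string and a list of founder genders. Those field names, the gender labels, and the bin_by_industry helper are hypothetical and may not match bin_industries.py.

```python
from collections import Counter, defaultdict


def bin_by_industry(companies):
    """Return (counts, fractions) of founder genders keyed by industry."""
    counts = defaultdict(Counter)
    for company in companies:
        # Counter.update adds one count per gender label in the list.
        counts[company["industry"]].update(company["genders"])

    fractions = {}
    for industry, counter in counts.items():
        total = sum(counter.values())
        # An industry with no founders yields an empty dict, so no division by zero.
        fractions[industry] = {g: n / total for g, n in counter.items()}
    return counts, fractions


# Hypothetical company rows.
companies = [
    {"industry": "Software", "genders": ["female", "male"]},
    {"industry": "Software", "genders": ["male"]},
    {"industry": "Health Care", "genders": ["female"]},
]
counts, fractions = bin_by_industry(companies)
```

The same pattern would apply to any other grouping key, such as U.S. state, by swapping the field used to key the counts.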

To produce the binned data by investor:

bin_investors.py

investors_fraction.csv: contains binned data by investor where 1 row = aggregate data for all companies for a given investor

To produce the binned data by U.S. state:

bin_states.py

state_fraction.csv: contains binned data of the fraction of founders by gender within each state where 1 row = aggregate data for all companies for a given state

state_count.csv: contains binned data of the total founders by gender within each state where 1 row = aggregate data for all companies for a given state

To produce the binned data for founder's education level by money invested:

process_degrees.py

degrees_fraction.csv: contains binned data of the fraction of money invested towards founders by gender where 1 row = aggregate data for different education levels for a given gender

degrees_count.csv: contains binned data of the total money invested towards founders by gender where 1 row = aggregate data for different education levels for a given gender
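One possible reading of these two files is sketched below: totals are summed per gender and education level, and each gender's row is then normalized so its fractions sum to 1. That normalization choice is an assumption, as are the field names (gender, degree, money_invested) and the bin_by_degree helper. process_degrees.py may compute the fractions differently.

```python
from collections import defaultdict


def bin_by_degree(founders):
    """Return (totals, fractions) keyed by gender, then by education level."""
    totals = defaultdict(lambda: defaultdict(float))
    for founder in founders:
        totals[founder["gender"]][founder["degree"]] += founder["money_invested"]

    fractions = {}
    for gender, by_degree in totals.items():
        row_total = sum(by_degree.values())
        # Guard against a gender whose recorded investments sum to zero.
        fractions[gender] = {
            degree: (amount / row_total if row_total else 0.0)
            for degree, amount in by_degree.items()
        }
    return totals, fractions


# Hypothetical founder records.
founders = [
    {"gender": "female", "degree": "bachelors", "money_invested": 100},
    {"gender": "female", "degree": "masters", "money_invested": 300},
    {"gender": "male", "degree": "bachelors", "money_invested": 200},
]
totals, fractions = bin_by_degree(founders)
```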

itserinlee/crunchbase-gender-analysis
