PRODIGY INFOTECH DATA SCIENCE INTERNSHIP TASK-02
Welcome to my submission for Task 2 of the Data Science Internship at Prodigy Infotech. In this task, I have performed Exploratory Data Analysis (EDA) on a dataset provided, focusing on creating a visualization to represent the distribution
DATASET:
The dataset used for this task is the Kaggle Titanic dataset: https://www.kaggle.com/c/titanic/data. It is based on real historical data about the passengers aboard the RMS Titanic, which sank on its maiden voyage in 1912.
The Titanic dataset contains the following columns:
PassengerId: Unique identifier for each passenger.
Survived: Whether the passenger survived (1) or not (0).
Pclass: Ticket class (1 = 1st class, 2 = 2nd class, 3 = 3rd class).
Name: Name of the passenger.
Sex: Gender of the passenger (male or female).
Age: Age of the passenger in years.
SibSp: Number of siblings/spouses aboard the Titanic.
Parch: Number of parents/children aboard the Titanic.
Ticket: Ticket number.
Fare: The fare paid by the passenger.
Cabin: Cabin number (if recorded).
Embarked: Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton).
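The column list above can be checked against the loaded data. The following is a minimal sketch, assuming the Kaggle file has been downloaded as train.csv; the helper name load_passengers and the two sample rows are made up for illustration and are not part of the original notebook.

```python
import io

import pandas as pd

# Columns described in the list above
EXPECTED_COLUMNS = [
    "PassengerId", "Survived", "Pclass", "Name", "Sex", "Age",
    "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked",
]

# Two made-up rows in the same layout as the Kaggle file, for illustration only
SAMPLE_CSV = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Doe, Mr. John",male,22,1,0,A/5 0000,7.25,,S
2,1,1,"Roe, Mrs. Jane",female,38,1,0,PC 0000,71.28,C85,C
"""


def load_passengers(source):
    """Read a Titanic-style CSV and verify that the expected columns exist."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return df


# With the real data this would be load_passengers("train.csv")
df = load_passengers(io.StringIO(SAMPLE_CSV))
print(df.dtypes)
```

An empty Cabin field is read as a missing value (NaN), which is why that column is described as "if recorded".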
TOOLS AND LIBRARIES USED:
Jupyter Notebook
pandas
NumPy
Matplotlib for visualization
EXPLORATORY DATA ANALYSIS (EDA):
During the EDA process, I performed the following steps:
Data Cleaning:
Checked the dataset for missing values, duplicates, and outliers, and handled them accordingly. Used descriptive statistics (mean, median, standard deviation) to understand the central tendency and dispersion of the data.
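The cleaning checks described above could look roughly like the sketch below. The handling choices are assumptions, since the exact methods are not stated here: Age is filled with the median, Embarked with the most frequent value, and Fare outliers are flagged with the 1.5 × IQR rule. The small DataFrame is made-up sample data standing in for the real file.

```python
import numpy as np
import pandas as pd

# Made-up sample rows standing in for the Titanic data
df = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0, 35.0, 54.0],
    "Fare": [7.25, 71.28, 7.92, 8.05, 8.05, 512.33],
    "Embarked": ["S", "C", None, "S", "S", "S"],
})

# 1. Missing values per column
missing_counts = df.isna().sum()

# 2. Fully duplicated rows
n_duplicates = int(df.duplicated().sum())
clean = df.drop_duplicates().copy()

# 3. Fill gaps (assumed strategy: median for Age, mode for Embarked)
clean["Age"] = clean["Age"].fillna(clean["Age"].median())
clean["Embarked"] = clean["Embarked"].fillna(clean["Embarked"].mode()[0])

# 4. Flag Fare outliers with the 1.5 * IQR rule (assumed criterion)
q1, q3 = clean["Fare"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (clean["Fare"] < q1 - 1.5 * iqr) | (clean["Fare"] > q3 + 1.5 * iqr)
outliers = clean[is_outlier]

# 5. Descriptive statistics for the numeric columns
summary = clean[["Age", "Fare"]].agg(["mean", "median", "std"])

print(missing_counts)
print("Duplicate rows:", n_duplicates)
print(outliers)
print(summary)
```

Whether flagged outliers are dropped, capped, or kept depends on the analysis; very high fares in this dataset can be genuine first-class tickets rather than errors.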
Visualization:
Created a bar chart and scatter plots to visualize the data.
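As an illustration of these chart types, the sketch below draws one bar chart and one scatter plot with Matplotlib. The choice of variables (survival rate by Sex, and Age against Fare) is an assumption made for the example, and the DataFrame holds made-up sample rows rather than the real data.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample rows standing in for the Titanic data
df = pd.DataFrame({
    "Sex": ["female", "male", "female", "male", "male", "female"],
    "Survived": [1, 0, 1, 0, 1, 1],
    "Age": [29.0, 40.0, 19.0, 33.0, 8.0, 51.0],
    "Fare": [71.28, 7.25, 13.0, 8.05, 21.08, 26.55],
})

# Share of survivors within each group
survival_by_sex = df.groupby("Sex")["Survived"].mean()

fig, (ax_bar, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Bar chart: survival rate per category
ax_bar.bar(survival_by_sex.index, survival_by_sex.values)
ax_bar.set_xlabel("Sex")
ax_bar.set_ylabel("Survival rate")
ax_bar.set_title("Survival rate by sex")

# Scatter plot: two numeric columns, colored by outcome
ax_scatter.scatter(df["Age"], df["Fare"], c=df["Survived"], cmap="coolwarm")
ax_scatter.set_xlabel("Age (years)")
ax_scatter.set_ylabel("Fare")
ax_scatter.set_title("Age vs. fare")

fig.tight_layout()
```

In a Jupyter notebook the figure is displayed inline after the cell runs; in a script, fig.savefig or plt.show() would be needed to see it.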
CONCLUSION:
EDA is an essential step in the data science workflow that ensures you fully understand your data before applying machine learning models.