Step 1: Data Loading

Description: In this step, we load the dataset from a CSV file and perform some initial data exploration. The dataset is retrieved from Kaggle; see the citation above.

# Import necessary libraries/modules
import os  # Import the 'os' module for working with file paths and directories
import pandas as pd  # Import the 'pandas' library for data manipulation
import matplotlib.pyplot as plt  # Import 'matplotlib.pyplot' for data visualization
import seaborn as sns  # Import 'seaborn' for enhanced data visualization
import folium as fl  # Import 'folium' for creating interactive maps

# 1. Load Data
csv_file_path = "/Users/DataSet - CRIME IN LA PROJECT/crime_in_la.csv"
delimiter = ','  # Define the delimiter used in the CSV file
data = pd.read_csv(csv_file_path, sep=delimiter)  # Read the CSV data into a Pandas DataFrame
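Since the `os` module is imported for working with file paths, one way it can be put to use is guarding the load against a missing or mistyped path. The sketch below is a minimal, self-contained illustration: it writes a tiny sample CSV so the example runs anywhere (in the project, `csv_file_path` points at the real Kaggle file, and the sample-writing line is not needed).

```python
import os
import pandas as pd

# Hypothetical stand-in file so this sketch is runnable; in the project,
# csv_file_path is the real path to the Kaggle CSV.
csv_file_path = "sample_crime.csv"
pd.DataFrame({"Premis Desc": ["STREET", None],
              "Vict Age": [34, 0]}).to_csv(csv_file_path, index=False)

# Fail early with a clear message instead of a long pandas traceback
if not os.path.exists(csv_file_path):
    raise FileNotFoundError(f"CSV file not found: {csv_file_path}")

data = pd.read_csv(csv_file_path, sep=',')
print(f"Loaded {len(data)} rows and {len(data.columns)} columns")
```

This kind of early check makes the notebook easier to rerun on another machine, where the absolute path is the most common point of failure.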

Step 2: Data Exploration

Description: In this step, we explore the loaded data by displaying the first 5 rows, calculating summary statistics, checking for missing values, and inspecting data types.

# 2. Display After Load
print("First 5 rows of the DataFrame:")
print(data.head(5))  # Display the first 5 rows of the loaded DataFrame

# Calculate summary statistics
summary_stats = data.describe(percentiles=[0.25, 0.5, 0.75])
print(summary_stats)  # Display summary statistics of the data

# Identify missing values
missing_values = data.isna().sum()
print("Missing Values:")
print(missing_values)  # Display the count of missing values in each column

# Check data types
data_types = data.dtypes
print("Data Types:")
print(data_types)  # Display data types of columns

In this step, we begin the data exploration process by examining the loaded dataset. We print the first 5 rows of the DataFrame to get a glimpse of the data's structure. Next, we calculate summary statistics such as mean, standard deviation, and quartiles to understand the data's distribution and central tendencies.

We also identify missing values by counting the number of null values in each column. Lastly, we check the data types of each column to ensure that they are correctly interpreted by pandas.

Step 3: Data Cleaning

Description: In this step, we enhance data quality by eliminating rows with missing values in the "Premis Desc" column.

# 3. Remove Rows with Missing Values in "Premis Desc"
data_cleaned = data.dropna(subset=["Premis Desc"])  # Remove rows with missing values in the "Premis Desc" column
print("First 5 rows of the Cleaned DataFrame:")
print(data_cleaned.head(5))  # Display the first 5 rows of the cleaned DataFrame

# Check for Missing Values in the Cleaned Data
missing_values_cleaned = data_cleaned.isna().sum()
print("Missing Values in Cleaned Data:")
print(missing_values_cleaned)  # Display the count of missing values in the cleaned data

In this step, we focus on improving the data quality for our portfolio project. We achieve this by eliminating rows that contain missing values specifically in the "Premis Desc" column. This ensures that our analysis is based on complete and reliable data.

After performing the data cleaning operation, we verify the results by checking for missing values in the cleaned dataset. This is crucial for maintaining data integrity throughout our project.

Step 4: Data Visualization


Description: In this crucial step of our project portfolio, we leverage various visualization techniques to gain valuable insights from the data. These visualizations enhance our understanding of the dataset and help communicate findings effectively.

Histogram of Victim Ages: We begin by creating a histogram that displays the distribution of victim ages. This visualization provides insights into the age demographics of crime victims, helping us identify trends or patterns.

[Figure: Distribution of Victim Ages]

# Create a histogram of victim ages (using the cleaned data from Step 3)
plt.hist(data_cleaned['Vict Age'], bins=20, edgecolor='k')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Distribution of Victim Ages')
plt.show()

Crime Counts by Area: Next, we generate a bar chart (countplot) that illustrates the number of crimes in each area. This visualization allows us to identify areas with higher crime rates and potential hotspots.
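The countplot described above could be sketched as follows. This is a self-contained illustration using a toy DataFrame; it assumes the area column in the real dataset is named 'AREA NAME' (in the project, `data_cleaned` from Step 3 would be used directly).

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import seaborn as sns

# Toy stand-in for the cleaned data; 'AREA NAME' is an assumed column name.
data_cleaned = pd.DataFrame(
    {"AREA NAME": ["Central", "Hollywood", "Central", "Newton", "Central"]}
)

# Order the bars by frequency so the highest-crime areas appear first
order = data_cleaned["AREA NAME"].value_counts().index
ax = sns.countplot(y="AREA NAME", data=data_cleaned, order=order)
ax.set_xlabel("Number of Crimes")
ax.set_ylabel("Area")
ax.set_title("Crime Counts by Area")
plt.tight_layout()
plt.show()
```

Plotting the areas on the y-axis keeps long area names readable, and sorting by count makes potential hotspots immediately visible at the top of the chart.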