
Task: Delivery Delays

A food delivery company is trying to understand delays in order delivery. You've been asked to analyze the dataset and help uncover what might be contributing to the delays.

Use this file to document your reasoning and code. Dot is available in the chat to support you—feel free to talk through your ideas or ask questions as you go.


Initial Exploration and Planning

Use this section to do initial exploration and to outline your approach to the problem.

You might include:


# Code goes here, feel free to add cells

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('FastDrop2.csv')

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45584 entries, 0 to 45583
Data columns (total 20 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   ID                           45584 non-null  object 
 1   Delivery_person_ID           45584 non-null  object 
 2   Delivery_person_age          43730 non-null  float64
 3   Delivery_person_ratings      43676 non-null  float64
 4   Restaurant_latitude          45584 non-null  float64
 5   Restaurant_longitude         45584 non-null  float64
 6   Delivery_location_latitude   45584 non-null  float64
 7   Delivery_location_longitude  45584 non-null  float64
 8   Order_date                   45584 non-null  object 
 9   Time_ordered                 43853 non-null  object 
 10  Time_order_picked            45584 non-null  object 
 11  Weather_conditions           44968 non-null  object 
 12  Road_traffic_density         44983 non-null  object 
 13  Vehicle_condition            45584 non-null  int64  
 14  Type_of_order                45584 non-null  object 
 15  Type_of_vehicle              45584 non-null  object 
 16  Multiple_deliveries          44591 non-null  float64
 17  Festival                     45356 non-null  object 
 18  City                         44384 non-null  object 
 19  Time_taken (min)             45584 non-null  int64  
dtypes: float64(7), int64(2), object(11)
memory usage: 7.0+ MB
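The `df.info()` output shows that several columns (for example `Delivery_person_age`, `Delivery_person_ratings`, and `Time_ordered`) have fewer than 45584 non-null values. A minimal sketch for ranking columns by how much is missing follows. The helper name `missing_summary` and the toy frame are hypothetical, and the toy values are made up for illustration; on the real data the call would be `missing_summary(df)`.

```python
import pandas as pd

def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and percentages per column, largest first."""
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "missing_pct": (counts / len(frame) * 100).round(2),
    })
    # Keep only the columns that actually contain gaps.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

# Toy frame standing in for df (hypothetical values, not taken from the dataset).
toy = pd.DataFrame({
    "Delivery_person_age": [25.0, None, 31.0, None],
    "City": ["A", "B", None, "A"],
    "Time_taken (min)": [24, 33, 26, 21],
})
toy_summary = missing_summary(toy)
print(toy_summary)
```

Seeing the counts as percentages may make it easier to decide, column by column, whether to drop rows, impute, or keep a "missing" category.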

df = df.drop(columns=['ID', 'Delivery_person_ID'])  # drop identifier columns; they are not used as features in this analysis
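Before discarding identifier columns, it can be worth checking whether each one is a pure row key or repeats across rows. The sketch below is one way to do that, meant to run before the drop. The helper `id_column_report` and the toy values are hypothetical, and how the real columns behave is not known from the output above.

```python
import pandas as pd

def id_column_report(frame, columns):
    """Report, per column, the number of distinct values and whether it is unique per row."""
    report = {}
    for col in columns:
        n_unique = int(frame[col].nunique(dropna=True))
        report[col] = {
            "n_unique": n_unique,
            # True when every row has its own distinct value.
            "is_row_key": n_unique == len(frame),
        }
    return report

# Toy frame standing in for the raw data (hypothetical values).
toy_ids = pd.DataFrame({
    "ID": ["a1", "a2", "a3"],
    "Delivery_person_ID": ["p1", "p1", "p2"],
})
id_report = id_column_report(toy_ids, ["ID", "Delivery_person_ID"])
print(id_report)
```

A column that is unique per row carries no grouping information, while one that repeats could in principle be used to group orders, so the check helps justify the drop.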

# Preview the first few rows
df.head()

[HTML output]