Cyclistic Bike-Share Analysis

Trio Wibowo
Oct 29, 2023

Background

This is a case study from the Google Data Analytics Certificate that asks: how did a bike-share business achieve success?

Cyclistic is a bike-share company in Chicago with more than 5,800 bikes and 600 docking stations. It also offers reclining bikes, hand tricycles, and cargo bikes, which makes bike-share more inclusive for people with disabilities and for customers who cannot use a standard two-wheeled bicycle. The majority of customers choose classic bikes; about 8% of riders use the assistive options. Riders are more likely to cycle for leisure, but around 30% use the bikes to commute to work every day.

Until now, Cyclistic’s marketing strategy has relied on building general awareness and appealing to broad consumer segments. One approach that helps make this happen is pricing flexibility:

  • one-way tickets,
  • full-day tickets, and
  • annual memberships

Customers who purchase a one-way ticket or a full-day ticket are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members. Cyclistic’s financial analysts have concluded that annual members are much more profitable than casual riders. While the pricing flexibility helps Cyclistic attract more customers, Lily Moreno (marketing director) believes that maximizing the number of annual members will be the key to future growth.

Purpose

In this project, my task as a data analyst is to collect, analyze, and report data that helps guide Cyclistic’s marketing strategy.

The stakeholders are:

  • the executive team (which decides whether to approve the recommended marketing program), and
  • Lily Moreno (marketing director)

The central question in this case is: “How do casual riders and members use Cyclistic bicycles differently?” The goal behind this question is to design a new marketing strategy that converts casual riders into members.

For this case, I use the six steps of the data analysis process (ask, prepare, process, analyze, share, and act) to draw insights from the last 12 months of data (August 2022 to July 2023) and turn them into a business strategy and decisions for my stakeholders.

The six steps of the analysis process

Ask

Identify the business task

In this case, Moreno (my stakeholder) has several questions:

  • How do members and casual riders use their bikes differently?
  • Why would casual riders buy a Cyclistic membership?
  • How can Cyclistic use digital media to influence casual riders to become members?

To help answer these questions, my business task is to identify how members and casual riders use the bikes differently.

Prepare

1. Data source

Cyclistic provided historical bicycle trip data to analyze and identify trends. I downloaded the data for the previous 12 months (August 2022 to July 2023). (source: here)

2. Identify Tools

The trip data comes as 12 comma-delimited (.csv) files with 13 columns and more than 130 MB in size, so:

  • Excel and Google Sheets are not a good choice because of their capacity limitations.
  • Databases such as BigQuery or PostgreSQL have large storage, so I could import the data, combine it, and analyze it there, but for visualization I would need a separate tool such as Looker Studio, Power BI, or Tableau.
  • RStudio or Jupyter Notebook is the best choice because it can import the data, analyze it, and produce visualizations in one place. For this project I use R.

Process

1. Install Packages

I need to install and load the required packages:

  • tidyverse for data import and wrangling
  • data.table to extract data from zip files
  • lubridate for date functions
  • ggplot2 for visualization
  • modeest for analysis

library(tidyverse)  # data import and wrangling
library(lubridate)  # date and time helpers
library(ggplot2)    # visualization
library(data.table) # fread() for reading the zipped CSV files
library(modeest)    # statistical mode of the data

2. Import Data

Collect the data from the source and load it into data frames, then check each data frame for its number of rows, column list, and data types.

# Stream each monthly zip through funzip and parse it with fread()
# (requires the curl and funzip command-line tools; cmd= marks the string as a shell command)
bike_share_trips_202208 <- fread(cmd = "curl -s https://divvy-tripdata.s3.amazonaws.com/202208-divvy-tripdata.zip | funzip")
bike_share_trips_202209 <- fread(cmd = "curl -s https://divvy-tripdata.s3.amazonaws.com/202209-divvy-tripdata.zip | funzip")
bike_share_trips_202210 <- fread(cmd = "curl -s https://divvy-tripdata.s3.amazonaws.com/202210-divvy-tripdata.zip | funzip")
bike_share_trips_202211 <- fread(cmd = "curl -s https://divvy-tripdata.s3.amazonaws.com/202211-divvy-tripdata.zip | funzip")
bike_share_trips_202212 <- fread(cmd = "curl -s https://divvy-tripdata.s3.amazonaws.com/202212-divvy-tripdata.zip | funzip")
bike_share_trips_202301 <- fread(cmd = "curl -s https://divvy-tripdata.s3.amazonaws.com/202301-divvy-tripdata.zip | funzip")
bike_share_trips_202302 <- fread(cmd = "curl -s https://divvy-tripdata.s3.amazonaws.com/202302-divvy-tripdata.zip | funzip")
bike_share_trips_202303 <- fread(cmd = "curl -s https://divvy-tripdata.s3.amazonaws.com/202303-divvy-tripdata.zip | funzip")
bike_share_trips_202304 <- fread(cmd = "curl -s https://divvy-tripdata.s3.amazonaws.com/202304-divvy-tripdata.zip | funzip")
bike_share_trips_202305 <- fread(cmd = "curl -s https://divvy-tripdata.s3.amazonaws.com/202305-divvy-tripdata.zip | funzip")
bike_share_trips_202306 <- fread(cmd = "curl -s https://divvy-tripdata.s3.amazonaws.com/202306-divvy-tripdata.zip | funzip")
bike_share_trips_202307 <- fread(cmd = "curl -s https://divvy-tripdata.s3.amazonaws.com/202307-divvy-tripdata.zip | funzip")
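
The twelve URLs differ only in their year and month, so they can also be generated instead of typed out. The following is a minimal sketch in base R; the names base_url, month_starts, and trip_urls are my own for illustration and are not part of the original code.

```r
# Sketch: build the twelve monthly archive URLs from a date sequence.
# base_url, month_starts and trip_urls are illustrative names.
base_url <- "https://divvy-tripdata.s3.amazonaws.com"
month_starts <- seq(as.Date("2022-08-01"), as.Date("2023-07-01"), by = "month")
trip_urls <- sprintf("%s/%s-divvy-tripdata.zip", base_url, format(month_starts, "%Y%m"))

# Each URL could then be read in a loop, for example (not run here, needs network access):
# trips_list <- lapply(trip_urls, function(u) data.table::fread(cmd = paste("curl -s", u, "| funzip")))
print(trip_urls)
```

This keeps the month range in one place, which helps if the 12-month window needs to shift later.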

3. Combine All Data

All data frames are combined into one large data frame (called all_trips here) that holds every trip from the last 12 months (August 2022 to July 2023).

all_trips <- bind_rows(bike_share_trips_202208
, bike_share_trips_202209
, bike_share_trips_202210
, bike_share_trips_202211
, bike_share_trips_202212
, bike_share_trips_202301
, bike_share_trips_202302
, bike_share_trips_202303
, bike_share_trips_202304
, bike_share_trips_202305
, bike_share_trips_202306
, bike_share_trips_202307
)
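
Before and after combining, it can help to confirm that the monthly frames share the same columns and that no rows were lost. The sketch below uses two tiny made-up frames as stand-ins for the monthly files and base R's rbind in place of bind_rows, so it runs without any packages; all names in it are illustrative.

```r
# Toy stand-ins for two monthly files (made-up values)
trips_a <- data.frame(ride_id = c("A1", "A2"), member_casual = c("member", "casual"))
trips_b <- data.frame(ride_id = "B1", member_casual = "member")
monthly <- list(trips_a, trips_b)

# TRUE when every frame has the same column names, in the same order, as the first one
same_columns <- all(vapply(
  monthly,
  function(df) identical(names(df), names(monthly[[1]])),
  logical(1)
))

# Stack the frames, then check that the row count equals the sum of the parts
combined <- do.call(rbind, monthly)
rows_match <- nrow(combined) == sum(vapply(monthly, nrow, integer(1)))

print(c(same_columns = same_columns, rows_match = rows_match))
```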

4. Clean Up, Modify, and Add Fields

The all_trips data has some fields that are not used in the analysis, such as start_lat, start_lng, end_lat, and end_lng, so this step removes them. Combining the data first makes this easier, because the fields do not have to be removed from each data frame separately.
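
As a rough illustration of the column removal, here is a sketch on a small made-up frame. It uses base R indexing; with dplyr, select(-c(start_lat, start_lng, end_lat, end_lng)) would do the same job. The toy values and the name unused_fields are assumptions for the example only.

```r
# Toy frame with the four coordinate columns (made-up values)
toy_trips <- data.frame(
  ride_id       = c("A1", "A2"),
  start_lat     = c(41.9, 41.8),
  start_lng     = c(-87.6, -87.7),
  end_lat       = c(41.9, 41.8),
  end_lng       = c(-87.6, -87.7),
  member_casual = c("member", "casual")
)

# Keep every column whose name is not in the list of unused fields
unused_fields <- c("start_lat", "start_lng", "end_lat", "end_lng")
toy_trips <- toy_trips[, !(names(toy_trips) %in% unused_fields), drop = FALSE]

print(names(toy_trips))
```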

Finally, the data is ready for analysis, with 5,723,485 rows and 16 columns:

1. ride_id (text): the ID that identifies each trip record

2. rideable_type (text): the type of bike, one of

  • electric_bike
  • classic_bike
  • docked_bike

3. started_at (datetime): the date and time the trip starts at the first station

4. ended_at (datetime): the date and time the trip ends at the last station

5. start_station_name (text): the name of the station where the trip starts

6. start_station_id (text): the ID of the station where the trip starts

7. end_station_name (text): the name of the station where the trip ends

8. end_station_id (text): the ID of the station where the trip ends

9. member_casual (text): the type of rider, either

  • member (purchased an annual membership)
  • casual (purchased a one-way ticket or a full-day ticket)

10. date (text): the date the trip starts

11. month (text): the month number of the trip start

12. day (text): the day of the month of the trip start

13. day_of_week (text): the name of the weekday of the trip start

14. year (text): the year of the trip start

15. number_day_of_week (integer): the weekday number of the trip start, where

  • 01 = Monday
  • 02 = Tuesday
  • 03 = Wednesday
  • 04 = Thursday
  • 05 = Friday
  • 06 = Saturday
  • 07 = Sunday

16. ride_length (duration): the time between started_at and ended_at, in minutes
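
The derived fields (10 to 16) can be computed from started_at and ended_at. The sketch below is one way to do it in base R on two made-up trips; the exact format strings are my assumptions, since the field list only gives the types. lubridate offers equivalents such as wday() and month().

```r
# Two made-up trips; 2023-07-01 is a Saturday and 2023-07-03 is a Monday
toy_trips <- data.frame(
  started_at = as.POSIXct(c("2023-07-01 10:00:00", "2023-07-03 08:15:00"), tz = "UTC"),
  ended_at   = as.POSIXct(c("2023-07-01 10:30:00", "2023-07-03 08:27:00"), tz = "UTC")
)

toy_trips$date  <- format(toy_trips$started_at, "%Y-%m-%d")
toy_trips$month <- format(toy_trips$started_at, "%m")
toy_trips$day   <- format(toy_trips$started_at, "%d")
toy_trips$year  <- format(toy_trips$started_at, "%Y")

# Weekday name (depends on the session locale) and ISO weekday number (1 = Monday, 7 = Sunday)
toy_trips$day_of_week        <- format(toy_trips$started_at, "%A")
toy_trips$number_day_of_week <- as.integer(format(toy_trips$started_at, "%u"))

# Trip duration in minutes
toy_trips$ride_length <- as.numeric(
  difftime(toy_trips$ended_at, toy_trips$started_at, units = "mins")
)

print(toy_trips)
```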

Analysis

Statistical summary

Let’s look at a statistical summary. Please focus on looking at each column!

Number of trips by rider type

From the member_casual field, I can see how many trips each rider type made. In the last 12 months (August 2022 to July 2023), member riders contributed 62.09% of all trips. I want to increase this percentage once I understand the trends in the data.
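
A share like this can be computed with a frequency table. The sketch below uses five made-up trips, so the percentages it prints are not the real ones.

```r
# Made-up rider types for five trips
member_casual <- c("member", "member", "casual", "member", "casual")

# Count trips per rider type, then convert the counts to percentages
trip_counts <- table(member_casual)
trip_share  <- round(100 * prop.table(trip_counts), 2)

print(trip_share)
```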

Number of trips by rider type and bike type

The electric_bike type is the most popular, with the largest share of trips for both member and casual riders. Next, let’s focus on the ride_length field to see the duration of trips, and then break it down by member_casual to compare casual and member riders.

Descriptive analysis of ride_length (in minutes)

Overall, the average duration is close to 20 minutes. Next, I break it down by user type to look at the mean and median duration for casual and member riders.

Mean, median, and mode of ride length per member type

The average duration for casual riders is 30 minutes, longer than the 12 minutes for member riders. The medians show the same pattern with a smaller gap: 12 minutes versus 8 minutes. Next, let’s compare the mode of the day of week for members and casual riders.

Using the statistical mode, the most common day for renting bikes is:

  • 6 (Saturday) for casual
  • 4 (Thursday) for member
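
The per-type mean, median, and mode can be sketched as follows. The data is made up, and stat_mode is a small helper I wrote for this example; the modeest package loaded earlier provides mfv() as a packaged alternative.

```r
# Made-up trips: durations in minutes and weekday numbers (1 = Monday)
toy_trips <- data.frame(
  member_casual      = c("casual", "casual", "casual", "member", "member", "member"),
  ride_length        = c(10, 20, 60, 5, 8, 11),
  number_day_of_week = c(6, 6, 7, 4, 4, 2)
)

# Illustrative helper: most frequent value (the first one wins on ties)
stat_mode <- function(x) {
  counts <- table(x)
  as.integer(names(counts)[which.max(counts)])
}

mean_by_type   <- tapply(toy_trips$ride_length, toy_trips$member_casual, mean)
median_by_type <- tapply(toy_trips$ride_length, toy_trips$member_casual, median)
mode_by_type   <- tapply(toy_trips$number_day_of_week, toy_trips$member_casual, stat_mode)

summary_by_type <- data.frame(
  member_casual  = names(mean_by_type),
  mean_minutes   = as.vector(mean_by_type),
  median_minutes = as.vector(median_by_type),
  mode_day       = as.vector(mode_by_type)
)

print(summary_by_type)
```

In the toy data the casual mean sits well above the casual median because of one long trip, which is the same kind of gap the real summary shows.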

Let’s break down the average duration by day for members versus casual riders.

Average daily duration (minutes) per rider type

The average duration from Sunday to Saturday follows the same trend for casual riders and members: the longest trips occur on Saturday and Sunday.

  • For member riders, the average duration peaks at 13.68 and 13.87 minutes
  • For casual riders, the average duration peaks at 32.58 and 33.03 minutes
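
A breakdown like this is a grouped mean over two fields. Here is a minimal sketch with made-up trips, using base R's aggregate(); dplyr's group_by() with summarise() would give the same result.

```r
# Made-up trips: weekday numbers follow the coding above (2 = Tuesday, 6 = Saturday)
toy_trips <- data.frame(
  member_casual      = c("casual", "casual", "member", "member", "casual", "member"),
  number_day_of_week = c(6, 6, 6, 2, 2, 2),
  ride_length        = c(30, 40, 12, 10, 20, 8)
)

# Mean ride length for every combination of rider type and weekday
avg_by_day <- aggregate(
  ride_length ~ member_casual + number_day_of_week,
  data = toy_trips,
  FUN = mean
)

print(avg_by_day)
```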

This would be clearer as a visualization.

Pivot of average duration by day and month per rider type

For more detail, I can pivot the average duration over the last 12 months by day, from Monday to Sunday.

Pivot of number of trips by day and month per rider type

I can do the same for the number of trips over the last 12 months by day, from Sunday to Saturday. However, reading this many numbers makes the analysis less effective: I have to go through each line carefully, which takes a very long time. The numbers are easier to explain with visualizations.
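
For reference, a pivot of this kind puts one field on the rows and another on the columns. The sketch below builds two small pivots from made-up trips in base R: one of trip counts with xtabs() and one of average duration with tapply().

```r
# Made-up trips (2 = Tuesday, 6 = Saturday)
toy_trips <- data.frame(
  member_casual      = c("casual", "casual", "member", "member", "casual", "member"),
  number_day_of_week = c(6, 6, 6, 2, 2, 2),
  ride_length        = c(30, 40, 12, 10, 20, 8)
)

# Rows are weekdays, columns are rider types, cells are trip counts
trips_pivot <- xtabs(~ number_day_of_week + member_casual, data = toy_trips)

# Same layout, but each cell holds the mean ride length in minutes
duration_pivot <- with(
  toy_trips,
  tapply(ride_length, list(number_day_of_week, member_casual), mean)
)

print(trips_pivot)
print(duration_pivot)
```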

Share

Determine the best way to share findings. Simple findings work well in tables, as shown above. I present the other findings as visualizations using ggplot2.

With a bar chart, the daily average duration trend becomes clear. The trend for both casual and member riders is shaped like the letter “U”, with its peaks on Saturday and Sunday.
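
A grouped bar chart of this kind could be built roughly as follows. The numbers in avg_by_day are placeholders I made up to show the chart structure, not the results of the analysis, and the object names are illustrative.

```r
library(ggplot2)

# Placeholder values only, chosen to illustrate the layout of the chart
day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
avg_by_day <- data.frame(
  day_of_week   = factor(rep(day_levels, times = 2), levels = day_levels),
  member_casual = rep(c("casual", "member"), each = 7),
  avg_minutes   = c(28, 26, 25, 26, 28, 32, 33, 12, 12, 12, 12, 12.5, 13.5, 13.8)
)

# One bar per rider type for each day, placed side by side
duration_plot <- ggplot(avg_by_day, aes(x = day_of_week, y = avg_minutes, fill = member_casual)) +
  geom_col(position = "dodge") +
  labs(
    title = "Average ride duration by day of week",
    x = "Day of week",
    y = "Average duration (minutes)",
    fill = "Rider type"
  )

# print(duration_plot)  # draws the chart when run interactively
```

Storing day_of_week as a factor with explicit levels keeps the bars in weekday order instead of alphabetical order.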

The next step is to look at the pattern in the number of trips per day:

  • The peak for member riders occurs on Tuesday, Wednesday, and Thursday.
  • Meanwhile, the peak for casual riders occurs on Saturday and Sunday.

The bikes with the largest number of trips for both member and casual riders are electric bikes and classic bikes, with electric bikes being especially popular. Meanwhile, docked bikes are used only by casual riders; they are the more inclusive option for users with disabilities and customers who cannot use a standard two-wheeled bicycle.

The number of trips per day is dominated by electric bikes, with an increasing trend from Monday to Saturday. Classic bikes follow not far behind and are still in great demand. Docked bikes have the lowest number of trips of the three types.

The average duration per day for docked bikes is very long, in the range of 133–169 minutes (2–3 hours). Because they are intended for users with disabilities, their trips take longer than those of other users.

For electric bikes, the average duration per day is very short, in the range of 11–14 minutes. I think electric bikes have an advantage in duration because their electric drive motors allow higher speeds.

Having analyzed the daily period, let’s look at the monthly period with a line chart.

During the last 12 months (August 2022 to July 2023), the number of trips followed almost the same trend for member and casual riders. There was a very significant decrease from December 2022 to February 2023.

The peak may be easier to see when the data is sorted by month.

The number of trips for both casual and member riders peaks at almost the same time, from June (06) to September (09).
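
The difference between the two monthly views comes down to the grouping key: year plus month gives chronological order, while month number alone gives calendar order. A small sketch with made-up start times:

```r
# Made-up trip start times spanning two calendar years
started_at <- as.POSIXct(
  c("2022-08-14 09:00:00", "2022-12-02 18:30:00",
    "2023-06-10 11:00:00", "2023-06-21 07:45:00"),
  tz = "UTC"
)

# Chronological view: one bucket per year and month
trips_by_year_month <- table(format(started_at, "%Y-%m"))

# Calendar view: one bucket per month number, sorted from 01 to 12
trips_by_month <- table(format(started_at, "%m"))

print(trips_by_year_month)
print(trips_by_month)
```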

The number of trips goes together with the average duration. During the last 12 months (August 2022 to July 2023), months with fewer trips also had shorter average durations, especially December 2022 to March 2023 for both member and casual riders.

Sorted by month, the trend looks similar, with the longest average duration in July (07).

By bike type, the average duration for docked bikes is far above that of electric and classic bikes: docked bikes average 115–212 minutes (roughly 2 to 3.5 hours), while electric bikes average 9–14 minutes and classic bikes only 13–22 minutes.

Finally, to present the results interactively, I created a dashboard using Tableau. You can see it at this link.

Dashboard Tableau — Cylistic Bike-Share Analysis

Identify trends and present the findings

Here is a summary of the analysis above:

  • Member riders contribute the largest share of Cyclistic trips, at 62.09% of the total.
  • Electric bikes are the most popular type, favored by both member and casual riders.
  • Docked bikes have the lowest number of trips because they are intended for users with disabilities, and all of those trips come from casual riders.
  • The average duration for member riders is shorter than for casual riders, especially on electric bikes, whose electric drive motors make riding easier.
  • The average duration for member and casual riders is longest on Saturday and Sunday, especially for docked bikes, which are intended for users with disabilities and average 2–3 hours.
  • Casual riders mostly ride on Saturday and Sunday, with longer average durations. I think this indicates they use the bikes to relax and enjoy the view.
  • Meanwhile, member riders mostly ride on weekdays such as Tuesday, Wednesday, and Thursday, for daily activities such as work, shopping, and hobbies.
  • In the monthly trend, the number of trips is strongly influenced by season and weather conditions.
  • The number of trips peaks from June to September, which coincides with summer and the transition from summer to fall. I think this happens because people do more outdoor activities in summer than in other seasons.

Weather in Chicago (source: here)

Act

Here are my recommendations based on the results of the analysis:

  1. Increase the number of electric bikes, which are currently the most popular type, and improve the service around them, such as the charging system, spare batteries, and other spare parts.
  2. Arrange bicycle routes with station points that pass office areas, schools, shopping centers, parks, and places with good views. I think this can attract new users to join and use the bikes.
  3. Create a weekend membership type (Saturday and Sunday) priced differently from the annual membership and the casual tickets.
  4. Create a special membership for users with disabilities. Even though their number of trips is small, the average duration of 2–3 hours offers an opportunity to gain members by pricing the membership attractively compared with casual tickets.
  5. Run a big event or campaign in summer, such as a bicycle race or a large discount for member riders.

Note: for the detailed code and sources, you can visit my GitHub and Kaggle.

Written by Trio Wibowo

Data analyst with over 4 years of experience transforming data into insights for decision-making to improve product quality and service delivery.
