The Data Analysis Process
1. Ask | Business Understanding
As the first large-scale smart bike share system implemented in the country, we were also the first to have a full year’s worth of ridership data. The goal for this report was to confirm our baseline understanding of our customers: where they came from and how different member types used the system. I hoped to solidify the lessons learned in our monthly reports so that a smaller, more efficient field team could reduce operational costs. Also, because we worked as a client of the city, I aimed to identify insights from ridership routes that could help recommend and validate planned future infrastructure.
2. Prepare | Data Collection
Because this was a smart bike system, our vendor, Social Bicycles, collected a tremendous amount of data. Every bike had its own computer, riders signed up through the website or smartphone app, and the system ran on a robust operational backend. I knew we would not have access to credit card billing information. Before the system launched, we therefore added a field to the sign-up process that collected each user’s billing zip code, so we could track where users lived. Early on, the monthly report process revealed that a few additional data points were missing. These were added retroactively, so the trip and user data were reasonably clean by the time the annual report process began.
Data Understanding
Most of the data needed to meet my goals was found in two tables that could be exported from the backend operations system: “Users” and “Trips.” Each user’s zip code told me where they lived, but I had no additional information tying that zip code to a city, county, or state. To fill that gap, I added a “ZipCode” table to the dataset to paint a more accurate picture of our user base.
Users.csv Data Columns
Account, First name, Last name, Email, Phone number, RFID card number, Member Since, Status, Payment plan, Total Number of Trips, Total Distance Ridden by User [Miles], Average Length of Trip [Miles], Total Number of Trips Started within a Hub, Total Number of Trips Ended within a Hub, Total Time User Rode Bikes, Average Length (time) of User Trips, Used promo codes, Last Sign In Date, Last Rental Date, Calories burned, Carbon reduced [lbs], Signup origin, Billing Zip Code
Trips.csv Data Columns
Route ID, User ID, Payment Plan, Start Hub, Start Area, Start Latitude, Start Longitude, Start Date, Start Time, Start Time Seconds, End Hub, End Area, End Latitude, End Longitude, End Date, End Time, Member Type, Trip Type, Bike ID, Bike Name, Distance [Miles], Duration, Rental Access Path, Multiple Rental, Ride cost, Fees, Bonuses, Total cost
3. Process | Cleaning Data
Before I could begin any data processing, a small amount of data had to be cleaned and joined. First, users did not always enter their zip codes accurately, and any entry that was not a valid five-digit zip code had to be scrubbed. In all, I kept 76% of the 15,708 users’ zip codes for our location sample.
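The zip code scrub described above can be sketched in Python. This is a minimal sketch, assuming the rule was simply "keep an entry only if, after trimming whitespace, it is exactly five digits." The function names are hypothetical, and the original cleanup may have handled cases such as ZIP+4 entries differently.

```python
import re

# Assumed rule: a usable billing zip code is exactly five ASCII digits after trimming.
FIVE_DIGIT_ZIP = re.compile(r"[0-9]{5}")


def clean_zip(raw):
    """Return the five-digit zip code, or None if the entry should be scrubbed."""
    if raw is None:
        return None
    value = str(raw).strip()
    return value if FIVE_DIGIT_ZIP.fullmatch(value) else None


def retention_rate(raw_zips):
    """Fraction of entries that survive cleaning (0.0 for an empty list)."""
    if not raw_zips:
        return 0.0
    kept = sum(1 for z in raw_zips if clean_zip(z) is not None)
    return kept / len(raw_zips)
```

For example, `retention_rate(["33602", " 33606", "3360", "tampa"])` returns 0.5. A 76% rate on 15,708 users works out to roughly 11,900 usable zip codes.

Read zip codes as text rather than numbers. Otherwise leading zeros, common in New England zip codes, are dropped, and valid entries get scrubbed by mistake.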
With that solved, I joined the user zip code data to the trip data in Excel. I used an INDEX(MATCH()) formula to match the foreign key “User ID” in the “Trips” table to the primary key “Account” in the “Users” table. I repeated this process to pull each trip’s city, county, and state from the “ZipCode” table into the “Trips” table. The result was a single CSV that held nearly all the information I needed to begin processing my data. It was also kept separate from the identifying user information in the “Users” table. With more than 50,000 rows, I’m glad I now have ways to join data outside of Excel.
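Outside of Excel, the same INDEX(MATCH()) lookup can be written as a dictionary join. This is a minimal sketch, assuming each table has been read into a list of dicts, for example with Python’s `csv.DictReader`. The columns “Account,” “User ID,” and “Billing Zip Code” come from the lists above. The “ZipCode” table’s column names (“Zip,” “City,” “County,” “State”) and the function names are hypothetical.

```python
def index_by(rows, key):
    """Map each row's key value to the row.

    Like MATCH's lookup range, except that if keys repeat, the last row wins
    (MATCH returns the first).
    """
    return {row[key]: row for row in rows}


def join_trips(trips, users, zip_codes):
    """Attach billing zip, city, county, and state to each trip.

    Only the zip code is copied from the user record, so names, emails,
    and other identifying fields never reach the joined output.
    """
    users_by_account = index_by(users, "Account")
    places_by_zip = index_by(zip_codes, "Zip")  # hypothetical ZipCode column name
    joined = []
    for trip in trips:
        # An unmatched user or zip yields blank fields instead of Excel's #N/A.
        user = users_by_account.get(trip["User ID"], {})
        zip_code = user.get("Billing Zip Code", "")
        place = places_by_zip.get(zip_code, {})
        joined.append({
            **trip,
            "Billing Zip Code": zip_code,
            "City": place.get("City", ""),
            "County": place.get("County", ""),
            "State": place.get("State", ""),
        })
    return joined
```

Building each lookup once makes every match a constant-time dictionary access. That is what keeps a 50,000-row join fast.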
4. Analyze | Data Understanding
With the data cleaned and joined, I turned to the business objective questions posed at the outset:
- Who are my customers?
- At what level are they engaged with the system?
- How are they utilizing the citywide service?
Beginning with the trip data, I examined each trip’s start and end hubs for four groups:
- All users
- Subscription users
- Casual users
- Users living within the primary zip code in which we operated

The data revealed that while the member groups shared most of the same top hubs, their origin/destination pairs differed significantly. In effect, three different “systems” operated within the city across the 900 possible hub-to-hub trips.
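The group-by-group comparison of start and end hubs can be sketched as a set of counters keyed by origin/destination pair. This is a minimal sketch over the joined trip rows. “Start Hub,” “End Hub,” and “Member Type” are Trips columns listed above. The group labels “All” and “Local” and the `local_zip` parameter are hypothetical stand-ins for how the four groups might be defined. With n hubs there are n × n ordered pairs, including same-hub round trips.

```python
from collections import Counter, defaultdict


def od_counts_by_group(trips, group_key="Member Type", local_zip=None):
    """Count (start hub, end hub) pairs overall, per member type, and for local residents."""
    counts = defaultdict(Counter)
    for trip in trips:
        pair = (trip["Start Hub"], trip["End Hub"])
        counts["All"][pair] += 1
        counts[trip[group_key]][pair] += 1
        # Optional extra group: riders whose billing zip is the core service zip.
        if local_zip is not None and trip.get("Billing Zip Code") == local_zip:
            counts["Local"][pair] += 1
    return counts


def top_pairs(counts, n=10):
    """Most common origin/destination pairs for each group."""
    return {group: counter.most_common(n) for group, counter in counts.items()}
```

Comparing `top_pairs(counts)` across groups shows whether groups that share top hubs still favor different routes.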
These findings showed how to operate the system more efficiently. Teams could now see which hubs had essentially no turnover, with most rentals starting and ending at the same hub. Those locations could be serviced far less often, and only at scheduled inspection times. The data also revealed where bike supply should be increased or decreased before demand presented itself. Breaking the data down by subscription type also surfaced patterns over the course of the week and the year, allowing the operations team to refine its efforts further. As the fleet aged, responsibilities would shift toward shop-based repairs and scheduled overhauls. These efficiency gains meant labor costs did not have to increase to absorb that work.
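Both operational signals in this paragraph, hubs with little turnover and hubs where supply drifts, can be computed from the start and end hubs alone. This is a minimal sketch that makes two assumptions. First, “most rentals” means a same-hub share above 50%; the `threshold` default is hypothetical. Second, net flow over a period is a reasonable proxy for where bikes accumulate or run short. The function names are illustrative.

```python
from collections import Counter


def same_hub_share(trips):
    """For each start hub, the fraction of rentals that ended back at that hub."""
    started = Counter()
    returned = Counter()
    for trip in trips:
        hub = trip["Start Hub"]
        started[hub] += 1
        if trip["End Hub"] == hub:
            returned[hub] += 1
    return {hub: returned[hub] / total for hub, total in started.items()}


def low_turnover_hubs(trips, threshold=0.5):
    """Hubs where more than `threshold` of rentals are round trips (assumed cutoff)."""
    shares = same_hub_share(trips)
    return sorted(hub for hub, share in shares.items() if share > threshold)


def net_flow(trips):
    """Bikes arriving minus bikes departing per hub; positive means bikes pile up."""
    flow = Counter()
    for trip in trips:
        flow[trip["End Hub"]] += 1
        flow[trip["Start Hub"]] -= 1
    return dict(flow)
```

Running `net_flow` separately on weekday and weekend trips, or per member type, is one way to reproduce the weekly patterns by subscription type.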
As with the trip data, I wanted a thorough understanding of where our users lived. This would greatly assist our marketing efforts. It would also give us a data set to share with partner organizations, adding value to those relationships and making partners more likely to act on our team’s co-marketing requests.
5. Share | Report Assembly
While data alone tells a story, I wanted to visualize these findings to tell a more compelling one. I used Tableau to generate geographic visuals at three scales of interest: the City of Tampa, the Tampa Bay metropolitan area, and the entire US. Florida typically draws visitors from east of the Mississippi. Even so, it was surprising to discover that Vermont was the only state not represented in Coast’s member base.
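The map’s headline finding, that only one state was missing, can also be checked directly rather than by eye. This is a minimal sketch, assuming the “State” field from the joined zip code data holds two-letter postal abbreviations. The constant and function names are hypothetical.

```python
# The 50 US state postal abbreviations (DC and territories deliberately excluded).
US_STATES = frozenset(
    """
    AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD
    MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC
    SD TN TX UT VT VA WA WV WI WY
    """.split()
)


def missing_states(member_states):
    """States with no members, given the State values from the joined zip data."""
    # Skip blanks and None, and normalize case and whitespace before comparing.
    seen = {s.strip().upper() for s in member_states if s and s.strip()}
    return sorted(US_STATES - seen)
```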
These graphics painted a compelling picture of who was using the system at a time when detractors scoffed at it as being only for tourists. I was happy to discover that riders who lived in the city took 37% of trips. Riders from the eight-county metro area surrounding Tampa took another 32%. In years past, this latter group typically came downtown for an event, be it a Tampa Bay Lightning game or a show at the Straz Center, and left shortly afterward. Now these visitors, who may also have had dinner beforehand, were staying downtown longer, exploring, and spending more money because of bike share. This data supported an often-touted but hard-to-prove bike share estimate that each trip injects $7 into the local economy (2015). To wrap up the report, survey data further supported the conclusion that the system had become an asset to downtown mobility after its first year of operation.
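The residence percentages come down to bucketing each joined trip by the rider’s city and county. This is a minimal sketch, assuming the joined rows carry “City,” “County,” and “State” fields from the zip code lookup. The bucket labels and function name are hypothetical. The caller supplies the metro county list rather than this sketch guessing which eight counties were used. Zip codes do not follow city limits exactly, so any zip-based split is an approximation.

```python
from collections import Counter


def residence_shares(trips, home, metro_counties):
    """Share of trips by where the rider lives: home city, rest of metro, elsewhere, or unknown.

    `home` is a (city, state) tuple and `metro_counties` a set of (county, state)
    tuples, so places with the same name in different states are not confused.
    """
    buckets = Counter()
    for trip in trips:
        city = trip.get("City", "")
        state = trip.get("State", "")
        if not city:
            buckets["Unknown"] += 1  # zip code was scrubbed or not found
        elif (city, state) == home:
            buckets["City"] += 1
        elif (trip.get("County", ""), state) in metro_counties:
            buckets["Metro"] += 1
        else:
            buckets["Elsewhere"] += 1
    total = sum(buckets.values())
    return {label: count / total for label, count in buckets.items()} if total else {}
```

The percentages change depending on whether trips with an unknown zip code stay in the denominator. It is worth stating which convention a report uses. This sketch keeps those trips as their own “Unknown” bucket.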
6. Act | Putting Data to Use
Finally, looking ahead, the city and state had several infrastructure projects under discussion but not yet under construction. A good portion of ridership was casual and leisure-based, starting and ending at the same hub, so I wanted to look at the top ten one-way transportation trips. The ridership heatmap shown below was built with a combination of our vendor’s backend software and countless hours in Photoshop. I overlaid a hundred or so screen grabs, each representing a month’s worth of hub-to-hub trips. The resulting image showed a city in need of serious infrastructure investment. Riders weren’t engaging with the city center at all; instead, they chose to ride the safe, separated paths along the Hillsborough River. Riders were also taking a long route around town instead of simply cutting through the city. This is all the more interesting because peak ridership times don’t overlap with peak automotive traffic. During those times, most streets throughout town are largely empty, with few conflicts. Using this data, I was able to support and consult on two main infrastructure projects. The first was the Cass Cycle Track, running east/west along the north side of downtown. The second was the Jackson Street Cycle Track, the first state project of its kind, running east/west through the middle of downtown.
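Ranking the top ten one-way trips is a short counting step once round trips are set aside. This is a minimal sketch over the joined trip rows, assuming blank “Start Hub” or “End Hub” values mark trips that began or ended outside a hub. The function name and the `undirected` option are hypothetical. The sketch produces a ranked list, not the heatmap itself.

```python
from collections import Counter


def top_one_way_routes(trips, n=10, undirected=False):
    """Rank hub-to-hub trips that end somewhere other than where they started.

    Trips with a blank start or end hub (e.g. bikes locked outside a hub) are skipped.
    With undirected=True, A->B and B->A are counted as the same corridor.
    """
    routes = Counter()
    for trip in trips:
        start, end = trip["Start Hub"], trip["End Hub"]
        if not start or not end or start == end:
            continue  # drop round trips and trips without both hubs
        key = tuple(sorted((start, end))) if undirected else (start, end)
        routes[key] += 1
    return routes.most_common(n)
```

Counting directions separately shows whether flows between two hubs are lopsided. Merging them with `undirected=True` is closer to ranking street corridors for infrastructure planning.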