The starting data set

The original roller coaster data file from Kaggle is a tidy dataset with 14 variables in 157 rows.

In the data, 31 rows are incomplete, with missing values in the Top_Speed, Max_Height, Drop, Length, Duration or Num_of_Inversions columns.

Variable # Missing
Coaster 0
Park 0
City 0
State 0
Type 0
Design 0
Year_Opened 0
Top_Speed 10
Max_Height 5
Drop 3
Length 3
Duration 28
Inversions 0
Num_of_Inversions 1
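
A count like the one in this table can be sketched in a few lines of Python. This is only an illustration: the sample rows are invented (they are not from the Kaggle file), the function name count_missing is hypothetical, and I'm assuming a missing value shows up as an empty field or the text NA.

```python
import csv
import io

# Hypothetical sample rows, NOT taken from the Kaggle file.
# Empty fields stand for missing values.
SAMPLE_CSV = """Coaster,Top_Speed,Max_Height,Drop,Duration
Example A,50,100,95,120
Example B,,80,,90
Example C,65,,150,
"""

def count_missing(csv_text):
    """Return ({column: missing count}, number of rows with any missing value)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    incomplete_rows = 0
    for row in reader:
        # Assumption: a cell is missing if it is empty or the literal text "NA".
        missing = [name for name, value in row.items() if value.strip() in ("", "NA")]
        for name in missing:
            counts[name] += 1
        if missing:
            incomplete_rows += 1
    return counts, incomplete_rows

counts, incomplete_rows = count_missing(SAMPLE_CSV)
for name, n in counts.items():
    print(name, n)
print("incomplete rows:", incomplete_rows)
```

Counting per column and per row separately matters here because one row can be missing several values, which is why the column counts add up to more than the number of incomplete rows.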

A little data research

The Roller Coaster Database shows 708 operating roller coasters in the United States. There isn’t any information on Kaggle about the selection criteria for the roller coasters included in the dataset, so conclusions based on this dataset probably don’t reflect reality. That means I feel I can augment the data without angst, all in the name of fun and learning.

I used the Roller Coaster Database (RCDB), Wikipedia, Coasterpedia - Roller Coaster wiki and Ultimate Coaster to find some of the missing data values and to add a few more rows to the roller coaster dataset. My augmented dataset has 187 roller coasters.

But I still couldn’t find the drop height for ten of the roller coasters.

Variable # Missing
coaster 0
park 0
city 0
state 0
type 0
design 0
year_opened 0
top_speed 0
max_height 0
drop 10
length 0
duration 0
inversions 0
num_of_inversions 0

Filling in all the drops

Rather than eliminate these ten rows, I’ll assign values for drop by using the average proportion of drop to max_height, which is 0.908926.

This proportion is pretty close to one, so I’ll look at the relationship of drop to max_height and see if it’s really that close to one.

It’s not quite a straight line, and a few points lie some distance from the line. Since the mean is not resistant to outliers, I’ll check whether the proportion has outliers.

There are both low and high outliers. I’ll calculate the cutoffs for eliminating them using the standard rule: 25th percentile - 1.5 * IQR and 75th percentile + 1.5 * IQR.

New average proportion drop / height = 0.8978595

This looks a little better, so I’ll use this average proportion to fill in the missing drop values.
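
The steps above (average proportion, IQR cutoffs, filling in the missing drops) could look roughly like this in Python. It's a sketch under assumptions, not the code behind the numbers in this article: the (drop, max_height) pairs are invented, the function names are hypothetical, and the quantile method ("inclusive") is my assumption, so results on real data may differ slightly depending on how percentiles are computed.

```python
import statistics

def tukey_fences(values, k=1.5):
    """Return (low, high) cutoffs: Q1 - k*IQR and Q3 + k*IQR."""
    # method="inclusive" is an assumed quantile definition.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def mean_ratio_without_outliers(pairs):
    """pairs: (drop, max_height) tuples for coasters where both values are known."""
    ratios = [drop / height for drop, height in pairs]
    low, high = tukey_fences(ratios)
    kept = [r for r in ratios if low <= r <= high]
    return statistics.mean(kept)

def fill_drop(drop, max_height, ratio):
    """Keep a known drop; otherwise estimate it as max_height * ratio."""
    return drop if drop is not None else max_height * ratio

# Hypothetical (drop, max_height) pairs, invented for illustration.
known = [(90, 100), (92, 100), (88, 100), (91, 100), (89, 100), (30, 100), (200, 100)]
ratio = mean_ratio_without_outliers(known)
print(round(ratio, 4))                        # 0.9
print(round(fill_drop(None, 200, ratio), 1))  # 180.0
```

In the invented sample, the ratios 0.3 and 2.0 fall outside the cutoffs and are dropped before averaging, which is the same idea as removing the low and high outliers above.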

Adding design as numeric

The design of a roller coaster has an impact on the ride experience. In order to include it in the Fun Factor calculation it needs to have a numeric value. The categories for design and their descriptions are:

Sit Down - a traditional roller coaster ridden while sitting down.

Bobsled - designed like a bobsled run – without a fixed track. The train travels freely through a trough.

Stand Up - a coaster ridden while standing up instead of sitting down.

Flying - meant to simulate the sensations of flight, with riders in a prone superhero-like position.

Pipeline - a coaster where riders are positioned between the rails instead of above or below.

Inverted - a roller coaster which uses trains traveling beneath, rather than on top of, the track. Unlike a suspended roller coaster, an inverted roller coaster’s trains don’t pivot freely.

Suspended - a roller coaster using trains which travel beneath the track and pivot on a swinging arm from side to side, exaggerating the track’s banks and turns.

Wing - pairs of riders sit on either side of the roller coaster track, with nothing above or below them.

4th Dimension - riders are rotated independently of the orientation of the track, generally about a horizontal axis that is perpendicular to the track.

Let’s see how many of each design are in the roller coaster dataset:

Sit Down coasters are a definite majority. There’s only one 4th Dimension coaster, about 25 Inverted, and a few of most of the other designs. Based on this diagram and the design descriptions, and continuing with angst-free data augmentation, I’m assigning the following numeric values for design:

Sit Down = 100

Bobsled = 200

Inverted = 300

Stand Up = 400

Suspended = 450

Flying = 500

Pipeline = 550

Wing = 600

4th Dimension = 700
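
A lookup table is one simple way to apply these values. Here is a minimal Python sketch, assuming the design column holds exactly the labels listed above; the names DESIGN_VALUE and design_value are made up for illustration.

```python
# Numeric design values as assigned in the list above.
DESIGN_VALUE = {
    "Sit Down": 100,
    "Bobsled": 200,
    "Inverted": 300,
    "Stand Up": 400,
    "Suspended": 450,
    "Flying": 500,
    "Pipeline": 550,
    "Wing": 600,
    "4th Dimension": 700,
}

def design_value(design):
    """Return the numeric value for a design label, failing loudly on unknown labels."""
    try:
        return DESIGN_VALUE[design]
    except KeyError:
        raise ValueError(f"unknown design: {design!r}") from None

print(design_value("Stand Up"))  # 400
```

Raising an error on an unknown label, rather than defaulting to zero, keeps a typo in the design column from silently lowering a coaster's score.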

Roller Coaster Fun Factor

Now I’m ready to calculate the Fun Factor. Inversions are a big element in roller coaster excitement, but the number of inversions is a single digit in every row, so on its own it won’t have much impact on the total Fun Factor. I’ll multiply the number of inversions by 100 to give those coasters their due score.

Fun Factor is the sum of speed + height + drop + length + duration + 100 * number of inversions + design value. I’m expecting a wide spread in Fun Factor. Most of the numeric variables in the data have wide ranges in value. There are older roller coasters and kiddie roller coasters included in the data that are shorter, have slower speeds and don’t have inversions.
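
As a quick check of the formula, here is a small Python sketch. The function name fun_factor is hypothetical; the input values are the Zippin Pippin row from the augmented dataset (shown in the Oldest Coaster table further down).

```python
def fun_factor(top_speed, max_height, drop, length, duration,
               num_of_inversions, design_value):
    """Unscaled Fun Factor: plain sum, with inversions weighted by 100."""
    return (top_speed + max_height + drop + length + duration
            + 100 * num_of_inversions + design_value)

# Zippin Pippin: Sit Down design (value 100), no inversions.
zippin_pippin = dict(top_speed=40, max_height=70, drop=70, length=2865,
                     duration=90, num_of_inversions=0, design_value=100)

print(fun_factor(**zippin_pippin))  # 3235
```

Of that 3235, length alone accounts for 2865, which hints at how strongly length can drive the unscaled score.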

The top roller coaster

The coaster with the highest fun factor is wooden! It’s the Beast at Paramounts Kings Island in Ohio, and it’s the longest roller coaster in the data. The second highest fun factor belongs to the Son Of Beast, the second longest and also at Paramounts Kings Island. Folks in Ohio can have lots of coasting fun: six of the longest coasters are in Ohio.

Top 20 Roller Coasters
Row# Fun Factor Coaster State Park Year Length Design Design Value # Inversions
1 8025 Beast Ohio Paramounts Kings Island 1979 7359 Sit Down 100 0
2 7882 Son Of Beast Ohio Paramounts Kings Island 2000 7032 Sit Down 100 1
3 7647 Fury 325 North Carolina Carowinds 2015 6602 Sit Down 100 0
4 7538 Millennium Force Ohio Cedar Point 2000 6595 Sit Down 100 0
5 7087 Voyage Indiana Holiday World 2006 6442 Sit Down 100 0
6 6713 California Screamin California Disneys California Adventure 2001 6072 Sit Down 100 1
7 6620 Desperado Nevada Buffalo Bills Resort & Casino 1994 5843 Sit Down 100 0
8 6365 Mamba Missouri Worlds of Fun 1998 5600 Sit Down 100 0
9 6360 Steel Force Pennsylvania Dorney Park 1997 5600 Sit Down 100 0
10 6217 Wild Thing Minnesota Valleyfair! 1996 5460 Sit Down 100 0
11 6207 Titan Texas Six Flags Over Texas 2001 5312 Sit Down 100 0
12 6161 Superman - Ride Of Steel Massachusetts Six Flags New England 2000 5400 Sit Down 100 0
13 6159 Nitro New Jersey Six Flags Great Adventure 2001 5394 Sit Down 100 0
14 6147 Intimidator North Carolina Carowinds 2010 5316 Sit Down 100 0
15 6108 Superman - Ride Of Steel New York Six Flags Darien Lake 1999 5400 Sit Down 100 0
16 6101 Mean Streak Ohio Cedar Point 1991 5427 Sit Down 100 0
17 6087 Diamondback Ohio Kings Island 2009 5282 Sit Down 100 0
18 6043 Superman - Ride Of Steel Maryland Six Flags America 2000 5350 Sit Down 100 0
19 5917 Riddlers Revenge California Six Flags Magic Mountain 1998 4370 Stand Up 400 6
20 5798 Magnum XL-200 Ohio Cedar Point 1989 5106 Sit Down 100 0

Length is the biggest contributing component in Fun Factor, and I don’t think one component should outweigh the others in the calculation. I’ll weight length, as I did for Number of Inversions, so the components contribute to Fun Factor more equally. The longer lengths are in thousands of feet, so I’ll divide length by 10 to bring it into scale with the other components. Dividing length by 10 gives the following histogram for Fun Factor.

Now the top coasters are steel, have higher design values and more inversions. But longer sit down coasters with no inversions also show up in the top 20: Fury 325 at number 12 and Millennium Force at number 19. So I think the new Fun Factor calculation gives a better representation of each component.
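
To see what dividing length by 10 does to a single coaster, here is a rough Python sketch using the Zippin Pippin row from the augmented dataset. The function and parameter names are made up for illustration, and the key names follow the lower-case column names in the dataset.

```python
def scaled_fun_factor(coaster, length_divisor=10, inversion_weight=100):
    """Fun Factor with length divided by length_divisor (10 in the article)."""
    return (coaster["top_speed"] + coaster["max_height"] + coaster["drop"]
            + coaster["length"] / length_divisor + coaster["duration"]
            + inversion_weight * coaster["num_of_inversions"]
            + coaster["dsgn_val"])

def length_share(coaster, length_divisor):
    """Fraction of the Fun Factor that comes from the length term."""
    total = scaled_fun_factor(coaster, length_divisor=length_divisor)
    return (coaster["length"] / length_divisor) / total

zippin_pippin = {"top_speed": 40, "max_height": 70, "drop": 70, "length": 2865,
                 "duration": 90, "num_of_inversions": 0, "dsgn_val": 100}

print(scaled_fun_factor(zippin_pippin))           # 656.5
print(round(length_share(zippin_pippin, 1), 3))   # 0.886
print(round(length_share(zippin_pippin, 10), 3))  # 0.436
```

For this coaster, length drops from about 89% of the score to about 44%, so it still matters but no longer swamps the other components.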

Top 20 Roller Coasters - scaled Fun Factor
Row# Fun Factor Coaster State Park Year Length Design Design Value # Inversions
1 2147.0 X2 California Six Flags Magic Mountain 2002 3610 4th Dimension 700 2
2 1984.0 Riddlers Revenge California Six Flags Magic Mountain 1998 4370 Stand Up 400 6
3 1916.3 Montu Florida Busch Gardens Tampa 1996 3983 Inverted 300 7
4 1904.8 Alpengeist Virginia Busch Gardens Williamsburg 1997 3828 Inverted 300 6
5 1826.5 Chang Kentucky Six Flags Kentucky Kingdom 1997 4155 Stand Up 400 5
6 1762.0 Viper California Six Flags Magic Mountain 1990 3830 Sit Down 100 7
7 1753.7 Medusa California Six Flags Discovery Kingdom 2000 3937 Sit Down 100 7
8 1732.5 Scream! California Six Flags Magic Mountain 2003 3985 Sit Down 100 7
9 1728.0 Raptor Ohio Cedar Point 1994 3790 Inverted 300 6
10 1716.0 Great American Scream Machine New Jersey Six Flags Great Adventure 1989 3800 Sit Down 100 7
11 1709.8 Kumba Florida Busch Gardens Tampa 1993 3978 Sit Down 100 7
12 1705.2 Fury 325 North Carolina Carowinds 2015 6602 Sit Down 100 0
13 1697.7 Kraken Florida SeaWorld Orlando 2000 4177 Sit Down 100 7
14 1692.0 Mantis Ohio Cedar Point 1996 3900 Stand Up 400 4
15 1673.5 Medusa New Jersey Six Flags Great Adventure 1999 3985 Sit Down 100 7
16 1672.5 Silver Bullet California Knott’s Berry Farm 2004 3125 Inverted 300 6
17 1658.5 Superman Krypton Coaster Texas Six Flags Fiesta Texas 2000 4025 Sit Down 100 6
18 1633.9 Batman The Ride Texas Six Flags Fiesta Texas 2015 1019 Wing 600 6
19 1602.5 Millennium Force Ohio Cedar Point 2000 6595 Sit Down 100 0
20 1587.0 Incredible Hulk Florida Universal Studios Islands of Adventure 1999 3700 Sit Down 100 7

Older and kiddie coasters

As I expected, coasters at the low end are older or geared to the very young. However, the oldest coaster, Zippin Pippin in Tennessee, built in 1915, is ranked 150 out of 187 with a fun factor of 656.5 due to its length of 2865 feet.

Bottom 10 Coasters
Row# Fun Factor Coaster State Park Type Year
178 401.0000 Comet Pennsylvania Waldameer Park Wood 1951
179 377.1000 Woodstock Express Ohio Cedar Point Steel 1999
180 377.0000 Bobsleds New York Seabreeze Steel 1962
181 365.2000 Leap The Dips Pennsylvania Lakemont Park Wood 1999
182 362.7122 Wild Chipmunk Colorado Lakeside Amusement Park Steel 1955
183 286.5703 Gadget’s Go Coaster California Disneyland Steel 1993
184 281.5200 Merlin’s Revenge California Castle Amusement Park Steel 2001
185 280.0000 High Speed Thrill Coaster Pennsylvania Knoebels Steel 1955
186 279.5200 Spacely’s Sprocket Rockets Illinois Six Flags Great America Steel 1998
187 236.3593 Jr. Gemini Ohio Cedar Point Steel 1979

Oldest Coaster
rowid coaster park city state type design year_opened top_speed max_height drop length duration inversions num_of_inversions dsgn_val fun
150 Zippin Pippin Libertyland Memphis Tennessee Wood Sit Down 1915 40 70 70 2865 90 N 0 100 656.5
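
Ranking by Fun Factor and picking out the oldest coaster can be sketched like this in Python. The five rows are copied from the tables above, so the rank printed here is only the rank within this small subset, not the rank of 150 in the full dataset; the function name rank_by_fun is hypothetical.

```python
# (coaster, year_opened, fun factor) rows copied from the tables above.
rows = [
    ("Comet", 1951, 401.0),
    ("Woodstock Express", 1999, 377.1),
    ("Bobsleds", 1962, 377.0),
    ("Zippin Pippin", 1915, 656.5),
    ("Jr. Gemini", 1979, 236.3593),
]

def rank_by_fun(rows):
    """Return {coaster: rank}, where rank 1 is the highest fun factor."""
    ordered = sorted(rows, key=lambda r: r[2], reverse=True)
    return {name: position
            for position, (name, _year, _fun) in enumerate(ordered, start=1)}

ranks = rank_by_fun(rows)
oldest = min(rows, key=lambda r: r[1])  # earliest year_opened
print(oldest[0], ranks[oldest[0]])      # Zippin Pippin 1 (within this subset)
```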

Where the fun is

To get a more realistic view of the distribution of roller coasters in the U.S., I used RCDB’s census page where I could search for the total count of operating roller coasters by state.

California at the top isn’t surprising: there are a number of amusement parks in LA, Orange and San Diego counties, and the temperate climate means the parks can be open year round (except that all amusement parks are closed right now due to COVID-19). Pennsylvania has five more operating coasters than Florida, which I didn’t expect. Almost a third of Pennsylvania’s coasters are wooden (Pennsylvania has the most wooden roller coasters) and ten of them were opened before 1955. The oldest operating roller coaster in Florida opened in 1972.

Seven states don’t have any roller coasters. Delaware, Rhode Island and Vermont are small and close to states with amusement parks. Montana and Wyoming are probably too sparsely populated, and Alaska is probably too cold. Disney runs the Aulani resort in Hawaii, but it doesn’t have a roller coaster. I doubt anyone misses them in the island paradise.

U.S. Roller Coaster Census
Location Steel Wood Total
California 78 6 84
Pennsylvania 37 18 55
Florida 47 3 50
Texas 44 4 48
New Jersey 41 2 43
New York 34 6 40
Ohio 32 8 40
Missouri 19 6 25
Illinois 19 4 23
Georgia 20 2 22
Virginia 18 4 22
Maryland 17 2 19
North Carolina 13 2 15
Colorado 12 2 14
Indiana 8 6 14
Massachusetts 13 1 14
Michigan 10 3 13
Minnesota 11 2 13
Tennessee 11 2 13
Wisconsin 6 6 12
Kentucky 8 3 11
Utah 9 1 10
Iowa 5 4 9
Alabama 7 1 8
Connecticut 5 3 8
New Hampshire 6 2 8
Oklahoma 7 1 8
New Mexico 6 1 7
Arkansas 5 1 6
Idaho 4 2 6
Maine 5 1 6
Washington 4 2 6
Kansas 5 0 5
Nevada 5 0 5
Oregon 5 0 5
Arizona 4 0 4
Louisiana 4 0 4
South Carolina 3 1 4
West Virginia 2 2 4
South Dakota 2 0 2
Mississippi 1 0 1
Nebraska 1 0 1
North Dakota 1 0 1
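
The census table can be cross-checked against the claims in this section with a short Python sketch. The Steel and Wood counts are copied from the table above; the parsing helper and variable names are hypothetical, and the list of 50 states is typed in by hand.

```python
# "State Steel Wood" rows copied from the census table above.
CENSUS_TEXT = """
California 78 6
Pennsylvania 37 18
Florida 47 3
Texas 44 4
New Jersey 41 2
New York 34 6
Ohio 32 8
Missouri 19 6
Illinois 19 4
Georgia 20 2
Virginia 18 4
Maryland 17 2
North Carolina 13 2
Colorado 12 2
Indiana 8 6
Massachusetts 13 1
Michigan 10 3
Minnesota 11 2
Tennessee 11 2
Wisconsin 6 6
Kentucky 8 3
Utah 9 1
Iowa 5 4
Alabama 7 1
Connecticut 5 3
New Hampshire 6 2
Oklahoma 7 1
New Mexico 6 1
Arkansas 5 1
Idaho 4 2
Maine 5 1
Washington 4 2
Kansas 5 0
Nevada 5 0
Oregon 5 0
Arizona 4 0
Louisiana 4 0
South Carolina 3 1
West Virginia 2 2
South Dakota 2 0
Mississippi 1 0
Nebraska 1 0
North Dakota 1 0
"""

ALL_STATES_TEXT = """Alabama, Alaska, Arizona, Arkansas, California, Colorado,
Connecticut, Delaware, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa,
Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota,
Mississippi, Missouri, Montana, Nebraska, Nevada, New Hampshire, New Jersey,
New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon,
Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah,
Vermont, Virginia, Washington, West Virginia, Wisconsin, Wyoming"""
ALL_STATES = [s.strip() for s in ALL_STATES_TEXT.replace("\n", " ").split(",")]

def parse_census(text):
    """Parse 'State Steel Wood' lines into {state: (steel, wood)}."""
    census = {}
    for line in text.strip().splitlines():
        # Split from the right so multi-word state names stay intact.
        state, steel, wood = line.rsplit(None, 2)
        census[state] = (int(steel), int(wood))
    return census

census = parse_census(CENSUS_TEXT)
total = sum(steel + wood for steel, wood in census.values())
no_coasters = sorted(set(ALL_STATES) - set(census))
pa_steel, pa_wood = census["Pennsylvania"]
pa_wood_share = pa_wood / (pa_steel + pa_wood)

print(total)                    # 708
print(no_coasters)
print(round(pa_wood_share, 3))  # 0.327
```

The total of 708 matches the RCDB count quoted earlier, the set difference lists the seven states with no coasters, and 18 of 55 is the "almost a third" wooden share for Pennsylvania.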

References

  1. Kaggle, kaggle.com
  2. Roller Coaster Database, rcdb.com
  3. Coasterpedia, coasterpedia.net
  4. Ultimate Coaster, ultimaterollercoaster.com
  5. tripsavvy, tripsavvy.com

Resources

  1. Github repository, https://github.com/sopranomax/Data110_Projects
  2. RPubs, Roller Coasters