Skater Point Projections Using The NHL's API
Introduction
People occasionally ask me how to access the NHL’s API and use the data to make skater projections. This document describes a way to get started doing that using the R programming language.
The projection methodology here is ultra simple: use 2-year rate stats and 1-year time-on-ice data to predict each skater’s points for the upcoming NHL season (2024-2025). The raw projections are pushed to Google Sheets where they can be fine-tuned as desired (but the simple raw projections are surprisingly good for many skaters).
The process described here is only the beginning of what can be done with data pulled from the NHL’s API. It’s possible to get detailed data and build more complicated models. Hopefully this document provides an entry point for people who are interested in such things but who don’t know how to get started.
Note: I show R code below but I don’t explain it. This document is not intended to teach you how to program in R. I strongly encourage you to learn a programming language if you’re curious. I’m self-taught - it can be done.
Load Libraries
Start by loading these libraries (after installing them if necessary).
library(tidyverse)
library(jsonlite)
library(googledrive)
library(googlesheets4)
library(kableExtra)
Get NHL Data
There are a couple of tricks to pulling data from the NHL’s API.
First, you need to find the data. I show you how to do that below.
Second, you need to “unnest” the JSON data pulled from the API and turn it into something usable. The scripts below show you one way to do that.
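As a rough illustration of the unnesting step, here is a minimal sketch that runs offline. The fake_response object is made up for this example and only mimics the general shape of a parsed response (a list whose "data" element holds one list per skater); the real responses contain many more fields.

```r
library(tibble)
library(tidyr)

# Made-up stand-in for what read_json() returns (assumption: the real
# response has a "data" element holding one list per skater).
fake_response <- list(
  data = list(
    list(playerId = 1L, skaterFullName = "Skater A", goals = 10L),
    list(playerId = 2L, skaterFullName = "Skater B", goals = 7L)
  ),
  total = 2L
)

# Wrap the list of records in a one-column tibble, then spread each
# record's fields into their own columns.
skaters <- fake_response[["data"]] |>
  tibble() |>
  unnest_wider(1)

skaters
```

Each record becomes one row and each field becomes one column, which is the same pattern the scripts below apply to the real API responses.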
Scoring Summary
Start by pulling skater summary data for the last two seasons.
To find this data go to the NHL’s stats website using this link and then open the developer tools in your web browser. Now refresh the page and use the developer tools to find the URL that requests data from the NHL’s API - it should start with “https://api.nhle.com/stats” (the full URL is in the script below). If you’re new to this type of thing it could take a few minutes to hunt for the URL but it’s there and you’ll find it eventually.
Now for some magic: you need to find the limit parameter hidden in the URL and change it from 50 to -1 (i.e., limit=-1). This tells the API to return all relevant data, not just the top 50 results.
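If you would rather not edit the URL by hand, the swap can be done in R. This is only a sketch: copied_url is a shortened, illustrative URL standing in for whatever you copy from the developer tools, not the full request.

```r
# Shortened, illustrative URL (assumption: the copied URL contains "limit=50").
copied_url <- "https://api.nhle.com/stats/rest/en/skater/summary?start=0&limit=50&cayenneExp=gameTypeId=2"

# Replace the page-size limit so the API returns every row.
full_url <- sub("limit=50", "limit=-1", copied_url, fixed = TRUE)

full_url
```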
Here’s a script that pulls each skater’s aggregate scoring data for the last two seasons (a sample is displayed after the script).
summary_data_raw <- read_json("https://api.nhle.com/stats/rest/en/skater/summary?isAggregate=true&isGame=false&sort=%5B%7B%22property%22:%22points%22,%22direction%22:%22DESC%22%7D,%7B%22property%22:%22goals%22,%22direction%22:%22DESC%22%7D,%7B%22property%22:%22assists%22,%22direction%22:%22DESC%22%7D,%7B%22property%22:%22playerId%22,%22direction%22:%22ASC%22%7D%5D&start=0&limit=-1&cayenneExp=gameTypeId=2%20and%20seasonId%3C=20232024%20and%20seasonId%3E=20222023")

summary_data <- summary_data_raw[["data"]] |>
  tibble() |>
  unnest_wider(1) |>
  select(player_id = playerId,
         skater = skaterFullName,
         position = positionCode,
         gp = gamesPlayed,
         as_goals = goals,
         as_assists = assists,
         as_points = points,
         pp_goals = ppGoals,
         pp_points = ppPoints) |>
  mutate(pp_assists = pp_points - pp_goals,
         .after = pp_goals) |>
  mutate(position = if_else(position == "D", "D", "F")) |>
  mutate(goals_x_pp = as_goals - pp_goals,
         assists_x_pp = as_assists - pp_assists)

summary_data |>
  slice_head(n = 3) |>
  kable()
player_id | skater | position | gp | as_goals | as_assists | as_points | pp_goals | pp_assists | pp_points | goals_x_pp | assists_x_pp |
---|---|---|---|---|---|---|---|---|---|---|---|
8478402 | Connor McDavid | F | 158 | 96 | 189 | 285 | 28 | 87 | 115 | 68 | 102 |
8476453 | Nikita Kucherov | F | 163 | 74 | 183 | 257 | 21 | 82 | 103 | 53 | 101 |
8477492 | Nathan MacKinnon | F | 153 | 93 | 158 | 251 | 22 | 60 | 82 | 71 | 98 |
The data shown above are summary data for:
all-strengths points;
power play points; and
points for all game states excluding the power play (i.e., all-strengths points minus power play points).
Time-On-Ice
Next, pull time-on-ice (TOI) data.
2-Year TOI Data (Rates)
Start by pulling aggregate TOI data for the last two seasons. This will be used to convert the scoring data from counting stats (i.e., total goals and total assists) to rate stats (i.e., goals/second and assists/second).
To find the TOI data go to the NHL’s stats website using this link and repeat the process described above to find the URL that requests data from the NHL’s API. Don’t forget to change the limit in the URL from 50 to -1.
Here’s a script that pulls each skater’s aggregate TOI data for the last two seasons (a sample is displayed after the script).
rate_toi_data_raw <- read_json("https://api.nhle.com/stats/rest/en/skater/timeonice?isAggregate=true&isGame=false&sort=%5B%7B%22property%22:%22timeOnIce%22,%22direction%22:%22DESC%22%7D,%7B%22property%22:%22playerId%22,%22direction%22:%22ASC%22%7D%5D&start=0&limit=-1&cayenneExp=gameTypeId=2%20and%20seasonId%3C=20232024%20and%20seasonId%3E=20222023")

rate_toi_data <- rate_toi_data_raw[["data"]] |>
  tibble() |>
  unnest_wider(1) |>
  select(player_id = playerId,
         as_toi = timeOnIce,
         pp_toi = ppTimeOnIce) |>
  mutate(toi_x_pp = as_toi - pp_toi)

rate_toi_data |>
  slice_head(n = 3) |>
  kable()
player_id | as_toi | pp_toi | toi_x_pp |
---|---|---|---|
8474563 | 254377 | 30983 | 223394 |
8474578 | 245427 | 33933 | 211494 |
8480839 | 244277 | 31428 | 212849 |
1-Year TOI Data (Projections)
Repeat the above process using this link to pull TOI data for last season. This data will be used for projecting TOI in the upcoming season.
To state the obvious: some skaters will get significantly different ice-time in the upcoming season. To the extent you want to project different TOI you can make adjustments after the data are pushed to Google Sheets.
Here’s a script that pulls each skater’s TOI data for last season (a sample is displayed after the script).
pred_toi_data_raw <- read_json("https://api.nhle.com/stats/rest/en/skater/timeonice?isAggregate=false&isGame=false&sort=%5B%7B%22property%22:%22timeOnIce%22,%22direction%22:%22DESC%22%7D,%7B%22property%22:%22playerId%22,%22direction%22:%22ASC%22%7D%5D&start=0&limit=-1&cayenneExp=gameTypeId=2%20and%20seasonId%3C=20232024%20and%20seasonId%3E=20232024")

pred_toi_data <- pred_toi_data_raw[["data"]] |>
  tibble() |>
  unnest_wider(1) |>
  select(player_id = playerId,
         pred_as_toi = timeOnIcePerGame,
         pred_pp_toi = ppTimeOnIcePerGame) |>
  mutate(pred_toi_x_pp = pred_as_toi - pred_pp_toi) |>
  mutate(across(pred_as_toi:pred_toi_x_pp, round))

pred_toi_data |>
  slice_head(n = 3) |>
  kable()
player_id | pred_as_toi | pred_pp_toi | pred_toi_x_pp |
---|---|---|---|
8474590 | 1554 | 202 | 1352 |
8474563 | 1548 | 195 | 1353 |
8476875 | 1533 | 221 | 1312 |
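The two TOI requests above differ only in the isAggregate flag and the season range at the end of the URL. If you plan to rerun this for other seasons, a small helper can assemble the URL for you. This is a sketch: build_toi_url is a hypothetical helper name (not part of any package), and it simply pastes together the same pieces that appear in the URLs above.

```r
# Hypothetical helper: assembles the time-on-ice URL used above for a
# given season range. Season IDs are strings such as "20232024".
build_toi_url <- function(first_season, last_season, aggregate) {
  # Sort specification copied from the URLs above (already URL-encoded).
  sort_spec <- "%5B%7B%22property%22:%22timeOnIce%22,%22direction%22:%22DESC%22%7D,%7B%22property%22:%22playerId%22,%22direction%22:%22ASC%22%7D%5D"

  paste0("https://api.nhle.com/stats/rest/en/skater/timeonice",
         "?isAggregate=", tolower(as.character(aggregate)),
         "&isGame=false",
         "&sort=", sort_spec,
         "&start=0&limit=-1",
         "&cayenneExp=gameTypeId=2%20and%20seasonId%3C=", last_season,
         "%20and%20seasonId%3E=", first_season)
}

# Two-season aggregate (rates) and single-season (projections) URLs.
rate_toi_url <- build_toi_url("20222023", "20232024", aggregate = TRUE)
pred_toi_url <- build_toi_url("20232024", "20232024", aggregate = FALSE)

# Decoding the filter makes it easier to read.
season_filter <- URLdecode("gameTypeId=2%20and%20seasonId%3C=20232024%20and%20seasonId%3E=20222023")
season_filter
```

The decoded filter reads "gameTypeId=2 and seasonId<=20232024 and seasonId>=20222023", so changing the two season IDs is all it takes to shift the window.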
Prepare Projections
Join And Shrink Data
Combine the scoring data with the TOI data.
projections <- summary_data |>
  left_join(rate_toi_data, by = "player_id") |>
  left_join(pred_toi_data, by = "player_id")

projections |>
  slice_head(n = 3) |>
  kable()
player_id | skater | position | gp | as_goals | as_assists | as_points | pp_goals | pp_assists | pp_points | goals_x_pp | assists_x_pp | as_toi | pp_toi | toi_x_pp | pred_as_toi | pred_pp_toi | pred_toi_x_pp |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
8478402 | Connor McDavid | F | 158 | 96 | 189 | 285 | 28 | 87 | 115 | 68 | 102 | 207535 | 34830 | 172705 | 1282 | 204 | 1077 |
8476453 | Nikita Kucherov | F | 163 | 74 | 183 | 257 | 21 | 82 | 103 | 53 | 101 | 204361 | 39357 | 165004 | 1300 | 243 | 1057 |
8477492 | Nathan MacKinnon | F | 153 | 93 | 158 | 251 | 22 | 60 | 82 | 71 | 98 | 207318 | 39556 | 167762 | 1369 | 270 | 1099 |
Remove all skaters with fewer than 100 games played in the last two seasons.
projections <- projections |>
  filter(gp >= 100)
Convert Scoring Data To Rate Stats
Convert the scoring data from simple counts to rate stats on a per second basis.
projections <- projections |>
  mutate(pred_goals_x_pp = goals_x_pp / toi_x_pp,
         pred_assists_x_pp = assists_x_pp / toi_x_pp,
         pred_goals_pp = pp_goals / pp_toi,
         pred_assists_pp = pp_assists / pp_toi) |>
  select(player_id,
         skater,
         position,
         gp,
         pred_toi_x_pp,
         pred_pp_toi,
         pred_goals_x_pp,
         pred_assists_x_pp,
         pred_goals_pp,
         pred_assists_pp)

projections |>
  slice_head(n = 3) |>
  kable()
player_id | skater | position | gp | pred_toi_x_pp | pred_pp_toi | pred_goals_x_pp | pred_assists_x_pp | pred_goals_pp | pred_assists_pp |
---|---|---|---|---|---|---|---|---|---|
8478402 | Connor McDavid | F | 158 | 1077 | 204 | 0.0003937 | 0.0005906 | 0.0008039 | 0.0024978 |
8476453 | Nikita Kucherov | F | 163 | 1057 | 243 | 0.0003212 | 0.0006121 | 0.0005336 | 0.0020835 |
8477492 | Nathan MacKinnon | F | 153 | 1099 | 270 | 0.0004232 | 0.0005842 | 0.0005562 | 0.0015168 |
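To make the per-second rates less abstract, here is the arithmetic for one skater, using the Connor McDavid values from the joined table shown earlier. The per-60-minutes conversion at the end is an extra shown only for readability; it is not used elsewhere in this document.

```r
# Values copied from the Connor McDavid row of the joined table.
goals_x_pp <- 68      # goals excluding the power play
toi_x_pp   <- 172705  # seconds played excluding the power play
pp_goals   <- 28      # power play goals
pp_toi     <- 34830   # seconds played on the power play

# Goals per second in each game state.
rate_goals_x_pp <- goals_x_pp / toi_x_pp
rate_goals_pp   <- pp_goals / pp_toi

round(rate_goals_x_pp, 7)
round(rate_goals_pp, 7)

# The same rate expressed per 60 minutes (3600 seconds) of ice time.
rate_goals_x_pp * 3600
```

The two rounded rates match the pred_goals_x_pp and pred_goals_pp values in the table above.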
Add Teams
The point projections are basically done at this point but it would be nice to have each skater’s current team included in the data. That requires pulling each team’s current roster.
This script pulls the active roster (player_id only) for every NHL team.
get_current_rosters <- function() {

  season <- "20242025"

  # Use one team's schedule (Montreal) to collect every team's tri-code
  tri_code_url <- paste0("https://api-web.nhle.com/v1/club-schedule-season/mtl/", season)

  tri_code_data <- read_json(tri_code_url)

  # Keep regular season games and unnest the away team details
  tri_codes <- tri_code_data[["games"]] |>
    tibble() |>
    unnest_wider(1) |>
    filter(gameType == 2) |>
    select(awayTeam) |>
    unnest_wider(1)

  tri_codes <- unique(tri_codes$abbrev)

  base_url <- "https://api-web.nhle.com/v1/roster/"

  roster_data <- list()

  # Pull each team's current roster and keep the player IDs
  for (i in seq_along(tri_codes)) {

    temp_roster_data <- read_json(paste0(base_url, tri_codes[i], "/current"))

    temp_roster <- temp_roster_data |>
      tibble() |>
      unnest_longer(1) |>
      unnest_wider(1) |>
      mutate(team = tri_codes[i]) |>
      select(player_id = id,
             team)

    roster_data[[i]] <- temp_roster

  }

  roster_data <- roster_data |>
    bind_rows()

  return(roster_data)

}

teams_data <- get_current_rosters()
Join the teams data to the projections.
projections <- projections |>
  left_join(teams_data, by = "player_id")
NOTE: sometimes players do not appear on an active NHL roster even though they are expected to play in the upcoming NHL season. You can make adjustments after pushing the data to Google Sheets.
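One way to spot those players before pushing the data is to look for rows where the join left the team empty. The sketch below uses tiny made-up tables (projections_demo and teams_demo are placeholders, not real data) so it runs without calling the API; the same filter should work on the real projections object.

```r
library(dplyr)
library(tibble)

# Made-up stand-ins for the projections and roster tables.
projections_demo <- tibble(player_id = c(1L, 2L, 3L),
                           skater = c("Skater A", "Skater B", "Skater C"))

teams_demo <- tibble(player_id = c(1L, 3L),
                     team = c("AAA", "BBB"))

# Skaters with no roster match end up with a missing team after the join.
missing_team <- projections_demo |>
  left_join(teams_demo, by = "player_id") |>
  filter(is.na(team))

missing_team
```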
Push To Google Sheets
That’s it. Running the above code takes about 15 seconds (and most of that time is spent pulling the roster data).
Push the data to Google Sheets where the final point projections can be calculated and any desired adjustments can be made. You’ll need to create a blank spreadsheet in Google Sheets and then copy its URL into the google_sheets_url object in the following script.
google_sheets_url <- "YOUR URL HERE"

sheet_write(projections,
            ss = google_sheets_url,
            sheet = "raw_projections")
I’ve pushed the projections to my own Google Sheets and you can make a copy of them using this link.
Preview The Projections
Here are the Top 20 scorers based on these ultra simple skater projections (assuming that every skater plays 82 games).
Rank | Skater | Team | Points | Goals | Assists |
---|---|---|---|---|---|
1 | Connor McDavid | EDM | 142 | 48 | 94 |
2 | Nathan MacKinnon | COL | 136 | 50 | 86 |
3 | Nikita Kucherov | TBL | 133 | 38 | 95 |
4 | Leon Draisaitl | EDM | 114 | 45 | 69 |
5 | David Pastrnak | BOS | 112 | 54 | 58 |
6 | Mikko Rantanen | COL | 108 | 50 | 58 |
7 | Artemi Panarin | NYR | 107 | 39 | 68 |
8 | Jack Hughes | NJD | 105 | 42 | 63 |
9 | Auston Matthews | TOR | 102 | 58 | 44 |
10 | Kirill Kaprizov | MIN | 101 | 51 | 50 |
11 | Mitch Marner | TOR | 100 | 31 | 69 |
12 | Matthew Tkachuk | FLA | 96 | 32 | 64 |
13 | Elias Pettersson | VAN | 94 | 36 | 58 |
14 | William Nylander | TOR | 94 | 41 | 53 |
15 | Sidney Crosby | PIT | 93 | 37 | 56 |
16 | Brayden Point | TBL | 93 | 49 | 44 |
17 | Jason Robertson | DAL | 92 | 37 | 55 |
18 | Cale Makar | COL | 92 | 22 | 70 |
19 | J.T. Miller | VAN | 91 | 34 | 57 |
20 | Jack Eichel | VGK | 90 | 39 | 51 |
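The final calculation happens in Google Sheets and isn't shown in this document, but it can be sketched in R. The formula below is an assumption: each rate is multiplied by the matching projected seconds per game, the two game states are added, the result is scaled to 82 games, and goals and assists are rounded before being summed into points. Applied to the three sample rows from the rate table above, it reproduces the numbers shown for those skaters.

```r
library(dplyr)
library(tibble)

games <- 82  # assumed games played for every skater

# The three sample rows from the rate table above.
sample_rates <- tribble(
  ~skater,            ~pred_toi_x_pp, ~pred_pp_toi, ~pred_goals_x_pp, ~pred_assists_x_pp, ~pred_goals_pp, ~pred_assists_pp,
  "Connor McDavid",   1077,           204,          0.0003937,        0.0005906,          0.0008039,      0.0024978,
  "Nikita Kucherov",  1057,           243,          0.0003212,        0.0006121,          0.0005336,      0.0020835,
  "Nathan MacKinnon", 1099,           270,          0.0004232,        0.0005842,          0.0005562,      0.0015168
)

# Rate x seconds per game, summed over game states, scaled to a full season.
point_projections <- sample_rates |>
  mutate(goals = round((pred_goals_x_pp * pred_toi_x_pp +
                          pred_goals_pp * pred_pp_toi) * games),
         assists = round((pred_assists_x_pp * pred_toi_x_pp +
                            pred_assists_pp * pred_pp_toi) * games),
         points = goals + assists) |>
  select(skater, points, goals, assists) |>
  arrange(desc(points))

point_projections
```

Doing this step in a spreadsheet instead keeps the TOI and games-played inputs easy to adjust by hand, which is the reason the raw data are pushed to Google Sheets.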