Skater Point Projections Using The NHL's API

 

Introduction

People occasionally ask me how to access the NHL’s API and use the data to make skater projections. This document describes a way to get started doing that using the R programming language.

The projection methodology here is ultra simple: use 2-year rate stats and 1-year time-on-ice data to predict each skater’s points for the upcoming NHL season (2024-2025). The raw projections are pushed to Google Sheets where they can be fine-tuned as desired (but the simple raw projections are surprisingly good for many skaters).

The process described here is only the beginning of what can be done with data pulled from the NHL’s API. It’s possible to get detailed data and build more complicated models. Hopefully this document provides an entry point for people who are interested in such things but who don’t know how to get started.

Note: I show R code below but I don’t explain it. This document is not intended to teach you how to program in R. I strongly encourage you to learn a programming language if you’re curious. I’m self-taught - it can be done.

Load Libraries

Start by loading these libraries (after installing them if necessary).

library(tidyverse)
library(jsonlite)
library(googledrive)
library(googlesheets4)
library(kableExtra)

Get NHL Data

There are a couple of tricks to pulling data from the NHL’s API.

First, you need to find the data. I show you how to do that below.

Second, you need to “unnest” the JSON data pulled from the API and turn it into something usable. The scripts below show you one way to do that.
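As a rough illustration of what “unnesting” means here, the sketch below parses a small hand-written JSON string and flattens it into a tibble. The JSON string is an assumption made for illustration: it mimics the shape used by the scripts in this document (a "data" array of records) but contains only a few of the fields and two of the skaters that appear in the samples, so it is not a real download.

```r
library(jsonlite)
library(tibble)
library(tidyr)

# Hand-written stand-in for an API response (illustrative, not a real download)
example_json <- '{
  "data": [
    {"playerId": 8478402, "skaterFullName": "Connor McDavid", "goals": 96, "assists": 189},
    {"playerId": 8476453, "skaterFullName": "Nikita Kucherov", "goals": 74, "assists": 183}
  ]
}'

# parse_json() returns nested lists, much like read_json() does for a URL
example_raw <- parse_json(example_json)

# Each element of "data" is one record; unnest_wider() spreads the
# fields of each record into columns
example_data <- example_raw[["data"]] |>
        tibble() |>
        unnest_wider(1)

example_data
```

The result has one row per skater and one column per field, which is the shape the rest of the scripts work with.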

Scoring Summary

Start by pulling skater summary data for the last two seasons.

To find this data, go to the NHL’s stats website using this link and then open the developer tools in your web browser. Now refresh the page and look through the requests listed in the developer tools (usually under a Network tab) to find the URL that requests data from the NHL’s API. It should start with “https://api.nhle.com/stats” (the full URL is in the script below). If you’re new to this type of thing it could take a few minutes to hunt for the URL, but it’s there and you’ll find it eventually.

Now for some magic: find the limit parameter in the URL and change it from limit=50 to limit=-1. This tells the API to return all relevant data, not just the top 50 results.
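If you would rather not edit the URL by hand, the swap can be sketched in a couple of lines of base R. The helper name set_limit() is hypothetical, and the URL below is shortened for readability (the real one also carries the sort and filter parameters), so treat this as an illustration of the edit rather than a request that is known to work as written.

```r
# Hypothetical helper: replace whatever limit the copied URL has with a new one
set_limit <- function(url, limit = -1) {
        sub("limit=-?[0-9]+", paste0("limit=", limit), url)
}

# Shortened example URL (the real one has sort and filter parameters too)
copied_url <- "https://api.nhle.com/stats/rest/en/skater/summary?start=0&limit=50"

set_limit(copied_url)
```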

Here’s a script that pulls each skater’s aggregate scoring data for the last two seasons (a sample is displayed after the script).

summary_data_raw <- read_json("https://api.nhle.com/stats/rest/en/skater/summary?isAggregate=true&isGame=false&sort=%5B%7B%22property%22:%22points%22,%22direction%22:%22DESC%22%7D,%7B%22property%22:%22goals%22,%22direction%22:%22DESC%22%7D,%7B%22property%22:%22assists%22,%22direction%22:%22DESC%22%7D,%7B%22property%22:%22playerId%22,%22direction%22:%22ASC%22%7D%5D&start=0&limit=-1&cayenneExp=gameTypeId=2%20and%20seasonId%3C=20232024%20and%20seasonId%3E=20222023")

summary_data <- summary_data_raw[["data"]] |>
        tibble() |>
        unnest_wider(1) |>
        select(player_id = playerId,
               skater = skaterFullName,
               position = positionCode,
               gp = gamesPlayed,
               as_goals = goals,
               as_assists = assists,
               as_points = points,
               pp_goals = ppGoals,
               pp_points = ppPoints) |>
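        # Power play assists are power play points minus power play goals
        mutate(pp_assists = pp_points - pp_goals,
               .after = pp_goals) |>
        # Treat every position other than defense as a forward
        mutate(position = if_else(position == "D", "D", "F")) |>
        # Scoring in all game states excluding the power play
        mutate(goals_x_pp = as_goals - pp_goals,
               assists_x_pp = as_assists - pp_assists)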

summary_data |> 
        slice_head(n = 3) |> 
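        kable()

| player_id | skater | position | gp | as_goals | as_assists | as_points | pp_goals | pp_assists | pp_points | goals_x_pp | assists_x_pp |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 8478402 | Connor McDavid | F | 158 | 96 | 189 | 285 | 28 | 87 | 115 | 68 | 102 |
| 8476453 | Nikita Kucherov | F | 163 | 74 | 183 | 257 | 21 | 82 | 103 | 53 | 101 |
| 8477492 | Nathan MacKinnon | F | 153 | 93 | 158 | 251 | 22 | 60 | 82 | 71 | 98 |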

The data shown above are summary data for:

  • all-strengths points;

  • power play points; and

  • points for all game states excluding the power play (i.e., all-strengths points minus power play points).
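To make the three groups concrete, here is the arithmetic for the first row of the sample (Connor McDavid), written out in plain R. Only the four input numbers come from the sample; the rest follows from the subtractions in the script.

```r
# Connor McDavid's two-season totals from the sample above
as_goals   <- 96
as_assists <- 189
pp_goals   <- 28
pp_points  <- 115

# The script selects power play goals and points; assists are the difference
pp_assists <- pp_points - pp_goals

# Scoring in all other game states is all-strengths minus power play
goals_x_pp   <- as_goals - pp_goals
assists_x_pp <- as_assists - pp_assists

c(pp_assists = pp_assists, goals_x_pp = goals_x_pp, assists_x_pp = assists_x_pp)
```

The three results (87, 68 and 102) match the pp_assists, goals_x_pp and assists_x_pp columns in the sample.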

Time-On-Ice

Next, pull time-on-ice (TOI) data.

2-Year TOI Data (Rates)

Start by pulling aggregate TOI data for the last two seasons. This will be used to convert the scoring data from counting stats (i.e., total goals and total assists) to rate stats (i.e., goals/second and assists/second).

To find the TOI data go to the NHL’s stats website using this link and repeat the process described above to find the URL that requests data from the NHL’s API. Don’t forget to change the limit in the URL from 50 to -1.

Here’s a script that pulls each skater’s aggregate TOI data for the last two seasons (a sample is displayed after the script).

rate_toi_data_raw <- read_json("https://api.nhle.com/stats/rest/en/skater/timeonice?isAggregate=true&isGame=false&sort=%5B%7B%22property%22:%22timeOnIce%22,%22direction%22:%22DESC%22%7D,%7B%22property%22:%22playerId%22,%22direction%22:%22ASC%22%7D%5D&start=0&limit=-1&cayenneExp=gameTypeId=2%20and%20seasonId%3C=20232024%20and%20seasonId%3E=20222023")

rate_toi_data <- rate_toi_data_raw[["data"]] |>
        tibble() |>
        unnest_wider(1) |>
        select(player_id = playerId,
               as_toi = timeOnIce,
               pp_toi = ppTimeOnIce) |>
        mutate(toi_x_pp = as_toi - pp_toi)

rate_toi_data |> 
        slice_head(n = 3) |> 
        kable()

| player_id | as_toi | pp_toi | toi_x_pp |
|---|---|---|---|
| 8474563 | 254377 | 30983 | 223394 |
| 8474578 | 245427 | 33933 | 211494 |
| 8480839 | 244277 | 31428 | 212849 |

1-Year TOI Data (Projections)

Repeat the above process using this link to pull TOI data for last season (2023-2024). These data will be used for projecting TOI in the upcoming season.

To state the obvious: some skaters will get significantly different ice-time in the upcoming season. If you want to project a different TOI for a skater, you can make the adjustment after the data are pushed to Google Sheets.

Here’s a script that pulls each skater’s TOI data for last season (a sample is displayed after the script).

pred_toi_data_raw <- read_json("https://api.nhle.com/stats/rest/en/skater/timeonice?isAggregate=false&isGame=false&sort=%5B%7B%22property%22:%22timeOnIce%22,%22direction%22:%22DESC%22%7D,%7B%22property%22:%22playerId%22,%22direction%22:%22ASC%22%7D%5D&start=0&limit=-1&cayenneExp=gameTypeId=2%20and%20seasonId%3C=20232024%20and%20seasonId%3E=20232024")

pred_toi_data <- pred_toi_data_raw[["data"]] |>
        tibble() |>
        unnest_wider(1) |>
        select(player_id = playerId,
               pred_as_toi = timeOnIcePerGame,
               pred_pp_toi = ppTimeOnIcePerGame) |>
        # Per-game TOI excluding the power play, then round to whole seconds
        mutate(pred_toi_x_pp = pred_as_toi - pred_pp_toi) |>
        mutate(across(pred_as_toi:pred_toi_x_pp, round))

pred_toi_data |> 
        slice_head(n = 3) |> 
        kable()

| player_id | pred_as_toi | pred_pp_toi | pred_toi_x_pp |
|---|---|---|---|
| 8474590 | 1554 | 202 | 1352 |
| 8474563 | 1548 | 195 | 1353 |
| 8476875 | 1533 | 221 | 1312 |
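The per-game values above are in seconds, which is convenient for the math but hard to read at a glance. A small sketch for displaying them as minutes:seconds follows; the helper name format_toi() is hypothetical and is not used anywhere else in this document.

```r
# Hypothetical helper: format a number of seconds as minutes:seconds
format_toi <- function(seconds) {
        seconds <- as.integer(round(seconds))
        sprintf("%d:%02d", seconds %/% 60L, seconds %% 60L)
}

# First row of the sample above: all strengths, power play, everything else
format_toi(c(1554, 202, 1352))
```

For the first skater in the sample this gives 25:54 per game at all strengths, of which 3:22 is on the power play.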

Prepare Projections

Join And Shrink Data

Combine the scoring data with the TOI data.

projections <- summary_data |>
        left_join(rate_toi_data, by = "player_id") |>
        left_join(pred_toi_data, by = "player_id")

projections |> 
        slice_head(n = 3) |> 
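        kable()

| player_id | skater | position | gp | as_goals | as_assists | as_points | pp_goals | pp_assists | pp_points | goals_x_pp | assists_x_pp | as_toi | pp_toi | toi_x_pp | pred_as_toi | pred_pp_toi | pred_toi_x_pp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8478402 | Connor McDavid | F | 158 | 96 | 189 | 285 | 28 | 87 | 115 | 68 | 102 | 207535 | 34830 | 172705 | 1282 | 204 | 1077 |
| 8476453 | Nikita Kucherov | F | 163 | 74 | 183 | 257 | 21 | 82 | 103 | 53 | 101 | 204361 | 39357 | 165004 | 1300 | 243 | 1057 |
| 8477492 | Nathan MacKinnon | F | 153 | 93 | 158 | 251 | 22 | 60 | 82 | 71 | 98 | 207318 | 39556 | 167762 | 1369 | 270 | 1099 |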

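Remove all skaters with fewer than 100 games played over the last two seasons. Since no skater can play 100 games in a single season, every remaining skater appeared in both seasons and therefore has TOI data from last season.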

projections <- projections |>
        filter(gp >= 100)

Convert Scoring Data To Rate Stats

Convert the scoring data from simple counts to rate stats on a per-second basis.

projections <- projections |>
        # Counts divided by seconds played in the matching game state
        mutate(pred_goals_x_pp = goals_x_pp / toi_x_pp,
               pred_assists_x_pp = assists_x_pp / toi_x_pp,
               pred_goals_pp = pp_goals / pp_toi,
               pred_assists_pp = pp_assists / pp_toi) |>
        select(player_id,
               skater,
               position,
               gp,
               pred_toi_x_pp,
               pred_pp_toi,
               pred_goals_x_pp,
               pred_assists_x_pp,
               pred_goals_pp,
               pred_assists_pp)

projections |> 
        slice_head(n = 3) |> 
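        kable()

| player_id | skater | position | gp | pred_toi_x_pp | pred_pp_toi | pred_goals_x_pp | pred_assists_x_pp | pred_goals_pp | pred_assists_pp |
|---|---|---|---|---|---|---|---|---|---|
| 8478402 | Connor McDavid | F | 158 | 1077 | 204 | 0.0003937 | 0.0005906 | 0.0008039 | 0.0024978 |
| 8476453 | Nikita Kucherov | F | 163 | 1057 | 243 | 0.0003212 | 0.0006121 | 0.0005336 | 0.0020835 |
| 8477492 | Nathan MacKinnon | F | 153 | 1099 | 270 | 0.0004232 | 0.0005842 | 0.0005562 | 0.0015168 |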
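Per-second rates are tiny numbers, so it can help to check one by hand. The sketch below recomputes Connor McDavid’s goal rate outside the power play from the joined sample earlier in this section, and also expresses it per 60 minutes. The per-60 conversion is my own addition for readability; it is not part of the projection scripts.

```r
# Connor McDavid's two-season totals from the joined sample earlier
goals_x_pp <- 68       # goals outside the power play
toi_x_pp   <- 172705   # seconds played outside the power play

# Goals per second, as in the script above
rate_per_second <- goals_x_pp / toi_x_pp

# The same rate per 60 minutes (3600 seconds) is easier to read
rate_per_60 <- rate_per_second * 3600

round(rate_per_second, 7)
round(rate_per_60, 2)
```

The per-second value rounds to the 0.0003937 shown in the sample, which works out to roughly 1.42 goals per 60 minutes outside the power play.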

Add Teams

The point projections are basically done at this point but it would be nice to have each skater’s current team included in the data. That requires pulling each team’s current roster.

This script pulls the active roster for every NHL team, keeping only each player’s player_id and team.

get_current_rosters <- function () {
        
        season <- "20242025"
        
        # Montreal's schedule is used only to collect every team's tri-code
        tri_code_url <- paste0("https://api-web.nhle.com/v1/club-schedule-season/mtl/", season)
        
        tri_code_data <- read_json(tri_code_url)
        
        # Keep regular season games (gameType 2) and unpack the away team
        tri_codes <- tri_code_data[["games"]] |>
                tibble() |>
                unnest_wider(1) |>
                filter(gameType == 2) |>
                select(awayTeam) |>
                unnest_wider(1)
        
        tri_codes <- unique(tri_codes$abbrev)
        
        base_url <- "https://api-web.nhle.com/v1/roster/"
        
        roster_data <- list()
        
        # One request per team; seq_along() is safe if tri_codes is empty
        for (i in seq_along(tri_codes)) {
                
                temp_roster_data <- read_json(paste0(base_url, tri_codes[i], "/current"))
                
                # Unpack the nested lists into one row per player
                temp_roster <- temp_roster_data |>
                        tibble() |>
                        unnest_longer(1) |>
                        unnest_wider(1) |>
                        mutate(team = tri_codes[i]) |>
                        select(player_id = id,
                               team)
                
                roster_data[[i]] <- temp_roster
                
        }
        
        # Combine the per-team rosters into a single table
        roster_data <- roster_data |>
                bind_rows()
        
        return(roster_data)
        
}

teams_data <- get_current_rosters()
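
The roster function makes one request per team, and a single failed request stops the whole loop. One possible way to soften that, sketched below, is to wrap read_json() so that a failure returns NULL instead of an error. The helper name safe_read_json() is hypothetical, and a missing local file stands in for a failed request so that the example runs without a network connection.

```r
library(jsonlite)

# Hypothetical helper: return NULL instead of stopping when a request fails
safe_read_json <- function(url) {
        tryCatch(
                read_json(url),
                error = function(e) {
                        message("Could not read ", url, ": ", conditionMessage(e))
                        NULL
                }
        )
}

# A path that does not exist stands in for a failed request here
result <- suppressWarnings(safe_read_json("no_such_file.json"))

is.null(result)
```

Inside the loop, a NULL result could then be skipped with next so that the remaining teams are still pulled.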

Join the teams data to the projections.

projections <- projections |>
        left_join(teams_data, by = "player_id")

NOTE: sometimes players do not appear on an active NHL roster even though they are expected to play in the upcoming NHL season. You can make adjustments after pushing the data to Google Sheets.
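
Before pushing the data, it can be useful to see which skaters ended up without a team. The sketch below shows the idea on a tiny stand-in table; the three skaters come from the samples in this document, but the missing team is placed there by hand purely for illustration and says nothing about any real roster.

```r
# Illustrative stand-in for the joined projections; the NA is placed by
# hand to mimic a skater who was not found on any active roster
example_projections <- data.frame(
        player_id = c(8478402, 8476453, 8477492),
        skater = c("Connor McDavid", "Nikita Kucherov", "Nathan MacKinnon"),
        team = c("EDM", "TBL", NA)
)

# Skaters whose team is missing after the join
missing_team <- example_projections[is.na(example_projections$team), ]

missing_team$skater
```

With the real object, filtering projections on is.na(team) would give the same kind of list.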

Push To Google Sheets

That’s it. Running the above code takes about 15 seconds (and most of that time is spent pulling the roster data).

Push the data to Google Sheets where the final point projections can be calculated and any desired adjustments can be made. You’ll need to create a blank spreadsheet in Google Sheets and then copy the URL to the google_sheets_url object in the following script.

google_sheets_url <- "YOUR URL HERE"

sheet_write(projections, 
            ss = google_sheets_url, 
            sheet = "raw_projections")

I’ve pushed the projections to my own Google Sheets and you can make a copy of them using this link.

Preview The Projections

Here are the top 20 scorers based on these ultra-simple skater projections (assuming that every skater plays 82 games).

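| Rank | Skater | Team | Points | Goals | Assists |
|---|---|---|---|---|---|
| 1 | Connor McDavid | EDM | 142 | 48 | 94 |
| 2 | Nathan MacKinnon | COL | 136 | 50 | 86 |
| 3 | Nikita Kucherov | TBL | 133 | 38 | 95 |
| 4 | Leon Draisaitl | EDM | 114 | 45 | 69 |
| 5 | David Pastrnak | BOS | 112 | 54 | 58 |
| 6 | Mikko Rantanen | COL | 108 | 50 | 58 |
| 7 | Artemi Panarin | NYR | 107 | 39 | 68 |
| 8 | Jack Hughes | NJD | 105 | 42 | 63 |
| 9 | Auston Matthews | TOR | 102 | 58 | 44 |
| 10 | Kirill Kaprizov | MIN | 101 | 51 | 50 |
| 11 | Mitch Marner | TOR | 100 | 31 | 69 |
| 12 | Matthew Tkachuk | FLA | 96 | 32 | 64 |
| 13 | Elias Pettersson | VAN | 94 | 36 | 58 |
| 14 | William Nylander | TOR | 94 | 41 | 53 |
| 15 | Sidney Crosby | PIT | 93 | 37 | 56 |
| 16 | Brayden Point | TBL | 93 | 49 | 44 |
| 17 | Jason Robertson | DAL | 92 | 37 | 55 |
| 18 | Cale Makar | COL | 92 | 22 | 70 |
| 19 | J.T. Miller | VAN | 91 | 34 | 57 |
| 20 | Jack Eichel | VGK | 90 | 39 | 51 |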
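
The final calculation happens in Google Sheets and the sheet formulas are not shown in this document, so the sketch below is an assumption about how the numbers above could be produced in R: multiply each per-second rate by the matching projected seconds per game, add the two game states, and scale by 82 games. The helper name project_count() is hypothetical. Goals and assists are rounded before being summed into points, which is the choice that reproduces the first three rows of the table.

```r
# Hypothetical helper: projected season total from per-second rates and
# projected seconds per game (the formula is an assumption, see above)
project_count <- function(rate_x_pp, rate_pp, toi_x_pp, toi_pp, games = 82) {
        games * (rate_x_pp * toi_x_pp + rate_pp * toi_pp)
}

# First three rows of the rate-stat sample shown earlier
example <- data.frame(
        skater = c("Connor McDavid", "Nikita Kucherov", "Nathan MacKinnon"),
        pred_toi_x_pp = c(1077, 1057, 1099),
        pred_pp_toi = c(204, 243, 270),
        pred_goals_x_pp = c(0.0003937, 0.0003212, 0.0004232),
        pred_assists_x_pp = c(0.0005906, 0.0006121, 0.0005842),
        pred_goals_pp = c(0.0008039, 0.0005336, 0.0005562),
        pred_assists_pp = c(0.0024978, 0.0020835, 0.0015168)
)

example$goals <- round(project_count(example$pred_goals_x_pp, example$pred_goals_pp,
                                     example$pred_toi_x_pp, example$pred_pp_toi))
example$assists <- round(project_count(example$pred_assists_x_pp, example$pred_assists_pp,
                                       example$pred_toi_x_pp, example$pred_pp_toi))

# Points are the sum of the rounded goals and rounded assists
example$points <- example$goals + example$assists

example[, c("skater", "points", "goals", "assists")]
```

Changing the games argument, or overwriting a skater’s projected TOI before calling the helper, is one way to make the kind of adjustments described earlier without leaving R.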