The case for using normalized ice hockey player statistics to identify uniquely qualified individuals within a talent pool.
Mack G. Stevens
Rader School of Business
Milwaukee School of Engineering
April 2017
Abstract
The subject of this study involves the raw player scoring statistics of the 2016-2017 Girls Minnesota High School Hockey League. A transformation technique is presented, and applied to the raw scoring data by normalizing the values with respect to a combination of three variables: (1) the relative strength of the opposing team with respect to the player's team; (2) the relative strength of the opposing team with respect to the overall league; and (3) the relative quality of the opposing goalie. The transformed data was then re-ranked sequentially and compared with the NCAA division 1 & 3 college hockey programs that the players had committed toward attending post-high school. It was found that the normalized data was able to reconfirm what the scouts from the major college ice hockey programs had previously evaluated in terms of quality of player. The 11th and 12th grade Forward players with larger normalized scores were uniformly committed to the higher profile programs while those with smaller values tended to be committed to either lower profile NCAA division 1 or NCAA division 3 women's programs, or in most cases uncommitted. A comparison between the normalized and raw top 100 scoring values of the cumulative NCAA division 1 committed players showed that the normalized data was an increasingly better predictor of scoring talent based on college recruiting offers to those players.
Keywords: quality-of-opponent, ice hockey, normalize, goalie, scouting
1 Introduction
The business of ice hockey in North America and Europe is an ever-growing and dynamic industry. To give an appreciation of its magnitude, a 2015 report authored by Norman O'Rielly estimated that the total impact of hockey is worth in excess of $11.2 billion to the Canadian economy alone. Ice hockey teams range from developing youth programs starting with Mites, Squirts, Peewees, Bantams and Midgets (high school) to Juniors (under 21 years) to College (NCAA D1/D3) and various Professional leagues. At each step or next level, there is significant effort to identify and recruit uniquely talented players from the talent pool in the previous development level. This identification process has traditionally been performed by regional scouts associated with Juniors/College/Professional club teams. This process is not inexpensive and often involves many hours of travel plus hotel and per diem costs to reach a venue to scout a particular individual or group of players. Even the most grizzled and experienced regional scout or coach will tell you that a scouting trip at best can be a hit-or-miss scenario. Coaches and scouting personnel typically rely on word-of-mouth advice from other so-called experts in the industry, typically from player coaches or other program managers who have a special interest (financial or otherwise) in promoting their current players to teams and organizations at the next level of play.
Since most teams and organizations do not have an unlimited scouting budget, and are also often squeezed for limited time during the winter season to scout games, it is paramount that they make scouting trip decisions based on the most complete and statistically unbiased sources. An extremely powerful tool that provides exactly this type of information has been developed by LSQ-Rank(TM).
LSQ-Rank is proprietary software that analyzes player and team statistics generated in competitive sports leagues to produce value-added statistics used to identify uniquely qualified individuals within a designated talent pool. This software adapts adjustment computation methods developed for spatial data analysis to the science of sports performance ranking. It uses least squares mathematics, linear algebra and calculus techniques to produce team ranking and relative strength of opponent data that mitigate the unwanted effects of human bias.
Specifically, LSQ-Rank is a numerical ranking system that performs a least squares "best fit" of in-season performance statistics based on measurable attributes between sports teams. Rigorous algorithms are employed to compute composite margin of victory (MOV) and observation weights. The solution of the observation equations produces:
(1) LSQ-RANK, a relative ranking order of teams in a given league or interconnecting network;
(2) GOALDIFF, a modeled goal differential based on rank & score;
(3) RSOOPT, the relative strength of an opposing team(s);
(4) NPSTAT, normalized player statistics with respect to RSOOPT
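LSQ-Rank is proprietary and its observation equations and weights are not given in this paper, so the following is only a generic illustration of what a least squares "best fit" of margin of victory can look like. It is a minimal Massey-style rating sketch, not the LSQ-Rank algorithm: each game contributes one observation (winner rating minus loser rating equals the margin), and the normal equations are solved with a sum-to-zero constraint. The function names and the three-team schedule are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def least_squares_ratings(games):
    """games: list of (winner, loser, margin). Returns {team: rating}.

    Generic Massey-style fit (an assumption, not LSQ-Rank itself):
    minimize the sum over games of (r_winner - r_loser - margin)^2.
    """
    teams = sorted({t for w, l, _ in games for t in (w, l)})
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    m = [[0.0] * n for _ in range(n)]
    p = [0.0] * n
    for winner, loser, margin in games:
        i, j = idx[winner], idx[loser]
        m[i][i] += 1.0
        m[j][j] += 1.0
        m[i][j] -= 1.0
        m[j][i] -= 1.0
        p[i] += margin
        p[j] -= margin
    # The normal equations are rank deficient; replace the last one
    # with the constraint that all ratings sum to zero.
    m[-1] = [1.0] * n
    p[-1] = 0.0
    return dict(zip(teams, solve_linear(m, p)))


# Hypothetical three-team schedule, for illustration only.
toy_games = [("A", "B", 2), ("B", "C", 1), ("A", "C", 3)]
toy_ratings = least_squares_ratings(toy_games)
toy_order = sorted(toy_ratings, key=toy_ratings.get, reverse=True)
```

Sorting the teams by rating gives a ranking order, and the difference between two ratings plays the role of a modeled goal differential. How LSQ-Rank actually weights observations or derives RSOOPT from its ranking function is not specified here.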
A study was performed using girls hockey data produced in the Minnesota State High School League (MSHSL), and the surrounding states/provinces of Wisconsin, North Dakota and Manitoba. The teams within this "composite" league were systematically ranked throughout the regular season and post-season playoffs. Based on the LSQ-Rankings, normalized individual player statistics were produced and attributed with other player data such as player position (forward, defense or goalie), year at school, and college commitment (NCAA D1 or D3). Statistical analysis of the normalized scoring data revealed that there are several unsigned or uncommitted high school seniors and juniors who could be a good fit at the college hockey NCAA D1 level with regard to scoring potential among their peers.
2 Literature
Based on a review of the available literature and blog sites concerning hockey analytics, strength of schedule is overwhelmingly applied as a retrodictive modifier to adjust the ordinal ranking of a team relative to its league members. This strength of schedule (SOS) adjustment is found in systems such as RPI, KRACH and Z-Ratings employed by various college boards like the NCAA to rank league members for qualification into their post-season play-off tournaments (Carberry). These ranking systems use a probability model based on the wins, losses and ties of a team's opponents and their opponents' opponents to compute a numerical modifier (Siouxsports). One of the more interesting contributors to the application of SOS is Dr. Micah Blake McCurdy of Halifax, Nova Scotia, who writes under the Twitter handle "@IneffectiveMath". In 2014 he wrote a white paper entitled "Schedule Adjustment for Counting Stats", in which he focuses on adjusting raw team puck possession statistics based on SOS to produce schedule-adjusted Corsi counts. Some other sites dedicated to professional sports such as NFL football show non-rigorous steps for using SOS to enhance the methodologies used to predict game scores and odds of winning (Burke). The method used in this study to apply SOS runs counter to the norm because SOS, or the Quality of Opposing Team (QOOPT) as it is referred to in this paper, is derived directly from the numerical integration of the least squares ranking function. The LSQ-Rank values already take into account which teams defeated other teams and by what modeled measure of success, specifically raw game score, although more sophisticated models tailored to ice hockey could be employed at a higher cost with respect to data collection.
In addition, what most other applications fail to do is apply QOOPT as an assessment tool to predict generalized individual athletic talent, as opposed to, say, the theoretical yardage that an NFL running back might gain for fantasy football betting purposes. The closest sports analogy that I can think of to which this paper's methodology might apply is the assessment of a baseball player's hitting ability, where a particular pitcher/catcher combination would assume the same function as the ice hockey goalie in this study.
3 Analysis
Minnesota girls high school hockey raw scoring and goalie data were collected and indexed from the three main girls high school hockey hubs in the upper Midwest: (1) Saint Paul / Minneapolis Star Tribune; (2) Wisconsin Prep Hockey; and (3) North Dakota High School Hockey websites. A sample of 831 players composed of 549 skaters (406 forwards & 143 defensemen) and 281 goalies was selected from within the combined league(s). On average, each goalie registered playing time in 13.4 games, and skaters in this sample scored at least 1 point in 14.7 games. The raw save percentages for each goalie were converted to normalized values with respect to a measure of the caliber of the opposing team faced by the goalie (see Table C-1). A normalized save percentage value was computed for each game that the goalie played in, and these values were summed over all games played to compute the goalie's corrected average for the season. These values were subsequently used to compute normalized scoring values for each skater. In plain terms, the normalization of scoring values accounts for the quality of competition that the player scored against. For example, consider the case where two players on the same team playing in the same game score points against two significantly different opposing goalies. This case occurs rather frequently at the high school level when the starting goalie of the losing team is replaced after two periods of play to allow the back-up goalie some playing time in an apparent no-win game. Let's assume that Forward(A) scores two goals (1 even strength and 1 power play) against the 1st "starting" goalie, while Forward(B) scores a short-handed goal and an even strength assist against the 2nd "back-up" goalie. The adjusted (but not normalized) points for each Forward player would be:
Adjusted_Points(A) = (1 + 1(0.8)) = 1.80 (1-1)
Adjusted_Points(B) = (1 + 1(1.25)) = 2.25 (1-2)
This calculation corrects for the non-five-on-five game situation. These adjusted scoring points are now multiplied by a ratio that accounts for the quality of the opposing team/goalie combination (QOOPT) that the player faced when scoring. There are three factors that are considered: (1) the relative strength of the opposing team with respect to the player's team; (2) the relative strength of the opposing team relative to the overall league; and (3) the relative caliber of the opposing goalie. This ratio is represented by the following equation:
QOOPT = [RSOOP1/RSOPT] * [RSOOP2] * [Norm_GSV%] (1-3)
The value of [RSOOP1/RSOPT] * [RSOOP2] is the same for both Forward(A) and Forward(B).
For example, let
[RSOOP1 / RSOPT] = 0.9105140
[RSOOP2 ] = 0.9069118
And therefore [RSOOP1 / RSOPT] * [RSOOP2] = 0.82575591
Notice that both coefficients are less than 1.0. A value of [RSOOP1/RSOPT] less than 1.0 means that the opposing team in this case is a slightly lower rated team compared to the player's team, and furthermore that the player receives less than "full credit" for that scoring instance. On the other hand, if the opposing team were a higher rated team than the player's team, then the [RSOOP1/RSOPT] ratio would be greater than 1.0, and the Forward would receive a small premium on that particular scoring event. The [RSOOP2] coefficient represents the relative strength of the opposing team with respect to the entire composite league(s) under consideration. A value of RSOOP2 less than 1.0 represents the situation where the team in question is rated lower than the "standard" team against which all teams are being compared. In fact, in the look-up table there is only one team that has an assigned value of 1.0, whereas all other teams have a value less than 1.0.
The third coefficient [Norm_GSV%] represents the average normalized goal save percentage statistic for that goalie over the entire course of the season. All values in this look-up table are less than 1.0, where a value of 1.0 would represent a "perfect normalized" save percentage by a theoretically perfect goalie. For example, the highest rated normalized save percentage (NSV%) in the league was computed to be 0.858, while this particular goalie's raw save percentage over the course of the season was 0.930. For the sake of computation, let the
starting goalie (#1) [Norm_GSV%] = 0.766
back-up goalie (#2) [Norm_GSV%] = 0.522
Substituting these values into the normalization equation, the computed normalized scoring values for each forward are:
Forward(A) = (1.80) * (0.82575591) * (0.766) = 1.1385522
Forward(B) = (2.25) * (0.82575591) * (0.522) = 0.9698503
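The worked example above can be reproduced with a short script. This is a minimal sketch, assuming the situational weights implied by equations (1-1) and (1-2) (1.0 for even strength, 0.8 for power play, 1.25 for short-handed) and assuming that goals and assists are weighted alike; the function and variable names are illustrative and are not taken from LSQ-Rank.

```python
# Situational weights as implied by equations (1-1) and (1-2).
# Treating goals and assists identically is an assumption of this sketch.
SITUATION_WEIGHT = {
    "even": 1.0,
    "power_play": 0.8,
    "short_handed": 1.25,
}


def adjusted_points(situations):
    """Sum the situational weight of every scoring point (goal or assist)."""
    return sum(SITUATION_WEIGHT[s] for s in situations)


def qoopt(rsoop1_over_rsopt, rsoop2, norm_gsv):
    """Equation (1-3): quality of the opposing team/goalie combination."""
    return rsoop1_over_rsopt * rsoop2 * norm_gsv


def normalized_points(situations, rsoop1_over_rsopt, rsoop2, norm_gsv):
    """Adjusted points scaled by the QOOPT ratio."""
    return adjusted_points(situations) * qoopt(rsoop1_over_rsopt, rsoop2, norm_gsv)


# Team-level coefficients from the example (same for both forwards).
TEAM_RATIO = 0.9105140   # [RSOOP1 / RSOPT]
LEAGUE_RATIO = 0.9069118  # [RSOOP2]

# Forward(A): two goals (even strength, power play) on the starting goalie.
forward_a = normalized_points(["even", "power_play"], TEAM_RATIO, LEAGUE_RATIO, 0.766)

# Forward(B): short-handed goal plus even strength assist on the back-up goalie.
forward_b = normalized_points(["short_handed", "even"], TEAM_RATIO, LEAGUE_RATIO, 0.522)

print(f"Forward(A): {forward_a:.4f}")
print(f"Forward(B): {forward_b:.4f}")
```

The results agree with equations for Forward(A) and Forward(B) above to four decimal places; the last digits can differ slightly because the product of the two team coefficients is rounded in the text.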
As one can clearly see, in the case of Forward(A) her original raw scoring value of 2 points was first reduced by 0.2 points to account for the 5-on-4 power play situation and was then further reduced by 0.66 points based on the caliber of the opposing team/goalie combination. In the case of Forward(B), she received a 0.25 point surplus to correct for the 4-on-5 "short-handed" goal that she scored, but because it was against a less capable goalie, her raw score of 2 points was ultimately reduced to approximately 1 point. Frequency histograms of both the raw and normalized points were plotted. Figure No.1 below shows the histogram of the "raw" points scored by the entire 549 skater sample and the re-scaled distribution of the normalized scoring data for the same 549 players. Figure No.2 contains a plot comparing raw goalie save percentages versus rescaled and adjusted normalized (NSV%) values.
Figure No.1: Raw vs Normalized scoring points MN Girls HS hockey (2016-17)
Figure No.2: Normalized goalie save%
Table No.1 below contains the descriptive statistics contrasting the raw scoring points versus the normalized scoring points for the Minnesota girls high school hockey 2016-2017 season. As one can see from Figure No.1 above, the data range is re-scaled to approximately 45% of its raw width (from 81 to 36.2 points, a reduction of roughly 55%), and the maximum value drops from 86 (raw) points to 36.5 (normalized) points. The standard deviation of the data set changes from 13.5 (raw) points to 5.5 (normalized) points.
Table No.1: Descriptive Statistics (Raw vs. Normalized)
Scoring Points
Girls Minnesota HS Hockey 2016-2017
Statistic      Raw     Normalized
Sample Size    548     548
Max            86      36.5
Min            5       0.3
Range          81      36.2
Std Dev        13.5    5.5
Mean           25.1    8.2
Median         22      6.8
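The re-scaling implied by Table No.1 can be checked directly from the tabulated maximum, minimum and standard deviation. The short sketch below uses only the values printed in the table; the variable names are illustrative.

```python
# Values copied from Table No.1.
raw = {"max": 86.0, "min": 5.0, "std": 13.5}
normalized = {"max": 36.5, "min": 0.3, "std": 5.5}

raw_range = raw["max"] - raw["min"]
normalized_range = normalized["max"] - normalized["min"]

# Normalized range as a fraction of the raw range, and the implied reduction.
range_ratio = normalized_range / raw_range
range_reduction = 1.0 - range_ratio

# The standard deviation shrinks by a similar factor.
std_ratio = normalized["std"] / raw["std"]

print(f"range ratio:     {range_ratio:.3f}")
print(f"range reduction: {range_reduction:.3f}")
print(f"std dev ratio:   {std_ratio:.3f}")
```

The normalized range is about 45% of the raw range, and the standard deviation is about 41% of its raw value, so the spread of the data contracts by a comparable factor on both measures.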
The normalized scoring data was subsequently sorted by player year at school and player position (Forward or Defense). Ordered lists were printed containing player name and normalized points, player high school and committed NCAA division 1 or division 3 college program (see Appendix A). The following correlations were made between the normalized scoring points and the college program that the skater has verbally committed to attending. These statistics are summarized in Tables No.2 & No.3 below. The pattern that immediately stands out is intuitive: the more normalized scoring points a player has earned, the more likely it is that she has signed with or committed to an NCAA D1 college ice hockey program. Among the 10th to 12th grade players at the forward position, those earning a normalized scoring value of 20 or more points have committed at the D1 level at a rate of approximately 90%. This rate drops below 50% for those girls scoring 15 to 20 normalized points, and drops yet again to approximately 20% for the 10 to 15 point category. For players at the defenseman position, the correlation to the normalized scoring statistic is not as distinct as it is for players at the forward position. The reason for this is that defensemen on average have significantly fewer direct scoring opportunities, and often the bulk of their scoring points come from assists on goals scored by their teammates at forward.
Table No.2: Forwards - Player College Commit vs. Normalized Points
Year Pos Normalized Points Scored NCAA D1 NCAA D3 Un-Committed
12 F 20+ 10 of 11 0 of 11 1 of 11
12 F 15 - 20 8 of 17 2 of 17 7 of 17
12 F 10 - 15 5 of 32 4 of 32 23 of 32
12 F 5 - 10 3 of 50 47 of 50
12 F 0 - 5 30 of 30
11 F 20+ 8 of 9 1 of 9
11 F 15 - 20 4 of 9 5 of 9
11 F 10 - 15 7 of 25 18 of 25
11 F 5 - 10 42 of 42
11 F 0 - 5 35 of 35
10 F 20+ 3 of 3
10 F 15 - 20 1 of 4 3 of 4
10 F 10 - 15 3 of 16 13 of 16
10 F 5 - 10 1 of 30 29 of 30
10 F 0 - 5 1 of 29 28 of 29
9 F 10+ 1 of 16 15 of 16
9 F 0 - 10 47 of 47
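The D1 commitment rates quoted in the text for 10th to 12th grade forwards can be recomputed by pooling the rows of Table No.2. This is a rough check, with one assumption stated loudly: several rows of the table list only two entries, and the sketch assumes that in the three brackets used here the first entry of each row is the NCAA D1 count. The names below are illustrative.

```python
# (D1 commits, players in bracket) for grades 12, 11 and 10, from Table No.2.
# Assumption: where a row shows only two entries, the first is the D1 count.
FORWARD_D1_COUNTS = {
    "20+": [(10, 11), (8, 9), (3, 3)],
    "15-20": [(8, 17), (4, 9), (1, 4)],
    "10-15": [(5, 32), (7, 25), (3, 16)],
}


def pooled_rate(rows):
    """Pool commits and players across grades and return the D1 rate."""
    commits = sum(c for c, _ in rows)
    players = sum(n for _, n in rows)
    return commits / players


d1_rates = {bracket: pooled_rate(rows) for bracket, rows in FORWARD_D1_COUNTS.items()}

for bracket, rate in d1_rates.items():
    print(f"{bracket:>6} normalized points: {rate:.1%} committed to D1")
```

Under that assumption the pooled rates come out near 91%, 43% and 21%, in line with the "approximately 90%", "below 50%" and "approximately 20%" figures given in the text.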
Table No.3: Defensemen - Player College Commit vs. Normalized Points
Year Pos Normalized Points Scored NCAA D1 NCAA D3 Un-Committed
12 D 10+ 5 of 10 1 of 10 4 of 10
12 D 5 - 10 8 of 32 3 of 32 21 of 32
12 D 0 - 5 20 of 20
11 D 10+ 4 of 6 2 of 6
11 D 5 - 10 2 of 17 15 of 17
11 D 0 - 5 1 of 19 18 of 19
10 D 10+ 1 of 2 1 of 2
10 D 5 - 10 6 of 12 6 of 12
10 D 0 - 5 11 of 11
9 D 10+ 1 of 1
9 D 0 - 10 13 of 13
The other main factor that weakens the correlation to scoring points is that being a capable defenseman requires mastering many other non-scoring related skills that are not always logistically measurable, or if so, are simply not collected or recorded at the high school level. An example of this would be the player "plus-minus" statistic, in which a player earns a positive '+1' tally point if she is on the ice when a teammate scores a goal, and conversely receives a '-1' tally point if she is on the ice when the opposing team scores a goal against her team. For defensemen, a high positive plus-minus stat could be a good indicator of a great player whose overall play on the ice contributes strongly towards teammates scoring goals. With that said, raw plus-minus stats can be deceiving in the same way that raw scoring stats can be, and should also be normalized with respect to the quality of the opposing team. Another notable pattern in the data is that girls high school players typically do not commit to an NCAA D3 hockey program until after the completion of their senior (12th grade) year hockey season. The reason for this is that some very capable players are still waiting on a D1 opportunity or weighing the option of playing an additional "gap" year at the USA/CAN Girls 19U AAA level in order to secure an NCAA D1 scholarship opportunity during the year following high school graduation. This growing trend of playing a gap year, as it is called in girls hockey, is similar to young men playing Junior A (20 and Under) for a year or two before going to an NCAA D1, NCAA D3 or ACHA college program.
Finally, comparing the running cumulative total of NCAA D1 player commitments for the normalized top 100 versus the raw top 100 clearly shows that the normalized values are a significantly better predictor of scoring talent than the unprocessed raw data. In Figure No.3 below, the normalized trend line is always above the raw trend line, and steadily increases its separation as one traverses down the list from 1 to 100. In other words, the normalized data increasingly becomes a better predictor of scoring ability than the raw data.
Figure No.3: Cumulative NCAA division 1 commitments
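The cumulative comparison plotted in Figure No.3 amounts to ranking the players by each statistic and keeping a running count of D1 commitments down the list. The sketch below illustrates the computation only; the five players and their values are hypothetical toy data, not records from this study, and the field names are assumptions.

```python
from itertools import accumulate


def cumulative_d1(players, key):
    """Rank players by `key` (descending) and return the running D1 count."""
    ranked = sorted(players, key=lambda p: p[key], reverse=True)
    return list(accumulate(1 if p["d1"] else 0 for p in ranked))


# Hypothetical toy data for illustration only.
toy_players = [
    {"name": "P1", "raw": 40, "norm": 22.0, "d1": True},
    {"name": "P2", "raw": 75, "norm": 15.0, "d1": False},
    {"name": "P3", "raw": 50, "norm": 20.0, "d1": True},
    {"name": "P4", "raw": 63, "norm": 16.0, "d1": False},
    {"name": "P5", "raw": 30, "norm": 12.0, "d1": False},
]

raw_curve = cumulative_d1(toy_players, "raw")
norm_curve = cumulative_d1(toy_players, "norm")

print("raw ranking:       ", raw_curve)
print("normalized ranking:", norm_curve)
```

A ranking that places committed players nearer the top produces a curve that rises earlier, which is the sense in which one trend line sitting above the other indicates a better predictor.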
4 Discussion
The use of point scoring data normalized for quality of an opposing team & goalie has four main advantages over the use of raw point statistics. These advantages are:
(1) It allows an organization to find the overlooked gemstones, i.e. those players who, for whatever reason, have slipped through the traditional recruiting/scouting cracks;
(2) It can guide the planning process to make the most efficient use of recruiting resources, time and budget ($$);
(3) It can reconfirm what coaches and scouts are seeing in a player based on limited observation time/exposure; and
(4) It provides a continuous numerical scale of the player's scoring potential rather than a simple ordinal scale.
Finding those unnoticed gems before the competition does is every coach's and scout's desire because it gives them a leg up in the recruitment process. The sooner one can identify a potential fit for a hockey program, the easier the process usually becomes until a final commitment is made on both sides between player and next level hockey club. Sometimes a player, for whatever reason, is either overlooked or does not play in games that attract interest from next level programs. A case in point is a specific 12th grade Forward who played for Forest Lake high school, located in the far northern suburban fringes of the Twin Cities of St. Paul/Minneapolis. This particular player is highlighted in yellow in Table A-1 (Appendix A) and Table B-1 (Appendix B). At the time of writing, there has been no indication on the various web sites that publish such information that she has been made an offer for a D1 college hockey commitment. This seems perplexing because, based on first-hand knowledge of this particular player's hockey abilities and on seeing her play in various high school and spring/summer AAA venues against other players with college D1 commitments, she is a likely D1 caliber recruit. Why has this player (initially) been overlooked? First, perhaps a D1 college scout did attend a game she played in, but she had a "bad" game, which happens even to the best of elite players. That single performance could have biased that scout's impression of the player in question. Also, looking at this player's regular season raw scoring statistics from the Girls MN Hockey Hub, her numbers were somewhat upper mid-range over the entire league. Her raw score was 42 points, which tied her for 40th place overall.
But once one normalizes this raw statistic with respect to the relative quality of the teams that she played against, her rank jumps dramatically upward to 14th (23.22 normalized points) in the league overall, and places her among those already committed to other prime time women's D1 college programs. A favorable scoring statistic does not always represent the true nature of a player's abilities, but nonetheless it is a strong indicator. By using a normalized point score statistic, a scout or coaching staff can look past the impression left by a single event and mitigate the adverse effects of witnessing a "false-negative" performance, in the case of the player having a poor outing, or conversely a "false-positive", in the case where the player is having the outstanding game of her life (an outlier) several standard deviations above her normal level of play.
A second case in point, an 11th grade Forward highlighted in green in both Table A-1 & Table B-1 (Appendix A & B), is being actively considered by Rochester Institute of Technology (RIT), an eastern D1 college program. After reviewing her normalized point scoring data relative to the entire MSHSL league, their coaching staff commented that this value-added statistical information greatly reinforced what their assistant coaches were seeing on the ice with this player, namely that she can "put the puck in the net" (McDonald). This is a very good thing in hockey slang. More importantly, by being able to include this normalized player scoring information in their evaluation process, RIT can expedite their decision to offer this player a position in their program, and therefore devote resources towards other unsettled and timely recruiting efforts.
A third case in point involves two players, both 12th grade Forwards who played their high school hockey in non-primetime programs, one for the Saint Paul Blades Coop located in the heart of Saint Paul, MN and the other for Moose Lake-Willow River, located on the southern fringes of Northeastern Minnesota, literally in "drive-by" country along route I35 from the Twin Cities to Duluth, MN. The Moose Lake program, like most community based hockey in rural Minnesota, has limited resources but receives strong local support. The viability of the Blades Coop program, by contrast, is often in question from year to year due to the cyclical nature of the local demographics, and its talent pool is heavily affected by other nearby primetime "private-school" programs that literally siphon off the most talented players in the Saint Paul school district. Regional rankings for these two high school hockey programs are shown in Figure D-1 (Appendix D). Although this paints a bleak picture for these two programs, every once in a while they produce a couple of singularly talented players. These two players (we will call them HZ & JB) are highlighted in blue in Tables A-1 and B-1. The reason they are special is that both led the state high school league (the entire state of Minnesota!) in scoring statistics, i.e. HZ was the regular season raw goal scoring leader, and similarly JB was the raw total point leader (goals & assists). Both are obviously very talented athletes in their own right, but both come from programs whose teams played mostly other marginal to weak teams, with the occasional primetime team in the schedule mix. Both players HZ & JB, if they elect to start college right away next year (2017-2018) as true freshmen, are probably not bound for a D1 college hockey program.
Player HZ had every women's D3 college hockey program within a proverbial 500 mile radius calling and recruiting her to play at their school, and had two local Minnesota D1 programs (MSU & SCSU) ask her to try out as a "walk-on". She has committed to play at UW-Eau Claire, which is an excellent NCAA D3 public university (part of the UW system campuses) located off of I94 in northwestern Wisconsin, approximately 90 minutes due east of her home in Saint Paul, MN. Player JB is most likely experiencing a similar situation as player HZ.
So the nagging question that comes to mind is this: although HZ and JB both led the state in scoring, are they being overlooked by D1 college programs simply because of the teams they play on and against? To answer this question, one must compare their normalized scoring statistics against those of their peers. Table No.4 contains the data for both players.
Table No.4: Raw vs. Normalized
MSHSL Total Points Scoring 2016-17
Player RAW Normalized
JB 75 (rank = 1st) 15.441 (rank = 53rd)
HZ 63 (rank = 8th) 16.514 (rank = 43rd)
In HZ's case, her normalized statistic dropped her from a raw rank of 8th to a normalized rank of 43rd, while JB plummeted from a raw rank of 1st to a normalized rank of 53rd. Comparing the five immediate peers both above and below HZ in the normalized rankings of Table B-1, i.e. ranks 38 to 42 and 44 to 48, we find that 4 out of 5 above her and 3 out of 5 below her are bound for or committed to major D1 college programs. In the case of JB, only 2 out of 5 of her closest immediate peers (both above and below) are headed towards major D1 college programs. It seems plausible that the answer is yes: these two players, HZ & JB, are both being overlooked by the regional D1 college programs, and furthermore, because of their geographic location in Minnesota, they both lack exposure to Eastern D1 universities. With that said, if HZ and JB have D1 hockey ambitions and are willing to delay full-time higher educational plans, they should strongly consider playing a gap year with a high profile USA/CAN 19U AAA women's program to gain valuable exposure to major college programs and to grow and refine their hockey skills with other players on par with their skill level.
The next case in point emphasizes the advantage of using a continuous numerical scale of normalized values over a stepped ordinal scale of whole numbers. The raw scoring list published on the Girls MN Hockey Hub under the "state leaders" web page is little more than a public forum to give the girls in the "out-state" (non-metro players) a symbolic pat on the back. It lists the top 100 scorers around the MSHSL in ordinal sequence without regard to the quality or level of play in which these scoring statistics were produced in the first place. The publishers of this top 100 list are conveniently ignoring the fact that the level of play is not in any way, shape or form uniformly distributed across the league. Figure D-1 (Appendix D) illustrates the tremendous variation in level of play among the various leagues in the North Central Region, which includes the state of Minnesota. The plotted trend lines are non-linear in shape and have a distinct downward slope; the steeper the slope, the greater the variation in quality of play within the league. With this in mind, the process of normalizing scoring statistics with respect to the relative strength of the opposing team can transform context-less raw data into powerful value-added information. Figure No.4 compares the raw top 100 scoring list against the normalized top 100 scorers. The two trend lines are both annotated with several players common to both top 100 lists. It should be pointed out that not all of the same players appear in both trend lines. A printed list of the normalized top 100 scorers can be found in Table B-1 (Appendix B). The first two columns of this table contain the true rank and normalized scoring point value, respectively, while the last two columns contain the regular season raw scoring points and their ordinal state-wide rank. The notation of "<100" indicates that this particular player was not contained in the raw top 100.
As one can see from both Table B-1 and Figure No.4 (right), the raw ordinal data form a stepped pattern, i.e. several players may share the same ordinal (integer) value for total points scored. The normalized data, on the other hand, form a relatively smooth curve that can be modeled with a higher order polynomial equation, making them extremely useful in subsequent analyses, whether qualitative or quantitative. It is much easier to distinguish the relative point differential between any two players using the normalized data than to flip a coin over several players with the same ordinal value in the raw data list.
Figure No.4: Top 100
5 Comments about the study
It should be noted that neither data set, raw nor QOOPT-normalized, has been adjusted for the overall playing time of the athlete. It is a simple fact of high school hockey that the better players (or perhaps the preferred players handed the golden baton) on any one team will receive the lion's share of the available playing time over the course of the season. This in turn has a snowball effect on the accumulation of scoring points: the longer a skater is on the ice, the more likely he or she is to be presented with a scoring opportunity. In an ideal world, it would be advantageous to normalize this scoring data with respect to the player's ice time in a game situation. On a team with many talented players, ice time is naturally divided more evenly, as opposed to teams with a singularly dominant player who rarely comes off the ice to rest during a game. This observation partially explains how the previously discussed players HZ and JB were able to accumulate an extraordinary tally of raw scoring points over the course of the 25 game high school season. They were both, by wide margins, the "Go-To" player on their respective teams.
Additionally, during "blow-out" games where the better team has a lead of six or more goals over their opponent, MSHSL protocol invokes "running" time during the 3rd (final) period of play. In this case, the time clock does not pause during a stoppage of play, but instead continues to run as a way to lessen the psychological trauma that the losing team may experience due to the overwhelming disparity of the game outcome. It may seem a wee bit silly to do this, but nonetheless the net effect is that it significantly shortens the length of the game and thus reduces the scoring chances of the players on both squads. Also in the case of a blow-out game, coaches of teams with a deeper talent pool of reserve players tend to bench their starting players during the third period to allow less experienced underclassmen more game time. Conversely, mediocre teams where the talent pool is shallow, such as the St. Paul Blades Coop and Moose Lake, on which players HZ and JB played respectively, can ill afford to bench their best player even with a huge lead without fear of blowing the game.
Typically, at the high school or Midget level of ice hockey, the only position whose playing time is recorded during a match is the goalie. Collecting playing time data for Forwards and Defensemen in high school games would require special arrangements with the coach/manager to recruit student volunteers throughout the season to tally the data precisely. That said, it might be possible to model playing time indirectly through other recorded game statistics. Future studies will attempt to introduce a numerical model for playing time in order to use it as an additional normalization variable.
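As a rough illustration of the playing-time adjustment discussed above, the sketch below converts season point totals into a scoring rate per 60 minutes of ice time. This is not part of the QOOPT method presented in this study. The function name, the player labels, and all point and ice-time figures are hypothetical, and the per-60-minute basis is simply a common convention for rate statistics in hockey.

```python
def points_per_60(points: float, ice_time_minutes: float) -> float:
    """Return a scoring rate expressed as points per 60 minutes of ice time."""
    if ice_time_minutes <= 0:
        raise ValueError("ice_time_minutes must be positive")
    return 60.0 * points / ice_time_minutes


# Hypothetical season totals: (scoring points, estimated ice time in minutes).
# These figures are invented for illustration and are not taken from the study.
players = {
    "Player A": (90.0, 900.0),  # dominant skater who rarely leaves the ice
    "Player B": (60.0, 450.0),  # skater on a deeper team with shared ice time
}

# Convert each raw total into a per-60-minute rate, then rank by that rate.
rates = {name: points_per_60(pts, mins) for name, (pts, mins) in players.items()}
ranked = sorted(rates, key=rates.get, reverse=True)

for name in ranked:
    print(f"{name}: {rates[name]:.1f} points per 60 minutes")
```

In this made-up case Player A leads in raw points, but Player B ranks higher once ice time is taken into account, which is the kind of reordering a playing-time variable would be expected to produce.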
6 Conclusions
Sports leagues, such as those in ice hockey, can sometimes show a large variation in quality of play from top to bottom within a given league. More often than not, league officials or proponents publish raw data statistics and unknowingly make or intentionally imply faulty generalizations based upon the raw data patterns. By normalizing these raw data statistics with respect to a critical variable or attribute, previously misleading patterns can be transformed into powerful value-added data. The subject of this study involved the raw player scoring statistics of the 2016-2017 Girls Minnesota High School Hockey League. A transformation technique was presented and applied to the raw scoring data by normalizing the values with respect to a combination of three variables: (1) the relative strength of the opposing team with respect to the player's team; (2) the relative strength of the opposing team with respect to the overall league; and (3) the relative quality of the opposing goalie. The transformed data was then re-ranked sequentially and compared with the NCAA D1 & D3 college hockey programs that the players had committed to attending post-high school. It was found that the normalized data was able to reconfirm what the scouts from the major college ice hockey programs had previously evaluated in terms of quality of player. The 11th and 12th grade Forward players with larger normalized scores were uniformly committed to the higher profile programs, while those with smaller values tended to be committed to either lower profile NCAA D1 or D3 women's programs, or in most cases uncommitted. A comparison between the normalized and raw top 100 scoring values of the cumulative NCAA D1 committed players showed that the normalized data was an incrementally better predictor of scoring talent based on college recruiting offers to that pool of players.
It is projected that broad use of this type of normalization technique will save college athletic departments and other for-profit sports organizations significant resources and allow them to optimize their recruiting strategies to make better use of their limited budgets.
7 References
O'Reilly, Norm. (2015, May 15). Here's exactly how much hockey is worth to Canada's economy. Canadian Business. Web. <http://www.canadianbusiness.com/economy/canada-size-of-hockey-economy/>
McDonald, Scott. (2017, March 12). Head Coach Women's Hockey. Rochester Institute of Technology. Personal communication with Mr. William "Billy" Klein.
8 Glossary
Mites USA/CAN Youth hockey classification 8 years old & under
Squirts USA/CAN Youth hockey classification 10 years old & under
Peewees USA/CAN Youth hockey classification 12 years old & under
Bantams USA/CAN Youth hockey classification 14 years old & under
Midgets USA/CAN Youth hockey classification under 19 years old
Juniors USA/CAN Youth hockey classification under 21 years old
Seniors USA/CAN Youth hockey classification 21 years old & above
False-Negative
A test result that indicates a person does not have a disease or condition when the person actually does have it.
False-Positive
A test result that indicates a person has a disease or condition when the person actually does not have it.
Normalization
Normalization of ratings means adjusting values measured on different scales to a notionally common scale, often prior to averaging.
Outlier
A data point that is distinctly separate from the rest of the data. One definition of outlier is any data point more than 1.5 interquartile ranges (IQRs) below the first quartile or above the third quartile.
Put the puck in the net
Ice hockey slang for scoring a goal against an opposing team.
Power-Play
In ice hockey, a team is said to be on a power play when at least one opposing player is serving a penalty, and the team has a numerical advantage on the ice (whenever both teams have the same number of players on the ice, it is called even strength).
Short-Handed
In ice hockey, a team is said to be short handed when at least one player is serving a penalty, and the opposing team has a numerical advantage on the ice (whenever both teams have the same number of players on the ice, it is called even strength).