I previously published results using sentiment analysis to show that commenters on the NBA reddit had higher sentiment towards young players, high-scoring white players, and white coaches. In this post, I extend that analysis to the NFL reddit. My overall finding is that, as in the NBA, NFL redditors like rookies and players who gain more yards. However, I did not find any significant coefficients for race. For coaches, I found that commenters like coaches who outperform expectations, but again did not find evidence of bias.
Brief review of the method
To understand what factors (e.g. performance, age, race) influence player popularity, I scraped millions of comments from r/NFL from 2013-2018. I then quantified each commenter's opinion toward players using the sentiment analyzer VADER. This analysis calculates whether a word is positive (“GOAT!”) or negative, and ties that feeling to a player. Sentiment scores generally ranged from -0.2 to 0.3. Finally, I quantified the impact of each factor on popularity by performing a least-squares regression with sentiment towards a player as the outcome variable. Details of the analysis are in the previous post, and in a series of notebooks covering scraping data, sentiment quantification, and regression.
Unique difficulties of performing sentiment analysis
Compared to the NBA, two factors make analyzing the NFL more challenging: the larger number of players, and the smaller number of players with comparable stats.
The larger number of players made it more difficult to resolve which player was being talked about (named entity recognition). I performed named entity recognition by identifying comments that contained player first or last names, then matching those names to players. If an NBA commenter mentioned "Blake," I could link that comment to Blake Griffin since there is only one active "Blake" in the NBA. However, in the NFL, "Blake" could refer to Blake Bortles, Blake Bell, Blake Jarwin, or others. This means that the most reliable way to identify comments about NFL players is to find full name matches, even though people normally refer to players by just their first or last name. A second side effect of the larger number of players was that comment-player matching took longer, as my implementation ran in O(n^2) time.
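The matching rule described above can be sketched as follows. This is a minimal illustration, not the code from the notebooks: the roster, the helper names (`build_name_index`, `match_players`), and the rule of accepting a bare first or last name only when it is unique on the roster are all assumptions. It also assumes two-word player names.

```python
import re
from collections import defaultdict


def build_name_index(roster):
    """Index a roster (hypothetical helper).

    Returns a dict of lowercase full names and a dict mapping each
    name part (first or last name) to the set of players who share it.
    """
    full = {}
    parts = defaultdict(set)
    for player in roster:
        full[player.lower()] = player
        for part in player.lower().split():
            parts[part].add(player)
    return full, parts


def match_players(comment, full, parts):
    """Return the set of players a comment plausibly refers to."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    matched = set()
    # Full-name matches: look at each pair of adjacent words.
    for first, last in zip(tokens, tokens[1:]):
        player = full.get(f"{first} {last}")
        if player:
            matched.add(player)
    # Bare first or last names: accept only when exactly one player has it.
    for token in tokens:
        candidates = parts.get(token, set())
        if len(candidates) == 1:
            matched.update(candidates)
    return matched


roster = ["Blake Bortles", "Blake Bell", "Blake Jarwin", "David Johnson"]
full, parts = build_name_index(roster)
print(match_players("Blake is having a rough year", full, parts))
print(match_players("Blake Bortles threw another pick", full, parts))
```

In this sketch, the ambiguous "Blake" matches nobody, while the full name resolves to a single player. Looking names up in a dictionary built once from the roster also avoids comparing every comment against every player.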
The nature of the NFL also made it harder to compare player stats. In basketball, all players score, rebound, and assist. In the NFL, half the players play defense, and on offense, linemen rarely touch the ball, so I ended up only analyzing skill-position players. This means that despite the NFL having more players overall, there were fewer players to analyze, and the statistical power of the analysis was lower.
Most and least popular players
To compare players across different positions (QB, RB, WR, and TE), I used Football Outsiders' advanced metric DVOA, which is a defense-adjusted z-score of yards above average. For example, if a QB and WR both had a DVOA of 1, they would both be one standard deviation better than the average player at their position.
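To make the "one standard deviation better" reading concrete, here is a minimal sketch of standardizing a stat within each position group. It is not Football Outsiders' DVOA calculation, which also adjusts for the opposing defense; the `zscore_by_position` helper and the sample yardage numbers are made up for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_by_position(rows):
    """Standardize values within each position (hypothetical helper).

    rows: list of (player, position, value) tuples.
    Returns {player: z-score relative to others at the same position}.
    """
    by_position = defaultdict(list)
    for _, position, value in rows:
        by_position[position].append(value)
    # Mean and population standard deviation for each position group.
    stats = {pos: (mean(vals), pstdev(vals)) for pos, vals in by_position.items()}
    scores = {}
    for player, position, value in rows:
        mu, sigma = stats[position]
        scores[player] = (value - mu) / sigma if sigma > 0 else 0.0
    return scores


# Made-up yardage totals: quarterbacks gain far more raw yards than receivers.
rows = [
    ("QB A", "QB", 3000), ("QB B", "QB", 4000), ("QB C", "QB", 5000),
    ("WR A", "WR", 600), ("WR B", "WR", 1000), ("WR C", "WR", 1400),
]
for player, z in zscore_by_position(rows).items():
    print(f"{player}: {z:+.2f}")
```

With these made-up numbers, "QB C" and "WR C" end up with the same z-score even though their raw yardage differs by thousands of yards, which is the property that lets positions be pooled in one regression.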
Here are the most and least popular players for the years 2013-2018:
Lowest Sentiment Seasons | | | Highest Sentiment Seasons | | |
Player | Year | Avg Sentiment | Player | Year | Avg Sentiment |
Michael Bennett | 2017 | -0.25 | David Johnson | 2016 | 0.24 |
Danny Trevathan | 2015 | -0.15 | JJ Watt | 2017 | 0.22 |
Vontaze Burfict | 2015 | -0.15 | JJ Watt | 2016 | 0.22 |
Vontaze Burfict | 2016 | -0.14 | | | |
On the unpopular side, Michael Bennett in 2017 was one of the first players to kneel during the national anthem, and Vontaze Burfict has a reputation as a dirty player. On the popular side, all three seasons belong to Pro Bowlers. While I am not an avid NFL fan, these results pass my smell test.
Regression results
As I did with the NBA, to quantify which features were important, I ran a weighted least squares regression with standard errors clustered at the player level. Because there were few features, I only present two specifications here. First, I ran a regression with DVOA as the only feature (spec 1). This single coefficient was weakly statistically significant (t-statistic of 2.1), suggesting that commenters prefer high-achieving players. I then ran a second regression adding player age and race, and city-level statistics for commenters (spec 2). In this regression, DVOA was no longer significant, while the rookie coefficient was. Neither the overall age nor the race coefficient was significant.
Coefficient | (1) | (2) |
DVOA | 0.0057 (2.1) | 0.0051 (1.4) |
Rookie | - | 0.018 (2.4) |
1 year of youth (<27) | - | 0.0011 (0.7) |
Race (white) | - | 0.0067 (1.2) |
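For readers unfamiliar with the technique, here is a minimal pure-Python sketch of a weighted least squares fit with one feature and cluster-robust standard errors, in the spirit of spec 1. It is not the code from the regression notebook: the function name `wls_clustered`, the toy data, and the choice of weights are assumptions, and it omits the small-sample corrections that statistical packages often apply.

```python
from collections import defaultdict
from math import sqrt


def wls_clustered(x, y, w, cluster):
    """Fit y = a + b*x by weighted least squares (hypothetical helper).

    Returns (slope, clustered standard error, t-statistic) for b.
    Residuals are allowed to be correlated within each cluster.
    """
    # Entries of X'WX and X'Wy for the design matrix [1, x].
    s_w = sum(w)
    s_wx = sum(wi * xi for wi, xi in zip(w, x))
    s_wxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    s_wy = sum(wi * yi for wi, yi in zip(w, y))
    s_wxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = s_w * s_wxx - s_wx ** 2
    # Inverse of the 2x2 matrix X'WX (the "bread" of the sandwich).
    inv = [[s_wxx / det, -s_wx / det], [-s_wx / det, s_w / det]]
    intercept = inv[0][0] * s_wy + inv[0][1] * s_wxy
    slope = inv[1][0] * s_wy + inv[1][1] * s_wxy
    # Sum the weighted score vectors within each cluster (e.g. each player).
    scores = defaultdict(lambda: [0.0, 0.0])
    for xi, yi, wi, g in zip(x, y, w, cluster):
        resid = yi - intercept - slope * xi
        scores[g][0] += wi * resid
        scores[g][1] += wi * xi * resid
    # "Meat" of the sandwich: sum over clusters of score outer products.
    meat = [[sum(s[i] * s[j] for s in scores.values()) for j in range(2)]
            for i in range(2)]
    # Slope variance is the [1][1] entry of inv @ meat @ inv.
    var_slope = sum(inv[1][a] * meat[a][b] * inv[b][1]
                    for a in range(2) for b in range(2))
    se = sqrt(var_slope)
    return slope, se, slope / se


# Toy data: two seasons each for three made-up players.
dvoa = [-1.0, -0.5, 0.0, 0.4, 0.9, 1.5]
sentiment = [0.02, 0.05, 0.04, 0.09, 0.08, 0.12]
n_comments = [120, 80, 200, 150, 300, 90]  # used here as regression weights
player = ["p1", "p1", "p2", "p2", "p3", "p3"]
slope, se, t_stat = wls_clustered(dvoa, sentiment, n_comments, player)
print(f"slope={slope:.4f}, clustered SE={se:.4f}, t={t_stat:.2f}")
```

Clustering by player matters because the same player appears in several seasons, so their residuals are unlikely to be independent.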
Compared to the NBA results, fewer coefficients were significant, and the t-statistics were smaller. I believe this is in part due to the problems mentioned previously: difficulty in matching players to comments, and fewer player performance features to compare with.
Beyond statistical power, this analysis may be complicated by the assumption that NFL fans can (or do) compare players from different positions fairly. In this analysis, different positions are mixed together, and we are using a synthetic stat (DVOA) to compare them. While two players may be equivalent by DVOA, it may be hard for a casual fan to see that; instead, they likely consider more basic stats, which could skew results (e.g. quarterbacks gain more yards than tight ends). While I included a categorical variable for position, which was not significant, there may be more subtle influences.
Another difference between the NFL and NBA is that NFL players wear helmets, which may make it harder for fans to form personal connections with them.
NFL coaches
We can also calculate sentiment towards NFL coaches. Here are the most popular and least popular coaches:
Highest Sentiment Seasons | | | Lowest Sentiment Seasons | | |
Coach | Year | Avg Sentiment | Coach | Year | Avg Sentiment |
Sean McVay | 2017 | 0.28 | Gregg Williams | 2017 | -0.29 |
Adam Gase | 2016 | 0.27 | Gregg Williams | 2016 | -0.23 |
Todd Bowles | 2015 | 0.27 | Sean Payton | 2015-2016 | -0.11 |
Marc Trestman | 2013 | 0.24 | Mike Smith | 2013 | -0.08 |
The most popular coach, Sean McVay, was a young, rookie head coach who improved the Rams' wins by 7 in a single year. On the unpopular side, Gregg Williams was involved in "bountygate," a scandal where coaches were paying players for causing injuries.
Having looked at the most and least popular coaches, we can again perform a regression:
Coefficient | Magnitude (t-statistic) |
Win % over expectation | 0.22 (2.1) |
Age (years) | -0.2W (-2.0) |
Tenure with team (years) | 0.25W (1.95) |
Race (white) | Not significant |
(For the coefficients other than win percentage, I expressed the magnitude in terms of wins, denoted W.)
As with NBA coaches, NFL coaches were more popular when they outperformed expectations, were younger, or had longer tenure. In contrast to the NBA, where there was a large and significant bias against black coaches worth ~10 wins / year, we could not detect bias against NFL coaches. This could be due to a lack of power (there are far fewer black NFL coaches than black NBA coaches), reflect differences in media coverage, or indeed reflect decreased bias among NFL commenters. It is interesting that NFL coaches are perceived to be much more important than NBA coaches, yet bias is less prevalent.