...And You Will Know Me By The Trail of Papers: Fortnite Reddit prefers female skins

Fortnite is the most popular game of 2018-2019 (and I would argue one of the best designed games ever). “Skins” in Fortnite are costumes for your character, letting you dress up as an astronaut, bigfoot, or a fish controlling a robot body. The release of each new skin is met with discussion online; recently, a black female skin “Luxe” was released, and people seemed unusually critical of the skin. Since I had all of the code from my NBA and NFL analyses*, I figured I would investigate to see how skin race and gender influences sentiment on Fortnite reddit. In terms of results, the two headlines are:

Raw mentions for skins is primarily driven by time since release, and whether a skin was included in a Battle Pass
Sentiment is higher for female skins

* Don’t let “I already have the code” fool you on a data science project. Most of the work was in collecting the data and cleaning it.

Methods

For details of my methods, see this blog post. In brief, I scraped Fortnite reddit for comments from January 2018 through July 2019, with the help of pushshift.io. I then performed named entity recognition* to identify which posts were about Fornite skins. To quantify which skins were most liked, I used VADER with a lexicon modified for Fortnite. For covariates, I scraped two Fortnite skins websites, Gamepedia and Progameguides. All notebooks used for the project can be found on github.

* For NER this time, I tried fine-tuning spaCy’s NER model. I labeled entities for ~300 comments. I found that for skins that had 5+ labels, the NER worked fairly well (decent recall on spot-checked posts). However, for skins that had <= 3 labels, the recall was abysmal. Rather than hand labeling 1,500 comments (5 for each skin), I decided to go back to simple regex extraction for skin names

Analysis of which skins get discussed the most

Before diving into which skins had the highest sentiment, I first wanted to see which skins were discussed the most. Here are the five most commented skins:

Skin	Mentions
Omega	40,300
John Wick	29,900
Drift	29,400
Skull Trooper	23,500
Black Knight	17,900

To understand why these skins are popular, it helps to know a bit about how Fortnite is played. Every 3-4 months, the developers release a “Season,” which includes a “Battle Pass.” The Battle Pass costs around $10, and includes access to a large number of skins that get unlocked as you play. Three of the above skins are from the Battle Pass (Omega, Drift, and Black Knight). John Wick is a skin from cross-promotional advertisement from John Wick 3; and "John Wick" was also the nickname for a Battle Pass skin, The Reaper. Finally, Skull Trooper is an old skin from October 2017 that was the signature skin of a Fortnite streamer, Myth.

The distribution of number of skin comments followed an exponential distribution. Here is the number of comments per skin, in rank order (note the log-scale of the y-axis:

To better understand what is driving skin discussion, I performed a regression with a target variable of log(skin mentions). The covariates for this regression were:

Skin gender (including non-human skins)
Skin race ("non-human"; or "not visible" for some skins)
Number of days since release
Whether the skin was part of a Battle Pass
Whether the skin was a tier 1 or 100 Battle Pass skin

Whether the skin / character was featured in the Fortnite “story”

The results of this regression were that the primary drivers of skin discussion were time since release, and whether it was part of a battle pass. Skin “demographic” features did not matter. Here are the coefficients and p-values for the different features (coefficient is in log units):

Feature	Coefficient (p-value)
Battle Pass	1.4 (< 0.001)
Tier 1 skin	0.6 (0.17)
Tier 100 skin	1.3 (0.015)
Story skin	2.1 (0.002)
Days since release	0.004 (< 0.001)
Race (black)	-0.3 (0.44)
Gender (male)	-0.1 (0.46)

Analysis of what drives skin sentiment

In addition to analyzing which skins are discussed most, I wanted to understand what drives which skins are liked or disliked. I used VADER to analyze sentiment towards skins on a sentence-by-sentence level, then averaged the sentiment to get average sentiment towards each skin. The sentiment for each sentence can range from -1 to 1. Here the most liked and disliked skins in the sample (minimum 20 mentions):

Most liked skins		Least liked skins
Skin	Mean sentiment	Skin	Mean sentiment
Straw Ops	0.23	Shaman	-0.11
Psion	0.22	Birdie	-0.06
Scarlet Defender	0.21	Hypernova	-0.05

All of these skins are less popular skins, which highlights one of the biases of this analysis: people who discuss skins may have stronger opinions; and this bias may be biggest for the least discussed skins. Of these skins, only Hypernova is male, which may indicate that female skins have wider variance (both more liked and disliked).

To investigate that hypothesis, we can plot the distribution of sentiment towards both male and female skins. The overall mean sentiment skins was 0.084, with STD of 0.58:

While there are both well liked male and female skins, there is a large swath of male skins with neutral opinion (0-0.1).

To complete the analysis, I ran a regression targeting mean sentiment for each skin, with the same features as before. I started with a simple specification with covariates for race and gender. In this specification, the coefficient for gender was significantly negative for male skins (-0.01, ~ 0.15 standard deviations); no racial coefficient was significant. I then ran a complete specification with all covariates, and got similar results.

Covariate	Coefficients for spec 1 (p-value)	Coefficients for spec 2 (p-value)
Race	NS	NS
Gender (Male)	-0.011 (0.012)	-0.012 (0.007)
Battle Pass, etc.		NS

Discussion

In the first part of this analysis, I found that skins featured in the Battle Pass were discussed more often. This makes intuitive sense, as these skins are featured on splash screens and marketing for Fortnite. Many of these skins also have unlockable content, which people discuss how to unlock.

In terms of sentiment, I found that male skins had lower sentiment than female skins. One possible explanation for this is that Fortnite reddit skews towards young men, who might simply be more attracted to female skins. Popular streamers like Daequan often objectify female characters, making comments like, “Gimme that booty!” Another potential explanation is that female skins may have more diverse aesthetics, which allows people who prefer those aesthetics to attach to those skins. For example, many male skins share standard military profiles, and are relatively indistinguishable. In contrast, female skins can express a wider range of emotions, and may have more variety in clothes (skirts, tops, etc.). Some tangential evidence for this may be the large number of male skins with neutral sentiment.

As a final note to myself, if I want to revisit this type of analysis in the future, I need to improve the sentiment analysis. While I believe the assumptions of my current model – that sentence level sentiment reflects skin sentiment – is broadly true, in checking my data a large minority of samples have inaccurate sentiment. While performing more sophisticated sentiment analysis may take more time, it should give me a better estimate of entity sentiment, and frankly feel less hacky.

...And You Will Know Me By The Trail of Papers

Pages

Saturday, September 7, 2019

Fortnite Reddit prefers female skins

Methods

Analysis of which skins get discussed the most

Analysis of what drives skin sentiment

Discussion

No comments:

Post a Comment