Exploring a Dataset on Video Game Sales

Jordan Maulino
Final Project
CS360
Prof. Sophie Engle

Motivation

When deciding on a dataset for my final project, I wanted to choose something that I thought would be interesting to explore. As someone who spent a lot of time playing video games growing up, I was excited to stumble upon this dataset while perusing Kaggle. After looking at the dataset and seeing what kind of features it contained, I thought it would be fun to make some visualizations that provide insights into different types of video games, their ratings, and their sales.

The Data

I found this dataset on Kaggle.com, a website that hosts predictive modeling and analytics competitions. The dataset contains close to 17,000 video game observations, with 16 columns containing information on each video game's:

Name
Platform
Year of Release
Genre
Publisher
Sales - Global and Regional (North America, Europe, Japan)
User and Critic Scores (from Metacritic.com)
Developer
ESRB Rating

Processing the Data

I processed this dataset using Trifacta Wrangler to remove all incomplete entries (any entries with empty or null values). This left me with ~6,900 complete observations to work with. The data didn't need much further processing after that. Although I decided to keep all of the columns, my visualizations only use the columns for a video game's name, platform, genre, sales, and user/critic scores. In addition, I took subsets of this processed data (using Excel) for my Parallel Coordinates and Multiline Time-Series visualizations to make the data a little easier to manipulate in D3.
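
For anyone who wants to reproduce a similar cleanup without Trifacta Wrangler or Excel, here is a minimal sketch in Python of the same two ideas: dropping rows with any empty value, then taking a smaller subset. This is not the exact process I used, just a rough equivalent. The column names, the sample rows, and the helper functions drop_incomplete and subset are all made up for illustration and are not taken from the real file.

```python
import csv
import io

# Hypothetical sample rows for illustration only. The column names are
# assumptions and may not match the headers in the real Kaggle file.
SAMPLE_CSV = """Name,Platform,Genre,Global_Sales,Critic_Score,User_Score
Game A,PS2,Action,1.20,85,8.1
Game B,Wii,Sports,0.45,,7.0
Game C,X360,Shooter,2.30,90,
Game D,PC,Strategy,0.10,72,6.5
"""


def drop_incomplete(rows):
    """Keep only the rows in which every field is present and non-blank."""
    return [
        row
        for row in rows
        if all(value is not None and value.strip() != "" for value in row.values())
    ]


def subset(rows, column, wanted):
    """Keep only the rows whose value in `column` is one of `wanted`."""
    return [row for row in rows if row[column] in wanted]


# Read the sample into a list of dictionaries, one per game.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Step 1: remove incomplete entries (Game B and Game C have a missing score).
complete = drop_incomplete(rows)

# Step 2: take a smaller subset, here by platform, to keep a chart manageable.
ps2_only = subset(complete, "Platform", {"PS2"})

print(len(rows), len(complete), len(ps2_only))
```

To run this on a real file, the io.StringIO(SAMPLE_CSV) part could be swapped for an opened CSV file, and the filtered rows could be written back out with csv.DictWriter before loading them in D3.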

You can use the toolbar above to view each visualization.
