Two things I love: analytics and science fiction!
So when I stumbled on a dataset about UFO sightings, I knew I had to analyze it! This also turned into the perfect opportunity to showcase how I use KNIME to clean, prep, and analyze my datasets. I made a four-part series on the data prepping portion, and the links to those are at the bottom of this article. This post gives an overview of my process and my findings.
My primary dataset, which I got from Kaggle, consisted of UFO sightings from the early 1900s to 2014. I decided to restrict my final dataset to events in the US from 1950 to 2013, because other time ranges and countries had too few records to be worth including.
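I did this filtering in KNIME, but the logic is simple enough to sketch in a few lines of Python. This is only an illustration under assumptions: the `Sighting` record and its field names (`year`, `country`, `state`, `city`) are hypothetical, and the real Kaggle columns may be named and encoded differently (for example, the country code may not be `"us"`).

```python
from dataclasses import dataclass


# Hypothetical record layout; the real Kaggle columns may differ.
@dataclass
class Sighting:
    year: int
    country: str
    state: str
    city: str


def restrict_to_study_window(sightings, country="us", first_year=1950, last_year=2013):
    """Keep sightings from one country within an inclusive year range."""
    return [
        s
        for s in sightings
        if s.country == country and first_year <= s.year <= last_year
    ]
```

Both year bounds are inclusive here, so 1950 and 2013 sightings stay in while 1949 and 2014 drop out.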
My secondary datasets were lists of movies about aliens from Wikipedia and IMDb.
KNIME is really the backbone of all my analytical endeavours. I used it here for data cleaning, prepping, variable data type assignment, calculations, I/O, correlational analysis, and a bunch of other things.
PowerBI is my go-to visualization software. I used it to make an interactive dashboard around UFO sightings, and I also used it for some variable transformations that I did not do in KNIME.
I used Excel for some initial data cleaning and to store the final datasets, which I then imported into PowerBI.
You can download the dashboard from my Google Drive. I hope you have a lot of fun exploring it, because I sure did! But at the very least, enjoy the handsome picture of our super cute alien, Groot!
UFO sightings exploded right before the year 2000, which coincides with the spread of the World Wide Web. It is possible that as more people connected across vast distances, more individuals felt comfortable reporting their experiences, and some copycats may have gotten in on the fun. Maybe?
Aliens seem to love the water!
Perhaps they have mastered the art of green energy, and need a massive amount of hydro to power those gigantic thrusters!
However, a more plausible explanation is that there is less light pollution on the beach, so lights in the sky are easier to spot.
Speaking of light, "light" appears to be the most common description in sightings, and again, lights are more noticeable at night. Seattle is the city with the most sightings, but the state with the most sightings is California. Guess what else is in California? Hollywood!
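How can one city top the city ranking while a different state tops the state ranking? The two rankings are just two different group-by counts. Here is a minimal Python sketch of that idea, assuming each sighting is a dict with hypothetical `city` and `state` keys (the dashboard itself was built in PowerBI, not with this code). Cities are keyed by `(city, state)` so that same-named cities in different states are not merged.

```python
from collections import Counter


def top_city_and_state(sightings):
    """Return the most common (city, state) pair and the most common state, with counts."""
    # Key cities by (city, state) so e.g. two Springfields are counted separately.
    by_city = Counter((s["city"], s["state"]) for s in sightings)
    by_state = Counter(s["state"] for s in sightings)
    return by_city.most_common(1)[0], by_state.most_common(1)[0]
```

A state whose sightings are spread across many cities can win the state count even if none of its cities wins the city count.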
I promised you an exploration of the link between UFO sightings and movies about aliens.
Turns out, these two variables are correlated!
Looking at the dashboard, it appears that these two variables move together from around 2000 onwards. But it is hard to decipher a relationship in PowerBI. So, let's head back to KNIME.
In the graph above, I plotted the Z-score normalized number of movies about aliens released each year against the number of UFO sightings in that year. As the graph shows, there appears to be a positive relationship between these variables. But is it significant?
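Z-score normalization rescales a series so it has mean 0 and standard deviation 1: each value becomes z = (x - mean) / sd. That puts yearly movie counts and yearly sighting counts on a comparable scale for plotting. A minimal Python sketch using the standard library is below; whether KNIME's normalizer uses the sample or the population standard deviation is an assumption I have not checked, and the choice only changes the scale by a constant factor.

```python
import statistics


def z_scores(values):
    """Z-score normalize a series: subtract the mean, divide by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; statistics.pstdev gives the population SD
    if sd == 0:
        raise ValueError("cannot z-score a constant series")
    return [(v - mean) / sd for v in values]
```

For example, `z_scores([1, 2, 3])` gives `[-1.0, 0.0, 1.0]`.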
Holy mother of significance! I mean, look at that wildly significant p-value 🤩
And the R-squared of 0.74 shows a strong relationship between the number of movies about aliens released in a given year, and the number of UFO sightings in that year.
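For a regression with a single predictor, R-squared is just the square of the Pearson correlation r, so an R-squared of 0.74 corresponds to |r| of about 0.86. The significance test on the slope uses t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The sketch below computes r, R-squared, and that t-statistic with plain Python; turning t into a p-value needs a t-distribution CDF, which a library such as SciPy provides (`scipy.stats.linregress` reports `rvalue` and `pvalue` directly). This is an illustration, not the KNIME workflow I actually used.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def regression_summary(xs, ys):
    """Return (r, R-squared, t-statistic, degrees of freedom) for simple linear regression."""
    r = pearson_r(xs, ys)
    r_squared = r * r
    df = len(xs) - 2  # degrees of freedom for testing the slope
    if r_squared < 1:
        t_stat = r * math.sqrt(df / (1 - r_squared))
    else:
        t_stat = math.copysign(math.inf, r)  # perfect fit: t is unbounded
    return r, r_squared, t_stat, df
```

Because r is unchanged by shifting and rescaling either series, running this on the raw yearly counts or on their Z-scores gives the same R-squared; the normalization only helps the plot.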
But let's remember that correlation does not equal causation. We'll need to do further analysis if we want to conclude that people tend to see more UFOs when there are more alien movies.
I really enjoyed this project and writing this article. I hope you enjoyed reading it!
I hope to revisit this topic with a richer dataset, and perhaps do some causal analysis in the future.
Until then, I am Groot!
Files and Resources
Data Prepping Videos
KNIME Workflow and Data Files
You can find the KNIME workflow on my KNIME Hub. The data files are encapsulated in the workflow.
See you on the blog!