"Where should I live?" is a question that we all must ask ourselves at one point or another. As graduate students, we are at a point in our lives when we must decide where to live as we begin our careers. Many factors affect this decision, and those factors are not the same for everyone. On top of this, it can be hard to find important information about the cities one might be interested in living in.
We decided to remedy this situation by creating our own way to determine where we want to live. Using a K-Nearest Neighbors (KNN) model, we created a tool that helps users determine where they might want to live, along with similar cities as alternative options. We also wanted to give users a way to explore the data by looking into the various statistics of over 950 cities across the United States.
Our tool is presented through three visualizations.
––– Choropleth Map: First is a choropleth map showing, for each state, the average across all of its cities in our dataset for the variable you choose. The darker the state, the larger its value for the selected metric.
––– KNN Model: If you have a city in mind, or just want to explore the data, you can look for the closest cities to the one you have in mind using our KNN model.
––– Data Exploration: Lastly, if you find a variable that you are interested in, you can select a state and examine its individual cities in a bar chart below the choropleth.
Each of these can be accessed in the segment below. The controls for each visualization are in its top-right section.
–––––
Firstly, there is a variable selection. This changes the variable represented by the choropleth, as well as the variable displayed in the bar chart. Changing this variable will reload the choropleth.
In the choropleth, you can hover over states to see a tooltip describing their statistics. This tooltip includes the name of the state, the value of the selected variable, and the number of cities in that state that are represented in our dataset.
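The per-state value and city count shown in the tooltip can be produced with a simple group-and-average step. The sketch below is a minimal illustration in Python, not the code behind the tool; the row layout, the function name summarize_by_state, and the sample numbers are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows: (city, state, value of the selected variable).
# These numbers are invented for illustration and are not from the dataset.
rows = [
    ("City A", "Ohio", 10.0),
    ("City B", "Ohio", 20.0),
    ("City C", "Utah", 7.0),
]


def summarize_by_state(rows):
    """Return {state: {"mean": average city value, "n_cities": city count}}."""
    values = defaultdict(list)
    for _city, state, value in rows:
        values[state].append(value)
    return {
        state: {"mean": mean(vals), "n_cities": len(vals)}
        for state, vals in values.items()
    }


summary = summarize_by_state(rows)
```

Each entry in summary holds the two numbers the tooltip reports alongside the state name, and the mean is the quantity that would drive the shading.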
In the choropleth, if a specific state's data interests you, you can click on the state to highlight it. This opens the state up for the next two modes of analysis. To select a different state, first double-click the selected state to unselect it.
–––––
Next, in the KNN section, there is a dropdown for choosing a city. The list of cities is pulled from the selected state, so if you are unable to select a city, you should first select a state.
Below this dropdown, there is a slider that sets k, the number of similar cities you would like to see. It defaults to 5, but you can move it to the far left to remove the dots, or to the far right to display up to 20 cities.
When you have a city selected, k dots will show up on the map where the most similar cities are located. The selected city is represented by a black dot, the most similar city by a red dot, and the rest by yellow dots. Hover over a dot to see the name and rank of its city.
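The coloring rule for the dots can be sketched as a small function. This is an illustrative Python sketch only; the function name dot_colors and its inputs are hypothetical, and it assumes that moving the slider to 0 hides every dot, including the one for the selected city.

```python
def dot_colors(selected, ranked_neighbors, k):
    """Map city name -> dot color for the selected city and its k nearest cities.

    ranked_neighbors is assumed to be ordered from most to least similar.
    """
    k = max(0, min(k, 20))  # the slider runs from 0 to 20
    if k == 0:
        return {}  # assumption: far-left slider position removes all dots
    colors = {selected: "black"}
    for rank, city in enumerate(ranked_neighbors[:k], start=1):
        colors[city] = "red" if rank == 1 else "yellow"
    return colors


example = dot_colors("City A", ["City B", "City C", "City D"], k=2)
```

With k=2, only the two most similar cities receive a dot, and the first of them is the red one.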
–––––
Last, in the data exploration section, when a state is chosen on the choropleth, a dropdown displays all cities within that state. You can select multiple cities from this dropdown and then click "show me the money" to have the selected cities' values for the selected variable displayed in a bar chart below the map.
If you would like to remove the bar chart, you can click the reset button. This also deselects all previously selected cities, so you will need to select cities again if you want to use the chart again.
Please note: the above dropdown is used to select cities. Each city selected from the dropdown will be added to the bar chart visualization when the "show me the money!" button is clicked. To remove cities, click the "reset" button.
K-Nearest Neighbors (KNN) models are used to find the k closest neighbors to a selected data point based on the distance between the points. We built a standardized matrix of all the city variables in the dataset and then calculated the Euclidean distance from each city to every other city. The cities were then ranked from closest to farthest, which is the ranking you can see in the above section.
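The procedure described above (standardize, measure Euclidean distance, rank) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions rather than the tool's actual code: it assumes "standardized" means z-scoring each variable, and the function names, city names, and numbers are hypothetical.

```python
import math
from statistics import mean, pstdev


def standardize(matrix):
    """Z-score each column (assumed meaning of "standardized")."""
    columns = list(zip(*matrix))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in matrix
    ]


def nearest_cities(names, matrix, target, k=5):
    """Return up to k (rank, city, distance) tuples, closest first."""
    z = standardize(matrix)
    target_row = z[names.index(target)]
    distances = [
        (math.dist(target_row, row), name)
        for name, row in zip(names, z)
        if name != target
    ]
    distances.sort()
    return [
        (rank, name, dist)
        for rank, (dist, name) in enumerate(distances[:k], start=1)
    ]


# Hypothetical data: one row per city, one column per variable.
names = ["City A", "City B", "City C", "City D"]
matrix = [
    [1.0, 100.0],
    [2.0, 110.0],
    [10.0, 500.0],
    [1.5, 105.0],
]
neighbors = nearest_cities(names, matrix, "City A", k=2)
```

Standardizing first keeps variables measured on large scales from dominating the distance, so each variable contributes on a comparable footing.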
By hovering your cursor over each dot on the choropleth (after you choose a city from the KNN dropdown), you can see how similar each city is to the chosen city. To increase or decrease the number of cities shown, change the slider value, which specifies k (the number of cities shown). The distance takes into account all the standardized variables that you can select through the variable dropdown.
The tooltip shows both the distance to the selected city and the ranking based on distance (with the smallest distance being the most similar city).
The data we used came from the following sources: