About This Project


Domain | Machine Learning/Data Visualization

Skills | Machine Learning, Python development, Front End Web Development

Tools | Python (SciPy, NumPy, Pandas, and Beautiful Soup), D3.js, Leaflet.js

Role | Lead Web Developer and Data Scientist

Team | Jayanth Prathipati and Oliver Ebeling-Koning (Lead Data Scientist)

Problem Space

Software professionals are in demand in many parts of the world. They also have an opportunity that few others have had before: the ability to work remotely. We were curious about what trends show up in a software developer's job search (being a software developer myself, it seemed like a good thing to look into). As software developers, we also wanted to see how pay is distributed across features such as location and programming language of choice.


We used Beautiful Soup to parse Stack Overflow Jobs, a website with job listings for software developers. We used data from these listings, such as location, salary, and programming language, to see whether any clusters would emerge. We were also curious about which programming language had the highest starting salaries. Finally, we wanted to know how location influenced salary, so we looked at the average cost of living and tried to see where developers would keep the most in-pocket income.
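To make the in-pocket income comparison concrete, here is a minimal sketch of one way to scale salaries by cost of living. The listings, the index values, and the function name `adjusted_salary_by_metro` are all hypothetical placeholders, not the project's data or code, and the index convention (100 = national average) is an assumption.

```python
from statistics import mean

# Hypothetical listings as (metro, annual salary in USD). Not real data.
listings = [
    ("San Francisco", 150000),
    ("San Francisco", 130000),
    ("Austin", 110000),
    ("Austin", 100000),
]

# Hypothetical cost-of-living index, where 100 is assumed to be the national average.
cost_of_living_index = {"San Francisco": 180.0, "Austin": 100.0}


def adjusted_salary_by_metro(listings, col_index):
    """Return {metro: mean salary rescaled to a cost-of-living index of 100}."""
    by_metro = {}
    for metro, salary in listings:
        by_metro.setdefault(metro, []).append(salary)
    # Metros without an index entry are skipped rather than guessed.
    return {
        metro: mean(salaries) * 100.0 / col_index[metro]
        for metro, salaries in by_metro.items()
        if metro in col_index
    }


adjusted = adjusted_salary_by_metro(listings, cost_of_living_index)
best_metro = max(adjusted, key=adjusted.get)
```

With these made-up numbers, the metro with the higher raw salary ends up with the lower adjusted figure, which is the kind of effect we were looking for.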

Data Cleaning

We first had to clean the data and map many minor locations to their closest major city. For example, a job listing might be located in Queens, New York, and we needed a way to determine that it belongs to the New York metropolitan area. We used reverse geocoding to get the general area of each job and grouped our listings by metro area.
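The metro-assignment idea can be illustrated without a geocoding service. The sketch below is a simplified stand-in, not the reverse-geocoding step we actually used: it assumes each listing already has coordinates and assigns it to the nearest metro center by great-circle distance. The `METROS` table, the 100 km cutoff, and the function names are assumptions made for illustration.

```python
from math import asin, cos, radians, sin, sqrt

# Approximate city-center coordinates as (latitude, longitude); illustrative only.
METROS = {
    "New York": (40.7128, -74.0060),
    "Chicago": (41.8781, -87.6298),
    "Austin": (30.2672, -97.7431),
}


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def nearest_metro(point, metros=METROS, max_km=100.0):
    """Return the closest metro name, or None if none lies within max_km."""
    name, dist = min(
        ((n, haversine_km(point, center)) for n, center in metros.items()),
        key=lambda pair: pair[1],
    )
    return name if dist <= max_km else None


queens = (40.7282, -73.7949)  # approximate coordinates for Queens, New York
metro_for_queens = nearest_metro(queens)
```

The distance cutoff keeps remote listings from being forced into a metro they do not belong to; the right value would depend on how wide a metro area is meant to be.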

Final Results

After cleaning the data, we fed it into NumPy and ran K-Means clustering to make better sense of it. We found (to no surprise) that jobs were centered around tech hubs across the US such as SF, NYC, and Austin. We did find some surprises, namely that Chicago and Atlanta had a lot of tech jobs: Chicago had approximately 208 job listings and Atlanta had approximately 63. We built two visualizations to better show our results, which you can find below. They are unfortunately not responsive, so please view them on a laptop or desktop device!
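For readers unfamiliar with K-Means, here is a minimal NumPy sketch of the standard algorithm (Lloyd's iteration). It is not our original code: the function name `kmeans`, its parameters, and the toy points are assumptions chosen to keep the example small.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Plain K-Means: returns (centroids, labels) for an (n, d) array of points."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Initialize centroids from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]

    def assign(cents):
        # Distance from every point to every centroid, shape (n, k).
        dists = np.linalg.norm(points[:, None, :] - cents[None, :, :], axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        # Move each centroid to the mean of its points; keep it if the cluster is empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, assign(centroids)


# Toy 2-D data with two obvious groups (hypothetical, not the jobs data).
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(toy_points, k=2)
```

In practice the input would be the cleaned listing features, and the number of clusters `k` has to be chosen by hand or by a heuristic.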

Here is a choropleth map that aggregates all of the developer jobs in the USA. Slide the bar below to see a count of developer jobs for a specific language!


This map shows the clusters that were generated from the jobs data. We overlaid a subsample of the raw jobs data as markers on the map.