Stay Above the Fray. Peer Into the Future.

Visualization tools to save your work.

by ROBERT YANG

Recognizing patterns that periodically appear in our lives helps us overcome obstacles. Looking at the frequency of repetitive acts helps us scope out a plan to close perception gaps. Symbols become recognizable and their fuzzy outlines sharpen. Understanding subject matter in a fresh light gives you the foundation for confidence, and with that confidence you can unlock a new pathway toward your destination.

There are a few tools that help us make sense of ambiguity. When organizing data sets, depending on how large your tables are, you can easily load the information into an Excel workbook, use conditional logic to munge and organize the data, and then create new row labels over which you run a summarizing function (like SUMIFS) to roll data segments and clusters up into a broader, higher-level view.
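
If you are curious what that Excel workflow looks like outside a workbook, here is a minimal sketch in Pandas. The orders table, its column names and the 100 cutoff are all made up for illustration; the point is only that a conditional label plus a filtered sum is the same idea as an IF() column feeding a SUMIFS.

```python
import pandas as pd

# Made-up orders table standing in for a worksheet (illustration only).
orders = pd.DataFrame({
    "region": ["East", "West", "East", "West", "North"],
    "amount": [120, 80, 60, 40, 100],
})

# Conditional logic, like an IF() formula filled down a new column:
# label each row by order size. The 100 cutoff is an arbitrary assumption.
orders["size"] = orders["amount"].apply(
    lambda amount: "large" if amount >= 100 else "small"
)

# SUMIFS-style question: total amount where region is East AND size is large.
is_east_large = (orders["region"] == "East") & (orders["size"] == "large")
east_large = orders.loc[is_east_large, "amount"].sum()

# Higher-level view: the total for every region/size pair in one step.
summary = orders.groupby(["region", "size"])["amount"].sum()

print(east_large)
print(summary)
```

The groupby line is the part that scales: instead of writing one SUMIFS per label combination, you get every combination at once.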

When your data spans past the million-row threshold (an Excel worksheet tops out at 1,048,576 rows), it is worth learning how to query a database: select which columns to look at, then add conditions on quantifiable ranges to identify the outliers, anomalies and dominant features of an activity stream.
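
To make that concrete, here is a small sketch using SQLite, which ships with Python, so there is nothing to install. The activity table, its columns and the 5 to 600 second "normal" range are assumptions I made up for the example; a real activity stream would have its own schema and its own idea of normal.

```python
import sqlite3

# In-memory database with a made-up activity stream (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity (user_id TEXT, event TEXT, duration_sec REAL)"
)
conn.executemany(
    "INSERT INTO activity VALUES (?, ?, ?)",
    [
        ("u1", "login", 12.0),
        ("u2", "search", 45.5),
        ("u3", "checkout", 3600.0),
        ("u4", "login", 0.4),
        ("u5", "search", 88.0),
    ],
)

# Pick the columns you care about, then use a range condition to surface
# rows that fall outside the assumed normal band.
outlier_query = """
    SELECT user_id, event, duration_sec
    FROM activity
    WHERE duration_sec NOT BETWEEN ? AND ?
    ORDER BY duration_sec DESC
"""
outliers = conn.execute(outlier_query, (5, 600)).fetchall()

# Count events by type to see which ones dominate the stream.
event_counts = conn.execute(
    "SELECT event, COUNT(*) FROM activity "
    "GROUP BY event ORDER BY COUNT(*) DESC, event"
).fetchall()

print(outliers)
print(event_counts)
```

The same SELECT, WHERE and GROUP BY pattern carries over to larger databases, though each one has its own SQL dialect quirks.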

But then comes the challenge of presenting the information in a clear way that users of all levels can understand without getting into the weeds or becoming too enthusiastic and appearing wonkish about the details.

Taking something wildly out of scope, taming the base and controlling the outputs are some of the skills data analysts use to make sense of information. Here we will present some of the more interesting tool sets and programs you can use to enhance your data communication skills. I won’t go into much detail on the backstories of the tools, but you can easily spend a good amount of time wormholing through the sidequests.

Google BigQuery

If you don’t have access to a database, BigQuery is a great way to play with public data. The platform even comes with a GUI that lets you practice building queries and pulling data from sources like the New York Times, the Bureau of Labor Statistics or US Census population data.


Pandas/Python

It may feel overwhelming to try to learn a whole programming language just to generate some nice-looking charts. But take the leap by looking at Pandas. A proper stack would be Pandas in a Jupyter notebook, with an API connection to Google BigQuery. The API access key is a file that sits on your hard drive, and any time you want access you can create a connection with a few lines of code. From Jupyter you can then find example scripts on Kaggle or Stack Overflow to experiment with. There are numerous packages for transforming your tables into visuals. Two easy data visualization packages to start with are Plotly and Matplotlib.
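
Here is a rough sketch of what that stack can look like in a notebook cell. It assumes you have installed the google-cloud-bigquery package with its pandas extras and downloaded a service account key file; the function name, the sample numbers and the column names are all made up for illustration. The BigQuery function is defined but not called, so the chart part runs on its own without any credentials.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, so this also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd


def load_from_bigquery(key_path: str, sql: str) -> pd.DataFrame:
    """Run a query on BigQuery and return the rows as a DataFrame.

    key_path is the key file on your hard drive. Hypothetical helper:
    it needs google-cloud-bigquery installed and valid credentials.
    """
    # Imported here so the rest of the sketch works without the package.
    from google.cloud import bigquery

    client = bigquery.Client.from_service_account_json(key_path)
    return client.query(sql).to_dataframe()


def plot_trend(df: pd.DataFrame):
    """Draw a simple line chart of the 'value' column against 'year'."""
    fig, ax = plt.subplots()
    ax.plot(df["year"], df["value"], marker="o")
    ax.set_xlabel("year")
    ax.set_ylabel("value")
    ax.set_title("Made-up sample trend")
    return fig


# Made-up numbers standing in for a query result (illustration only).
sample = pd.DataFrame(
    {"year": [2018, 2019, 2020, 2021], "value": [10, 14, 9, 17]}
)
fig = plot_trend(sample)
```

Once your key file is in place, swapping the sample table for a call to load_from_bigquery with your own SQL should be the only change needed, provided the result has matching column names. Calling fig.savefig with a file name writes the chart to disk.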


Jupyter

Anaconda is the data analysis platform you would install to start learning the ins and outs of Pandas. The program creates an environment on your desktop that simplifies package management for all the different pieces you need to portray information in a useful light. Using Anaconda takes the legwork out of setting up your Jupyter environment, so you are not left hacking all the pieces to death trying to get something to work.


Kaggle

Learning by example and seeing how other users perform tasks in Jupyter will help you pick up the skills needed to manipulate data and portray visuals. There are challenges and puzzles to draw from, as well as down-and-dirty free bootcamps to help you materialize your latent talents.

HTML/CSS

I wanted to touch on portfolio development with a very useful framework for building your landing pages. Bulma.io focuses on the barebones elements of well-organized graphical components. No JavaScript is required, and scaling a website without a CMS is straightforward and direct. Responsiveness is also built into some of the components, so you won’t need to tinker too hard with flexboxes and containers. If you want to go fully mobile-friendly (while adding a bit of Core Web Vitals flair for Google’s page-ranking crawlers), take a look at AMP by Google. It provides high-level JavaScript components that give you all the fun bells and whistles without bloating your code or taxing page load.

Stack Overflow

Stack Overflow is a great forum-style resource to tap when you are solving a problem that feels way too specific. Chances are someone has already asked the same question and the problem has been solved.

GitHub

Repositories on GitHub are extremely helpful. It does take some time to figure out the right pieces for your own personal stack, so take a look at how others, from a variety of skill levels, set up their environments in order to make pretty things happen.

To present information as truthfully as possible, you need a workflow. You don’t want to overcomplicate your situation, especially if you are short on time. Look for resources that help you gain a firm understanding of what you are trying to accomplish.

And finally, here are a few more live instruction resources I have used to bring my skills up to the point where I can work confidently in my chosen environment.

Brandon Rhodes is a great instructor. His 2015 video tutorial on Pandas, recorded at PyCon in Montreal, can help you get started exploring data sets and building visualizations.

I must admit, I spent a good deal of time tearing through videos by Sentdex.

Remember you can always adjust the player speed if you need to move at a faster pace.