Anexinet BI Tweets
Since this is my first post on this blog, I figured it would be best to write about something that has not been mentioned here in the past. After scouring the blog for a few hours, I decided a simple post on using Azure ML with a little bit of R would be useful.
In this post, I am going to use a small dataset of the team's tweets to create a word cloud in which each user's handle is sized by how many tweets that user has posted.
The dataset is a simple CSV file with two columns: handle (the user handle) and tweets (that user's number of tweets).
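Before uploading, it can be worth confirming locally that the file has the shape the R script later relies on. The sketch below is a minimal base-R check. It assumes the columns are named handle and tweets (matching tweets$handle and tweets$tweets in the script) and that the file holds one row per user. The function name check_tweet_counts and the file name tweets.csv are hypothetical.

```r
# Minimal check that a data frame has the shape the word-cloud script expects:
# a `handle` column (one user handle per row) and a numeric `tweets` column
# (that user's tweet count). Stops with an error if any check fails.
check_tweet_counts <- function(df) {
  stopifnot(
    all(c("handle", "tweets") %in% names(df)),  # both columns are present
    is.numeric(df$tweets),                      # counts were parsed as numbers
    all(df$tweets >= 0, na.rm = TRUE),          # no negative counts
    !anyDuplicated(df$handle)                   # each handle appears only once
  )
  invisible(df)
}

# "tweets.csv" is a hypothetical file name; point this at your own export.
# tweets <- check_tweet_counts(read.csv("tweets.csv", stringsAsFactors = FALSE))
```

If the check passes, the same file can be uploaded unchanged; if it fails, the error message names the condition that was not met.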
I'll assume you have already created a free Azure ML account. Navigate to the Azure ML Studio site, log in, and click “my experiments”. Then click NEW in the bottom left-hand corner to open the creation window.
The window defaults to EXPERIMENT on the left-hand side, but we first need to set up the custom CSV dataset described above, so select DATASET instead.
Next, select FROM LOCAL FILE to the right of DATASET to open the upload dialog.
The process is as follows:
1. Select a file.
2. Name the dataset.
Next, we need to create an experiment. Click NEW in the bottom left-hand corner, choose EXPERIMENT, then Blank Experiment.
Selecting Blank Experiment opens an empty canvas. The first thing we need is the data source: under Saved Datasets on the left-hand side you will find the dataset created in the previous step. To find it faster, just start typing its name in the search box.
Next, drag the dataset onto the canvas. Once it is there, click the component once to see its properties on the right-hand side.
To preview a sample of the data, click the circle at the bottom of the component (called a port) and select Visualize.
The next component we need is the Execute R Script module, found under the R Language Modules section. Drag it onto the canvas below the dataset.
We now need to connect the components. Click the circle at the bottom of the dataset and drag it to the left input circle of the Execute R Script.
Every port (circle) on a component has a name, which you can see by hovering over it.
In our example, the dataset's output port connects to the Execute R Script's left input port, named Dataset1. The Execute R Script accepts more than one dataset, so the input port to the right of the one we connected (Dataset2) can also be used.
Now all that's left is to add the R code that generates the word cloud. Click the Execute R Script once and add the following code in the R Script window of its properties.
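A minimal sketch of such a script, laid out so that its line numbers match the notes that follow. It assumes the wordcloud and RColorBrewer packages are available in the Azure ML R environment and that the columns are named handle and tweets. The min.freq, scale, random.order, and colors settings are illustrative choices, not necessarily the author's originals. maml.mapInputPort and maml.mapOutputPort exist only inside the Execute R Script module.

```r
library(wordcloud)                               # wordcloud(): lays out and draws the cloud
library(RColorBrewer)                            # brewer.pal(): colour palettes

tweets <- maml.mapInputPort(1)                   # data frame from the Dataset1 (left) input port

wordcloud(words = as.character(tweets$handle),   # one word per user handle
          freq = tweets$tweets,                  # tweet counts drive the font size
          min.freq = 1,                          # default is 3; keep users with 1 or 2 tweets
          scale = c(4, 0.5),                     # largest and smallest font sizes
          random.order = FALSE,                  # plot the most frequent handles first, near the centre
          colors = brewer.pal(8, "Dark2"))       # colour chosen by relative frequency

maml.mapOutputPort("tweets")                     # pass the data frame to the Result Dataset port
```

Setting min.freq = 1 matters here because wordcloud silently drops any word whose frequency is below min.freq, and its default of 3 would hide occasional tweeters.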
- Lines 1 and 2 load the libraries needed to generate the word cloud.
- Line 4 reads the data on the first input port into the variable tweets. The first port holds our dataset because we connected the dataset to the left-most input port of the Execute R Script.
- Lines 6-11 call the wordcloud function from the wordcloud library. tweets$handle and tweets$tweets reference the two columns in the dataset: the first is the user handle and the second is that user's number of tweets, which determines how large the handle is drawn.
- More information on this library can be found here: Wordcloud
- Line 13 maps a data frame to the module's Result Dataset output port so that subsequent components can use it. The word cloud image itself is shown on the module's second output, the R Device port.
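Because maml.mapOutputPort hands a data frame to the next module, one option is to pass along something downstream components can use, such as the handles ranked by tweet count. A minimal base-R sketch; rank_handles and ranked are hypothetical names, and the two commented lines only work inside the Execute R Script module.

```r
# Order users from most to fewest tweets (ties broken alphabetically by handle)
# and keep only the two columns the word cloud uses.
rank_handles <- function(df) {
  out <- df[order(-df$tweets, df$handle), c("handle", "tweets")]
  rownames(out) <- NULL  # renumber rows 1..n after sorting
  out
}

# Inside the Execute R Script module (Azure ML Studio only):
# ranked <- rank_handles(tweets)
# maml.mapOutputPort("ranked")
```

A later module connected to the Result Dataset port would then receive the ranked table, while the picture stays on the R Device port.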