We’re approaching the end, probably a relief to some I know, but well done for hanging in there through all seven parts. There’s one more small thing I want to show you before I call time on this blog tutorial.
Sentiment Results Via a Web Page
Hardly exciting, I know, but if you think about it this is the only thing that’s missing: reporting the results. As Python is already installed on most Pi distros, it makes sense to use that to knock up a quick webserver rather than install Apache and PHP and all that jazz.
The Bottle web framework lets us cook up a webserver real quick, and as the sqlite3 library is built into Python we’re already halfway done on the hard work. All we need is a quick script to pull the recent sentiment scores and show them, nothing more than that. You can download the code from here.
from bottle import route, run
import sqlite3

@route('/sentiment')
def getsentiment():
    # connect to the database
    con = sqlite3.connect("../twitter.db")
    with con:
        con.row_factory = sqlite3.Row
        # grab the 50 most recent tweets and their scores
        sql = "SELECT * FROM twitterdata ORDER BY rowId DESC LIMIT 50"
        cur = con.cursor()
        cur.execute(sql)
        dataout = cur.fetchall()
        # build a basic HTML table of the results
        html = []
        html.append("<html><head><title>Latest Sentiment Scores</title></head><body>")
        html.append("<table>")
        html.append("<tr><td>User</td><td>Tweet</td><td>Score</td></tr>")
        for row in dataout:
            html.append("<tr><td>")
            html.append(row["twitteruser"])
            html.append("</td><td>")
            html.append(row["twitterdata"])
            html.append("</td><td>")
            html.append(str(row["sentimentscore"]))
            html.append("</td></tr>")
        html.append("</table>")
        html.append("</body>")
        html.append("</html>")
        return ' '.join(html)

# this line makes the webserver run; use your own Pi's IP address here
run(host='192.168.2.2', port=8000, debug=True)
Pretty trivial. Once you get it running with python webserver.py you can call the page at http://192.168.2.2:8000/sentiment and see the output.
Improvements
So in seven parts we’ve covered the basics of sentiment analysis, connected to Twitter with OAuth and R, saved the results to SQLite3 and put together a quick and dirty web server to report on them. There are a few improvements you could make….
Date/Times
The database doesn’t have a column for the date/time of the tweet, though the data frame in R does have it. No big deal normally, but if you want to plot peaks of hashtag activity over time it becomes a big deal: you need the date/time in there.
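A minimal sketch of how you might retrofit that, assuming a reasonably recent RSQLite/DBI, the table and column names from the earlier parts, and that df and score are the twListToDF() data frame and sentiment score from the tutorial. The tweetdate column name is mine, pick whatever you like:

library(RSQLite)

con <- dbConnect(SQLite(), "twitter.db")

# One-off change: add a date/time column to the existing table
# (wrapped in try() so re-running the script doesn't fall over)
try(dbExecute(con, "ALTER TABLE twitterdata ADD COLUMN tweetdate TEXT"))

# When saving a tweet, include the created timestamp from the
# data frame, formatted as text so SQLite stores it cleanly
dbExecute(con,
          "INSERT INTO twitterdata (twitteruser, twitterdata, sentimentscore, tweetdate)
           VALUES (?, ?, ?, ?)",
          params = list(df$screenName[1], df$text[1], score,
                        format(df$created[1], "%Y-%m-%d %H:%M:%S")))

dbDisconnect(con)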
Rate Limits
Keeping an eye on the current status of your API rate limit is very important. You don’t want to get blacklisted as a bad citizen among Twitter API users. No, no, no…. There are functions in the R twitteR library to check the status of the rate limits for your credentials.
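Something along these lines would do it, using twitteR’s getCurRateLimitInfo() helper. This assumes you’ve already authenticated as in the earlier parts, and the exact resource name and the #raspberrypi search term here are just placeholders:

library(twitteR)

# Ask Twitter how many search calls we have left in this window
limits <- getCurRateLimitInfo(c("search"))
print(limits)

# Only hit the search API if there are calls remaining
remaining <- as.numeric(limits$remaining[limits$resource == "/search/tweets"])
if (!is.na(remaining) && remaining > 0) {
  tweets <- searchTwitter("#raspberrypi", n = 50)
}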
Last Tweet ID
Yeah, the search isn’t the most robust; it’s a basic search. The searchTwitter method in the R twitteR library will take a tweet ID and search forward from there, which would also mean no repeats in the database.
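A rough sketch of that, assuming you’ve added a (hypothetical) tweetid column to the table to store each tweet’s ID, and again with a placeholder search term:

library(twitteR)
library(RSQLite)

# Pull the highest tweet ID we've already stored
con <- dbConnect(SQLite(), "twitter.db")
lastid <- dbGetQuery(con, "SELECT MAX(tweetid) AS id FROM twitterdata")$id
dbDisconnect(con)

# sinceID makes searchTwitter return only tweets newer than that ID,
# so nothing gets saved twice
tweets <- searchTwitter("#raspberrypi", n = 50, sinceID = lastid)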
The Raspberry Pi
Remember it’s only a small machine, and when R is doing its sentiment thang the CPU takes a beating, usually topping 98% utilisation. The memory fares better. This is worth keeping in mind when setting the frequency of the cron job; in the tutorial I set it to every ten minutes. That gives R a chance to pull the tweets, get the sentiment, save to the database and then get its breath back. It’s all about experimenting with what works best, but remember it’s only a small machine.
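For reference, a crontab entry for a ten minute schedule looks something like this. The script name and path are mine; adjust them to however you invoke your R script:

# m h dom mon dow command
*/10 * * * * Rscript /home/pi/sentiment/sentiment.R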
