I'm (just barely) starting to blog about data for families and households. One of the first pieces I want to do is about potty training, and there are some gaps in the existing literature that I'm hoping to address by collecting some data from parents on when and how they did potty training. If you've potty trained a child *within the past year*, and feel like sharing how it went, I would be most appreciative if you would take this 5-minute survey. I'll share the results back here and at littldata.com.
Using real-time weather data collected in Austin, TX, I've created a piece of music that streams online. [more inside]
Just finished building a content recommendation engine for MeFi using natural language processing and non-negative matrix factorization techniques! It produces a list of post recommendations based on a user's history of posts, comments and favorites. It can also make recommendations based on a piece of text, so for example, you could paste a particular post and it will return a list of other posts that have some similar characteristics. I hope you enjoy playing around with it! Please let me know what you think. Here's more info in case you're interested (: https://github.com/tomasbielskis/metafilterpostrecommender
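For anyone curious how non-negative matrix factorization can drive recommendations, here is a minimal sketch. It is not the code from the repository above: the tiny post-by-term count matrix is made up, the factorization uses plain-Python multiplicative updates (one common way to fit NMF), and the function names `nmf` and `recommend` are hypothetical. Posts that end up with similar topic weights are treated as similar.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor non-negative v (posts x terms) into w (posts x k) and h (k x terms)
    using multiplicative updates. Illustrative only, not tuned for real data."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0

def recommend(w, idx, top_n=2):
    """Return indices of the posts whose topic weights are closest to post idx."""
    scores = [(cosine(w[idx], row), j) for j, row in enumerate(w) if j != idx]
    return [j for _, j in sorted(scores, reverse=True)[:top_n]]

# Made-up term counts: posts 0 and 1 share vocabulary, as do posts 2 and 3.
posts = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
]
w, h = nmf(posts, k=2)
print(recommend(w, 0, top_n=1))
```

A real system would build the matrix from TF-IDF weights over actual post text and use a library implementation of NMF rather than these hand-rolled loops.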
I've been working on Datasette for a while: it's a tool for publishing static structured data online with a browseable web interface and a JSON API. Until today you had to install various bits and pieces on your computer to use it, but I just released Datasette Publish which provides a web interface that lets you upload data as CSV files and click a button to turn that into an online database and API. My blog has more, including an animated screencast showing how I used it to build an API for exploring California campaign finance data.
This software library helps you capture AIS messages broadcast by passing ships and then join them with public data sets that reveal what the ships are carrying.
I'm releasing a knitting pattern every single week of the year and documenting the money side of things from every possible angle for analysis. [more inside]
This map shows a 5,700-year timelapse of the world's cities being born one-by-one, starting with the first known city, Eridu, in 3700 BC. The data is from one of the coolest academic studies I've come across in a long time, which compiled a comprehensive dataset of the world's cities and their historic populations, from 3700 BC to 2000 AD. [more inside]
Beyond the Stacks is an interview podcast in which librarians, archivists, and information science professionals talk about the coolest experiences of their careers, and how they got there. [more inside]
With Clarifai, an image concept extraction API utilizing convolutional neural networks, and ConceptNet, a lexical relationship database, I built a template system to generate paragraphs of text from photographs. word.camera is responsive — it works on desktop, tablet, and mobile devices. The code behind it is open source and available on GitHub, because lexography is for everyone. [more inside]
Opinion polls are all well and good, but they don't give you much of an idea of what might actually happen in an election (particularly in a multi-party democracy like the UK). Electobot aims to solve that by running thousands of simulated elections in order to work out what might happen if the election were run tomorrow with the polls as they are. In addition to running the simulations, I've also been blogging the results at Electobot: The Blog. [more inside]
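The general idea of running many simulated elections can be sketched in a few lines. This is not Electobot's actual model: the poll shares, the per-seat Gaussian noise, the default seat count, and the names `simulate_election` and `run_simulations` are all assumptions made for illustration. Each run perturbs the national poll in every seat, awards the seat to whoever comes first, and then tallies how often each overall outcome occurs.

```python
import random
from collections import Counter

def simulate_election(poll_shares, n_seats, swing_sd, rng):
    """Simulate one election and return seats won per party."""
    seats = Counter()
    for _ in range(n_seats):
        # Assumed noise model: independent Gaussian noise per party per seat.
        local = {p: max(0.0, s + rng.gauss(0.0, swing_sd)) for p, s in poll_shares.items()}
        winner = max(local, key=local.get)  # first-past-the-post winner
        seats[winner] += 1
    return seats

def run_simulations(poll_shares, n_seats=650, n_runs=1000, swing_sd=0.08, seed=0):
    """Run many simulated elections and return the share of runs per outcome."""
    rng = random.Random(seed)
    majority = n_seats // 2 + 1
    outcomes = Counter()
    for _ in range(n_runs):
        seats = simulate_election(poll_shares, n_seats, swing_sd, rng)
        leader, top = seats.most_common(1)[0]
        outcomes[leader if top >= majority else "hung parliament"] += 1
    return {k: v / n_runs for k, v in outcomes.items()}

# Hypothetical national poll shares for three parties.
polls = {"Party A": 0.36, "Party B": 0.34, "Party C": 0.15}
print(run_simulations(polls, n_runs=200))
```

A more realistic model would start from each constituency's previous result and apply correlated swings instead of treating every seat as a copy of the national poll.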
You publish content, photos, thoughts, and memories online every day. Don't throw it all away. Free My Data is a directory that gives you an easy reference for when you want to archive and backup your digital life. [more inside]
Data visualizations ported from Processing to Clojure. [more inside]
Open data is great! But it's easy to get lost in the details and make something more complex than needed. Simple Open Data is a guide that keeps it simple and concrete - what you really need to do and why.
I have completed a new fun data art project: I translated the ups and downs of the S&P 500 for the year into a reggae-ish song, while an animation represents the data visually and in sync with the music. It's been a record year for the S&P 500 -- and now you can hear it! [more inside]
The Water Quality Portal is a collaborative project between the US Geological Survey (USGS) and the US Environmental Protection Agency (EPA) to make it easier to share water quality data. Data are harvested from the USGS NWIS database and the EPA STORET database, mapped to the WQX standard, and served back out again. There is information about over 2 million sites and over 200 million sample results. The most significant feature that we have added this year (and why I am posting this) is a mapping tool that allows users to map up to 250,000 sites based on any of the 15 different criteria to narrow a search. [more inside]
I did some data visualization of hockey stats, with a couple blog posts. It's all using 2013 regular season data. [more inside]
An animated data visualization I designed in association with CNNMoney went live this morning. It shows and compares salaries for different people (Kobe Bryant, ExxonMobil CEO Rex Tillerson, a minimum wage worker, a physician, etc.), accumulating in real time for 1 minute. Watch the disparity grow second by second!
After years of studying these shibboleths, and tracking my research progress here, I'm happy to share some data results from the 2010 survey. My first published article, Sociophonetic Variation in an Internet Place Name, is available in the Names: A Journal of Onomastics special issue on Names, Naming and the Internet (Maney Publishing). Enjoy! [more inside]
BEDOPS is a suite of tools to address common questions raised in genomic studies, mostly with regard to overlap and proximity relationships between data sets. BEDOPS aims to be scalable, flexible and performant, facilitating the efficient and accurate analysis and management of large-scale genomic data.
With the accompanying infographic-heavy blog post, it does what it says on the tin and more. Tracks your keystrokes, makes beautiful pictures and surprising observations from the data. Open source, and you can download 50 days of my data. [more inside]
I'm collecting charts and data that might help to explain why those Occupy Wall St. folks are so upset.
I have been working on this for a while but somehow it had never occurred to me to post it here. It's not really a blog, per se. Rather, it is more of a reference or index. I describe it as "tools and techniques for textual data" (the alliteration was accidental). It's not an exhaustive collection but it is fairly comprehensive. Most of the low-hanging fruit has been picked, so posts now are few and far between. I would welcome any suggestions you might have. You can email me through MeFi...
Since the official locator for ballot drop sites in Oregon is pretty hard to find, not super user-friendly, and doesn't work in mobile browsers, I worked with Scott Duncombe at the Bus Project to develop a friendlier site that Oregon voters can use to find the nearest place to drop off their ballots. Locations are provided by Data.Oregon.gov. We're still looking at features to add over the next days/weeks, but the site is live.