Political Search Engine (Politik Arama Motoru in Turkish) is a domain-specific search engine that lets you search for keywords in the election declarations of the four parties represented in the Turkish Parliament as of 2011.

Tweetolife is an analytics tool that lets you analyze tweeting behavior along two dimensions: gender and time of day.
Concept Game is a game with a purpose based on the principles of human computation. It presents the problem of recognizing commonsense facts as a game of chance, specifically a slot-machine game. It is implemented as a Facebook application. The backend is a text miner (called BagPack) which produces candidate commonsense facts [Read more]
My metalhead friend Deniz Cem Önduygu wanted to celebrate Metallica’s 30th birthday by creating an infographic. This is the end result:

There have been two general elections in Turkey in the last five years. Recently, we (Çilek Ağacı) obtained the vote counts at the district level and used a technique called ecological inference to estimate the vote transfers between the political parties from 2007 to 2011.
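To give a feel for what can be inferred from aggregate counts alone, here is a minimal sketch of the classical method of bounds, one of the simplest tools in the ecological inference family. It is not the model used in the analysis above. The function name `transfer_bounds` and the example shares are made up for illustration, and the sketch assumes the same set of voters took part in both elections of a district.

```python
def transfer_bounds(share_a_2007, share_b_2011):
    """Bounds on the fraction of party A's 2007 voters who chose party B in 2011.

    Both arguments are vote shares (between 0 and 1) within a single district.
    Assumption: the electorate is identical in the two elections.
    """
    if not 0 < share_a_2007 <= 1:
        raise ValueError("share_a_2007 must be in (0, 1]")
    if not 0 <= share_b_2011 <= 1:
        raise ValueError("share_b_2011 must be in [0, 1]")

    # At least this many of A's voters must have moved to B, because the
    # voters outside A (1 - share_a_2007) cannot fill B's share on their own.
    lower = max(0.0, (share_a_2007 + share_b_2011 - 1.0) / share_a_2007)

    # At most all of B's 2011 voters came from A.
    upper = min(1.0, share_b_2011 / share_a_2007)
    return lower, upper


# Hypothetical district: A had 50% in 2007, B had 25% in 2011.
bounds = transfer_bounds(0.5, 0.25)
print(bounds)
```

The bounds from a single district are usually wide. Statistical ecological inference models narrow them by combining many districts under additional assumptions.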

I have just submitted my first paper on corpus-based semantics. Together with Marco Baroni, I propose a general method for building a feature space for word pairs that represents the relations between them. We call it BagPack: Bag-of-words representation of Paired concept knowledge. [Read more]
Rovereto Twitter N-Gram Corpus (RTC) is an n-gram dataset enriched with metadata such as gender and time of posting. The n-gram corpus is based on 75 million English tweets extracted from a larger sample of 240 million tweets collected from the public stream of Twitter between December 2010 and July 2011.
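As a rough illustration of what an n-gram count enriched with metadata looks like, here is a minimal sketch. It does not reproduce the actual RTC pipeline or file format: the tuple layout `(text, gender, hour)`, the whitespace tokenization, and the function names are all assumptions made for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams_by_metadata(tweets, n=2):
    """Count n-grams separately for each (gender, hour) combination.

    `tweets` is an iterable of (text, gender, hour) tuples; this layout is
    hypothetical and chosen only for the example.
    """
    counts = Counter()
    for text, gender, hour in tweets:
        # Naive tokenization: lowercase and split on whitespace.
        tokens = text.lower().split()
        for gram in ngrams(tokens, n):
            counts[(gram, gender, hour)] += 1
    return counts


# Toy data, invented for illustration.
sample_tweets = [
    ("good morning world", "female", 8),
    ("good morning everyone", "male", 8),
    ("Good morning world", "female", 8),
]
bigram_counts = count_ngrams_by_metadata(sample_tweets, n=2)
for key, value in sorted(bigram_counts.items()):
    print(key, value)
```

Keying the counter on the n-gram together with its metadata keeps the counts for different groups separate, so they can be compared or summed later.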

The commonsense assertions that are rated in Concept Game have been released into the public domain: Commonsense dataset. Read this post or this paper for details. [Read more]
In our paper titled “Stereotypical gender actions can be extracted from web text”, Marco and I recruited Amazon Turkers to create a gold standard of stereotypical gender expectations for more than 600 actions, such as building a snowman, leaving work early, enjoying power, and feeling lonely. The actions were sampled verb phrases from the [Read more]
During my PhD, I needed to guess the genders of Twitter users based on their given names (read this post to see how I used this information). There are two very useful datasets concerning the frequencies of names broken down by gender. They come from US Census Bureau data and US Social Security Administration’s most [Read more]