Have you ever tried to buy tickets to a popular summer concert only to find it sold out within minutes of first availability? If so, you’re not alone. Apparently, this situation is becoming a frustrating ritual that frequently plays out at online ticket sources where tickets to hot concerts seem to vanish instantly. What’s going on? Bots, that’s what. Bots are software applications used by high-tech scalpers to feed a growing, multibillion-dollar secondary market for tickets. Leading ticket seller Ticketmaster claims that bots buy up in excess of 60% of the most desirable tickets for some events. A recent Ticketmaster lawsuit accuses one group of scalpers of using bots to request up to 200,000 tickets a day.
Machine Learning to the Rescue
Fortunately, Ticketmaster chose a big data solution to combat the onslaught of the ticket bots. The firm hired machine learning specialist John Carnahan, who previously held computational marketing roles at the Rubicon Project, Fox and Yahoo, to head up its Data Science Group and the bot-fighting program. Using a classification algorithm similar in nature to a spam classifier, the group can distinguish legitimate ticket requests from bot-driven ones. One of the feature variables used to detect a bot is the click speed in filling out the web form. Human click timing is irregular, whereas bots click quickly and at regular intervals. Carnahan uses a color-coded dashboard application to show ticket requests in real time. In some cases the bot requests outnumbered the human requests several hundred-fold.
It isn’t clear what the training set looks like for the ticket bot classifier, since Ticketmaster won’t say how many of the 148 million tickets sold each year were purchased by bots.
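To make the click-speed idea concrete, here is a minimal sketch of how such a classifier might work. It is not Ticketmaster's actual method, which hasn't been published. It assumes two hypothetical features per session (the mean gap between clicks and how much those gaps vary), a tiny Gaussian naive Bayes model of the kind often used for spam filtering, and made-up training sessions. The names `timing_features` and `TinyGaussianNB` are mine.

```python
import math
import statistics


def timing_features(click_times):
    """Turn click timestamps (seconds) into (mean gap, coefficient of variation)."""
    gaps = [b - a for a, b in zip(click_times, click_times[1:])]
    mean_gap = statistics.mean(gaps)
    cv = statistics.pstdev(gaps) / mean_gap if mean_gap > 0 else 0.0
    return (mean_gap, cv)


class TinyGaussianNB:
    """Illustrative Gaussian naive Bayes; not a production classifier."""

    def fit(self, rows, labels):
        self.priors, self.stats = {}, {}
        for label in set(labels):
            group = [r for r, l in zip(rows, labels) if l == label]
            self.priors[label] = len(group) / len(rows)
            # Per-feature mean and variance; the small floor avoids division by zero.
            self.stats[label] = [
                (statistics.mean(col), statistics.pvariance(col) + 1e-6)
                for col in zip(*group)
            ]
        return self

    def predict(self, row):
        def log_posterior(label):
            total = math.log(self.priors[label])
            for x, (mu, var) in zip(row, self.stats[label]):
                total += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
            return total

        return max(self.priors, key=log_posterior)


# Made-up sessions: humans click slowly and unevenly, bots quickly and evenly.
human_sessions = [
    [0.0, 1.9, 3.1, 6.4, 7.2],
    [0.0, 0.8, 3.5, 4.1, 7.9],
    [0.0, 2.6, 3.3, 5.0, 8.8],
]
bot_sessions = [
    [0.0, 0.05, 0.10, 0.15, 0.20],
    [0.0, 0.04, 0.08, 0.13, 0.17],
    [0.0, 0.06, 0.12, 0.18, 0.24],
]

rows = [timing_features(s) for s in human_sessions + bot_sessions]
labels = ["human"] * len(human_sessions) + ["bot"] * len(bot_sessions)
clf = TinyGaussianNB().fit(rows, labels)

fast_even = clf.predict(timing_features([0.0, 0.05, 0.10, 0.15, 0.21]))
slow_uneven = clf.predict(timing_features([0.0, 1.5, 2.2, 5.0, 6.1]))
print(fast_even, slow_uneven)
```

A real system would presumably draw on many more features and far more labeled data. The point here is only that timing regularity alone can separate the two toy groups.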
Slowed Down But Not Out
The curious thing about the Ticketmaster anti-bot effort is that it was designed to slow down the bots but not knock them out altogether. The goal is to send a bot, once detected, to the end of the line so that fans can get the tickets they desire.
Even this limited tolerance for bots can still wreak havoc on performances. Some sold-out shows see up to 20% no-shows, with many of the best seats vacant. The likely explanation is that the speculators who bought those tickets with bots couldn’t get the prices they wanted, so the seats went unused.
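The "end of the line" policy can be sketched in a few lines. This is only an illustration under my own assumptions: each request carries a hypothetical `bot_score` from some upstream classifier, and `BOT_THRESHOLD` is an arbitrary cutoff I picked for the example. Nothing here reflects how Ticketmaster's queue is actually built.

```python
BOT_THRESHOLD = 0.9  # hypothetical cutoff, chosen only for this example


def deprioritize_bots(requests, is_bot):
    """Keep arrival order within each group, but serve flagged requests last."""
    fans = [r for r in requests if not is_bot(r)]
    flagged = [r for r in requests if is_bot(r)]
    return fans + flagged


# Made-up requests in arrival order, each with a made-up classifier score.
arrivals = [
    {"id": "r1", "bot_score": 0.97},
    {"id": "r2", "bot_score": 0.10},
    {"id": "r3", "bot_score": 0.95},
    {"id": "r4", "bot_score": 0.30},
]

served = deprioritize_bots(arrivals, lambda r: r["bot_score"] >= BOT_THRESHOLD)
served_ids = [r["id"] for r in served]
print(served_ids)  # ['r2', 'r4', 'r1', 'r3']
```

Note that the flagged requests are delayed rather than dropped, which matches the "slow down, don't knock out" design described above. A misclassified fan still gets served eventually.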
Not Only About the Bots
The science of ticket selling may appear straightforward, but Ticketmaster is sitting on a treasure trove of unexplored transactional and social media data that it can use to better understand customer buying behavior and sell more tickets. Beyond bot detection, the same data can support rigorously built recommender systems that show customers related tickets they’re likely to buy, and it can help determine what role unstructured social media data should play. With this initiative, data science can further Ticketmaster’s dominance in the highly competitive ticket industry.
Welcome to Radical Data Science! This blog’s goal is to impart all that is contemporary within the field of Data Science, including machine learning, data mining, predictive analytics, knowledge discovery in databases, statistical inference and, most importantly, the “Big Data” movement. Here at RDS, I’ll cover a wide range of aspects of this growing area of technology: Commentary, Opinion, Technology, How-To, Applications, Field Report, and Culture. I’ll try to distribute the posts evenly across the categories, but the TECHNOLOGY and HOW-TO posts take longer to write, so they might appear less frequently.
My name is Daniel D. Gutierrez and I come from a mathematics/computer science background, although for the last several years I’ve applied machine learning techniques to the analysis of astrophysical data sets for the LIGO project and the detection of gravitational waves. My background in data science goes back well before this cool name was in vogue. As a database technologist, I taught database courses at UCLA Extension for 18 years, wrote three database books, and served as a technical editor for Database Advisor Magazine. My academic history with data science stems from graduate machine learning work at Stanford and Caltech.
My company is AMULET Analytics, a Los Angeles-based consultancy specializing in data science services. We extract intelligence from your business data assets.
But what’s with the “radical” in RDS? I firmly believe that data science is the most disruptive technology in decades. The ability to make scientific predictions based on business data sets is pretty radical in terms of competitive advantage. I first became enamored with prediction when I was a kid reading the “Foundation” series by Isaac Asimov. In the books, the character Hari Seldon is a mathematics professor who develops “psychohistory,” allowing him to predict the future in probabilistic terms. Now that’s disruptive!
As a long-time data scientist I am VERY excited about this field, and with RDS I hope to spread this excitement to my readers. Oh yes, I almost forgot, please spread the word about RDS! I’d like to establish an active Data Science community here. I encourage you to leave comments on the posts, as this will enrich the learning experience for everyone.
Data Science rules!