Monthly Archives: November 2012
I had an excellent alternative to all the media machinations on Election Day, Nov. 6, 2012: I went to a machine learning event! Hosted by the LA Machine Learning group on Meetup.com, the event went by the name of “The Unreasonable Effectiveness of Ensembles.” This topic was of particular interest to me because many of the entries on the Kaggle leaderboard for the $3 million Heritage Health data science competition use ensembles. I wanted to refine my knowledge of this useful ML technique.
The event took place at the Century City offices of Factual.com, a company that provides access to data for powering web and mobile apps, mobile advertising, and enterprise solutions. I arrived early to get a jump on the vibe of this Meetup group since this was my first event. The Factual office was perfect for this kind of thing, complete with a large meeting area that had an ample lecture space, bean bag chairs, a piano, and other diversions designed to please the developer geeks working there. Factual did it right by providing plenty of pizza and beer. One guy had his tablet tuned to Nate Silver’s blog to monitor the election results. I chatted with a number of fellow data scientists, and then the talk was about to begin.
The lecturer was Rudiger Lippert, a software developer at Factual. Rudiger studied Electrical Engineering at Boston University and went on to get a Master’s degree at UCLA, specializing in Signal Processing. In graduate school his research centered around Speech Recognition. Rudi’s talk was excellent and covered all the areas of ensembles I had hoped for.
Ensemble methods are considered by many to be the most important development in machine learning of the last decade. By combining many weak models to produce a single strong model, ensemble methods achieve performance that rivals, and very often beats, that of other model classes such as Support Vector Machines (SVM) and Neural Networks. The talk started with simple Decision Trees, and went on to Bagging, Random Forests, Boosting, and newer developments such as Regularized Greedy Forests. It was a great overview of the subject and one that I plan to draw on in my consulting practice.
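To make the idea of combining many weak models a little more concrete, here is a minimal Python sketch of bagging, one of the techniques listed above. It is not code from the talk. The toy data set, the function names (fit_stump, bagged_stumps, predict), and the defaults for n_models and seed are all assumptions invented for illustration. Each weak model is a decision stump (a tree with a single split) fit on a bootstrap sample, and the ensemble predicts by majority vote.

```python
import random
from collections import Counter


def fit_stump(sample):
    """Fit a decision stump (one split): predict 1 when x > threshold.

    Tries every x in the sample as a threshold and keeps the one
    with the fewest misclassified points.
    """
    best_threshold, best_errors = None, None
    for candidate, _ in sample:
        errors = sum(1 for x, y in sample if int(x > candidate) != y)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


def bagged_stumps(data, n_models=25, seed=0):
    """Bagging: fit each stump on a bootstrap sample of the data.

    A bootstrap sample draws len(data) points with replacement, so
    every stump sees a slightly different view of the training set.
    """
    rng = random.Random(seed)
    return [fit_stump(rng.choices(data, k=len(data))) for _ in range(n_models)]


def predict(thresholds, x):
    """Combine the stumps by majority vote (n_models is odd, so no ties)."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: the label is 1 when x > 0.5,
# with two labels deliberately flipped to act as noise.
data = [(i / 40, int(i / 40 > 0.5)) for i in range(40)]
data[5] = (data[5][0], 1)
data[33] = (data[33][0], 0)

ensemble = bagged_stumps(data)
print(predict(ensemble, 0.05), predict(ensemble, 0.95))
```

A Random Forest follows the same recipe with full decision trees and a random subset of features at each split, while Boosting fits its models one after another, with each new model concentrating on the examples its predecessors got wrong.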
I would definitely recommend this Meetup group, but if you’re not from LA, try to find a similar group in your area.
I’m pleased to announce that I have a new gig with Big Data Republic (sponsored by UBM Tech of InformationWeek fame) as a forum moderator. I’m looking forward to my participation in all the cool discussions. This is an exciting new destination in the Big Data space, so please stop by to see all the bleeding-edge technology being discussed.
On October 25, 2012 I attended the LA area R user group, organized through http://www.meetup.com/LAarea-R-usergroup/. This is a really nice user group with a focus on the R statistical programming environment that many data scientists use for developing machine learning algorithms. I’ve used R for years and it is my favorite, although on occasion I do use Octave, an open-source environment largely compatible with Matlab. If you’re looking to get into R or strengthen your skills with R, I highly recommend you seek out a local R group on Meetup.com.
The event was held at the offices of Adconion Media Group in Santa Monica, which happens to be walking distance from my gym. So after running a couple of miles, I headed over to the event, which provided ample pizza and soft drinks (to counteract the calories burned moments earlier). It was a great venue for a group of about 50 attendees. It was a typical hipster, high-tech workplace, complete with a large presentation area for the meetup.
The meeting I attended was entitled “More highlights of useR! 2012 conference (Part 2)” and amounted to a discussion of some cool stuff discovered at the useR! 2012 Conference. The group’s organizer Szilard Pafka led a very detailed tutorial/demo on how to load and use R with Amazon’s EC2. If you’re interested in this configuration, the recorded talk can be obtained here: http://www.r-bloggers.com/RUG/2012/08/highlights-from-the-user-2012-conference/
Although I wasn’t particularly interested in the topics for this meetup, I do greatly appreciate there being a user group that aligns so well with my interests in data science, machine learning and statistics. Meetup.com is an excellent resource for finding like-minded people in your area of expertise.
Welcome to Radical Data Science! This blog’s goal is to impart all that is contemporary within the field of Data Science including machine learning, data mining, predictive analytics, knowledge discovery in databases, statistical inference and most importantly the “Big Data” movement. Here at RDS, I’ll cover a wide range of aspects of this growing area of technology – Commentary, Opinion, Technology, How-To, Applications, Field Report, and Culture. I’ll try to distribute the posts evenly across the categories, but the TECHNOLOGY and HOW-TO categories take longer to write so they might appear less frequently.
My name is Daniel D. Gutierrez and I come from a mathematics/computer science background, although for the last several years I’ve applied machine learning techniques to the analysis of astrophysical data sets for the LIGO project and the detection of gravitational waves. My long-term background in data science extends back long before this cool name was in vogue. As a database technologist, I taught database courses at UCLA Extension for 18 years, wrote three database books, and served as a technical editor for Database Advisor Magazine. My academic history with data science stems from graduate machine learning work at Stanford and Caltech.
My company is AMULET Analytics, a Los Angeles-based consultancy specializing in data science services. We extract intelligence from your business data assets.
But what’s with the “radical” in RDS? This is because I firmly believe that data science is the most disruptive technology in decades. The ability to make scientific predictions based on business data sets is pretty radical in terms of competitive advantage. I first became enamored with prediction when I was a kid reading the “Foundation” series by Isaac Asimov. In the books, the character Hari Seldon is a mathematics professor who develops “psychohistory,” allowing him to predict the future in probabilistic terms. Now that’s disruptive!
As a long-time data scientist I am VERY excited about this field and with RDS I hope to spread this excitement to my readers. Oh yes, I almost forgot, please spread the word about RDS! I’d like to establish an active Data Science community here. I encourage you to leave comments about the posts, as this will enrich the learning experience.
Data Science rules!