Category Archives: Field Report
Field report of data science event
Last night I attended the latest meeting of the Los Angeles area R User Group. The event was held in a very obscure rooftop location in Boelter Hall (the Engineering building). I’m glad I arrived a bit early; otherwise I would have missed out on the limited pizza supply. But no matter, my motivation for attending was to hear all about the international useR! 2013 conference recently held in Albacete, Spain. I wasn’t disappointed. The group’s organizer, Szilard Pafka, and Bryan Galvin gave useful presentations about all the new stuff that was offered up at the conference. I’d like to present a summary here for RDS blog readers. In no particular order:
- Check out the useR! 2013 website for detailed information about the conference.
- There is a new JVM-based interpreter for R called Renjin.
- It seems the evil empire Oracle is all into R now with Oracle R Enterprise.
- A brand new version of the plyr R package, called dplyr, is in the works. The main benefit is that it will be much faster!
- The shiny web application framework package for R continues to be a rising star. Check out this tutorial.
- There is a new “pretty quick” version of R called pqR.
- And last but not least, I’m excited to say that the useR! 2014 conference will be held on June 30 – July 3 in Los Angeles at UCLA and sponsored by the UCLA Statistics department.
I always have a good time at this Meetup, which has become one of my favorites. I highly recommend this particular Meetup group if you happen to be in Los Angeles, but if not, try to locate a similar group near you.
I had another scintillating evening with the Los Angeles area R User Group on January 24, 2013. I always have a good time at this Meetup, which has become my favorite, but I was particularly intrigued by the theme: a panel of distinguished data scientists talking about “data science.” Fun to be sure! The R group meetings are free, along with free validated parking and all the pizza and drinks you can navigate.
Once again held at Adconion Media Group in Santa Monica, the panel consisted of: Avram Aelony, Eric Kostello, Yasmin Lucero, Szilard Pafka, Ryan Rosario, and Oliver Will (see inset photo I took with my new iPhone 5). The first order of business was panelist introductions. It appears that all but one held a Ph.D., which led to the first discussion topic: do you need a Ph.D. to be a data scientist? Despite the panel’s own credentials, the consensus was that you don’t. I agree in a sense, but I think a Ph.D. in a related field like statistics, computer science, mathematics or even physics would certainly propel your career.
A lot of the hour-and-a-half discussion dealt with defining “data science” along with what a typical data scientist actually does. I found this dialog enlightening because I always like to better understand how my fellow data scientists view themselves and our field. The panel seemed to agree that the term “data science” is relatively new and maybe somewhat overhyped by some, but many of its methods have seasoned and sound foundations in statistics, computer science and their various related fields (e.g., machine learning, databases, distributed computing, visualization, etc.).
Questions from the audience were welcome (nearly 100 in attendance) so I brought up a point about how many of the job descriptions being floated these days for data science positions seem to be overly heavy with qualifications to the point where the successful candidate needs to fill the role of CTO, VP Engineering, system admin, and coder all at once. Heck, if the employer is looking for a one-person company why didn’t they say so! The panel’s response was that many times, the employer has no idea what a data scientist is, so they just Google a bunch of terms and throw them into the job description. A statistically insignificant number of candidates will have all the qualifications, so the point was – just go in and talk to them and see what sticks.
The discussion eventually moved into more technical subjects such as using R, version control, and processing environments such as Hadoop and Amazon EC2. I thought there was a good balance between technical and non-technical material, but the organizer Szilard Pafka had so many other topics to cover that he’s already scheduled a Part 2 panel for March.
After walking away from the thought-provoking panel discussion, I stopped to think that this is an excellent time for a technical person to retool herself to get aboard the data science gravy train. This is why I put together a list of FREE data science educational resources over at Big Data Republic.
It is a mighty good time to be a data scientist, and being part of the Silicon Beach data science scene is pretty energizing. I hope to see you all at the next Meetup!
On Dec. 6, 2012 I attended another great evening event with the Los Angeles area R user group. This was my second meeting with the group and I was looking forward to the topic matter: “Real-time dashboards with Node and R.” As a data scientist, my development environment of choice is R (with a little Octave thrown in for good measure; Octave is an open source, largely Matlab-compatible language), so I’m always looking for ways to extend my knowledge of R and different ways to apply it to machine learning. This meeting served this purpose very well.
I arrived early at the meeting venue, Adconion Media Group in Santa Monica. Silicon Beach, of which Santa Monica is part, is really heating up with interest in Big Data. Adconion is a fantastic Meetup venue: easily accessible, great parking, high tech appeal, and somewhat recognizable if you ever watched “Curb Your Enthusiasm,” since this is the same office park where Larry David’s office is located. Adconion supplied the wide selection of pizza and soft drinks (Thank you!). The first hour is a meet-and-greet with all the attendees; 74 registered this time. I met a bunch of like-minded data scientists, including a Caltech Ph.D. candidate in computational neuroscience and another who uses R for sentiment analysis on publicly traded companies for investment managers.
Once the lecture began, the group was led through a fast-paced example of using Node and Google Chart Tools along with a Twitter sentiment analysis algorithm in R to yield a “happiness” indicator for countries around the world. The globe map used color coding to indicate general “happiness.” Click HERE for a demo that presents “How popular is fast food.” The presenter, Dr. Dirk Neumann, is a data scientist and founder of social supercomputing startup Opani. He received a Ph.D. in Computation and Neural Systems from Caltech.
As I mentioned in my previous field report for this group, I highly recommend it for anyone in the field. The presentation content tends to be fairly complex, but it’s a good way to push your personal envelope.
I had an excellent alternative to all the media machinations on Election Day, Nov. 6, 2012: I went to a machine learning event! Hosted by the LA Machine Learning group on Meetup.com, the event went by the name of “The Unreasonable Effectiveness of Ensembles.” This topic was of specific interest to me because many of the entries on the Kaggle leaderboard for the $3 million Heritage Health data science competition use ensembles. I wanted to refine my knowledge about this useful ML technique.
The event took place at the Century City offices of Factual.com, a company that provides access to data for powering web and mobile apps, mobile advertising, and enterprise solutions. I arrived early to get a jump on the vibe of this Meetup group since this was my first event. The Factual office was perfect for this kind of thing, complete with a large meeting area that had an ample lecture space, bean bag chairs, piano and other games designed to please the developer geeks working there. Factual did it right by providing plenty of pizza and beer. One guy had his tablet tuned to Nate Silver’s blog to monitor the election results. I chatted with a number of fellow data scientists until the talk was about to begin.
The lecturer was Rudiger Lippert, a software developer at Factual. Rudiger studied Electrical Engineering at Boston University and went on to get a Master’s degree at UCLA, specializing in Signal Processing. In graduate school his research centered around Speech Recognition. Rudi’s talk was excellent and covered all the areas of ensembles I had hoped for.
Ensemble methods are considered by many to be the most important development in machine learning of the last decade. By combining many weak models to produce a single strong model, ensemble methods have performance which rivals and very often beats that of other model classes such as Support Vector Machines (SVM) and Neural Networks. The talk started with simple Decision Trees, and went on to Bagging, Random Forests, Boosting, and newer developments such as Regularized Greedy Forests. It was a great overview of the subject and one that I plan to utilize in my consulting practice.
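To make the “many weak models, one strong model” idea a bit more concrete, here is a minimal Python sketch of bagging. It is a toy illustration, not code from Rudi’s talk: the weak learner is a one-feature threshold rule (a “decision stump”), the data set is made up, and the function names (`fit_stump`, `fit_bagged`, `predict_bagged`) are hypothetical names chosen for this example.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a decision stump: a single threshold on one feature.

    points is a list of (x, label) pairs with label in {0, 1}.
    Returns (threshold, label_if_above) with the fewest training errors.
    """
    best = None
    for threshold in sorted({x for x, _ in points}):
        for above in (0, 1):
            errors = sum(
                1
                for x, y in points
                if (above if x >= threshold else 1 - above) != y
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, above)
    _, threshold, above = best
    return threshold, above


def predict_stump(stump, x):
    """Predict a label for x with a single stump."""
    threshold, above = stump
    return above if x >= threshold else 1 - above


def fit_bagged(points, n_models, rng):
    """Bagging: fit each stump on a bootstrap resample of the data."""
    models = []
    for _ in range(n_models):
        # Draw len(points) examples with replacement.
        sample = [rng.choice(points) for _ in points]
        models.append(fit_stump(sample))
    return models


def predict_bagged(models, x):
    """Combine the stumps by majority vote."""
    votes = Counter(predict_stump(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Made-up toy data: x below 10 is class 0, x of 10 or more is class 1.
data = [(x, 0) for x in range(10)] + [(x, 1) for x in range(10, 20)]

rng = random.Random(0)  # fixed seed so the run is repeatable
models = fit_bagged(data, n_models=25, rng=rng)

print(predict_bagged(models, 0), predict_bagged(models, 19))
```

An odd number of stumps avoids tied votes in this two-class setting. Random Forests build on the same recipe with full decision trees and random feature subsets, while Boosting fits its models sequentially rather than on independent resamples.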
I would definitely recommend this Meetup group, but if you’re not from LA, try to find a similar group in your area.
On October 25, 2012 I attended the LA area R user group organized through: http://www.meetup.com/LAarea-R-usergroup/. This is a really nice user group with a focus on the R statistical programming environment that many data scientists use for developing machine learning algorithms. I’ve used R for years and it is my favorite, although I do use Octave, an open source and largely Matlab-compatible language, on occasion. If you’re looking to get into R or strengthen your skills with R, I highly recommend you seek out a local R meetup.com group.
The event was held over at the offices of Adconion Media Group in Santa Monica, which happens to be walking distance from my gym. So after running a couple of miles, I headed over to the event that provided ample pizza and soft drinks (to counteract the calories burned moments earlier). It was a great venue for a group of about 50 attendees or so. It was a typical hipster, high-tech workplace, complete with a large presentation area for the meetup.
The meeting I attended was entitled “More highlights of useR! 2012 conference (Part 2)” and amounted to a discussion of some cool stuff discovered at the useR! 2012 Conference. The group’s organizer Szilard Pafka led a very detailed tutorial/demo on how to load and use R with Amazon’s EC2. If you’re interested in this configuration, the recorded talk can be obtained here: http://www.r-bloggers.com/RUG/2012/08/highlights-from-the-user-2012-conference/
Although I wasn’t particularly interested in the topics for this meetup, I do greatly appreciate there being a user group that aligns so well with my interests in data science, machine learning and statistics. Meetup.com is an excellent resource for finding like-minded people in your area of expertise.