Monthly Archives: February 2013
Much has been debated about the validity of p-values in determining statistical relationships between data elements. Here is a long list of 402 Citations Questioning the Indiscriminate Use of Null Hypothesis Significance Tests in Observational Studies. And here is a current and measured contribution to the debate appearing over at the Simply Statistics blog. How do you weigh in? Do you feel the use of p-values produces the kind of false positives we always see in the mainstream press?
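For readers who want to see the false-positive concern in concrete terms, here is a minimal simulation sketch. It is not taken from either of the linked pieces. It assumes a simple two-sided z-test with a known standard deviation, and the function names, sample sizes, and the 0.05 threshold are all illustrative choices. Every simulated data set is pure noise, so any "significant" result is a false positive by construction.

```python
import math
import random


def two_sided_p_value(sample, sigma=1.0):
    """Two-sided z-test p-value for H0: mean == 0, assuming known sigma."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_tests=2000, n_obs=30, alpha=0.05, seed=0):
    """Fraction of pure-noise data sets flagged as significant at level alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tests):
        # The null hypothesis is true here: the data really have mean 0.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        if two_sided_p_value(sample) < alpha:
            hits += 1
    return hits / n_tests


if __name__ == "__main__":
    print(f"Share of null data sets called significant: {false_positive_rate():.3f}")
```

Under these assumptions the share should land near the chosen threshold, which is the point: run enough tests on noise and some will look like findings unless the analysis accounts for how many tests were run.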
Here is a new webinar from Revolution Analytics that introduces the use of the R statistical programming environment for doing data mining. Presented by Joe Rickert, the seminar demos several examples of data mining with various R packages. Rickert’s slides can be downloaded HERE. Enjoy!
I had another scintillating evening with the Los Angeles area R User Group on January 24, 2013. I always have a good time at this Meetup, which has become my favorite, but I was particularly intrigued by the theme: a panel of distinguished data scientists talking about “data science.” Fun to be sure! The R group meetings are free, along with free validated parking and all the pizza and drinks you can navigate.
The event was once again held at Adconion Media Group in Santa Monica, and the panel consisted of: Avram Aelony, Eric Kostello, Yasmin Lucero, Szilard Pafka, Ryan Rosario, and Oliver Will (see inset photo I took with my new iPhone 5). The first order of business was panelist introductions. It appears that all but one held a Ph.D., which led to the first discussion topic: do you need a Ph.D. to be a data scientist? Despite the panel’s own credentials, the consensus was that you don’t. I agree in a sense, but I think a Ph.D. in a related field like statistics, computer science, mathematics or even physics would certainly propel your career.
A lot of the hour-and-a-half discussion dealt with defining “data science” along with what a typical data scientist actually does. I found this dialog enlightening because I always like to better understand how my fellow data scientists view themselves and our field. The panel seemed to agree that the term “data science” is relatively new and maybe somewhat overhyped by some, but many of its methods have seasoned and sound foundations in statistics, computer science and their various related fields (e.g. machine learning, databases, distributed computing, visualization, etc.).
Questions from the audience were welcome (nearly 100 in attendance) so I brought up a point about how many of the job descriptions being floated these days for data science positions seem to be overly heavy with qualifications to the point where the successful candidate needs to fill the role of CTO, VP Engineering, system admin, and coder all at once. Heck, if the employer is looking for a one-person company why didn’t they say so! The panel’s response was that many times, the employer has no idea what a data scientist is, so they just Google a bunch of terms and throw them into the job description. A statistically insignificant number of candidates will have all the qualifications, so the point was – just go in and talk to them and see what sticks.
The discussion eventually moved into more technical subjects such as using R, version control, and processing environments such as Hadoop and Amazon EC2. I thought there was a good balance between technical and non-technical material, but the organizer Szilard Pafka had so many other topics to cover that he’s already scheduled a Part 2 panel for March.
After walking away from the thought-provoking panel discussion, I stopped to think that this is an excellent time for a technical person to retool herself to get aboard the data science gravy train. This is why I put together a list of FREE data science educational resources over at Big Data Republic.
It is a mighty good time to be a data scientist, and being part of the Silicon Beach data science scene is pretty energizing. I hope to see you all at the next Meetup!