Verbatim

My thoughts, written down.

Harvard Career & Academic Resource Center (CARC) Awesomeness

Posted by signal on August 29th, 2012

Today I registered for many of the CARC workshops for Fall 2012.  Some of the best ones are offered only on campus; however, quite a few good ones are offered online, and I was able to register for these:

Listen Up: Three Ways to Make Your Audience Pay Attention (webinar)
Gaining Grammar Confidence (webinar)
How to use Positive Psychology to Improve Performance and Wellbeing (teleconference)
Resume and Cover Letter 101 (webinar)
Perfectionism (webinar)
Lifting the Curtain on Powerful Persuasive Speaking (webinar)

One that looked particularly interesting had already filled up, so I was not able to get into it:

Networking: Making Your Connections Count (webinar)

 

Posted in Uncategorized | No Comments »

Ready to try my first Coursera course: Statistics One

Posted by signal on August 26th, 2012

In preparation for taking STAT E-50 in Spring 2013, I am taking a class from Coursera: Statistics One.  I have heard many good things about this class, and about using Coursera for general Data Science training.  They have classes in Machine Learning, Data Analysis, Neural Networks and so much more.  I picked up the text, Statistics, 4th Edition (by Freedman, Pisani, & Purves. Norton Publishing), cheap online.  The international version (which I bought) is supposed to be the same as the US version, just a different color.

My wife will also be working through her undergraduate Statistics class at Champlain College, and so we will be doing statistics together.

One of the main draws of the Coursera class is that it uses R.  I have been trying to get my R skills in shape and doing some practical statistics will be great.  The class is taught by Professor Andrew Conway, Princeton University.  It will be an interesting experience to take a class “unofficially”, with much less stress, before I take the real class for credit.

Posted in Uncategorized | No Comments »

Using Coda2 – My experiences during S-75

Posted by signal on August 7th, 2012

Shortly after registering for S-75 Building Dynamic Websites, I began searching for the tools I would use to build my projects. I specifically looked at tools that run on Mac OS X. At the very least, I was looking for an IDE. Some of the programs I looked at were:

Versions

Cornerstone2

Coda2

In reviewing all of these programs, it seemed that Coda2 was what I wanted.  It had the ability to remotely edit files.  I knew I would be storing most of my files on the CS-50 appliance (virtual machine) used for the class, but I wanted to use rich editing tools.  Coda2 also supports SFTP/FTP, CSS, PHP, Version Control (Git/SVN) and more.

There is a forum used to discuss Coda2; you can find it here.  Coda2 is definitely not without bugs.  I experienced a lot of sluggish behavior, and at times it became unresponsive and I had to force quit and restart.  I never lost any data.

My biggest disappointment had to do with the code validation and error checking.  If you are developing monolithic files, where everything you are trying to do is in one file, I am sure it works well.  However, when developing dynamic web sites, it’s very typical to have one file output your header (with your document specification, etc.) and then have many files that are included together to create the overall page.  Coda2 doesn’t like this.  If it sees you have HTML in a file but no header, for example, it freaks out.  It’s not smart enough to look at all the files in the project, start with index.html, and assemble them logically.  Hopefully they fix this; I was basically on my own when it came to validation and error correction.  I manually copied my code from “View Source” in my browser and uploaded it to the W3C’s Markup Validation Service.
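To illustrate why checking an included fragment on its own produces false alarms, here is a minimal Python sketch. It is only a crude tag-balance check, not real HTML validation and not how Coda2 or the W3C validator works; the three fragments, the list of void elements, and the names TagBalance and problems are all made up for illustration.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (a partial, illustrative list).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}

# Hypothetical fragments, standing in for header, content and footer files.
HEADER = "<!DOCTYPE html><html><head><title>Demo</title></head><body>"
BODY = "<h1>Stations</h1><p>Pick a route.</p>"
FOOTER = "</body></html>"


class TagBalance(HTMLParser):
    """Track open tags and record closing tags that do not match."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> open and close at once.
        pass

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append("unexpected </%s>" % tag)


def problems(markup):
    """Return a list of tag-balance complaints for a piece of markup."""
    checker = TagBalance()
    checker.feed(markup)
    checker.close()
    return checker.errors + ["unclosed <%s>" % tag for tag in checker.stack]
```

Checked alone, the header fragment reports unclosed tags and the footer reports unexpected closing tags, while `problems(HEADER + BODY + FOOTER)` comes back empty. That is why validating the assembled output from the browser was the workable route.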

Things I liked about Coda2:

  • Syntax highlighting
  • File navigation
  • Powerful Editor
  • Good page preview ability

I should mention that I did not use the version control built into Coda2.  This had nothing to do with how well Coda2 handles version control.  Because the code was actually being stored on the CS-50 appliance, it made more sense for me to use the git built into the appliance.

I will say that an IDE is definitely not necessary for a class like S-75, although I did find value in using one.  If you are already comfortable with something like TextWrangler or vi, then that may work just as well.

 

Posted in Uncategorized | 3 Comments »

My Data Science Roadmap

Posted by signal on August 3rd, 2012

I have set a goal to learn Data Analytics and began this journey a while back.  One way I am learning Data Science is through EMC’s Data Science Training.  They succinctly outline the skills I am looking to master for building a practical foundation in analytics:

Problem | Category of Techniques | Methods to Learn
Group items by similarity; Find structure and commonalities in the data | Clustering | K-means clustering
Discover relationships between actions or items | Association Rules | Apriori
Discover relationships between the outcome and input variables | Regression | Linear Regression; Logistic Regression
Assign (known) labels to objects | Classification | Naïve Bayes; Decision Trees
Find the structure in a temporal process; Forecast the behavior of a temporal process | Time Series Analysis | ACF, PACF, ARIMA
Analyze text data | Text Analysis | Regular Expressions, Document representation (Bag of Words), TF-IDF
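To make the text analysis row a little more concrete, here is a minimal sketch of Bag of Words and TF-IDF in plain Python. The function names and the particular TF-IDF variant (term count divided by document length, times log(N / document frequency)) are assumptions for illustration. They are not taken from the EMC material, and real libraries use slightly different formulas.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out words with a regular expression."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {term: weight} dict per input document."""
    # Bag of Words: each document becomes a count of its terms.
    bags = [Counter(tokenize(doc)) for doc in documents]
    n_docs = len(bags)
    # Document frequency: how many documents contain each term.
    doc_freq = Counter(term for bag in bags for term in bag)
    weights = []
    for bag in bags:
        total_terms = sum(bag.values())
        weights.append({
            term: (count / total_terms) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return weights
```

With three short documents such as "the cat sat", "the dog sat" and "the bird flew", a word that appears everywhere ("the") gets a weight of zero, while a word unique to one document ("cat") gets the highest weight in that document.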

 

In addition to the above, I plan to build foundational knowledge in Mathematics, Computer Science, Machine Learning, Artificial Intelligence, Predictive Analytics and Life Science.  Some of this will come via my degree program at Harvard; however, the program I am in, Information Technology, offers only some courses that are useful in Data Science.  Other knowledge will come from additional courses I will take outside of my degree program, books, and possibly even the pursuit of another graduate degree specific to Data Analytics.

A few degree programs that look very attractive are below.  The prerequisites are what prevent me from pursuing one of these programs at this time.  I have a significant amount of work to do to build up my Mathematics and Life Sciences foundations before I could be admitted.  My background is in technology and computer science, which is very useful to Data Science, but is only one part of a much larger domain of knowledge.

Master of Science in Bioinformatics – Johns Hopkins University

Master of Science in Analytics – North Carolina State University 

Master of Science in Analytics – Northwestern University 

Master of Science in Predictive Analytics – Northwestern University

Mining Massive Data Sets Graduate Certificate – Stanford University

MSc Machine Learning – University of London

Master of Science in Data Mining – Central Connecticut State University

Master of Science Biomedical Informatics

It would likely be three years or more before I would be able to pursue a program such as those above.  In the meantime I plan to build up my knowledge in the various domains.

College Courses I will take outside of Harvard (all of the below have co-requisite labs as well):

Biology I
Biology II
Chemistry I
Chemistry II
Organic Chemistry I
Organic Chemistry II

Courses I am taking or have taken at Harvard that will help in Data Science:

Introduction to Statistics
Java for Distributed Computing
Oracle Database Administration
Visualization
Computing Foundations for Computational Science

Books I will be working through:

R

Data Mining with R: Learning with Case Studies (Chapman & Hall/CRC Data Mining and Knowledge Discovery Series)
The R Book
Data Mashups in R
R in a Nutshell: A Desktop Quick Reference
R Cookbook (O’Reilly Cookbooks)
Getting Started with RStudio
Parallel R

Statistics

Data Mining: Practical Machine Learning Tools and Techniques, Third Edition (The Morgan Kaufmann Series in Data Management Systems)
All of Statistics: A Concise Course in Statistical Inference (Springer Texts in Statistics)
Think Stats
Statistics in a Nutshell: A Desktop Quick Reference (In a Nutshell (O’Reilly))
Statistics Hacks: Tips & Tools for Measuring the World and Beating the Odds

Linear Algebra

Introduction to Linear Algebra, Fourth Edition

Machine Learning

Machine Learning in Action
Machine Learning for Hackers

Data Mining

Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites
21 Recipes for Mining Twitter
Big Data Glossary
Data Analysis with Open Source Tools

Visualization

Designing Data Visualizations
Now You See It: Simple Visualization Techniques for Quantitative Analysis
Beautiful Visualization: Looking at Data through the Eyes of Experts (Theory in Practice)
Visualize This: The FlowingData Guide to Design, Visualization, and Statistics

Hadoop

Hadoop: The Definitive Guide
HBase: The Definitive Guide
Programming Pig
Cassandra: The Definitive Guide

There is much I have left out, I am sure, and if anyone has any good books to recommend, please do.  I have found the Quora forums to be particularly helpful in networking with others about Data Science.

 

 

Posted in Uncategorized | 2 Comments »

My final project for S-75 = Ajax, PHP, JavaScript, XML, CSS, MySQL, HTML, Google and BART

Posted by signal on August 2nd, 2012

I am finally done with my Summer School class S-75 Building Dynamic Websites, taught by Professor David Malan. It was a very intense course. I liked how fast things moved and how we were challenged every single day. I already had a background in some of the technologies (PHP, HTML, MySQL) but had no real working knowledge of many of the others, such as XPath, XML, CSS, Ajax and JavaScript.

My final project was a mashup of BART (Bay Area Rapid Transit) data, pulled through their API, and the Google Maps v3 API. It was written in PHP, used MySQL as a datastore cache, and pulled realtime information from BART using Ajax. The program was written using the Model / View / Controller methodology and was even version controlled via git, using bitbucket.org as a repository host.
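The datastore cache idea can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the PHP/MySQL code from the project: it uses the standard library's sqlite3 as a stand-in for MySQL, and the table layout, the 60-second lifetime, and the fetch callback (which would wrap a real API request) are all hypothetical. What exactly the project cached is not described here; the sketch simply assumes response bodies keyed by a string.

```python
import sqlite3
import time

CACHE_TTL_SECONDS = 60  # hypothetical lifetime of a cached response


def open_cache(path=":memory:"):
    """Open (or create) the cache table in a SQLite database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache ("
        "cache_key TEXT PRIMARY KEY, "
        "body TEXT NOT NULL, "
        "fetched_at REAL NOT NULL)"
    )
    return conn


def get_cached(conn, cache_key, fetch, now=None, ttl=CACHE_TTL_SECONDS):
    """Return the body for cache_key, calling fetch(cache_key) only when
    nothing is stored yet or the stored copy is older than ttl seconds."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT body, fetched_at FROM cache WHERE cache_key = ?",
        (cache_key,),
    ).fetchone()
    if row is not None and now - row[1] < ttl:
        return row[0]  # still fresh, no remote call needed
    body = fetch(cache_key)
    conn.execute(
        "INSERT OR REPLACE INTO cache (cache_key, body, fetched_at) "
        "VALUES (?, ?, ?)",
        (cache_key, body, now),
    )
    conn.commit()
    return body
```

The point of the design is that repeated requests for the same item inside the lifetime window are answered from the database, so the remote service is only contacted when the stored copy has gone stale.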

I pretty much gave my soul to this class for the 7 weeks. I was even coding on my vacation :). However, I feel it was a good investment and I would do it again. If you ever have the opportunity to take any course from Professor Malan I highly recommend it. I am considering taking CS-50 possibly in Spring 2013 to fill in some of the gaps of my computer science background.

Below is a screenshot of my final project, where the Pittsburg/Bay Point – SFIA/Millbrae route has been selected. You can see stations plotted along a path that was drawn in the actual route color used by BART.

The user can click on a station, and the page pulls realtime data from BART. Since it uses Ajax, there is no additional page load, so the whole process is a very seamless user experience. It was one of those projects where, when it starts off, you wonder how you will make it happen in such a short period of time, but then you amaze yourself by pulling it off. Now that I know my way around the Google Maps API, and I have a reasonably good foundation in XPath/XML, I am on the hunt for other sites with GIS data, where I can build a mashup of something that does not yet exist, something with Big Data.

Below is another screenshot of what it looks like when the user clicks on a station and receives realtime data, this time on the Fremont – Daly City route (green).

Posted in Uncategorized | 3 Comments »

Project Euler – A Great Learning Tool

Posted by signal on July 30th, 2012

I am planning on taking CSCI E-160, Java for Distributed Computing, next semester, but one of my issues is that it’s been some time since I have actually used Java.  I did get my foundations in Java while a computer science undergrad at Louisiana State University; unfortunately, that was eight years ago.  Java has never been my “goto” language for quick and dirty hacking; that has always been a scripting language such as Perl.  So the task before me was to brush up on my Java knowledge, and as I was doing so I discovered Project Euler.

Project Euler presents you with over 300 computational problems designed to be solved with computer programs.  There is a whole community of people that have worked to solve these problems.  The problems have varying levels of difficulty, and the tools to solve them are up to the user.  Many of the users use C, Java or x86 assembly; however, there are many using languages such as Delphi, Pascal, PHP, etc.  Part of the fun in solving the problems is reviewing others’ solutions, something you can only do once you have submitted your own.  Many of the users are math/science geeks and have extremely clever ways of solving problems.  For example, today I solved a problem using primes, which led me to the paper The Genuine Sieve of Eratosthenes by Melissa E. O’Neill.  This showed one of the most efficient algorithms for discovering primes... of course, I had already written my Java code.  The point, however, is that you learn your programming language at the same time you are learning some pretty good information on math and algorithms.

So far I have only completed a handful of the Project Euler problems; however, I find it an interesting aside to try to solve them when I am trying to take my mind off something else.  They are fun, and that’s the point of the site.
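For anyone who has not met it, the classic array-based Sieve of Eratosthenes can be sketched in a few lines. This is a minimal Python sketch of the textbook version, rather than the Java solution mentioned above or the incremental algorithm from O’Neill’s paper; the function name primes_up_to is just an illustrative choice.

```python
def primes_up_to(limit):
    """Return a list of all primes less than or equal to limit."""
    if limit < 2:
        return []
    # is_prime[n] stays True until n is crossed off as a multiple.
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Smaller multiples of p were already crossed off by
            # smaller primes, so start at p * p.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return [n for n, flag in enumerate(is_prime) if flag]
```

Calling `primes_up_to(30)` returns `[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]`. The trade-off is memory: the list grows with the limit, which is one reason incremental approaches are interesting for larger problems.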

 

Posted in Uncategorized | No Comments »

Spring 2012 Semester Complete – GPA 4.0!

Posted by signal on May 25th, 2012

The Spring 2012 semester is finally all wrapped up. I got my grades back today and I received an A in CSCI E-131b, my first class toward my ALM.  I am very excited about this.  Now I look ahead to Building Dynamic Websites with Professor Malan, and I am looking forward to a challenging and rewarding class.

I continue to struggle with the desire to focus more on Bioinformatics.  Having just come back from the Data Science Summit 2012, I am very excited about what the future holds with analytics and Big Data.  I will have to be content with working in my primary discipline and increasing my knowledge more slowly over time in other areas such as Life Science.  I have run the plan over and over in my head on how I could tackle a degree in Biotechnology / Bioinformatics, but it just doesn’t make sense for me right now.  My journey naturally leads me toward Computer Science / Information Technology.  I need to complete that and stay focused.  So that is the plan.  That said, nothing stops me from becoming more involved in and relevant to the Data Science community, and I am looking forward to any opportunities that present themselves, particularly a project or area of research in academia that I could assist in.

Posted in Uncategorized | No Comments »

Almost the end, of my beginning at Harvard

Posted by signal on May 4th, 2012

The final week of Spring semester is now wrapped up, and all that lies ahead is my final for CSCI E-131b, Communication Protocols and Internet Architectures.  When I began this semester I did not know what to expect.  I had studied via distance education before at different universities, but each experience was varied and in many cases lackluster.  What I have found so far has been a refreshing experience that has delivered in both quality and authenticity.

My goal is the ALM in Information Technology (concentration in Information Management Systems).  In many ways, I find the Biotechnology degree (concentration in Bioinformatics) even more interesting, and the Information Technology degree (concentration in mathematics and computation) is also of interest.  The reality is that my undergraduate background does not lend itself to either of these pursuits.  I have a passion for problem solving.  A second point is that many of the Biotechnology classes are not offered online.  Both mathematics and computation and biotechnology also require more classes to be done on campus and have other stricter requirements, including a thesis.  Living in Florida, and given the impracticality of taking flights every week for classes, either degree option is a non-starter.

This weekend I will spend time studying for my final and hopefully walk away one class further toward matriculation at Harvard University.  My plan is, and has always been, to take only one class per semester, at least until matriculated.  This is actually the recommendation of the University.  In the meantime, my plan is to watch the schedules and build out a set of classes that satisfy my degree requirements and at the same time challenge me and fuel my passions.  For me, this pursuit is all about the journey.  I plan to frequently take classes outside of the degree requirements, just for the sake of gaining the knowledge and experience, and in some cases clearing away the prerequisites for more interesting classes.

One such area is analytics and statistics.  Some classes offered are right up that alley, such as Visualization, which I am looking forward to.  Other classes are not part of the required degree work, but would be beneficial to anyone who is a problem solver.  For example, I plan to take classes in Statistics, Linear Algebra and Discrete Math.  If the opportunity presented itself, I would even take classes that would further my knowledge in Bioinformatics.  It is very exciting to see the convergence of life sciences, engineering, programming and Big Data.  I am strong in two of these disciplines (engineering and programming), and I am learning more and more each day about Big Data.  Life Sciences is my weakest area, as I have not taken any real classes in it since High School.

All things considered, I am looking forward to the educational experiences and becoming as involved as I can in my Harvard experience.

 

Posted in ALM in IT | No Comments »