First, let’s start with a joke:
Analytics is a lot like golf. In the above we see the Gorilla get the ball to the putting green in one shot. This is fantastic. However, any golfer knows that this is only 80% of the game. The short game, close to the pin, is where the Masters are separated from the hackers. The same can be said about text analytics. When you attempt to take advantage of your unstructured text, there are a lot of easy 80% solutions you can engage in. However, 80% isn’t the goal. The more hacking you do to cover the last 20%, the more ROI and opportunity you lose.
From my experience, the best analytics approaches cover the problem end to end. The heavy-lifting 80% up front is the processing. I’ve spoken on this site about noise control already, so I won’t belabor the point. A good processing pipeline has low noise and produces information you can work with. The last 20% of the game consists of knowing your goal and taking the right measured steps to get there. In particular, you need to start with good analytical questions and good procedures for filling in the blanks through analysis of your data. Your process should be scientific. The scientific method is simple and well described. It should be something you take to heart when solving analytical problems. Finally, the process should be collaborative. A lot of analytical systems aren’t set up to be collaborative, which is a shame. This kind of work is best done with multiple people able to work together. If you have ever put together a jigsaw puzzle with someone, you know what collaborative work can be like. A good collaborative experience is one where you aren’t constantly blocking each other, you can share information quickly and easily, and everyone can contribute to the solution in real time.
Knocking the ball into the hole is exciting. You have to do it 18 times per game. Analytics is exciting too. Finding information you can act upon is wonderful. However, you have to keep doing it. It takes several iterations of analytics to really accomplish anything. Tomorrow you will have a whole new set of questions to tackle. My system designs rely upon a set of core philosophical beliefs. The primary one is that, as interesting and exciting as the core technology is, you shouldn’t expose the handles, knobs, and levers of the algorithms. Analysts are trying to solve hard problems. Add in algorithm tweaks and the problems they face grow exponentially. The technology should be an invisible hand that helps, not a part of the analysis that you have to carefully tend. Another philosophy is to keep the goal in mind, always. You should be able to look up from the ball and see the flag at the hole. Thus all local decisions should be guided by where you want to end up.
Personally, I take Samuel Clemens’s point of view on the game of golf. It is a beautiful walk in the sun. Ruined. However, the analogy to something I do love to do, text analytics, offers some clear guidance.