Wednesday, April 13, 2016

How do you use data sorting tools?

Friend of the blog Philip Skinner donated $200 from Perkin Elmer Informatics to the DIY Science Zone at the 2015 GeekGirlCon. With that donation, he gets a post on any topic. He asked to talk about Spotfire and other data tools, as he writes below: 
As a medchemist I lived and breathed SAR (structure-activity relationships). And, working with Spotfire, I still kind of do, as that tool is often used to allow quick comparison of many properties, making SAR work much easier than poring over Excel tables. My question is this: how many other industries use similar methodologies? 
I once talked to a major unnamed oil and gas company, specifically their homogeneous catalysis unit (metallocenes), and they were doing essentially SAR on their catalysts, albeit with different properties than the cLogP and #RotationalBonds that a medchemist is used to. 
I'd be fascinated to learn what other non-medchem chemists use similar techniques, be it battery development, paints, stuff to keep mollusks from growing on your boats, or maybe process chemistry. What tools do people use (Excel, Spotfire, JMP)? And what properties do they use in lieu of the traditional cLogP, etc.? 
I think in process development, there are a variety of data tracking tools (and various Design of Experiments software packages). I think I'm still a fan of Excel, but I am probably stuck in the Dark Ages. I think most research-intensive organizations do some sort of multi-parameter optimization, but I dunno what kind of parameters they use, and how they keep track of all the data. Readers? 
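For the curious, the "multi-parameter" sorting that Spotfire or Excel does can be sketched in a few lines of Python with pandas. The compound names and property values below are invented for illustration; the point is just the filter-then-rank loop a medchemist runs visually in a scatter plot:

```python
# A minimal sketch of Spotfire/Excel-style multi-parameter sorting,
# using pandas and a toy SAR table (all values hypothetical).
import pandas as pd

compounds = pd.DataFrame({
    "compound":  ["CJ-001", "CJ-002", "CJ-003", "CJ-004"],
    "cLogP":     [2.1, 4.8, 3.2, 1.5],
    "rot_bonds": [4, 9, 6, 3],
    "IC50_nM":   [120.0, 15.0, 48.0, 560.0],
})

# Filter on property windows, then rank by potency -- the same
# slice-and-sort a Spotfire plot automates visually.
leads = (compounds[(compounds["cLogP"] < 4) & (compounds["rot_bonds"] <= 6)]
         .sort_values("IC50_nM"))
print(leads)
```

Swap in conversion, selectivity, or cost per kilo for the property columns and the same pattern covers catalysis or process work.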


  1. I'm in a process group at a CMO and we generally use statistical analysis for screening and optimization of reactions; I guess the parameters would be anything in the realm of chemistry and cost/time. We use the Umetrics software MODDE and it seems to fit our needs. I've also used JMP and consider that a good software package. We don't do much PCA for discrete variables, but it is an interest I am pursuing. Any user recommendations on that front?

  2. I'm a research scientist at D-O-W and we use JMP for most statistical analysis. Six Sigma certification requires demonstrated proficiency in JMP. JMP also has some pretty good DOE capabilities. DOE works fairly well for optimization and formulation work, but I'm still not convinced of its utility in exploratory basic research.

    1. I like JMP for the data visualization tools as well. I admit, I haven't tried Spotfire, but JMP makes building graphs really easy.

  3. Back in the day, at a high-throughput screening company that no longer exists, I used Spotfire for visualizing data from high-throughput screening experiments (homogeneous and heterogeneous catalysis). We had an in-house searchable database as well. It worked fine, but we usually exported the data into Spotfire to visualize and prepare presentations. I found it especially useful for het cat, since there was a ridiculous amount of data associated with each experiment (e.g. conversion at 10 temperature points for multiple feeds). We also used Excel. Formal DOE wasn't as common back then (2000-2010) as it is now, so we did not use it.

  4. In my pharm dev group at a mid-size pharma, we use JMP for all statistical analyses from DoEs, and (by executive decree) Prism for routine data visualization and basic statistical tests. And of course, day-to-day calculations go through Excel.

    Most of the Prism work is done for drug product development, so trending CU/BU of tablets across a manufacturing run, and especially for reporting results from dissolution testing.

  5. We have JMP, but we don't have any expertise on its use in the company. We use R for DoE work, stats, graphing, and data sorting, and we also have the usual Excel.

  6. Vortex for most of the SAR type work. Love the support for python scripting to integrate other tools.
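Since several commenters bring up Design of Experiments for reaction screening, here is a minimal sketch of what the simplest such design (a full factorial) looks like, in plain Python with no dependencies. The reaction factors and levels are invented for illustration; tools like MODDE or JMP generate designs like this (and far more economical fractional ones) automatically:

```python
# Toy full-factorial screening design of the kind DoE software builds;
# the factors and levels below are hypothetical examples.
from itertools import product

# Two-level screening factors for an imaginary coupling reaction.
factors = {
    "temp_C":     [25, 60],
    "equiv_base": [1.1, 2.0],
    "solvent":    ["DMF", "MeCN"],
}

# Full factorial: one run for every combination of levels (2^3 = 8 runs).
names = list(factors)
design = [dict(zip(names, levels)) for levels in product(*factors.values())]

for i, run in enumerate(design, 1):
    print(f"run {i}: {run}")
```

The appeal of the formal design is that fitting a model to all eight runs separates the effect of each factor (and their interactions), rather than varying one factor at a time.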