Thursday, March 9, 2017

Ask CJ: how to transition from bench scientist to data scientist?

From the inbox, a good question:
Any idea on how to transition from bench scientist (chem or bio) into a data scientist position? After a 3 year post-doc I’ve started looking at the job market and see a lot more data scientist than “bench”-scientist positions being advertised. Usually with decent companies like Amazon and Google. 
These positions usually say they are looking for people who can analyze data (probably no problem to spin that since that’s the backbone of acquiring a PhD), but they also require expertise in coding. Do you think they are rigidly looking for computer science degrees or would being self-taught or using one of these “free” online code schools be sufficient?
I think they're looking for someone who has a fair bit of experience in coding (or at least the Amazon/Google positions). I suspect that being self-taught but having a lot of experience (i.e. folks in academia who do coding as part of their academic work) would be welcome. I don't know much about various coding languages, but it seems that R (?) is one of the ones that people talk about.

I don't know about the free online code schools, although I do want to point out this Bloomberg News article noting that some of these coding schools charge a lot of money for not-very-successful results.

Readers, you probably have much more experience with this than I - what's your opinion?

UPDATE: Someone we'll call MK writes in with a detailed story about how they followed a path similar to the desired one above:
To introduce myself: I did a physical chemistry PhD (some kind of spectroscopist) and then did something weird: I took a postdoc in [something data/public health/medically-oriented] at a government lab. I stayed in my post-doc for 1.5 years, found a job, and now work as a "biomedical analyst" (not my real title, but very close) in the government. 
Thanks to MK for their contribution.

(Incidentally, one of the things that I have always said is that "STEM is really about TE." I think coding schools are an interesting example of actual T shortage - in other words, people have decided that there is enough demand to set up new, private coding schools to teach skills that employers will pay for. As of yet, I am unaware of anyone setting up "chemistry schools." (Of course, how much of this new commercial activity is actually about extracting federal student loan dollars is another question.))


  1. If you're just getting started with coding, Python or R (or both) would be a good first choice. Whichever you choose, learn how to create graphs as quickly as you can. This might be a little easier in R than in Python, but that's mainly because there are so many choices for how to make graphs in Python. You should also learn how to read, parse, and write text files as quickly as you can. Python is a general-purpose language with libraries for all sorts of things, including many common cheminformatics tasks and just about everything you can do in R, which is more narrowly focused on statistical calculations.

    You'll want to find a comfortable programming environment for each language. There are LOTS of options. For Python, Pyzo is a good choice for beginners on any OS. For Windows users, the WinPython distribution is a reasonably painless way to get started with scientific Python programming. For R, it might be better to just learn to use it from the command line. But RStudio and RKWard are both sound options.

    1. I'd argue that the interactivity and integration of RStudio make it a great IDE. I only use the command line for running scheduled scripts.
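
As a rough illustration of the "read, parse, and write text files" advice above, here is a minimal Python sketch using only the standard library. The file layout (a CSV with `sample` and `absorbance` columns) and the function name `summarize_readings` are made up for this example; they are not from the commenter.

```python
import csv
import statistics
import tempfile
from pathlib import Path


def summarize_readings(in_path, out_path):
    """Read sample,absorbance rows, average the replicates, write a summary CSV."""
    readings = {}
    # Read and parse: group the absorbance values by sample name.
    with open(in_path, newline="") as handle:
        for row in csv.DictReader(handle):
            readings.setdefault(row["sample"], []).append(float(row["absorbance"]))

    means = {sample: statistics.mean(values) for sample, values in readings.items()}

    # Write: one line per sample, sorted by name, mean to three decimals.
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sample", "mean_absorbance"])
        for sample, mean in sorted(means.items()):
            writer.writerow([sample, f"{mean:.3f}"])
    return means


if __name__ == "__main__":
    # Hypothetical data written to a temporary folder so the script runs anywhere.
    workdir = Path(tempfile.mkdtemp())
    raw = workdir / "readings.csv"
    raw.write_text("sample,absorbance\nA,0.50\nA,0.70\nB,1.00\n")
    print(summarize_readings(raw, workdir / "summary.csv"))
```

Once something like this feels routine, feeding the summary file into a plotting library is a natural next step.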

  2. You might want to look at instrument companies, or even chemical companies that are building up a data science group - your skills as a chemist will be useful and you can build up your coding skills and resume. I used to work at a company that employed a lot of laboratory scientists, and also had a large group of programmers. Most of the programmers were chemists/physicists/engineers who'd picked up programming along the way. This was not accidental - they worked closely with the scientists and it was really helpful to have people who thought like we did. From there, many later found jobs at places like Apple, Microsoft, Agilent, etc.

  3. I would imagine that a data science position at somewhere like Apple or Google would prefer a more hard core comp sci/math background. However, there are seemingly lots more openings in Data Science than, say, MedChem, and a lot of those are in companies such as larger biotechs and pharma. You are correct that experience with a few programming environments such as R, Python and SQL is important. Familiarity with some of the off-the-shelf data analytics environments such as Tibco Spotfire or Tableau is also important.

    There are some ok free coding tutorial sites (Codecademy, Khan Academy) but they likely don't have the depth you would need. I would likely do some kind of extension program - I know UCSD Extension has one. You may argue that they don't have great results wrt hiring in the field. And maybe I'm old school, but I would think a certificate program would carry more weight than "I'm self taught on Codecademy".

    Another option would be to look into the vendor side. Often the vendors of the off-the-shelf data analysis applications have openings where you can be peripherally involved in data science, using your science expertise too, and maybe be employed while picking up additional training. It is easier to move laterally into a new field within a company when your academic credentials aren't exactly ideal. For example, here at PerkinElmer (we sell Tibco Spotfire for scientific organizations, including add-ins like Lead Discovery for chemical intelligence) we do hire people in Field Application Scientist roles from time to time - these people do the demonstrations when people are interested in purchasing the software.

  4. There's also this program, which is specifically set up for grad students/postdocs/academics to get into data science. I have a friend that did it, and she had a great experience. There's no cost, and they have great placement rates afterward (companies sponsor the program).

  5. This is hilarious. Life science PhDs refer to anyone who lacks a PhD as "technicians", even those with masters and 5-8 years of college education.

    Then they go on to think that a PhD in life science is a substitute for any other "kind" of education, using the same mechanical thinking mechanism that toddlers develop around the age of 5-6 years old.

    If you get out of school in your early 30s and realize that your line of study is useless, then you wasted your life for no reason. And no, you can't substitute adding alkyl zinc to benzaldehyde for software engineering. That's not how it works.

    This is the part where you should think long and hard about your life direction.

  6. This is something I should comment on, since I was trying to do this (unsuccessfully) last year.

    Firstly, "data analysis" is a meaningless phrase, since it is highly dependent on context. Yes, you do "data analysis" in experimental organic chemistry, but you wouldn't hire a statistician to do organic synthesis just because he/she had experience in "data analysis", right? Same thing.

    Second, while some of these companies (and bootcamps) may say they want "smart STEM PhDs", it really refers to only a few subjects - Math, statistics, physics, computer science, biostatistics, or computational chemistry. I say this from experience since I was rejected by most of the data science bootcamps (Insight, Metis, The Data Incubator, etc.) for having minimal programming experience and the "wrong" PhD - notice that organic chemistry is not in the aforementioned list. I did end up taking a different "data science" bootcamp last year, and the recruiter who came to work with us was highly confused by my resume - I remember him saying "Oh wow, great, you have a PhD, you're really smart! But it's in organic chemistry, which is irrelevant to Data Science...and everything was all experimental, I have no idea what any of this means - what does "synthetic methods" mean? Maybe you should think about hiding your PhD and all that stuff or leaving it out of your resume since it is irrelevant to data science".

    Unfortunately, the skills you have as an experimental chemist are not really transferable to "data science". You'll need an expert-level command over R or Python, including the various libraries (in Python, that would include scikit-learn, pandas, statsmodels, Theano, TensorFlow, etc.), as well as other languages used in production (e.g. Java/C++/C#). It is also necessary to know SQL so that you can query databases and pull the correct information, and a knowledge of Bash scripting won't hurt either. These aren't really things you can pick up in a few weeks at a code school or learn online in a few classes.

    You'll also need to know the CS fundamentals well (e.g. algorithms and data structures), since they are necessary for understanding memory allocation and some of the machine learning methods (K-Means, Random Forest, etc.).

    Also, Codecademy is a great place to start if you have 0 experience in programming/CS, but keep in mind that that's all it is. It's the equivalent of a course to prepare you for general chemistry. Codecademy itself won't give you enough knowledge to become employable in this area.

  7. There are many flavors of data science, and bench scientists will possibly be useful for some of them. I second the suggestion to use Python and R, and the entirety of Adamantane's excellent comment. Jupyter is another option instead of RStudio, and it works for both R and Python. Another thing I would recommend is basic SQL skills. In 2015, I briefly tried (unsuccessfully) to become a data scientist.

    One thing I think bench scientists bring to a "data science" job is an excellent awareness of the difference between controlled experimental data from designed experiments and uncontrolled correlational data collected from "the field". Good data scientists of any stripe understand this distinction of course, but the good are few, and bench scientists can grok it more easily.

    I think if you want to be taken seriously, you should prepare a (small) portfolio of projects that demonstrate your proficiency at solving problems.

    1. The Anaconda Python distribution is another way to get started without having to set up dependencies. It already includes a lot of the packages you need for doing data science.

  8. If you're more of a bench chemist, you're going to need a lot more training if you're going to make the transition between these fields. There are a lot of master's programs out there that might help you make this transition; I think you would need more direct experience in coding and statistics based on your background. A lot of the online options won't necessarily give you anything other than an introductory/general background.

    FYI, I am currently trying to make that transition, but I spent the last 2 years of my (experimental) physical chemistry PhD and the last 3 of my postdoc essentially as a chemistry-flavored programmer. So, I have more experience working with Python, R, C++, Linux, etc. than your typical chemistry PhD. I took some online classes to help fill in the gaps and learn some of the jargon so people would take me seriously. I think you'd need to know more than just "data analysis" to be a data scientist.

    However, I do know some synthetic chemists that have made the transition into data analyst / business consulting roles. These don't necessarily pay as well, but I think you don't need as extensive experience coding to make that transition.

  9. I know a number of Insight graduates, some from biochemistry/synthetic organic backgrounds with little to no experience in coding, but that's the exception--most were theoretical pchemists with lots of coding. My husband made it to the finals for The Data Incubator, which seems to have really good results/placement. The application includes a brutal data audition where you have 3-4 days to complete a bunch of challenges. My husband had been teaching himself Python for 3 months and got through, but by the skin of his teeth. So as Aaron and Adamantane suggest, you will certainly be better set up the more you already know. Their deadline is coming up--worth a shot? Also there are programs that help with the transition to the more consultant-type positions that Aaron mentions; my husband ended up within one such program at a consultancy called ANSER where he is doing some traditional analysis while honing his data science chops. Booz Allen Hamilton has a large department staffed almost exclusively with PhD chemists, most of whom were bench scientists. Basically there are a lot of options to help you along out there--you just have to look hard. Good luck!

  10. That's something I can comment on, since I am making the transition from bench medicinal chemistry to "health" data science.

    First, I agree with many of the aforementioned posts - you need to gain skills and experience in various programming languages (R, Python, SQL, HTML/CSS...). "Free" platforms like Codecademy, General Assembly or Modeanalytics will help you to acquire the basics.

    Second, you should gain stronger skills in coding, machine learning, statistics, and exploratory data analysis using MOOCs. Look at EdX, Coursera, FutureLearn... many are "free", or the certificate comes at a small price.

    Third, apply to Data Science bootcamps. Last year, I applied to various Data Science bootcamps (S2DS, Insight Data Science SF and Boston, and the Data Incubator NYC). I was accepted to the last two, and I went to Insight Health Data Science in Boston, Fall 2016. Those programs are intense and you should be prepared to code for long hours and be able to switch between languages.

    Fourth - Go to meetups in Data Science / Business Analytics near you. Subscribe to DS podcasts and channels, e.g. Data Science Central, to be aware of new techniques - sentiment analysis, deep learning / neural networks with TensorFlow, etc.

    On your job hunt - you might be frustrated not to see Chemistry among other backgrounds like Maths, Physics, Statistics, Bioinformatics and occasionally Chemo-informatics. Recruiters do not necessarily know what a data scientist is supposed to have, and they copy/paste requirements from one application to another. There are still strong assumptions out there that Chemists are purely experimental scientists.

    This is where you will show your knowledge and skills beyond your PhD (no need to remove it) - Build a portfolio (blog posts, GitHub account...) that displays the many flavors of "data science" you are experiencing through professional or personal projects: regular expressions, web scraping, supervised and unsupervised ML techniques... I am currently re-doing my professional website with a portfolio/blog posts. Display your coding skills on GitHub.

    Good luck.

  11. Theoretical/computational pchemist here (PhD). I picked up a fair amount of coding through my graduate research (C, Mathematica, Matlab, etc.), and have also done some work on my own through Codecademy (Python & SQL, in particular). I just wanted to share my extremely limited experience in interviewing for a data science position, because I think it could actually help someone land a job:

    I interviewed for a data scientist position based on a positive interaction I had with a recruiter at my school's job fair. The recruiter seemed genuinely interested in the usefulness of my scientific background, despite my having 0 experience in a formal data science career. [I think this is evidence that there is some hope out there for people looking to make the jump from academia to data science.]

    I made it past a first round of interviews, one with a technical recruiter and another with the hiring manager; however, in round 2 I interviewed with members of this company's data science team and got absolutely grilled on SQL and machine learning. By grilled I mean that they wanted me to dictate the lines of code that I would write to solve example problems that they described to me. I was inadequately prepared for this and it went poorly (completely my own fault), so I did not receive an offer.

    Basically I'm sharing this information because I do think that someone more prepared, i.e., with more SQL practice, could have landed the job in the scenario that I experienced. Good luck!

    1. I would give a +1 to this post, because SQL is something that is more difficult to get experience with as a physical scientist. This is probably going to be your biggest weakness when trying to switch fields, because almost all jobs in data science require SQL experience.
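
For anyone wondering what "SQL practice" might look like without access to a company database, here is a small sketch using Python's built-in `sqlite3` module. The tables (`chemists`, `reactions`), their columns, and the question itself are invented for illustration; this is not the interview problem described above, just the general flavor of a join-and-aggregate exercise.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two small, made-up tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE chemists (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE reactions (
            id INTEGER PRIMARY KEY,
            chemist_id INTEGER,
            yield_pct REAL
        );
        INSERT INTO chemists VALUES (1, 'Ana'), (2, 'Ben');
        INSERT INTO reactions (chemist_id, yield_pct)
            VALUES (1, 80.0), (1, 90.0), (2, 60.0);
    """)
    return conn


def mean_yield_by_chemist(conn):
    """Return (name, n_reactions, mean_yield) rows, highest average first."""
    query = """
        SELECT c.name, COUNT(*) AS n_reactions, AVG(r.yield_pct) AS mean_yield
        FROM reactions AS r
        JOIN chemists AS c ON c.id = r.chemist_id
        GROUP BY c.name
        ORDER BY mean_yield DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in mean_yield_by_chemist(build_demo_db()):
        print(row)
```

Being able to talk through a query like this line by line (what the join does, what gets grouped, what gets averaged) is the kind of preparation the comment above is pointing at.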

  12. A question for those with some experience in this. How much creativity is involved? I have a PhD in organic chemistry. A big reason I was attracted to chemistry in the first place was the creativity involved. Now I'm considering alternative careers and I'd like to find something where creativity is important.

  13. Programs such as Insight do have a very good track record for placing PhDs into jobs. However, they are very selective in who they accept in the first place. One common criticism I have heard is, "if you're good enough to get into Insight, you're good enough to get a job without Insight."

    My personal experience: I applied to Insight 3 times. The first time was after spending a month learning Python. I got rejected without an interview. The second time, I taught myself some data analysis and machine learning packages; I got an interview but no acceptance. The third and final time, I taught myself SQL, took some machine learning MOOCs, and did a few projects on Kaggle. I finally got in, but I turned it down--a company gave me a full-time offer just as I was finishing the final interviews with Insight. Data science is definitely in demand; I would say that my starting salary is 30-50% higher than what it would have been had I gone into industry in chemistry.

    1. Can you tell us what your background is? What kind of computer experience did you have before you started with Python?

    2. Did several years in an inorganic chemistry lab before switching over to computational chemistry to complete my Ph.D. Had very little experience programming before the switch. In computational chemistry, I gained a familiarity with the UNIX environment and with writing short scripts to help process data, but I still started from a very low baseline.

    3. This is a very helpful comment, thank you!


