Wednesday, September 6, 2017

Should chemistry majors learn to code?

...Software better geared to those earning chemistry degrees or conducting research is readily available. Common examples include MATLAB, Python’s SciPy stack, and GNU Octave. The last two are free, open-source packages. So why haven’t these become standard tools taught to all undergraduate chemistry majors? A key barrier to their adoption is that each of these packages requires some level of programming ability ... and computer programming is not part of the standard training for chemists ... but it should be. 
Learning computer programming is an invaluable skill for chemists, as it empowers them to do more with collected data and, ultimately, to be more efficient and effective scientists. A competitive edge today doesn’t necessarily go to the person who can collect the best data but to the person who can best process and analyze the data collected. This nuance involves automating repetitive and time-consuming tasks, mining large data sets that don’t fit well in spreadsheets, and extracting information and trends too subtle or complex for people to discern without computers. 
I recently chatted with a fellow chemist who had interviewed for a job at a company. The interviewer asked whether this chemist knew how to program. The company maintains an extensive database of its research results obtained over the years, and research managers want their team members to have computer programming skills so that they can access this data and use it in their ongoing research. The interviewer also pointed out that computer programming is a skill that, unfortunately, most chemists joining the company do not have....
As always, I view claims about employability at arm's length. Until Professor Weiss can show me data indicating that chemists with coding experience are hired more often or with higher salaries than those without, I will view his assertion with some skepticism. Still, it seems very likely to be true.

In any case, I don't think anyone would strongly argue against Professor Weiss' suggestion that chemists learn how to code. I think the far more substantive debate would be: what in the 4-year chemistry curriculum should be dropped in favor of coding courses?* Readers, what say you?

*My suggestion: add a coding module to each of the laboratory courses in traditional 4-year programs (i.e., general, organic, physical, analytical). Note: I Am Not A Chemistry Professor.
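Here is a minimal sketch of the "automating repetitive and time-consuming tasks" Professor Weiss describes (commenter 10 below mentions calibration curves). This short Python script fits a linear calibration curve by ordinary least squares and back-calculates unknown concentrations. It assumes a linear instrument response. The standard and sample values are hypothetical numbers for illustration, not real data.

```python
"""Minimal sketch: fit a linear calibration curve and back-calculate unknowns.

Assumes a linear response (signal = slope * concentration + intercept).
All numbers below are hypothetical, for illustration only.
"""


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept, r_squared)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination from residual and total sums of squares.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r_squared


def concentration(signal, slope, intercept):
    """Invert the calibration line to estimate concentration from a signal."""
    return (signal - intercept) / slope


if __name__ == "__main__":
    # Hypothetical standards: concentration (mM) vs. absorbance.
    std_conc = [0.0, 1.0, 2.0, 4.0, 8.0]
    std_abs = [0.002, 0.101, 0.199, 0.402, 0.798]
    m, b, r2 = fit_line(std_conc, std_abs)
    print(f"slope={m:.4f} intercept={b:.4f} R^2={r2:.4f}")
    for a in [0.150, 0.300, 0.650]:  # hypothetical unknown samples
        print(f"A={a:.3f} -> c={concentration(a, m, b):.2f} mM")
```

Once written, the same script can be rerun on every new batch of standards. In practice, NumPy's `polyfit` or SciPy's `linregress` would do the fit in one call. Plain Python is used here so the sketch has no dependencies.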

41 comments:

  1. As a chemist who happens to be a self-taught coder,* I'd advocate for proper stats training ahead of coding any day.


    *(I had the inclination; it seems most chemists don't. If they did have the inclination, then surely they'd have been more likely to go into computing instead of chemistry in the first place.)

    1. ErrHuman: I was going to post here to say that statistics would be far more useful and important, but you beat me to it. Glad to see your post made it to the top as well!

    2. Hear, hear! Med chemist here, and not a week goes by that I don't wish I had more stats knowledge. Maybe it's just me: I thoroughly enjoyed calculus, but I could have stopped soon after diff eqs in favor of a semester or two of statistics.

    3. Learning to code and having a good knowledge of statistics are not mutually exclusive; these days the combination is often called a "Data Scientist." I have spent over 30 years post-Ph.D. in the analytical division of what used to be a very large company. I taught myself to code because I quickly discovered that even the best vendor software only did at most 80% of what I needed, and it was that last 20% that made me productive. Automation makes the analyst's life better, removing the tedium and minimizing manual data entry errors. Automation typically needs to be tweaked/tailored from project to project.

      I highly recommend the Software and Data Carpentry curriculum to those looking to get started. Coursera has some good classes that are free if you don't care about certificates. I find both the R (a statistical processing language designed by statisticians for statisticians) and the Python communities to be welcoming and collaborative. I agree with those who emphasize scripting analyses and using version control. I have selfish reasons, summarized by a quote first attributed to Karl Broman: "Your closest collaborator is you, six months from now, and you don't answer email." Having well-documented analyses under version control and a compendium of the data from a project really helps. I use both Rmarkdown/Rstudio and Jupyter notebooks (mainly for Python, but one can use R or Julia) for what Donald Knuth calls "literate programming," in which the report reproduces the analysis.

      Often a client will return and say something like, "Remember that analysis you did for me a couple of months ago? I want you to repeat the analysis and compare the results." Having code under version control and access to a data compendium makes life a lot easier.

      Open-source software is really valuable. Somebody probably has a good start on what you want to do that you can modify and contribute back. If you're not sure how an algorithm works, you can "use the Source." You also don't have to wait for a feature to get on the manufacturer's radar screen before it gets developed.

  2. You do not need programming skills to "access data"; maybe you need database searching skills.

    That said, if the question is "Is knowing how to program better than not knowing how to program?", the answer is obviously yes. The real question is "Is learning how to program the best use of your time at this point in your career?"

    1. Maybe their commitment to open data access was enough to get the data into an SQL database, but not enough to have an IT guy put together a frontend?

    2. Database queries ARE code, written in a database query language. You can get by with mere "database searching skills" only if someone more knowledgeable than you has already put the raw data into a searchable format (because it doesn't magically come that way) and provided a convenient search interface. You might be fine with hiring someone else to do that sort of thing for you. Others might prefer to learn how to do it themselves, especially if there is employer demand that translates into improved career opportunities. Anecdotally, it appears that such demand at least exists.

  3. Chemists who can code will be happier in their careers, like Dilbert is.

  4. At McGill University in Canada, where I did my bachelor's degree, learning to code in MATLAB is a mandatory part of the undergraduate chemistry program.

    1. More specifically, it was part of the third-year instrumental analysis lab courses.

    2. Another McGill Chemist (September 6, 2017 at 9:09 AM):

      We had to build and code our own interface for (IIRC) a potentiometric titration experiment, in that same class. Except in our case, it was in Turbo Pascal (those were the days...). Today they'd likely do something like that in Python.

    3. Not in the major, per se, but we had to use Maple for our required three semesters of calc at [mid-tier Ivy].

  5. I know it sounds cheesy, but I'd incorporate some kind of "soft skills" training into the curriculum, rather than coding or programming. I've seen many friends and colleagues get good jobs in areas only tangentially related to chemistry, and they all secured their positions based on their ability to communicate well and collaborate with others, and on their positive attitude. The nuts and bolts of the job came later.

    1. That should naturally happen *outside* the major in any well-designed undergraduate curriculum.

    2. I think the problem with 'soft skills' is either you learned them in kindergarten or you have no hope of learning them.

  6. Basic programming is a valuable skill, and was part of my undergrad curriculum (also in the context of an analytical lab course). I think there is a lot to be said for dropping one of the advanced math courses, like differential equations, and replacing it with a course that covers stats and basic programming.

  7. I attended a small state school in the US where STEM majors were REQUIRED to have a minor to graduate. If you were a biochem major, biology was the obvious choice, but as a chem major you were not required to take any bio classes... However, you were required to complete all of calculus, so minoring in math was recommended... Unfortunately, stats was not part of the math college and could not be substituted for upper-level math to earn the math minor.

    Long story short: stats would be a better required class than upper-level math at my old institution, or as part of the 'math core' at any institution.

  8. At least during my postdoc in physical chemistry, it seemed like everyone had picked up Python in grad school or was learning Python because they realized they needed it. I know I ended up using it a lot, especially on a project where the data we had was too big for a spreadsheet. It probably would be helpful just to teach this in the undergraduate curriculum, since it is a transferable skill. Similarly, more emphasis should probably be placed on statistics, since several people I knew (including PIs!) didn't even have a grasp of introductory statistics and probability.

    Of course, I recently switched careers to data science, so I found that these skills were more useful in a different field. :)

  9. Every 30-year-old programmer working in the Blockchain/Distributed Ledger field is a millionaire, so that might be one way to think about it.

  10. Chemist + self-taught programmer here, currently in grad school. I don't think I will ever be hired just based on my programming skills. On the other hand, I was able to automate a few simple tasks in our research group that would not have been possible otherwise: calibration curves, data analysis, and so on. It's a life skill. It saves me time every single day. I feel better about myself for not doing the same lather-rinse-repeat cycle that most of my fellow chemists do. I can make visually appealing, publication-quality plots in a matter of minutes. Ten years down the line, machine learning algorithms will be used in every facet of our lives. You can choose to jump on the bandwagon either now or later.

    It burns my insides to see fellow chemists still using Excel spreadsheets day in and day out: 1. as a data storage format, 2. as a mathematical tool and 3. as a data visualization tool. It really is time to change. Jupyter notebooks have made coding as simple as it can ever be. An introductory Python course must be given as much importance as a general chemistry or an introductory physics course. One major issue to consider: I know at least a few senior professors who are highly averse to the idea of introducing any aspects of programming in their research ("I don't understand what it is; I don't want to learn it; I can live without it; hence I don't want to change"). It might be very difficult to convince them to introduce minor programming modules in the classes they teach (understandably so). However, younger professors and instructors might be more open to such ideas, and introducing them in laboratory courses (as you suggested) might be an easier option.

    1. "It burns my insides to see fellow chemists still using Excel spreadsheets day in and day out: 1. as a data storage format, 2. as a mathematical tool and 3. as a data visualization tool."

      I resemble that remark. Thanks for the thoughts.

    2. My grad advisor, who started his academic career circa 1994 directly after his postdoc, was very much against the use of PowerPoint for all presentations, preferring either overhead slides via that awful light table thing or giving talks freehand on a chalkboard/whiteboard. I might expect this from an older prof, but never from one in his 40s at the time.

    3. "It burns my insides to see fellow chemists still using Excel spreadsheets day in and day out: 1. as a data storage format, 2. as a mathematical tool and 3. as a data visualization tool."

      Especially once you consider that many recent, infamous reproducibility debacles were caused by difficult-to-track formula errors in Excel. Bugs still occur otherwise, obviously, but they're much easier to notice and test for in normal programming languages than in nasty, ugly Excel formulas like =B4*C10/(X32 - AB3).

  11. We were also required to program in MATLAB in our required analytical and chemoinformatics courses. And, much like anything else, if you don't use it, you lose it. While I use different aspects of every other "mainstream" required chem-major course (organic, inorganic, pchem, biochem, analytical, etc.) in my daily job, I couldn't do a thing in MATLAB if I tried. Those interested in going into programming will take courses outside the chem major. Those who later decide it's a necessary/desirable skill set will teach themselves. I don't think requiring a dedicated course at the expense of something else is a great idea. A brief introduction does not provide enough of a foundation/skill set for it to stick.

  12. Yes, if anything it would enable them to go for careers in software engineering since there aren't any careers in chemistry anyway.

  13. As a biologist who learned to code in Pascal for an elective, I can say that my Pascal skills have been... slightly helpful? for one or two projects.

    Things I would consider higher priorities:
    1) Anything that builds "soft skills", of which I have a deficit;
    2) More stats training;
    3) How to social-engineer an NIH study section;
    4) Anything that builds time management skills, of which I also have a deficit;
    5) Grant writing skill development;
    6) Lab accounting/budgeting skills;
    7) OK, this is about where I'd put coding in a particular language.

    1. No, you had it at #2, during which you will learn and use R.

    2. Unless your university has a site license for SAS (or another package)....

  14. I'm a chemist with a master's degree, in my early 60s and still employed (thank goodness!)...

    Programming has always been a hobby... but one I have used on EVERY job I've ever had.

    Back when I took calculus freshman year of college in the '70s, they made us learn enough BASIC on our own to be able to do numerical integration (on a mainframe, with a keypunch)...

    I went to Northeastern, and on my co-op job I wound up writing BASIC code to process some GPC data (input from a digitizer) on a desktop "calculator" (http://www.hpmuseum.org/hp9830.htm).

    One year I needed to fill an elective slot, so I took a beginning Fortran course... It met at 8 AM, and since I was not a morning person I rarely made it to class... but since I already knew how to code, I got an A anyway from the projects and the exams.

    I took a course in the Chemistry Department called "Principles Of Electron Instrumentation" where they taught basic electronics and how to code a 6502 8-bit microprocessor in assembler!... I guess in case we needed to build our own instruments.

    While I have written programs on every job I have had, none involved really hardcore, sophisticated data crunching; they were mainly things to streamline workflow, database work, and reporting, some of which were quite large projects used by others in the company outside my department. I have also been involved with coding for custom instruments...

    The languages/environments I have used at work are BASIC, Fortran, Pascal, VGL (the programming language in SampleManager LIMS), Datatrieve (VAX-based), AppleScript, LabVIEW and Xojo.

    Because I have not had to do anything hardcore, I never learned Python (which has a lot of scientific packages) and never could get comfortable in C-based languages... It's not the concepts; it's just that my brain HATES C syntax.

    Over the last 20 years I have worked in mixed PC and Mac environments, so Xojo (formerly REALbasic, https://www.xojo.com/index.php), which can compile to Mac, PC and Linux, has been my main go-to language even though it does not have the scientific packages... It uses BASIC-type syntax but is a modern object-oriented language, and it got me over the hump of learning object-oriented programming (OOP).

    I think being able to program has been a big help on all of my jobs...

    That said, I don't have the skills to do it full time and I don't think I would like it... I like being able to pick and choose what programming projects I want to do... and I end up doing most of it on my own time.

    1. I think your 6502 assembler skills would have been useful had you wanted to code for an Apple II; they used 'em.

  15. I've done a good amount of (fairly simple) coding in my graduate lab, mostly to make data analysis faster/easier. I like coding things a lot, because if something isn't working, it's possible to figure out what's wrong and fix it. The satisfaction of building something useful and functional helps me feel productive even when science isn't working.

    I took introductory computer science classes in both undergrad and grad school (not required, just for fun). I don't know if a required course would have been particularly useful... I didn't fully learn the material until I had regular opportunities to use it, at which point I had forgotten most of the specific details of the courses I took anyway. My ideal curriculum would have a little bit of coding in several different undergrad courses, maybe through a combination of instrumentation/data analysis and computational chemistry. I think this would be more effective than a single coding-focused course.

    I hope I can find a job where coding experience is useful. I've been considering trying to go into data science, but I don't think I have enough stats and coding experience to actually get hired for that. I know more than anyone in my lab, but that's not saying much at all...

  16. My two cents:

    It's not just that "chemistry majors should learn to code"; I feel that *all college graduates today should learn to code*. Programming is becoming a fundamental type of literacy these days. Just as all college graduates should be fully literate in English and have some exposure to mathematics (e.g. calculus), all graduates should also have some experience with coding or programming.

    As to *how* to incorporate programming into a typical undergraduate chemistry curriculum, I'm not entirely sure. Like a lot of people here, I took a required undergrad course on MATLAB programming, after which I promptly forgot everything, since we never used it again. My PhD work in synthetic organic chemistry also involved *zero* programming, and other organic chemists here will probably have had similar experiences. In organic chemistry, programming is one of those things that is nice to know, but not at all necessary for success, and it may even be viewed as somewhat of a distraction: is knowing how to program in Java going to get you better separations in your columns? Not really.

    Everything I know about programming came AFTER I finished my PhD. I taught myself programming with online courses, starting with Codecademy, and after I felt I had reached a decent level of competency, I enrolled in a "Data Science" bootcamp last year. Everything I learned was completely orthogonal to chemistry; there's little overlap between training and running a machine learning model using Python/scikit-learn and being able to do asymmetric oxidations at -78 °C.

    If you're doing computational chemistry, then sure, knowing fundamental programming and CS is incredibly important. In experimental synthetic chemistry... I'm not so sure. In my academic experience, programming had limited utility in this part of chemistry, and I think it's time for this part of chemistry to catch up to the modern age as well. Like Anon 3:15 PM says, if you can type print('Hello World!') into a Python interpreter, then congratulations: you know more programming than 99% of organic chemists. But you also know less programming than 100% of professional developers.

  17. As an analytical chemist working in drug discovery DMPK...I'm a self-taught coder -- I use or have used R and Basic in Excel, and I'm working on learning some Python. I've found coding to be very useful and I would highly recommend learning either some Python or R. I don't think it's essential to a career in chemistry by any means, and I don't think it should be part of the undergraduate chemistry curriculum (to add it you'd have to cut something else out).

    With that said, though... having some coding skills gives you an edge, and it makes you better at your job. You can save yourself time down the line (as I have) by writing scripts to automate mundane workflow-related tasks, and you'd be amazed how many more options you have for visualizing or analyzing data using a language like R or Python (R in particular is amazing when it comes to visualizing large, complex datasets). And what about the times when the software package you have doesn't do everything you need it to do? There's no point allowing yourself to be constrained by the limitations of some commercial software package. The world of open-source is surprisingly vast, and in cases where open-source can't do what you need, if you know how to code you may be able to put together your own solution.

    In science, we are in the business of designing experiments, collecting data and figuring out what that data means. For data analysis (the "figuring out what this data means" part), Python and R are IMHO light years ahead of what you can do in Excel. So I think that learning some Python or R will ultimately make you a better scientist. It's not essential, you don't have to have it and I don't think it necessarily has to be taught in undergrad. But if you have time to learn it, I'd recommend it.

  18. Are kids still not being taught basic programming in high school? Why on earth not?

    I find it so weird that schools are always a decade or so behind what is needed (presumably due to availability of teachers and aggressiveness about cutting departments). Like most UK students, I learned French in school, which is almost useless. Painful to think about the hundredfold greater usefulness of Chinese.

  19. Consider how useful a semester of a foreign language really is. Unless you take a skill to a certain level, there's really no pay-off. How much training is Prof. Weiss talking about?

    (Personally, I'm self-taught in a half-dozen programming languages/environments...very poorly in all. I don't even put them on my resume because I'd feel scuzzy about it.)

  20. "This nuance involves automating repetitive and time-consuming tasks"

    That does sound great, and if learning coding would have made column chromatography less cumbersome I'd be all for it (I get that there are automated systems for this, which do save some time, but it's still a mind-numbing hassle; though I haven't run one in more than a decade, I still recoil at the thought).

    In the, let's say, many years since finishing my PhD in organic chemistry, I can't think of a single situation in which coding would have helped me. I can think of a few situations where knowing German would have helped me....

    1. The German language thing is easy enough to fix. Try Duolingo. It's cheeky, and people debate its effectiveness, but it's been a helpful start for me. Together with the Beilstein German-English/English-German dictionary (http://web.stanford.edu/group/swain/beilstein/bedict2.html), you can start tackling some German-language literature. It takes time, but it's a goal that's far from daunting.

  21. This all seems silly to me. Can analytical chemists utilize coding skills to their benefit? Of course. So can other types of chemists. Even synthetic organic chemists could (and do) utilize these types of skills. But to what end? In addition to knowing our job, and the job of our biologist colleagues, and the pharmacologists, and the DMPK/PPDM types, and how to do rudimentary software and hardware repairs on many different types of instrumentation...now we ALSO need to know how to program/code?! When are these other folks going to gut up and learn some chemistry? Why is it always flowing one direction in this industry, right on top of the heads of the chemists? Yeesh. Pardon my frustration...

  22. 1 - Agree about statistics. Interestingly, my econ degree required much more math than my chem degree did. Proving Bayes' theorem isn't all that handy to be able to do, but the rest of the stats I remember has turned out to be pretty useful.

    2 - Data management is a huge issue, especially with instruments that all output their spectra in either .txt or some weird proprietary format. I think one reason Excel persists is that it's relatively inexpensive and fairly easy, and you can use it to get data into a format that pretty much anything will read. I like Origin for nice-looking figures and actual stats. Some folks like Igor, but I've found it really tedious. I haven't tried to use R in a long time; I probably want to get to it eventually.

    3 - I agree with the commenter above who noted that the real question is "should I learn to code in language X at this point in my career?" I think for students the answer is going to be yes more often, as their career trajectory hasn't really started yet. Does that mean unis should start throwing that requirement at their chemistry majors? I've no idea.

    4 - "soft skills" are a constant debate in any field, but honestly given the quality of writing I saw from students and often see in the literature, maybe we should start there.

  23. "Coding skills" at the level of languages such as Mathematica, MATLAB or Python should be standard in any career that involves data and mathematical analysis.

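The reply to comment 10 argues that bugs are much easier to notice and test for in a programming language than in a cell formula like =B4*C10/(X32 - AB3). As a minimal sketch of that point, here is the same formula rewritten as a named Python function with input checks and a small test. The formula comes from the reply. The meaning assigned to each cell (a reading, a dilution factor, and gross and tare masses) is a hypothetical choice made for illustration.

```python
"""Sketch: a spreadsheet formula rewritten as a named, testable function.

The mapping of cells to quantities is hypothetical:
  B4 -> instrument reading, C10 -> dilution factor,
  X32 -> gross mass (g), AB3 -> tare mass (g).
"""


def mass_normalized_result(reading, dilution_factor, gross_mass_g, tare_mass_g):
    """Equivalent of =B4*C10/(X32 - AB3), with explicit input checks."""
    net_mass_g = gross_mass_g - tare_mass_g
    if net_mass_g <= 0:
        # A spreadsheet would show #DIV/0! or a silently negative number here.
        raise ValueError(f"net mass must be positive, got {net_mass_g} g")
    if dilution_factor <= 0:
        raise ValueError("dilution factor must be positive")
    return reading * dilution_factor / net_mass_g


def test_mass_normalized_result():
    # Hand-checked case: 2.0 * 10 / (1.5 - 0.5) = 20.0
    assert mass_normalized_result(2.0, 10, 1.5, 0.5) == 20.0
    # Swapped gross/tare masses should fail loudly, not return a negative value.
    try:
        mass_normalized_result(2.0, 10, 0.5, 1.5)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for non-positive net mass")


if __name__ == "__main__":
    test_mass_normalized_result()
    print("all checks passed")
```

Run directly, the script executes the test. The same test_ function could also be collected by a runner such as pytest. The arithmetic matters less than the names and checks. Swapping the two masses in the spreadsheet version would quietly yield a negative result, whereas the function raises an error.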