Monday, March 28, 2016

Is it a big deal to switch from Fortran to C++ and Python?

The article about a computational comeback in drug discovery was a good read (C&EN, Jan. 25, page 19). I do agree that graphics processing unit (GPU) technology has made computer calculations faster in drug design. Using GPU technology in conjunction with advanced computer languages is really how the computer systems work. 
One benefit to computer modeling is using scientific computer languages such as C++ and Python, which have made the calculations of advanced chemical mathematics much faster. Previously, the use of the Fortran language was acceptable for chemical math equations that were simplified. Continued advancement in computer hardware and languages will spur growth for future years. 
Mike Renier
South Range, Mich.
I wish I could comment intelligently on this, but I can't.  


  1. People still use Fortran....?

    And yes, Python in particular makes it much easier to process data. It's an interpreted language, though, so I wouldn't necessarily call it "fast". It is much faster to prototype code in Python than write something in C++ and then spend days debugging it, though.

    1. That's why Python is my go-to language these days. Much faster for me (so many built-in types as part of the language for one thing). A lot of my stuff is scripts to process data files.

      I don't do computational work, but it looks like C++ has become more popular, with libraries like Boost and others. Unlike FORTRAN (I think that means FORmula TRANslator), C++ wasn't designed for science and engineering so it takes more work to go from the "problem domain" to the "programming domain".

  2. Yep, people still use Fortran. It's still unbeatable for fast number crunching and new compilers are excellent.
    And, being that it has anally-retentive syntax, it teaches good coding practice. ;)

  3. Python is perfect for most projects a scientist or a student may have. There are tens of libraries and packages that make your life very easy. If your project needs original algorithms or heavy calculations/simulations, then Fortran or C++ is the way to go I guess. Python is also easier to learn for people without a CS background (like me) since you don't really need to know what's happening in the background. So, it's fair to say that it's more accessible to chemists and biologists who need quick data analysis or visualizations.

  4. What is "fast"? Fast to write or fast to calculate? If I need raw CPU crunch power, Fortran + assembler for I/O is still the king. If I need to work up a few MB of data, not much can beat Python or Excel/VB.

  5. Python and the statistical processing language, R, have grown in popularity for several reasons. First, both languages provide interfaces to both C and Fortran, making it possible to use well-tested libraries and to move computationally intensive algorithms to highly optimized, compiled code. The second advantage is that both are open source languages. They have vibrant communities that have probably already written a package that does what you need. Don't understand the algorithm or want to modify it? The source is available. Need an upgrade? It's free. You can test it and make it better by reporting bugs and fixes. My management has been supportive and lets me contribute. Most of the code is hosted on GitHub and the maintainers welcome contributors. My experience is that when I ask well-formulated questions that show I have made a reasonable attempt to find and understand the available resources, the communities are very welcoming and supportive.

    I have benefitted greatly from courses on Coursera taught by the Johns Hopkins Biostatistics Faculty (for R) and presentations and tutorials presented at PyCon and PyData that are archived on YouTube and have source files on GitHub. I learn best by example and good examples are a quick Google search away. The whole Software Carpentry approach aims to teach researchers to be competent with these tools and do more reproducible research. And all their material is on GitHub with a very generous license.

    I have spent my entire career (> 30 years) in an analytical division of what used to be an iconic company. We have been under severe budgetary constraints for a decade. I need to be productive using old equipment and a lot of old software. Frequently, repetitive tasks can be scripted and made much more reproducible using these Open Source tools. Long ago I discovered that most vendor software did at most 80% of what I needed - and the last 20% - especially automation - was what made me productive. A colleague once noted that we used automation to reduce tedium and permit analysts to focus on the core questions we were trained to answer, not on an endless cycle of point, click, copy and paste.

    We use the DRY principle - "Don't Repeat Yourself." Key computations are wrapped into functions and packages with unit tests. The script that processes a given data set tends to be short and often is incorporated as a code chunk in an R knitr document written in R Markdown or LaTeX that generates the report on the fly. If I am using Python, it is a Jupyter notebook. The notebook may be shared as HTML or converted to a PDF. What's not to like? For the cases where I have to use PowerPoint (I really dislike this...) I can generate the figures as PNGs. Working on a reproducible path for that, but it is a low priority for me...

    One of my favorite quotes comes from the Johns Hopkins group. I think it originated with Karl Broman - "Your closest collaborator is you six months from now - and you don't respond to email." I put so much effort in this because inevitably a client comes to me and says, "do you remember the analysis you did for me 3 months ago? I'd like you to repeat it on these samples and compare the results." I have worked on so many projects in the interim that I have to say, "I really don't remember. But if you give me my report number I can retrieve my compendium and source code repository and repeat it within 15-30 minutes." 90% of the time I can.
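
As a rough illustration of the DRY workflow described in comment 5 (a key computation wrapped in a function, with unit tests beside it), here is a minimal Python sketch using only the standard library. The function `percent_rsd`, the test class name, and the choice of relative standard deviation as the "key computation" are assumptions made for this example; they do not come from the commenter's own code.

```python
import statistics
import unittest


def percent_rsd(values):
    """Return the relative standard deviation of replicate values, in percent.

    Hypothetical example of a 'key computation' kept in one place so that
    every analysis script calls the same tested code.
    """
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean is zero; %RSD is undefined")
    return 100.0 * statistics.stdev(values) / abs(mean)


class PercentRsdTest(unittest.TestCase):
    def test_known_replicates(self):
        # mean = 10 and sample standard deviation = 1, so %RSD = 10
        self.assertAlmostEqual(percent_rsd([9.0, 10.0, 11.0]), 10.0)

    def test_zero_mean_is_rejected(self):
        with self.assertRaises(ValueError):
            percent_rsd([-1.0, 1.0])
```

Saved in a module, tests like these can be run with `python -m unittest`, and the short per-dataset script or notebook then only imports and calls the function.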

  6. Free Radical (@Free_Radical1) here. From what I have learned while taking up Python:

    There's computation time, and there's developer time. FORTRAN and C languages are "closer to the silicon", compiled, and will do calculations faster than Python. Even using the fancy Python packages such as NumPy and SciPy, which make use of math routines constructed in these faster languages, Python will be slower. However, Python makes it easier for you to cobble together something that works. It also seems to be a popular choice for "data carpentry", where you collect, manipulate, visualize and interpret data (e.g. Pandas).

    I'm learning from experience that even though Python

    1. <-- grr I only half-deleted the last sentence; ignore

  7. Coding is coding, if you are a CS focused person. However, many graduate students leaving physics have passable FORTRAN experience but never learn how to actually code well, as in producing designed and documented source code that is maintainable and robust.

    That aside, FORTRAN to Python is relatively painless. It is straightforward to pick up NumPy libraries and get right into matrix manipulation. Moving into C++ is a bit tougher as the point of going right to C++ rather than canonical C would be to use the object-oriented nature (sure, there are other non-OO aspects that are useful).
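
Several comments above separate computation time from developer time and note that Python leans on routines written in compiled languages. The sketch below is one small way to see that effect with only the standard library: it times an explicit Python loop against the built-in `sum()`, whose loop runs in compiled C in CPython. NumPy applies the same idea to whole arrays. The function names and the problem size are assumptions chosen for illustration, and the timings will differ from machine to machine.

```python
import timeit


def total_loop(values):
    """Add up values one at a time in an interpreted Python loop."""
    total = 0
    for v in values:
        total += v
    return total


def total_builtin(values):
    """Add up values with built-in sum(); in CPython its loop is compiled C."""
    return sum(values)


def compare(n=1_000_000, repeats=5):
    """Return (loop_seconds, builtin_seconds), the best of `repeats` runs each."""
    values = list(range(n))
    # Both approaches must agree before their speed is worth comparing.
    assert total_loop(values) == total_builtin(values)
    loop_s = min(timeit.repeat(lambda: total_loop(values), number=1, repeat=repeats))
    builtin_s = min(timeit.repeat(lambda: total_builtin(values), number=1, repeat=repeats))
    return loop_s, builtin_s


if __name__ == "__main__":
    loop_s, builtin_s = compare()
    print(f"explicit loop: {loop_s:.4f} s")
    print(f"built-in sum:  {builtin_s:.4f} s")
```

Both versions take one or a few lines to write, so the developer time is about the same; the difference is only in where the loop actually runs.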