Hi there!
I’m Maiya, a rising sophomore at the University of Michigan, where I study biochemistry and mathematics. I am also a participant in the BME CUReS Cancer REU here at UT in Professor Pengyu Ren’s lab, where I am supervised by two wonderful graduate students: formally, by Brandon Walker, and informally, by Rui Qi.
My research in Prof. Ren’s lab tidily combines my two fields of interest (mathematics and biochemistry) into the fascinating field of computational molecular modeling. Though I am studying both, I am much more familiar with biochemistry, and while my computer skills are better than, say, my parents’, I had only passing experience with programming and scripting before I arrived. This is where practice has come in.
My mentor, Brandon, emailed me before I arrived in Austin to send a few papers and mention that most of the projects they were envisioning me working on involved a hefty amount of computer programming, mostly in Python. Of the papers he sent, one was biomedical in nature; the other three covered the chemical force field that the lab uses, the general concepts behind molecular dynamics modeling, and an example of using molecular dynamics to model the free energy of binding of water. I was able to comfortably read and understand the first, but the other three required a great deal of Googling, Wikipedia-ing, and wading through long equations rife with unfamiliar symbols and terms like “Fourier expansion”. Lacking programming experience, I dove into the free Codecademy Python course I found, and hoped for the best.
I arrived in Austin and on my first day, Brandon helped me set up a workstation. That seemed straightforward enough, except the lab uses the CentOS Linux operating system, I’d never used Linux, and I had only seen a command line interface when I accidentally opened Terminal on my Mac. He emailed me a protocol and set me to replicating some data on a protein called MELK, a kinase that is upregulated in some cancer cells. I spent a lot of time Googling how to do such-and-such a task and slowly worked my way through the protocol.
One of the first tasks he gave me was writing a script to read a specific kind of molecular coordinate file called a Tinker xyz file and calculate the maximum dimensions of the protein described by the file. I just had to find the most positive and most negative x, y, and z coordinates and then use those to make a ‘box’ that would be filled with water molecules, ions, and eventually the protein. I spent hours just trying to figure out how to read the file, let alone process the data in it. I ended up modifying the file and using a Python module that automatically reads the modified file to achieve my goal.
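A minimal sketch of that kind of script is below. It is not the original script, and it rests on some assumptions: that the first line of a Tinker xyz file starts with the atom count, that an optional second line may hold periodic box dimensions, and that each atom line reads index, atom name, x, y, z, atom type, then bonded atom indices. The function names and the `padding` parameter are hypothetical, chosen only for illustration.

```python
def _is_float(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def parse_tinker_xyz(text):
    """Return a list of (x, y, z) tuples from Tinker-xyz-style text.

    Assumed layout: a header line starting with the atom count, an optional
    box line made only of numbers, then one line per atom.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_atoms = int(lines[0].split()[0])
    coords = []
    for line in lines[1:]:
        fields = line.split()
        # An atom line has a non-numeric atom name in column 2;
        # an all-numeric line is treated as the optional box line.
        if len(fields) < 5 or _is_float(fields[1]):
            continue
        coords.append(tuple(float(v) for v in fields[2:5]))
    if len(coords) != n_atoms:
        raise ValueError(f"expected {n_atoms} atoms, found {len(coords)}")
    return coords


def bounding_box(coords):
    """Return (mins, maxs): the most negative and most positive x, y, z."""
    mins = tuple(min(c[i] for c in coords) for i in range(3))
    maxs = tuple(max(c[i] for c in coords) for i in range(3))
    return mins, maxs


def box_lengths(coords, padding=0.0):
    """Return box edge lengths, with `padding` added on every side."""
    mins, maxs = bounding_box(coords)
    return tuple(hi - lo + 2 * padding for lo, hi in zip(mins, maxs))


# A tiny made-up water molecule, used only to exercise the functions.
SAMPLE = """3 water
1 O 0.000 0.000 0.000 1 2 3
2 H 0.757 0.586 0.000 2 1
3 H -0.757 0.586 0.000 2 1
"""

if __name__ == "__main__":
    atoms = parse_tinker_xyz(SAMPLE)
    print(bounding_box(atoms))
    print(box_lengths(atoms, padding=10.0))
```

Keeping the parsing separate from the box calculation means the same reader can be reused for other tasks that need the coordinates.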
Over the last 5-ish weeks, I have seen a lot of change. My focus has turned from MELK to a protein called Aldolase, an enzyme that interconverts fructose-1,6-bisphosphate with glyceraldehyde-3-phosphate and dihydroxyacetone phosphate as a step in glycolysis. It has shown potential as a cancer drug target because it acts as a regulator of a transcription factor called HIF-1a, which upregulates cell growth and glycolysis and therefore the production of ATP, a source of cellular energy. Targeting Aldolase decreases the production of ATP, which acts as a feedback regulator of HIF-1a; this decreases cell growth and further decreases ATP production.
More tangibly, my programming ability has improved immensely. Though I am by no means an expert yet, I have spent 5 weeks writing a variety of scripts to do little tasks, debugging, improving, and debugging some more.
At the beginning, I knew enough about programming to recognize when a task could maybe be done more efficiently with a short program, but not enough to execute it. Now, I can at least plan out and start writing a script to achieve a task, even if I hit roadblocks along the way.
One of my most recent tasks was writing a program that reads what is called a Tinker arc file (essentially a bunch of Tinker xyz files all strung together), computes and records the distance between specific sets of atoms, and plots them against frames or as a frequency distribution. The arc file is the output of a molecular dynamics simulation and can be thought of as video film: a bunch of snapshots that when played sequentially let you see an approximation of what reality looks like. Like that first program I had to write, this has to read the Tinker xyz format, but in this case, I can’t just modify the file due to the file’s prohibitively large size. Encountering the same issue again, namely, how to read this file type, has given me the chance to put the small pieces of coding skills that I’ve developed over the last several weeks to use and allowed me to tangibly see my own improvement.
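One way to handle a file that large is to read it one frame at a time instead of loading it whole. The sketch below is not the original program. It assumes every frame in the arc file repeats the xyz layout (a header line starting with the atom count, an optional all-numeric box line, then one line per atom with index, name, x, y, z), and the function names are hypothetical. It uses `math.dist`, which needs Python 3.8 or later.

```python
import io
import math


def _is_float(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def iter_frames(handle):
    """Yield one frame at a time as a dict {atom_index: (x, y, z)}.

    Only the current frame is held in memory, so the size of the file
    does not matter. A truncated final frame is silently dropped.
    """
    lines = iter(handle)
    for header in lines:
        if not header.strip():
            continue  # skip blank lines between frames
        n_atoms = int(header.split()[0])
        frame = {}
        while len(frame) < n_atoms:
            line = next(lines, None)
            if line is None:
                return  # file ended in the middle of a frame
            fields = line.split()
            if not fields:
                continue
            # An all-numeric line right after the header is assumed
            # to be the optional periodic box line.
            if not frame and len(fields) >= 2 and _is_float(fields[1]):
                continue
            frame[int(fields[0])] = tuple(float(v) for v in fields[2:5])
        yield frame


def pair_distances(handle, atom_a, atom_b):
    """Return the distance between two atoms (by index) in every frame."""
    return [math.dist(f[atom_a], f[atom_b]) for f in iter_frames(handle)]


# Two tiny made-up frames, used only to exercise the functions.
SAMPLE_ARC = """2 demo
10.0 10.0 10.0 90.0 90.0 90.0
1 C 0.0 0.0 0.0 1 2
2 C 3.0 4.0 0.0 1 1
2 demo
10.0 10.0 10.0 90.0 90.0 90.0
1 C 0.0 0.0 0.0 1 2
2 C 0.0 0.0 2.0 1 1
"""

if __name__ == "__main__":
    print(pair_distances(io.StringIO(SAMPLE_ARC), 1, 2))
```

For a real trajectory, the `io.StringIO` object would be replaced by an open file handle, and the resulting list could be plotted against frame number or as a histogram with a plotting library.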
Every script I write, every time I use a remote secure server host to access the lab cluster from somewhere else, every time I use the terminal interface instead of using Finder on my laptop to make folders or move or rename files, every time I have to check Google or Stack Exchange or a module’s documentation or tutorials…
I know I am improving as a programmer, as an engineer, and as a scientist.
I know I am improving because each little task amounts to practice, and each little snippet of practice sums to growth and brings me a baby step closer to expertise.
-Maiya Yu, University of Michigan