Hello, Rosalind

Learning about bioinformatics

Posted by David Haley on June 16, 2023 · 3 mins read

#bioinformatics  ·  #software

I like learning new things. My friend Lynn encouraged me to learn about bioinformatics, the convergence of Software (informatics) and Life Sciences (bio). I know a thing or two about software, but my science here dates back to high school and the French Baccalauréat (sort of equivalent to USA AP-level courses). I remember good ol’ ACGT and not a whole lot more.

There’s so much to learn out there; I gather ‘Serious Business’ is done with sophisticated tools such as Nextflow. That’s too advanced for me right now, so I wanted to start simpler.

I found Rosalind, a gamified bioinformatics learning tool with over a hundred problems. Each one introduces a biological concept and presents a programming problem to go with it.

The software problems have been pretty trivial (counting, replacing, and reversing characters). It’s mostly been an exercise getting my environment set up and learning how to Kotlin even. (I chose Kotlin, not the more common Python, because I felt like it and I already know Python.)
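To give a sense of how small these problems are, here is a minimal sketch of counting, replacing, and reversing in plain Kotlin. It uses only the standard library rather than BioKotlin, and the function names are hypothetical, chosen for illustration.

```kotlin
// Hypothetical helper names; plain Kotlin standard library only (no BioKotlin).

// Counting: occurrences of A, C, G and T, in that order.
fun countNucleotides(dna: String): List<Int> =
    "ACGT".map { base -> dna.count { it == base } }

// Replacing: transcription to RNA swaps every T for a U.
fun transcribe(dna: String): String = dna.replace('T', 'U')

// Reversing: read the strand backwards and swap each base for its complement.
// getValue throws NoSuchElementException on anything other than A, C, G or T.
fun reverseComplement(dna: String): String {
    val complement = mapOf('A' to 'T', 'T' to 'A', 'C' to 'G', 'G' to 'C')
    return dna.reversed().map { complement.getValue(it) }.joinToString("")
}

fun main() {
    println(countNucleotides("AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC").joinToString(" ")) // 20 12 17 21
    println(transcribe("GATGGAACTTGACTACGTAAATT")) // GAUGGAACUUGACUACGUAAAUU
    println(reverseComplement("AAAACCCGGT")) // ACCGGGTTTT
}
```

A library like BioKotlin earns its keep later on, when sequences need validation, translation, or file parsing rather than simple string manipulation.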

I’m using BioKotlin, by the Buckler Lab at Cornell University. It’s inspired by Biopython, which has been around since 2000 (!!).

I tried using IntelliJ’s build mode but couldn’t figure out how to add a dependency (I promise, I know what I’m doing… mostly…). So I switched to a Gradle project.

I wanted to easily run solutions against the problem’s sample data (as a constant) or a text input file, so my DNA solution looks something like this:

import biokotlin.seq.NUC
import biokotlin.seq.Seq

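fun main(args: Array<String>) {
  // Read the sequence from the file named on the command line, if there is one;
  // otherwise fall back to the problem's sample data.
  val contents = if (args.isNotEmpty()) {
    java.io.File(args[0]).readText().trim()
  } else {
    "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"
  }

  val seq = Seq(contents)
  // Placeholder: the real solution computes its answer from seq.
  println("the solution")
}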

Honestly, the hardest part was figuring out how to run the darn thing on the command line. In Python (venv) and Ruby (bundle) it’s easy to “just run it” with dependencies loaded up & ready. But this doesn’t work:

$ kotlin -cp build/classes/kotlin/main/ DnaKt
Exception in thread "main" java.lang.NoClassDefFoundError: biokotlin/seq/SeqInterfaces
	at DnaKt.main(Dna.kt:12)
    [...]
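The likely cause is that build/classes/kotlin/main holds only the project's own compiled classes, so the BioKotlin jar that Gradle resolved never makes it onto that classpath. One possible workaround, offered as an untested sketch, is a small task in build.gradle.kts that prints the full runtime classpath. The task name printClasspath is made up for illustration, and the snippet assumes the Kotlin JVM plugin is applied through the plugins block so that the main source set exists.

```kotlin
// build.gradle.kts fragment; "printClasspath" is a hypothetical task name.
tasks.register("printClasspath") {
    // Resolve the file collection at configuration time and only print it at execution time.
    val classpath = sourceSets["main"].runtimeClasspath
    doLast {
        // Compiled classes plus every runtime dependency jar, joined with the platform path separator.
        println(classpath.asPath)
    }
}
```

After a build, something like kotlin -cp "$(gradle -q printClasspath)" DnaKt should then find the BioKotlin classes, though that is clunkier than letting Gradle run the program itself.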

I ended up changing the build.gradle.kts file to this:

application {
    mainClass.set((project.findProperty("mainClass") as String?) ?: "NULL")
}

which allows me to run it like so:

$ gradle -PmainClass=DnaKt run --args="inputs/rosalind_dna.txt"

There’s probably a better way of doing this but here we are. I’ve solved the first 3 problems: DNA, RNA transcription, and reverse-complementing a sequence. Here’s a cool visualization of my “experience points” so far!

Halfway to level 4!