Connect with CINECA - Amanjeev Sethi

Meet the CINECA team and see the people behind the scenes as part of our Connect with CINECA series.


Aman is a member of the CanDIG team in Toronto, Canada, and a contributor to CINECA Work Package 2 (WP2). He works on privacy in federated and distributed systems. His formal training is in computer science, and he has spent most of his working years as a scientific software engineer. To CINECA WP2 he brings expertise in CanDIG's requirements for the Authentication and Authorization Infrastructure (AAI) needed to share data across Canadian institutions and, eventually, across the globe. [See Michal Procházka’s blog on Integration of new cohort infrastructures to the ELIXIR AAI here].

What are you particularly passionate about in your research?

As a scientific programmer I have been building tools, systems, and infrastructure to support the research teams working in genomics. As a result, I have had the opportunity to move around different areas to solve a variety of problems. At the same time, I enjoy some of the problems unique to the intersection of computer science and genomics. Being in the genomics field allows me to learn about scientific progress and provide my expertise to facilitate it.

To me, the essential aspect of my work is improving the state of scientific software development and infrastructure, whose requirements and needs differ from those of most commercial counterparts. What interests me is capturing that set of requirements and automating whatever we can automate. I am also heavily interested in privacy and security within the distributed systems that work with human genetic data.

Have you changed the way you work over the years? If so, how?

Over the years I have learned these gems, and they have changed my outlook and work ethic:

“People systems” matter more than “Machine systems”

In the field of software development, it is possible to lose sight of why we are building something. The way we organise our groups and the way we hire into our teams contribute in no small degree to what gets built. See Conway’s Law.

“Readability matters, almost always, more than the clever solution”

We deal with complexity five out of seven days, at least. This is why many of us fall prey to complex thinking, and it shows up in our writing, whether it is software, documentation, or research. Now I try to write simpler code and use tools that lower the barrier to creating documentation. At the same time, knowing the various ways to document your work is one of the best tools in my arsenal.

If someone was about to start in your field, what are the top things they should know?

  • Computer Science, as a field, can be acutely mathematical, specifically at the research level.

  • Tools of the trade are still evolving, and there is no one-size-fits-all. Take programming languages as an example: knowing the strengths and weaknesses of the various languages helps in problem-solving and enables you to choose the appropriate language for a given task. Similarly, understanding how data is represented in your program (data structures) and how your solution operates on that data (the algorithm) is an excellent asset in finding solutions.

  • Distributed computing is a sub-field within computer science. It concerns interconnected (networked) computers, likely separated geographically, working together to solve problems and produce results. There are many flavours of distributed computing, but amusingly the most crucial point that ties all of them together is that the speed of light is finite. You see, if the speed of electromagnetic waves were infinite, then communication between the computers would likely be instantaneous. Granted, if that were the case, we would not recognise the world we live in (I hear some physicists gasp aloud), but at least we would probably not have to worry about inventing algorithms to keep two talking computers consistent with each other.

  • One has to understand which parts of the system are not a computer problem. At CanDIG, our focus is on the controlled participation of two or more large systems in sharing and crunching data. The problem we are solving is not only profoundly technical but also organisational. Think about data sharing between two or more institutions: these institutions need a way to encode trust so that the machines can work in an automated fashion for Dr. Doe, who works at one of them and wants to analyse data from the other participating institutions. That is, for Dr. Doe, the data and the analyses should seamlessly converge. However, the code-of-trust has to be established by humans first.
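To put a rough number on the speed-of-light point in the list above, here is a minimal sketch in Python. The 6,000 km distance is a made-up illustrative figure, not a measurement between any real sites, and the function name is hypothetical. The calculation uses the speed of light in vacuum, so it is only a lower bound; real network links are slower.

```python
# Speed of light in vacuum, in metres per second (exact by definition of the metre).
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def min_round_trip_seconds(distance_m: float) -> float:
    """Lower bound on the time for a message to travel there and back.

    Assumes a straight-line path at the vacuum speed of light,
    which no real network achieves.
    """
    return 2 * distance_m / SPEED_OF_LIGHT_M_PER_S


# Hypothetical example: two data centres 6,000 km apart (assumed distance).
ASSUMED_DISTANCE_M = 6_000_000
delay_ms = min_round_trip_seconds(ASSUMED_DISTANCE_M) * 1000
print(f"Minimum round trip: {delay_ms:.1f} ms")
```

Even in this idealised case a single question-and-answer exchange takes tens of milliseconds, during which either computer may already have changed its data. That gap is why algorithms for keeping distributed systems consistent are needed at all.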

Do you have any predictions for the future of your research area?

One of the predictions that I can make without hesitation is that more of the students and researchers in computer science will care about the ethics of what we study and create than was perhaps traditionally the case. As more and more fields become -omics, the need to incorporate ethical computing becomes critical. This is especially true for scientists dealing with biological data of human beings. We are already witnessing institutions like Harvard planning to add ethics courses to their computer science curriculum.

What are some of the questions that people should be asking you, but aren’t?

People involved in software engineering within the scientific community should be asking how newer, safer programming languages, particularly those that are both strongly and statically typed (with type inference, of course), can help create more robust and reliable applications. Languages such as Python let us develop a proof of concept blazingly fast. However, because Python does not require variable types to be declared in advance, some bugs are harder to identify in Python programs than in programs written in a statically typed language.
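As a small illustration of this point, the sketch below shows the kind of bug that Python reports only when the faulty line actually runs. The function names and the read-depth values are hypothetical, chosen just for the example. Adding optional type hints lets an external static checker such as mypy flag the same mistake before the program runs, though Python itself does not enforce the hints.

```python
from typing import Sequence


def mean_depth(depths):
    """Untyped version: nothing warns a caller who passes strings."""
    return sum(depths) / len(depths)


def mean_depth_typed(depths: Sequence[int]) -> float:
    """Same logic with type hints that a static checker can verify."""
    return sum(depths) / len(depths)


def string_input_fails_at_runtime() -> bool:
    """Return True if passing strings is detected only when the call executes."""
    try:
        # Values read from a text file are strings unless converted first.
        mean_depth(["30", "42"])
    except TypeError:
        return True
    return False


print(mean_depth_typed([30, 42]))
print(string_input_fails_at_runtime())
```

A static checker would reject a call like `mean_depth_typed(["30", "42"])` without running anything. In a long analysis pipeline this matters, because the faulty call might otherwise be reached only hours into a job.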