One of the oldest debates among programmers, especially those who specialize
in a particular language, is over what counts as ‘the right tool for the job’.
Programming languages have evolved over time, and the few languages used in
the early days of the industry have grown into hundreds, including languages
specialized for data processing, hardware, concurrency and more. Choosing the
right language for a project can be difficult, especially since many
programmers are familiar with only a few languages. Most people would agree
that programming languages were designed with different goals in mind.
However, some languages affect not only the coding process but also the
resulting product: its quality, how easily it can be maintained and how
easily it can be tested.
Computer scientists at the University of California, Davis have
published a study on the effect of programming languages on software quality.
They studied a large data set from GitHub: 729 projects with 80 million
source lines of code, 29,000 authors and 1.5 million commits in 17
languages. They used a “mixed-methods” approach, combining regression
modeling with visualization and text analytics, to study how language
features such as static vs. dynamic typing and strong vs. weak typing affect
software quality, while controlling for effects such as team size, project
size and project history. This let them compare how defect-prone projects in
different languages were over the course of their histories. The research
paper concludes that languages have a significant but modest effect on
software quality, which seems like common sense to any programmer.
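Controlling for project size matters because a large, busy project will accumulate more bug fixes than a small one regardless of language. A crude way to see this is to compare bug-fix commits per commit instead of raw counts. The minimal Python sketch below does only that; the paper itself fits a regression model with several controls, and the class name, language labels and numbers here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProjectStats:
    """Hypothetical per-project summary: language, total commits, bug-fix commits."""
    language: str
    commits: int
    bug_fix_commits: int


def defect_ratio_by_language(projects):
    """Return {language: bug-fix commits / total commits}, pooled across projects."""
    totals = {}
    for p in projects:
        commits, fixes = totals.get(p.language, (0, 0))
        totals[p.language] = (commits + p.commits, fixes + p.bug_fix_commits)
    return {lang: fixes / commits for lang, (commits, fixes) in totals.items()}


# Made-up numbers: lang_a has more bug fixes in absolute terms only
# because its project is much larger.
projects = [
    ProjectStats("lang_a", commits=4000, bug_fix_commits=400),
    ProjectStats("lang_b", commits=500, bug_fix_commits=100),
]

for language, ratio in defect_ratio_by_language(projects).items():
    print(f"{language}: {ratio:.2f} bug fixes per commit")
```

Raw counts make lang_a look worse (400 vs. 100 fixes), while the per-commit ratio makes lang_b look worse (0.20 vs. 0.10), which is the kind of reversal that controlling for size is meant to catch.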
For example, advocates of strong static typing argue that checking types at
compile time catches bugs early. Advocates of dynamic typing may counter that
a lot of time is spent fixing compile-time type errors when the language could
instead work out types at runtime. On these questions, the paper finds that
strong typing is modestly better than weak typing, and that functional
programming is somewhat better than procedural programming.
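A small Python example illustrates both distinctions. Python is dynamically typed, so the type bug below goes unnoticed until the faulty branch actually runs; it is also strongly typed, so it raises a TypeError instead of silently converting the number to text (a weakly typed language such as JavaScript evaluates "items: " + 3 to "items: 3"). The function `describe` is a hypothetical illustration, not code from the paper.

```python
def describe(count: int, verbose: bool = False) -> str:
    """Hypothetical helper with a latent type bug in the verbose branch."""
    if verbose:
        # Bug: str + int. A static checker such as mypy would be expected to
        # flag this line before the program runs; Python only raises when
        # this branch actually executes.
        return "items: " + count
    return f"items: {count}"


print(describe(3))  # prints: items: 3

try:
    describe(3, verbose=True)
except TypeError as exc:
    # Strong typing: Python refuses to coerce the int to a str implicitly.
    print("runtime TypeError:", exc)
```

If the verbose branch is rarely exercised, tests can easily miss it, which is the static-typing camp's argument in miniature.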
The researchers studied the number of defects and bugs that came up in
projects with similar histories in different languages, and grouped the
defects into categories. They built a keyword search tool to classify the
defects, and also manually sifted through the change logs to find the bugs
that were fixed and compare them with their defect database. They then used
heat maps to study the relationship between defect type and language. The
paper lists the projects studied and the languages they use. Popular code
bases were analyzed, including the Linux kernel, Android libraries, Git,
Bitcoin, Bootstrap, jQuery, Node and others.
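The keyword-based classification and the language-by-category tally behind such a heat map can be sketched roughly as follows. This is a minimal illustration, not the paper's tool: the keyword lists, category names, function names and sample commit messages are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical keyword lists for illustration only; they are not the
# exact lists used in the UC Davis study.
BUG_KEYWORDS = ("bug", "fix", "error", "fault", "defect", "flaw",
                "mistake", "incorrect", "issue")
CATEGORY_KEYWORDS = {
    "memory": ("memory leak", "null pointer", "buffer overflow",
               "double free", "segfault"),
    "failure": ("crash", "reboot", "hang"),
    "programming": ("exception", "type error", "typo", "refactor"),
}


def _mentions(text, keywords):
    """True if any keyword starts at a word boundary (so 'fix' matches 'fixes')."""
    return any(re.search(r"\b" + re.escape(k), text) for k in keywords)


def is_bug_fix(message):
    """Treat a commit as a bug fix if its message mentions any bug keyword."""
    return _mentions(message.lower(), BUG_KEYWORDS)


def categorize(message):
    """Return the first category whose keywords appear in the message, else 'other'."""
    text = message.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if _mentions(text, keywords):
            return category
    return "other"


def defect_matrix(commits):
    """Count bug-fix commits per (language, category): the data behind a heat map."""
    counts = Counter()
    for language, message in commits:
        if is_bug_fix(message):
            counts[(language, categorize(message))] += 1
    return counts


# Made-up commit messages, purely to exercise the functions.
commits = [
    ("C", "Fix buffer overflow in parser"),
    ("C", "Fix crash on empty input"),
    ("C++", "Fix memory leak in cache"),
    ("Scala", "Fix type error in JSON decoder"),
    ("Scala", "Add README section"),
]

for (language, category), n in sorted(defect_matrix(commits).items()):
    print(f"{language:6} {category:12} {n}")
```

Because `categorize` returns the first matching category, a message mentioning both a crash and a memory leak is counted once, as memory. Plain keyword matching is also noisy (a message such as "no bug here" still counts as a fix), which is one reason a manual check of the results is worthwhile.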
The main defect categories found were: 88.53% generic programming errors
that are mainly language specific (exception handling, type errors,
refactoring, etc.), 5% incorrect memory management, and 4% failures
(reboots/crashes). They found that C/C++ was associated with the most
defects and Scala with the fewest.
It is interesting that some languages lead to more bugs than others.
Many programmers are familiar with certain languages based on what they
learned in school and in industry. I think open source is difficult to study
this way, because people may use a different language at work and bring
their habits, and their errors, with them to their open source work. From
experience, I can say that the C/C++ memory management model takes some
getting used to, and someone who works in a different language during the
week is likely to introduce bugs when working on a C++ open source project on
weekends.
Link:
http://macbeth.cs.ucdavis.edu/lang_study.pdf