Turns out (despite previous reports) that Perl does not suck the most; JavaScript does. Also, Ruby is by far the hackiest language on the planet ... keep reading to see why.
Methodology
As in the first study, all data were collected from search results retrieved via Google's Code Search. For each target language, three pieces of information were initially gathered:
Total Files
An approximation of the language's footprint in Google's database (and thus its popularity). Determined by one of the following queries: lang:<language-name>, lang:"<language-name>", or file:.*\.ext where ext is the file extension of that language's source code files.
Hacks
Measure of a language's hackiness. Determined by one of the following: lang:<language-name> hack or lang:"<language-name>" hack
Sucks
Measure of a language's suckiness. Determined by one of the following: lang:<language-name> sucks or lang:"<language-name>" sucks
When choosing between two queries, the one with the larger number of hits is kept. For example, there are approximately 4.4 million hits for lang:c, and 4.53 million hits for lang:"c". In this case, the latter number is retained.
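The selection rule can be sketched in a few lines of Python. This is only an illustration: the hit counts are the approximate figures quoted above for C, typed in by hand rather than fetched from Code Search, and the helper name `pick_footprint` is hypothetical.

```python
def pick_footprint(hits_by_query):
    """Return the (query, hits) pair with the largest hit count."""
    return max(hits_by_query.items(), key=lambda item: item[1])


# Approximate hit counts quoted in the text for the two C queries.
c_queries = {
    'lang:c': 4_400_000,
    'lang:"c"': 4_530_000,
}

query, hits = pick_footprint(c_queries)
print(query, hits)  # lang:"c" 4530000
```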
Collected Data
Popular Languages:
Language | Total Files | Hacks | Sucks |
---|---|---|---|
C | 4,530,000 | 224,000 | 11,300 |
C++ | 847,000 | * 2,700 | 3,000 |
C# | 120,000 | 2,000 | 50 |
Fortran | 115,000 | 400 | 20 |
Java | 830,000 | 10,400 | 500 |
JavaScript | * 22,700 | 600 | 100 |
Lisp | * 36,000 | 600 | 100 |
Perl | 208,000 | 14,200 | 400 |
PHP | 580,000 | 14,200 | 300 |
Python | 326,000 | 400 | 300 |
Ruby | 15,600 | 2,000 | 50 |
Shell | 80,600 | 4,000 | 50 |
Visual Basic | * 29,900 | 400 | 50 |
Unpopular Languages:
Each of these languages has a footprint of less than 1,000 total files. These statistically insignificant outliers will not be considered during subsequent analysis.
Language | Total Files | Hacks | Sucks |
---|---|---|---|
Ada | 100 | 50 | 0 |
COBOL | 150 | 0 | 0 |
Pascal | * 600 | 100 | 3 |
Smalltalk | * 400 | 100 | 6 |
* Values for starred entries were collected as follows:
- C++ Hacks: This value is a composite of three queries each starting with lang:"c++" - little\shack (300), dirty\shack (400), and ugly\shack (2,000).
- JavaScript: Searching for lang:"javascript" returns only 200 results, while lang:"javascript" div returns 22,700.
- Lisp file count: As with JavaScript, lang:"lisp" alone returns only 400 results, so lang:"lisp" off was used instead to get a reasonable count.
- Pascal: lang:pascal has only 300 hits, while lang:pascal const has 600.
- Smalltalk: lang:smalltalk has only 100 hits, while lang:smalltalk dir has 400.
- Visual Basic: lang:basic has only 400 hits while lang:basic def has 29,900 hits.
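As a quick check on the composite C++ Hacks value in the notes above, the three sub-query counts can simply be added up. The dictionary below just restates the figures from the C++ note; the variable names are illustrative.

```python
# Hit counts quoted in the note for the three lang:"c++" sub-queries.
cpp_hack_queries = {
    r"little\shack": 300,
    r"dirty\shack": 400,
    r"ugly\shack": 2_000,
}

cpp_hacks = sum(cpp_hack_queries.values())
print(cpp_hacks)  # 2700, the starred value in the table
```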
Inferred Data
To analyze this data, the following metrics are helpful:
Hack Ratio
The number of "hack" results multiplied by 1,000 and divided by the total number of files.
Suck Ratio
The number of "sucks" results multiplied by 1,000 and divided by the total number of files.
For example, the Hack Ratio of PHP is 14,200 * 1,000 / 580,000 = 24.48.
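Both ratios are the same "hits per 1,000 files" calculation, so one small function covers them. The sketch below uses PHP's figures from the table; the function name `ratio_per_thousand` is made up for this example, and rounding to two decimals is assumed to match the tables.

```python
def ratio_per_thousand(hits, total_files):
    """Hits per 1,000 files, rounded to two decimals as in the tables."""
    return round(hits * 1000 / total_files, 2)


# PHP's row from the Popular Languages table.
php_total, php_hacks, php_sucks = 580_000, 14_200, 300

print(ratio_per_thousand(php_hacks, php_total))  # 24.48 (Hack Ratio)
print(ratio_per_thousand(php_sucks, php_total))  # 0.52 (Suck Ratio)
```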
Languages Sorted by Hack Ratio
Rank | Language | Hack Ratio |
---|---|---|
1 | Ruby | 128.21 |
2 | Perl | 68.27 |
3 | Shell | 49.63 |
4 | C | 49.45 |
5 | JavaScript | 26.43 |
6 | PHP | 24.48 |
7 | C# | 16.67 |
8 | Lisp | 16.67 |
9 | Visual Basic | 13.38 |
10 | Java | 12.53 |
11 | Fortran | 3.48 |
12 | C++ | 3.19 |
13 | Python | 1.23 |
Languages Sorted by Suck Ratio
Rank | Language | Suck Ratio |
---|---|---|
1 | JavaScript | 4.41 |
2 | C++ | 3.54 |
3 | Ruby | 3.21 |
4 | Lisp | 2.78 |
5 | C | 2.49 |
6 | Perl | 1.92 |
7 | Visual Basic | 1.67 |
8 | Python | 0.92 |
9 | Shell | 0.62 |
10 | Java | 0.60 |
11 | PHP | 0.52 |
12 | C# | 0.42 |
13 | Fortran | 0.17 |
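
For anyone who wants to re-derive the two rankings, here is a minimal Python sketch. The numbers are copied by hand from the Popular Languages table; the names `DATA` and `ranked` are hypothetical, and ties (C# and Lisp on Hack Ratio) simply keep their table order.

```python
# (total files, hacks, sucks) copied from the Popular Languages table.
DATA = {
    "C": (4_530_000, 224_000, 11_300),
    "C++": (847_000, 2_700, 3_000),
    "C#": (120_000, 2_000, 50),
    "Fortran": (115_000, 400, 20),
    "Java": (830_000, 10_400, 500),
    "JavaScript": (22_700, 600, 100),
    "Lisp": (36_000, 600, 100),
    "Perl": (208_000, 14_200, 400),
    "PHP": (580_000, 14_200, 300),
    "Python": (326_000, 400, 300),
    "Ruby": (15_600, 2_000, 50),
    "Shell": (80_600, 4_000, 50),
    "Visual Basic": (29_900, 400, 50),
}


def ranked(column):
    """Sort languages by hits per 1,000 files; column 1 = hacks, 2 = sucks."""
    ratios = {lang: row[column] * 1000 / row[0] for lang, row in DATA.items()}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


for title, column in (("Hack Ratio", 1), ("Suck Ratio", 2)):
    print(title)
    for rank, (lang, ratio) in enumerate(ranked(column), start=1):
        print(f"{rank:>2} | {lang} | {ratio:.2f}")
```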
Analysis
Graphing Suck Ratio as a function of Hack Ratio gives us an estimate of the value of hackiness as a measure of suckiness in a language.
There is a generally positive trend between the two metrics: languages with higher Hack Ratios tend to also have higher Suck Ratios.
This means one can expect a language with a low Hack Ratio to tend not to suck, and likewise, a language with a low Suck Ratio will probably require fewer hacks.
However, as the ratios increase, the strength of the relationship decreases. This leads to notable exceptions such as C++, which has a high Suck Ratio, but comparatively low Hack Ratio.
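One way to put a number on this trend is a Pearson correlation coefficient together with a least-squares slope. The sketch below is an assumption about how one might check it, not necessarily the calculation behind the graph; the ratio pairs are copied from the two ranking tables, and `pearson_and_slope` is a hypothetical helper.

```python
from math import sqrt

# (Hack Ratio, Suck Ratio) pairs copied from the two ranking tables.
RATIOS = {
    "Ruby": (128.21, 3.21),
    "Perl": (68.27, 1.92),
    "Shell": (49.63, 0.62),
    "C": (49.45, 2.49),
    "JavaScript": (26.43, 4.41),
    "PHP": (24.48, 0.52),
    "C#": (16.67, 0.42),
    "Lisp": (16.67, 2.78),
    "Visual Basic": (13.38, 1.67),
    "Java": (12.53, 0.60),
    "Fortran": (3.48, 0.17),
    "C++": (3.19, 3.54),
    "Python": (1.23, 0.92),
}


def pearson_and_slope(points):
    """Return (Pearson r, least-squares slope of y on x) for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy), sxy / sxx


r, slope = pearson_and_slope(list(RATIOS.values()))
print(f"Pearson r = {r:.2f}, slope = {slope:.4f}")
```

A coefficient well below 1 would fit the observation that outliers such as C++ weaken the relationship.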
Conclusions
It would seem that the foregone conclusions in the previous study were premature. With a Suck Ratio of 4.41, JavaScript is over twice as sucky as Perl, which has a Suck Ratio of just 1.92.
According to these findings, Ruby is the hackiest language of all, with a Hack Ratio of 128.21. In fact, it's nearly twice as hacky as its nearest competitor, Perl (with a Hack Ratio of just 68.27).
There's clearly a need for more research in this area, as the field of "statistical inference of the relative virtues of programming languages" is still in its infancy.
12 comments:
Thanks! There's definitely room for more study.
Without scripting a data-collection/analysis program, it's hard to distinguish between sentiments indicating "this language sucks" or "this library sucks" or "my life sucks" etc.
I did however find that some languages had higher incidences of "dirty hack", "ugly hack" or "little hack". There may yet be stronger correlation of suckiness to one of these phrases, instead of the vanilla "hack".
Thanks for reading!
A note on Smalltalk stats - and mind you, I'm not claiming that Smalltalk has some "massive" use you aren't seeing. However, this kind of search will understate the level of usage, for reasons I explain here:
http://www.cincomsmalltalk.com/blog/blogView?showComments=true&entry=3338708546
James Robertson
Hmm
The link got truncated. How about looking here then :)
James Robertson
I agree with James' comment. I'm not sure how you would get a file number for Smalltalk, but I don't think it is accurately represented, since much (most?) of the shared code for the Visualworks community is stored in a code repository.
Yet more Smalltalk Detail.
For Cincom Smalltalk, have a look at this page which gives a few details for the public store (a source code repository). Most people doing open work in CST use that. 475 packages listed there.
Then there's Squeak Map, and that's where the Squeak community puts an awful lot of stuff. I don't "live" in that community, so it may miss a lot. In any case, 669 listed there.
Then there are other dialects, and I have no idea how they share code - but it's not via source files, as in, say, Java.
Thanks jarober, those are some good points about Smalltalk's popularity.
I should probably clarify that by the total-files-on-google measure, only open source projects which Google happens to serve will be found.
Smalltalk has a small footprint in terms of Google's code search, so it was omitted from the hackiness/suckiness analysis. Its inclusion could have been misleading.
On that note, I'd bet that Visual Basic would probably have a much larger footprint in a non-open source arena. Though I'm unsure how this would affect its hackiness or suckiness.
Nice post.
For JavaScript, i suspect "suck" is often preceded or followed by "ie", as in "internet explorer sucks".
Since JS code interacts with a wide range of browsers, the suckage and hackage frequently refer to that rendering environment, not the language itself.
This seems fundamentally different than C++ or something compiled -- rare in those cases to say "safari sucks" or "hack for netscape bug"
perhaps fodder for a subsequent study?
thanks again,
nate
This is great, I think you should continue to expand and refine your survey!
Please also include the names of each language with the datapoints for your graph.
In the vein of www.thedailywtf.com you could also include a 'wtf' category :)
Done deal - Graph now has labels as well as linear regression line.
Your methodology still has some flaws. I searched for C# sucks in Google Code Search and got about 100 results, and just like the fifth result had this:
[...]Unix truly sucks */
And the line with C# was :
/* A Unicode escape, as in C# (though we only permit them in strings and characters, not arbitrarily in the source code.) */
I would like to add that this was a C file
"C# sucks" returns 1 result
Visual C# sucks here...gives us bogus warnings about the fields being unusable unless we set them here
Whereas if you search "Perl sucks", you get more meaningful results like
Perl sucks sometimes
Make it faster. If Perl sucks port it to another language
perl sucks at this type of test
Mom, perl sucks!
Indeed it is. Perl sucks.
argh! perl sucks with such constructs, very errorprone...
Can you write anything else about it? Great article!
You forgot the D Programming Language, so I looked it up for you.
"file:\.d$" 68,000 hits.
"lang:d hack" 100 hits.
"lang:d sucks" 6 hits.
This leads to the following ranks:
Hack Rank: 13, just before Python
Suck Rank: 14!
Not bad, I think! :)