Commit 76c076f8 authored by Dmitriy Zaporozhets

Merge tag 'v2.9.5' of https://github.com/github/linguist

Linguist 2.9.5
parents c3d6fc5a a00967dd
Showing
with 42824 additions and 1137 deletions
source :rubygems
source 'https://rubygems.org'
gemspec
Copyright (c) 2011 GitHub, Inc.
Copyright (c) 2011-2013 GitHub, Inc.
 
Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
@@ -8,44 +8,38 @@ We use this library at GitHub to detect blob languages, highlight code, ignore b
 
Linguist defines the list of all languages known to GitHub in a [yaml file](https://github.com/github/linguist/blob/master/lib/linguist/languages.yml). In order for a file to be highlighted, a language and lexer must be defined there.
 
Most languages are detected by their file extension. This is the fastest and most common situation. For script files, which are usually extensionless, we do "deep content inspection"™ and check the shebang of the file. Checking the file's contents may also be used for disambiguating languages. C, C++ and Obj-C all use `.h` files. Looking for common keywords, we are usually able to guess the correct language.
Most languages are detected by their file extension. This is the fastest and most common situation.
To disambiguate between files that share a common extension, we use a [Bayesian classifier](https://github.com/github/linguist/blob/master/lib/linguist/classifier.rb). For example, this helps us tell apart `.h` files, which could be C, C++, or Obj-C.
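
As a rough illustration of the idea, here is a minimal naive Bayes sketch. The `tokenize`, `train` and `classify` helpers and the two tiny training samples are assumptions made up for this example; they are not Linguist's actual tokenizer or sample data, although the scoring mirrors the log-probability approach in `classifier.rb`.

```ruby
# Hypothetical tokenizer: words (including @ and # prefixes) or single symbols.
def tokenize(source)
  source.scan(/[A-Za-z_@#]+|[^\sA-Za-z_@#]/)
end

# Build token and language counts from { language => sample source }.
def train(samples)
  db = { tokens: Hash.new { |h, k| h[k] = Hash.new(0) },
         language_tokens: Hash.new(0), tokens_total: 0,
         languages: Hash.new(0), languages_total: 0 }
  samples.each do |language, source|
    tokenize(source).each do |token|
      db[:tokens][language][token] += 1
      db[:language_tokens][language] += 1
      db[:tokens_total] += 1
    end
    db[:languages][language] += 1
    db[:languages_total] += 1
  end
  db
end

# Score each language as log P(language) + sum of log P(token | language).
def classify(db, source)
  tokens = tokenize(source)
  scores = db[:languages].keys.map do |language|
    prior = Math.log(db[:languages][language].to_f / db[:languages_total])
    likelihood = tokens.inject(0.0) do |sum, token|
      count = db[:tokens][language][token]
      # Unseen tokens get a small fallback probability instead of zero.
      prob = count.zero? ? 1.0 / db[:tokens_total] : count.to_f / db[:language_tokens][language]
      sum + Math.log(prob)
    end
    [language, prior + likelihood]
  end
  scores.sort_by { |_, score| -score }
end

samples = {
  "C"           => "#include <stdio.h>\nint main(void) { printf(\"hi\"); return 0; }",
  "Objective-C" => "#import <Foundation/Foundation.h>\n@interface Foo : NSObject\n@property int bar;\n@end"
}
db = train(samples)
best, _score = classify(db, "@interface Widget : NSObject\n@end").first
puts best #=> "Objective-C"
```

Working in log space avoids underflow when many small probabilities are multiplied together.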
 
In the actual GitHub app we deal with `Grit::Blob` objects. For testing, there is a simple `FileBlob` API.
 
Linguist::FileBlob.new("lib/linguist.rb").language.name #=> "Ruby"
```ruby
Linguist::FileBlob.new("lib/linguist.rb").language.name #=> "Ruby"
 
Linguist::FileBlob.new("bin/linguist").language.name #=> "Ruby"
Linguist::FileBlob.new("bin/linguist").language.name #=> "Ruby"
```
 
See [lib/linguist/language.rb](https://github.com/github/linguist/blob/master/lib/linguist/language.rb) and [lib/linguist/languages.yml](https://github.com/github/linguist/blob/master/lib/linguist/languages.yml).
 
### Syntax Highlighting
 
The actual syntax highlighting is handled by our Pygments wrapper, [Albino](https://github.com/github/albino). Linguist provides a [Lexer abstraction](https://github.com/github/linguist/blob/master/lib/linguist/lexer.rb) that determines which highlighter should be used on a file.
We typically run on a prerelease version of Pygments to get early access to new lexers. The [lexers.yml](https://github.com/github/linguist/blob/master/lib/linguist/lexers.yml) file is a dump of the lexers we have available on our server. If there is a new lexer in pygments-main not on the list, [open an issue](https://github.com/github/linguist/issues) and we'll try to upgrade it soon.
### MIME type detection
Most of the MIME types handling is done by the Ruby [mime-types gem](https://github.com/halostatue/mime-types/blob/master/lib/mime/types.rb.data). But we have our own list of additions and overrides. To add or modify this list, see [lib/linguist/mimes.yml](https://github.com/github/linguist/blob/master/lib/linguist/mimes.yml).
MIME types are used to set the Content-Type of raw binary blobs which are served from a special `raw.github.com` domain. However, all text blobs are served as `text/plain` regardless of their type to ensure they open in the browser rather than downloading.
The MIME type also determines whether a blob is binary or plain text. So if you're seeing a blob that says "View Raw" and it is actually plain text, the mime type and encoding probably needs to be explicitly stated.
The actual syntax highlighting is handled by our Pygments wrapper, [pygments.rb](https://github.com/tmm1/pygments.rb). It also provides a [Lexer abstraction](https://github.com/tmm1/pygments.rb/blob/master/lib/pygments/lexer.rb) that determines which highlighter should be used on a file.
 
Linguist::FileBlob.new("linguist.zip").binary? #=> true
See [lib/linguist/mimes.yml](https://github.com/github/linguist/blob/master/lib/linguist/mimes.yml).
We typically run on a pre-release version of Pygments, [pygments.rb](https://github.com/tmm1/pygments.rb), to get early access to new lexers. The [languages.yml](https://github.com/github/linguist/blob/master/lib/linguist/languages.yml) file is a dump of the lexers we have available on our server.
 
### Stats
 
The [Language Graph](https://github.com/github/linguist/graphs/languages) is built by aggregating the languages of all repo's blobs. The top language in the graph determines the project's primary language. Collectively, these stats make up the [Top Languages](https://github.com/languages) page.
The Language Graph you see on every repository is built by aggregating the languages of all of the repo's blobs. The top language in the graph determines the project's primary language. Collectively, these stats make up the [Top Languages](https://github.com/languages) page.
 
The repository stats API can be used on a directory:
 
project = Linguist::Repository.from_directory(".")
project.language.name #=> "Ruby"
project.languages #=> { "Ruby" => 0.98,
"Shell" => 0.02 }
```ruby
project = Linguist::Repository.from_directory(".")
project.language.name #=> "Ruby"
project.languages #=> { "Ruby" => 0.98, "Shell" => 0.02 }
```
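
The aggregation itself can be sketched in a few lines. This is only an illustration: `language_breakdown` and `primary_language` are hypothetical helpers, and weighting each blob by its byte size is an assumption about how the shares are computed.

```ruby
# Hypothetical helper: sum blob sizes per language and convert to shares.
# blobs is an Array of [language_name, size_in_bytes] pairs.
def language_breakdown(blobs)
  sizes = Hash.new(0)
  blobs.each { |language, size| sizes[language] += size }
  total = sizes.values.inject(0, :+)
  return {} if total.zero?
  sizes.each_with_object({}) do |(language, size), shares|
    shares[language] = size.to_f / total
  end
end

# Hypothetical helper: the language with the largest share, or nil.
def primary_language(blobs)
  top = language_breakdown(blobs).max_by { |_, share| share }
  top && top.first
end

blobs = [["Ruby", 700], ["Shell", 20], ["Ruby", 280]]
shares = language_breakdown(blobs)
top_language = primary_language(blobs)
```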
 
These stats are also printed out by the binary. Try running `linguist` on itself:
 
@@ -56,21 +50,27 @@ These stats are also printed out by the binary. Try running `linguist` on itself
 
Checking other code into your git repo is a common practice. But this often inflates your project's language stats and may even cause your project to be labeled as another language. We are able to identify some of these files and directories and exclude them.
 
Linguist::FileBlob.new("vendor/plugins/foo.rb").vendored? # => true
```ruby
Linguist::FileBlob.new("vendor/plugins/foo.rb").vendored? # => true
```
 
See [Linguist::BlobHelper#vendored?](https://github.com/github/linguist/blob/master/lib/linguist/blob_helper.rb) and [lib/linguist/vendor.yml](https://github.com/github/linguist/blob/master/lib/linguist/vendor.yml).
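
A minimal sketch of path-based vendored detection is shown below. The pattern list is a made-up assumption for illustration and is not the contents of `vendor.yml`; `vendored_path?` is likewise a hypothetical helper.

```ruby
# Hypothetical patterns; the real list lives in lib/linguist/vendor.yml.
VENDORED_PATTERNS = [
  %r{(^|/)vendor/},
  %r{(^|/)node_modules/},
  %r{(^|/)jquery([^.]*)(\.min)?\.js$}
]

# Combine the patterns once so each path needs a single match.
VENDORED_REGEXP = Regexp.union(VENDORED_PATTERNS)

def vendored_path?(path)
  !!(path =~ VENDORED_REGEXP)
end

vendored_path?("vendor/plugins/foo.rb") #=> true
vendored_path?("lib/linguist.rb")       #=> false
```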
 
#### Generated file detection
 
Not all plain text files are true source files. Generated files like minified js and compiled CoffeeScript can be detected and excluded from language stats. As an extra bonus, these files are suppressed in Diffs.
Not all plain text files are true source files. Generated files like minified js and compiled CoffeeScript can be detected and excluded from language stats. As an extra bonus, these files are suppressed in diffs.
 
Linguist::FileBlob.new("underscore.min.js").generated? # => true
```ruby
Linguist::FileBlob.new("underscore.min.js").generated? # => true
```
 
See [Linguist::BlobHelper#generated?](https://github.com/github/linguist/blob/master/lib/linguist/blob_helper.rb).
See [Linguist::Generated#generated?](https://github.com/github/linguist/blob/master/lib/linguist/generated.rb).
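
One of these checks, the minified-file heuristic, can be sketched as follows. The thresholds (more than 200 characters, under 5% whitespace, `.js` and `.css` only) follow the `minified_files?` code in this commit; the standalone `minified?` helper is a hypothetical wrapper written for this example.

```ruby
# Hypothetical standalone version of the whitespace-ratio heuristic.
def minified?(name, data)
  return false unless ['.js', '.css'].include?(File.extname(name))
  return false unless data && data.length > 200
  # Count spaces, tabs, newlines and other control characters.
  whitespace = data.each_char.count { |c| c <= ' ' }
  whitespace / data.length.to_f < 0.05
end

packed   = "a=1;b=2;" * 50      # no whitespace at all
readable = "var a = 1;\n" * 50  # roughly a third whitespace

minified?("app.min.js", packed)   #=> true
minified?("app.js", readable)     #=> false
```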
 
## Installation
 
To get it, clone the repo and run [Bundler](http://gembundler.com/) to install its dependencies.
github.com is usually running the latest version of the `github-linguist` gem that is released on [RubyGems.org](http://rubygems.org/gems/github-linguist).
But for development you are going to want to check out the source. To get it, clone the repo and run [Bundler](http://gembundler.com/) to install its dependencies.
 
git clone https://github.com/github/linguist.git
cd linguist/
@@ -80,17 +80,16 @@ To run the tests:
 
bundle exec rake test
 
*Since this code is specific to GitHub, is not published as a official rubygem.*
## Contributing
 
If you are seeing errors like `StandardError: could not find any magic files!`, it means the CharlockHolmes gem didn’t install correctly. See the [installing section](https://github.com/brianmario/charlock_holmes/blob/master/README.md) of the CharlockHolmes README for more information.
The majority of patches won't need to touch any Ruby code at all. The [master language list](https://github.com/github/linguist/blob/master/lib/linguist/languages.yml) is just a configuration file.
 
## Contributing
We try to only add languages once they have some usage on GitHub, so please note in-the-wild usage examples in your pull request.
Almost all bug fixes or new language additions should come with some additional code samples. Just drop them under [`samples/`](https://github.com/github/linguist/tree/master/samples) in the correct subdirectory and our test suite will automatically test them. In most cases you shouldn't need to add any new assertions.
### Testing
Sometimes getting the tests running can be too much work, especially if you don't have much Ruby experience. It's okay, be lazy and let our build bot [Travis](http://travis-ci.org/#!/github/linguist) run the tests for you. Just open a pull request and the bot will start cranking away.
 
1. Fork it.
2. Create a branch (`git checkout -b detect-foo-language`)
3. Make your changes
4. Run the tests (`bundle install` then `bundle exec rake`)
5. Commit your changes (`git commit -am "Added detection for the new Foo language"`)
6. Push to the branch (`git push origin detect-foo-language`)
7. Create a [Pull Request](http://help.github.com/pull-requests/) from your branch.
8. Promote it. Get others to drop in and +1 it.
Here's our current build status, which is hopefully green: [![Build Status](https://secure.travis-ci.org/github/linguist.png?branch=master)](http://travis-ci.org/github/linguist)
require 'rake/clean'
require 'rake/testtask'
 
task :default => :test
 
Rake::TestTask.new do |t|
t.warning = true
Rake::TestTask.new
task :samples do
require 'linguist/samples'
require 'yajl'
data = Linguist::Samples.data
json = Yajl::Encoder.encode(data, :pretty => true)
File.open('lib/linguist/samples.json', 'w') { |io| io.write json }
end
namespace :classifier do
LIMIT = 1_000
desc "Run classifier against #{LIMIT} public gists"
task :test do
require 'linguist/classifier'
require 'linguist/samples'
total, correct, incorrect = 0, 0, 0
$stdout.sync = true
each_public_gist do |gist_url, file_url, file_language|
next if file_language.nil? || file_language == 'Text'
begin
data = open(file_url).read
guessed_language, score = Linguist::Classifier.classify(Linguist::Samples::DATA, data).first
total += 1
guessed_language == file_language ? correct += 1 : incorrect += 1
print "\r\e[0K%d:%d %g%%" % [correct, incorrect, (correct.to_f/total.to_f)*100]
$stdout.flush
rescue URI::InvalidURIError
else
break if total >= LIMIT
end
end
puts ""
end
def each_public_gist
require 'open-uri'
require 'json'
url = "https://api.github.com/gists/public"
loop do
resp = open(url)
url = resp.meta['link'][/<([^>]+)>; rel="next"/, 1]
gists = JSON.parse(resp.read)
for gist in gists
for filename, attrs in gist['files']
yield gist['url'], attrs['raw_url'], attrs['language']
end
end
end
end
end
#!/usr/bin/env ruby
 
# linguist — detect language type for a file, or, given a directory, determine language breakdown
#
# usage: linguist <path>
require 'linguist/file_blob'
require 'linguist/repository'
 
@@ -23,12 +27,11 @@ elsif File.file?(path)
 
puts "#{blob.name}: #{blob.loc} lines (#{blob.sloc} sloc)"
puts " type: #{type}"
puts " extension: #{blob.pathname.extname}"
puts " mime type: #{blob.mime_type}"
puts " language: #{blob.language}"
 
if blob.large?
puts " blob is to large to be shown"
puts " blob is too large to be shown"
end
 
if blob.generated?
Gem::Specification.new do |s|
s.name = 'linguist'
s.version = '1.0.0'
s.name = 'github-linguist'
s.version = '2.9.5'
s.summary = "GitHub Language detection"
 
s.authors = "GitHub"
s.authors = "GitHub"
s.homepage = "https://github.com/github/linguist"
 
s.files = Dir['lib/**/*']
s.executables << 'linguist'
 
s.add_dependency 'charlock_holmes', '~> 0.6.6'
s.add_dependency 'escape_utils', '~> 0.2.3'
s.add_dependency 'mime-types', '~> 1.18'
s.add_dependency 'pygments.rb', '~> 0.2.11'
s.add_dependency 'escape_utils', '~> 0.3.1'
s.add_dependency 'mime-types', '~> 1.19'
s.add_dependency 'pygments.rb', '~> 0.5.2'
s.add_development_dependency 'mocha'
s.add_development_dependency 'json'
s.add_development_dependency 'rake'
s.add_development_dependency 'yajl-ruby'
end
require 'linguist/blob_helper'
require 'linguist/generated'
require 'linguist/language'
require 'linguist/mime'
require 'linguist/pathname'
require 'linguist/repository'
require 'linguist/samples'
require 'linguist/generated'
require 'linguist/language'
require 'linguist/mime'
require 'linguist/pathname'
 
require 'charlock_holmes'
require 'escape_utils'
require 'mime/types'
require 'pygments'
require 'yaml'
 
module Linguist
# DEPRECATED Avoid mixing into Blob classes. Prefer functional interfaces
# like `Language.detect` over `Blob#language`. Functions are much easier to
# cache and compose.
#
# Avoid adding additional bloat to this module.
#
# BlobHelper is a mixin for Blobish classes that respond to "name",
# "data" and "size" such as Grit::Blob.
module BlobHelper
# Internal: Get a Pathname wrapper for Blob#name
#
# Returns a Pathname.
def pathname
Pathname.new(name || "")
end
# Public: Get the extname of the path
#
# Examples
@@ -27,7 +26,23 @@ module Linguist
#
# Returns a String
def extname
pathname.extname
File.extname(name.to_s)
end
# Internal: Lookup mime type for extension.
#
# Returns a MIME::Type
def _mime_type
if defined? @_mime_type
@_mime_type
else
guesses = ::MIME::Types.type_for(extname.to_s)
# Prefer text mime types over binary
@_mime_type = guesses.detect { |type| type.ascii? } ||
# Otherwise use the first guess
guesses.first
end
end
 
# Public: Get the actual blob mime type
@@ -39,7 +54,23 @@ module Linguist
#
# Returns a mime type String.
def mime_type
@mime_type ||= pathname.mime_type
_mime_type ? _mime_type.to_s : 'text/plain'
end
# Internal: Is the blob binary according to its mime type
#
# Return true or false
def binary_mime_type?
_mime_type ? _mime_type.binary? : false
end
# Internal: Is the blob binary according to its mime type,
# overriding it if we have better data from the languages.yml
# database.
#
# Return true or false
def likely_binary?
binary_mime_type? && !Language.find_by_filename(name)
end
 
# Public: Get the Content-Type header value
@@ -71,7 +102,7 @@ module Linguist
elsif name.nil?
"attachment"
else
"attachment; filename=#{EscapeUtils.escape_url(pathname.basename)}"
"attachment; filename=#{EscapeUtils.escape_url(File.basename(name))}"
end
end
 
@@ -90,15 +121,6 @@ module Linguist
@detect_encoding ||= CharlockHolmes::EncodingDetector.new.detect(data) if data
end
 
# Public: Is the blob binary according to its mime type
#
# Return true or false
def binary_mime_type?
if mime_type = Mime.lookup_mime_type_for(pathname.extname)
mime_type.binary?
end
end
# Public: Is the blob binary?
#
# Return true or false
@@ -132,23 +154,28 @@ module Linguist
#
# Return true or false
def image?
['.png', '.jpg', '.jpeg', '.gif'].include?(extname)
['.png', '.jpg', '.jpeg', '.gif'].include?(extname.downcase)
end
# Public: Is the blob a supported 3D model format?
#
# Return true or false
def solid?
extname.downcase == '.stl'
end
 
# Public: Is the blob a possible drupal php file?
# Public: Is this blob a CSV file?
#
# Return true or false
def drupal_extname?
['.module', '.install', '.test', '.inc'].include?(extname)
def csv?
text? && extname.downcase == '.csv'
end
 
# Public: Is the blob likely to have a shebang?
# Public: Is the blob a PDF?
#
# Return true or false
def shebang_extname?
extname.empty? &&
mode &&
(mode.to_i(8) & 05) == 05
def pdf?
extname.downcase == '.pdf'
end
 
MEGABYTE = 1024 * 1024
@@ -160,6 +187,28 @@ module Linguist
size.to_i > MEGABYTE
end
 
# Public: Is the blob safe to colorize?
#
# We use Pygments for syntax highlighting blobs. Pygments
# can be too slow for very large blobs or for certain
# corner-case blobs.
#
# Return true or false
def safe_to_colorize?
!large? && text? && !high_ratio_of_long_lines?
end
# Internal: Does the blob have a high ratio of long lines?
#
# These types of files are usually going to make Pygments.rb
# angry if we try to colorize them.
#
# Return true or false
def high_ratio_of_long_lines?
return false if loc == 0
size / loc > 5000
end
# Public: Is the blob viewable?
#
# Non-viewable blobs will just show a "View Raw" link
@@ -190,7 +239,12 @@ module Linguist
#
# Returns an Array of lines
def lines
@lines ||= (viewable? && data) ? data.split("\n", -1) : []
@lines ||=
if viewable? && data
data.split(/\r\n|\r|\n/, -1)
else
[]
end
end
 
# Public: Get number of lines of code
@@ -211,153 +265,16 @@ module Linguist
lines.grep(/\S/).size
end
 
# Internal: Compute average line length.
#
# Returns Integer.
def average_line_length
if lines.any?
lines.inject(0) { |n, l| n += l.length } / lines.length
else
0
end
end
# Public: Is the blob a generated file?
#
# Generated source code is supressed in diffs and is ignored by
# Generated source code is suppressed in diffs and is ignored by
# language statistics.
#
# Requires Blob#data
#
# Includes:
# - XCode project XML files
# - Minified JavaScript
#
# Please add additional test coverage to
# `test/test_blob.rb#test_generated` if you make any changes.
# May load Blob#data
#
# Return true or false
def generated?
if xcode_project_file? || generated_net_docfile?
true
elsif generated_coffeescript? || minified_javascript?
true
elsif name == 'Gemfile.lock'
true
else
false
end
end
# Internal: Is the blob an XCode project file?
#
# Generated if the file extension is an XCode project
# file extension.
#
# Returns true of false.
def xcode_project_file?
['.xib', '.nib', '.pbxproj', '.xcworkspacedata', '.xcuserstate'].include?(extname)
end
# Internal: Is the blob minified JS?
#
# Consider JS minified if the average line length is
# greater then 100c.
#
# Returns true or false.
def minified_javascript?
return unless extname == '.js'
average_line_length > 100
end
# Internal: Is the blob JS generated by CoffeeScript?
#
# Requires Blob#data
#
# CoffeScript is meant to output JS that would be difficult to
# tell if it was generated or not. Look for a number of patterns
# outputed by the CS compiler.
#
# Return true or false
def generated_coffeescript?
return unless extname == '.js'
# CoffeeScript generated by > 1.2 include a comment on the first line
if lines[0] =~ /^\/\/ Generated by /
return true
end
if lines[0] == '(function() {' && # First line is module closure opening
lines[-2] == '}).call(this);' && # Second to last line closes module closure
lines[-1] == '' # Last line is blank
score = 0
lines.each do |line|
if line =~ /var /
# Underscored temp vars are likely to be Coffee
score += 1 * line.gsub(/(_fn|_i|_len|_ref|_results)/).count
# bind and extend functions are very Coffee specific
score += 3 * line.gsub(/(__bind|__extends|__hasProp|__indexOf|__slice)/).count
end
end
# Require a score of 3. This is fairly arbitrary. Consider
# tweaking later.
score >= 3
else
false
end
end
# Internal: Is this a generated documentation file for a .NET assembly?
#
# Requires Blob#data
#
# .NET developers often check in the XML Intellisense file along with an
# assembly - however, these don't have a special extension, so we have to
# dig into the contents to determine if it's a docfile. Luckily, these files
# are extremely structured, so recognizing them is easy.
#
# Returns true or false
def generated_net_docfile?
return false unless extname.downcase == ".xml"
return false unless lines.count > 3
# .NET Docfiles always open with <doc> and their first tag is an
# <assembly> tag
return lines[1].include?("<doc>") &&
lines[2].include?("<assembly>") &&
lines[-2].include?("</doc>")
end
# Public: Should the blob be indexed for searching?
#
# Excluded:
# - Files over 0.1MB
# - Non-text files
# - Langauges marked as not searchable
# - Generated source files
#
# Please add additional test coverage to
# `test/test_blob.rb#test_indexable` if you make any changes.
#
# Return true or false
def indexable?
if binary?
false
elsif language.nil?
false
elsif !language.searchable?
false
elsif generated?
false
elsif size > 100 * 1024
false
else
true
end
@_generated ||= Generated.generated?(name, lambda { data })
end
 
# Public: Detects the Language of the blob.
@@ -366,33 +283,15 @@ module Linguist
#
# Returns a Language or nil if none is detected
def language
if defined? @language
@language
return @language if defined? @language
if defined?(@data) && @data.is_a?(String)
data = @data
else
@language = guess_language
data = lambda { (binary_mime_type? || binary?) ? "" : self.data }
end
end
# Internal: Guess language
#
# Please add additional test coverage to
# `test/test_blob.rb#test_language` if you make any changes.
#
# Returns a Language or nil
def guess_language
return if binary_mime_type?
# Disambiguate between multiple language extensions
disambiguate_extension_language ||
# See if there is a Language for the extension
pathname.language ||
# Look for idioms in first line
first_line_language ||
 
# Try to detect Language from shebang line
shebang_language
@language = Language.detect(name.to_s, data, mode)
end
 
# Internal: Get the lexer of the blob.
@@ -402,269 +301,16 @@ module Linguist
language ? language.lexer : Pygments::Lexer.find_by_name('Text only')
end
 
# Internal: Disambiguates between multiple language extensions.
#
# Delegates to "guess_EXTENSION_language".
#
# Please add additional test coverage to
# `test/test_blob.rb#test_language` if you add another method.
#
# Returns a Language or nil.
def disambiguate_extension_language
if Language.ambiguous?(extname)
name = "guess_#{extname.sub(/^\./, '')}_language"
send(name) if respond_to?(name)
end
end
# Internal: Guess language of .cls files
#
# Returns a Language.
def guess_cls_language
if lines.grep(/^(%|\\)/).any?
Language['TeX']
elsif lines.grep(/^\s*(CLASS|METHOD|INTERFACE).*:\s*/i).any? || lines.grep(/^\s*(USING|DEFINE)/i).any?
Language['OpenEdge ABL']
elsif lines.grep(/\{$/).any? || lines.grep(/\}$/).any?
Language['Apex']
elsif lines.grep(/^(\'\*|Attribute|Option|Sub|Private|Protected|Public|Friend)/i).any?
Language['Visual Basic']
else
# The most common language should be the fallback
Language['TeX']
end
end
# Internal: Guess language of header files (.h).
#
# Returns a Language.
def guess_h_language
if lines.grep(/^@(interface|property|private|public|end)/).any?
Language['Objective-C']
elsif lines.grep(/^class |^\s+(public|protected|private):/).any?
Language['C++']
else
Language['C']
end
end
# Internal: Guess language of .m files.
#
# Objective-C heuristics:
# * Keywords
#
# Matlab heuristics:
# * Leading function keyword
# * "%" comments
#
# Returns a Language.
def guess_m_language
# Objective-C keywords
if lines.grep(/^#import|@(interface|implementation|property|synthesize|end)/).any?
Language['Objective-C']
# File function
elsif lines.first.to_s =~ /^function /
Language['Matlab']
# Matlab comment
elsif lines.grep(/^%/).any?
Language['Matlab']
# Fallback to Objective-C, don't want any Matlab false positives
else
Language['Objective-C']
end
end
# Internal: Guess language of .pl files
#
# The rules for disambiguation are:
#
# 1. Many perl files begin with a shebang
# 2. Most Prolog source files have a rule somewhere (marked by the :- operator)
# 3. Default to Perl, because it is more popular
#
# Returns a Language.
def guess_pl_language
if shebang_script == 'perl'
Language['Perl']
elsif lines.grep(/:-/).any?
Language['Prolog']
else
Language['Perl']
end
end
# Internal: Guess language of .r files.
#
# Returns a Language.
def guess_r_language
if lines.grep(/(rebol|(:\s+func|make\s+object!|^\s*context)\s*\[)/i).any?
Language['Rebol']
else
Language['R']
end
end
# Internal: Guess language of .t files.
#
# Returns a Language.
def guess_t_language
score = 0
score += 1 if lines.grep(/^% /).any?
score += data.gsub(/ := /).count
score += data.gsub(/proc |procedure |fcn |function /).count
score += data.gsub(/var \w+: \w+/).count
# Tell-tale signs its gotta be Perl
if lines.grep(/^(my )?(sub |\$|@|%)\w+/).any?
score = 0
end
if score >= 3
Language['Turing']
else
Language['Perl']
end
end
# Internal: Guess language of .v files.
#
# Returns a Language
def guess_v_language
if lines.grep(/^(\/\*|\/\/|module|parameter|input|output|wire|reg|always|initial|begin|\`)/).any?
Language['Verilog']
else
Language['Coq']
end
end
# Internal: Guess language of .gsp files.
#
# Returns a Language.
def guess_gsp_language
if lines.grep(/<%|<%@|\$\{|<%|<g:|<meta name="layout"|<r:/).any?
Language['Groovy Server Pages']
else
Language['Gosu']
end
end
# Internal: Guess language from the first line.
#
# Look for leading "<?php" in Drupal files
#
# Returns a Language.
def first_line_language
# Only check files with drupal php extensions
return unless drupal_extname?
# Fail fast if blob isn't viewable?
return unless viewable?
if lines.first.to_s =~ /^<\?php/
Language['PHP']
end
end
# Internal: Extract the script name from the shebang line
#
# Requires Blob#data
#
# Examples
#
# '#!/usr/bin/ruby'
# # => 'ruby'
#
# '#!/usr/bin/env ruby'
# # => 'ruby'
#
# '#!/usr/bash/python2.4'
# # => 'python'
#
# Please add additional test coverage to
# `test/test_blob.rb#test_shebang_script` if you make any changes.
#
# Returns a script name String or nil
def shebang_script
# Fail fast if blob isn't viewable?
return unless viewable?
if lines.any? && (match = lines[0].match(/(.+)\n?/)) && (bang = match[0]) =~ /^#!/
bang.sub!(/^#! /, '#!')
tokens = bang.split(' ')
pieces = tokens.first.split('/')
if pieces.size > 1
script = pieces.last
else
script = pieces.first.sub('#!', '')
end
script = script == 'env' ? tokens[1] : script
# python2.4 => python
if script =~ /((?:\d+\.?)+)/
script.sub! $1, ''
end
# Check for multiline shebang hacks that exec themselves
#
# #!/bin/sh
# exec foo "$0" "$@"
#
if script == 'sh' &&
lines[0...5].any? { |l| l.match(/exec (\w+).+\$0.+\$@/) }
script = $1
end
script
end
end
# Internal: Get Language for shebang script
#
# Returns the Language or nil
def shebang_language
# Skip file extensions unlikely to have shebangs
return unless shebang_extname?
if script = shebang_script
Language[script]
end
end
# Public: Highlight syntax of blob
#
# options - A Hash of options (defaults to {})
#
# Returns html String
def colorize(options = {})
return if !text? || large? || generated?
return unless safe_to_colorize?
options[:options] ||= {}
options[:options][:encoding] ||= encoding
lexer.highlight(data, options)
end
# Public: Highlight syntax of blob without the outer highlight div
# wrapper.
#
# options - A Hash of options (defaults to {})
#
# Returns html String
def colorize_without_wrapper(options = {})
if text = colorize(options)
text[%r{<div class="highlight"><pre>(.*?)</pre>\s*</div>}m, 1]
else
''
end
end
Language.overridden_extensions.each do |extension|
name = "guess_#{extension.sub(/^\./, '')}_language".to_sym
unless instance_methods.map(&:to_sym).include?(name)
warn "Language##{name} was not defined"
end
end
end
end
require 'linguist/tokenizer'
module Linguist
# Language bayesian classifier.
class Classifier
# Public: Train classifier that data is a certain language.
#
# db - Hash classifier database object
# language - String language of data
# data - String contents of file
#
# Examples
#
# Classifier.train!(db, 'Ruby', "def hello; end")
#
# Returns nothing.
#
# Set LINGUIST_DEBUG=1 or =2 to see probabilities per-token,
# per-language. See also dump_all_tokens, below.
def self.train!(db, language, data)
tokens = Tokenizer.tokenize(data)
db['tokens_total'] ||= 0
db['languages_total'] ||= 0
db['tokens'] ||= {}
db['language_tokens'] ||= {}
db['languages'] ||= {}
tokens.each do |token|
db['tokens'][language] ||= {}
db['tokens'][language][token] ||= 0
db['tokens'][language][token] += 1
db['language_tokens'][language] ||= 0
db['language_tokens'][language] += 1
db['tokens_total'] += 1
end
db['languages'][language] ||= 0
db['languages'][language] += 1
db['languages_total'] += 1
nil
end
# Public: Guess language of data.
#
# db - Hash of classifier tokens database.
# tokens - Array of tokens or String data to analyze.
# languages - Array of language name Strings to restrict to.
#
# Examples
#
# Classifier.classify(db, "def hello; end")
# # => [ ['Ruby', 0.90], ['Python', 0.2], ... ]
#
# Returns sorted Array of result pairs. Each pair contains the
# String language name and a Float score.
def self.classify(db, tokens, languages = nil)
languages ||= db['languages'].keys
new(db).classify(tokens, languages)
end
# Internal: Initialize a Classifier.
def initialize(db = {})
@tokens_total = db['tokens_total']
@languages_total = db['languages_total']
@tokens = db['tokens']
@language_tokens = db['language_tokens']
@languages = db['languages']
end
# Internal: Guess language of data
#
# tokens - Array of tokens or String data to analyze.
# languages - Array of language name Strings to restrict to.
#
# Returns sorted Array of result pairs. Each pair contains the
# String language name and a Float score.
def classify(tokens, languages)
return [] if tokens.nil?
tokens = Tokenizer.tokenize(tokens) if tokens.is_a?(String)
scores = {}
if verbosity >= 2
dump_all_tokens(tokens, languages)
end
languages.each do |language|
scores[language] = tokens_probability(tokens, language) +
language_probability(language)
if verbosity >= 1
printf "%10s = %10.3f + %7.3f = %10.3f\n",
language, tokens_probability(tokens, language), language_probability(language), scores[language]
end
end
scores.sort { |a, b| b[1] <=> a[1] }.map { |score| [score[0], score[1]] }
end
# Internal: Log probability of a set of tokens occurring in a language - log P(D | C)
#
# tokens - Array of String tokens.
# language - Language to check.
#
# Returns a Float log probability (zero or negative).
def tokens_probability(tokens, language)
tokens.inject(0.0) do |sum, token|
sum += Math.log(token_probability(token, language))
end
end
# Internal: Probability of a token occurring in a language - P(F | C)
#
# token - String token.
# language - Language to check.
#
# Returns Float between 0.0 and 1.0.
def token_probability(token, language)
if @tokens[language][token].to_f == 0.0
1 / @tokens_total.to_f
else
@tokens[language][token].to_f / @language_tokens[language].to_f
end
end
# Internal: Log probability of a language occurring - log P(C)
#
# language - Language to check.
#
# Returns a Float log probability (zero or negative).
def language_probability(language)
Math.log(@languages[language].to_f / @languages_total.to_f)
end
private
def verbosity
@verbosity ||= (ENV['LINGUIST_DEBUG'] || 0).to_i
end
# Internal: show a table of probabilities for each <token,language> pair.
#
# The number in each table entry is the number of "points" that each
# token contributes toward the belief that the file under test is a
# particular language. Points are additive.
#
# Points are the number of times a token appears in the file, times
# how much more likely (log of probability ratio) that token is to
# appear in one language vs. the least-likely language. Dashes
# indicate the least-likely language (and zero points) for each token.
def dump_all_tokens(tokens, languages)
maxlen = tokens.map { |tok| tok.size }.max
printf "%#{maxlen}s", ""
puts " #" + languages.map { |lang| sprintf("%10s", lang) }.join
tokmap = Hash.new(0)
tokens.each { |tok| tokmap[tok] += 1 }
tokmap.sort.each { |tok, count|
arr = languages.map { |lang| [lang, token_probability(tok, lang)] }
min = arr.map { |a,b| b }.min
minlog = Math.log(min)
if !arr.inject(true) { |result, n| result && n[1] == arr[0][1] }
printf "%#{maxlen}s%5d", tok, count
puts arr.map { |ent|
ent[1] == min ? " -" : sprintf("%10.3f", count * (Math.log(ent[1]) - minlog))
}.join
end
}
end
end
end
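The scoring above is naive Bayes in log space: the log prior of a language plus the sum of per-token log likelihoods, with a floor of `1 / total tokens` for tokens a language has never seen. A minimal sketch of that idea follows. `TinyClassifier` and its sample counts are hypothetical and are not part of Linguist.

```ruby
# Hypothetical toy model; class name and counts are invented for illustration.
class TinyClassifier
  # token_counts    - { language => { token => count } }
  # language_counts - { language => number of training samples }
  def initialize(token_counts, language_counts)
    @tokens          = token_counts
    @language_tokens = {}
    token_counts.each { |lang, counts| @language_tokens[lang] = counts.values.inject(0, :+) }
    @tokens_total    = @language_tokens.values.inject(0, :+)
    @languages       = language_counts
    @languages_total = language_counts.values.inject(0, :+)
  end

  # P(token | language); unseen tokens fall back to 1 / total tokens
  # so that Math.log never receives zero.
  def token_probability(token, language)
    count = @tokens[language].fetch(token, 0)
    count.zero? ? 1.0 / @tokens_total : count.to_f / @language_tokens[language]
  end

  # log P(language) + sum of log P(token | language)
  def score(tokens, language)
    prior = Math.log(@languages[language].to_f / @languages_total)
    tokens.inject(prior) { |sum, token| sum + Math.log(token_probability(token, language)) }
  end

  # The language with the highest log score wins.
  def classify(tokens)
    @languages.keys.max_by { |language| score(tokens, language) }
  end
end

model = TinyClassifier.new(
  { "Ruby" => { "def" => 5, "end" => 5 }, "C" => { "int" => 6, "{" => 4 } },
  { "Ruby" => 2, "C" => 2 }
)
puts model.classify(["def", "end"])
```

Working in log space keeps long token lists from underflowing to zero, which is why the scores are sums rather than products.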
module Linguist
class Generated
# Public: Is the blob a generated file?
#
# name - String filename
# data - String blob data. A block may also be passed in for lazy
# loading. This behavior is deprecated and you should always
# pass in a String.
#
# Return true or false
def self.generated?(name, data)
new(name, data).generated?
end
# Internal: Initialize Generated instance
#
# name - String filename
# data - String blob data
def initialize(name, data)
@name = name
@extname = File.extname(name)
@_data = data
end
attr_reader :name, :extname
# Lazy load blob data if block was passed in.
#
# Awful, awful stuff happening here.
#
# Returns String data.
def data
@data ||= @_data.respond_to?(:call) ? @_data.call() : @_data
end
# Public: Get each line of data
#
# Returns an Array of lines
def lines
# TODO: data should be required to be a String, no nils
@lines ||= data ? data.split("\n", -1) : []
end
# Internal: Is the blob a generated file?
#
# Generated source code is suppressed in diffs and is ignored by
# language statistics.
#
# Please add additional test coverage to
# `test/test_blob.rb#test_generated` if you make any changes.
#
# Return true or false
def generated?
name == 'Gemfile.lock' ||
minified_files? ||
compiled_coffeescript? ||
xcode_project_file? ||
generated_parser? ||
generated_net_docfile? ||
generated_net_designer_file? ||
generated_protocol_buffer?
end
# Internal: Is the blob an XCode project file?
#
# Generated if the file extension is an XCode project
# file extension.
#
# Returns true or false.
def xcode_project_file?
['.xib', '.nib', '.storyboard', '.pbxproj', '.xcworkspacedata', '.xcuserstate'].include?(extname)
end
# Internal: Is the blob a minified file?
#
# Consider a file minified if less than 5% of its characters are whitespace.
# Currently, only JS and CSS files are detected by this method.
#
# Returns true or false.
def minified_files?
return false unless ['.js', '.css'].include? extname
if data && data.length > 200
(data.each_char.count{ |c| c <= ' ' } / data.length.to_f) < 0.05
else
false
end
end
# Internal: Is the blob of JS generated by CoffeeScript?
#
# CoffeeScript is meant to output JS that would be difficult to
# tell if it was generated or not. Look for a number of patterns
# output by the CS compiler.
#
# Return true or false
def compiled_coffeescript?
return false unless extname == '.js'
# JS generated by CoffeeScript > 1.2 includes a comment on the first line
if lines[0] =~ /^\/\/ Generated by /
return true
end
if lines[0] == '(function() {' && # First line is module closure opening
lines[-2] == '}).call(this);' && # Second to last line closes module closure
lines[-1] == '' # Last line is blank
score = 0
lines.each do |line|
if line =~ /var /
# Underscored temp vars are likely to be Coffee
score += 1 * line.gsub(/(_fn|_i|_len|_ref|_results)/).count
# bind and extend functions are very Coffee specific
score += 3 * line.gsub(/(__bind|__extends|__hasProp|__indexOf|__slice)/).count
end
end
# Require a score of 3. This is fairly arbitrary. Consider
# tweaking later.
score >= 3
else
false
end
end
# Internal: Is this a generated documentation file for a .NET assembly?
#
# .NET developers often check in the XML Intellisense file along with an
# assembly - however, these don't have a special extension, so we have to
# dig into the contents to determine if it's a docfile. Luckily, these files
# are extremely structured, so recognizing them is easy.
#
# Returns true or false
def generated_net_docfile?
return false unless extname.downcase == ".xml"
return false unless lines.count > 3
# .NET Docfiles always open with <doc> and their first tag is an
# <assembly> tag
return lines[1].include?("<doc>") &&
lines[2].include?("<assembly>") &&
lines[-2].include?("</doc>")
end
# Internal: Is this a codegen file for a .NET project?
#
# Visual Studio often uses code generation to generate partial classes, and
# these files can be quite unwieldy. Let's hide them.
#
# Returns true or false
def generated_net_designer_file?
name.downcase =~ /\.designer\.cs$/
end
# Internal: Is the blob of JS a parser generated by PEG.js?
#
# PEG.js-generated parsers are not meant to be consumed by humans.
#
# Return true or false
def generated_parser?
return false unless extname == '.js'
# PEG.js-generated parsers include a comment near the top of the file
# that marks them as such.
if lines[0..4].join('') =~ /^(?:[^\/]|\/[^\*])*\/\*(?:[^\*]|\*[^\/])*Generated by PEG.js/
return true
end
false
end
# Internal: Is the blob a C++, Java or Python source file generated by the
# Protocol Buffer compiler?
#
# Returns true or false.
def generated_protocol_buffer?
return false unless ['.py', '.java', '.h', '.cc', '.cpp'].include?(extname)
return false unless lines.count > 1
return lines[0].include?("Generated by the protocol buffer compiler. DO NOT EDIT!")
end
end
end
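The minification check in `minified_files?` reduces to a whitespace ratio with a minimum length. The sketch below isolates that heuristic. The helper names `whitespace_ratio` and `looks_minified?` are hypothetical; the 200-character minimum and the 0.05 threshold are taken from the method above.

```ruby
# Hypothetical helpers mirroring the whitespace-ratio heuristic.

# Fraction of characters that sort at or below a space
# (spaces, tabs, newlines and other control characters).
def whitespace_ratio(data)
  return 0.0 if data.empty?
  data.each_char.count { |c| c <= ' ' } / data.length.to_f
end

# Very short inputs are never treated as minified, since the ratio
# is not meaningful for a handful of characters.
def looks_minified?(data, min_length = 200, threshold = 0.05)
  data.length > min_length && whitespace_ratio(data) < threshold
end

puts looks_minified?("a=1;" * 100)
puts looks_minified?("var a = 1;\n" * 40)
```

Hand-written source usually carries far more than 5% whitespace from indentation and line breaks alone, so the threshold leaves a wide margin.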
@@ -2,6 +2,9 @@ require 'escape_utils'
require 'pygments'
require 'yaml'
 
require 'linguist/classifier'
require 'linguist/samples'
module Linguist
# Language names that are recognizable by GitHub. Defined languages
# can be highlighted, searched and listed under the Top Languages page.
@@ -9,28 +12,22 @@ module Linguist
# Languages are defined in `lib/linguist/languages.yml`.
class Language
@languages = []
@overrides = {}
@index = {}
@name_index = {}
@alias_index = {}
@extension_index = {}
@filename_index = {}
@extension_index = Hash.new { |h,k| h[k] = [] }
@filename_index = Hash.new { |h,k| h[k] = [] }
@primary_extension_index = {}
 
# Valid Languages types
TYPES = [:data, :markup, :programming]
 
# Internal: Test if extension maps to multiple Languages.
# Names of non-programming languages that we will still detect
#
# Returns true or false.
def self.ambiguous?(extension)
@overrides.include?(extension)
end
# Include?: Return overridden extensions.
#
# Returns extensions Array.
def self.overridden_extensions
@overrides.keys
# Returns an array
def self.detectable_markup
["AsciiDoc", "CSS", "Creole", "Less", "Markdown", "MediaWiki", "Org", "RDoc", "Sass", "Textile", "reStructuredText"]
end
 
# Internal: Create a new Language object
@@ -43,18 +40,18 @@ module Linguist
 
@languages << language
 
# All Language names should be unique. Warn if there is a duplicate.
# All Language names should be unique. Raise if there is a duplicate.
if @name_index.key?(language.name)
warn "Duplicate language name: #{language.name}"
raise ArgumentError, "Duplicate language name: #{language.name}"
end
 
# Language name index
@index[language.name] = @name_index[language.name] = language
 
language.aliases.each do |name|
# All Language aliases should be unique. Warn if there is a duplicate.
# All Language aliases should be unique. Raise if there is a duplicate.
if @alias_index.key?(name)
warn "Duplicate alias: #{name}"
raise ArgumentError, "Duplicate alias: #{name}"
end
 
@index[name] = @alias_index[name] = language
@@ -62,33 +59,56 @@ module Linguist
 
language.extensions.each do |extension|
if extension !~ /^\./
warn "Extension is missing a '.': #{extension.inspect}"
raise ArgumentError, "Extension is missing a '.': #{extension.inspect}"
end
 
unless ambiguous?(extension)
# Index the extension with a leading ".": ".rb"
@extension_index[extension] = language
# Index the extension without a leading ".": "rb"
@extension_index[extension.sub(/^\./, '')] = language
end
@extension_index[extension] << language
end
 
language.overrides.each do |extension|
if extension !~ /^\./
warn "Extension is missing a '.': #{extension.inspect}"
end
@overrides[extension] = language
if @primary_extension_index.key?(language.primary_extension)
raise ArgumentError, "Duplicate primary extension: #{language.primary_extension}"
end
 
@primary_extension_index[language.primary_extension] = language
language.filenames.each do |filename|
@filename_index[filename] = language
@filename_index[filename] << language
end
 
language
end
 
# Public: Detects the Language of the blob.
#
# name - String filename
# data - String blob data. A block may also be passed in for lazy
# loading. This behavior is deprecated and you should always
# pass in a String.
# mode - Optional String mode (defaults to nil)
#
# Returns Language or nil.
def self.detect(name, data, mode = nil)
# A bit of an elegant hack. If the file is executable but extensionless,
# append a "magic" extension so it can be classified with other
# languages that have shebang scripts.
if File.extname(name).empty? && mode && (mode.to_i(8) & 05) == 05
name += ".script!"
end
possible_languages = find_by_filename(name)
if possible_languages.length > 1
data = data.call() if data.respond_to?(:call)
if data.nil? || data == ""
nil
elsif result = Classifier.classify(Samples::DATA, data, possible_languages.map(&:name)).first
Language[result[0]]
end
else
possible_languages.first
end
end
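The `".script!"` trick in `detect` hinges on an octal bit test: the mode string is parsed as base 8 and masked with `05`, which keeps the read and execute bits for "other" users. A small sketch of just that step follows. The helper name `executable_script_name` is hypothetical, and the mode strings are assumed to be Git-style octal modes such as "100755".

```ruby
# Hypothetical helper isolating the extensionless-executable check.
def executable_script_name(name, mode)
  # 05 is octal for r-x; both bits must be set in the last octal digit.
  if File.extname(name).empty? && mode && (mode.to_i(8) & 05) == 05
    name + ".script!"
  else
    name
  end
end

puts executable_script_name("bin/rake", "100755")
puts executable_script_name("bin/rake", "100644")
puts executable_script_name("run.rb", "100755")
```

Only an extensionless, executable path gets the suffix; the other two names pass through unchanged.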
# Public: Get all Languages
#
# Returns an Array of Languages
@@ -124,33 +144,22 @@ module Linguist
@alias_index[name]
end
 
# Public: Look up Language by extension.
#
# extension - The extension String. May include leading "."
#
# Examples
#
# Language.find_by_extension('.rb')
# # => #<Language name="Ruby">
#
# Returns the Language or nil if none was found.
def self.find_by_extension(extension)
@extension_index[extension]
end
# Public: Look up Language by filename.
# Public: Look up Languages by filename.
#
# filename - The path String.
#
# Examples
#
# Language.find_by_filename('foo.rb')
# # => #<Language name="Ruby">
# # => [#<Language name="Ruby">]
#
# Returns the Language or nil if none was found.
# Returns all matching Languages or [] if none were found.
def self.find_by_filename(filename)
basename, extname = File.basename(filename), File.extname(filename)
@filename_index[basename] || @extension_index[extname]
langs = [@primary_extension_index[extname]] +
@filename_index[basename] +
@extension_index[extname]
langs.compact.uniq
end
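With the indexes now holding arrays, a lookup returns every candidate language, with the primary-extension owner first. The sketch below reproduces that merge with plain strings standing in for `Language` objects. The constants and the sample mappings are hypothetical illustration data, not the real indexes.

```ruby
# Hypothetical stand-ins for the three indexes; strings replace Language objects.
PRIMARY    = { ".c" => "C", ".cpp" => "C++", ".m" => "Objective-C" }
FILENAMES  = Hash.new { |h, k| h[k] = [] }
EXTENSIONS = Hash.new { |h, k| h[k] = [] }

FILENAMES["Rakefile"] << "Ruby"
EXTENSIONS[".h"].concat(["C", "C++", "Objective-C"])
EXTENSIONS[".m"].concat(["Matlab", "Objective-C", "M"])

# Primary-extension match first, then filename matches, then every
# language that lists the extension; nils and duplicates are dropped.
def candidates(filename)
  basename, extname = File.basename(filename), File.extname(filename)
  ([PRIMARY[extname]] + FILENAMES[basename] + EXTENSIONS[extname]).compact.uniq
end

p candidates("foo.h")    # => ["C", "C++", "Objective-C"]
p candidates("foo.m")    # => ["Objective-C", "Matlab", "M"]
p candidates("Rakefile") # => ["Ruby"]
p candidates("foo.xyz")  # => []
```

Because the default block supplies an empty array for unknown keys, the `+` never sees `nil` from the filename or extension indexes; only the primary-extension lookup needs `compact`.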
 
# Public: Look up Language by its name or lexer.
@@ -231,16 +240,18 @@ module Linguist
raise(ArgumentError, "#{@name} is missing lexer")
 
@ace_mode = attributes[:ace_mode]
@wrap = attributes[:wrap] || false
 
# Set legacy search term
@search_term = attributes[:search_term] || default_alias_name
 
# Set extensions or default to [].
@extensions = attributes[:extensions] || []
@overrides = attributes[:overrides] || []
@filenames = attributes[:filenames] || []
 
@primary_extension = attributes[:primary_extension] || default_primary_extension || extensions.first
unless @primary_extension = attributes[:primary_extension]
raise ArgumentError, "#{@name} is missing primary extension"
end
 
# Prepend primary extension unless its already included
if primary_extension && !extensions.include?(primary_extension)
@@ -320,6 +331,11 @@ module Linguist
# Returns a String name or nil
attr_reader :ace_mode
 
# Public: Should language lines be wrapped
#
# Returns true or false
attr_reader :wrap
# Public: Get extensions
#
# Examples
@@ -331,7 +347,7 @@ module Linguist
 
# Deprecated: Get primary extension
#
# Defaults to the first extension but can be overriden
# Defaults to the first extension but can be overridden
# in the languages.yml.
#
# The primary extension can not be nil. Tests should verify this.
@@ -343,11 +359,6 @@ module Linguist
# Returns the extension String.
attr_reader :primary_extension
 
# Internal: Get overridden extensions.
#
# Returns the extensions Array.
attr_reader :overrides
# Public: Get filenames
#
# Examples
@@ -377,13 +388,6 @@ module Linguist
name.downcase.gsub(/\s/, '-')
end
 
# Internal: Get default primary extension.
#
# Returns the extension String.
def default_primary_extension
extensions.first
end
# Public: Get Language group
#
# Returns a Language
@@ -441,11 +445,36 @@ module Linguist
def hash
name.hash
end
def inspect
"#<#{self.class} name=#{name}>"
end
end
 
extensions = Samples::DATA['extnames']
filenames = Samples::DATA['filenames']
popular = YAML.load_file(File.expand_path("../popular.yml", __FILE__))
 
YAML.load_file(File.expand_path("../languages.yml", __FILE__)).each do |name, options|
options['extensions'] ||= []
options['filenames'] ||= []
if extnames = extensions[name]
extnames.each do |extname|
if !options['extensions'].include?(extname)
options['extensions'] << extname
end
end
end
if fns = filenames[name]
fns.each do |filename|
if !options['filenames'].include?(filename)
options['filenames'] << filename
end
end
end
Language.create(
:name => name,
:color => options['color'],
@@ -453,12 +482,12 @@ module Linguist
:aliases => options['aliases'],
:lexer => options['lexer'],
:ace_mode => options['ace_mode'],
:wrap => options['wrap'],
:group_name => options['group'],
:searchable => options.key?('searchable') ? options['searchable'] : true,
:search_term => options['search_term'],
:extensions => options['extensions'],
:extensions => options['extensions'].sort,
:primary_extension => options['primary_extension'],
:overrides => options['overrides'],
:filenames => options['filenames'],
:popular => popular.include?(name)
)
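The loading loop above folds extensions and filenames discovered in the samples data into whatever `languages.yml` declares, skipping entries that are already present. A minimal sketch of that merge follows, assuming both inputs are arrays or nil. The helper name `merge_unique` is hypothetical.

```ruby
# Hypothetical helper: append discovered entries that are not yet declared,
# keeping the declared order and leaving the inputs untouched.
def merge_unique(declared, discovered)
  merged = (declared || []).dup
  (discovered || []).each do |item|
    merged << item unless merged.include?(item)
  end
  merged
end

p merge_unique([".rb"], [".rake", ".rb"]).sort # => [".rake", ".rb"]
p merge_unique(nil, nil)                       # => []
```

Sorting the merged list, as the `:extensions => options['extensions'].sort` line does, keeps the result independent of the order in which samples were scanned.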
# Defines all Languages known to GitHub.
#
# All languages have an associated lexer for syntax highlighting. It
# defaults to name.downcase, which covers most cases. Make sure the
# lexer exists in lexers.yml. This is a list of available in our
# version of pygments.
# defaults to name.downcase, which covers most cases.
#
# type - Either data, programming, markup, or nil
# lexer - An explicit lexer String (defaults to name.downcase)
# lexer - An explicit lexer String (defaults to name)
# aliases - An Array of additional aliases (implicitly
# includes name.downcase)
# ace_mode - A String name of Ace Mode (if available)
# wrap - Boolean wrap to enable line wrapping (default: false)
# extensions - An Array of associated extensions
# primary_extension - A String for the main extension associated with
# the langauge. (defaults to extensions.first)
# overrides - An Array of extensions that takes precedence over conflicts
# the language. Must be unique. Used when a Language is picked
# from a dropdown and we need to automatically choose an
# extension.
# searchable - Boolean flag to enable searching (defaults to true)
# search_term - Deprecated: Some languages may be indexed under a
# different alias. Avoid defining new exceptions.
@@ -22,7 +22,12 @@
# Any additions or modifications (even trivial) should have corresponding
# test change in `test/test_blob.rb`.
#
# Please keep this list alphabetized.
# Please keep this list alphabetized. Capitalization comes before lower case.
ABAP:
type: programming
lexer: ABAP
primary_extension: .abap
 
ASP:
type: programming
@@ -38,7 +43,6 @@ ASP:
- .ascx
- .ashx
- .asmx
- .asp
- .aspx
- .axd
 
@@ -49,43 +53,53 @@ ActionScript:
search_term: as3
aliases:
- as3
extensions:
- .as
primary_extension: .as
 
Ada:
type: programming
color: "#02f88c"
primary_extension: .adb
extensions:
- .adb
- .ads
 
ApacheConf:
type: markup
aliases:
- apache
primary_extension: .apacheconf
Apex:
type: programming
lexer: Text only
extensions:
- .cls
primary_extension: .cls
 
AppleScript:
type: programming
aliases:
- osascript
primary_extension: .scpt
extensions:
- .applescript
- .scpt
primary_extension: .applescript
 
Arc:
type: programming
color: "#ca2afe"
lexer: Text only
extensions:
- .arc
primary_extension: .arc
 
Arduino:
type: programming
color: "#bd79d1"
lexer: C++
primary_extension: .ino
AsciiDoc:
type: markup
lexer: Text only
ace_mode: asciidoc
wrap: true
primary_extension: .asciidoc
extensions:
- .ino
- .adoc
- .asc
 
Assembly:
type: programming
@@ -94,13 +108,11 @@ Assembly:
search_term: nasm
aliases:
- nasm
extensions:
- .asm
primary_extension: .asm
 
Augeas:
type: programming
extensions:
- .aug
primary_extension: .aug
 
AutoHotkey:
type: programming
@@ -108,8 +120,16 @@ AutoHotkey:
color: "#6594b9"
aliases:
- ahk
primary_extension: .ahk
Awk:
type: programming
lexer: Awk
primary_extension: .awk
extensions:
- .ahk
- .gawk
- .mawk
- .nawk
 
Batchfile:
type: programming
@@ -119,42 +139,33 @@ Batchfile:
- bat
primary_extension: .bat
extensions:
- .bat
- .cmd
 
Befunge:
extensions:
- .befunge
primary_extension: .befunge
 
BlitzMax:
extensions:
- .bmx
primary_extension: .bmx
 
Boo:
type: programming
color: "#d4bec1"
extensions:
- .boo
primary_extension: .boo
 
Brainfuck:
primary_extension: .b
extensions:
- .b
- .bf
 
Bro:
type: programming
extensions:
- .bro
primary_extension: .bro
 
C:
type: programming
color: "#555"
overrides:
- .h
primary_extension: .c
extensions:
- .c
- .h
- .w
 
C#:
@@ -164,8 +175,9 @@ C#:
color: "#5a25a2"
aliases:
- csharp
primary_extension: .cs
extensions:
- .cs
- .csx
 
C++:
type: programming
@@ -176,23 +188,19 @@ C++:
- cpp
primary_extension: .cpp
extensions:
- .C
- .c++
- .cc
- .cpp
- .cu
- .cxx
- .h
- .H
- .h++
- .hh
- .hpp
- .hxx
- .tcc
 
C-ObjDump:
type: data
lexer: c-objdump
extensions:
- .c-objdump
primary_extension: .c-objdump
 
C2hs Haskell:
type: programming
@@ -200,25 +208,42 @@ C2hs Haskell:
group: Haskell
aliases:
- c2hs
extensions:
- .chs
primary_extension: .chs
CLIPS:
type: programming
lexer: Text only
primary_extension: .clp
 
CMake:
primary_extension: .cmake
extensions:
- .cmake
- .cmake.in
filenames:
- CMakeLists.txt
 
COBOL:
type: programming
primary_extension: .cob
extensions:
- .cbl
- .ccp
- .cobol
- .cpy
CSS:
ace_mode: css
extensions:
- .css
color: "#1f085e"
primary_extension: .css
Ceylon:
type: programming
lexer: Ceylon
primary_extension: .ceylon
 
ChucK:
lexer: Java
extensions:
- .ck
primary_extension: .ck
 
Clojure:
type: programming
@@ -226,8 +251,10 @@ Clojure:
color: "#db5855"
primary_extension: .clj
extensions:
- .clj
- .cljs
- .cljx
filenames:
- riemann.config
 
CoffeeScript:
type: programming
@@ -235,8 +262,12 @@ CoffeeScript:
color: "#244776"
aliases:
- coffee
- coffee-script
primary_extension: .coffee
extensions:
- .coffee
- ._coffee
- .cson
- .iced
filenames:
- Cakefile
 
@@ -251,7 +282,6 @@ ColdFusion:
primary_extension: .cfm
extensions:
- .cfc
- .cfm
 
Common Lisp:
type: programming
@@ -260,27 +290,32 @@ Common Lisp:
- lisp
primary_extension: .lisp
extensions:
- .lisp
- .asd
- .lsp
- .ny
- .podsl
 
Coq:
type: programming
extensions:
- .v
primary_extension: .coq
 
Cpp-ObjDump:
type: data
lexer: cpp-objdump
primary_extension: .cppobjdump
extensions:
- .cppobjdump
- .c++objdump
- .cxx-objdump
 
Creole:
type: markup
lexer: Text only
wrap: true
primary_extension: .creole
Cucumber:
lexer: Gherkin
extensions:
- .feature
primary_extension: .feature
 
Cython:
type: programming
@@ -289,42 +324,37 @@ Cython:
extensions:
- .pxd
- .pxi
- .pyx
 
D:
type: programming
color: "#fcd46d"
primary_extension: .d
extensions:
- .d
- .di
 
D-ObjDump:
type: data
lexer: d-objdump
primary_extension: .d-objdump
DOT:
type: programming
lexer: Text only
primary_extension: .dot
extensions:
- .d-objdump
- .gv
 
Darcs Patch:
search_term: dpatch
aliases:
- dpatch
primary_extension: .darcspatch
extensions:
- .darcspatch
- .dpatch
 
Dart:
type: programming
extensions:
- .dart
Delphi:
type: programming
color: "#b0ce4e"
primary_extension: .pas
extensions:
- .dpr
- .lpr
- .pas
primary_extension: .dart
 
DCPU-16 ASM:
type: programming
@@ -332,43 +362,50 @@ DCPU-16 ASM:
primary_extension: .dasm16
extensions:
- .dasm
- .dasm16
aliases:
- dasm16
 
Diff:
extensions:
- .diff
- .patch
primary_extension: .diff
 
Dylan:
type: programming
color: "#3ebc27"
extensions:
- .dylan
primary_extension: .dylan
 
Ecere Projects:
type: data
group: JavaScript
lexer: JSON
primary_extension: .epj
Ecl:
type: programming
color: "#8a1267"
primary_extension: .ecl
lexer: ECL
extensions:
- .epj
- .eclxml
 
Eiffel:
type: programming
lexer: Text only
color: "#946d57"
extensions:
- .e
primary_extension: .e
 
Elixir:
type: programming
color: "#6e4a7e"
primary_extension: .ex
extensions:
- .ex
- .exs
 
Elm:
type: programming
lexer: Haskell
group: Haskell
primary_extension: .elm
Emacs Lisp:
type: programming
lexer: Scheme
@@ -378,24 +415,24 @@ Emacs Lisp:
- emacs
primary_extension: .el
extensions:
- .el
- .emacs
 
Erlang:
type: programming
color: "#949e0e"
color: "#0faf8d"
primary_extension: .erl
extensions:
- .erl
- .hrl
 
F#:
type: programming
lexer: FSharp
color: "#b845fc"
search_term: ocaml
search_term: fsharp
aliases:
- fsharp
primary_extension: .fs
extensions:
- .fs
- .fsi
- .fsx
 
@@ -417,7 +454,6 @@ FORTRAN:
- .f03
- .f08
- .f77
- .f90
- .f95
- .for
- .fpp
@@ -425,8 +461,10 @@ FORTRAN:
Factor:
type: programming
color: "#636746"
extensions:
- .factor
primary_extension: .factor
filenames:
- .factor-rc
- .factor-boot-rc
 
Fancy:
type: programming
@@ -434,13 +472,21 @@ Fancy:
primary_extension: .fy
extensions:
- .fancypack
- .fy
filenames:
- Fakefile
 
Fantom:
type: programming
color: "#dbded5"
primary_extension: .fan
Forth:
type: programming
primary_extension: .fth
color: "#341708"
lexer: Text only
extensions:
- .fan
- .4th
 
GAS:
type: programming
@@ -448,49 +494,50 @@ GAS:
primary_extension: .s
extensions:
- .S
- .s
 
Genshi:
GLSL:
group: C
type: programming
primary_extension: .glsl
extensions:
- .kid
- .fp
- .frag
- .geom
- .glslv
- .shader
- .vert
Genshi:
primary_extension: .kid
 
Gentoo Ebuild:
group: Shell
lexer: Bash
extensions:
- .ebuild
primary_extension: .ebuild
 
Gentoo Eclass:
group: Shell
lexer: Bash
extensions:
- .eclass
primary_extension: .eclass
 
Gettext Catalog:
search_term: pot
searchable: false
aliases:
- pot
primary_extension: .po
extensions:
- .po
- .pot
 
Go:
type: programming
color: "#8d04eb"
extensions:
- .go
color: "#a89b4d"
primary_extension: .go
 
Gosu:
type: programming
color: "#82937f"
primary_extension: .gs
extensions:
- .gs
- .gsp
- .gst
- .gsx
- .vark
 
Groff:
primary_extension: .man
@@ -502,127 +549,133 @@ Groff:
- '.5'
- '.6'
- '.7'
- .man
 
Groovy:
type: programming
ace_mode: groovy
color: "#e69f56"
primary_extension: .groovy
extensions:
- .gradle
- .groovy
 
Groovy Server Pages:
group: Groovy
lexer: Java Server Page
overrides:
- .gsp
aliases:
- gsp
extensions:
- .gsp
primary_extension: .gsp
 
HTML:
type: markup
ace_mode: html
aliases:
- xhtml
primary_extension: .html
extensions:
- .htm
- .html
- .xhtml
- .xslt
 
HTML+Django:
type: markup
group: HTML
lexer: HTML+Django/Jinja
primary_extension: .mustache # TODO: This is incorrect
extensions:
- .jinja
- .mustache
 
HTML+ERB:
type: markup
group: HTML
lexer: RHTML
aliases:
- erb
primary_extension: .erb
extensions:
- .erb
- .erb.deface
- .html.erb
- .html.erb.deface
 
HTML+PHP:
type: markup
group: HTML
extensions:
- .phtml
primary_extension: .phtml
 
HaXe:
type: programming
lexer: haXe
ace_mode: haxe
color: "#346d51"
extensions:
- .hx
- .hxml
- .mtt
HTTP:
type: data
primary_extension: .http
 
Haml:
group: HTML
type: markup
primary_extension: .haml
extensions:
- .haml
- .haml.deface
- .html.haml.deface
Handlebars:
type: markup
lexer: Text only
primary_extension: .handlebars
 
Haskell:
type: programming
color: "#29b544"
primary_extension: .hs
extensions:
- .hs
- .hsc
 
Haxe:
type: programming
lexer: haXe
ace_mode: haxe
color: "#346d51"
primary_extension: .hx
extensions:
- .hxsl
INI:
type: data
extensions:
- .cfg
- .ini
- .prefs
- .properties
filenames:
- .gitconfig
primary_extension: .ini
 
IRC log:
lexer: IRC logs
search_term: irc
aliases:
- irc
primary_extension: .irclog
extensions:
- .weechatlog
 
Io:
type: programming
color: "#a9188d"
extensions:
- .io
primary_extension: .io
 
Ioke:
type: programming
color: "#078193"
extensions:
- .ik
primary_extension: .ik
J:
type: programming
lexer: Text only
primary_extension: .ijs
 
JSON:
type: data
group: JavaScript
ace_mode: json
searchable: false
extensions:
- .json
primary_extension: .json
 
Java:
type: programming
ace_mode: java
color: "#b07219"
extensions:
- .java
- .pde
primary_extension: .java
 
Java Server Pages:
group: Java
@@ -630,8 +683,7 @@ Java Server Pages:
search_term: jsp
aliases:
- jsp
extensions:
- .jsp
primary_extension: .jsp
 
JavaScript:
type: programming
@@ -642,9 +694,9 @@ JavaScript:
- node
primary_extension: .js
extensions:
- ._js
- .bones
- .jake
- .js
- .jsfl
- .jsm
- .jss
@@ -657,26 +709,55 @@ JavaScript:
 
Julia:
type: programming
extensions:
- .jl
primary_extension: .jl
 
Kotlin:
type: programming
primary_extension: .kt
extensions:
- .kt
- .ktm
- .kts
 
LFE:
type: programming
primary_extension: .lfe
color: "#004200"
lexer: Common Lisp
group: Erlang
LLVM:
extensions:
- .ll
primary_extension: .ll
Lasso:
type: programming
lexer: Lasso
ace_mode: lasso
color: "#2584c3"
primary_extension: .lasso
Less:
type: markup
group: CSS
lexer: CSS
ace_mode: less
primary_extension: .less
 
LilyPond:
lexer: Text only
primary_extension: .ly
extensions:
- .ily
- .ly
Literate CoffeeScript:
type: programming
group: CoffeeScript
lexer: Text only
ace_mode: markdown
wrap: true
search_term: litcoffee
aliases:
- litcoffee
primary_extension: .litcoffee
 
Literate Haskell:
type: programming
@@ -684,44 +765,77 @@ Literate Haskell:
search_term: lhs
aliases:
- lhs
primary_extension: .lhs
LiveScript:
type: programming
ace_mode: ls
color: "#499886"
aliases:
- ls
primary_extension: .ls
extensions:
- ._ls
filenames:
- Slakefile
Logos:
type: programming
primary_extension: .xm
extensions:
- .lhs
- .x
- .xi
- .xmi
 
Logtalk:
type: programming
primary_extension: .lgt
extensions:
- .lgt
- .logtalk
 
Lua:
type: programming
ace_mode: lua
color: "#fa1fa1"
primary_extension: .lua
extensions:
- .lua
- .nse
- .rbxs
M:
type: programming
lexer: Common Lisp
aliases:
- mumps
primary_extension: .mumps
extensions:
- .m
 
Makefile:
aliases:
- make
extensions:
- .mak
- .mk
primary_extension: .mak
filenames:
- makefile
- Makefile
- GNUmakefile
 
Mako:
primary_extension: .mako
extensions:
- .mako
- .mao
 
Markdown:
type: markup
lexer: Text only
ace_mode: markdown
wrap: true
primary_extension: .md
extensions:
- .markdown
- .md
- .mkd
- .mkdown
- .ron
@@ -730,16 +844,25 @@ Matlab:
type: programming
color: "#bb92ac"
primary_extension: .matlab
extensions:
- .m
- .matlab
 
Max/MSP:
Max:
type: programming
color: "#ce279c"
lexer: Text only
aliases:
- max/msp
- maxmsp
search_term: max/msp
primary_extension: .mxt
extensions:
- .mxt
- .maxhelp
- .maxpat
MediaWiki:
type: markup
lexer: Text only
wrap: true
primary_extension: .mediawiki
 
MiniD: # Legacy
searchable: false
@@ -750,31 +873,46 @@ Mirah:
lexer: Ruby
search_term: ruby
color: "#c7a938"
primary_extension: .druby
extensions:
- .duby
- .mir
- .mirah
 
Monkey:
type: programming
lexer: Monkey
primary_extension: .monkey
Moocode:
lexer: MOOCode
extensions:
- .moo
primary_extension: .moo
MoonScript:
type: programming
primary_extension: .moon
 
Myghty:
extensions:
- .myt
primary_extension: .myt
NSIS:
primary_extension: .nsi
 
Nemerle:
type: programming
color: "#0d3c6e"
extensions:
- .n
primary_extension: .n
Nginx:
type: markup
lexer: Nginx configuration file
primary_extension: .nginxconf
 
Nimrod:
type: programming
color: "#37775b"
primary_extension: .nim
extensions:
- .nim
- .nimrod
 
Nu:
@@ -783,8 +921,7 @@ Nu:
color: "#c9df40"
aliases:
- nush
extensions:
- .nu
primary_extension: .nu
filenames:
- Nukefile
 
@@ -792,7 +929,6 @@ NumPy:
group: Python
primary_extension: .numpy
extensions:
- .numpy
- .numpyw
- .numsc
 
@@ -802,7 +938,7 @@ OCaml:
color: "#3be133"
primary_extension: .ml
extensions:
- .ml
- .eliomi
- .mli
- .mll
- .mly
@@ -810,38 +946,44 @@ OCaml:
ObjDump:
type: data
lexer: objdump
extensions:
- .objdump
primary_extension: .objdump
 
Objective-C:
type: programming
color: "#438eff"
overrides:
- .m
aliases:
- obj-c
- objc
primary_extension: .m
extensions:
- .h
- .m
- .mm
 
Objective-J:
type: programming
color: "#ff0c5a"
aliases:
- obj-j
primary_extension: .j
extensions:
- .j
- .sj
 
Omgrofl:
type: programming
primary_extension: .omgrofl
color: "#cabbff"
lexer: Text only
Opa:
type: programming
extensions:
- .opa
primary_extension: .opa
 
OpenCL:
type: programming
group: C
lexer: C
primary_extension: .cl
extensions:
- .cl
- .opencl
 
OpenEdge ABL:
type: programming
@@ -850,18 +992,21 @@ OpenEdge ABL:
- openedge
- abl
primary_extension: .p
extensions:
- .cls
- .p
Org:
type: markup
lexer: Text only
wrap: true
primary_extension: .org
 
PHP:
type: programming
ace_mode: php
color: "#6e03c1"
primary_extension: .php
extensions:
- .aw
- .ctp
- .php
- .php3
- .php4
- .php5
@@ -881,8 +1026,7 @@ Parrot Internal Representation:
lexer: Text only
aliases:
- pir
extensions:
- .pir
primary_extension: .pir
 
Parrot Assembly:
group: Parrot
@@ -890,48 +1034,70 @@ Parrot Assembly:
lexer: Text only
aliases:
- pasm
primary_extension: .pasm
Pascal:
type: programming
lexer: Delphi
color: "#b0ce4e"
primary_extension: .pas
extensions:
- .pasm
- .dfm
- .lpr
 
Perl:
type: programming
ace_mode: perl
color: "#0298c3"
overrides:
- .pl
- .t
primary_extension: .pl
extensions:
- .PL
- .nqp
- .perl
- .ph
- .pl
- .plx
- .pm
- .pm6
- .pod
- .psgi
- .t
Pike:
type: programming
color: "#066ab2"
lexer: C
primary_extension: .pike
extensions:
- .pmod
PogoScript:
type: programming
color: "#d80074"
lexer: Text only
primary_extension: .pogo
 
PowerShell:
type: programming
ace_mode: powershell
aliases:
- posh
extensions:
- .ps1
- .psm1
primary_extension: .ps1
Processing:
type: programming
lexer: Java
color: "#2779ab"
primary_extension: .pde
 
Prolog:
type: programming
color: "#74283c"
primary_extension: .prolog
extensions:
- .pl
- .pro
- .prolog
 
Puppet:
type: programming
color: "#cc5555"
primary_extension: .pp
extensions:
- .pp
filenames:
@@ -941,8 +1107,7 @@ Pure Data:
type: programming
color: "#91de79"
lexer: Text only
extensions:
- .pd
primary_extension: .pd
 
Python:
type: programming
@@ -950,67 +1115,80 @@ Python:
color: "#3581ba"
primary_extension: .py
extensions:
- .py
- .gyp
- .pyt
- .pyw
- .wsgi
- .xpy
filenames:
- wscript
 
Python traceback:
type: data
group: Python
lexer: Python Traceback
searchable: false
extensions:
- .pytb
primary_extension: .pytb
 
R:
type: programming
color: "#198ce7"
lexer: S
overrides:
- .r
primary_extension: .r
extensions:
- .R
- .r
filenames:
- .Rprofile
RDoc:
type: markup
lexer: Text only
ace_mode: rdoc
wrap: true
primary_extension: .rdoc
 
RHTML:
type: markup
group: HTML
extensions:
- .rhtml
primary_extension: .rhtml
 
Racket:
type: programming
lexer: Scheme
lexer: Racket
color: "#ae17ff"
primary_extension: .rkt
extensions:
- .rkt
- .rktd
- .rktl
- .scrbl
Ragel in Ruby Host:
type: programming
lexer: Ragel in Ruby Host
color: "#ff9c2e"
primary_extension: .rl
 
Raw token data:
search_term: raw
aliases:
- raw
extensions:
- .raw
primary_extension: .raw
 
Rebol:
type: programming
lexer: REBOL
color: "#358a5b"
primary_extension: .rebol
extensions:
- .r
- .r2
- .r3
- .rebol
 
Redcode:
extensions:
- .cw
primary_extension: .cw
Rouge:
type: programming
lexer: Clojure
ace_mode: clojure
color: "#cc0088"
primary_extension: .rg
 
Ruby:
type: programming
@@ -1029,8 +1207,6 @@ Ruby:
- .god
- .irbrc
- .podspec
- .rake
- .rb
- .rbuild
- .rbw
- .rbx
@@ -1038,80 +1214,64 @@ Ruby:
- .thor
- .watchr
filenames:
- Capfile
- Berksfile
- Gemfile
- Guardfile
- Podfile
- Rakefile
- Thorfile
- Vagrantfile
 
Rust:
type: programming
color: "#dea584"
lexer: Text only
extensions:
- .rs
primary_extension: .rs
 
SCSS:
type: markup
group: CSS
ace_mode: scss
extensions:
- .scss
primary_extension: .scss
 
SQL:
type: data
ace_mode: sql
searchable: false
extensions:
- .sql
primary_extension: .sql
 
Sage:
type: programming
lexer: Python
group: Python
extensions:
- .sage
primary_extension: .sage
 
Sass:
type: markup
group: CSS
extensions:
- .sass
primary_extension: .sass
 
Scala:
type: programming
ace_mode: scala
color: "#7dd3b0"
primary_extension: .scala
extensions:
- .sbt
- .scala
 
Scheme:
type: programming
color: "#1e4aec"
primary_extension: .scm
extensions:
- .scm
- .sls
- .sps
- .ss
 
Scilab:
type: programming
primary_extension: .sci
extensions:
- .sce
- .tst
 
Self:
type: programming
color: "#0579aa"
lexer: Text only
extensions:
- .self
primary_extension: .self
 
Shell:
type: programming
@@ -1124,28 +1284,27 @@ Shell:
- zsh
primary_extension: .sh
extensions:
- .bash
- .sh
- .zsh
- .tmux
filenames:
- .bash_profile
- .bashrc
- .profile
- .zlogin
- .zsh
- .zshrc
- bashrc
- zshrc
- Dockerfile
Slash:
type: programming
color: "#007eff"
primary_extension: .sl
 
Smalltalk:
type: programming
color: "#596706"
extensions:
- .st
primary_extension: .st
 
Smarty:
extensions:
- .tpl
primary_extension: .tpl
Squirrel:
type: programming
lexer: C++
primary_extension: .nut
 
Standard ML:
type: programming
@@ -1153,22 +1312,26 @@ Standard ML:
aliases:
- sml
primary_extension: .sml
extensions:
- .sig
- .sml
 
SuperCollider:
type: programming
color: "#46390b"
lexer: Text only
extensions:
- .sc
primary_extension: .sc
TOML:
type: data
primary_extension: .toml
TXL:
type: programming
lexer: Text only
primary_extension: .txl
 
Tcl:
type: programming
color: "#e4cc98"
extensions:
- .tcl
primary_extension: .tcl
 
Tcsh:
type: programming
@@ -1176,81 +1339,83 @@ Tcsh:
primary_extension: .tcsh
extensions:
- .csh
- .tcsh
 
TeX:
type: markup
ace_mode: latex
aliases:
- latex
primary_extension: .tex
overrides:
- .cls
extensions:
- .aux
- .cls
- .dtx
- .ins
- .ltx
- .sty
- .tex
- .toc
 
Tea:
type: markup
extensions:
- .tea
Text:
type: data
lexer: Text only
ace_mode: text
extensions:
- .txt
primary_extension: .tea
 
Textile:
type: markup
lexer: Text only
ace_mode: textile
extensions:
- .textile
wrap: true
primary_extension: .textile
 
Turing:
type: programming
color: "#45f715"
lexer: Text only
primary_extension: .t
extensions:
- .t
- .tu
 
Twig:
type: markup
group: PHP
lexer: HTML+Django/Jinja
extensions:
- .twig
primary_extension: .twig
TypeScript:
type: programming
color: "#31859c"
aliases:
- ts
primary_extension: .ts
Unified Parallel C:
type: programming
group: C
lexer: C
ace_mode: c_cpp
color: "#755223"
primary_extension: .upc
 
VHDL:
type: programming
lexer: vhdl
color: "#543978"
extensions:
- .vhd
- .vhdl
primary_extension: .vhdl
 
Vala:
type: programming
color: "#ee7d06"
primary_extension: .vala
extensions:
- .vala
- .vapi
 
Verilog:
type: programming
lexer: verilog
color: "#848bf3"
overrides:
- .v
primary_extension: .v
extensions:
- .v
- .sv
- .svh
- .vh
 
VimL:
type: programming
@@ -1258,11 +1423,8 @@ VimL:
search_term: vim
aliases:
- vim
extensions:
- .vim
primary_extension: .vim
filenames:
- .gvimrc
- .vimrc
- vimrc
- gvimrc
 
@@ -1274,85 +1436,151 @@ Visual Basic:
extensions:
- .bas
- .frx
- .vb
- .vba
- .vbs
 
Volt:
type: programming
lexer: D
color: "#0098db"
primary_extension: .volt
XC:
type: programming
lexer: C
primary_extension: .xc
XML:
type: markup
ace_mode: xml
aliases:
- rss
- xsd
- wsdl
primary_extension: .xml
extensions:
- .axml
- .ccxml
- .dita
- .ditamap
- .ditaval
- .glade
- .grxml
- .kml
- .mxml
- .plist
- .pt
- .rdf
- .rss
- .scxml
- .svg
- .tmCommand
- .tmLanguage
- .tmPreferences
- .tmSnippet
- .tmTheme
- .tml
- .ui
- .vxml
- .wsdl
- .wxi
- .wxl
- .wxs
- .x3d
- .xaml
- .xlf
- .xliff
- .xml
- .xmi
- .xsd
- .xsl
- .xul
- .zcml
filenames:
- .classpath
- .project
 
XProc:
type: programming
lexer: XML
primary_extension: .xpl
extensions:
- .xproc
XQuery:
type: programming
color: "#2700e2"
primary_extension: .xquery
extensions:
- .xq
- .xqm
- .xquery
- .xqy
 
XS:
lexer: C
primary_extension: .xs
XSLT:
type: programming
aliases:
- xsl
primary_extension: .xslt
extensions:
- .xs
- .xsl
Xtend:
type: programming
primary_extension: .xtend
 
YAML:
type: markup
type: data
aliases:
- yml
primary_extension: .yml
extensions:
- .reek
- .yaml
- .yml
filenames:
- .gemrc
 
eC:
type: programming
search_term: ec
primary_extension: .ec
extensions:
- .ec
- .eh
 
edn:
type: data
lexer: Clojure
ace_mode: clojure
color: "#db5855"
primary_extension: .edn
fish:
type: programming
group: Shell
lexer: Text only
primary_extension: .fish
mupad:
lexer: MuPAD
extensions:
- .mu
primary_extension: .mu
 
ooc:
type: programming
lexer: Ooc
color: "#b0b77e"
extensions:
- .ooc
primary_extension: .ooc
 
reStructuredText:
type: markup
wrap: true
search_term: rst
aliases:
- rst
primary_extension: .rst
extensions:
- .rst
- .rest
wisp:
type: programming
lexer: Clojure
ace_mode: clojure
color: "#7582D1"
primary_extension: .wisp
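The entries above map each language to a `primary_extension`, optional extra `extensions`, and optional `filenames`. As a rough sketch of how such entries could be turned into an extension lookup table, the snippet below loads a small excerpt (with the YAML indentation restored) and indexes it. `extension_index` is a hypothetical helper written for illustration, not Linguist's own loader.

```ruby
require 'yaml'

# A small excerpt in the shape of the entries above (indentation restored).
source = <<-EOS
Scala:
  type: programming
  primary_extension: .scala
  extensions:
  - .sbt
  - .scala
TOML:
  type: data
  primary_extension: .toml
EOS

# Hypothetical helper: map each extension to the language names that claim it.
def extension_index(languages)
  index = Hash.new { |hash, key| hash[key] = [] }
  languages.each do |name, attrs|
    exts = [attrs['primary_extension']] + (attrs['extensions'] || [])
    exts.compact.uniq.each { |ext| index[ext] << name }
  end
  index
end

index = extension_index(YAML.load(source))
puts index['.sbt'].inspect
puts index['.toml'].inspect
```

Each extension maps to a list because several languages can claim the same extension (for example `.r` appears under both R and Rebol above), which is the case the classifier and `overrides` entries exist to resolve.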
require 'digest/md5'

module Linguist
  module MD5
    # Public: Create deep nested digest of value object.
    #
    # Useful for object comparison.
    #
    # obj - Object to digest.
    #
    # Returns String hex digest
    def self.hexdigest(obj)
      digest = Digest::MD5.new

      case obj
      when String, Symbol, Integer
        digest.update "#{obj.class}"
        digest.update "#{obj}"
      when TrueClass, FalseClass, NilClass
        digest.update "#{obj.class}"
      when Array
        digest.update "#{obj.class}"
        for e in obj
          digest.update(hexdigest(e))
        end
      when Hash
        digest.update "#{obj.class}"
        for e in obj.map { |(k, v)| hexdigest([k, v]) }.sort
          digest.update(e)
        end
      else
        raise TypeError, "can't convert #{obj.inspect} into String"
      end

      digest.hexdigest
    end
  end
end
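Because Hash entries are digested pair by pair and the pair digests are sorted before being fed into the outer digest, the result should not depend on key insertion order, while Array order still matters. A minimal standalone sketch of that property, using only `digest/md5`; `nested_digest` is a hypothetical name that mirrors the method above so the snippet can run without Linguist loaded:

```ruby
require 'digest/md5'

# Hypothetical standalone helper mirroring the approach of MD5.hexdigest above.
def nested_digest(obj)
  digest = Digest::MD5.new
  case obj
  when String, Symbol, Integer
    # Mix in the class name so "1" and 1 digest differently.
    digest.update obj.class.to_s
    digest.update obj.to_s
  when TrueClass, FalseClass, NilClass
    digest.update obj.class.to_s
  when Array
    digest.update obj.class.to_s
    obj.each { |e| digest.update(nested_digest(e)) }
  when Hash
    digest.update obj.class.to_s
    # Sorting the per-pair digests removes any dependence on key order.
    obj.map { |k, v| nested_digest([k, v]) }.sort.each { |d| digest.update(d) }
  else
    raise TypeError, "can't convert #{obj.inspect} into String"
  end
  digest.hexdigest
end

a = { 'extnames' => ['.rb'], 'filenames' => ['Rakefile'] }
b = { 'filenames' => ['Rakefile'], 'extnames' => ['.rb'] }

puts nested_digest(a) == nested_digest(b)           # same content, different key order
puts nested_digest([1, 2]) == nested_digest([2, 1]) # same elements, different array order
```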
require 'mime/types'
require 'yaml'

class MIME::Type
  attr_accessor :override
end

# Register additional mime type extensions
#
# Follows same format as mime-types data file
#   https://github.com/halostatue/mime-types/blob/master/lib/mime/types.rb.data
File.read(File.expand_path("../mimes.yml", __FILE__)).lines.each do |line|
  # Regexp was cargo culted from mime-types lib
  next unless line =~ %r{^
    #{MIME::Type::MEDIA_TYPE_RE}
    (?:\s@([^\s]+))?
    (?:\s:(#{MIME::Type::ENCODING_RE}))?
  }x

  mediatype  = $1
  subtype    = $2
  extensions = $3
  encoding   = $4

  # Look up existing mime type
  mime_type = MIME::Types["#{mediatype}/#{subtype}"].first ||
    # Or create a new instance
    MIME::Type.new("#{mediatype}/#{subtype}")

  if extensions
    extensions.split(/,/).each do |extension|
      mime_type.extensions << extension
    end
  end

  if encoding
    mime_type.encoding = encoding
  end

  mime_type.override = true

  # Kind of hacky, but we need to reindex the mime type after making changes
  MIME::Types.add_type_variant(mime_type)
  MIME::Types.index_extensions(mime_type)
end

module Linguist
  module Mime
    # Internal: Look up mime type for extension.
    #
    # ext - The extension String. May include leading "."
    #
    # Examples
    #
    #   Mime.mime_for('.html')
    #   # => 'text/html'
    #
    #   Mime.mime_for('txt')
    #   # => 'text/plain'
    #
    # Returns the mime type String, falling back to 'text/plain' if none is found.
    def self.mime_for(ext)
      mime_type = lookup_mime_type_for(ext)
      mime_type ? mime_type.to_s : 'text/plain'
    end

    # Internal: Look up mime type for extension or mime type
    #
    # ext_or_mime_type - A file extension ".txt" or mime type "text/plain".
    #
    # Returns a MIME::Type
    def self.lookup_mime_type_for(ext_or_mime_type)
      ext_or_mime_type ||= ''

      if ext_or_mime_type =~ /\w+\/\w+/
        guesses = ::MIME::Types[ext_or_mime_type]
      else
        guesses = ::MIME::Types.type_for(ext_or_mime_type)
      end

      # Use custom override first
      guesses.detect { |type| type.override } ||

        # Prefer text mime types over binary
        guesses.detect { |type| type.ascii? } ||

        # Otherwise use the first guess
        guesses.first
    end
  end
end
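The selection order in `lookup_mime_type_for` (custom override, then a text type, then the first guess) can be tried in isolation. The sketch below assumes nothing from the mime-types gem: `FakeType` is a hypothetical stand-in with just the `override` and `ascii?` members the selection needs, and `pick_type` is an illustrative name.

```ruby
# Hypothetical stand-in for MIME::Type, just enough for the selection logic.
FakeType = Struct.new(:name, :override, :ascii) do
  def ascii?
    ascii
  end
end

# Pick a type with the same priority as above: override, then text, then first.
def pick_type(guesses)
  guesses.detect { |type| type.override } ||
    guesses.detect { |type| type.ascii? } ||
    guesses.first
end

binary   = FakeType.new('application/octet-stream', false, false)
text     = FakeType.new('text/plain', false, true)
override = FakeType.new('application/x-custom', true, false)

puts pick_type([binary, text, override]).name  # the override wins
puts pick_type([binary, text]).name            # the text type wins over binary
puts pick_type([binary]).name                  # falls back to the first guess
puts pick_type([]).inspect                     # nothing to pick from
```

With an empty list the chain yields `nil`, which is the case `mime_for` covers with its `'text/plain'` fallback.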
# Additional types to add to MIME::Types
#
# MIME types are used to set the Content-Type of raw binary blobs. All text
# blobs are served as text/plain regardless of their type to ensure they
# open in the browser rather than downloading.
#
# The encoding helps determine whether a file should be treated as plain
# text or binary. By default, a mime type's encoding is base64 (binary).
# These types will show a "View Raw" link. To force a type to render as
# plain text, set it to 8bit for UTF-8. text/* types will be treated as
# text by default.
#
# <type> @<extensions> :<encoding>
#
# type - mediatype/subtype
# extensions - comma separated extension list
# encoding - base64 (binary), 7bit (ASCII), 8bit (UTF-8), or
# quoted-printable (Printable ASCII).
#
# Follows same format as mime-types data file
# https://github.com/halostatue/mime-types/blob/master/lib/mime/types.rb.data
#
# Any additions or modifications (even trivial) should have corresponding
# test change in `test/test_mime.rb`.
# TODO: Lookup actual types
application/octet-stream @a,blend,gem,graffle,ipa,lib,mcz,nib,o,ogv,otf,pfx,pigx,plgx,psd,sib,spl,sqlite3,swc,ucode,xpi
# Please keep this list alphabetized
application/java-archive @ear,war
application/netcdf :8bit
application/ogg @ogg
application/postscript :base64
application/vnd.adobe.air-application-installer-package+zip @air
application/vnd.mozilla.xul+xml :8bit
application/vnd.oasis.opendocument.presentation @odp
application/vnd.oasis.opendocument.spreadsheet @ods
application/vnd.oasis.opendocument.text @odt
application/vnd.openofficeorg.extension @oxt
application/vnd.openxmlformats-officedocument.presentationml.presentation @pptx
application/x-chrome-extension @crx
application/x-iwork-keynote-sffkey @key
application/x-iwork-numbers-sffnumbers @numbers
application/x-iwork-pages-sffpages @pages
application/x-ms-xbap @xbap :8bit
application/x-parrot-bytecode @pbc
application/x-shockwave-flash @swf
application/x-silverlight-app @xap
application/x-supercollider @sc :8bit
application/x-troff-ms :8bit
application/x-wais-source :8bit
application/xaml+xml @xaml :8bit
image/x-icns @icns
text/cache-manifest @manifest
text/plain @cu,cxx
text/x-logtalk @lgt
text/x-nemerle @n
text/x-nimrod @nim
text/x-ocaml @ml,mli,mll,mly,sig,sml
text/x-rust @rs,rc
text/x-scheme @rkt,scm,sls,sps,ss
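To make the `<type> @<extensions> :<encoding>` line format concrete, here is a small parsing sketch. It uses a simplified pattern written for this example; the real loader relies on `MIME::Type::MEDIA_TYPE_RE` and `MIME::Type::ENCODING_RE`, which may accept more than this does. `LINE_RE` and `parse_mime_line` are hypothetical names.

```ruby
# Simplified pattern for illustration only: mediatype/subtype, then an
# optional "@ext,ext" list, then an optional ":encoding".
LINE_RE = %r{\A
  ([\w.+-]+)/([\w.+-]+)
  (?:\s@(\S+))?
  (?:\s:(base64|7bit|8bit|quoted-printable))?
\z}x

# Parse one data line into a Hash, or return nil for comments and non-matches.
def parse_mime_line(line)
  line = line.strip
  return nil if line.empty? || line.start_with?('#')

  match = LINE_RE.match(line)
  return nil unless match

  {
    :type       => "#{match[1]}/#{match[2]}",
    :extensions => match[3] ? match[3].split(',') : [],
    :encoding   => match[4]
  }
end

p parse_mime_line('application/x-ms-xbap @xbap :8bit')
p parse_mime_line('application/netcdf :8bit')
p parse_mime_line('text/x-ocaml @ml,mli,mll,mly,sig,sml')
```

Both the extension list and the encoding are optional, so a line may carry either, both, or neither.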
require 'linguist/language'
require 'linguist/mime'
require 'pygments'

module Linguist
  # Similar to ::Pathname, Linguist::Pathname wraps a path string and
  # provides helpful query methods. It's useful when you only have a
  # filename but not a blob and need to figure out the language of the file.
  class Pathname
    # Public: Initialize a Pathname
    #
    # path - A filename String. The file may or may not actually exist.
    #
    # Returns a Pathname.
    def initialize(path)
      @path = path
    end

    # Public: Get the basename of the path
    #
    # Examples
    #
    #   Pathname.new('sub/dir/file.rb').basename
    #   # => 'file.rb'
    #
    # Returns a String.
    def basename
      File.basename(@path)
    end

    # Public: Get the extname of the path
    #
    # Examples
    #
    #   Pathname.new('.rb').extname
    #   # => ''
    #
    #   Pathname.new('file.rb').extname
    #   # => '.rb'
    #
    # Returns a String.
    def extname
      File.extname(@path)
    end

    # Public: Get the language of the path
    #
    # The path extension name is the only heuristic used to detect the
    # language name.
    #
    # Examples
    #
    #   Pathname.new('file.rb').language
    #   # => Language['Ruby']
    #
    # Returns a Language or nil if none was found.
    def language
      @language ||= Language.find_by_filename(@path)
    end

    # Internal: Get the lexer of the path
    #
    # Returns a Lexer.
    def lexer
      language ? language.lexer : Pygments::Lexer.find_by_name('Text only')
    end

    # Public: Get the mime type
    #
    # Examples
    #
    #   Pathname.new('index.html').mime_type
    #   # => 'text/html'
    #
    # Returns a mime type String.
    def mime_type
      @mime_type ||= Mime.mime_for(extname)
    end

    # Public: Return self as String
    #
    # Returns a String
    def to_s
      @path.dup
    end

    def eql?(other)
      other.is_a?(self.class) && @path == other.to_s
    end
    alias_method :==, :eql?
  end
end
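`basename` and `extname` delegate to `File.basename` and `File.extname`, and `File.extname` returns an empty string for extensionless names and for dotfiles. That is one reason a language definition may list `filenames` as well as `extensions`. The sketch below illustrates the idea with two hypothetical lookup tables; `guess_language` is an illustrative helper and is not how `Language.find_by_filename` is implemented.

```ruby
# Hypothetical lookup tables, a tiny excerpt in the spirit of languages.yml.
FILENAMES  = { 'Rakefile' => 'Ruby', '.bashrc' => 'Shell', 'wscript' => 'Python' }
EXTENSIONS = { '.rb' => 'Ruby', '.py' => 'Python', '.sh' => 'Shell' }

# Try an exact filename match first, then fall back to the extension.
def guess_language(path)
  FILENAMES[File.basename(path)] || EXTENSIONS[File.extname(path)]
end

puts File.extname('sub/dir/file.rb').inspect  # has an extension
puts File.extname('Rakefile').inspect         # no extension at all
puts File.extname('.bashrc').inspect          # dotfile: treated as having none

puts guess_language('lib/linguist.rb').inspect
puts guess_language('Rakefile').inspect
puts guess_language('home/.bashrc').inspect
puts guess_language('README').inspect
```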
@@ -8,6 +8,8 @@
- C#
- C++
- CSS
- Clojure
- CoffeeScript
- Common Lisp
- Diff
- Emacs Lisp
@@ -25,5 +27,3 @@
- SQL
- Scala
- Scheme
- TeX
- XML
@@ -67,20 +67,20 @@ module Linguist
return if @computed_stats
 
@enum.each do |blob|
# Skip binary file extensions
next if blob.binary_mime_type?
# Skip files that are likely binary
next if blob.likely_binary?
 
# Skip vendored or generated blobs
next if blob.vendored? || blob.generated? || blob.language.nil?
 
# Only include programming languages
if blob.language.type == :programming
# Only include programming languages and acceptable markup languages
if blob.language.type == :programming || Language.detectable_markup.include?(blob.language.name)
@sizes[blob.language.group] += blob.size
end
end
 
# Compute total size
@size = @sizes.inject(0) { |s,(k,v)| s + v }
@size = @sizes.inject(0) { |s,(_,v)| s + v }
 
# Get primary language
if primary = @sizes.max_by { |(_, size)| size }
Source diff could not be displayed: it is too large. Options to address this: view the blob.
require 'yaml'

require 'linguist/md5'
require 'linguist/classifier'

module Linguist
  # Model for accessing classifier training data.
  module Samples
    # Path to samples root directory
    ROOT = File.expand_path("../../../samples", __FILE__)

    # Path for serialized samples db
    PATH = File.expand_path('../samples.json', __FILE__)

    # Hash of serialized samples object
    if File.exist?(PATH)
      DATA = YAML.load_file(PATH)
    end

    # Public: Iterate over each sample.
    #
    # &block - Yields Sample to block
    #
    # Returns nothing.
    def self.each(&block)
      Dir.entries(ROOT).each do |category|
        next if category == '.' || category == '..'

        # Skip text and binary for now
        # Possibly reconsider this later
        next if category == 'Text' || category == 'Binary'

        dirname = File.join(ROOT, category)
        Dir.entries(dirname).each do |filename|
          next if filename == '.' || filename == '..'

          if filename == 'filenames'
            Dir.entries(File.join(dirname, filename)).each do |subfilename|
              next if subfilename == '.' || subfilename == '..'

              yield({
                :path     => File.join(dirname, filename, subfilename),
                :language => category,
                :filename => subfilename
              })
            end
          else
            if File.extname(filename) == ""
              raise "#{File.join(dirname, filename)} is missing an extension, maybe it belongs in filenames/ subdir"
            end

            yield({
              :path     => File.join(dirname, filename),
              :language => category,
              :extname  => File.extname(filename)
            })
          end
        end
      end

      nil
    end

    # Public: Build Classifier from all samples.
    #
    # Returns the trained classifier db Hash.
    def self.data
      db = {}
      db['extnames'] = {}
      db['filenames'] = {}

      each do |sample|
        language_name = sample[:language]

        if sample[:extname]
          db['extnames'][language_name] ||= []
          if !db['extnames'][language_name].include?(sample[:extname])
            db['extnames'][language_name] << sample[:extname]
            db['extnames'][language_name].sort!
          end
        end

        if sample[:filename]
          db['filenames'][language_name] ||= []
          db['filenames'][language_name] << sample[:filename]
          db['filenames'][language_name].sort!
        end

        data = File.read(sample[:path])
        Classifier.train!(db, language_name, data)
      end

      db['md5'] = Linguist::MD5.hexdigest(db)

      db
    end
  end
end
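The `extnames` and `filenames` parts of the db built by `Samples.data` can be illustrated without the samples directory or the classifier. The sketch below feeds a few hypothetical sample records, shaped like the hashes yielded by `Samples.each`, through the same bookkeeping; `build_index` is an illustrative name, and the classifier training and MD5 steps are deliberately left out.

```ruby
# Hypothetical sample records shaped like the hashes yielded by Samples.each.
samples = [
  { :language => 'Ruby',  :extname  => '.rb' },
  { :language => 'Ruby',  :extname  => '.rake' },
  { :language => 'Ruby',  :extname  => '.rb' },
  { :language => 'Ruby',  :filename => 'Rakefile' },
  { :language => 'Shell', :filename => '.bashrc' }
]

# Collect sorted extension and filename lists per language.
def build_index(samples)
  db = { 'extnames' => {}, 'filenames' => {} }

  samples.each do |sample|
    name = sample[:language]

    if sample[:extname]
      list = (db['extnames'][name] ||= [])
      # Extensions are de-duplicated, since many samples share one.
      list << sample[:extname] unless list.include?(sample[:extname])
      list.sort!
    end

    if sample[:filename]
      list = (db['filenames'][name] ||= [])
      list << sample[:filename]
      list.sort!
    end
  end

  db
end

p build_index(samples)
```

Keeping the lists sorted makes the resulting structure deterministic regardless of the order in which the directory entries are read.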