License audit tooling for Fedora packages

This page describes some of the tools that have been used to audit licensing of packages in Fedora Linux.

Packaging tools

The following tools are specifically designed for use with Fedora Linux packages. They use the Fedora License Data as a source of data on valid licenses.

rpmlint

rpmlint is the standard tool for checking Fedora Linux packages for well-known issues that packagers should fix. In the context of licensing, rpmlint evaluates the License: field in the spec file and ensures that its values conform to the known set of allowed licenses.
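To make the idea of checking a License: value against a set of allowed licenses concrete, here is a minimal sketch in Python. It is not rpmlint's implementation. The `ALLOWED` set is a hypothetical sample (the real list comes from the Fedora License Data), and the function name is invented for illustration.

```python
import re

# Hypothetical sample only; the real allowed list comes from the Fedora License Data.
ALLOWED = {"MIT", "Apache-2.0", "GPL-2.0-or-later", "BSD-3-Clause"}

# SPDX expression operators, which are not identifiers themselves.
OPERATORS = {"AND", "OR", "WITH"}


def unknown_identifiers(license_field, allowed=ALLOWED):
    """Return the tokens of a License: value that are not in the allowed set.

    Simplification: an exception identifier following WITH is checked
    against the same set as license identifiers.
    """
    # Split on whitespace and parentheses to get the bare tokens.
    tokens = re.findall(r"[^\s()]+", license_field)
    return [t for t in tokens if t not in OPERATORS and t not in allowed]


if __name__ == "__main__":
    # "Foo-1.0" is a made-up identifier used to trigger a report.
    print(unknown_identifiers("MIT AND (Apache-2.0 OR Foo-1.0)"))
```

An empty result means every identifier in the expression was found in the sample set. A real checker would also validate the expression syntax, which this sketch does not attempt.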

This is packaged in Fedora Linux as rpmlint.

rpminspect

rpminspect is the tool used to evaluate Fedora Linux packages for policy compliance, differences as compared to previous builds, and common packaging errors as they are built in the Fedora Build System. In the context of licensing, rpminspect evaluates the License: field in RPMs and ensures the values conform to the known set of allowed licenses.

This is packaged in Fedora Linux as rpminspect. To use it, you need both rpminspect and rpminspect-data-fedora.

License and source inspection tools

The following tools have been used by Fedora Project contributors to analyze the licensing of current and proposed Fedora Linux packages. All of these tools are distribution-agnostic.

Licensecheck

Licensecheck is a tool used to analyze the licensing of source files. This tool is principally used in the Fedora context for the initial package review for packages proposed for inclusion in Fedora Linux. Licensecheck is run automatically as part of FedoraReview.

By default, licensecheck reports licenses using their full names, but it can also produce output using other license identifier schemes.

This is packaged in Fedora Linux as licensecheck.

SPDX-license-diff

SPDX-license-diff is a Firefox and Chromium/Chrome plugin that takes license text you highlight on a web page and attempts to find close matches to license identifiers or exception identifiers on the SPDX License List. If a match to an SPDX identifier is presented as less than 100%, SPDX-license-diff will display differences between your highlighted text and SPDX’s plain text rendition of the identifier.
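The general idea of scoring a highlighted text against a reference text and showing the differences can be sketched with Python's standard `difflib` module. This is an illustration only, not the plugin's actual algorithm. The function name and the sample strings are assumptions made for the example.

```python
import difflib


def compare_to_reference(highlighted, reference):
    """Return (similarity percentage, unified diff lines) for two texts."""
    # Character-level similarity between the two texts, from 0.0 to 1.0.
    ratio = difflib.SequenceMatcher(None, reference, highlighted).ratio()
    # Line-level diff; empty when the texts are identical.
    diff = list(
        difflib.unified_diff(
            reference.splitlines(),
            highlighted.splitlines(),
            fromfile="reference",
            tofile="highlighted",
            lineterm="",
        )
    )
    return round(ratio * 100, 1), diff


if __name__ == "__main__":
    percent, diff = compare_to_reference(
        "Permission is granted.\n",
        "Permission is hereby granted.\n",
    )
    print(f"{percent}% similar")
    print("\n".join(diff))
```

When the score is below 100, the diff lines show where the highlighted text departs from the reference, which is the kind of output to inspect before deciding whether the text really corresponds to the identifier.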

SPDX-license-diff is inconvenient if the upstream source repository of your package has no web interface, or if your workflow does not involve a web browser.

Another limitation of SPDX-license-diff is that it does not fully implement the SPDX matching guidelines. As a result, SPDX-license-diff will typically show textual differences in cases where the highlighted text actually is a match to the SPDX identifier. In cases of close matches, it is generally useful and often necessary to check the XML file for the SPDX identifier in the SPDX license-list-XML repository. For example, many SPDX identifier XML files make use of regular expressions. Bear in mind that the SPDX matching guidelines include rules which are not necessarily reflected in these XML files.
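As a small illustration of why a plain textual diff can report differences for text that still matches, the following Python sketch applies only two of the normalizations described in the SPDX matching guidelines: ignoring letter case and treating any run of whitespace as a single space. It deliberately leaves out every other rule, so it is a partial sketch and not a conformant matcher. The function names are hypothetical.

```python
import re


def normalize(text):
    """Lowercase the text and collapse each run of whitespace to one space."""
    return re.sub(r"\s+", " ", text).strip().lower()


def same_after_normalizing(a, b):
    """True if two texts differ only in letter case and whitespace."""
    return normalize(a) == normalize(b)


if __name__ == "__main__":
    wrapped = "Permission is hereby\n    GRANTED, free of charge"
    plain = "permission is hereby granted, free of charge"
    # A naive comparison sees a difference; the normalized one does not.
    print(wrapped == plain, same_after_normalizing(wrapped, plain))
```

Two texts that compare equal here can still differ in ways the full guidelines treat as significant, and the reverse is also true, so the result is only a first filter.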

If SPDX-license-diff identifies a license or exception text as a match to an SPDX identifier, you can then use the SPDX identifier to search in the allowed and not-allowed license lists for Fedora.

SPDX Check License

SPDX Check License is a web application (source code) that displays SPDX License List matches to a license or exception text pasted into a text box. As with SPDX-license-diff, the tool does not fully implement the SPDX matching guidelines. This tool may take more time to give an answer than SPDX-license-diff. It will say whether there is a match, or a close match, to an identifier, but it doesn’t display a diff.

askalono

Askalono, packaged in Fedora as askalono-cli, is a simple license scanning tool written in Rust. It is most useful for quick analysis of packages coming out of ecosystems featuring projects known to have (1) highly standardized approaches to layout of license information (it specifically looks only for files that are named LICENSE or COPYING or some obvious variant on those), (2) generally simple license makeup, and (3) cultural preferences for a highly limited set of licenses (for example, Rust crates that don’t bundle legacy C code, Go modules, or Node.js npm packages).

Askalono has some significant shortcomings. It can’t recognize or understand: (1) license notices/license texts that are comments in source files, (2) license notices/license texts in README files, (3) license files that contain multiple license texts (in which case it may recognize only the first of them), and (4) nonstandard/archaic/legacy licenses (which covers most of the licenses being reviewed in issues in fedora-license-data).
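The filename-based discovery described above, and the reason notices in README files or source comments are missed, can be sketched in Python as follows. This is not askalono's code. The regular expression is a hypothetical reading of "LICENSE or COPYING or some obvious variant", and the tool's actual rules may differ.

```python
import re
from pathlib import Path

# Hypothetical pattern: LICENSE, LICENCE or COPYING, optionally followed by
# a separator and a suffix (for example LICENSE-MIT or COPYING.txt).
LICENSE_NAME = re.compile(r"^(licen[sc]e|copying)([._-].*)?$", re.IGNORECASE)


def looks_like_license_file(name):
    """True if a bare file name resembles a dedicated license file."""
    return LICENSE_NAME.match(name) is not None


def find_license_files(root):
    """Return candidate license files below root, sorted by path."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and looks_like_license_file(p.name)
    )


if __name__ == "__main__":
    for name in ["LICENSE", "LICENSE-MIT", "COPYING.txt", "README.md", "main.c"]:
        print(name, looks_like_license_file(name))
```

Any scanner that selects files this way never opens `README.md` or `main.c`, so license text that lives only in those files goes undetected.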

FOSSology

FOSSology is a license compliance software system and toolkit that includes license scanning. The information here focuses on that aspect of the toolkit. It can be run locally and can also be set up as a hosted service. See Get Started for ways to install it and for a link to a test instance that anyone can use.

FOSSology is good for scanning an entire package for licenses or text that looks like licenses. Files can be viewed easily in the FOSSology interface. FOSSology has the ability to remember past license inspection decisions.

Tips on using FOSSology:

* In options: #5 - check "Ignore SCM files"; #7 - check Monk, Nomos, Ojo License Analysis and Package Analysis; #8 - check the first two options regarding "Scanners matches…"
* Go to the License Browser view. Look for license matches that are suspicious or unexpected, such as matches that are not an SPDX identifier or that are ambiguous. You can then view the files with those matches and inspect what was found to determine whether there is a license that needs to be recorded or whether it is a false match. Basic Workflow has some helpful information.

FOSSology is not packaged in Fedora.

ScanCode toolkit

ScanCode is a command-line Python tool for detecting licenses and related information in source code. ScanCode output (available in a variety of formats including text, JSON and HTML) reports detected license information using both ScanCode’s own non-SPDX system of license keys and what ScanCode considers to be corresponding SPDX expressions. ScanCode does not fully apply the SPDX matching guidelines in making such determinations and does not appear to strictly produce SPDX-conformant expressions in output.

ScanCode is not packaged in Fedora. It can be installed using pip.