Machine-readable data

Machine-readable data, or computer-readable data, is data (or metadata) in a format that can be easily processed by a computer. Machine-readable data must be structured data.[1]

The OPEN Government Data Act, signed into law on January 14, 2019, defines machine-readable data as "data in a format that can be easily processed by a computer without human intervention while ensuring no semantic meaning is lost." The Act directs U.S. federal agencies to make data open by default,[2] ensuring that "any public data asset of the agency is machine-readable".[3]

There are two types of machine-readable data: human-readable data that is marked up so that it can also be read by machines (e.g. microformats, RDFa, HTML), and data file formats intended principally for processing by machines (CSV, RDF, XML, JSON). These formats are machine-readable only if the data they contain is formally structured; exporting a CSV file from a badly structured spreadsheet does not make the data machine-readable.
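The point about badly structured spreadsheets can be illustrated with a minimal Python sketch. All sample data and the helper name `read_csv_records` are hypothetical and chosen only for illustration. The same records parse identically from well-formed CSV and JSON, while a CSV export that mixes a title row, a blank row and a total row into the data cannot be read with the same logic.

```python
import csv
import io
import json

# Hypothetical sample data: the same two records in two machine-oriented formats.
CSV_TEXT = "item,count\napples,3\npears,5\n"
JSON_TEXT = '[{"item": "apples", "count": 3}, {"item": "pears", "count": 5}]'

# Hypothetical export of a badly structured spreadsheet: a title row, a blank
# row and a total row are mixed in with the actual records.
MESSY_CSV_TEXT = (
    "Fruit inventory,\n"
    ",\n"
    "item,count\n"
    "apples,3\n"
    "pears,5\n"
    "Total,8\n"
)


def read_csv_records(text):
    """Parse CSV text, treating the first row as the header row."""
    return [
        {"item": row["item"], "count": int(row["count"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


clean_records = read_csv_records(CSV_TEXT)
json_records = json.loads(JSON_TEXT)
print(clean_records == json_records)  # both formats yield the same records

try:
    read_csv_records(MESSY_CSV_TEXT)
    messy_parsed = True
except KeyError:
    # The title row was taken as the header, so the expected columns are missing.
    messy_parsed = False
print(messy_parsed)
```

A person reading the messy file can still find the header row by eye; a program needs the structure to be regular before it can do the same.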

Machine-readable is not synonymous with digitally accessible. A digitally accessible document may be online, making it easier for humans to access via computers, but unless it is also machine-readable, its content is much harder to extract, transform, and process programmatically.[4]

Extensible Markup Language (XML) is designed to be both human- and machine-readable, and Extensible Stylesheet Language Transformations (XSLT) can be used to improve the presentation of the data for human readers. For example, XSLT can be used to automatically render XML in PDF. Machine-readable data can be automatically transformed for human readability but, generally speaking, the reverse is not true.
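As a rough illustration of transforming machine-readable data into a human-readable form, the following Python sketch turns a small XML document into a plain-text report. It is not XSLT: the Python standard library has no XSLT processor, so a hand-written transformation using `xml.etree.ElementTree` stands in for a stylesheet. The element names (`inventory`, `entry`, `item`, `count`) and the function `render_report` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; the element names are invented for this sketch.
XML_TEXT = """<inventory>
  <entry><item>apples</item><count>3</count></entry>
  <entry><item>pears</item><count>5</count></entry>
</inventory>"""


def render_report(xml_text):
    """Turn the structured XML into a short plain-text report for human readers."""
    root = ET.fromstring(xml_text)
    lines = ["Inventory"]
    for entry in root.findall("entry"):
        item = entry.findtext("item")
        count = entry.findtext("count")
        lines.append(f"- {item}: {count}")
    return "\n".join(lines)


report = render_report(XML_TEXT)
print(report)
```

Going the other way, from the printed report back to reliably typed records, would require guessing at the layout, which is why the reverse transformation is generally not automatic.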

For purposes of implementation of the Government Performance and Results Act (GPRA) Modernization Act, the Office of Management and Budget (OMB) defines "machine readable" as follows: "Format in a standard computer language (not English text) that can be read automatically by a web browser or computer system. (e.g.; xml). Traditional word processing documents and portable document format (PDF) files are easily read by humans but typically are difficult for machines to interpret. Other formats such as extensible markup language (XML), (JSON), or spreadsheets with header columns that can be exported as comma separated values (CSV) are machine readable formats; as HTML is a structural markup language, discreetly labeling parts of the document, computers are able to gather document components to assemble tables of contents, outlines, literature search bibliographies, etc. It is possible to make traditional word processing documents and other formats machine readable but the documents must include enhanced structural elements."[5]
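The quoted remark that structural markup lets computers gather document components to assemble tables of contents can be sketched in Python with the standard-library `html.parser` module. The class `HeadingCollector`, the function `table_of_contents` and the sample HTML are hypothetical names and data introduced for this example; the sketch assumes headings are marked up with `h1` to `h6` elements.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for h1..h6 elements in an HTML document."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None  # level of the heading currently open, if any
        self._parts = []    # text fragments seen inside that heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._level is not None:
            self.headings.append((self._level, "".join(self._parts).strip()))
            self._level = None


def table_of_contents(html_text):
    """Return one indented line per heading, nested by heading level."""
    collector = HeadingCollector()
    collector.feed(html_text)
    collector.close()
    return ["  " * (level - 1) + text for level, text in collector.headings]


# Hypothetical sample document.
HTML_TEXT = (
    "<h1>Report</h1><p>Intro text.</p>"
    "<h2>Methods</h2><p>Details.</p>"
    "<h2>Results</h2>"
)
toc = table_of_contents(HTML_TEXT)
print("\n".join(toc))
```

The same outline could not be recovered this simply from a document in which headings are distinguished only visually, for example by font size.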

References

  1. "Machine readable". opendatahandbook.org. Retrieved 2019-07-22.
  2. "HR4174". stratml.us.
  3. "HR4174". stratml.us.
  4. "A Primer on Machine Readability for Online Documents and Data". Data.gov. 2012-09-24. Retrieved 2015-02-27.
  5. OMB Circular A-11, Part 6, "Preparation and Submission of Strategic Plans, Annual Performance Plans, and Annual Program Performance Reports". Archived 2013-12-07 at the Wayback Machine.