What is metadata?
Metadata is a type of data that describes and provides information about other data. Essentially, metadata is data about data.
TYPES OF METADATA
There are many types of metadata and many ways to organize it. Numerous international standards have been created to structure and guide the collection of metadata. We've reviewed these standards and drawn on our experience working with data to create an organized list of common metadata types. This list is meant to be a practical starting point for structuring and collecting metadata. We encourage you to look further into metadata standards and think critically about what would be most useful for your specific project.
Industry description/NAICS code(s)
Update frequency/latest update date
DATA QUALITY INFORMATION
Accuracy information/description (caveats, confidence intervals, gaps)
Lineage (raw administrative data or estimates/calculations; if calculated, include formula, algorithm, process, etc.)
SPATIAL REFERENCE INFORMATION
Indirect spatial reference (municipalities, census subdivision, fish plant locations, etc.)
Geographic format (points, polygons, lines)
Label (i.e. column and/or row headings)
Data type (text, numeric, date, currency, etc.)
This is not an exhaustive list of metadata types and categories. It does, however, provide a good starting place for documenting metadata and shows the kind of information that is useful to capture. Not all metadata will be available for every dataset or variable, and that’s okay. Documenting the gaps in your metadata can be just as useful and informative as documenting the metadata that is available.
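As one illustration of the variable-level items above (label and data type) and of recording gaps, here is a minimal Python sketch. The sample data, the column names, and the helper functions `guess_type` and `describe_variables` are all made up for this example. The type guess is deliberately crude: anything that is not a number, including a date, is reported as text.

```python
import csv
import io

# Made-up sample data, used only for illustration.
SAMPLE_CSV = """community,population,plant_opened
Harbour A,1200,1998-05-01
Harbour B,,2004-09-15
"""


def guess_type(values):
    """Return 'numeric' if every non-empty value parses as a number, else 'text'."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "unknown"
    try:
        for v in non_empty:
            float(v)
        return "numeric"
    except ValueError:
        return "text"


def describe_variables(csv_text):
    """Build one metadata record per column: label, guessed type, and gap count."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = []
    for label in rows[0].keys():
        values = [row[label] for row in rows]
        summary.append({
            "label": label,
            "data_type": guess_type(values),
            "missing_values": sum(1 for v in values if not v.strip()),
        })
    return summary


variables = describe_variables(SAMPLE_CSV)
for record in variables:
    print(record)
```

A guess like this is only a first draft. A person who knows the data should still confirm each data type and explain any missing values.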
Example dataset with metadata
The benefits of metadata
Metadata helps establish trust and transparency in data and analyses. When all information about a dataset is available, community members are empowered to discuss and debate the data and analysis.
Documenting the data discovery and collection process through a standardized metadata process will help both current and future users reference, find, and collect the same data.
Metadata plays an important role in data validation, study replication, and quality control.
Documenting data attributes as metadata creates a simplified, comprehensive reference guide for your data.
A metadata file is a great place to document any changes that you have made to the dataset after it was downloaded from the source. Metadata can also include notes on any errors found in the data.
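The last point, recording changes and known errors, can be sketched in a few lines of Python. This is an illustrative sketch only: the `MetadataNotes` class, its field names, and the example entries are hypothetical and do not come from any metadata standard.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class MetadataNotes:
    """Notes kept alongside a dataset; all field names here are hypothetical."""
    dataset_name: str
    source: str
    changes: list = field(default_factory=list)
    known_errors: list = field(default_factory=list)

    def log_change(self, description, when=None):
        """Record a modification made to the dataset after download."""
        when = when or date.today()
        self.changes.append(f"{when.isoformat()}: {description}")

    def log_error(self, description):
        """Record an error found in the data."""
        self.known_errors.append(description)


# Example entries are invented for illustration.
notes = MetadataNotes(dataset_name="example_dataset", source="example source")
notes.log_change("Removed duplicate rows", when=date(2024, 1, 15))
notes.log_error("Two rows have a blank value in the date column")
```

Dating each change makes it easier for a later user to reconstruct how the working copy differs from the original download.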
When collecting publicly available data, it is important to also collect and save the corresponding metadata. For example, when you download a dataset from Statistics Canada, the download often includes both the dataset and a metadata file.
When requesting custom data, you should also request the related metadata.
If a metadata file is unavailable for a dataset you're working with, you can create your own by documenting identification, data quality, variable, and geographical information. Documenting metadata doesn't have to be an onerous process. Carefully gather the metadata you can find (it may be helpful to refer to the list above) and enter it into a spreadsheet. Once you have created a template, you can reuse it and adjust it for future datasets.
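A reusable template like the one described above could be generated with a short Python script. The sketch below makes several assumptions: the grouping of fields under "Identification" and "Variable" is our guess based on the four categories named in this article, and the column names `category`, `field`, and `value` and the function `write_template` are hypothetical. The output is CSV text that opens in any spreadsheet program.

```python
import csv
import io

# Fields taken from the list in this article. The category assigned to each
# field is an assumption made for illustration.
TEMPLATE_FIELDS = [
    ("Identification", "Industry description/NAICS code(s)"),
    ("Identification", "Update frequency/latest update date"),
    ("Data quality", "Accuracy information/description"),
    ("Data quality", "Lineage"),
    ("Spatial reference", "Indirect spatial reference"),
    ("Spatial reference", "Geographic format"),
    ("Variable", "Label"),
    ("Variable", "Data type"),
]


def write_template(stream, known_values=None):
    """Write one row per metadata field, leaving the value blank if unknown."""
    known_values = known_values or {}
    writer = csv.writer(stream)
    writer.writerow(["category", "field", "value"])
    for category, field_name in TEMPLATE_FIELDS:
        writer.writerow([category, field_name, known_values.get(field_name, "")])


# Build the template in memory; pass an open file instead to save it to disk.
buffer = io.StringIO()
write_template(buffer, {"Geographic format": "points"})
template_csv = buffer.getvalue()
print(template_csv)
```

To adapt the template for another project, edit `TEMPLATE_FIELDS` rather than the function, so that every dataset documented with it shares the same layout.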
This article is sponsored in part by the Future Skills Centre.