library("steward")
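The goal of this package is to make it a little easier to build and publish data dictionaries.
In this article, we will look at a few (what we think are) common tasks: creating a dataset with metadata, checking that metadata, documenting a dataset in a package, and displaying a data dictionary as a table.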
For this set of examples, we use the ggplot2 diamonds dataset. We will create an stw_dataset, which is a data frame with an extra class attached to help manage metadata. An stw_dataset is a data frame, in the same way that a tibble is a data frame.
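As a rough illustration of what "a data frame with an extra class attached" means, the base-R sketch below adds a made-up class and a made-up metadata attribute to a small data frame. The names demo_dataset and demo_meta are assumptions for illustration only; they are not steward's internals.

```r
# A small plain data frame standing in for a real dataset.
df <- data.frame(carat = c(0.23, 0.21), price = c(326L, 326L))

# Prepend a hypothetical class; the original "data.frame" class is kept.
class(df) <- c("demo_dataset", class(df))

# Store hypothetical metadata alongside the data as an attribute.
attr(df, "demo_meta") <- list(name = "diamonds")

is.data.frame(df)             # still a data frame
inherits(df, "demo_dataset")  # and it also carries the extra class
nrow(df)                      # ordinary data-frame functions keep working
```

Because the data.frame class is still present, code that expects a data frame should keep working, which is the same idea behind a tibble being a data frame.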
In the code that follows we have three steps:

1. Create an stw_dataset() using a data frame.
2. Add top-level metadata using stw_mutate_meta().
3. Add column descriptions using stw_mutate_dict().

The functions stw_mutate_meta() and stw_mutate_dict() work a little bit like dplyr::mutate(). However, instead of mutating the columns of a data frame, stw_mutate_meta() mutates the top-level metadata, and stw_mutate_dict() mutates the descriptions of columns in a data frame:
diamonds_new <-
  stw_dataset(ggplot2::diamonds) %>%
  stw_mutate_meta(
    name = "diamonds",
    title = "Prices of 50,000 round cut diamonds",
    description = "A dataset containing the prices and other attributes of almost 54,000 diamonds.",
    sources = list(
      list(
        title = "DiamondSearchEngine",
        path = "http://www.diamondse.info/"
      )
    )
  ) %>%
  stw_mutate_dict(
    price = "price in US dollars ($326--$18,823)",
    carat = "weight of diamond (0.2--5.01)",
    cut = "quality of the cut (Fair, Good, Very Good, Premium, Ideal)",
    color = "diamond color, from D (best) to J (worst)",
    clarity = "a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))",
    x = "length in mm (0--10.74)",
    y = "width in mm (0--58.9)",
    z = "depth in mm (0--31.8)",
    depth = "total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43--79)",
    table = "width of top of diamond relative to widest point (43--95)"
  )
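To make the mutate-like behaviour concrete, here is a minimal base-R sketch of a function that takes name = value pairs and updates column descriptions rather than column values. The helper demo_mutate_dict() and the attribute demo_dict are hypothetical; this is not how steward is implemented, only one way the idea could work.

```r
# Hypothetical helper: update named column descriptions kept in an attribute.
demo_mutate_dict <- function(.data, ...) {
  dots <- list(...)

  # Refuse descriptions for columns that do not exist in the data frame.
  unknown <- setdiff(names(dots), names(.data))
  if (length(unknown) > 0) {
    stop("No such column(s): ", paste(unknown, collapse = ", "))
  }

  # Start from the existing descriptions, if any, and overwrite by name.
  dict <- attr(.data, "demo_dict")
  if (is.null(dict)) dict <- list()
  dict[names(dots)] <- dots

  attr(.data, "demo_dict") <- dict
  .data
}

df <- data.frame(carat = c(0.23, 0.21), price = c(326L, 326L))
df <- demo_mutate_dict(df, price = "price in US dollars")
df <- demo_mutate_dict(df, carat = "weight of diamond")

# Look up the description that was attached to the price column.
attr(df, "demo_dict")$price
```

As with dplyr::mutate(), repeated calls accumulate: the second call adds a description for carat without discarding the one already set for price.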
We also offer a function to make some basic checks on the completeness of the metadata:
stw_check(diamonds_new, verbosity = "all")
✔ Dictionary names are unique.
✔ Dictionary names are all non-trivial.
✔ Dictionary descriptions are all non-trivial.
✔ Dictionary types are all recognized.
✔ Metadata has all required fields.
✔ Metadata sources valid.
✔ Metadata has all optional fields.
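The output above suggests the kind of completeness checks involved. The base-R sketch below imitates three of them (unique names, non-trivial names, non-trivial descriptions) on a plain named list of descriptions. The function demo_check() and its definition of "non-trivial" (neither missing nor blank) are assumptions for illustration; stw_check() may define these checks differently.

```r
# Hypothetical checker for a named list of column descriptions.
demo_check <- function(dict) {
  descriptions <- vapply(dict, as.character, character(1))
  c(
    # No column name appears more than once.
    names_unique = anyDuplicated(names(dict)) == 0,
    # Every name contains at least one non-blank character.
    names_non_trivial = all(nzchar(trimws(names(dict)))),
    # Every description is present and not just whitespace.
    descriptions_non_trivial =
      all(!is.na(descriptions) & nzchar(trimws(descriptions)))
  )
}

good <- list(price = "price in US dollars", carat = "weight of diamond")
bad  <- list(price = "price in US dollars", carat = "")

demo_check(good)
demo_check(bad)
```

Returning a named logical vector, one entry per check, keeps each result separately reportable, much like the line-per-check output shown above.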
If you are including a dataset in a package, it is good practice to document it; in fact, CRAN insists!
You can use an stw_dataset to keep your documentation associated with your dataset as you build it for your package. Keeping all the “stuff” together helps ensure that the dataset documentation is complete and current.
The function stw_use_data() wraps usethis::use_data() and steward::stw_write_roxygen(). For example, by running:
stw_use_data(diamonds_new)
You will:

- Add the diamonds_new dataset to your package.
- Write its roxygen documentation to R/data-diamonds_new.R.

If you are creating an RMarkdown document, you can use the stw_to_table() function to create a gt table:
stw_to_table(diamonds_new)
DIAMONDS: Prices of 50,000 round cut diamonds

Name | Type | Description | Levels
---|---|---|---
carat | number | weight of diamond (0.2--5.01) |
cut | string | quality of the cut (Fair, Good, Very Good, Premium, Ideal) | Fair, Good, Very Good, Premium, Ideal
color | string | diamond color, from D (best) to J (worst) | D, E, F, G, H, I, J
clarity | string | a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best)) | I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF
depth | number | total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43--79) |
table | number | width of top of diamond relative to widest point (43--95) |
price | integer | price in US dollars ($326--$18,823) |
x | number | length in mm (0--10.74) |
y | number | width in mm (0--58.9) |
z | number | depth in mm (0--31.8) |

Sources: DiamondSearchEngine