steward

library("steward")

The goal of this package is to make it a little easier to build and publish data dictionaries, in particular:

in package documentation.
as nicely formatted data dictionaries in RMarkdown documents.

In this article, we will look at a few tasks that we think are common:

create a dataset
write a dataset to package documentation
write a dataset to a gt table

Create

For this set of examples, we use the ggplot2 diamonds dataset. We will create an stw_dataset, which is a data frame with an extra class attached to help manage metadata. An stw_dataset is a data frame, in the same way that a tibble is a data frame.
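
The idea of "a data frame with an extra class" can be illustrated in base R. The sketch below is not steward's implementation: the class name my_dataset and the constructor new_my_dataset() are hypothetical, chosen only to show that an object with an added class still behaves as a data frame.

```r
# Hypothetical constructor: prepend an extra class to an ordinary data frame.
# This shows the general S3 pattern, not steward's internals.
new_my_dataset <- function(df) {
  stopifnot(is.data.frame(df))
  class(df) <- c("my_dataset", class(df))
  df
}

ds <- new_my_dataset(head(mtcars))

inherits(ds, "my_dataset")  # the extra class is present
is.data.frame(ds)           # the object is still a data frame
nrow(ds)                    # data-frame functions keep working
```

Because the original classes are kept after the new one, any function that accepts a data frame should accept such an object as well.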

In the code that follows we have three steps:

Create the stw_dataset() using a data frame.
Add the top-level metadata using stw_mutate_meta().
Add the column-level metadata using stw_mutate_dict().

The functions stw_mutate_meta() and stw_mutate_dict() work a little bit like dplyr::mutate(). However, instead of mutating the columns of a data frame, stw_mutate_meta() mutates the top-level metadata, and stw_mutate_dict() mutates the descriptions of columns in a data frame:

diamonds_new <-
  stw_dataset(ggplot2::diamonds) %>%
  stw_mutate_meta(
    name = "diamonds",
    title = "Prices of 50,000 round cut diamonds",
    description = "A dataset containing the prices and other attributes of almost 54,000 diamonds.",
    sources = list(
      list(
        title = "DiamondSearchEngine",
        path = "http://www.diamondse.info/"
      )
    )
  ) %>%
  stw_mutate_dict(
    price = "price in US dollars ($326--$18,823)",
    carat = "weight of diamond (0.2--5.01)",
    cut = "quality of the cut (Fair, Good, Very Good, Premium, Ideal)",
    color = "diamond color, from D (best) to J (worst)",
    clarity = "a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))",
    x = "length in mm (0--10.74)",
    y = "width in mm (0--58.9)",
    z = "depth in mm (0--31.8)",
    depth = "total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43--79)",
    table = "width of top of diamond relative to widest point (43--95)"
  )

We also offer a function to make some basic checks on the completeness of the metadata:

stw_check(diamonds_new, verbosity = "all")

✔ Dictionary names are unique.
✔ Dictionary names are all non-trivial.
✔ Dictionary descriptions are all non-trivial.
✔ Dictionary types are all recognized.
✔ Metadata has all required fields.
✔ Metadata sources valid.
✔ Metadata has all optional fields.
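
To give a feel for what checks of this kind involve, here is a minimal base-R sketch covering three of them (unique names, non-trivial names, non-trivial descriptions). It makes no claim about how stw_check() works internally: the dict data frame and the helpers is_trivial() and check_dict() are hypothetical, and "trivial" is assumed to mean missing or blank.

```r
# Hypothetical dictionary: one row per column of the dataset.
dict <- data.frame(
  name = c("price", "carat", "cut"),
  description = c(
    "price in US dollars",
    "weight of diamond",
    "quality of the cut"
  ),
  stringsAsFactors = FALSE
)

# Assumption: a string is "trivial" if it is NA or empty after trimming.
is_trivial <- function(x) {
  is.na(x) | !nzchar(trimws(x))
}

# Hypothetical checker: returns a named logical vector, one entry per check.
check_dict <- function(dict) {
  c(
    names_unique = !anyDuplicated(dict$name),
    names_nontrivial = !any(is_trivial(dict$name)),
    descriptions_nontrivial = !any(is_trivial(dict$description))
  )
}

result <- check_dict(dict)
print(result)
```

Returning a named logical vector keeps each check separately inspectable, which is one way to support the kind of per-check reporting shown above.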

Write to package

If you are including a dataset in a package, it is good practice to document it; in fact, CRAN insists!

You can use an stw_dataset to keep your documentation associated with your dataset as you build it for your package. Keeping all the “stuff” together helps ensure that the dataset documentation is complete and current.

The function stw_use_data() wraps usethis::use_data() and steward::stw_write_roxygen(). For example, by running:

stw_use_data(diamonds_new)

You will:

validate that the metadata is complete.
write the diamonds_new dataset to your package.
write the roxygen documentation to R/data-diamonds_new.R.
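
For readers unfamiliar with roxygen documentation for datasets, the sketch below builds the general shape of such a file from a title, a description, and a named vector of column descriptions. It is an illustration only, not the output of stw_write_roxygen(): the helper make_roxygen() is hypothetical, the dictionary is shortened to two columns, and no escaping of Rd special characters (such as %) is attempted.

```r
# Hypothetical helper: build roxygen lines for a dataset from its metadata.
# Not steward's implementation; it only shows the usual layout.
make_roxygen <- function(name, title, description, dict, n_row) {
  # One \item{column}{description} entry per documented column.
  items <- sprintf("#'   \\item{%s}{%s}", names(dict), unname(dict))
  c(
    paste("#'", title),
    "#'",
    paste("#'", description),
    "#'",
    sprintf(
      "#' @format A data frame with %d rows and %d variables:",
      n_row, length(dict)
    ),
    "#' \\describe{",
    items,
    "#' }",
    sprintf("\"%s\"", name)  # roxygen documents the object named on this line
  )
}

roxy <- make_roxygen(
  name = "diamonds_new",
  title = "Prices of 50,000 round cut diamonds",
  description = "A dataset containing the prices and other attributes of almost 54,000 diamonds.",
  dict = c(price = "price in US dollars", carat = "weight of diamond"),
  n_row = 53940L
)
writeLines(roxy)
```

Generating this text from the same object that holds the data is what keeps the documentation from drifting out of step with the dataset.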

Create gt table

If you are creating an RMarkdown document, you can use the stw_to_table() function to create a gt table:

stw_to_table(diamonds_new)

DIAMONDS
Prices of 50,000 round cut diamonds
Name	Type	Description	Levels
carat	number	weight of diamond (0.2--5.01)
cut	string	quality of the cut (Fair, Good, Very Good, Premium, Ideal)	Fair, Good, Very Good, Premium, Ideal
color	string	diamond color, from D (best) to J (worst)	D, E, F, G, H, I, J
clarity	string	a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))	I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF
depth	number	total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43--79)
table	number	width of top of diamond relative to widest point (43--95)
price	integer	price in US dollars ($326--$18,823)
x	number	length in mm (0--10.74)
y	number	width in mm (0--58.9)
z	number	depth in mm (0--31.8)
Sources: DiamondSearchEngine
