Package 'EFSATools' reference manual

Title:	EFSA Ensemble of Data Collections Tools
Description:	Provides tools for dataset operations and utilities designed to preserve data history within EFSA's ad hoc data collections. It also imports packages developed by EFSA that provide additional support for data collection activities.
Authors:	Lorenzo Copelli [aut] (ORCID: <https://orcid.org/0009-0002-4305-065X>), Luca Belmonte [aut, cre] (ORCID: <https://orcid.org/0000-0002-7977-9170>)
Maintainer:	Luca Belmonte <[email protected]>
License:	EUPL-1.2
Version:	1.0.0
Built:	2026-05-13 09:45:03 UTC
Source:	https://github.com/openefsa/efsatools

Drop empty lines and columns from the specified data frame.

Description

This function drops all the empty lines and columns from the specified data frame, i.e. all the rows and columns that contain only NAs.

Usage

dropEmpty(dataframe)

Arguments

dataframe

data.frame. The data frame from which to remove the empty lines and columns.

Value

The provided data frame without empty rows and columns, with all columns converted to character strings.

Examples

# The first row is going to be dropped.
irisTest_ <- iris
irisTest_[1, ] <- NA
irisTestDropped <- dropEmpty(irisTest_)

# The Test column (all NAs) is going to be dropped.
irisTest_ <- iris
irisTest_$Test <- NA
irisTestDropped <- dropEmpty(irisTest_)
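
The behaviour described above can be approximated in base R. The following is a minimal sketch, not the package's implementation: dropEmptySketch is a hypothetical name, and the final conversion to character is an assumption made only to mirror the documented return value.

```r
# Hypothetical helper, not part of EFSATools: drops all-NA rows and columns.
dropEmptySketch <- function(dataframe) {
  isNa <- is.na(dataframe)                     # logical matrix, same shape as the data
  keepRows <- rowSums(isNa) < ncol(dataframe)  # rows with at least one non-NA value
  keepCols <- colSums(isNa) < nrow(dataframe)  # columns with at least one non-NA value
  result <- dataframe[keepRows, keepCols, drop = FALSE]
  # Assumption: convert every remaining column to character, as the Value section states.
  result[] <- lapply(result, as.character)
  result
}

irisTest_ <- iris
irisTest_[1, ] <- NA   # an empty row
irisTest_$Test <- NA   # an empty column
dropped_ <- dropEmptySketch(irisTest_)
dim(dropped_)
```

Rows and columns are both evaluated against the original data frame, so a column that is empty apart from a row that is itself dropped would be kept by this sketch.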

Enrich a data frame with an EFSA catalogue.

Description

This function takes a data frame and joins it with an EFSA catalogue. The EFSA catalogue must itself be a data frame.

Usage

enrich(dataframe, catalogue, joinBy, enrichedColumnName)

Arguments

dataframe

data.frame. The data frame to be enriched.

catalogue

data.frame. The data frame that contains the EFSA catalogue to be used for the enrichment. It must contain at least two columns, namely: NAME and CODE.

joinBy

character (string). The variable to be used as the join key.

enrichedColumnName

character (string). The name of the column added to the original data.

Value

The specified data frame enriched with the catalogue data.

Examples

dataframe_ <- iris |> dplyr::rename(CODE = Species)

catalogue_ <- iris |>
  dplyr::rename(CODE = Species) |>
  dplyr::mutate(NAME = "test") |>
  dplyr::select(CODE, NAME) |>
  unique()

enriched_ <- enrich(
  dataframe = dataframe_,
  catalogue = catalogue_,
  joinBy = "CODE",
  enrichedColumnName = "enrichedColumn")
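
The join can be pictured with a base R lookup. This is a sketch under stated assumptions rather than the package's code: enrichSketch is a hypothetical name, the join is assumed to behave like a left join, joinBy is assumed to name the column of the data frame whose values are matched against CODE, and the catalogue's NAME is assumed to be the value written to the new column. The codes and names below are made up for illustration.

```r
# Hypothetical helper, not part of EFSATools.
enrichSketch <- function(dataframe, catalogue, joinBy, enrichedColumnName) {
  stopifnot(all(c("CODE", "NAME") %in% names(catalogue)))
  # Position of each key value in the catalogue's CODE column (NA when there is no match).
  idx <- match(as.character(dataframe[[joinBy]]), as.character(catalogue$CODE))
  # Assumption: unmatched rows are kept and receive NA, as in a left join.
  dataframe[[enrichedColumnName]] <- catalogue$NAME[idx]
  dataframe
}

dataframe_ <- data.frame(
  CODE = c("A01", "B02", "Z99"),
  value = 1:3,
  stringsAsFactors = FALSE)

catalogue_ <- data.frame(
  CODE = c("A01", "B02"),
  NAME = c("Apples", "Bananas"),
  stringsAsFactors = FALSE)

enriched_ <- enrichSketch(
  dataframe = dataframe_,
  catalogue = catalogue_,
  joinBy = "CODE",
  enrichedColumnName = "label")
```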

Drop and merge replicated columns from the specified data frame.

Description

This function drops and merges all the replicated columns from the specified data frame.

Usage

removeReplicatedColumns(dataframe, prefix)

Arguments

dataframe

data.frame. The data frame from which to drop the replicated columns.

prefix

character (string). The prefix with which the name of the replicated columns starts.

Details

All the occurrences of "N/A", "NA", and empty strings (case insensitive) inside the provided data frame are replaced with NAs of type character. Then, only the columns whose names start with the specified prefix are selected and united into a single column whose name ends with "_deduplicated". All empty entries in the new deduplicated column are replaced with NAs. Finally, the new column is bound to the other columns of the initial data frame.

Value

The specified data frame with an additional deduplicated column, with all columns converted to character strings.

Examples

irisTest_ <- iris
irisTest_$Species_1 <- irisTest_$Species
irisTest_$Species_2 <- irisTest_$Species
irisTest_$Species <- NULL

deduplicatedDataframe_ <- removeReplicatedColumns(
  dataframe = irisTest_,
  prefix = "Species_")
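
The steps listed under Details can be sketched in base R as follows. This is an illustration, not the package's implementation: removeReplicatedColumnsSketch is a hypothetical name, the way the replicated values are united (distinct non-NA values joined by "; ") is an assumption, and so is the exact name given to the new column.

```r
# Hypothetical helper, not part of EFSATools.
removeReplicatedColumnsSketch <- function(dataframe, prefix) {
  # Convert every column to character and normalise the NA-like markers.
  dataframe[] <- lapply(dataframe, function(column) {
    column <- as.character(column)
    column[toupper(column) %in% c("N/A", "NA", "")] <- NA_character_
    column
  })
  replicated <- startsWith(names(dataframe), prefix)
  # Assumption: uniting keeps the distinct non-NA values of each row, joined by "; ".
  merged <- apply(dataframe[replicated], 1, function(values) {
    values <- unique(values[!is.na(values)])
    if (length(values) == 0) NA_character_ else paste(values, collapse = "; ")
  })
  # Assumption: the new column is named <prefix>deduplicated.
  out <- dataframe[!replicated]
  out[[paste0(prefix, "deduplicated")]] <- unname(merged)
  out
}

dataframe_ <- data.frame(
  id = 1:3,
  Species_1 = c("setosa", "n/a", ""),
  Species_2 = c("setosa", "virginica", "NA"),
  stringsAsFactors = FALSE)

deduplicated_ <- removeReplicatedColumnsSketch(dataframe_, prefix = "Species_")
```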

Implement a Slowly Changing Dimension Type 2.

Description

This function implements a Slowly Changing Dimension Type 2 to merge new and current data while maintaining historical records. The function deactivates the old records and activates new ones, ensuring a history-preserving update strategy. Only the changing records are marked as not active and replaced by new active ones.

Usage

SCD2(newData, currentData, key = names(newData))

Arguments

newData

data.frame. The data frame containing new records.

currentData

data.frame. The data frame containing existing records.

key

character (vector). The names of the columns to be used as the key. Defaults to all the columns of newData.

Details

The function:

Separates active and inactive records from the current data.
Gets the old records that are still present in the new data (i.e., the ones that can remain active).
Gets the records present in new data but not present in still active current data (i.e., the records to activate) and activates them.
Gets the current active records that are not present in the new data (i.e., the records to deactivate) and deactivates them.

Value

A combined data frame with old data marked as not active and new data marked as active.

Examples

currentData_ <- tibble::tribble(
  ~id, ~colA, ~colB, ~colC, ~IS_ACTIVE, ~START_DATE, ~END_DATE,
  1, "a1", "b1", "c1", TRUE, Sys.time(), as.Date(NA),
  2, "a2", "b2", "c2", TRUE, Sys.time(), as.Date(NA),
  3, "a3", "b3", "c3", TRUE, Sys.time(), as.Date(NA))

newData_ <- tibble::tribble(
  ~id, ~colA, ~colB, ~colC,
  1, "a1", "b1", "c1",
  2, "a2", "b2", "c20",
  3, "a4", "b4", "c4")

mergedData <- SCD2(newData = newData_, currentData = currentData_)
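
The four steps listed under Details can be sketched in base R. This is a sketch under stated assumptions, not the package's implementation: scd2Sketch and its now argument are hypothetical, the IS_ACTIVE, START_DATE and END_DATE columns are taken from the example above, deactivation is assumed to set IS_ACTIVE to FALSE and stamp END_DATE, and both date columns are assumed to be POSIXct.

```r
# Hypothetical helper, not part of EFSATools.
scd2Sketch <- function(newData, currentData, key = names(newData), now = Sys.time()) {
  # Build one comparable string per row from the key columns.
  rowKey <- function(df) do.call(paste, c(unname(as.list(df[key])), sep = "\r"))

  # 1. Separate active and inactive records.
  active <- currentData[currentData$IS_ACTIVE, , drop = FALSE]
  inactive <- currentData[!currentData$IS_ACTIVE, , drop = FALSE]

  # 2. Active records still present in the new data remain active.
  stillActive <- active[rowKey(active) %in% rowKey(newData), , drop = FALSE]

  # 3. New records without an active counterpart are activated.
  toActivate <- newData[!(rowKey(newData) %in% rowKey(active)), , drop = FALSE]
  toActivate$IS_ACTIVE <- rep(TRUE, nrow(toActivate))
  toActivate$START_DATE <- rep(now, nrow(toActivate))
  toActivate$END_DATE <- rep(as.POSIXct(NA), nrow(toActivate))

  # 4. Active records missing from the new data are deactivated.
  toDeactivate <- active[!(rowKey(active) %in% rowKey(newData)), , drop = FALSE]
  toDeactivate$IS_ACTIVE <- rep(FALSE, nrow(toDeactivate))
  toDeactivate$END_DATE <- rep(now, nrow(toDeactivate))

  rbind(inactive, toDeactivate, stillActive, toActivate)
}

currentData_ <- data.frame(
  id = c(1, 2, 3),
  colA = c("a1", "a2", "a3"),
  IS_ACTIVE = TRUE,
  START_DATE = as.POSIXct("2024-01-01", tz = "UTC"),
  END_DATE = as.POSIXct(NA),
  stringsAsFactors = FALSE)

newData_ <- data.frame(
  id = c(1, 2, 3),
  colA = c("a1", "a20", "a4"),
  stringsAsFactors = FALSE)

merged_ <- scd2Sketch(newData = newData_, currentData = currentData_)
```

With the default key, a record counts as unchanged only when every column of newData matches, so rows 2 and 3 of the current data are closed and replaced while row 1 is left untouched.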

Implement a "Simple" Slowly Changing Dimension Type 2.

Description

This function implements a simplified version of Slowly Changing Dimension Type 2 to merge new and current data while maintaining historical records. The function deactivates all the old records and activates the new ones, ensuring a history-preserving update strategy. Unlike the standard SCD2, this simplified version applies no checks on the data: it deactivates all the old records and activates all the new ones, even if some of the old records are unchanged and could have remained active.

Usage

SSCD2(newData, currentData)

Arguments

newData

data.frame. The data frame containing new records.

currentData

data.frame. The data frame containing existing records.

Value

A combined data frame with all old data marked as not active and new data marked as active.

Examples

currentData_ <- tibble::tribble(
  ~id, ~colA, ~colB, ~colC, ~IS_ACTIVE, ~START_DATE, ~END_DATE,
  1, "a1", "b1", "c1", TRUE, Sys.time(), as.Date(NA),
  2, "a2", "b2", "c2", TRUE, Sys.time(), as.Date(NA),
  3, "a3", "b3", "c3", TRUE, Sys.time(), as.Date(NA))

newData_ <- tibble::tribble(
  ~id, ~colA, ~colB, ~colC,
  1, "a1", "b1", "c1",
  2, "a2", "b2", "c20",
  3, "a4", "b4", "c4")

mergedData <- SSCD2(newData = newData_, currentData = currentData_)
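
The simplified strategy can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: sscd2Sketch and its now argument are hypothetical, the IS_ACTIVE, START_DATE and END_DATE columns are taken from the example above, both date columns are assumed to be POSIXct, and END_DATE is assumed to be stamped only on records that were still active.

```r
# Hypothetical helper, not part of EFSATools.
sscd2Sketch <- function(newData, currentData, now = Sys.time()) {
  # Deactivate every currently active record, without comparing it to the new data.
  wasActive <- currentData$IS_ACTIVE
  currentData$END_DATE[wasActive] <- now
  currentData$IS_ACTIVE <- rep(FALSE, nrow(currentData))

  # Activate every incoming record.
  newData$IS_ACTIVE <- rep(TRUE, nrow(newData))
  newData$START_DATE <- rep(now, nrow(newData))
  newData$END_DATE <- rep(as.POSIXct(NA), nrow(newData))

  rbind(currentData, newData)
}

currentData_ <- data.frame(
  id = c(1, 2, 3),
  colA = c("a1", "a2", "a3"),
  IS_ACTIVE = TRUE,
  START_DATE = as.POSIXct("2024-01-01", tz = "UTC"),
  END_DATE = as.POSIXct(NA),
  stringsAsFactors = FALSE)

newData_ <- data.frame(
  id = c(1, 2, 3),
  colA = c("a1", "a20", "a4"),
  stringsAsFactors = FALSE)

merged_ <- sscd2Sketch(newData = newData_, currentData = currentData_)
```

Here the unchanged record with id 1 is also closed and re-inserted, which is the trade-off of skipping the comparison step.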
