Data Manipulation

dplyr, readr, tidyr, stringr

Jose Storopoli (https://scholar.google.com/citations?user=xGU7H1QAAAAJ&hl=en), UNINOVE (https://www.uninove.br)
April 7, 2021

Tidy Data

Figure 1: Tidy Data

What is Tidy Data?

The pipe %>%

x %>% f(y) becomes f(x, y)

x %>% f(y) %>% f(z) becomes f(f(x, y), z)

library(magrittr)

# RStudio keyboard shortcut: CTRL + SHIFT + M

c(0:10, NA) %>%
  mean(na.rm = TRUE) %>% 
  print() %>% 
  message() %>% 
  message()
[1] 5
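
As a quick check of the two rewriting rules above, the sketch below compares piped calls with their nested equivalents. The use of paste() and the letters "a", "b", "c" are arbitrary choices for illustration.

```r
library(magrittr)

# x %>% f(y) is rewritten as f(x, y)
piped_once  <- "a" %>% paste("b")
nested_once <- paste("a", "b")

# x %>% f(y) %>% f(z) is rewritten as f(f(x, y), z)
piped_twice  <- "a" %>% paste("b") %>% paste("c")
nested_twice <- paste(paste("a", "b"), "c")

# both comparisons should be TRUE
identical(piped_once, nested_once)
identical(piped_twice, nested_twice)
```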

Reading data with {readr}

Let's start with the first step of any data analysis: importing the data.

For this, the {tidyverse} provides a package called {readr}.

The datasets/ folder contains several interesting datasets; the examples below use adult.csv, countries_of_the_world.csv and covid_19_data.csv.

To read .xlsx or .xls files, use the {readxl} package.

col_types – the argument I use most in read_*()

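library(readr)
# compact col_types string: "_" skips a column, "i" = integer, "f" = factor
adult <- read_csv("datasets/adult.csv",
                  col_types = "_iffifffif")
# this file uses "," as the decimal mark, so declare it in the locale
countries <- read_csv("datasets/countries_of_the_world.csv", 
    col_types = cols(Population = col_integer(), 
        `Net migration` = col_double()), 
    locale = locale(decimal_mark = ","))
# skip the SNo column and parse dates written as month/day/year
covid <- read_csv("datasets/covid_19_data.csv", 
    col_types = cols(SNo = col_skip(), ObservationDate = col_date(format = "%m/%d/%Y")))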
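
To see what the compact col_types string does without needing the files in datasets/, here is a minimal sketch on a tiny CSV written to a temporary file. The file contents and the names csv_path and toy are made up for illustration.

```r
library(readr)

# write a tiny CSV to a temporary file (hypothetical data, for illustration only)
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("row_id,age,workclass",
    "1,25,Private",
    "2,38,Local-gov"),
  csv_path
)

# one character per column, in order:
# "_" skips row_id, "i" parses age as integer, "f" parses workclass as factor
toy <- read_csv(csv_path, col_types = "_if")

names(toy)
class(toy$age)
class(toy$workclass)
```

The cols() form used for countries and covid does the same job by column name, which tends to be easier to read when only a few columns need a non-default type.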

Data manipulation with {dplyr}

dplyr

Figure 2: dplyr

Selecting Variables – dplyr::select()

library(dplyr)
# select() can rename while selecting: new_name = `old-name`
adult_clean <- adult %>% 
  select(age, workclass, education,
         education_num = `educational-num`,
         marital_status = `marital-status`,
         race, gender,
         hours_per_week = `hours-per-week`,
         income)

NOTE: {dplyr} version 1.0 added the rename_with() function.

adult %>% 
  rename_with(~gsub("-", "_", .x))
# A tibble: 48,842 x 9
     age workclass    education  educational_num marital_status  race 
   <int> <fct>        <fct>                <int> <fct>           <fct>
 1    25 Private      11th                     7 Never-married   Black
 2    38 Private      HS-grad                  9 Married-civ-sp… White
 3    28 Local-gov    Assoc-acdm              12 Married-civ-sp… White
 4    44 Private      Some-coll…              10 Married-civ-sp… Black
 5    18 ?            Some-coll…              10 Never-married   White
 6    34 Private      10th                     6 Never-married   White
 7    29 ?            HS-grad                  9 Never-married   Black
 8    63 Self-emp-no… Prof-scho…              15 Married-civ-sp… White
 9    24 Private      Some-coll…              10 Never-married   White
10    55 Private      7th-8th                  4 Married-civ-sp… White
# … with 48,832 more rows, and 3 more variables: gender <fct>,
#   hours_per_week <int>, income <fct>
adult %>% 
  rename_with(~gsub("-", "_", .x)) %>% 
  select(where(is.factor)) %>% 
  select(-workclass) %>% 
  rename_all(~paste0("antigo_", .x))
# A tibble: 48,842 x 5
   antigo_education antigo_marital_status antigo_race antigo_gender
   <fct>            <fct>                 <fct>       <fct>        
 1 11th             Never-married         Black       Male         
 2 HS-grad          Married-civ-spouse    White       Male         
 3 Assoc-acdm       Married-civ-spouse    White       Male         
 4 Some-college     Married-civ-spouse    Black       Male         
 5 Some-college     Never-married         White       Female       
 6 10th             Never-married         White       Male         
 7 HS-grad          Never-married         Black       Male         
 8 Prof-school      Married-civ-spouse    White       Male         
 9 Some-college     Never-married         White       Female       
10 7th-8th          Married-civ-spouse    White       Male         
# … with 48,832 more rows, and 1 more variable: antigo_income <fct>
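
rename_with() also takes a .cols argument that restricts which columns the function is applied to. A minimal sketch on a made-up tibble (the column names and values are illustrative, not taken from adult.csv):

```r
library(dplyr)

# hypothetical toy data with two hyphenated column names
toy <- tibble(
  `hours-per-week` = c(40L, 50L),
  `native-country` = c("BR", "US"),
  age = c(25L, 38L)
)

# apply the renaming function only to columns whose names contain a hyphen
renamed <- toy %>%
  rename_with(~ gsub("-", "_", .x), .cols = contains("-"))

names(renamed)
```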

Professor, I like camelCase. What now?

Fear not, there is the {janitor} package.

library(janitor)
adult %>% clean_names(case = "lower_camel")
# A tibble: 48,842 x 9
     age workclass education educationalNum maritalStatus race  gender
   <int> <fct>     <fct>              <int> <fct>         <fct> <fct> 
 1    25 Private   11th                   7 Never-married Black Male  
 2    38 Private   HS-grad                9 Married-civ-… White Male  
 3    28 Local-gov Assoc-ac…             12 Married-civ-… White Male  
 4    44 Private   Some-col…             10 Married-civ-… Black Male  
 5    18 ?         Some-col…             10 Never-married White Female
 6    34 Private   10th                   6 Never-married White Male  
 7    29 ?         HS-grad                9 Never-married Black Male  
 8    63 Self-emp… Prof-sch…             15 Married-civ-… White Male  
 9    24 Private   Some-col…             10 Never-married White Female
10    55 Private   7th-8th                4 Married-civ-… White Male  
# … with 48,832 more rows, and 2 more variables: hoursPerWeek <int>,
#   income <fct>

Sorting by variables with dplyr::arrange()

adult_clean %>% 
  arrange(-age, education_num) %>% 
  select(age, education_num)
# A tibble: 48,842 x 2
     age education_num
   <int>         <int>
 1    90             2
 2    90             4
 3    90             4
 4    90             4
 5    90             5
 6    90             6
 7    90             6
 8    90             7
 9    90             7
10    90             9
# … with 48,832 more rows
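
The call above sorts in descending order with -age, which works because age is numeric. For other column types, such as character, dplyr::desc() is the general form. A small sketch with made-up names and ages:

```r
library(dplyr)

# hypothetical toy data
toy <- tibble(
  name = c("ana", "bia", "caio"),
  age  = c(30L, 25L, 30L)
)

# descending by age, ties broken by name in reverse alphabetical order
sorted <- toy %>%
  arrange(desc(age), desc(name))

sorted$name
```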

Frequencies with dplyr::count()

NOTE: we will see this function a lot when we discuss group_by()

adult_clean %>% 
  count(age, income, sort = TRUE)
# A tibble: 142 x 3
     age income     n
   <int> <fct>  <int>
 1    23 <=50K   1307
 2    24 <=50K   1162
 3    22 <=50K   1161
 4    25 <=50K   1119
 5    27 <=50K   1117
 6    20 <=50K   1112
 7    28 <=50K   1101
 8    21 <=50K   1090
 9    26 <=50K   1068
10    19 <=50K   1050
# … with 132 more rows

Manipulating Variables – dplyr::mutate()

I hate powers of 10 (scientific notation)

To avoid scientific notation in printed output, use options(scipen = 999, digits = 2):

options(scipen = 999, digits = 2)
countries <- countries %>% clean_names()

countries %>% 
  mutate(
    log_pop = log(population),
    area_sq_km = area_sq_mi * 2.5899985,
    pop_density_per_sq_km = population / area_sq_km)
# A tibble: 227 x 23
   country      region        population area_sq_mi pop_density_per_s…
   <chr>        <chr>              <int>      <dbl>              <dbl>
 1 Afghanistan  ASIA (EX. NE…   31056997     647500               48  
 2 Albania      EASTERN EURO…    3581655      28748              125. 
 3 Algeria      NORTHERN AFR…   32930091    2381740               13.8
 4 American Sa… OCEANIA            57794        199              290. 
 5 Andorra      WESTERN EURO…      71201        468              152. 
 6 Angola       SUB-SAHARAN …   12127071    1246700                9.7
 7 Anguilla     LATIN AMER. …      13477        102              132. 
 8 Antigua & B… LATIN AMER. …      69108        443              156  
 9 Argentina    LATIN AMER. …   39921833    2766890               14.4
10 Armenia      C.W. OF IND.…    2976372      29800               99.9
# … with 217 more rows, and 18 more variables:
#   coastline_coast_area_ratio <dbl>, net_migration <dbl>,
#   infant_mortality_per_1000_births <dbl>, gdp_per_capita <dbl>,
#   literacy_percent <dbl>, phones_per_1000 <dbl>,
#   arable_percent <dbl>, crops_percent <dbl>, other_percent <dbl>,
#   climate <dbl>, birthrate <dbl>, deathrate <dbl>,
#   agriculture <dbl>, industry <dbl>, service <dbl>, log_pop <dbl>,
#   area_sq_km <dbl>, pop_density_per_sq_km <dbl>

mutate() either modifies variables in place or adds new variables while preserving the existing ones. There is also transmute(), which adds the new variables and drops all the others.

countries %>% 
  transmute(
    log_pop = log(population),
    area_sq_km = area_sq_mi * 2.5899985,
    pop_density_per_sq_km = population / area_sq_km)
# A tibble: 227 x 3
   log_pop area_sq_km pop_density_per_sq_km
     <dbl>      <dbl>                 <dbl>
 1   17.3    1677024.                 18.5 
 2   15.1      74457.                 48.1 
 3   17.3    6168703.                  5.34
 4   11.0        515.                112.  
 5   11.2       1212.                 58.7 
 6   16.3    3228951.                  3.76
 7    9.51       264.                 51.0 
 8   11.1       1147.                 60.2 
 9   17.5    7166241.                  5.57
10   14.9      77182.                 38.6 
# … with 217 more rows

Almost all {dplyr} verbs (as you saw above with rename_all()) have variants with the suffixes _if, _all and _at. For example:

covid %>% 
  mutate_if(is.character, as.factor)
# A tibble: 236,017 x 7
   ObservationDate `Province/State` `Country/Region` `Last Update`  
   <date>          <fct>            <fct>            <fct>          
 1 2020-01-22      Anhui            Mainland China   1/22/2020 17:00
 2 2020-01-22      Beijing          Mainland China   1/22/2020 17:00
 3 2020-01-22      Chongqing        Mainland China   1/22/2020 17:00
 4 2020-01-22      Fujian           Mainland China   1/22/2020 17:00
 5 2020-01-22      Gansu            Mainland China   1/22/2020 17:00
 6 2020-01-22      Guangdong        Mainland China   1/22/2020 17:00
 7 2020-01-22      Guangxi          Mainland China   1/22/2020 17:00
 8 2020-01-22      Guizhou          Mainland China   1/22/2020 17:00
 9 2020-01-22      Hainan           Mainland China   1/22/2020 17:00
10 2020-01-22      Hebei            Mainland China   1/22/2020 17:00
# … with 236,007 more rows, and 3 more variables: Confirmed <dbl>,
#   Deaths <dbl>, Recovered <dbl>

dplyr::if_else and dplyr::case_when

We use if_else() when we want to run a boolean test and return one value where the test is true and another where it is false. It is basically a vectorised if ... else ...:

adult_clean %>% 
  mutate(
    race_black = if_else(race == "Black", 1L, 0L)
  ) %>% 
  select(starts_with("race"))
# A tibble: 48,842 x 2
   race  race_black
   <fct>      <int>
 1 Black          1
 2 White          0
 3 White          0
 4 Black          1
 5 White          0
 6 White          0
 7 Black          1
 8 White          0
 9 White          0
10 White          0
# … with 48,832 more rows
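
if_else() also has a missing argument that sets the value returned where the test itself is NA. A minimal sketch on a made-up vector (hours and the labels are illustrative):

```r
library(dplyr)

# hypothetical weekly hours, with one missing value
hours <- c(45L, 20L, NA)

# `missing` supplies the value used where the condition evaluates to NA
workload <- if_else(hours > 40L, "overtime", "regular", missing = "unknown")

workload
```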

There is something a bit more flexible and powerful, though more verbose: dplyr::case_when.

adult_cat <- adult_clean %>% 
  mutate(
    marital_status_cat = case_when(
      marital_status == "Never-married" ~ 1L,
      marital_status == "Married-civ-spouse" ~ 2L,
      marital_status == "Married-spouse-absent" ~ 3L,
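      # NOTE: the trailing space in "Married-AF-spouse " means this branch does not
      # match "Married-AF-spouse"; those rows get NA and end up in the fallback group
      marital_status == "Married-AF-spouse " ~ 4L,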
      marital_status == "Separated" ~ 5L,
      marital_status == "Divorced" ~ 6L,
      marital_status == "Widowed" ~ 7L,
      TRUE ~ NA_integer_
    ),
    marital_age_group = case_when(
      marital_status_cat == 1 & age >=30 ~ "solteirx_convictx",
      marital_status_cat == 1 & age <=30 ~ "solteirx_jovem",
      marital_status_cat > 1 & marital_status_cat <= 4 & age >=30 ~ "adultos_casados",
      marital_status_cat > 1 & marital_status_cat <= 4 & age <=30 ~ "jovens_casados",
      TRUE ~ "divorciados, separados etc"
    )
  ) 
adult_cat %>% 
  select(starts_with("marital")) %>%
  count(marital_age_group, sort = TRUE)
# A tibble: 5 x 2
  marital_age_group              n
  <chr>                      <int>
1 adultos_casados            20195
2 solteirx_jovem             10798
3 divorciados, separados etc  9718
4 solteirx_convictx           5319
5 jovens_casados              2812
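
One detail worth keeping in mind: case_when() evaluates its conditions in order and keeps the first one that is TRUE. In the code above, age >=30 and age <=30 are both true for age == 30, so those rows land in whichever branch comes first. A minimal sketch with made-up ages and labels:

```r
library(dplyr)

age <- c(25L, 30L, 42L)

# both conditions are TRUE for age == 30; case_when() keeps the first match
overlapping <- case_when(
  age >= 30 ~ "30_or_older",
  age <= 30 ~ "30_or_younger"
)

# non-overlapping conditions make the boundary explicit
disjoint <- case_when(
  age >= 30 ~ "30_or_older",
  age < 30  ~ "under_30"
)

overlapping
disjoint
```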

Grouping and Summarising Variables – dplyr::group_by() and dplyr::summarise()

We group data with dplyr::group_by() and then use dplyr::summarise() (also available in American English spelling as dplyr::summarize()) to compute values for each group. This kind of analysis is commonly called split-apply-combine.

adult_cat %>% 
  group_by(marital_age_group) %>% 
  summarise(
    n = n(),
    # `.` is the whole data frame piped in, so nrow(.) is the total row count
    n_prop = n / nrow(.)) %>% 
  arrange(-n)
# A tibble: 5 x 3
  marital_age_group              n n_prop
  <chr>                      <int>  <dbl>
1 adultos_casados            20195 0.413 
2 solteirx_jovem             10798 0.221 
3 divorciados, separados etc  9718 0.199 
4 solteirx_convictx           5319 0.109 
5 jovens_casados              2812 0.0576
covid %>% 
  janitor::clean_names() %>% 
  group_by(country_region) %>% 
  summarise(
    n = n(),
    media_confirmados = mean(confirmed),
    mediana_confirmados = median(confirmed),
    media_mortos = mean(deaths),
    mediana_mortos = median(deaths)
  ) %>% 
  arrange(-mediana_mortos)
# A tibble: 226 x 6
   country_region     n media_confirmad… mediana_confirm… media_mortos
   <chr>          <int>            <dbl>            <dbl>        <dbl>
 1 Iran             375          533357.          361150        25526.
 2 South Africa     360          564541.          627650        14975.
 3 Argentina        362          713468.          413080.       18151.
 4 Indonesia        363          334202.          172053        10592.
 5 Iraq             371          272190.          215784         6416.
 6 Ecuador          364          116916.          111680         8028.
 7 Turkey           354          717519.          275749         9999.
 8 Bolivia          354           98196.          119180.        5184.
 9 Egypt            380           80259.           97192.        4399.
10 Bangladesh       357          279329.          317528         4023.
# … with 216 more rows, and 1 more variable: mediana_mortos <dbl>
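
The split-apply-combine steps are easier to follow on a tiny example. The sketch below uses a made-up tibble (toy, group and value are illustrative names) and .groups = "drop" so that the result is a plain, ungrouped tibble.

```r
library(dplyr)

# hypothetical toy data
toy <- tibble(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 10, 20, 30)
)

by_group <- toy %>%
  group_by(group) %>%            # split: one group per distinct value
  summarise(
    n = n(),                     # apply: number of rows in each group
    mean_value = mean(value),    # apply: per-group mean
    .groups = "drop"             # combine: return an ungrouped tibble
  )

by_group
```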

I can also group by several variables, for example:

library(tidyr)
covid %>% 
  janitor::clean_names() %>% 
  group_by(country_region, province_state) %>% 
  drop_na() %>% 
  count(wt = deaths, sort = TRUE)
# A tibble: 760 x 3
# Groups:   country_region, province_state [760]
   country_region province_state        n
   <chr>          <chr>             <dbl>
 1 UK             England        13897698
 2 US             New York       10888511
 3 Brazil         Sao Paulo       9651617
 4 India          Maharashtra     9117030
 5 Italy          Lombardia       5743485
 6 US             California      5358608
 7 Brazil         Rio de Janeiro  5335936
 8 US             Texas           5072817
 9 US             New Jersey      5054435
10 US             Florida         4153905
# … with 750 more rows

Remember that almost all {dplyr} verbs have _all, _if and _at variants?

covid %>% 
  summarise_if(is.numeric, median)
# A tibble: 1 x 3
  Confirmed Deaths Recovered
      <dbl>  <dbl>     <dbl>
1      6695    127      1224

I don't know what the future holds for the _if, _at and _all variants, since their lifecycle status is superseded. So if you want code that is robust over time, use across():

covid %>% 
  summarise(across(where(is.numeric), ~median(.x, na.rm = TRUE)))
# A tibble: 1 x 3
  Confirmed Deaths Recovered
      <dbl>  <dbl>     <dbl>
1      6695    127      1224
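
across() can also apply several functions at once and control the output names through .names. A minimal sketch on a made-up tibble; the column names mirror the covid data, but the values are invented.

```r
library(dplyr)

# hypothetical toy data with one missing value
toy <- tibble(
  region    = c("x", "y", "z"),
  confirmed = c(10, NA, 30),
  deaths    = c(1, 2, 3)
)

stats <- toy %>%
  summarise(across(
    where(is.numeric),
    list(
      median = ~ median(.x, na.rm = TRUE),
      max    = ~ max(.x, na.rm = TRUE)
    ),
    .names = "{.col}_{.fn}"   # e.g. confirmed_median, deaths_max
  ))

names(stats)
```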

What is the difference between a grouped_df and a tibble?

A grouped_df is a tibble that also carries grouping metadata. Inside the {tidyverse} you can pass either one to any verb, but if you pipe %>% a grouped_df into something outside the {tidyverse} that only accepts tibbles and data.frames, you may get an error. In those cases, call ungroup() before piping %>%:

adult_clean %>% 
  group_by(gender) %>% 
  class() %>% 
  print
[1] "grouped_df" "tbl_df"     "tbl"        "data.frame"
adult_clean %>% 
  group_by(gender) %>% 
  ungroup() %>% # <----- ungrouping
  class() %>% 
  print
[1] "tbl_df"     "tbl"        "data.frame"
covid <- covid %>% janitor::clean_names()
countries <- countries %>% janitor::clean_names()
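
A grouped_df also changes how verbs compute their results: with groups, mutate() and summarise() work within each group. A minimal sketch on a made-up tibble (toy, gender and hours are illustrative):

```r
library(dplyr)

# hypothetical toy data
toy <- tibble(
  gender = c("F", "F", "M"),
  hours  = c(30, 40, 50)
)

# grouped: mean(hours) is computed within each gender
grouped <- toy %>%
  group_by(gender) %>%
  mutate(mean_hours = mean(hours))

# ungrouped: mean(hours) is computed over all rows
ungrouped <- grouped %>%
  ungroup() %>%
  mutate(mean_hours = mean(hours))

is_grouped_df(grouped)
is_grouped_df(ungrouped)
```

Calling ungroup() as soon as the per-group work is done avoids surprises in later steps.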

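Joins with dplyr::*_join()

Now for the icing on the cake: the famous joins. {dplyr} provides the following joins: inner_join(), left_join(), right_join() and full_join() (mutating joins), plus semi_join() and anti_join() (filtering joins).

NOTE: special guest appearance by {stringr}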

library(stringr)
# before joining, count how many of the 227 countries would fail to match
227 - sum(countries$country %in% covid$country_region)
[1] 42
covid %>%
  count(country_region, wt = confirmed, sort = TRUE) %>% 
  filter(str_detect(country_region, "China"))
# A tibble: 1 x 2
  country_region        n
  <chr>             <dbl>
1 Mainland China 32591323
countries %>%
  filter(str_detect(country, "China"))
# A tibble: 1 x 20
  country region             population area_sq_mi pop_density_per_sq…
  <chr>   <chr>                   <int>      <dbl>               <dbl>
1 China   ASIA (EX. NEAR EA… 1313973713    9596960                137.
# … with 15 more variables: coastline_coast_area_ratio <dbl>,
#   net_migration <dbl>, infant_mortality_per_1000_births <dbl>,
#   gdp_per_capita <dbl>, literacy_percent <dbl>,
#   phones_per_1000 <dbl>, arable_percent <dbl>, crops_percent <dbl>,
#   other_percent <dbl>, climate <dbl>, birthrate <dbl>,
#   deathrate <dbl>, agriculture <dbl>, industry <dbl>, service <dbl>
library(ggplot2)
covid %>% 
  mutate(
    country_region = str_replace(country_region, "Mainland China", "China")
  ) %>% 
  filter(observation_date == max(observation_date)) %>% 
  right_join(countries,
             by = c("country_region" = "country")) %>% 
  mutate(deaths_per_capita = deaths / population) %>% 
  ggplot(aes(x = gdp_per_capita, y = deaths_per_capita)) +
  geom_point() +
  geom_smooth(method = lm)
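
The same workflow (check the keys, fix mismatches, then join) can be sketched on two tiny made-up tibbles. covid_toy and countries_toy and their values are hypothetical. anti_join() lists the rows that would not find a match, which is an alternative to the %in% check above.

```r
library(dplyr)

# hypothetical toy tables
covid_toy <- tibble(
  country_region = c("Mainland China", "Brazil"),
  deaths = c(100, 200)
)
countries_toy <- tibble(
  country = c("China", "Brazil", "Chile"),
  population = c(1000, 500, 50)
)

# rows of countries_toy with no matching key in covid_toy
unmatched <- countries_toy %>%
  anti_join(covid_toy, by = c("country" = "country_region"))

# harmonise the key, then keep every row of countries_toy
joined <- covid_toy %>%
  mutate(country_region = sub("Mainland China", "China", country_region)) %>%
  right_join(countries_toy, by = c("country_region" = "country"))

unmatched$country
joined
```

Countries that exist only in countries_toy (here "Chile") are kept by right_join() with NA in the columns coming from covid_toy.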

More transformations toward the Tidy Data format with {tidyr}

{tidyr} has the famous drop_na(). So if you use drop_na() together with {dplyr}, don't forget library(tidyr).

NOTE: you can load the ENTIRE {tidyverse} at once with library(tidyverse)

In particular, there are the functions pivot_longer() and pivot_wider():

library(tidyr)
relig_income %>% 
  pivot_longer(!religion,
               names_to = "income",
               values_to = "count") %>% 
  mutate(across(where(is.character), as.factor)) %>% 
  filter(!str_detect(income, "Don't know")) %>% 
  count(religion, income, wt = count, sort = TRUE)
# A tibble: 162 x 3
   religion         income       n
   <fct>            <fct>    <dbl>
 1 Evangelical Prot $50-75k   1486
 2 Catholic         $50-75k   1116
 3 Mainline Prot    $50-75k   1107
 4 Evangelical Prot $20-30k   1064
 5 Evangelical Prot $30-40k    982
 6 Catholic         $75-100k   949
 7 Evangelical Prot $75-100k   949
 8 Mainline Prot    $75-100k   939
 9 Evangelical Prot $40-50k    881
10 Evangelical Prot $10-20k    869
# … with 152 more rows
billboard %>% 
  pivot_longer(
    cols = starts_with("wk"),
    names_to = "week",
    values_to = "rank",
    values_drop_na = TRUE
  ) %>% 
  group_by(artist) %>% 
  summarise(
    n = n(),
    median_rank = median(rank)) %>% 
  arrange(-n, median_rank)
# A tibble: 228 x 3
   artist                  n median_rank
   <chr>               <int>       <dbl>
 1 Creed                 104        28.5
 2 Lonestar               95        38  
 3 Destiny's Child        92        13  
 4 N'Sync                 74        12  
 5 Sisqo                  74        25.5
 6 3 Doors Down           73        42  
 7 Jay-Z                  73        45  
 8 Aguilera, Christina    67        17  
 9 Hill, Faith            67        28  
10 Houston, Whitney       67        54  
# … with 218 more rows
fish_encounters %>%
  pivot_wider(
    names_from = station,
    values_from = seen,
    values_fill = 0
  ) %>% 
  pivot_longer(!fish, names_to = "station", values_to = "seen")
# A tibble: 209 x 3
   fish  station  seen
   <fct> <chr>   <int>
 1 4842  Release     1
 2 4842  I80_1       1
 3 4842  Lisbon      1
 4 4842  Rstr        1
 5 4842  Base_TD     1
 6 4842  BCE         1
 7 4842  BCW         1
 8 4842  BCE2        1
 9 4842  BCW2        1
10 4842  MAE         1
# … with 199 more rows
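
To make the two pivots concrete, here is a minimal round trip on a made-up wide table. The column names imitate relig_income, but the values are invented. pivot_longer() stacks the income columns into name/value pairs, and pivot_wider() spreads them back.

```r
library(dplyr)
library(tidyr)

# hypothetical wide table: one column per income bracket
wide <- tibble(
  religion  = c("A", "B"),
  `<$10k`   = c(1, 3),
  `$10-20k` = c(2, 4)
)

# wide -> long: one row per (religion, income) pair
long <- wide %>%
  pivot_longer(!religion, names_to = "income", values_to = "count")

# long -> wide: back to one column per income bracket
back_to_wide <- long %>%
  pivot_wider(names_from = income, values_from = count)

long
back_to_wide
```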

There are also unnest_wider() and unnest_longer():

library(repurrrsive)
chars <- tibble(char = got_chars)
chars %>%
  unnest_wider(char) %>% 
  select(name, books, tvSeries) %>% 
  pivot_longer(!name, names_to = "media") %>% 
  unnest_longer(value) %>% 
  filter(media == "tvSeries") %>% 
  extract(value, "season", "(\\d{1})", convert = TRUE)
# A tibble: 102 x 3
   name             media    season
   <chr>            <chr>     <int>
 1 Theon Greyjoy    tvSeries      1
 2 Theon Greyjoy    tvSeries      2
 3 Theon Greyjoy    tvSeries      3
 4 Theon Greyjoy    tvSeries      4
 5 Theon Greyjoy    tvSeries      5
 6 Theon Greyjoy    tvSeries      6
 7 Tyrion Lannister tvSeries      1
 8 Tyrion Lannister tvSeries      2
 9 Tyrion Lannister tvSeries      3
10 Tyrion Lannister tvSeries      4
# … with 92 more rows

Another way

chars %>%
  unnest_wider(char) %>% 
  select(name, books, tvSeries) %>% 
  pivot_longer(!name, names_to = "media") %>% 
  unnest_longer(value) %>% 
  filter(media == "tvSeries") %>% 
  separate(value, into = c(NA, "season"), sep = " ", fill = "right")
# A tibble: 102 x 3
   name             media    season
   <chr>            <chr>    <chr> 
 1 Theon Greyjoy    tvSeries 1     
 2 Theon Greyjoy    tvSeries 2     
 3 Theon Greyjoy    tvSeries 3     
 4 Theon Greyjoy    tvSeries 4     
 5 Theon Greyjoy    tvSeries 5     
 6 Theon Greyjoy    tvSeries 6     
 7 Tyrion Lannister tvSeries 1     
 8 Tyrion Lannister tvSeries 2     
 9 Tyrion Lannister tvSeries 3     
10 Tyrion Lannister tvSeries 4     
# … with 92 more rows

Extras

Converting {dplyr} verbs to SQL with {dbplyr}

library(dplyr, warn.conflicts = FALSE)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars)

mtcars2 <- tbl(con, "mtcars")

I can easily convert {dplyr} verbs to SQL (for all the SQL lovers)

summary <- mtcars2 %>% 
  group_by(cyl) %>% 
  summarise(mpg = mean(mpg, na.rm = TRUE)) %>% 
  arrange(-mpg)

summary %>% show_query()
<SQL>
SELECT `cyl`, AVG(`mpg`) AS `mpg`
FROM `mtcars`
GROUP BY `cyl`
ORDER BY -`mpg`
summary %>% collect()
# A tibble: 3 x 2
    cyl   mpg
  <dbl> <dbl>
1     4  26.7
2     6  19.7
3     8  15.1

Big Data with {arrow}

Environment

R version 4.1.0 (2021-05-18)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Big Sur 11.4

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] repurrrsive_1.0.0 ggplot2_3.3.5     stringr_1.4.0    
[4] tidyr_1.1.3       janitor_2.1.0     dplyr_1.0.7      
[7] readr_1.4.0       magrittr_2.0.1    tibble_3.1.2     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6        lubridate_1.7.10  lattice_0.20-44  
 [4] png_0.1-7         assertthat_0.2.1  rprojroot_2.0.2  
 [7] digest_0.6.27     utf8_1.2.1        R6_2.5.0         
[10] RSQLite_2.2.7     evaluate_0.14     highr_0.9        
[13] pillar_1.6.1      rlang_0.4.11      rstudioapi_0.13  
[16] blob_1.2.1        jquerylib_0.1.4   Matrix_1.3-4     
[19] rmarkdown_2.9     textshaping_0.3.5 labeling_0.4.2   
[22] splines_4.1.0     bit_4.0.4         munsell_0.5.0    
[25] compiler_4.1.0    xfun_0.24         pkgconfig_2.0.3  
[28] systemfonts_1.0.2 mgcv_1.8-35       htmltools_0.5.1.1
[31] downlit_0.2.1     tidyselect_1.1.1  bookdown_0.22    
[34] fansi_0.5.0       dbplyr_2.1.1      crayon_1.4.1     
[37] withr_2.4.2       grid_4.1.0        nlme_3.1-152     
[40] jsonlite_1.7.2    gtable_0.3.0      lifecycle_1.0.0  
[43] DBI_1.1.1         scales_1.1.1      cachem_1.0.5     
[46] cli_3.0.0         stringi_1.6.2     farver_2.1.0     
[49] snakecase_0.11.0  bslib_0.2.5.1     ellipsis_0.3.2   
[52] ragg_1.1.3        generics_0.1.0    vctrs_0.3.8      
[55] distill_1.2       tools_4.1.0       bit64_4.0.5      
[58] glue_1.4.2        purrr_0.3.4       hms_1.1.0        
[61] jpeg_0.1-8.1      fastmap_1.1.0     yaml_2.2.1       
[64] colorspace_2.0-2  memoise_2.0.0     knitr_1.33       
[67] sass_0.4.0       

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-SA 4.0. Source code is available at https://github.com/storopoli/Linguagem-R, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Storopoli (2021, April 7). Linguagem R: Manipulação de Dados. Retrieved from https://storopoli.io/Linguagem-R/2-Manipulacao_Dados.html

BibTeX citation

@misc{storopoli2021manipulacaodadosR,
  author = {Storopoli, Jose},
  title = {Linguagem R: Manipulação de Dados},
  url = {https://storopoli.io/Linguagem-R/2-Manipulacao_Dados.html},
  year = {2021}
}