Getting started with SQL in R

In this blog post, I show you how to work with databases in R
Author: Albert Rapp

Published: October 5, 2024

In today’s blog post, I’m showing you how to work with databases from within R. As always, you can find the video version of this blog post on YouTube.

Establish a connection

Let us start by setting up a database connection.

library(odbc)
library(DBI)

con <- dbConnect(
  drv = odbc(),
  driver = "SQLite3 Driver",
  database = ':memory:',
  server = 'localhost',
  uid = 'user_name',
  pwd = 'my_password',
  port = 5432
)
## Error in `dbConnect()`:
## ! ODBC failed with error 00000 from [unixODBC][Driver Manager].
## ✖ Can't open lib 'SQLite3 Driver' : file not found
## ℹ From 'nanodbc/nanodbc.cpp:1150'.

If you get an error like this, check whether the ODBC driver is actually installed on your machine. You can do that with

odbcListDrivers()
##                  name   attribute                                    value
## 1              SQLite Description                       SQLite ODBC Driver
## 2              SQLite      Driver                         libsqliteodbc.so
## 3              SQLite       Setup                         libsqliteodbc.so
## 4              SQLite  UsageCount                                        1
## 5             SQLite3 Description                      SQLite3 ODBC Driver
## 6             SQLite3      Driver                        libsqlite3odbc.so
## 7             SQLite3       Setup                        libsqlite3odbc.so
## 8             SQLite3  UsageCount                                        1
## 9     PostgreSQL ANSI Description    PostgreSQL ODBC driver (ANSI version)
## 10    PostgreSQL ANSI      Driver                             psqlodbca.so
## 11    PostgreSQL ANSI       Setup                          libodbcpsqlS.so
## 12    PostgreSQL ANSI       Debug                                        0
## 13    PostgreSQL ANSI     CommLog                                        1
## 14    PostgreSQL ANSI  UsageCount                                        1
## 15 PostgreSQL Unicode Description PostgreSQL ODBC driver (Unicode version)
## 16 PostgreSQL Unicode      Driver                             psqlodbcw.so
## 17 PostgreSQL Unicode       Setup                          libodbcpsqlS.so
## 18 PostgreSQL Unicode       Debug                                        0
## 19 PostgreSQL Unicode     CommLog                                        1
## 20 PostgreSQL Unicode  UsageCount                                        1

Aha! The driver name seems to be different. Using the correct driver name, the previous code works nicely.

con <- dbConnect(
  drv = odbc(),
  driver = "SQLite3",
  database = ':memory:',
  server = 'localhost',
  uid = 'user_name',
  pwd = 'my_password',
  port = 5432
)
con
## <OdbcConnection> :memory:
##   Database: :memory:
##   SQLite Version: 3.36.0

Cool, we have a connection. But what if there wasn’t a SQLite driver installed on my system? In that case, I could install the driver by first checking what the package is called. For Ubuntu users that would be

apt-cache search sqlite odbc
## fp-units-db - Free Pascal - database-library units dependency package
## fp-units-db-3.2.2 - Free Pascal - database-library units
## libcppdb-dev - SQL Connectivity Library (development files)
## libcppdb-mysql0 - SQL Connectivity Library (MySQL backend)
## libcppdb-odbc0 - SQL Connectivity Library (odbc backend)
## libcppdb-postgresql0 - SQL Connectivity Library (PostgreSQL backend)
## libcppdb-sqlite3-0 - SQL Connectivity Library (sqlite3 backend)
## libcppdb0 - SQL Connectivity Library (core library)
## libghc-hdbc-sqlite3-dev - Sqlite v3 HDBC (Haskell Database Connectivity) Driver for GHC
## libghc-hdbc-sqlite3-doc - Sqlite v3 HDBC (Haskell Database Connectivity) Documentation
## libghc-hdbc-sqlite3-prof - Sqlite v3 HDBC Driver for GHC; profiling libraries
## libpoco-doc - Documentation for POCO - The C++ Portable Components
## libreoffice-base-drivers - Database connectivity drivers for LibreOffice
## libsqliteodbc - ODBC driver for SQLite embedded database
## php-db - Database Abstraction Layer
## ruby-sequel - Simple, flexible, and powerful SQL database access toolkit for Ruby

And from there I could identify that libsqliteodbc is the package I need. On Ubuntu, it can be installed with

sudo apt-get install libsqliteodbc

Anyway, enough with the technical stuff.

Use database tables

Let’s check what tables we have in our database right now.

con |> 
  dbListTables()
## character(0)

Well, that’s boring. No tables yet. Let’s create one using the penguins data from the {palmerpenguins} package and call the table palmerpenguins.

con |> 
  dbCreateTable(
    name = 'palmerpenguins',
    fields = palmerpenguins::penguins
  )

Now, we can see that we have a table in our database.

con |> 
  dbListTables()
## [1] "palmerpenguins"
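If you only care about one specific table, DBI also lets you check for it directly. A quick sketch with dbExistsTable():

```r
# Check whether a specific table exists (returns TRUE/FALSE)
con |> dbExistsTable('palmerpenguins')
## [1] TRUE
```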

And we could fetch the data as well.

con |> 
  dbReadTable('palmerpenguins') |> 
  tibble::as_tibble() # nicer output with tibbles
## # A tibble: 0 × 8
## # ℹ 8 variables: species <chr>, island <chr>, bill_length_mm <dbl>,
## #   bill_depth_mm <dbl>, flipper_length_mm <int>, body_mass_g <int>, sex <chr>,
## #   year <int>

Looks like the table is still empty and only the columns (and their type) were initialized. Time to append the actual data.

con |> 
  dbAppendTable(
    name = 'palmerpenguins',
    value = palmerpenguins::penguins
  )
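As an aside: if you don’t need the two-step create-then-append workflow, DBI can do both at once with dbWriteTable(). A minimal sketch, here writing to a hypothetical table called penguins_copy so we don’t touch our existing table:

```r
# Create the table and insert the data in one go;
# overwrite = TRUE replaces the table if it already exists.
con |> 
  dbWriteTable(
    name = 'penguins_copy',
    value = palmerpenguins::penguins,
    overwrite = TRUE
  )
```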

Once that is executed, we should now see data inside of our table.

penguins_db <- con |> 
  dbReadTable('palmerpenguins') |> 
  tibble::as_tibble()
penguins_db
## # A tibble: 344 × 8
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <chr>   <chr>              <dbl>         <dbl>             <int>       <int>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # ℹ 334 more rows
## # ℹ 2 more variables: sex <chr>, year <int>

Excellent. We have our data right here. And once we have the full data locally, we can operate on it like we normally would, e.g.

penguins_db |> 
  dplyr::filter(!is.na(sex), bill_length_mm > 50) |> 
  dplyr::select(bill_length_mm, sex)
## # A tibble: 52 × 2
##    bill_length_mm sex  
##             <dbl> <chr>
##  1           50.2 male 
##  2           59.6 male 
##  3           50.5 male 
##  4           50.5 male 
##  5           50.1 male 
##  6           50.4 male 
##  7           54.3 male 
##  8           50.7 male 
##  9           51.1 male 
## 10           52.5 male 
## # ℹ 42 more rows

Get data with SQL

But what if we wanted the database to return only the filtered data? That way, we don’t have to download loads and loads of data from the database only to filter and select specific data locally.

Well, we could just use a simple SQL statement to get the data that we want. The key function to make that possible is the dbGetQuery() function.

con |> 
  dbGetQuery(
    'SELECT bill_length_mm, sex 
    FROM palmerpenguins
    WHERE bill_length_mm > 50 and sex IS NOT NULL;'
  )  |> 
  tibble::as_tibble()
## # A tibble: 52 × 2
##    bill_length_mm sex  
##             <dbl> <chr>
##  1           50.2 male 
##  2           59.6 male 
##  3           50.5 male 
##  4           50.5 male 
##  5           50.1 male 
##  6           50.4 male 
##  7           54.3 male 
##  8           50.7 male 
##  9           51.1 male 
## 10           52.5 male 
## # ℹ 42 more rows
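Pasting values directly into the SQL string works here, but for values that come from elsewhere (say, user input) it’s safer to pass them separately as parameters. DBI supports this via the params argument of dbGetQuery(); the ? placeholder below is SQLite-style:

```r
# Parameters are sent separately from the SQL text,
# which avoids quoting issues and SQL injection.
con |> 
  dbGetQuery(
    'SELECT bill_length_mm, sex
    FROM palmerpenguins
    WHERE bill_length_mm > ? AND sex IS NOT NULL;',
    params = list(50)
  ) |> 
  tibble::as_tibble()
```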

dbplyr

Another possibility is to use the {dbplyr} package. This package lets you use the syntax from the {dplyr} package and translates it automatically to SQL. You just need to know three things to use it:

  1. You need to tell {dbplyr} which table you want to use on the database side with dplyr::tbl().

  2. You can have a look at the SQL code with dplyr::show_query().

  3. If you want to have the full query results locally (and not just a preview), then you need to collect the data using dplyr::collect().

tbl_palmerpenguins <- dplyr::tbl(con, 'palmerpenguins')
extraction <- tbl_palmerpenguins |> 
  dplyr::filter(!is.na(sex), bill_length_mm > 50) |> 
  dplyr::select(bill_length_mm, sex)
extraction |> dplyr::show_query()
## <SQL>
## SELECT "bill_length_mm", "sex"
## FROM "palmerpenguins"
## WHERE (NOT(("sex" IS NULL))) AND ("bill_length_mm" > 50.0)
extraction |> dplyr::collect()
## # A tibble: 52 × 2
##    bill_length_mm sex  
##             <dbl> <chr>
##  1           50.2 male 
##  2           59.6 male 
##  3           50.5 male 
##  4           50.5 male 
##  5           50.1 male 
##  6           50.4 male 
##  7           54.3 male 
##  8           50.7 male 
##  9           51.1 male 
## 10           52.5 male 
## # ℹ 42 more rows

This also works with mutate() calls that change the returned data on the database side before downloading it.

tbl_palmerpenguins |> 
  dplyr::filter(!is.na(sex), bill_length_mm > 50) |> 
  dplyr::select(bill_length_mm, sex) |> 
  dplyr::mutate(
    sex = if_else(sex == 'male', 'MALE', 'FEMALE')
  ) |> 
  dplyr::show_query()
## <SQL>
## SELECT
##   "bill_length_mm",
##   CASE WHEN ("sex" = 'male') THEN 'MALE' WHEN NOT ("sex" = 'male') THEN 'FEMALE' END AS "sex"
## FROM "palmerpenguins"
## WHERE (NOT(("sex" IS NULL))) AND ("bill_length_mm" > 50.0)

So in this case you would get

tbl_palmerpenguins |> 
  dplyr::filter(!is.na(sex), bill_length_mm > 50) |> 
  dplyr::select(bill_length_mm, sex) |> 
  dplyr::mutate(
    sex = if_else(sex == 'male', 'MALE', 'FEMALE')
  ) |> 
  dplyr::collect()
## # A tibble: 52 × 2
##    bill_length_mm sex  
##             <dbl> <chr>
##  1           50.2 MALE 
##  2           59.6 MALE 
##  3           50.5 MALE 
##  4           50.5 MALE 
##  5           50.1 MALE 
##  6           50.4 MALE 
##  7           54.3 MALE 
##  8           50.7 MALE 
##  9           51.1 MALE 
## 10           52.5 MALE 
## # ℹ 42 more rows

But it is important to note that this does not actually update the underlying data. If you query the data once more (without the mutate() call), you get the original data:

tbl_palmerpenguins |> 
  dplyr::filter(!is.na(sex), bill_length_mm > 50) |> 
  dplyr::select(bill_length_mm, sex) |> 
  dplyr::collect()
## # A tibble: 52 × 2
##    bill_length_mm sex  
##             <dbl> <chr>
##  1           50.2 male 
##  2           59.6 male 
##  3           50.5 male 
##  4           50.5 male 
##  5           50.1 male 
##  6           50.4 male 
##  7           54.3 male 
##  8           50.7 male 
##  9           51.1 male 
## 10           52.5 male 
## # ℹ 42 more rows

Programmatically assembling SQL queries with {glue}

Another way to assemble SQL queries is using the glue_sql() function from the {glue} package. This can be particularly useful when you are already familiar with SQL and don’t want to learn the {dbplyr} syntax.

For example, imagine you have vectors that describe the particular columns and values you’re interested in. For our penguins, this could be something like this:

selected_tbl <- 'palmerpenguins'
selected_cols <- c('bill_length_mm', 'sex')
selected_species <- c('Adelie', 'Chinstrap')

Then, with the glue_sql() function, you can easily throw these variables into an SQL query that you write yourself.

glue::glue_sql(
  'SELECT {selected_cols}
  FROM {selected_tbl}
  WHERE species IN ({selected_species})',
  .con = con
)
## <SQL> SELECT 'bill_length_mm'
## FROM 'palmerpenguins'
## WHERE species IN ('Adelie')
## <SQL> SELECT 'sex'
## FROM 'palmerpenguins'
## WHERE species IN ('Chinstrap')

Well, this doesn’t look right. It’s two SQL queries instead of one. But the nice thing about glue_sql() is that you can collapse vectors with the * character.

glue::glue_sql(
  'SELECT {selected_cols*}
  FROM {selected_tbl}
  WHERE species IN ({selected_species*})',
  .con = con
)
## <SQL> SELECT 'bill_length_mm', 'sex'
## FROM 'palmerpenguins'
## WHERE species IN ('Adelie', 'Chinstrap')

But if you know SQL, you know that this code is not quite right. Check out what happens if you run this query.

incorrect_query <- glue::glue_sql(
  'SELECT {selected_cols*}
  FROM {selected_tbl}
  WHERE species IN ({selected_species*})',
  .con = con
)
con |> 
  dbGetQuery(incorrect_query) |> 
  tibble::as_tibble()
## # A tibble: 220 × 2
##    `'bill_length_mm'` `'sex'`
##    <chr>              <chr>  
##  1 bill_length_mm     sex    
##  2 bill_length_mm     sex    
##  3 bill_length_mm     sex    
##  4 bill_length_mm     sex    
##  5 bill_length_mm     sex    
##  6 bill_length_mm     sex    
##  7 bill_length_mm     sex    
##  8 bill_length_mm     sex    
##  9 bill_length_mm     sex    
## 10 bill_length_mm     sex    
## # ℹ 210 more rows

That’s not quite right, is it? The problem is that the quotation marks in our SQL query are wrong. Have a look:

incorrect_query
## <SQL> SELECT 'bill_length_mm', 'sex'
## FROM 'palmerpenguins'
## WHERE species IN ('Adelie', 'Chinstrap')

For the identifiers in the SELECT and FROM parts, we shouldn’t use single quotes '. In SQLite, those are reserved for string literals like 'Adelie' and 'Chinstrap'. For identifiers like column and table names, you either use no quotation marks at all or double quotation marks ". You can tell glue_sql() to treat a value as an identifier by wrapping it in backticks `.

correct_query <- glue::glue_sql(
  'SELECT {`selected_cols`*}
  FROM {`selected_tbl`}
  WHERE species IN ({selected_species*})',
  .con = con
)
correct_query
## <SQL> SELECT "bill_length_mm", "sex"
## FROM "palmerpenguins"
## WHERE species IN ('Adelie', 'Chinstrap')

So, with this query, you can now run the code from before and get the right results.

con |> 
  dbGetQuery(correct_query) |> 
  tibble::as_tibble()
## # A tibble: 220 × 2
##    bill_length_mm sex   
##             <dbl> <chr> 
##  1           39.1 male  
##  2           39.5 female
##  3           40.3 female
##  4           NA   <NA>  
##  5           36.7 female
##  6           39.3 male  
##  7           38.9 female
##  8           39.2 male  
##  9           34.1 <NA>  
## 10           42   <NA>  
## # ℹ 210 more rows

Transactions

Now, let me teach you one more thing that’s essential to working with databases: transactions. You see, you often want changes to your database to run transactionally.

This means that if you update multiple rows, you want to be sure that either all rows were updated successfully or no rows were changed. That’s what transactions do. Here’s how they work:

First, you begin a transaction:

con |> dbBegin()

Then, you make changes to the database like appending more rows to a table.

con |> 
  dbAppendTable(
    'palmerpenguins',
    palmerpenguins::penguins
  )

And now you could see your changes:

con |> 
  dbReadTable('palmerpenguins') |> 
  tibble::as_tibble()
## # A tibble: 688 × 8
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <chr>   <chr>              <dbl>         <dbl>             <int>       <int>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # ℹ 678 more rows
## # ℹ 2 more variables: sex <chr>, year <int>

See how you have twice as many rows now? This means you have successfully appended more rows to the database table. But not so fast: you can revert all changes made since starting your transaction with dbBegin() by calling dbRollback().

con |> dbRollback()

And now if you grab the table again, you can see that you’re back where you started.

con |> 
  dbReadTable('palmerpenguins') |> 
  tibble::as_tibble()
## # A tibble: 344 × 8
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <chr>   <chr>              <dbl>         <dbl>             <int>       <int>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # ℹ 334 more rows
## # ℹ 2 more variables: sex <chr>, year <int>

Of course, there’s also a way to make changes permanent. And that happens with dbCommit().

# Start transaction
con |> dbBegin()
# Make changes
con |> 
  dbAppendTable(
    'palmerpenguins',
    palmerpenguins::penguins
  )
# Commit to changes
con |> dbCommit()
# Changes are permanent now.
con |> 
  dbReadTable('palmerpenguins') |> 
  tibble::as_tibble()
## # A tibble: 688 × 8
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <chr>   <chr>              <dbl>         <dbl>             <int>       <int>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # ℹ 678 more rows
## # ℹ 2 more variables: sex <chr>, year <int>

You might be wondering how this could be useful. So let me show you how.

Imagine that we have a constraint on the species column inside of our data set. We could make that happen by creating a new table palmerpenguins_new with a bit of SQL.

allowed_species <- c('Adelie', 'Chinstrap', 'Gentoo')
query <- glue::glue_sql(
  "CREATE TABLE palmerpenguins_new (
    species TEXT NOT NULL CHECK(species IN ({allowed_species*})),
    island TEXT,
    bill_length_mm REAL,
    bill_depth_mm REAL,
    flipper_length_mm REAL,
    body_mass_g REAL,
    sex TEXT,
    year INTEGER
  );",
  .con = con
)
con |> dbExecute(query)
## [1] 0

Clearly, we could append the original palmerpenguins data set to this table (as it fulfills the constraint).

con |> 
  dbAppendTable('palmerpenguins_new', value = palmerpenguins::penguins)
con |> 
  dbReadTable('palmerpenguins_new') |> 
  tibble::as_tibble()
## # A tibble: 344 × 8
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <chr>   <chr>              <dbl>         <dbl>             <dbl>       <dbl>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # ℹ 334 more rows
## # ℹ 2 more variables: sex <chr>, year <int>

But what about a data set that has a row with an invalid species? Like this one here:

fake_dat <- dplyr::bind_rows(
  palmerpenguins::penguins[1,],
  tibble::tibble(species = 'NEW SPECIES')
)
fake_dat
## # A tibble: 2 × 8
##   species     island  bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##   <chr>       <fct>            <dbl>         <dbl>             <int>       <int>
## 1 Adelie      Torger…           39.1          18.7               181        3750
## 2 NEW SPECIES <NA>              NA            NA                  NA          NA
## # ℹ 2 more variables: sex <fct>, year <int>

Well, appending that should work for the first row but not for the second one. Let’s see what happens if we try anyway:

# Will work
con |> 
  dbAppendTable(
    'palmerpenguins_new', 
    value = fake_dat[1, ]
  )
# Will fail
con |> 
  dbAppendTable(
    'palmerpenguins_new', 
    value = fake_dat[2, ]
  )
## Error in eval(expr, envir, enclos): nanodbc/nanodbc.cpp:1783: 00000
## [SQLite]CHECK constraint failed: species IN ('Adelie', 'Chinstrap', 'Gentoo') (19)

Nice. We get an error because the constraint was not fulfilled for the second row. As we should. But because the first operation was successful, the first (valid) row was still appended. Notice how there are 345 rows in the database table now as opposed to 344 rows in the original data set.

con |> 
  dbReadTable('palmerpenguins_new') |> 
  tibble::as_tibble()
## # A tibble: 345 × 8
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <chr>   <chr>              <dbl>         <dbl>             <dbl>       <dbl>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # ℹ 335 more rows
## # ℹ 2 more variables: sex <chr>, year <int>

That’s unfortunate. We now have to figure out which operations worked and which didn’t. In real-world settings, where we often string together multiple operations, this can become tedious quite fast. That’s why it’s common to enforce that either all operations succeed or everything is reverted.

And that’s where transactions come in. Together with tryCatch() they are an invaluable tool to ensure database consistency.

con |> dbBegin()
tryCatch({
  # Will work
  con |> 
    dbAppendTable(
      'palmerpenguins_new', 
      value = fake_dat[1, ]
    )
  # Will fail
  con |> 
    dbAppendTable(
      'palmerpenguins_new', 
      value = fake_dat[2, ]
    )
  con |> dbCommit() # Commit to changes if no error until here
  cli::cli_alert_success("Data uploaded successfully")
}, error = function(e) {
  con |> dbRollback() # Rollback if there was an error
  cli::cli_abort("Data couldn't be uploaded successfully. There was the following error {e$message}")
})
## Error in `value[[3L]]()`:
## ! Data couldn't be uploaded successfully. There was the following error
##   nanodbc/nanodbc.cpp:1783: 00000 [SQLite]CHECK constraint failed: species IN
##   ('Adelie', 'Chinstrap', 'Gentoo') (19)

Now you’ll see that we are still left with 345 rows instead of adding yet another row (even though the first operation worked).

con |> 
  dbReadTable('palmerpenguins_new') |> 
  tibble::as_tibble()
## # A tibble: 345 × 8
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <chr>   <chr>              <dbl>         <dbl>             <dbl>       <dbl>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # ℹ 335 more rows
## # ℹ 2 more variables: sex <chr>, year <int>
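If you don’t want to wire up tryCatch() yourself, DBI also ships a helper that wraps this exact pattern: dbWithTransaction() begins a transaction, commits if the code block succeeds, and rolls back if it throws an error. A minimal sketch:

```r
# Equivalent to dbBegin() + dbCommit() on success / dbRollback() on error
dbWithTransaction(con, {
  con |> dbAppendTable('palmerpenguins_new', value = fake_dat[1, ])
  con |> dbAppendTable('palmerpenguins_new', value = fake_dat[2, ]) # fails, so everything rolls back
})
```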

And with that we have concluded our little trip into databases with R. Of course there’s lots more to learn about SQL and database interactions from within R. But for now, I think this is a good place to stop.
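One last bit of housekeeping: when you’re done, it’s good practice to close the connection so the database can free its resources.

```r
# Close the database connection when you're finished
con |> dbDisconnect()
```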


Enjoyed this blog post?

Here are three other ways I can help you:

3 Minutes Wednesdays

Every week, I share bite-sized R tips & tricks. Reading time less than 3 minutes. Delivered straight to your inbox. You can sign up for free weekly tips online.

Data Cleaning With R Master Class

This in-depth video course teaches you everything you need to know about becoming better & more efficient at cleaning up messy data. This includes Excel & JSON files, text data and working with times & dates. If you want to get better at data cleaning, check out the course page.

Insightful Data Visualizations for "Uncreative" R Users

This video course teaches you how to leverage {ggplot2} to make charts that communicate effectively without being a design expert. Course information can be found on the course page.