How to collect dataviz from Twitter into your note-taking system
Intro
It is mid-April and the #30daychartchallenge is well under way. One glance at the hashtag’s Twitter feed suffices to see that there are great contributions. That’s a perfect opportunity to collect data viz examples for future inspiration.
Ideally, I can scroll through Twitter and, with a few clicks, incorporate these contributions straight into Obsidian or any other Markdown-based note-taking system. Unfortunately, rtweet’s snapshot function does not seem to work anymore. So, let’s build something on our own that gets the job done. The full script can be found in a GitHub gist. Here’s what we will need:
- Twitter app bearer token (to access Twitter’s API) - I’ll show you how to get that
- Elevated API access (just a few clicks once you have a bearer token)
- Dummy mail account to send tweets to
Overview
Before we begin, let me summarize what kind of note-taking process I have in mind:
1. Stroll through Twitter and spot a great data viz.
2. Send the tweet link and a few comments via mail to a dummy mail account.
3. A scheduled process accesses the dummy mail account and scans for new mails from authorized senders.
4. If there is a new mail, R extracts the tweet URL and uses Twitter’s API to download the tweet’s pictures and text.
5. A template Markdown file is used to create a new note that contains the images and text.
6. The Markdown file is copied into your note-taking system within your file system.
7. Ideally, your Markdown template contains tags like #dataviz and #twitter so that your new note can be easily searched for.
8. Next time you look for inspiration, stroll through your collection or search for comments.
Preparations
Ok, we know what we want to accomplish. Time to get the prelims done. First, we will need a Twitter developer account. Then, we have to mask sensitive information in our code. If you already have a Twitter app and a bearer token and know the keyring package, feel free to skip this section.
Get Twitter developer account
Let’s create a developer account for Twitter. Unfortunately, there is no way to get such an account without providing Twitter with your phone number. Sadly, if this burden on your privacy is a problem for you, then you cannot proceed. Otherwise, create an account at developer.twitter.com.
In your developer portal, create a project. Within this project create an app. Along the way, you will get a bunch of keys, secrets, IDs and tokens. You will see them only once, so you will have to save them somewhere. I suggest saving them into a password manager like bitwarden.
When you create your app, or shortly after, you will need to set the authentication settings. I use OAuth 2.0. This requires
- Type of app: Automated bot or app
- Callback URI / Redirect URI: http://127.0.0.1:1410 (DISCLAIMER: This is magic to me, but the rtweet docs - or possibly some other doc (not entirely sure anymore) - taught me to set up an app that way)
- Website URL: Your Twitter link (in my case https://twitter.com/rappa753)
Next, you will likely need to upgrade your project to ‘elevated’ status. This can be done for free on your project’s dashboard. From what I recall, you will have to fill out a form and tell Twitter what you want to do with your app. Just be honest and chances are that your request will immediately be granted. Just be yourself! What could possibly go wrong? Go get the girl elevated status (ahhh, what a perfect opportunity for a Taylor song).
How to embed your bearer token and other sensitive material in your code
Use the keyring package to first save secrets via key_set() and then extract them in your session via key_get(). This way, you won’t accidentally share your sensitive information when you share your code (like I do). In this post, I do this for my bearer token, my dummy mail address, my dummy mail’s password and for the allowed senders (that will be the mail address the tweets come from).
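In case you haven’t used keyring before: the secrets need to be stored once before key_get() can retrieve anything. Roughly, that one-time setup could look like the sketch below (run it interactively; the keyring name 'blogpost' and the service names simply mirror the ones retrieved in the next chunk).

```r
# One-time setup: create a keyring and store each secret (keyring prompts for the values)
library(keyring)

keyring_create('blogpost')                             # asks you to set a keyring password
key_set('twitter-bearer-token', keyring = 'blogpost')  # paste the bearer token when prompted
key_set('dataviz-mail', keyring = 'blogpost')          # dummy mail address
key_set('dataviz-mail-password', keyring = 'blogpost') # dummy mail password
key_set('allowed_senders', keyring = 'blogpost')       # mail address allowed to send tweets
```

With that in place, retrieving the secrets at the start of the script looks like this: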
bearer_token <- keyring::key_get('twitter-bearer-token', keyring = 'blogpost')
user_mail <- keyring::key_get('dataviz-mail', keyring = 'blogpost')
password_mail <- keyring::key_get('dataviz-mail-password', keyring = 'blogpost')
allowed_senders <- keyring::key_get('allowed_senders', keyring = 'blogpost')
The allowed_senders limitation is a precaution so that we do not accidentally download some malicious spam mail from God knows who onto our computer. I am no security expert, but this feels like a prudent thing to do. If any of you fellow readers knows more about this security business, kindly feel invited to reach out to me with better security strategies.
What to do once we have a URL
Let’s assume for the sake of this section that we have already extracted a tweet URL from a mail. Here’s the URL that we will use. In fact, it’s Christian Gebhard’s tweet that inspired me to start this project. From the URL we can extract the tweet’s ID (the bunch of numbers after /status/). Also, we will need the URL of Twitter’s API.
library(stringr) # for regex matching
library(dplyr) # for binding rows and pipe
tweet_url <- 'https://twitter.com/c_gebhard/status/1510533315262042112'
tweet_id <- tweet_url %>% str_match("status/([0-9]+)") %>% .[, 2]
API_url <- 'https://api.twitter.com/2/tweets'
Use GET() to access Twitter API
Next, we use the GET() function from the httr package to interact with Twitter’s API.
library(httr) # for API communication
<- paste("Bearer", bearer_token) # API needs format "Bearer <my_token>"
auth
# Make request to API
<- GET(
request
API_url, add_headers(Authorization = auth),
query = list(
ids = tweet_id,
tweet.fields = 'created_at', # time stamp
expansions = 'attachments.media_keys,author_id',
# necessary expansion fields for img_url
media.fields = 'url' # img_url
)
) request
Response [https://api.twitter.com/2/tweets?ids=1510533315262042112&tweet.fields=created_at&expansions=attachments.media_keys%2Cauthor_id&media.fields=url]
Date: 2022-07-21 17:47
Status: 200
Content-Type: application/json; charset=utf-8
Size: 690 B
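The Status: 200 in the output above already tells us that the request went through. If you prefer to check that programmatically (handy once the script runs unattended), httr has helpers for it. This is an optional extra, not something the original script relies on.

```r
# Optional sanity check before parsing the response
status_code(request)     # 200 means success
stop_for_status(request) # throws an informative error for 4xx/5xx responses
```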
So, how do we know how to use the GET() function? Well, I am no expert on APIs, but let me try to explain how I came up with the arguments I used here.
Remember those toys you would play with as a toddler where you try to get a square through a square-shaped hole, a triangle through a triangle-shaped hole and so on? You don’t? Well, neither do I. Who remembers that stuff from very early childhood?
But I hear that starting a sentence with “Remember those…” is good for building a rapport with your audience. So, great! Now that we feel all cozy and connected, I can tell you how I managed to get the API request to work.
And the truth is actually not that far from the toddler “intelligence test”. First, I took a look at a help page on Twitter’s developer site. Then, I hammered at the GET() function until its output contained a URL that looks similar to the example I found. Here’s the example code I was aiming at.
curl --request GET 'https://api.twitter.com/2/tweets?ids=1263145271946551300&
expansions=attachments.media_keys&
media.fields=duration_ms,height,media_key,preview_image_url,public_metrics,type,url,width,alt_text'
--header 'Authorization: Bearer $BEARER_TOKEN'
This is not really R code, but it shows that you usually have to feed a GET request a really long URL. In fact, it looks like the URL needs to contain everything you want to extract from the API. Specifically, the structure of said URL consists of
- the API’s base URL (in this case https://api.twitter.com/2/tweets)
- a question mark ?
- pairs of keywords (like ids) and a specific value, e.g. ids=1263145271946551300, connected via & (if you want to double-check how such a URL gets assembled, see the quick check right after this list)
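If you want to see such a URL being assembled without actually sending a request, httr’s modify_url() can build it from a query list. This is just a quick sanity check using the IDs from Twitter’s example above; the final script doesn’t need it.

```r
# Assemble (but do not send) a request URL from a query list
modify_url(
  'https://api.twitter.com/2/tweets',
  query = list(
    ids = '1263145271946551300',
    expansions = 'attachments.media_keys',
    media.fields = 'url'
  )
)
#> [1] "https://api.twitter.com/2/tweets?ids=1263145271946551300&expansions=attachments.media_keys&media.fields=url"
```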
Therefore, it is only a matter of figuring out how to make the output of GET() deliver this result. Hints on that came from the GET() examples in the docs.
GET("http://google.com/", path = "search", query = list(q = "ham"))
Response [http://www.google.com/search?q=ham]
Date: 2022-07-21 17:47
Status: 200
Content-Type: text/html; charset=ISO-8859-1
Size: 122 kB
<!doctype html><html lang="de"><head><meta charset="UTF-8"><meta content="/im...
document.documentElement.addEventListener("submit",function(b){var a;if(a=b.t...
var a=window.performance;window.start=Date.now();a:{var b=window;if(a){var c=...
var f=this||self;var g,h=null!=(g=f.mei)?g:1,m,n=null!=(m=f.sdo)?m:!0,p=0,q,r...
e);var l=a.fileName;l&&(b+="&script="+c(l),e&&l===window.location.href&&(e=do...
var c=[],e=0;window.ping=function(b){-1==b.indexOf("&zx")&&(b+="&zx="+Date.no...
var k=this||self,l=function(a){var b=typeof a;return"object"==b&&null!=a||"fu...
b}).join(" "))};function w(){var a=k.navigator;return a&&(a=a.userAgent)?a:""...
!1}e||(d=null)}}else"mouseover"==b?d=a.fromElement:"mouseout"==b&&(d=a.toElem...
var a=document.getElementById("st-toggle"),b=document.getElementById("st-card...
...
GET("http://httpbin.org/get", add_headers(a = 1, b = 2))
Response [http://httpbin.org/get]
Date: 2022-07-21 17:47
Status: 200
Content-Type: application/json
Size: 403 B
{
"args": {},
"headers": {
"A": "1",
"Accept": "application/json, text/xml, application/xml, */*",
"Accept-Encoding": "deflate, gzip, br",
"B": "2",
"Host": "httpbin.org",
"User-Agent": "libcurl/7.68.0 r-curl/4.3.2 httr/1.4.3",
"X-Amzn-Trace-Id": "Root=1-62d9913b-1838afb6392fdf321bc3479c"
...
So, the first example shows how an argument query can be filled with a list that creates the URL we need. The second example shows us that there is something called add_headers(). Do I know exactly what that is? I mean, from a technical perspective of what is going on behind the scenes? Definitely not. But Twitter’s example request had something called a header. Therefore, add_headers() is probably something that does what the Twitter API expects.
And where do the key-value pairs come from? I found these strolling through Twitter’s data dictionary. Thus, a GET() request was born and I could feel like a true Hackerman.
<- paste("Bearer", bearer_token) # API needs format "Bearer <my_token>"
auth
# Make request to API and parse to list
<- GET(
request
API_url, add_headers(Authorization = auth),
query = list(
ids = tweet_id,
tweet.fields = 'created_at', # time stamp
expansions = 'attachments.media_keys,author_id',
# necessary expansion fields for img_url
media.fields = 'url' # img_url
) )
Alright, we successfully requested data. Now it is time to parse it into something useful. The content() function will do that.
parsed_request <- request %>% content('parsed')
parsed_request
$data
$data[[1]]
$data[[1]]$attachments
$data[[1]]$attachments$media_keys
$data[[1]]$attachments$media_keys[[1]]
[1] "3_1510533145334104067"
$data[[1]]$id
[1] "1510533315262042112"
$data[[1]]$author_id
[1] "1070306701"
$data[[1]]$created_at
[1] "2022-04-03T08:23:01.000Z"
$data[[1]]$text
[1] "#30DayChartChallenge #Day3 - Topic: historical\n\nBack to the Shakespeare data! Hamlet is is longest play, the comedies tend to be shorter.\n\nTool: #rstats\nData: kaggle users LiamLarsen, aodhan\nColor-Scale: MetBrewer\nFonts: Niconne, Noto Sans (+Mono)\nCode: https://t.co/iXAbniQDCb https://t.co/JCNrYH9uP4"
$includes
$includes$media
$includes$media[[1]]
$includes$media[[1]]$media_key
[1] "3_1510533145334104067"
$includes$media[[1]]$type
[1] "photo"
$includes$media[[1]]$url
[1] "https://pbs.twimg.com/media/FPZ95H0XwAMHA8q.jpg"
$includes$users
$includes$users[[1]]
$includes$users[[1]]$id
[1] "1070306701"
$includes$users[[1]]$name
[1] "Christian Gebhard"
$includes$users[[1]]$username
[1] "c_gebhard"
Extract tweet data from what the API gives us and download images
We have seen that parsed_request is basically a large list that contains everything we requested from the API. Unfortunately, it is a highly nested list, so we have to do some work to extract the parts we actually want. pluck() from the purrr package is our best friend on this one. Here’s all the information we extract from parsed_request.
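In case pluck() is new to you, it simply indexes into a nested list one level at a time. Here’s a toy example with made-up data (nothing to do with the tweet) before we apply it for real:

```r
# Toy example: walk into a nested list level by level
nested <- list(data = list(list(text = 'hello')))
purrr::pluck(nested, 'data', 1, 'text')
#> [1] "hello"
```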
library(purrr) # for pluck and map functions
# Extract necessary information from list-like structure
tweet_text <- parsed_request %>%
  pluck("data", 1, 'text')
tweet_text
[1] "#30DayChartChallenge #Day3 - Topic: historical\n\nBack to the Shakespeare data! Hamlet is is longest play, the comedies tend to be shorter.\n\nTool: #rstats\nData: kaggle users LiamLarsen, aodhan\nColor-Scale: MetBrewer\nFonts: Niconne, Noto Sans (+Mono)\nCode: https://t.co/iXAbniQDCb https://t.co/JCNrYH9uP4"
tweet_user <- parsed_request %>%
  pluck("includes", 'users', 1, 'username')
tweet_user
[1] "c_gebhard"
# We will use the tweet date and time as part of unique file names
# Replace white spaces and colons (:) for proper file names
tweet_date <- parsed_request %>%
  pluck("data", 1, 'created_at') %>%
  lubridate::as_datetime() %>%
  str_replace(' ', '_') %>%
  str_replace_all(':', '')
tweet_date
[1] "2022-04-03_082301"
img_urls <- parsed_request %>%
  pluck("includes", 'media') %>%
  bind_rows() %>% # bind_rows for multiple pictures, i.e. multiple URLs
  filter(type == 'photo') %>%
  pull(url)
img_urls
[1] "https://pbs.twimg.com/media/FPZ95H0XwAMHA8q.jpg"
Next, download all the images via the img_urls and download.file(). We will use walk2() to download all files (in case there are multiple images/URLs) and save them as PNGs that are named using the unique tweet_date IDs. Remember to set mode = 'wb' in download.file(). It writes the file in binary mode; without it, the binary image data gets mangled (on Windows, at least) and you will save poor-quality images.
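By the way, if walk2() is unfamiliar: it just loops over two vectors in parallel, purely for side effects such as downloading or printing. A toy example:

```r
# Toy example: iterate over two vectors in parallel for their side effects
walk2(c('a', 'b'), c(1, 2), ~cat('letter:', .x, '- number:', .y, '\n'))
#> letter: a - number: 1
#> letter: b - number: 2
```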
# Download images - set mode, otherwise the download is blurry
img_names <- paste('tweet', tweet_user, tweet_date, seq_along(img_urls), sep = "_")
walk2(img_urls, img_names, ~download.file(.x, paste0(.y, '.png'), mode = 'wb'))
So let’s do a quick recap of what we have done so far. We
- Assembled an API request
- Parsed the return of the request
- Cherrypicked the information that we want from the resulting list
- Used the image URLs to download and save the files to our working directory.
Let’s cherish this milestone with a dedicated function.
request_twitter_data <- function(tweet_url, bearer_token) {
  # Extract tweet id by regex
  tweet_id <- tweet_url %>% str_match("status/([0-9]+)") %>% .[, 2]
  auth <- paste("Bearer", bearer_token) # API needs format "Bearer <my_token>"
  API_url <- 'https://api.twitter.com/2/tweets'

  # Make request to API and parse to list
  parsed_request <- GET(
    API_url,
    add_headers(Authorization = auth),
    query = list(
      ids = tweet_id,
      tweet.fields = 'created_at', # time stamp
      expansions = 'attachments.media_keys,author_id', # necessary expansion fields for img_url
      media.fields = 'url' # img_url
    )
  ) %>% content('parsed')

  # Extract necessary information from list-like structure
  tweet_text <- parsed_request %>%
    pluck("data", 1, 'text')
  tweet_user <- parsed_request %>%
    pluck("includes", 'users', 1, 'username')

  # Make file name unique through time-date combination
  # Replace white spaces and colons (:) for proper file names
  tweet_date <- parsed_request %>%
    pluck("data", 1, 'created_at') %>%
    lubridate::as_datetime() %>%
    str_replace(' ', '_') %>%
    str_replace_all(':', '')

  img_urls <- parsed_request %>%
    pluck("includes", 'media') %>%
    bind_rows() %>%
    filter(type == 'photo') %>%
    pull(url)

  # Download images - set mode, otherwise the download is blurry
  img_names <- paste('tweet', tweet_user, tweet_date, seq_along(img_urls), sep = "_")
  walk2(img_urls, img_names, ~download.file(.x, paste0(.y, '.png'), mode = 'wb'))

  # Return list with information
  list(
    url = tweet_url,
    text = tweet_text,
    user = tweet_user,
    file_names = paste0(img_names, '.png'),
    file_paths = paste0(getwd(), '/', img_names, '.png')
  )
}
Fill out Markdown template using extracted information and images
We have our images and the original tweet now. Thanks to our previous function, we can save all of the information in a list.
request <- request_twitter_data(tweet_url, bearer_token)
So, let’s bring all that information into a Markdown file. Here is the template.md file that I have created for this joyous occasion.
library(readr) # for reading and writing files from/to disk
cat(read_file('template.md'))
#dataviz #twitter
![[insert_img_name_here]]
### Original Tweet
insert_text_here
Original: insert_URL_here
### Original Mail
insert_mail_here
As you can see, I started the Markdown template with two tags, #dataviz and #twitter. This helps me to search for a specific dataviz faster. Also, I have already written out the Markdown syntax for image imports ![[...]] and added a placeholder insert_img_name_here. This one will be replaced by the file path to the image. Similarly, other placeholders like insert_text_here and insert_mail_here allow me to save the tweet and the mail content into my note-taking system too.
To do so, I will need a function that replaces all the placeholders. First, I created a helper function that handles the image import placeholder properly when there are multiple images.
md_import_strings <- function(file_names) {
  paste0('![[', file_names, ']]', collapse = '\n')
}
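Just to see what this helper produces, here’s a quick call with two made-up file names:

```r
md_import_strings(c('img_1.png', 'img_2.png'))
#> [1] "![[img_1.png]]\n![[img_2.png]]"
```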
Then, I created a function that takes the request list that we got from calling our own request_twitter_data() function and iteratively uses str_replace_all(). This iteration is done with reduce2(), which will replace all placeholders in template.md.
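If reduce2() is new to you, it walks through .x and .y in parallel and feeds the running result back into each call. A toy example (unrelated to the tweet data) before we use it on the template:

```r
# Toy example: pair the first pattern with the first replacement, the second with the second, ...
reduce2(
  .x = c('cat', 'dog'),
  .y = c('CAT', 'DOG'),
  .f = str_replace_all,
  .init = 'cat meets dog'
)
#> [1] "CAT meets DOG"
```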
library(tibble) # for easier readable tribble creation

# Replace the placeholders in the template
# We change the original mail placeholder later on
replace_template_placeholder <- function(template_name, request) {
  # Create a dictionary for what to replace in template
  replace_dict <- tribble(
    ~template, ~replacement,
    '\\!\\[\\[insert_img_name_here\\]\\]', md_import_strings(request$file_names),
    'insert_text_here', request$text %>% str_replace_all('#', '(#)'),
    'insert_URL_here', request$url
  )
  # Iteratively apply str_replace_all and keep only final result
  reduce2(
    .x = replace_dict$template,
    .y = replace_dict$replacement,
    .f = str_replace_all,
    .init = read_lines(template_name)
  ) %>%
    paste0(collapse = '\n') # Collapse lines into a single string
}
replace_template_placeholder('template.md', request) %>% cat()
#dataviz #twitter
![[tweet_c_gebhard_2022-04-03_082301_1.png]]
### Original Tweet
(#)30DayChartChallenge (#)Day3 - Topic: historical
Back to the Shakespeare data! Hamlet is is longest play, the comedies tend to be shorter.
Tool: (#)rstats
Data: kaggle users LiamLarsen, aodhan
Color-Scale: MetBrewer
Fonts: Niconne, Noto Sans (+Mono)
Code: https://t.co/iXAbniQDCb https://t.co/JCNrYH9uP4
Original: https://twitter.com/c_gebhard/status/1510533315262042112
### Original Mail
insert_mail_here
As you can see, my replace_template_placeholder() function also replaces the typical # from Twitter with (#). This is just a precaution so that these lines are not wrongly interpreted as headings in Markdown. Also, the original mail has not been inserted yet because we have no mail yet. But soooon. Finally, we need to write the replaced strings to a file. I got some helpers for that right here.
write_replaced_text <- function(replaced_text, request) {
  file_name <- request$file_names[1] %>% str_replace('_1.png', '.md')
  write_lines(replaced_text, file_name)
  paste0(getwd(), '/', file_name)
}

replaced_template <- replace_template_placeholder('template.md', request) %>%
  write_replaced_text(request)
Shuffle files around on your file system
Awesome! We created new image files and a new Markdown note in our working directory. Now, we have to move them to our Obsidian vault. This is the place where I collect all my Markdown notes for use in Obsidian. In my case, I will need to move the Markdown note to the vault directory and the images to a subdirectory within this vault. This is because I changed a setting in Obsidian that ensures that all attachments, e.g. images, are saved in a separate subdirectory.
Here’s the function I created to get that job done. The function uses the request list again because it contains the file paths of the images. Here, vault_location and attachments_dir point to my Obsidian vault and its attachments subdirectory.
library(tidyr) # for unnesting
move_files <- function(request, replaced_template, vault_location, attachments_dir) {
  # Create from-to dictionary with file paths in each column
  move_dict <- tribble(
    ~from, ~to,
    request$file_paths, paste0(vault_location, '/', attachments_dir),
    replaced_template, vault_location
  ) %>%
    unnest(cols = 'from')
  # Copy files from current working directory to destination
  move_dict %>% pwalk(file.copy, overwrite = T)
  # Delete files in current working directory
  walk(move_dict$from, file.remove)
}
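Calling it could look like this. The paths are made up; point them at your own vault and attachments folder (the same two variables show up again in the scheduled script at the end).

```r
# Hypothetical paths - adjust to your own setup
vault_location  <- 'C:/Users/me/ObsidianVault'
attachments_dir <- 'attachments'

move_files(request, replaced_template, vault_location, attachments_dir)
```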
How to extract URL and other stuff from mail
Let’s take a quick breather and recap. We have written functions that
- take a tweet URL
- hustle the Twitter API to give us all its data
- download the images and tweet text
- save everything to a new Markdown note based on a template
- can move the note plus images to the location of our note-taking hub
Not to brag, but that is kind of cool. But let’s not rest here. We still have to get some work done. After all, we want our workflow to be email-based. So, let’s access our mails using R. Then, we can extract a Twitter URL and apply our previous functions. Also, this lets us finally replace the insert_mail_here placeholder in our Markdown note.
Postman gives you access
I have created a dummy mail account at Gmail. Using the mRpostman package, we can establish a connection to our mail inbox. After the connection is established, we can filter for all new emails that are sent from our list of allowed_senders.
library(mRpostman) # for email communication
imap_mail <- 'imaps://imap.gmail.com' # mail client

# Establish connection to imap server
con <- configure_imap(
  url = imap_mail,
  user = user_mail,
  password = password_mail
)

# Switch to Inbox
con$select_folder('Inbox')

# Extract mails that are from the list of allowed senders
mails <- allowed_senders %>%
  map(~con$search_string(expr = ., where = 'FROM')) %>%
  unlist() %>%
  na.omit() %>% # Remove NAs if no mail from a sender
  as.numeric()  # avoids attributes
Grab URLs from mail
If mails is not empty, i.e. if there are new mails, then we need to extract the tweet URLs from them. Unfortunately, depending on where you send your email from, the mail text can be encoded.
For example, I send most of the tweets via the share button on Twitter using my Android smartphone. And for some reason, my Android mail client encodes the mails in something called base64. But sending a tweet URL from Thunderbird on my computer works without any encoding. Here are two example mails I have sent to my dummy mail account.
if (!is_empty(mails)) mail_bodys <- mails %>% con$fetch_text()
cat(mail_bodys[[1]])
cat(mail_bodys[[2]])
As you can see, the mail sent from my computer is legible but the other one is gibberish. Thankfully, Allan Cameron helped me out on StackOverflow to decode the mail. The trick was to extract the parts between base64 and ----.
There are two such texts in the encoded mail. Surprisingly, the first one decoded to a text without line breaks. This is why we take the second encoded part and decode it. However, this will give us an HTML text with all kinds of tags like <div> and what not. Therefore, we use read_html() and html_text2() from the rvest package to handle that. All of this is summarized in this helper function.
decode_encoded_mails <- function(encoded_mails) {
  # Resource: https://stackoverflow.com/questions/71772972/translate-encoding-of-android-mail-in-r
  # Find location in each encoded string where the actual text starts
  start_encoded <- encoded_mails %>%
    str_locate_all('base64\r\n\r\n') %>%
    map(~pluck(., 4) + 1) %>%
    unlist()
  # Find location in each encoded string where the actual text ends
  end_encoded <- encoded_mails %>%
    str_locate_all('----') %>%
    map(~pluck(., 3) - 1) %>%
    unlist()
  # Use str_sub() to extract encoded text
  encoded_text <- tibble(
    string = unlist(encoded_mails),
    start = start_encoded,
    end = end_encoded
  ) %>%
    pmap(str_sub)
  # Decode: base64 -> raw -> char -> html -> text
  encoded_text %>%
    map(base64enc::base64decode) %>%
    map(rawToChar) %>%
    map(rvest::read_html) %>%
    map(rvest::html_text2)
}
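If you want to convince yourself that the base64 -> raw -> char -> html -> text chain does what we claim, here’s a tiny round trip with a made-up string:

```r
# Toy round trip: encode a string, then decode it the same way the helper does
encoded <- base64enc::base64encode(charToRaw('<div>Hello from my phone!</div>'))
encoded %>%
  base64enc::base64decode() %>%
  rawToChar() %>%
  rvest::read_html() %>%
  rvest::html_text2()
#> [1] "Hello from my phone!"
```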
I feel like this is the most hacky part of this blog post. Unfortunately, your mileage may vary here. If your phone or whatever you use encodes the mails differently, then you may have to adjust the function. But I hope that I have explained enough details and concepts for you to manage that if it comes to this.
Recall that I send both plain mails from Thunderbird and encoded mails from Android. Therefore, here is another helper that decodes mails if necessary and handles both types in one swoop.
decode_all_mails <- function(mail_bodys) {
  # Decode in case mail is base64 encoded
  is_encoded <- str_detect(mail_bodys, 'Content-Transfer-Encoding')
  encoded_mails <- mail_bodys[is_encoded]
  plain_mails <- mail_bodys[!is_encoded]
  decoded_mails <- encoded_mails %>% decode_encoded_mails()
  c(decoded_mails, plain_mails)
}
The remaining part of the code should be familiar:
- Use decode_all_mails() for decoding
- Grab URLs with str_extract()
- Use request_twitter_data() with our URLs
- Replace placeholders with replace_template_placeholder()
- This time, replace mail placeholders too with another str_replace() iteration
- Move files with move_files()
The only new thing is that we use our postman connection to move the processed mails into a new directory (which I called “Processed”) on the email server. This way, the inbox is empty again or filled only with mails from unauthorized senders.
if (!is_empty(mails)) {
  # Grab mail texts and URLs
  mail_bodys <- mails %>% con$fetch_text() %>% decode_all_mails()
  urls <- mail_bodys %>% str_extract('https.*')

  # Remove mails from vector in case s.th. goes wrong
  # and urls cannot be detected
  mail_bodys <- mail_bodys[!is.na(urls)]
  mails <- mails[!is.na(urls)]
  urls <- urls[!is.na(urls)]

  # For each url request twitter data
  requests <- map(urls, request_twitter_data, bearer_token = bearer_token)

  # Use requested twitter data to insert texts into Markdown template
  # and write to current working directory
  replaced_templates_wo_mails <-
    map(requests, replace_template_placeholder, template = 'template.md')

  # Now that we have mails, replace that placeholder too
  replaced_templates <- replaced_templates_wo_mails %>%
    map2(mail_bodys, ~str_replace(.x, 'insert_mail_here', .y)) %>%
    map2(requests, ~write_replaced_text(.x, .y))

  # Move markdown files and extracted pngs to correct place on HDD
  walk2(
    requests,
    replaced_templates,
    move_files,
    vault_location = vault_location,
    attachments_dir = attachments_dir
  )

  # Move emails on imap server to Processed directory
  con$move_msg(mails, to_folder = 'Processed')
}
Last Step: Execute R script automatically
Alright, alright, alright. We made it. We have successfully
- extracted URLs from mails,
- created new notes and
- moved them to their designated place
The only thing that is left to do is execute this script automatically. Again, if you don’t want to assemble the R script yourself using the code chunks in this blog post, check out this GitHub gist.
On Windows, you can write a VBS script that will execute the R script. Windows’ Task Scheduler is easily set up to run that VBS script regularly, say every hour. For completeness’ sake, let me give you an example VBS script. But beware that I have no frikkin clue how VBS scripts work beyond this simple call.
Set wshshell = WScript.CreateObject ("wscript.shell")
wshshell.run """C:\Program Files\R\R-4.0.5\bin\Rscript.exe"" ""D:\Local R Projects\Playground\TwitterTracking\my_twitter_script.R""", 6, True
set wshshell = nothing
The idea of this script is to call Rscript.exe and give it the location of the R script that we want to execute. Of course, you will need to adjust the paths to your file system. Notice that there are an awful lot of double quotes in this script. This is somewhat dumb, but it’s the only way I could find to make file paths with white spaces work (see StackOverflow).
On Ubuntu (and probably other Unix-based systems), every Unix user knows that there is cron (via crontab) to schedule regular tasks. On macOS, I am sure there is something similar. But instead of wandering even further from my expertise, I will refer you to your internet search skills.
Mind the possibilities
We made it! We connected to Twitter’s API and our dummy email account to get data viz (what’s the plural here? viz, vizz, vizzes, vizzeses?) into our note-taking system. Honestly, I think that was quite an endeavor. But now we can use the same ideas for all kinds of other applications! Off the top of my head, I can think of more scenarios where similar solutions should be manageable. Here are two ideas.
- Take notes on the fly using emails and automatically incorporate the emails into your note-taking system.
- Take a photo of a book/text you’re reading and send it to another dummy mail. Run a script that puts the photo and the mail directly into your vault.
So, enjoy the possibilities! If you liked this blog post, then consider following me on Twitter and/or subscribing to my RSS feed. Until next time!