It is mid-April and the #30daychartchallenge is well under way. One glance at the hashtag’s Twitter feed suffices to realize that there are great contributions. That’s a perfect opportunity to collect data viz examples for future inspiration.
Ideally, I can scroll through Twitter and with a few clicks incorporate
these contributions straight into my Obsidian or
any other Markdown-based note-taking system. Unfortunately,
snapshot function does not seem to work anymore. So, let’s build
something on our own that gets the job done. The full script can be
found on GitHub.
Here’s what we will need:
- Twitter app bearer token (to access Twitter’s API) - I’ll show you how to get that
- Elevated API access (just a few clicks once you have a bearer token)
- Dummy mail account to send tweets to
Before we begin, let me summarize what kind of note-taking process I have in mind:
1. Stroll through Twitter and spot a great data viz.
2. Send the tweet link and a few comments via mail to a dummy mail account.
3. A scheduled process accesses the dummy mail account and scans for new mails from authorized senders.
4. If there is a new mail, R extracts the tweet URL and uses Twitter’s API to download the tweet’s pictures and texts.
5. A template Markdown file is used to create a new note that contains the images and texts.
6. The Markdown file is copied to your note-taking system within your file system.
7. Ideally, your Markdown template contains tags like #dataviz and #twitter so that your new note can be easily searched for.
8. Next time you look for inspiration, stroll through your collections or search for comments.
Ok, we know what we want to accomplish. Time to get the prelims done.
First, we will need a Twitter developer account. Then, we have to mask
sensitive information in our code. If you already have a Twitter app
or a bearer token and know the
keyring package, feel free to skip ahead.
Get Twitter developer account
Let’s create a developer account for Twitter. Unfortunately, there is no way to get such an account without providing Twitter with your phone number. Sadly, if this burden on your privacy is a problem for you, then you cannot proceed. Otherwise, create an account at developer.twitter.com.
In your developer portal, create a project. Within this project create an app. Along the way, you will get a bunch of keys, secrets, IDs and tokens. You will see them only once, so you will have to save them somewhere. I suggest saving them into a password manager like bitwarden.
When you create your app or shortly after, you will need to set the
authentication settings. I use OAuth 2.0. This requires
- Type of app: Automated bot or app
- Callback URI / Redirect URI: http://127.0.0.1:1410 (DISCLAIMER: This is magic to me but the rtweet docs, or possibly some other doc (not entirely sure anymore), taught me to set up an app that way)
- Website URL: Your Twitter link (in my case
Next, you will likely need to upgrade your project to ‘elevated’ status.
This can be done for free on your project’s dashboard. From what I
recall, you will have to fill out a form and tell Twitter what you want
to do with your app. Just be honest and chances are that your request
will immediately be granted. Just be yourself! What could possibly go
wrong? Go get the
girl elevated status (ahhh, what a perfect
opportunity for a Taylor
How to embed your bearer token and other sensitive material in your code
I use the keyring package to first save secrets via
key_set() and then
extract them in my session via
key_get(). This way, you won’t share
your sensitive information by accident when you share your code (like I
do). In this post, I do this for my bearer token, my dummy mail, my
dummy mail’s password and for the allowed senders (that will be the mail
where the tweets come from).
The allowed_senders limitation is a precaution so that we do not
accidentally download some malicious spam mail from God knows who onto
our computer. I am no security expert but this feels like a prudent
thing to do. If one of you fellow readers knows more about this security
business, feel kindly invited to reach out to me with better security
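Here is a minimal sketch of the keyring pattern described above. The helper name load_secrets() and the service name in the comment are made up for illustration; only key_set() and key_get() come from the keyring package itself.

```r
library(keyring)

# Store a secret once, interactively. Nothing sensitive ends up in the script.
# The service name is a hypothetical example:
# key_set("twitter-bearer-token")

# Hypothetical helper: look up several secrets by service name in one go
load_secrets <- function(services) {
  # key_get() reads each entry from the credential store of the operating system
  lapply(stats::setNames(services, services), keyring::key_get)
}
```

Calling load_secrets(c("twitter-bearer-token", "dummy-mail-password")) would then return a named list, provided both entries were saved beforehand.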
What to do once we have a URL
Let’s assume for the sake of this section that we already extracted a
tweet URL from a mail. Here’s the URL that we will use. In fact, it’s
Christian Gebhard’s tweet that inspired
me to start this project. From the URL we can extract the tweet’s ID
(the bunch of numbers after
/status/). Also, we will need the URL of Twitter’s API.
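As a rough sketch, the ID extraction can be done with a single regular expression. The URL below is a made-up stand-in (the ID is borrowed from Twitter's documentation example further down), and I use base R's sub() here, which may differ from what the full script does.

```r
# Hypothetical tweet URL, only for illustration
tweet_url <- "https://twitter.com/some_user/status/1263145271946551300?s=20"

# Keep only the digits that follow "/status/"
# (stored as a string because the number is too large for exact numeric storage)
tweet_id <- sub(".*/status/([0-9]+).*", "\\1", tweet_url)

# Base URL of the tweet lookup endpoint
api_url <- "https://api.twitter.com/2/tweets"
```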
Use GET() to access Twitter API
Next, we use the
GET() function from the
httr package to interact
with Twitter’s API.
So, how do we know how to use the
GET() function? Well, I am no expert
on APIs but let me try to explain how I came up with the arguments I used.
Remember those toys you would play with as a toddler where you try to get a square through a square-shaped hole, a triangle through a triangle-shaped hole and so on? You don’t? Well, neither do I. Who remembers that stuff from very early childhood?
But I hear that starting a sentence with “Remember those…” is good for building a rapport with your audience. So, great! Now that we feel all cozy and connected, I can tell you how I managed to get the API request to work.
And the truth is actually not that far from the toddler “intelligence
test”. First, I took a look at a help page from Twitter’s developer page.
Then, I hammered at the GET() function
until its output contained a URL that looks similar to the example I
found. Here’s the example code I was aiming at.
curl --request GET 'https://api.twitter.com/2/tweets?ids=1263145271946551300&expansions=attachments.media_keys&media.fields=duration_ms,height,media_key,preview_image_url,public_metrics,type,url,width,alt_text' --header 'Authorization: Bearer $BEARER_TOKEN'
This is not really R code but it looks like usually you have to feed a GET request with a really long URL. In fact, it looks like the URL needs to contain everything you want to extract from the API. Specifically, the structure of said URL looks like
- the API’s base URL (in this case https://api.twitter.com/2/tweets)
- a question mark
- pairs of a parameter name (e.g. ids) and a specific value, e.g. ids=1263145271946551300, that are connected via &
Therefore, it is only a matter of figuring out how to make the output of
GET() deliver this result. Hints on that came from
GET() examples in
So, the first example shows how an argument
query can be filled with a
list that creates the URL we need. The second example shows us that
there is something called
add_headers(). Do I know exactly what that
is? I mean, from a technical perspective of what is going on behind the
scenes? Definitely not. But Twitter’s example request had something
called header. Therefore,
add_headers() is probably something that
does what the Twitter API expects.
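Put together, a request along these lines could be sketched as follows. The function names tweet_query() and request_tweet() are hypothetical, and the field list is shortened from Twitter's curl example. modify_url() is only used to peek at the URL that would be assembled, so nothing is sent until request_tweet() is called with a real token.

```r
library(httr)

api_url <- "https://api.twitter.com/2/tweets"

# Query parameters modeled on Twitter's curl example (field list shortened)
tweet_query <- function(tweet_id) {
  list(
    ids = tweet_id,
    expansions = "attachments.media_keys",
    media.fields = "media_key,type,url,alt_text"
  )
}

# Hypothetical wrapper; bearer_token is assumed to come from keyring::key_get()
request_tweet <- function(tweet_id, bearer_token) {
  GET(
    url = api_url,
    query = tweet_query(tweet_id),
    add_headers(Authorization = paste("Bearer", bearer_token))
  )
}

# Inspect the URL built from the query list without contacting the API
preview_url <- modify_url(api_url, query = tweet_query("1263145271946551300"))
```

The query list becomes the part after the question mark, and add_headers() supplies what the curl example passes via --header.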
Alright, we successfully requested data. Now it is time to parse
it into something useful. The
content() function will do that.
Extract tweet data from what the API gives us and download images
We have seen that
parsed_request is basically a large list that
contains everything we requested from the API. Unfortunately, it is a
highly nested list, so we have to do some work to extract the parts we need.
pluck() from the
purrr package is our best friend on
this one. Here’s all the information we extract from the parsed request.
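To give an idea of what this plucking looks like, here is a sketch on a mock list. The shape (a data element and an includes element with media entries) is my assumption about what content() returns for a tweet lookup, and all values are invented.

```r
library(purrr)

# Mock of a parsed response; structure assumed, values made up
parsed_request <- list(
  data = list(
    list(id = "1263145271946551300", text = "A chart worth keeping #dataviz")
  ),
  includes = list(media = list(
    list(media_key = "3_1", type = "photo", url = "https://pbs.twimg.com/media/a.png"),
    list(media_key = "3_2", type = "photo", url = "https://pbs.twimg.com/media/b.png")
  ))
)

# Walk down the nested list by name and position
tweet_text <- pluck(parsed_request, "data", 1, "text")

# One URL per attached image
img_urls <- map_chr(pluck(parsed_request, "includes", "media"), "url")
```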
Next, download all the images via the
We will use
walk2() to download all files (in case there are multiple
images/URLs) and save the files into PNGs that are named using the
tweet_date IDs. Remember to set
mode = 'wb' in
download.file(). This writes the files in binary mode. Without it, Windows
treats the download as text and the saved images come out corrupted.
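A download step of this kind could be sketched like so. The helper name download_images() is hypothetical; it simply pairs each URL with a destination file name.

```r
library(purrr)

# Hypothetical helper: save each image URL under the matching file name
download_images <- function(img_urls, file_names) {
  walk2(img_urls, file_names, function(url, destfile) {
    # mode = "wb" writes the bytes unchanged, which matters for images
    download.file(url, destfile = destfile, mode = "wb", quiet = TRUE)
  })
  invisible(file_names)
}
```

walk2() is used instead of map2() because we only care about the side effect (the saved files), not about a return value.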
So let’s do a quick recap of what we have done so far. We
- Assembled an API request
- Parsed the return of the request
- Cherry-picked the information that we want from the resulting list
- Used the image URLs to download and save the files to our working directory.
Let’s cherish this milestone with a dedicated function.
Fill out Markdown template using extracted information and images
We have our images and the original tweet now. Thanks to our previous function, we can save all of the information in a list.
So, let’s bring all that information into a Markdown file. Here is the
template.md file that I have created for this joyous occasion.
As you can see, I started the Markdown template with two tags
![[...]] and added a placeholder
insert_img_name_here. This one will
be replaced by the file path to the image. Similarly, other placeholders
insert_mail_here allow me to save the
tweet and the mail content into my note taking system too.
To do so, I will need a function that replaces all the placeholders. First, I created a helper function that changes the image import placeholder properly, when there are multiple images.
Then, I created a function that takes the
request list that we got
from calling our own
request_twitter_data() function and iteratively
str_replace_all(). This iteration is done with
will replace all placeholders in
As you can see, my
replace_template_placeholder() function also
replaces the typical
# from Twitter with
(#). This is just a
precaution to avoid wrong interpretation of these lines as headings in
Markdown. Also, the original mail has not been inserted yet because we
have no mail yet. But soooon. Finally, we need to write the replaced
strings to a file. I got some helpers for that right here.
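The placeholder idea can be sketched in a few lines. Note the assumptions: the template below is a stand-in, the placeholder insert_text_here and the helper fill_template() are made up, and I use a plain loop with base R's gsub() rather than the iteration used in the full script.

```r
# Stand-in template; insert_text_here is a hypothetical placeholder name
template <- c(
  "#dataviz #twitter",
  "![[insert_img_name_here]]",
  "insert_text_here",
  "insert_mail_here"
)

# Replace every placeholder that appears as a name in `replacements`
fill_template <- function(template, replacements) {
  for (placeholder in names(replacements)) {
    template <- gsub(placeholder, replacements[[placeholder]], template, fixed = TRUE)
  }
  template
}

tweet_text <- "Day 12 of #30DayChartChallenge"
note <- fill_template(template, c(
  insert_img_name_here = "tweet_1263145271946551300_1.png",
  # Avoid lines starting with # being read as Markdown headings
  insert_text_here = gsub("#", "(#)", tweet_text, fixed = TRUE)
))

# Write the filled-out note to a file (a temp file here)
note_file <- tempfile(fileext = ".md")
writeLines(note, note_file)
```

The mail placeholder is deliberately left untouched at this stage, since there is no mail text yet.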
Shuffle files around on your file system
Awesome! We created new image files and a new Markdown note in our working directory. Now, we have to move them to our Obsidian vault. This is the place where I collect all my Markdown notes for use in Obsidian. In my case, I will need to move the Markdown note to the vault directory and the images to a subdirectory within this vault. This is because I changed a setting in Obsidian that makes sure that all attachments, e.g. images, are saved in a separate subdirectory.
Here’s the function I created to get that job done. The function uses
request list again because it contains the file paths of the
attachments_dir are the file paths
to my Obsidian vault.
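For illustration, moving the files can be done with base R alone. This is a sketch with a hypothetical helper move_to_vault() that takes plain file paths instead of the request list. It copies first and deletes the originals only if every copy succeeded.

```r
# Hypothetical helper; vault_dir and attachments_dir are paths on your machine
move_to_vault <- function(md_file, img_files, vault_dir, attachments_dir) {
  dir.create(vault_dir, recursive = TRUE, showWarnings = FALSE)
  dir.create(attachments_dir, recursive = TRUE, showWarnings = FALSE)

  copied <- c(
    file.copy(md_file, file.path(vault_dir, basename(md_file)), overwrite = TRUE),
    file.copy(img_files, file.path(attachments_dir, basename(img_files)), overwrite = TRUE)
  )

  # Remove the originals only after all copies went through
  if (all(copied)) file.remove(c(md_file, img_files))
  invisible(all(copied))
}
```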
How to extract URL and other stuff from mail
Let’s take a quick breather and recap. We have written functions that
- take a tweet URL
- hustle the Twitter API to give us all its data
- download the images and tweet text
- save everything to a new Markdown note based on a template
- can move the note plus images to the location of our note-taking hub
Not to brag but that is kind of cool. But let’s not rest here. We still
have to get some work done. After all, we want our workflow to be
email-based. So, let’s access our mails using R. Then, we can extract a
Twitter URL and apply our previous functions. Also, this lets us finally
replace the insert_mail_here placeholder in our Markdown note.
Postman gives you access
I have created a dummy mail account at Gmail. Using the
package, we can establish a connection to our mail inbox. After the
connection is established, we can filter for all new emails that are
sent from our list of allowed senders.
Grab URLs from mail
If mails is not empty, i.e. if there are new mails, then we need to
extract the tweet URLs from them. Unfortunately, depending on where you
sent your email from, the mail text can be encoded.
For example, I send most of the tweets via the share button on Twitter
using my Android smartphone. And for some reason, my Android mail client
encodes the mails in something called
base64. But sending a tweet URL
from Thunderbird on my computer works without any encoding. Here are two
example mails I have sent to my dummy mail account.
As you can see, the mail sent from my computer is legible but the other
one is gibberish. Thankfully, Allan Cameron helped me out on
to decode the mail. To decode the mail, the trick was to extract the
There are two such texts in the encoded mail. Surprisingly, the first
one decoded to a text without line breaks. This is why we take the
second encoded part and decode it. However, this will give us an HTML
text with all kinds of tags like
<div> and what not. Therefore, we use
html_text2() from the
rvest package to handle
that. All of this is summarized in this helper function.
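The decoding step itself can be sketched as follows. This only covers turning one base64 block into plain text; finding the right block inside the raw mail is left out. The helper name decode_mail_part() is hypothetical, and using jsonlite for the base64 part is my choice, not necessarily what the full script does.

```r
library(rvest)

# Hypothetical helper: decode one base64 block and strip its HTML tags
decode_mail_part <- function(encoded) {
  html <- rawToChar(jsonlite::base64_dec(encoded))
  html_text2(read_html(html))
}

# Made-up mail body, encoded here only to have something to decode
encoded <- jsonlite::base64_enc(charToRaw(
  "<div>Great colors!</div><div>https://twitter.com/some_user/status/123</div>"
))
decoded <- decode_mail_part(encoded)
```

html_text2() is preferred over html_text() here because it tries to keep line breaks roughly where a browser would show them.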
I feel like this is the most hacky part of this blog post. Unfortunately, your mileage may vary here. If your phone or whatever you use encodes the mails differently, then you may have to adjust the function. But I hope that I have explained enough details and concepts for you to manage that if it comes to this.
Recall that I send both plain mails from Thunderbird and encoded mails from Android. Therefore, here is another helper that decodes mails if necessary, handling both types in one swoop.
The remaining part of the code should be familiar:
- Grab URLs with
- Call request_twitter_data() with our URLs
- Replace placeholders with
- This time, replace mail placeholders too with another
- Move files with
The only new thing is that we use our postman connection to move the processed mails into a new directory (which I called “Processed”) on the email server. This way, the inbox is empty again or filled only with mails from unauthorized senders.
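For the URL-grabbing step, a regular expression over the decoded mail text is enough. This is a sketch: the helper extract_tweet_urls() and the exact pattern are my own assumptions about what a tweet link looks like.

```r
# Hypothetical helper: pull all tweet links out of a mail text
extract_tweet_urls <- function(mail_text) {
  pattern <- "https://(mobile\\.)?twitter\\.com/[A-Za-z0-9_]+/status/[0-9]+"
  unique(unlist(regmatches(mail_text, gregexpr(pattern, mail_text))))
}

# Made-up mail text with a comment and a link
mail_text <- "Love the annotations!\nhttps://twitter.com/some_user/status/1263145271946551300?s=20"
tweet_urls <- extract_tweet_urls(mail_text)
```

Everything after the numeric ID (such as tracking parameters added by the share button) is dropped, which keeps the later ID extraction simple.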
Last Step: Execute R script automatically
Alright, alright, alright. We made it. We have successfully
- extracted URLs from mails,
- created new notes and
- moved them to their designated place
The only thing that is left to do is execute this script automatically. Again, if you don’t want to assemble the R script yourself using the code chunks in this blog post, check out this GitHub gist.
On Windows, you can write a VBS script that will execute the R script. Windows’ Task Scheduler is easily set up to run that VBS script regularly, say every hour. For completeness' sake let me give you an example VBS script. But beware that I have no frikkin clue how VBS scripts work beyond this simple call.
Set wshshell = WScript.CreateObject ("wscript.shell")
wshshell.run """C:\Program Files\R\R-4.0.5\bin\Rscript.exe"" ""D:\Local R Projects\Playground\TwitterTracking\my_twitter_script.R""", 6, True
set wshshell = nothing
The idea of this script is to call
Rscript.exe and give it the
location of the R script that we want to execute. Of course, you will
need to adjust the paths to your file system. Notice that there are
super many double quotes in this script. This is somewhat dumb but it’s
the only way I could find to make file paths with white spaces work (see
On Ubuntu (and probably other Unix-based systems), cron is the usual way to schedule regular tasks via crontab. On Mac, I am sure there is something similar. But instead of wandering even further from my expertise, I will defer to your internet search skills.
Mind the possibilities
We made it! We connected to Twitter’s API and our dummy email to get data viz (what’s the plural here? viz, vizz, vizzes, vizzeses?) into our note-taking system. Honestly, I think that was quite an endeavor. But now we can use the same ideas for all kinds of other applications! Off the top of my head I can think of more scenarios where similar solutions should be manageable. Here are two ideas.
- Take notes on the fly using emails and automatically incorporate the emails into your note-taking system.
- Take a photo from a book/text you’re reading and send it to another dummy mail. Run a script that puts the photo and the mail directly into your vault.