| Title: | R-Forge Forum Scraper (Saver) |
|---|---|
| Description: | Small package for scraping the content of specific R-Forge forums and converting it into a static Quarto website. |
| Authors: | Reto Stauffer [cre, aut] (ORCID: <https://orcid.org/0000-0002-3798-5507>) |
| Maintainer: | Reto Stauffer <[email protected]> |
| License: | GPL-2 \| GPL-3 |
| Version: | 0.1-0 |
| Built: | 2026-06-10 11:40:08 UTC |
| Source: | http://codeberg.org/retostauffer/forgescraper |
Helper function to download and store attachments.
download_attachment(x, dir, verbose = FALSE)
| x | named list with attachment name, url, and attachment_id (int). |
|---|---|
| dir | character, name of the 'thread output directory'. |
| verbose | logical, if set TRUE some output is shown. |
Invisibly returns the result of the request (typically unused).
Reto
Downloading Forum Topics
fs_get_all_threads(config, forums, verbose = TRUE)
| config | an object of class |
|---|---|
| forums | object of class |
| verbose | logical, defaults to TRUE. |
Returns a list of tibble data frames with all threads for the forums
listed in forums.
This information is also stored in a file
project_{PROJECT}/threads_{forum_id}.rds. If that file already exists, its
content (a tibble data frame) is loaded from disc. Otherwise, the
forum website is scraped and the result is stored in the RDS file mentioned
above, to be re-used the next time the function is called.
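The load-or-scrape behaviour described above can be sketched in base R as follows. This is a minimal illustration of the caching pattern, not the package's actual code: `cached_rds()` and `fake_scrape()` are hypothetical names, and the file location is a temporary stand-in for `project_{PROJECT}/threads_{forum_id}.rds`.

```r
# Hypothetical helper (not part of the package): return the content of
# `file` if it exists; otherwise call `scrape()`, store the result as RDS,
# and return it.
cached_rds <- function(file, scrape) {
  if (file.exists(file)) {
    return(readRDS(file))
  }
  result <- scrape()
  dir.create(dirname(file), recursive = TRUE, showWarnings = FALSE)
  saveRDS(result, file)
  result
}

# Stand-in for the real scraping step; counts how often it is called.
calls <- 0
fake_scrape <- function() {
  calls <<- calls + 1
  data.frame(thread_id = 1:2, title = c("First topic", "Second topic"))
}

file <- file.path(tempdir(), "project_demo", "threads_1.rds")
unlink(file)                             # start without a cached file
first  <- cached_rds(file, fake_scrape)  # scrapes and writes the file
second <- cached_rds(file, fake_scrape)  # reads the file, no scraping
```

Deleting the RDS file is then the way to force a fresh scrape under this pattern.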
Reto
Sets up a config object with the details of the project
to scrape. Also accepts a few arguments that are forwarded to httr2 to
throttle the HTTP requests (being nice to the server).
fs_get_config(project, title, syntaxhighlighting = FALSE, timeout = 60, capacity = 1, filltime = 1)
| project | character, name of the project (e.g., |
|---|---|
| title | character, title of the project (can be any title). |
| syntaxhighlighting | logical, defaults to FALSE. |
| timeout | positive numeric, defaults to 60 seconds. Forwarded to |
| capacity | positive numeric, defaults to 1. Forwarded to |
| filltime | positive numeric, defaults to 1. Forwarded to |
Returns an object of class fs_config containing the
details from a dot-env file as well as a request handler used
for scraping the website.
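As a rough sketch of what a throttled request handler can look like with httr2: the snippet below assumes httr2 1.1.0 or later (where `req_throttle()` accepts `capacity` and `fill_time_s`), uses a placeholder URL, and only guesses at how `timeout`, `capacity`, and `filltime` are mapped internally. It builds the request object without sending anything.

```r
# Assumed mapping of the documented defaults onto httr2 (an illustration,
# not the package's confirmed implementation).
timeout  <- 60
capacity <- 1
filltime <- 1

req <- NULL
if (requireNamespace("httr2", quietly = TRUE)) {
  req <- httr2::request("https://example.org/forum")  # placeholder URL
  req <- httr2::req_timeout(req, seconds = timeout)
  # Token bucket: at most `capacity` requests per `filltime` seconds.
  req <- httr2::req_throttle(req, capacity = capacity, fill_time_s = filltime)
}
```

With the defaults this would allow roughly one request per second.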
Reto
Whilst downloading (scraping) the content of the forum,
the topic/message details are stored in a forum_{forum_id}
folder. This function loads, checks, and returns all
the details to be used to create the Quarto website.
fs_get_forum_details(config, verbose = TRUE, n = NULL)
| config | an object of class |
|---|---|
| verbose | logical, produces some output if set TRUE. |
| n | |
Expects two files located in a folder
named forum_{forum_id}, namely:
If any of the messages comes with an attachment, a folder
forum_{forum_id}/attachments must exist containing the original attachments.
This is checked by this function.
Object of class fs_forum_details, a named list with four
elements, namely threads (contains the list of all threads of the forum),
messages (list of messages, one for each thread), users (user profile
information) and config (a copy of the argument config).
Reto
Getting Available Forums
fs_get_forums(config, verbose = TRUE)
| config | an object of class |
|---|---|
| verbose | logical, defaults to TRUE. |
Returns a tibble data frame with all available forums. If there are none, an error is thrown.
Reto
Scraping Messages from Thread/Topic
fs_get_messages(threads, config, verbose = TRUE)
| threads | object of class |
|---|---|
| config | an object of class |
| verbose | logical, will produce some output if it evaluates to |
Invisibly returns a named list of tibble data frames, each holding all
messages of one thread, identified by the list names (thread ID).
This return value is typically not used, as the scraped details are also stored in
the RDS files project_{config$PROJECT_NAME}/thread_{thread_id}.rds to be
used later when creating the Quarto markdown files, alongside a
project_{config$PROJECT_NAME}/users.rds file with user details.
Reto
Based on the forum details found on disc (see fs_get_forum_details()), this
function creates the sources for the (static) Quarto website.
fs_write_quarto(
  x,
  overwrite = FALSE,
  verbose = TRUE,
  forum_title = c(`open-discussion` = "Open Discussion", help = "Help", developers = "Developers")
)
| x | object of class |
|---|---|
| overwrite | logical, defaults to FALSE. |
| verbose | logical, produces some output if set TRUE. |
| forum_title | a named character vector to translate the forum names. The names of the vector correspond to the 'original' forum names, the values are used for displaying the forums on the Quarto website. |
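To illustrate how a named character vector such as forum_title can translate forum names, here is a small base R sketch. `translate_forum()` is a hypothetical helper, and falling back to the original name for forums missing from the vector is an assumption; the package may handle that case differently.

```r
# Default mapping as shown in the usage of fs_write_quarto().
forum_title <- c(`open-discussion` = "Open Discussion",
                 help = "Help",
                 developers = "Developers")

# Hypothetical helper: look up display titles by original forum name and
# keep the original name when no translation is available (assumption).
translate_forum <- function(name, forum_title) {
  out <- unname(forum_title[name])
  ifelse(is.na(out), name, out)
}

titles <- translate_forum(c("help", "open-discussion", "bugs"), forum_title)
```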
Reto
From an attachment link node, all required details are extracted
and returned as a named list. This is used to later download
the attachment and save locally (see download_attachment()).
get_attachment_info(x)
| x | an xml_node of an attachment. |
|---|---|
A named list with the name of the file, the URL for downloading the file later on, as well as the file ID.
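A minimal sketch of extracting such details from a link node with xml2 could look like the following. The HTML snippet, the URL layout, and the `attachid` query parameter are invented for illustration; the actual R-Forge markup and the package's parsing logic may differ.

```r
info <- NULL
if (requireNamespace("xml2", quietly = TRUE)) {
  # Hypothetical attachment link; not taken from a real R-Forge page.
  html <- '<a href="https://example.org/forum/attachment.php?attachid=42">results.csv</a>'
  node <- xml2::xml_find_first(xml2::read_html(html), "//a")
  href <- xml2::xml_attr(node, "href")

  info <- list(
    name          = xml2::xml_text(node),  # file name shown in the link
    url           = href,                  # URL used for the later download
    attachment_id = as.integer(sub(".*attachid=([0-9]+).*", "\\1", href))
  )
}
```

A list of this shape matches what download_attachment() expects as its argument x (name, url, and an integer attachment_id).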
Reto