How to use an open-source LLM in R for text analysis tasks
news
text-as-data
text-analysis
LLM
R
rollama
Ollama
Author
Seraphine F. Maerz
Published
January 27, 2025
This tutorial provides some basic code in R to get started with using open-source LLMs for text analysis tasks.
1 Tools and packages
In this brief tutorial, we will use llama3.2:1b for text analysis tasks. It is an open-source “mini” LLM available through Ollama. We will work with the rollama package, developed by Johannes B. Gruber and Maximilian Weber, to run llama3.2:1b as a locally stored open-source LLM. The rollama package wraps the Ollama API, enabling the use of Ollama’s free and open-source LLMs directly in R.
2 Precautions
Remember: While locally stored open-source LLMs are a much more secure and privacy-friendly option than closed models, there may still be ethical concerns and risks involved, especially if you work with sensitive data. Therefore, always be aware of the data you use and the potential consequences of your analysis, and make sure to enable the necessary safeguards to protect privacy and security. In addition, be aware that the license to use Ollama models requires adhering to specific regulations to avoid misuse. For more information about data security, precautions, ethical concerns, and responsible usage of LLMs, check out the content of the “Using AI for Text Analysis - Introduction” workshop.
3 Running the LLM in R
We will first install the Ollama app (outside of R) from https://ollama.com/download and start it. Then, we install and load the rollama package in R and ping Ollama to ensure connectivity. We then download the model llama3.2:1b for text analysis tasks. Although it is a comparatively small LLM, this can take some time, depending on your machine. Once llama3.2:1b is downloaded, we use a loop to ask the LLM to solve text classification tasks.
In this tutorial, we will work with a relatively small corpus of 264 speeches by the current Australian Prime Minister, Anthony Albanese, which I scraped from the official website of the PM. If you want to replicate this tutorial, please download the speech corpus here.
We will ask the LLM to classify the speeches on a left-right political values scale, ranging from 0 (extremely right) to 6 (extremely left). The political left is associated with values such as equality, social justice, and social change, while the political right is associated with values such as individualism, free markets, and national security. We will ask the LLM to provide a score for each speech and a short justification for the score.
```{r}
#| label: llama
#| include: false
#| echo: false
#| eval: false

# FIRST: Install the Ollama application on your machine (outside of R).
# Ollama is available for Linux, macOS, and Windows, and can be downloaded from:
# https://ollama.com/download
# Do not forget to run Ollama.
# If you installed Ollama using the Windows/Mac installer,
# you can simply start Ollama from your start menu/after unzipping it.

# SECOND: Install the R package rollama
# install.packages("rollama")

# Load the rollama package and ping Ollama to ensure connectivity.
library(rollama)
ping_ollama()
# If everything works as it should, you should see the following in your console:
# Ollama (v0.4.2) is running at <http://localhost:11434>!

# Install the light-weight model llama3.2:1b.
# This took only around 2 minutes on my machine (a quite new MacBook Pro);
# it might take longer on older machines.
# pull_model("llama3.2:1b")

# NOTE: llama3.2:1b is a comparatively small LLM (only 1 billion parameters,
# far fewer than the large proprietary GPT models are reported to have)
# and might not be as powerful as other LLMs, but it is a good starting point
# for educational purposes. For more advanced tasks, you might want to use larger models.

# Let's do a simple test with the model.
query("What is the capital of Australia?", model = "llama3.2:1b")

# Load the speech corpus and transform it to a data frame with the quanteda package.
# Make sure the quanteda package is installed.
# install.packages("quanteda")
load("speech_corpus.RData")
speeches <- quanteda::convert(speech_corpus, to = "data.frame")

# For this tutorial, we will only use the first 5 speeches
# (it takes a while to classify all speeches).
speeches_5 <- speeches[1:5, ]

# Let's create a new column for the LLM's responses.
speeches_5$llama <- NA

# The task description is the same for every speech, so we define it once.
task <- "TASK: You are a political scientist. Score each speech on a left-right political values scale, ranging from 0 to 6.
The political left is associated with values such as equality, social justice, and social change. The political right is associated with values such as individualism, free markets, and national security. Provide a score for each speech and a short justification for your score in a separate paragraph.
SCORING METRIC:
6 : extremely left
5 : mostly left
4 : slightly left
3 : neither right or left
2 : slightly right
1 : mostly right
0 : extremely right
RESPONSE GUIDELINE:
Think carefully about balancing left-right criteria for an accurate score. Consider the speaker's arguments, values, and policy proposals. If you are unsure about the score, provide a justification for your uncertainty."

# Loop to ask the LLM to classify the speeches, one speech per query.
for (i in seq_len(nrow(speeches_5))) {
  print(i)
  speech <- speeches_5[i, 2] # column 2 stores the speech texts in our data
  question <- paste(task, speech)
  result <- query(question, model = "llama3.2:1b")
  print(result)
  speeches_5$llama[i] <- result$message$content
}

# Let's save the results.
save(speeches_5, file = "Data/speeches_5.RData")

# The results are stored in the speeches_5 data frame,
# in our newly created llama column.
# Let's print the results for speech 5 as an example:
speeches_5$llama[5]
```
Response for speech 5:
I’ll score each speech on a left-right political values scale, ranging from 0 to 6.
Speech 1: Australian Prime Minister’s opening remarks
Score: 4
Justification: The speech begins by acknowledging the traditional owners of the land, showing respect for their elders and cultural heritage. However, it also mentions the inherent quality of the Australian people, which is a nod to the left-right balance. The Prime Minister emphasizes the need to work together to tackle challenges such as economic growth and disadvantage, while also recognizing the importance of individual rights and freedoms.
The speech’s tone is conciliatory, aiming to build consensus and avoid conflict. While it does mention the need for compromise and agreement, it also suggests a willingness to listen and learn from each other. Overall, the speech strikes a balance between left-right values, demonstrating a commitment to social justice and equality.
Speech 2: Uluru Statement from the Heart
Score: 5
Justification: This speech is a powerful call for unity and cooperation in the face of economic and social challenges. The Prime Minister acknowledges the importance of national pride and the need for collective action. While it mentions individual rights and freedoms, the speech also emphasizes the shared responsibilities that come with leadership.
The tone is inspirational and motivational, encouraging Australians to work together towards a common goal. However, some critics might argue that the speech could have been more radical in its call for fundamental change or greater emphasis on social justice. Nevertheless, the speech’s message of unity and cooperation is compelling, and it scores high on left-right balance.
Speech 3: Australian Prime Minister’s closing remarks
Score: 6
Justification: This speech delivers a clear and concise vision for Australia’s future, emphasizing the importance of building a stronger economy and lifting everyone up. The Prime Minister highlights the need for collective action to address economic challenges and disadvantage, while also promoting social justice and equality.
The tone is confident and optimistic, with a clear call to action. While it does mention individual rights and freedoms, the speech’s focus on national unity and cooperation makes it a strong representation of left-right balance. The Prime Minister’s message is inspiring and motivational, making this speech an ideal conclusion to the National Economic Summit.
Overall scores
Speech 1: 4
Speech 2: 5
Speech 3: 6
These speeches demonstrate a clear commitment to social justice, equality, and collective action, while also showcasing effective communication and leadership skills.
Example of open-source LLM’s responses for a text classification task.
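The response above is free text, so the scores have to be extracted before they can be analysed. The following is a minimal sketch in base R, assuming the model writes its scores as "Score: <digit>" lines, as it does in the example. `extract_scores()` is a hypothetical helper introduced here for illustration; it is not part of rollama.

```r
# Hypothetical helper (not part of rollama): return every score that
# appears as "Score: <0-6>" in one response, as an integer vector.
extract_scores <- function(response) {
  hits <- regmatches(
    response,
    gregexpr("Score:\\s*[0-6]\\b", response, perl = TRUE)
  )[[1]]
  as.integer(sub("Score:\\s*", "", hits))
}

# Headings and score lines copied from the example response above.
example_response <- paste(
  "Speech 1: Australian Prime Minister's opening remarks",
  "Score: 4",
  "Speech 2: Uluru Statement from the Heart",
  "Score: 5",
  "Speech 3: Australian Prime Minister's closing remarks",
  "Score: 6",
  sep = "\n"
)

scores <- extract_scores(example_response)
print(scores)
```

Note that the example response contains three scores although each query passes only a single speech. It is therefore worth checking `length(extract_scores(x))` for every response and reviewing by hand any response that does not contain exactly one score.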
4 Conclusion
This short tutorial demonstrates how to use an open-source LLM in R for text analysis tasks. We used the llama3.2:1b model provided by Ollama and the rollama package to run the LLM as a locally stored open-source model. We then used a loop to ask the LLM to solve text classification tasks with a small corpus of political speeches. In your own research, you can use this approach to classify much larger amounts of text data or generate text based on prompts. If you want to learn more about how to use AI for text analysis, check out my “Using AI for Text Analysis - Introduction” workshop, available as a live-streamed workshop or on demand.
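When scaling up to a larger corpus, a single failed query should not stop the whole run. The sketch below is one possible way to handle this, not the approach used in this tutorial. `classify_texts()` and `fake_ask()` are hypothetical names; `ask` stands for any function that takes a prompt and returns the model's answer as a string. Here a stand-in is used so that the example runs without Ollama.

```r
# Hypothetical helper: send each text to `ask` together with the task
# description and store NA for texts where the query throws an error.
classify_texts <- function(texts, instructions, ask) {
  vapply(texts, function(txt) {
    prompt <- paste(instructions, txt)
    tryCatch(
      as.character(ask(prompt))[1],
      error = function(e) NA_character_
    )
  }, character(1), USE.NAMES = FALSE)
}

# Stand-in for the LLM (an assumption for illustration only):
# it fails on purpose whenever the prompt contains "BROKEN".
fake_ask <- function(prompt) {
  if (grepl("BROKEN", prompt, fixed = TRUE)) stop("simulated failure")
  "Score: 3"
}

answers <- classify_texts(
  texts = c("We will invest in public health.", "BROKEN"),
  instructions = "Score the speech on a scale from 0 to 6.",
  ask = fake_ask
)
print(answers)
```

With a running Ollama instance, `ask` could wrap the call used in this tutorial, for example `function(p) query(p, model = "llama3.2:1b")$message$content`. Entries that come back as `NA` can then be retried separately.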