# Security {#scaling-security}
```{r, include = FALSE}
source("common.R")
```
Most Shiny apps are deployed within a company firewall and since you can generally assume that your colleagues aren't going to try and hack your app[^scaling-security-1], you don't need to think about security.
If, however, your app contains data that only some of your colleagues should be able to access, or you want to expose your app to the public, you will need to spend some time on security.
When securing your app, there are two main things to protect:
[^scaling-security-1]: If you can't assume that, you have bigger problems!
That said, some companies do have a "zero-trust" model, so you should double check with your IT team.
- Your data: you want to make sure an attacker can't access any sensitive data.
- Your compute resources: you want to make sure an attacker can't mine bitcoin or use your server as part of a spam farm.
Fortunately your job is made a little easier because security is a team sport.
Whoever deploys your app is responsible for security **between** apps, ensuring that app A can't access the code or data in app B, and can't steal all the memory and compute power on the server.
Your responsibility is the security **within** your app, making sure that an attacker can't abuse your app to achieve their ends.
This chapter will give the basics of securing your Shiny app, broken down into securing your data and securing your compute resources.
If you're interested in learning a little more about security and R in general, I highly recommend Colin Gillespie's entertaining and educational useR! 2019 talk, "[R and Security](https://www.youtube.com/watch?v=5odJxZj9LE4)".
```{r setup}
library(shiny)
```
## Data
The most sensitive data is stuff like personally identifying information (PII), regulated data, credit card data, health data, or anything else that would be a legal nightmare for your company if it were made public.
Fortunately, most Shiny apps don't deal with those types of data[^scaling-security-2], but there is an important type of data you do need to worry about: passwords.
You should never include passwords in the source code of your app.
Instead, either put them in environment variables or, if you have many, use the [config](https://github.com/rstudio/config) package.
Either way, make sure that they are never included in your source code control by adding the appropriate files to `.gitignore`. I also recommend documenting how a new developer can get the appropriate credentials.
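As a minimal sketch of the environment variable approach: `get_secret()` and the variable name `MY_APP_DB_PASSWORD` are hypothetical, and the example assumes the variable is set outside the app's source code (for instance in a `.Renviron` file that is listed in `.gitignore`).

```{r}
# Hypothetical helper: read a credential from an environment variable and
# fail loudly if it is missing, rather than silently using an empty string.
get_secret <- function(name) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || value == "") {
    stop("Environment variable `", name, "` is not set", call. = FALSE)
  }
  value
}

# Usage (assumes MY_APP_DB_PASSWORD has been set outside the source code):
# password <- get_secret("MY_APP_DB_PASSWORD")
```

Failing early makes a missing credential obvious to a new developer, rather than surfacing later as a confusing connection error.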
[^scaling-security-2]: If your app does work with these types of data, it's imperative that you partner with a software engineer with security expertise.
Alternatively, you may have data that is user-specific.
If you need to **authenticate** users, i.e. identify them through a user name and password, never attempt to roll a solution yourself.
There are just too many things that might go wrong.
Instead, you'll need to work with your IT team to design a secure access mechanism.
You can see some best practices at <https://solutions.posit.co/secure-access/auth/kerberos/index.html> and <https://solutions.posit.co/connections/db/best-practices/deployment/>.
Note that code within `server()` is isolated so there's no way for one user session to see data from another.
The only exception is if you use caching --- see Section \@ref(cache-scope) for details.
Finally, note that Shiny inputs use client-side validation, i.e. the checks for valid input are performed by JavaScript in the browser, not by R.
This means it's possible for a knowledgeable attacker to send values that you don't expect.
For example, take this simple app:
```{r, eval = FALSE}
secrets <- list(
a = "my name",
b = "my birthday",
c = "my social security number",
d = "my credit card"
)
allowed <- c("a", "b")
ui <- fluidPage(
selectInput("x", "x", choices = allowed),
textOutput("secret")
)
server <- function(input, output, session) {
output$secret <- renderText({
secrets[[input$x]]
})
}
```
You might expect that a user could access my name and birthday, but not my social security number or credit card details.
But a knowledgeable attacker can open up a JavaScript console in their browser and run `Shiny.setInputValue("x", "c")` to see my SSN.
So to be safe, you need to check all user inputs from your R code:
```{r}
server <- function(input, output, session) {
output$secret <- renderText({
req(input$x %in% allowed)
secrets[[input$x]]
})
}
```
I deliberately didn't create a user friendly error message --- the only time you'd see it is if you're trying to break the app, and we don't need to help out an attacker.
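If you check several inputs, it may help to factor the test into a small helper. The sketch below is one possible shape, not part of Shiny: `is_allowed()` is a hypothetical name, and it reuses `secrets` and `allowed` from the app above. It also insists on a single, non-missing string, because the browser can send a vector just as easily as a scalar.

```{r}
# Hypothetical helper: TRUE only for a single, non-missing string that
# appears in `allowed`; FALSE for NULL, NA, vectors, or anything else.
is_allowed <- function(x, allowed) {
  is.character(x) && length(x) == 1 && !is.na(x) && x %in% allowed
}

server <- function(input, output, session) {
  output$secret <- renderText({
    # Silently stop unless the value is one we explicitly permit
    req(is_allowed(input$x, allowed))
    secrets[[input$x]]
  })
}
```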
## Compute resources
It's hopefully obvious that the following app is very dangerous, because it allows the user to run any R code they want.
They could delete important files, modify data, or send confidential data back to the user of the app.
```{r}
ui <- fluidPage(
textInput("code", "Enter code here"),
textOutput("results")
)
server <- function(input, output, session) {
output$results <- renderText({
eval(parse(text = input$code))
})
}
```
In general, the combination of `parse()` and `eval()` is a big warning sign for any Shiny app[^scaling-security-3]: they instantly make your app vulnerable.
Similarly, you should never `source()` an uploaded `.R` file, or `rmarkdown::render()` an uploaded `.Rmd`. But these cases are pretty obvious, and are unlikely to be a source of real problems.
[^scaling-security-3]: The only exception is if they don't involve user-supplied data in any way.
The bigger challenge arises because a number of functions call `parse()`, `eval()`, or both, in ways that you might not be aware of.
Here are the most common:
- **Model formulas**.
It's possible to construct a model that executes arbitrary R code:
```{r}
df <- data.frame(x = 1:5, y = runif(5))
mod <- lm(y ~ {print("Hi!"); x}, data = df)
```
This makes it difficult to safely allow a user to define their own models.
- **Glue labels**.
The glue package provides a powerful way to create strings from data:
```{r}
title <- "foo"
number <- 1
glue::glue("{title}-{number}")
```
But `glue()` evaluates anything inside of `{}`:
```{r}
glue::glue("{title}-{print('Hi'); number}")
```
If you want to allow a user to supply a glue string to generate a label, instead use `glue::glue_safe()` which only looks up variable names and doesn't evaluate code:
```{r, error = TRUE}
glue::glue_safe("{title}-{number}")
glue::glue_safe("{title}-{print('Hi'); number}")
```
- **Variable transformation.** There's no way to safely allow a user to provide code snippets to transform a variable for dplyr or ggplot2.
You might expect they'll write `log10(x)` but they could write `{print("Hi"); log10(x)}`.
This also means that you should never use the older `ggplot2::aes_string()` with user supplied input.
Instead, stick with the techniques in Chapter \@ref(action-tidy).
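One way to stay on the safe side of the cases above is to let the user choose from options that your code defines, rather than supply code. The following is a minimal sketch under that assumption: `transforms`, `safe_transform()`, and `safe_formula()` are hypothetical names, and it assumes the column names are syntactic R names.

```{r}
# Hypothetical whitelist: the only transformations the app offers.
transforms <- list(identity = identity, log10 = log10, sqrt = sqrt)

# Apply a whitelisted transformation to an existing column.
# `var` and `transform` are treated as plain strings and never parsed.
safe_transform <- function(data, var, transform) {
  stopifnot(
    is.character(var), length(var) == 1, var %in% names(data),
    is.character(transform), length(transform) == 1,
    transform %in% names(transforms)
  )
  transforms[[transform]](data[[var]])
}

# Build a model formula only from names that are columns of `data`.
# reformulate() parses its inputs, so the check must come first.
safe_formula <- function(data, response, predictors) {
  vars <- c(response, predictors)
  stopifnot(
    is.character(vars), length(response) == 1, length(predictors) >= 1,
    all(vars %in% names(data))
  )
  reformulate(predictors, response = response)
}

df <- data.frame(x = c(1, 10, 100), y = c(2, 4, 7))
safe_transform(df, "x", "log10")
mod <- lm(safe_formula(df, "y", "x"), data = df)
```

With this design, a string like `{print("Hi"); x}` is rejected because it is not a column name, so it never reaches code that would evaluate it.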
The same problem can occur with SQL.
For example, if you construct SQL with `paste()`, e.g.:
```{r}
find_student <- function(name) {
paste0("SELECT * FROM Students WHERE name = ('", name, "');")
}
find_student("Hadley")
```
An attacker can provide a malicious username:[^scaling-security-4]
[^scaling-security-4]: <https://xkcd.com/327/>
```{r}
find_student("Robert'); DROP TABLE Students; --")
```
This looks a bit odd, but it's a valid SQL query in three parts:
- `SELECT * FROM Students WHERE name = ('Robert');` finds a student with name Robert.
- `DROP TABLE Students;` deletes the `Students` table (!!).
- `--');` is a comment, which prevents the leftover `');` from causing a syntax error.
To avoid this problem, never generate SQL strings with `paste()`; instead use a system that automatically escapes user input (like [dbplyr](https://dbplyr.tidyverse.org)), or use `glue::glue_sql()`:
```{r}
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
find_student <- function(name) {
glue::glue_sql("SELECT * FROM Students WHERE name = ({name});", .con = con)
}
find_student("Robert'); DROP TABLE Students; --")
```
It's a little hard to tell at first glance, but this is safe, because SQL's equivalent of `\'` is `''` so the query returns all rows of the `Students` table where the name is literally "Robert'); DROP TABLE Students; --".
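Another option, if your database driver supports it, is a parameterised query, where the user's value is sent separately from the SQL text and so is never interpreted as SQL. The following is a minimal sketch, assuming RSQLite's `?` placeholder syntax (other backends may use a different placeholder) and a toy `Students` table created only for illustration. `con2` and `find_student_param()` are hypothetical names.

```{r}
# Toy in-memory database, created only for this illustration
con2 <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con2, "Students", data.frame(name = c("Hadley", "Robert")))

# The query text is fixed; the user's value is passed separately via `params`
find_student_param <- function(con, name) {
  DBI::dbGetQuery(
    con,
    "SELECT * FROM Students WHERE name = ?",
    params = list(name)
  )
}

hit <- find_student_param(con2, "Hadley")
miss <- find_student_param(con2, "Robert'); DROP TABLE Students; --")

# The malicious string is treated as a (non-matching) name, not as SQL
still_there <- DBI::dbExistsTable(con2, "Students")
DBI::dbDisconnect(con2)
```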