Commit 2059224

Remove ?nrg_redirect in URL canonicalization (#1277)
energy.gov has apparently started appending `?nrg_redirect=123456` to the URLs it redirects to. The parameter's value seems to indicate which URL you were redirected from. This commit strips that parameter from canonical URLs.
1 parent 5eb20a7 commit 2059224
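
The new pattern has the same shape as the other entries in `QUERY_SESSION_IDS`. It captures everything before the parameter, matches `nrg_redirect=` followed by digits, and captures whatever comes after the next `&`. The diff does not show how `Surt::Canonicalize` applies these patterns. The Ruby sketch below therefore assumes that each match is replaced by capture group 1 followed by capture group 2. `strip_nrg_redirect` is a hypothetical helper for illustration only.

```ruby
# Pattern copied verbatim from this commit.
NRG_REDIRECT = /^(.*)(?:nrg_redirect=\d+)(?:&(.*))?$/i

# Hypothetical helper, not part of Surt::Canonicalize.
# Assumption: a match is replaced by group 1 (text before the param)
# followed by group 2 (text after the param's trailing "&").
def strip_nrg_redirect(query)
  # An unmatched group 2 is substituted as an empty string.
  query.sub(NRG_REDIRECT, '\1\2')
end

p strip_nrg_redirect('nrg_redirect=267439')          # => ""
p strip_nrg_redirect('a=1&nrg_redirect=267439&b=2')  # => "a=1&b=2"
p strip_nrg_redirect('a=1&nrg_redirect=267439')      # => "a=1&"
p strip_nrg_redirect('nrg_redirect=abc')             # => "nrg_redirect=abc"
```

The pattern has two consequences. When the parameter comes last, it leaves a dangling `&`, and this diff does not show whether the real canonicalizer tidies that up. Also, the value must match `\d+`, so a non-numeric `nrg_redirect` value is left in place.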

2 files changed (+6, -2 lines)

app/lib/surt/canonicalize.rb

Lines changed: 5 additions & 1 deletion
@@ -55,6 +55,7 @@ module Surt::Canonicalize
 # {param_to_remove: /regex or string to match value/},
 # {param_to_remove1: /regex for value/, param_to_remove2: /regex for value/}
 # ]
+# TODO: should rename this -- it's not really just session IDs anymore.
 QUERY_SESSION_IDS = [
 /^(.*)(?:jsessionid=[0-9a-zA-Z]{32})(?:&(.*))?$/i,
 /^(.*)(?:phpsessid=[0-9a-zA-Z]{32})(?:&(.*))?$/i,
@@ -68,7 +69,10 @@ module Surt::Canonicalize
 /^(.*)(?:utm_campaign=[^&]+)(?:&(.*))?$/i,
 /^(.*)(?:sms_ss=[^&]+)(?:&(.*))?$/i,
 /^(.*)(?:awesm=[^&]+)(?:&(.*))?$/i,
-/^(.*)(?:xtor=[^&]+)(?:&(.*))?$/i
+/^(.*)(?:xtor=[^&]+)(?:&(.*))?$/i,
+# TODO: At the moment, we only know `nrg_redirect` to exist on energy.gov.
+# it would be nice to have a way to scope this by domain or hostname.
+/^(.*)(?:nrg_redirect=\d+)(?:&(.*))?$/i
 ].freeze
 
 OCTAL_IP = /^(0[0-7]*)(\.[0-7]+)?(\.[0-7]+)?(\.[0-7]+)?$/
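
With the new entry appended, the list can be read as a pipeline in which each pattern gets one chance to remove its parameter. The sketch below copies the five patterns visible in this hunk into a hypothetical `TRACKING_PATTERNS` constant. The real `QUERY_SESSION_IDS` has more entries that the diff omits. The sketch folds over the patterns with `String#sub`, again assuming that each match is replaced by group 1 followed by group 2. Dropping the last pattern approximates the list as it was before this commit.

```ruby
# Subset of the patterns visible in this hunk, copied verbatim
# (hypothetical constant; the real list has more entries).
TRACKING_PATTERNS = [
  /^(.*)(?:utm_campaign=[^&]+)(?:&(.*))?$/i,
  /^(.*)(?:sms_ss=[^&]+)(?:&(.*))?$/i,
  /^(.*)(?:awesm=[^&]+)(?:&(.*))?$/i,
  /^(.*)(?:xtor=[^&]+)(?:&(.*))?$/i,
  /^(.*)(?:nrg_redirect=\d+)(?:&(.*))?$/i
].freeze

# Hypothetical helper: apply each pattern once, in order, keeping the
# text before and after the matched param (assumed '\1\2' replacement).
def strip_query_params(query, patterns)
  patterns.reduce(query) { |q, re| q.sub(re, '\1\2') }
end

query = 'sms_ss=abc&awesm=def&xtor=hij&nrg_redirect=267439'
p strip_query_params(query, TRACKING_PATTERNS)          # => ""
p strip_query_params(query, TRACKING_PATTERNS[0...-1])  # => "nrg_redirect=267439"
```

Under these assumptions, only the new entry removes the parameter that the updated test adds. Without it, `nrg_redirect=267439` survives every earlier pattern.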

test/lib/surt/surt_test.rb

Lines changed: 1 addition & 1 deletion
@@ -309,7 +309,7 @@ def assert_canonicalized(expected, url, message = nil, options = {})
 )
 assert_canonicalized(
 'http://example.com/x',
-'http://example.com/x?sms_ss=abc&awesm=def&xtor=hij',
+'http://example.com/x?sms_ss=abc&awesm=def&xtor=hij&nrg_redirect=267439',
 'It failed to remove assorted tracking query params'
 )
 end
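
The updated test expects the whole query string to disappear, leaving `http://example.com/x`. A URL-level sketch using Ruby's standard `uri` library shows one way that result can come about. It strips the parameters from `URI::Generic#query` and then drops the query entirely if nothing is left. `canonical_url_sketch`, `PARAMS`, `PATTERNS`, and the `chomp('&')` cleanup step are hypothetical and not taken from the commit. The real canonicalizer may order or tidy these steps differently.

```ruby
require 'uri'

# Hypothetical map of param name => value pattern, mirroring the shapes of
# the QUERY_SESSION_IDS entries for the four params the test exercises.
PARAMS = {
  'sms_ss' => '[^&]+',
  'awesm' => '[^&]+',
  'xtor' => '[^&]+',
  'nrg_redirect' => '\d+'
}.freeze

# Build "(before)(name=value)(&after)?" patterns like the ones in the diff.
PATTERNS = PARAMS.map { |name, value| /^(.*)(?:#{name}=#{value})(?:&(.*))?$/i }.freeze

# Hypothetical canonicalization step, not the real Surt::Canonicalize code.
def canonical_url_sketch(url)
  uri = URI.parse(url)
  return url unless uri.query

  # Assumed convention: keep capture group 1 followed by capture group 2.
  query = PATTERNS.reduce(uri.query) { |q, re| q.sub(re, '\1\2') }
  # Assumption: tidy the "&" left behind when the last param is removed.
  query = query.chomp('&')
  # A nil query makes URI omit the "?" entirely.
  uri.query = query.empty? ? nil : query
  uri.to_s
end

p canonical_url_sketch('http://example.com/x?sms_ss=abc&awesm=def&xtor=hij&nrg_redirect=267439')
# => "http://example.com/x"
p canonical_url_sketch('http://example.com/a?page=2&nrg_redirect=123456')
# => "http://example.com/a?page=2"
```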
