Commit b8cd4a2

Add VGG network

1 parent 7969e7e commit b8cd4a2

8 files changed: +273 -132 lines

README.md (+22 -9)

@@ -17,26 +17,43 @@ luarocks install inn
 
 ## Usage
 
+First, download the models by running the download script:
+
+```
+bash download_models.sh
+```
+
+This downloads the model weights for the VGG and Inception networks.
+
 Basic usage:
 
 ```
 qlua main.lua --style <style.jpg> --content <content.jpg> --style_factor <factor>
 ```
 
-where `style.jpg` is the image that provides the style of the final generated image, and `content.jpg` is the image that provides the content. `style_factor` is a constant that controls the degree to which the generated image emphasizes style over content. By default it is set to 5E9.
+where `style.jpg` is the image that provides the style of the final generated image, and `content.jpg` is the image that provides the content. `style_factor` is a constant that controls the degree to which the generated image emphasizes style over content. By default it is set to 1E9.
 
-The optimization of the generated image is performed on GPU. On a 2014 MacBook Pro with an NVIDIA GeForce GT 750M, it takes a little over 4 minutes to perform 500 iterations of gradient descent.
+This generates an image using the VGG-19 network by Karen Simonyan and Andrew Zisserman (http://www.robots.ox.ac.uk/~vgg/research/very_deep/).
 
 Other options:
 
+- `model`: {inception, vgg}. Convnet model to use. Inception refers to Google's [Inception architecture](http://arxiv.org/abs/1409.4842). Default is VGG.
 - `num_iters`: Number of optimization steps. Default is 500.
 - `size`: Long edge dimension of the generated image. Set to 0 to use the size of the content image. Default is 500.
 - `display_interval`: Number of iterations between image displays. Set to 0 to suppress image display. Default is 20.
-- `smoothness`: Constant that controls smoothness of generated image (total variation norm regularization strength). Default is 6E-3.
+- `smoothness`: Constant that controls smoothness of generated image (total variation norm regularization strength). Default is 1E-4.
 - `init`: {image, random}. Initialization mode for optimized image. `image` initializes with the content image; `random` initializes with random Gaussian noise. Default is `image`.
 - `backend`: {cunn, cudnn}. Neural network CUDA backend. `cudnn` requires the [Torch bindings](https://github.com/soumith/cudnn.torch/tree/R3) for CuDNN R3.
-- `optimizer`: {sgd, lbfgs}. Optimization algorithm. `lbfgs` is slower per iteration and consumes more memory, but may yield better results. Default is `sgd`.
+- `optimizer`: {sgd, lbfgs}. Optimization algorithm. `lbfgs` is slower per iteration and consumes more memory, but may yield better results. Default is `lbfgs`.
 
+### Out of memory?
+
+The VGG network with the default L-BFGS optimizer gives the best results. However, this setting also requires a lot of GPU memory. If you run into CUDA out-of-memory errors, try running with the Inception architecture or with the SGD optimizer:
+
+```
+qlua main.lua --style <style.jpg> --content <content.jpg> --model inception --optimizer sgd
+```
+
 ## Examples
 
 The Eiffel Tower in the style of Van Gogh's *Starry Night*:
@@ -59,17 +76,13 @@ Picasso-fied Obama:
 
 ## Implementation Details
 
-The primary difference between this implementation and the paper is that it uses Google's Inception architecture instead of VGG. Consequently, the hyperparameter settings differ from those given in the paper (they have been tuned to give aesthetically pleasing results).
-
-The outputs of the following layers are used to optimize for style: `conv1/7x7_s2`, `conv2/3x3`, `inception_3a`, `inception_3b`, `inception_4a`, `inception_4b`, `inception_4c`, `inception_4d`, `inception_4e`.
+When using the Inception network, the outputs of the following layers are used to optimize for style: `conv1/7x7_s2`, `conv2/3x3`, `inception_3a`, `inception_3b`, `inception_4a`, `inception_4b`, `inception_4c`, `inception_4d`, `inception_4e`.
 
-The outputs of the following layers are used to optimize for content: `inception_3a`, `inception_4a`.
+The outputs of the following layers are used to optimize for content: `inception_3a`, `inception_4a` (Inception) and `conv4_2` (VGG).
 
-By default, optimization of the generated image is performed using gradient descent with momentum of 0.9. The learning rate is decayed exponentially by 0.75 every 100 iterations. L-BFGS can also be used.
-
 By default, the optimized image is initialized using the content image; the implementation also works with white noise initialization, as described in the paper.
 
-In order to reduce high-frequency "screen door" noise in the generated image, total variation regularization is applied (idea from [cnn-vis](https://github.com/jcjohnson/cnn-vis) by [jcjohnson](https://github.com/jcjohnson)).
+In order to reduce high-frequency "screen door" noise in the generated image (especially when using the Inception network), total variation regularization is applied (idea from [cnn-vis](https://github.com/jcjohnson/cnn-vis) by [jcjohnson](https://github.com/jcjohnson)).
 
 ## Acknowledgements

costs.lua (+89)

@@ -0,0 +1,89 @@
+--
+-- Cost functions
+--
+
+-- compute the Gramian matrix for input
+function gram(input)
+  local k = input:size(2)
+  local flat = input:view(k, -1)
+  local gram = torch.mm(flat, flat:t())
+  return gram
+end
+
+function collect_activations(model, activation_layers, gram_layers)
+  local activations, grams = {}, {}
+  for i, module in ipairs(model.modules) do
+    local name = module._name
+    if name then
+      if activation_layers[name] then
+        local activation = module.output.new()
+        activation:resize(module.output:nElement())
+        activation:copy(module.output)
+        activations[name] = activation
+      end
+
+      if gram_layers[name] then
+        grams[name] = gram(module.output):view(-1)
+      end
+    end
+  end
+  return activations, grams
+end
+
+--
+-- gradient computation functions
+--
+
+local euclidean = nn.MSECriterion()
+euclidean.sizeAverage = false
+euclidean:cuda()
+
+function style_grad(gen, orig_gram)
+  local k = gen:size(2)
+  local size = gen:nElement()
+  local size_sq = size * size
+  local gen_gram = gram(gen)
+  local gen_gram_flat = gen_gram:view(-1)
+  local loss = euclidean:forward(gen_gram_flat, orig_gram)
+  local grad = euclidean:backward(gen_gram_flat, orig_gram)
+    :view(gen_gram:size())
+
+  -- normalization helps improve the appearance of the generated image
+  local norm = size_sq
+  if opt.model == 'inception' then
+    norm = torch.abs(grad):mean() * size_sq
+  else
+    norm = size_sq
+  end
+  if norm > 0 then
+    loss = loss / norm
+    grad:div(norm)
+  end
+  grad = torch.mm(grad, gen:view(k, -1)):view(gen:size())
+  return loss, grad
+end
+
+function content_grad(gen, orig)
+  local gen_flat = gen:view(-1)
+  local loss = euclidean:forward(gen_flat, orig)
+  local grad = euclidean:backward(gen_flat, orig):view(gen:size())
+  if opt.model == 'inception' then
+    local norm = torch.abs(grad):mean()
+    if norm > 0 then
+      loss = loss / norm
+      grad:div(norm)
+    end
+  end
+  return loss, grad
+end
+
+-- total variation gradient
+function total_var_grad(gen)
+  local x_diff = gen[{{}, {}, {1, -2}, {1, -2}}] - gen[{{}, {}, {1, -2}, {2, -1}}]
+  local y_diff = gen[{{}, {}, {1, -2}, {1, -2}}] - gen[{{}, {}, {2, -1}, {1, -2}}]
+  local grad = gen.new():resize(gen:size()):zero()
+  grad[{{}, {}, {1, -2}, {1, -2}}]:add(x_diff):add(y_diff)
+  grad[{{}, {}, {1, -2}, {2, -1}}]:add(-1, x_diff)
+  grad[{{}, {}, {2, -1}, {1, -2}}]:add(-1, y_diff)
+  return grad
+end
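Since `costs.lua` is new in this commit, a minimal usage sketch may help show how its pieces fit together. Everything below is illustrative and not part of the commit: the tensor shapes are made up, `opt` is the global that `main.lua` normally populates via `torch.CmdLine`, and a CUDA-capable GPU is assumed because the criterion is moved to the GPU when the file loads.

```
-- hypothetical standalone usage of costs.lua (shapes are arbitrary)
require 'cunn'

opt = { model = 'vgg' }        -- costs.lua branches on this global
paths.dofile('costs.lua')

-- stand-ins for 1 x k x h x w layer activations
local style_feats = torch.CudaTensor(1, 64, 32, 32):normal()
local gen_feats   = torch.CudaTensor(1, 64, 32, 32):normal()

-- style target: flattened Gram matrix, as collect_activations stores it
local target_gram = gram(style_feats):view(-1)

-- loss and gradient w.r.t. the generated activations
local loss, grad = style_grad(gen_feats, target_gram)
print(loss, grad:size())
```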

download_models.sh (+4)

@@ -0,0 +1,4 @@
+echo "downloading inception weights"
+curl "http://kaishengtai.github.io/static/projects/neuralart/inception_caffe.th" -o models/inception_caffe.th
+echo "downloading vgg weights"
+curl "http://kaishengtai.github.io/static/projects/neuralart/vgg_normalized.th" -o models/vgg_normalized.th

inception_caffe.th (-26.7 MB)
Binary file not shown.

main.lua (+59 -95)

@@ -15,125 +15,89 @@ require 'optim'
 local pl = require('pl.import_into')()
 local printf = pl.utils.printf
 
-paths.dofile('inception.lua')
-paths.dofile('images.lua')
-
 local cmd = torch.CmdLine()
 cmd:text()
 cmd:text('A Neural Algorithm of Artistic Style')
 cmd:text()
 cmd:text('Options:')
+cmd:option('--model', 'vgg', '{inception, vgg}. Model to use.')
 cmd:option('--style', 'none', 'Path to style image')
 cmd:option('--content', 'none', 'Path to content image')
-cmd:option('--style_factor', 5e9, 'Trade-off factor between style and content')
+cmd:option('--style_factor', 1e9, 'Trade-off factor between style and content')
 cmd:option('--num_iters', 500, 'Number of iterations')
 cmd:option('--size', 500, 'Length of image long edge (0 to use original content size)')
 cmd:option('--display_interval', 20, 'Iterations between image displays (0 to suppress display)')
-cmd:option('--smoothness', 6e-3, 'Total variation norm regularization strength (higher for smoother output)')
+cmd:option('--smoothness', 1e-4, 'Total variation norm regularization strength (higher for smoother output)')
 cmd:option('--init', 'image', '{image, random}. Initialization mode for optimized image.')
 cmd:option('--backend', 'cunn', '{cunn, cudnn}. Neural network CUDA backend.')
-cmd:option('--optimizer', 'sgd', '{sgd, lbfgs}. Optimization algorithm.')
-local opt = cmd:parse(arg)
+cmd:option('--optimizer', 'lbfgs', '{sgd, lbfgs}. Optimization algorithm.')
+opt = cmd:parse(arg)
 if opt.size <= 0 then
   opt.size = nil
 end
 
-local euclidean = nn.MSECriterion()
-euclidean.sizeAverage = false
-euclidean:cuda()
-
--- compute the Gramian matrix for input
-function gram(input)
-  local k = input:size(2)
-  local flat = input:view(k, -1)
-  local gram = torch.mm(flat, flat:t())
-  return gram
-end
-
-function collect_activations(model, activation_layers, gram_layers)
-  local activations, grams = {}, {}
-  for i, module in ipairs(model.modules) do
-    local name = module._name
-    if name then
-      if activation_layers[name] then
-        local activation = module.output.new()
-        activation:resize(module.output:nElement())
-        activation:copy(module.output)
-        activations[name] = activation
-      end
-
-      if gram_layers[name] then
-        grams[name] = gram(module.output):view(-1)
-      end
-    end
+paths.dofile('models/util.lua')
+paths.dofile('models/vgg19.lua')
+paths.dofile('models/inception.lua')
+paths.dofile('images.lua')
+paths.dofile('costs.lua')
+
+-- check for model files
+local inception_path = 'models/inception_caffe.th'
+local vgg_path = 'models/vgg_normalized.th'
+if opt.model == 'inception' then
+  if not paths.filep(inception_path) then
+    print('ERROR: could not find Inception model weights at ' .. inception_path)
+    print('run download_models.sh to download model weights')
+    error('')
   end
-  return activations, grams
-end
-
-function style_grad(gen, orig_gram)
-  local k = gen:size(2)
-  local size = gen:nElement()
-  local size_sq = size * size
-  local gen_gram = gram(gen)
-  local gen_gram_flat = gen_gram:view(-1)
-  local loss = euclidean:forward(gen_gram_flat, orig_gram)
-  local grad = euclidean:backward(gen_gram_flat, orig_gram)
-    :view(gen_gram:size())
-
-  -- normalization helps improve the appearance of the generated image
-  local norm = torch.abs(grad):mean() * size_sq
-  if norm > 0 then
-    loss = loss / norm
-    grad:div(norm)
+elseif opt.model == 'vgg' then
+  if not paths.filep(vgg_path) then
+    print('ERROR: could not find VGG model weights at ' .. vgg_path)
+    print('run download_models.sh to download model weights')
+    error('')
   end
-  grad = torch.mm(grad, gen:view(k, -1)):view(gen:size())
-  return loss, grad
+else
+  error('invalid model: ' .. opt.model)
 end
 
-function content_grad(gen, orig)
-  local gen_flat = gen:view(-1)
-  local loss = euclidean:forward(gen_flat, orig)
-  local grad = euclidean:backward(gen_flat, orig):view(gen:size())
-  local norm = torch.abs(grad):mean()
-  if norm > 0 then
-    loss = loss / norm
-    grad:div(norm)
-  end
-  return loss, grad
-end
+-- load model
+local model, style_weights, content_weights
+if opt.model == 'inception' then
+  style_weights = {
+    ['conv1/7x7_s2'] = 1,
+    ['conv2/3x3'] = 1,
+    ['inception_3a'] = 1,
+    ['inception_3b'] = 1,
+    ['inception_4a'] = 1,
+    ['inception_4b'] = 1,
+    ['inception_4c'] = 1,
+    ['inception_4d'] = 1,
+    ['inception_4e'] = 1,
+  }
 
--- total variation gradient
-function total_var_grad(gen)
-  local x_diff = gen[{{}, {}, {1, -2}, {1, -2}}] - gen[{{}, {}, {1, -2}, {2, -1}}]
-  local y_diff = gen[{{}, {}, {1, -2}, {1, -2}}] - gen[{{}, {}, {2, -1}, {1, -2}}]
-  local grad = gen.new():resize(gen:size()):zero()
-  grad[{{}, {}, {1, -2}, {1, -2}}]:add(x_diff):add(y_diff)
-  grad[{{}, {}, {1, -2}, {2, -1}}]:add(-1, x_diff)
-  grad[{{}, {}, {2, -1}, {1, -2}}]:add(-1, y_diff)
-  return grad
-end
+  content_weights = {
+    ['inception_3a'] = 1,
+    ['inception_4a'] = 1,
+  }
 
--- load model
-local model = create_model('inception_caffe.th', opt.backend)
-collectgarbage()
+  model = create_inception(inception_path, opt.backend)
+elseif opt.model == 'vgg' then
+  style_weights = {
+    ['conv1_1'] = 1,
+    ['conv2_1'] = 1,
+    ['conv3_1'] = 1,
+    ['conv4_1'] = 1,
+    ['conv5_1'] = 1,
+  }
 
--- choose style and content layers
-local style_weights = {
-  ['conv1/7x7_s2'] = 1,
-  ['conv2/3x3'] = 1,
-  ['inception_3a'] = 1,
-  ['inception_3b'] = 1,
-  ['inception_4a'] = 1,
-  ['inception_4b'] = 1,
-  ['inception_4c'] = 1,
-  ['inception_4d'] = 1,
-  ['inception_4e'] = 1,
-}
+  content_weights = {
+    ['conv4_2'] = 1,
+  }
 
-local content_weights = {
-  ['inception_3a'] = 1,
-  ['inception_4a'] = 1,
-}
+  model = create_vgg(vgg_path, opt.backend)
+end
+collectgarbage()
 
 -- compute normalization factor
 local style_weight_sum = 0
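The main.lua diff on this page cuts off here, just as the style-weight normalization begins. Purely for orientation, here is a hypothetical sketch of what such a normalization typically looks like; the loop bodies below are illustrative, not the file's actual continuation:

```
-- hypothetical sketch of the truncated section: sum the per-layer style
-- weights, then rescale them so their contributions sum to one
local style_weight_sum = 0
for _, w in pairs(style_weights) do
  style_weight_sum = style_weight_sum + w
end
for name, w in pairs(style_weights) do
  style_weights[name] = w / style_weight_sum
end
```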

inception.lua → models/inception.lua (+1 -28)

@@ -3,34 +3,7 @@
 -- Ref: Szegedy et al. 2015, Going Deeper with Convolutions
 --
 
-function nn.Module:name(name)
-  self._name = name
-  return self
-end
-
-function nn.Module:findByName(name)
-  if self._name == name then return self end
-  if self.modules ~= nil then
-    for i = 1, #self.modules do
-      local module = self.modules[i]:findByName(name)
-      if module ~= nil then return module end
-    end
-  end
-end
-
-function nn.Sequential:subnetwork(name)
-  local subnet = nn.Sequential()
-  for i, module in ipairs(self.modules) do
-    subnet:add(module)
-    if module._name == name then
-      break
-    end
-  end
-  subnet:cuda()
-  return subnet
-end
-
-function create_model(weights_file, backend)
+function create_inception(weights_file, backend)
 
   local nnlib
  local lrn -- local response normalization module
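The helpers removed above (`nn.Module:name`, `nn.Module:findByName`, `nn.Sequential:subnetwork`) are architecture-agnostic, and main.lua now runs `paths.dofile('models/util.lua')` before loading either model. That new file's diff is not shown on this page, but presumably it carries these helpers over; reassembled from the removed lines, its assumed contents would look like this:

```
-- models/util.lua (assumed contents; the file's own diff is not shown above)

-- tag a module with a name so it can be looked up later
function nn.Module:name(name)
  self._name = name
  return self
end

-- depth-first search for the module tagged with the given name
function nn.Module:findByName(name)
  if self._name == name then return self end
  if self.modules ~= nil then
    for i = 1, #self.modules do
      local module = self.modules[i]:findByName(name)
      if module ~= nil then return module end
    end
  end
end

-- build a Sequential containing the layers up to and including the
-- module tagged with the given name, and move it to the GPU
function nn.Sequential:subnetwork(name)
  local subnet = nn.Sequential()
  for i, module in ipairs(self.modules) do
    subnet:add(module)
    if module._name == name then
      break
    end
  end
  subnet:cuda()
  return subnet
end
```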
