<!DOCTYPE html>
<html lang="en">
<head>
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-176430736-1"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'UA-176430736-1');
</script>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>3D Reconstruction</title>
<link rel="shortcut icon" href="Blog/images/ss.ico" type="image/x-icon" />
<link rel="icon" href="Blog/images/ss.ico" type="image/x-icon" />
<!-- CSS -->
<link href="https://fonts.googleapis.com/icon?family=Material+Icons" rel="stylesheet">
<link href="Blog/css/materialize.css" type="text/css" rel="stylesheet" media="screen,projection" />
<link href="Blog/css/style.css" type="text/css" rel="stylesheet" media="screen,projection" />
</head>
<body>
<nav class="nav-bar center scrollspy">
<img class="logo" src="Blog/images/logo.jpg" alt="logo">
</nav>
<div id="space">
</div>
<div class="container scrollspy center" id="title1">
<h4 style="color:#00838f;font-weight: lighter;">2D to 3D Reconstruction of Furniture Objects</h4>
<br><hr><br>
</div>
<div class="container">
<div class="container scrollspy" id="intro">
<div class="center">
<!-- <h4 style="color:#00838f;font-weight: lighter;">Introduction</h4> -->
</div>
<div class="row">
<p style="text-align: center;color: gray;">|   DECEMBER 05, 2019   •   15 MINUTE READ   |</p>
<p id="para">
<br>
Ever wondered how a piece of furniture would look in your space before actually buying it?
<br><br>
We help you figure that out by reconstructing a 3D model of the furniture from just a single 2D image; you
can then visualize how well it fits in your environment with the help of an Augmented Reality (AR) application on
your device.
<br><br>
In this project, we build and examine model-free and model-based deep learning methods for 3D reconstruction.
In the model-free approach, reconstruction is done directly in a 128×128×128 voxel space, which is computationally
expensive. In the model-based approach, we instead use a pre-computed parametric shape representation to reduce
the computational requirements. We compare the performance of the two approaches and discuss the trade-offs.
<br><br>
Let’s dive in!
</p>
</div>
</div>
</div>
<div class="container">
<div class="container scrollspy" id="modelfree">
<div class="section">
<h4 id="modelfree" class="center" style="color:#00838f;font-weight: lighter;">Model - Free Approach</h4>
</div>
<div>
<br><br>
<div class="center" id="modelfreepp">
<img src="Blog/images/modelfree.png" alt="modelfree pipeline">
<p style="text-align: center;font-size: 14px;">Figure [1]</p>
</div>
<p id="para">
Traditionally, multiple 2D images from different viewpoints provide the extra information needed to
solve the reconstruction problem. The problem becomes much harder when the reconstruction has to be done from
just a single 2D image; we tackle it by leveraging deep learning techniques.
</p>
<div class="center" id="priors">
<img src="Blog/images/np1.png" alt="prior_est">
<p style="text-align: center;font-size: 14px;">Figure [2] : 2.5D Sketch Estimation Network</p>
</div>
<p id="para">
In our framework, shown in Figure [1], we first train a 2.5D sketch estimation network that takes in a 2D image and predicts the
prior knowledge required for 3D reconstruction. The outputs of this network are 2.5D images: a depth image,
a surface normal image, and a silhouette image. It is a ResNet18-based encoder-decoder architecture trained
with mean squared error (MSE) as the loss function.
</p>
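<p id="para">
For concreteness, here is a minimal PyTorch sketch of such a network. Only the ResNet18 encoder, the three
2.5D outputs, and the MSE loss come from the description above; the decoder layout, channel counts, and image
size are illustrative assumptions, not our exact architecture.
</p>
<pre style="font-size: 14px; text-align: left;"><code>
import torch
import torch.nn as nn
from torchvision.models import resnet18

class SketchEstimator(nn.Module):
    """ResNet18 encoder with an upsampling decoder that regresses the 2.5D
    sketches: depth (1 channel), surface normals (3) and silhouette (1)."""
    def __init__(self):
        super().__init__()
        base = resnet18(weights=None)
        # Drop the average pool and fc layers; keep the conv feature extractor.
        self.encoder = nn.Sequential(*list(base.children())[:-2])
        def up(cin, cout):  # one 2x upsampling stage
            return nn.Sequential(
                nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1),
                nn.BatchNorm2d(cout), nn.ReLU(inplace=True))
        # Five stages take the 1/32-resolution features back to full size.
        self.decoder = nn.Sequential(up(512, 256), up(256, 128), up(128, 64),
                                     up(64, 32), up(32, 16))
        self.head = nn.Conv2d(16, 5, kernel_size=1)

    def forward(self, x):
        out = self.head(self.decoder(self.encoder(x)))
        return out[:, :1], out[:, 1:4], out[:, 4:]  # depth, normals, silhouette

# Supervised training with MSE on all three outputs (targets are placeholders).
model = SketchEstimator()
depth, normals, sil = model(torch.randn(2, 3, 256, 256))
loss = sum(nn.functional.mse_loss(p, torch.randn_like(p))
           for p in (depth, normals, sil))
</code></pre>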
<div class="center" id="priors">
<img src="Blog/images/np2.png" alt="prior_est">
<p style="text-align: center;font-size: 14px;">Figure [3] : 3D Shape completion network</p>
</div>
<p id="para">
These 2.5D images are fed into a shape completion network that reconstructs the 3D object in a 128×128×128 voxel space.
This is also a ResNet18-based architecture, trained with binary cross-entropy (BCE) as the loss function. Both networks are
trained independently on a custom dataset of synthetically rendered 2D images from ShapeNet and Pix3D.
Each image comes with corresponding 2.5D images and a 3D model to facilitate supervised learning, and the synthetic
rendering provides additional viewpoints, mitigating the shortage of training data.
<br><br>
Since the networks are trained on rendered images, it is important to observe their performance on real images with
different lighting conditions. One key contribution is comparing the model's performance on real-world
images against synthetically rendered images; we take the real-world images from the Extended IKEA dataset.
</p>
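<p id="para">
A similarly hedged sketch of the shape completion step: the stacked 2.5D input, the 128×128×128 occupancy
output, and the binary cross-entropy loss follow the description above, while the specific layers are placeholders.
</p>
<pre style="font-size: 14px; text-align: left;"><code>
import torch
import torch.nn as nn

class ShapeCompletion(nn.Module):
    """Encode the stacked 2.5D maps, decode a 128x128x128 occupancy volume."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(  # 2D encoder over depth+normals+silhouette
            nn.Conv2d(5, 64, 4, 2, 1), nn.ReLU(True),
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(True),
            nn.Conv2d(128, 256, 4, 2, 1), nn.ReLU(True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(256, 32 * 4 * 4 * 4))
        def up3d(cin, cout):  # one 2x volumetric upsampling stage
            return nn.Sequential(
                nn.ConvTranspose3d(cin, cout, 4, stride=2, padding=1),
                nn.BatchNorm3d(cout), nn.ReLU(True))
        # 4^3 seed volume -> 128^3 logits in five upsampling steps.
        self.decoder = nn.Sequential(up3d(32, 64), up3d(64, 32), up3d(32, 16),
                                     up3d(16, 8),
                                     nn.ConvTranspose3d(8, 1, 4, 2, 1))

    def forward(self, sketches):
        seed = self.encoder(sketches).view(-1, 32, 4, 4, 4)
        return self.decoder(seed)  # occupancy logits, (N, 1, 128, 128, 128)

logits = ShapeCompletion()(torch.randn(1, 5, 256, 256))
target = torch.randint(0, 2, (1, 1, 128, 128, 128)).float()  # voxelized GT
loss = nn.functional.binary_cross_entropy_with_logits(logits, target)
</code></pre>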
</div>
<div class="section">
<h5 id="nparam_results" class="center" style="color:#00838f;font-weight: lighter;">Results</h5>
<p id="para">The results below show the input 2D images with the predicted 2.5D images and reconstructed 3D models with two views.</p>
</div>
<div class="center" id="nparamres">
<img src="Blog/images/np3.png" alt="pres1">
<p style="text-align: center;font-size: 14px;">Figure [4] : Output of synthetic image</p>
<p id="para">
In Figure [4], we can see that our network learns distinctive features of the chair, such as the handles, from the
given synthetic 2D image.
</p>
</div>
<div class="center" id="nparamres">
<img src="Blog/images/np4.png" alt="pres2">
<p style="text-align: center;font-size: 14px;">Figure [5] : Output of real image</p>
<p id="para">
In Figure [5], the input is a real image with non-uniform lighting. The lighting clearly affects
the 2.5D predictions: the silhouette and normal images in particular are distorted.
</p>
</div>
<div class="center" id="nparamres">
<img src="Blog/images/np5.png" alt="pres3">
<p style="text-align: center;font-size: 14px;">Figure [6] : Output of real image - failed case</p>
<p id="para">
Figure [6] shows one of the failure cases we observed in our experiments. Here the 2.5D predictions are distorted,
resulting in an imperfect 3D reconstruction, likely because the background and the chair are of the same colour.
</p>
</div>
</div>
</div>
<div class="container">
<div class="container scrollspy" id="modelbased">
<div class="section">
<h4 id="modelbased" class="center" style="color:#00838f;font-weight: lighter;">Model - Based Approach</h4>
</div>
<p id="para">
In the previous approach, the network predicts 128×128×128 points in voxel space. Training a network to
predict such a huge voxel space is both time-consuming and computationally intensive. Hence, we trained a
model-based pipeline to reduce both the training time and the computational power needed for 3D reconstruction.
</p>
<br>
<div class="center" id="modelbasedpp">
<img src="Blog/images/pipeline.png" alt="model-based pipeline">
<p style="text-align: center;font-size: 14px;">Figure [7] : Model-based approach pipeline</p>
</div>
<br>
<div>
<p id="para">
In this approach, the shape is parameterized by a base model, and per-instance deformations are calculated
relative to that base model. To calculate these deformations, the Iterative Closest Point (ICP) algorithm is
employed to estimate the best alignment between two point clouds (in both its point-to-point and point-to-plane variants).
<br><br>
Applying ICP to the vertices of the training objects is quite challenging: the ground-truth 3D meshes
have varying numbers of vertices, edges, and faces, so the meshes are down-sampled until all training
meshes have the same number of vertices. The down-sampling was done manually in Blender; a minimal sketch of the
alignment step is shown below.
</p>
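<p id="para">
Open3D is assumed here purely for illustration (our pipeline does not depend on a particular ICP
implementation); both ICP variants mentioned above are available as estimation methods.
</p>
<pre style="font-size: 14px; text-align: left;"><code>
import numpy as np
import open3d as o3d

def align(source_pts, target_pts, threshold=0.02):
    """Rigidly align one vertex set onto another with point-to-plane ICP."""
    src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(np.asarray(source_pts)))
    tgt = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(np.asarray(target_pts)))
    tgt.estimate_normals()  # normals are required by the point-to-plane objective
    result = o3d.pipelines.registration.registration_icp(
        src, tgt, threshold,
        estimation_method=o3d.pipelines.registration
            .TransformationEstimationPointToPlane())
    return result.transformation  # 4x4 rigid transform mapping src onto tgt
</code></pre>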
<p id="para">
Now that we have a one-to-one correspondence between all the vertices, the vertices are flattened and the
deformations are calculated as given below:
<br><br>
</p>
<p class="center" style="font-size: larger;">Deformation = Flattened Input Vertices - Base Vertices</p>
<p id="para">
<br>
This deformation is calculated for every training example, and the resulting vectors are stacked as columns to build a
deformation matrix. Principal Component Analysis (PCA) is then performed on the deformation matrix
to find the directions of maximum variation in deformation.
<br><br>
The deformation of every training example can then be represented as shown below.
<br><br>
</p>
<div class="center scrollspy" id="fig7">
<img src="Blog/images/param1.png" alt="param1">
<p class="scrollspy" style="text-align: center;font-size: 14px;">Figure [8]</p>
</div>
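<p id="para">
In code, building the deformation matrix and its PCA basis might look like the following NumPy sketch.
The variable names are ours, and every mesh is assumed to be already down-sampled to the same V vertices.
</p>
<pre style="font-size: 14px; text-align: left;"><code>
import numpy as np

def deformation_matrix(train_vertices, base_vertices):
    """Stack each shape's deformation (flattened vertices minus base) as a column."""
    base = np.asarray(base_vertices).reshape(-1)               # (3V,)
    cols = [np.asarray(v).reshape(-1) - base for v in train_vertices]
    return np.stack(cols, axis=1)                              # (3V, N shapes)

def pca_basis(D, k=15):
    """Top-k principal directions of deformation via SVD (no mean-centering,
    since the deformations are already taken relative to the base model)."""
    U, S, Vt = np.linalg.svd(D, full_matrices=False)
    return U[:, :k]                                            # (3V, k) basis B

# Any training deformation d is then approximated as B @ alpha, where alpha
# holds the k deformation coefficients (15 in our experiments).
</code></pre>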
<p id="para">
<br>
Thus, we can obtain the deformation coefficients of every training model by taking the pseudo-inverse
of the deformation matrix and multiplying it with the deformation.
<br><br>
</p>
<div class="center" id="fig2">
<img src="Blog/images/param2.png" alt="param2">
<p style="text-align: center;font-size: 14px;">Figure [9]</p>
</div>
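<p id="para">
As a runnable NumPy illustration with stand-in data in place of a real basis and deformation:
</p>
<pre style="font-size: 14px; text-align: left;"><code>
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((300, 15))   # stand-in PCA basis (3V=300, k=15)
d = rng.standard_normal(300)         # stand-in deformation vector

# Pseudo-inverse times the deformation gives the coefficients
# (a least-squares projection onto the basis).
alpha = np.linalg.pinv(B) @ d        # (15,) deformation coefficients
d_hat = B @ alpha                    # least-squares approximation of d
</code></pre>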
<p id="para">
<br>
For a new 2D image, these deformation coefficients are predicted using a trained CNN. The CNN is
trained in a supervised way with mean squared error as the loss function.
<br><br>
</p>
<div class="center" id="fig3">
<img src="Blog/images/param3.png" alt="param3">
<p style="text-align: center;font-size: 14px;">Figure [10]</p>
</div>
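<p id="para">
A hedged sketch of this regression network: only the 2D-image input, the 15 output coefficients, and the
MSE loss come from the text above; the layer sizes are assumptions.
</p>
<pre style="font-size: 14px; text-align: left;"><code>
import torch
import torch.nn as nn

# Small convolutional regressor: 2D image in, 15 deformation coefficients out.
coeff_net = nn.Sequential(
    nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(True),
    nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(True),
    nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(True),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(128, 15))

pred = coeff_net(torch.randn(8, 3, 128, 128))               # (8, 15)
target = torch.randn(8, 15)  # stands in for coefficients from the pinv step
loss = nn.functional.mse_loss(pred, target)
</code></pre>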
<p id="para">
<br>
Now that our model predicts the deformation coefficients, we can obtain the instance-specific deformation by a simple
matrix multiplication, as shown in Figure <a href="#fig7">[8]</a>. The final predicted shape is obtained by adding this
deformation to the base vertices.
<br>
</p>
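<p id="para">
Putting the pieces together, again with our illustrative variable names:
</p>
<pre style="font-size: 14px; text-align: left;"><code>
import numpy as np

def reconstruct(alpha, B, base_vertices):
    """Base vertices plus the predicted instance-specific deformation."""
    deformation = B @ np.asarray(alpha)                     # (3V,)
    verts = np.asarray(base_vertices).reshape(-1) + deformation
    return verts.reshape(-1, 3)                             # back to (V, 3)

# e.g. verts = reconstruct(coeff_net(image).detach().numpy().ravel(), B, base)
</code></pre>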
</div>
<div>
<div class="section">
<h5 id="param_results" class="center" style="color:#00838f;font-weight: lighter;">Results</h5>
</div>
<div class="center" id="paramres">
<img src="Blog/images/pres1.jpg" alt="pres1">
<p style="text-align: center;font-size: 14px;">Figure [11]</p>
<p id="para">
Here we see that by adding the relevant deformations to the base model, we predict the output 3D shape,
and the prediction is very close to the ground-truth model. It learns the curvature
along the side to an extent, fills in the gap that our base model had, and
removes the extra support between the base model’s legs, although the outcome is noisy. Given that only
15 coefficients have to be predicted, the pipeline does an excellent job of predicting 3D models.
<br><br>
Below, let us have a look at how our model learns the deformation.
</p>
</div>
<div class="center" id="paramres">
<img src="Blog/images/pres2.jpg" alt="pres2">
<p style="text-align: center;font-size: 14px;">Figure [12]</p>
<p id="para">
Here we see how our model learns deformations that transform the base model toward the ground truth.
The principal direction explains 59.19% of the variance in deformation. To visualize the transformation the
model applies, we increase the coefficient along the principal direction of deformation: as it increases,
the model keeps adding deformation, bringing the shape closer to the actual 3D model.
</p>
</div>
<div class="center" id="paramres">
<br>
<h5 class="center" style="color:#00838f;font-weight: lighter;">Limitations</h5>
<img src="Blog/images/pres5.jpg" alt="pres3">
<p style="text-align: center;font-size: 14px;">Figure [13]</p>
<p id="para">
Here, the ground-truth model is quite different from the base model, and although the pipeline
still tries to learn the deformation, the noise in the final predicted shape is too high.
So, even though this approach is computationally cheap, the results depend heavily
on the base model that we choose.
</p>
</div>
</div>
</div>
<div class="container">
<div class="center scrollspy" id="comparison">
<br>
<div class="container scrollspy">
<h4 class="center" style="color:#00838f;font-weight: lighter;">Performance Analysis</h4>
</div>
<div id="comp">
<img src="Blog/images/pres4.jpg" alt="pres4">
</div>
<p style="text-align: center;font-size: 14px;">Figure [14]</p>
<p id="para">
For real-world testing, we evaluate both approaches on a real 2D image of a chair. The model-free approach
reconstructs the shape of the chair but fails to capture specific features such as thickness, which can be
attributed to the lighting and background differences between synthetic and real images.
The model-based approach captures the overall shape of the chair but produces a noisy result.
<br><br>
Also, for the model-based approach to give good results, the base model should be very close to the output shape
we want to predict, whereas the model-free approach has no such constraint. On the other hand, the model-free
approach requires a lot of computational power to predict 128×128×128 voxels, whereas the model-based approach
only has to predict 15 coefficients!
<br><br>
Life comes with trade-offs! :)
</p>
<br>
</div>
</div>
</div>
<div class="container scrollspy" style="text-align:center;" id="demo">
<div>
<h4 style="font-weight: 200;color: #2eadad;">Demo</h4>
</div>
<br>
<div>
<video width="30%" controls>
<source src="videos/demo_video.mp4" type="video/mp4">
Your browser does not support HTML5 video.
</video>
</div>
</div>
<div class="container">
<div class="container scrollspy" id="further">
<br>
<div class="section">
<h4 class="center" style="color:#00838f;font-weight: lighter;">Scope for Improvement</h4>
</div>
<div>
<ul style="font-size: larger;">
<li style="font-size: larger;">Choosing a better base model by computing a mean model that generalizes well for specific object categories.</li>
<li style="font-size: larger;">Employing a Generative Adversarial Network (GAN) to improve the naturalness of the generated object surfaces.</li>
<li style="font-size: larger;">Adding texture to the generated object shapes using style transfer techniques.</li>
</ul>
</div>
</div>
</div>
<div class="container scrollspy" id="code">
<br>
<div class="section">
<h4 id="code" class="center" style="color:#00838f;font-weight: lighter;">Code</h4>
</div>
<div class="center">
<p style="font-size: larger;">
You can find the code for our project <a href="https://github.com/shyam31896/DeepVision" id="thanks">here</a>.
</p>
</div>
</div>
<div class="container">
<div class="container scrollspy" id="references">
<div class="section">
<h4 class="center" style="color:#00838f;font-weight: lighter;">References</h4>
</div>
<div>
<p id="para">
<a href="https://arxiv.org/pdf/1711.03129.pdf" id="thanks">[1]</a> <b>MarrNet : 3D Shape Reconstruction via 2.5D Sketches</b>,
Wu J., Wang Y., Xue T., Sun X., Freeman W.T. & Tenenbaum J.B., NIPS (2017)
</p>
<p id="para">
<a href="https://arxiv.org/pdf/1512.03385.pdf" id="thanks">[2]</a> <b>Deep Residual Learning for Image Recognition</b>,
He K., Zhang X., Ren S. & Sun J., CVPR (2015)
</p>
<p id="para">
<a href="{https://papers.nips.cc/paper/6096-learning-a-probabilistic-latent-space-of-object-shapes-via-3d-generative-adversarial-modeling.pdf" id="thanks1">[3]</a>
<b>Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling</b>,
Wu J., Zhang C., Xue T., Freeman W.T. & Tenenbaum J.B., NIPS (2016)
</p>
<p id="para">
<a href="https://arxiv.org/pdf/1803.07549.pdf" id="thanks">[4]</a> <b>Learning Category-Specific Mesh Reconstruction
from Image Collections</b>, Kanazawa A., Tulsiani S., Efros A.A. & Malik J., ECCV (2018)
</p>
<p id="para">
<a href="https://arxiv.org/pdf/1809.05068.pdf" id="thanks">[5]</a> <b>Learning 3D Shape Priors for Shape Completion and
Reconstruction</b>, Wu J., Zhang C., Zhang X., Zhang Z., Freeman W.T. & Tenenbaum J.B., ECCV (2018)
</p>
<p id="para">
<a href="https://arxiv.org/pdf/1611.07700.pdf" id="thanks">[6]</a> <b>3D Menagerie: Modeling the 3D Shape and Pose of
Animals</b>, Zuffi S., Kanazawa A., Jacobs D. & Black M.J., CVPR (2017)
</p>
<p id="para">
<a href="https://arxiv.org/pdf/1711.07566.pdf" id="thanks">[7]</a> <b>Neural 3D Mesh Renderer</b>,
Hiroharu K., Yoshitaka U. & Tatsuya H., CVPR (2018)
</p>
</div>
</div>
</div>
<div class="container scrollspy" id="references1">
<div class="center">
<br><br>
<p style="font-size: larger;">
Special thanks to <a href="https://kpertsch.github.io/" id="thanks">Karl Pertsch</a> and
<a href="https://viterbi-web.usc.edu/~limjj/" id="thanks">Prof. Joseph Lim</a> for their support and guidance.
</p>
</div>
</div>
<div class="col hide-on-small-only hide-on-med-only m3">
<div class="toc-wrapper" style="display:none;">
<div style="height: 1px;">
<ul class="section table-of-contents">
<li><a href="#intro">Introduction</a></li>
<li><a href="#modelfree">Model-Free Approach</a></li>
<li><a href="#modelbased">Model-Based Approach</a></li>
<li><a href="#comparison">Performance Analysis</a></li>
<li><a href="#demo">Demo</a></li>
<li><a href="#further">Scope for Improvement</a></li>
<li><a href="#code">Code</a></li>
<li><a href="#references">References</a></li>
</ul>
</div>
</div>
</div>
<footer class="page-footer cyan darken-4">
<div class="container">
<div class="section">
<h5 id="course" class="center" style="color:#00eaff;font-weight: lighter;font-size: 35px;">Authors</h5>
</div>
<div class="row" id="projects">
<div class="center">
<p>Shyam Sundar Ravikumar</p>
</div>
<div class="center">
<p>Arpit Sharma</p>
</div>
<div class="center">
<p>Pruthvi Sumanth Kakani</p>
</div>
<div class="center">
<p>Prahalathan Sundaramoorthy</p>
</div>
<div class="center">
<p>Thota Mohan Krishna</p>
</div>
</div>
</div>
<div class="footer-copyright" id="projects">
<div class="container center">
DeepVision | © 2019
</div>
</div>
</footer>
<!-- Scripts-->
<script src="https://code.jquery.com/jquery-3.2.1.js"></script>
<script src="Blog/js/materialize.js"></script>
<script src="Blog/js/init.js"></script>
<script src="https://unpkg.com/scrollreveal"></script>
<script>
$(document).ready(function () {
$('.toc-wrapper').pushpin({
offset: 200
});
$('.scrollspy').scrollSpy();
});
</script>
<script>
$(document).ready(function () {
// Fade the floating table of contents in while scrolling through the article,
// and out near the top of the page or past the acknowledgements (#thanks1).
$(window).scroll(function () {
if ($(this).scrollTop() > 100) {
if($(this).scrollTop()>=$('#thanks1').position().top){
$('.toc-wrapper').fadeOut(300);
} else {
$('.toc-wrapper').fadeIn(300);
}
} else {
$('.toc-wrapper').fadeOut(300);
}
});
$('#scroll').click(function () {
$("html, body").animate({
scrollTop: 0
}, 600);
return false;
});
});
</script>
<script>
$(window).on('scroll', function(){
if($(window).scrollTop() > 5){
$('nav').addClass('white');
} else {
$('nav').removeClass('white');
}
});
</script>
</body>
</html>