Hw3 #428
Open
vinayps wants to merge 11 commits into harvard-cs205:HW3 from vinayps:HW3

Commits
45366fd  Merge pull request #1 from harvard-cs205/master (vinayps)
67ce021  Merge pull request #2 from harvard-cs205/master (vinayps)
ed6f016  remove extraneous +4 (thouis)
6706ee7  typo in typecast (thouis)
f7dc5f6  unused module, set include path (thouis)
b5bd0da  Merge pull request #3 from harvard-cs205/master (vinayps)
36f5888  solutions for P3 (vinayps)
f80a03c  solutions for warm-up P2 (vinayps)
43f1dda  solutions for P4 (vinayps)
43a404f  solutions for problem 5 (vinayps)
c5f7f95  update to solutions for P5 (vinayps)
@@ -0,0 +1,94 @@
# Solutions for HW3 - P3

Hardware -
#1: Intel(R) Iris(TM) Graphics 6100 on Apple

Best Configuration - configuration ('coalesced', 512, 64): 0.00286168 seconds (majority winner based on a few runs)

Raw Output:

coalesced reads, workgroups: 8, num_workers: 4, 0.19061496 seconds
coalesced reads, workgroups: 8, num_workers: 8, 0.09478136 seconds
coalesced reads, workgroups: 8, num_workers: 16, 0.05535224 seconds
coalesced reads, workgroups: 8, num_workers: 32, 0.03110032 seconds
coalesced reads, workgroups: 8, num_workers: 64, 0.02122016 seconds
coalesced reads, workgroups: 8, num_workers: 128, 0.00785568 seconds
coalesced reads, workgroups: 16, num_workers: 4, 0.09534232 seconds
coalesced reads, workgroups: 16, num_workers: 8, 0.04909168 seconds
coalesced reads, workgroups: 16, num_workers: 16, 0.02945256 seconds
coalesced reads, workgroups: 16, num_workers: 32, 0.02130744 seconds
coalesced reads, workgroups: 16, num_workers: 64, 0.00840584 seconds
coalesced reads, workgroups: 16, num_workers: 128, 0.00625968 seconds
coalesced reads, workgroups: 32, num_workers: 4, 0.05298352 seconds
coalesced reads, workgroups: 32, num_workers: 8, 0.02782392 seconds
coalesced reads, workgroups: 32, num_workers: 16, 0.0139668 seconds
coalesced reads, workgroups: 32, num_workers: 32, 0.011858 seconds
coalesced reads, workgroups: 32, num_workers: 64, 0.00589408 seconds
coalesced reads, workgroups: 32, num_workers: 128, 0.00364424 seconds
coalesced reads, workgroups: 64, num_workers: 4, 0.02987728 seconds
coalesced reads, workgroups: 64, num_workers: 8, 0.01375704 seconds
coalesced reads, workgroups: 64, num_workers: 16, 0.0074668 seconds
coalesced reads, workgroups: 64, num_workers: 32, 0.00482704 seconds
coalesced reads, workgroups: 64, num_workers: 64, 0.00407016 seconds
coalesced reads, workgroups: 64, num_workers: 128, 0.00323848 seconds
coalesced reads, workgroups: 128, num_workers: 4, 0.03022608 seconds
coalesced reads, workgroups: 128, num_workers: 8, 0.0170572 seconds
coalesced reads, workgroups: 128, num_workers: 16, 0.01060368 seconds
coalesced reads, workgroups: 128, num_workers: 32, 0.00647328 seconds
coalesced reads, workgroups: 128, num_workers: 64, 0.00367552 seconds
coalesced reads, workgroups: 128, num_workers: 128, 0.00308912 seconds
coalesced reads, workgroups: 256, num_workers: 4, 0.025532 seconds
coalesced reads, workgroups: 256, num_workers: 8, 0.00918576 seconds
coalesced reads, workgroups: 256, num_workers: 16, 0.00548232 seconds
coalesced reads, workgroups: 256, num_workers: 32, 0.00522008 seconds
coalesced reads, workgroups: 256, num_workers: 64, 0.00321048 seconds
coalesced reads, workgroups: 256, num_workers: 128, 0.00406888 seconds
coalesced reads, workgroups: 512, num_workers: 4, 0.02146176 seconds
coalesced reads, workgroups: 512, num_workers: 8, 0.01369368 seconds
coalesced reads, workgroups: 512, num_workers: 16, 0.0064036 seconds
coalesced reads, workgroups: 512, num_workers: 32, 0.00533056 seconds
coalesced reads, workgroups: 512, num_workers: 64, 0.00286168 seconds
coalesced reads, workgroups: 512, num_workers: 128, 0.0032056 seconds
blocked reads, workgroups: 8, num_workers: 4, 0.15275448 seconds
blocked reads, workgroups: 8, num_workers: 8, 0.08203064 seconds
blocked reads, workgroups: 8, num_workers: 16, 0.05936312 seconds
blocked reads, workgroups: 8, num_workers: 32, 0.03968904 seconds
blocked reads, workgroups: 8, num_workers: 64, 0.01966736 seconds
blocked reads, workgroups: 8, num_workers: 128, 0.01259184 seconds
blocked reads, workgroups: 16, num_workers: 4, 0.08024176 seconds
blocked reads, workgroups: 16, num_workers: 8, 0.04805832 seconds
blocked reads, workgroups: 16, num_workers: 16, 0.03266928 seconds
blocked reads, workgroups: 16, num_workers: 32, 0.0196948 seconds
blocked reads, workgroups: 16, num_workers: 64, 0.01304792 seconds
blocked reads, workgroups: 16, num_workers: 128, 0.00887808 seconds
blocked reads, workgroups: 32, num_workers: 4, 0.0454124 seconds
blocked reads, workgroups: 32, num_workers: 8, 0.02621024 seconds
blocked reads, workgroups: 32, num_workers: 16, 0.01601688 seconds
blocked reads, workgroups: 32, num_workers: 32, 0.0123308 seconds
blocked reads, workgroups: 32, num_workers: 64, 0.00911488 seconds
blocked reads, workgroups: 32, num_workers: 128, 0.00661224 seconds
blocked reads, workgroups: 64, num_workers: 4, 0.0253416 seconds
blocked reads, workgroups: 64, num_workers: 8, 0.01482472 seconds
blocked reads, workgroups: 64, num_workers: 16, 0.01003168 seconds
blocked reads, workgroups: 64, num_workers: 32, 0.00830936 seconds
blocked reads, workgroups: 64, num_workers: 64, 0.00668752 seconds
blocked reads, workgroups: 64, num_workers: 128, 0.01031256 seconds
blocked reads, workgroups: 128, num_workers: 4, 0.02511456 seconds
blocked reads, workgroups: 128, num_workers: 8, 0.01601144 seconds
blocked reads, workgroups: 128, num_workers: 16, 0.01162648 seconds
blocked reads, workgroups: 128, num_workers: 32, 0.00754224 seconds
blocked reads, workgroups: 128, num_workers: 64, 0.00873912 seconds
blocked reads, workgroups: 128, num_workers: 128, 0.01093456 seconds
blocked reads, workgroups: 256, num_workers: 4, 0.02031408 seconds
blocked reads, workgroups: 256, num_workers: 8, 0.0130572 seconds
blocked reads, workgroups: 256, num_workers: 16, 0.00682696 seconds
blocked reads, workgroups: 256, num_workers: 32, 0.00649288 seconds
blocked reads, workgroups: 256, num_workers: 64, 0.00751592 seconds
blocked reads, workgroups: 256, num_workers: 128, 0.0089632 seconds
blocked reads, workgroups: 512, num_workers: 4, 0.01979328 seconds
blocked reads, workgroups: 512, num_workers: 8, 0.0130584 seconds
blocked reads, workgroups: 512, num_workers: 16, 0.00902016 seconds
blocked reads, workgroups: 512, num_workers: 32, 0.00585392 seconds
blocked reads, workgroups: 512, num_workers: 64, 0.00664848 seconds
blocked reads, workgroups: 512, num_workers: 128, 0.00892288 seconds

@@ -1,5 +1,19 @@
#include "median9.h"

// clamped pixel fetch
float FETCH(__global __read_only float *in_values, int w, int h, int x, int y)
{
    if(x < 0)
        x = 0;
    if(x >= w)
        x = w - 1;
    if(y < 0)
        y = 0;
    if(y >= h)
        y = h - 1;
    return in_values[y * w + x];
}

// 3x3 median filter
__kernel void
median_3x3(__global __read_only float *in_values,

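The clamping in `FETCH` can be checked outside OpenCL with a small host-side analogue in plain C. This is only a sketch: `fetch_clamped` and `demo_image` are hypothetical names introduced for illustration, the OpenCL address-space qualifiers are dropped, and the 3x2 image is made up.

```c
/* Host-side analogue of FETCH (hypothetical name): coordinates outside the
   image are clamped to the nearest valid pixel before the read. */
static float fetch_clamped(const float *in_values, int w, int h, int x, int y)
{
    if (x < 0)
        x = 0;
    if (x >= w)
        x = w - 1;
    if (y < 0)
        y = 0;
    if (y >= h)
        y = h - 1;
    return in_values[y * w + x];
}

/* Made-up 3x2 image in row-major order, used only for illustration. */
static const float demo_image[6] = {
    1.0f, 2.0f, 3.0f,
    4.0f, 5.0f, 6.0f
};
```

With this image, a read at (-1, -1) returns the top-left pixel (1.0) and a read at (1, 5) returns the bottom-row pixel in column 1 (5.0), which is the "nearest valid pixel" behaviour the kernel comments ask for.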
@@ -12,7 +26,26 @@ median_3x3(__global __read_only float *in_values,
    // Note: It may be easier for you to implement median filtering
    // without using the local buffer, first, then adjust your code to
    // use such a buffer after you have that working.

    // Global position of output pixel
    const int x = get_global_id(0);
    const int y = get_global_id(1);

    // Local position relative to (0, 0) in workgroup
    const int lx = get_local_id(0);
    const int ly = get_local_id(1);

    // coordinates of the upper left corner of the buffer in image
    // space, including halo
    const int buf_corner_x = x - lx - halo;
    const int buf_corner_y = y - ly - halo;

    // coordinates of our pixel in the local buffer
    const int buf_x = lx + halo;
    const int buf_y = ly + halo;

    // 1D index of thread within our work-group
    const int idx_1D = ly * get_local_size(0) + lx;

    // Load into buffer (with 1-pixel halo).
    //

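A worked example may make the index arithmetic in this hunk easier to follow. The sketch below recomputes the same quantities in plain C for a single work-item. The struct and function names are hypothetical, and the 8x8 work-group size and `halo = 1` used in the note afterwards are assumptions, since the kernel's launch parameters are not shown in the diff.

```c
/* Per-work-item indices, computed with the same formulas as the kernel hunk. */
struct tile_indices {
    int buf_corner_x, buf_corner_y; /* image coords of the buffer's upper-left corner */
    int buf_x, buf_y;               /* this work-item's pixel inside the local buffer */
    int idx_1D;                     /* flattened index within the work-group */
};

/* x, y: global id; lx, ly: local id; local_w: work-group width (assumed). */
static struct tile_indices
compute_indices(int x, int y, int lx, int ly, int local_w, int halo)
{
    struct tile_indices t;
    t.buf_corner_x = x - lx - halo;
    t.buf_corner_y = y - ly - halo;
    t.buf_x = lx + halo;
    t.buf_y = ly + halo;
    t.idx_1D = ly * local_w + lx;
    return t;
}
```

For the work-item at global position (19, 10) in an 8x8 work-group, the local id is (3, 2). With `halo = 1` the buffer corner sits at image position (15, 7), the work-item's own pixel is at buffer position (4, 3), and `idx_1D` is 19. For the first work-group the corner is (-1, -1), which is why the buffer load goes through the clamped `FETCH`.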
@@ -21,14 +54,36 @@ median_3x3(__global __read_only float *in_values,
    //
    // Note that globally out-of-bounds pixels should be replaced
    // with the nearest valid pixel's value.

    int row;

    if (idx_1D < buf_w)
        for (row = 0; row < buf_h; row++) {
            buffer[row * buf_w + idx_1D] = \
                FETCH(in_values, w, h,
                      buf_corner_x + idx_1D,
                      buf_corner_y + row);
        }

    barrier(CLK_LOCAL_MEM_FENCE);

    // Compute 3x3 median for each pixel in core (non-halo) pixels
    //
    // We've given you median9.h, and included it above, so you can
    // use the median9() function.

    if ((y < h) && (x < w)) // stay in bounds
    {
        buffer[buf_y * buf_w + buf_x] = median9(
            buffer[(buf_y - 1) * buf_w + buf_x - 1], buffer[(buf_y - 1) * buf_w + buf_x], buffer[(buf_y - 1) * buf_w + buf_x + 1],
            buffer[buf_y * buf_w + buf_x - 1],       buffer[buf_y * buf_w + buf_x],       buffer[buf_y * buf_w + buf_x + 1],
            buffer[(buf_y + 1) * buf_w + buf_x - 1], buffer[(buf_y + 1) * buf_w + buf_x], buffer[(buf_y + 1) * buf_w + buf_x + 1] );
    }

Review comment (on the median9 assignment above): You write into a buffer location that other threads may be reading concurrently.

    barrier(CLK_LOCAL_MEM_FENCE);

    // Each thread in the valid region (x < w, y < h) should write
    // back its 3x3 neighborhood median.
    if ((y < h) && (x < w)) // stay in bounds
        out_values[y * w + x] = \
            buffer[buf_y * buf_w + buf_x];
}
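The hazard named in the review comment can be reproduced on the host with a sequential emulation. The sketch below is plain C, not the kernel itself: `median9_ref` is a hypothetical sort-based stand-in for `median9()` (whose header is not shown), the 4x4 tile is made up, and visiting the core pixels in row-major order stands for just one of the many interleavings a GPU might produce.

```c
#include <stdlib.h>
#include <string.h>

#define BUF_W 4  /* tile width including halo (illustrative) */
#define BUF_H 4  /* tile height including halo (illustrative) */
#define HALO  1

static int cmp_float(const void *a, const void *b)
{
    float x = *(const float *)a, y = *(const float *)b;
    return (x > y) - (x < y);
}

/* Hypothetical stand-in for median9(): sort a copy and take the middle value. */
static float median9_ref(const float v[9])
{
    float s[9];
    memcpy(s, v, sizeof s);
    qsort(s, 9, sizeof s[0], cmp_float);
    return s[4];
}

/* Median of the 3x3 neighbourhood centred on buffer position (bx, by). */
static float median_at(const float *buf, int bx, int by)
{
    float v[9];
    int k = 0;
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++)
            v[k++] = buf[(by + dy) * BUF_W + (bx + dx)];
    return median9_ref(v);
}

/* Hazardous variant: each result overwrites a value that later
   neighbourhoods still need to read. */
static void filter_in_place(float *buf)
{
    for (int by = HALO; by < BUF_H - HALO; by++)
        for (int bx = HALO; bx < BUF_W - HALO; bx++)
            buf[by * BUF_W + bx] = median_at(buf, bx, by);
}

/* Safe variant: results go to separate storage, so every neighbourhood
   sees the original tile. */
static void filter_to_output(const float *buf, float *out)
{
    const int core_w = BUF_W - 2 * HALO;
    for (int by = HALO; by < BUF_H - HALO; by++)
        for (int bx = HALO; bx < BUF_W - HALO; bx++)
            out[(by - HALO) * core_w + (bx - HALO)] = median_at(buf, bx, by);
}
```

For the tile `{1,2,3,4, 5,16,7,8, 9,10,11,12, 13,14,15,6}`, the safe variant gives 7 and 8 for the first two core pixels, while the in-place variant gives 7 and 7, because the second pixel reads the already-overwritten first one. In the kernel, one common way to avoid this is to keep the `median9()` result in a private variable and store it straight into `out_values`, which would also make the second barrier unnecessary.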
Review comment: Not sure if or why this is necessary.
If c is initially set to global_id and the for-loop condition is c < N, why would a thread with global_id >= N ever enter the loop?
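The kernel this comment refers to is not shown on this page, so the following is only a hypothetical sketch of the loop shape it describes: a strided loop that starts at the work-item's global id and advances by the global size. `strided_iterations` is an illustrative name, and the function simply counts how many iterations one work-item would run.

```c
/* Counts the iterations of
       for (c = global_id; c < N; c += global_size)
   for a single work-item. The loop shape is assumed from the comment. */
static int strided_iterations(int global_id, int global_size, int N)
{
    int count = 0;
    for (int c = global_id; c < N; c += global_size)
        count++;
    return count;
}
```

With `N = 10` and a global size of 16, work-item 3 runs one iteration and work-item 12 runs none, because the condition `c < N` already fails on the first check. If "this" in the comment is an extra bounds check around such a loop, the sketch supports the reviewer's point that the check would be redundant.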