NY-Taxi

Task

This project gives you an opportunity to gain experience programming systems in the Hadoop ecosystem. Here, we use Spark to analyze taxi rides within New York.

We will use a data set that covers one month. For each trip, it contains the time and location of the trip’s start and end. This is the data we mean whenever we refer to a trip below.

The general question is: Can we match trips and return trips? For a given trip a, we consider another trip b as a return trip iff

b’s pickup time is within 8 hours after a’s dropoff time
b’s pickup location is within r meters of a’s dropoff location
b’s dropoff location is within r meters of a’s pickup location

where r is a distance in meters between 50 and 200.
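The three conditions above can be written as a single predicate. The following is a minimal sketch in plain Scala, not the project's reference solution. The `Trip` case class and its field names are hypothetical and do not reflect the data set's actual column names. Distances use the haversine formula with a mean Earth radius of 6,371 km, which is one reasonable choice at city scale. Treating "within 8 hours after" as a strictly positive gap of at most 8 hours is also an assumption.

```scala
// Hypothetical record type; the field names are assumptions, not the data set's schema.
final case class Trip(
  pickupTime: Long,   // epoch seconds
  dropoffTime: Long,  // epoch seconds
  pickupLat: Double, pickupLon: Double,
  dropoffLat: Double, dropoffLon: Double
)

object ReturnTrips {
  val EarthRadiusMeters = 6371000.0        // mean Earth radius (assumed model)
  val MaxGapSeconds: Long = 8L * 60 * 60   // 8 hours

  // Great-circle distance in meters between two (lat, lon) points given in degrees.
  def haversineMeters(lat1: Double, lon1: Double, lat2: Double, lon2: Double): Double = {
    val dLat = math.toRadians(lat2 - lat1)
    val dLon = math.toRadians(lon2 - lon1)
    val h = math.pow(math.sin(dLat / 2), 2) +
      math.cos(math.toRadians(lat1)) * math.cos(math.toRadians(lat2)) *
        math.pow(math.sin(dLon / 2), 2)
    2 * EarthRadiusMeters * math.asin(math.sqrt(h))
  }

  // True if b satisfies all three return-trip conditions with respect to a.
  def isReturnTrip(a: Trip, b: Trip, r: Double): Boolean = {
    val gap = b.pickupTime - a.dropoffTime
    gap > 0 && gap <= MaxGapSeconds &&
      haversineMeters(b.pickupLat, b.pickupLon, a.dropoffLat, a.dropoffLon) <= r &&
      haversineMeters(b.dropoffLat, b.dropoffLon, a.pickupLat, a.pickupLon) <= r
  }
}
```

The time check comes first because it is the cheapest of the three, so most pairs are rejected before any trigonometry is evaluated.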

To compute the return trips, you may want to break the problem down into the following series of problems:

Given the (lat, lon) coordinates a(40.79670715332031, -73.97093963623047), b(40.789649963378906, -73.94803619384766), and c(40.73122024536133, -73.9823226928711), which trips have dropoff locations within r meters of a, b, or c?
For each trip a in the dataset, compute the trips that have a pickup location within r meters of a’s dropoff location. These are the return trip candidates.
For all trips a in the dataset, compute all trips that may have been return trips for a.
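Comparing every trip with every other trip is quadratic in the number of trips. One common way to reduce the work, sketched below under stated assumptions, is to bucket points into grid cells at least r meters wide. A query point then only needs to be compared with the points in its own cell and the eight neighbouring cells. The object `GridCandidates`, its method names, and the `maxAbsLat` parameter (an assumed upper bound on the latitudes in the data, defaulting to 41 degrees for New York) are hypothetical. The cell sizing is a flat approximation intended for city-scale distances. The code uses plain Scala collections; in Spark, the same cell key could serve as a join key.

```scala
object GridCandidates {
  val EarthRadiusMeters = 6371000.0
  val MetersPerDegreeLat: Double = EarthRadiusMeters * math.Pi / 180.0

  // Great-circle distance in meters between two (lat, lon) points given in degrees.
  def haversineMeters(lat1: Double, lon1: Double, lat2: Double, lon2: Double): Double = {
    val dLat = math.toRadians(lat2 - lat1)
    val dLon = math.toRadians(lon2 - lon1)
    val h = math.pow(math.sin(dLat / 2), 2) +
      math.cos(math.toRadians(lat1)) * math.cos(math.toRadians(lat2)) *
        math.pow(math.sin(dLon / 2), 2)
    2 * EarthRadiusMeters * math.asin(math.sqrt(h))
  }

  // Grid cell of a point. Cells are about r meters tall and at least r meters
  // wide for all latitudes up to maxAbsLat (assumption: the data stays below it).
  def cell(lat: Double, lon: Double, r: Double, maxAbsLat: Double): (Long, Long) = {
    val latStep = r / MetersPerDegreeLat
    val lonStep = r / (MetersPerDegreeLat * math.cos(math.toRadians(maxAbsLat)))
    (math.floor(lat / latStep).toLong, math.floor(lon / lonStep).toLong)
  }

  // For each query point, the indices of all points within r meters of it.
  def within(points: Seq[(Double, Double)],
             queries: Seq[(Double, Double)],
             r: Double,
             maxAbsLat: Double = 41.0): Seq[Seq[Int]] = {
    // Bucket point indices by grid cell.
    val buckets: Map[(Long, Long), Seq[Int]] =
      points.indices.groupBy(i => cell(points(i)._1, points(i)._2, r, maxAbsLat))
    queries.map { case (qLat, qLon) =>
      val (cy, cx) = cell(qLat, qLon, r, maxAbsLat)
      for {
        dy <- -1 to 1
        dx <- -1 to 1
        i <- buckets.getOrElse((cy + dy, cx + dx), Seq.empty)
        // The grid only narrows the candidates; the exact check decides.
        if haversineMeters(qLat, qLon, points(i)._1, points(i)._2) <= r
      } yield i
    }
  }
}
```

For the first sub-problem, `queries` would hold the three reference points a, b, and c, and `points` the dropoff locations. For the second, `queries` would hold the dropoff locations and `points` the pickup locations.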
