Skip to content

💻 A collection of projects on data structures and algorithms in Python for the EDA course at ISCTE. Implements queues, linked lists, sorting/searching algorithms, and graph analysis with Kruskal's for community detection.

Notifications You must be signed in to change notification settings

Silvestre17/DataStructures.and.Algorithms_Projects

Repository files navigation

🎲 Data Structures & Algorithms in Python: A Project Collection 💻

GitHub Repo

📝 Description

This repository is a collection of projects developed for the Estrutura de Dados e Algoritmos (Data Structures and Algorithms) course. It showcases the implementation and practical application of fundamental computer science concepts, from basic data structures like queues and linked lists to advanced graph algorithms for social network analysis.

🎓 Project Context

These projects were completed for the Licenciatura em Ciência de Dados (Bachelor Degree in Data Science) at ISCTE-IUL. They demonstrate a hands-on approach to implementing theoretical concepts in Python, focusing on real-world applications and algorithmic efficiency.

🛠️ Core Technologies Used

The primary tool for all implementations was Python, leveraging its standard libraries and key packages for data visualization and analysis. The projects also utilized Jupyter Notebooks for interactive development and documentation.

Python Jupyter
Pandas NumPy
Matplotlib Seaborn
NetworkX


📚 Project Breakdown

This project involved developing a system to manage customer queues at a service center ("Loja do Cidadão"). The system handles different service sections and accommodates priority clients.

Key Concepts Implemented

  • Object-Oriented Programming (OOP): The system was built around two main classes: Cliente (Client) and Fila_de_Atendimento (Queue), encapsulating data and behavior.
  • Queue Data Structure (FIFO): Implemented a First-In, First-Out (FIFO) queue for standard clients. The insert_client method adds clients to the end of the list, and the serve_client method removes them from the beginning.
  • Priority Queue Logic: A specialized insertion method, insert_p_client, was created to handle priority clients. This method breaks the strict FIFO order by placing priority clients at the front of the queue, but correctly after any existing priority clients.
  • Command-Line Interface (CLI): A menu() function was developed to provide a user-friendly, interactive text-based interface for managing the queues.
  • File I/O for Persistence: An optional feature was implemented to save the state of the queues to a text file at the end of the day and load it back at the start, ensuring no data loss between sessions.

This project focused on implementing the List ADT using dynamic memory allocation (linked lists) and analyzing the performance of different search and sorting algorithms.

Key Concepts Implemented

  • Linked Lists: Two different dynamic list representations were implemented from scratch:
    • Doubly Linked List: Each node contains a reference to both the next and prev nodes, allowing for efficient bidirectional traversal.
    • Singly Linked Circular List: Each node points only to the next node, with the tail node pointing back to the head, creating a circular structure.
  • Sorting Algorithms: Two fundamental sorting algorithms were implemented and analyzed:
    • Bubble Sort: An O(n²) incremental algorithm that repeatedly steps through the list, compares adjacent elements, and swaps them if they are in the wrong order.
    • Merge Sort: An efficient O(n log n) divide-and-conquer algorithm. It works by recursively dividing the list into halves, sorting them, and then merging them back together.
  • Search Algorithms:
    • Sequential Search: Used as a baseline for finding elements by iterating through the list from the beginning.
    • Binary Search: An efficient O(log n) algorithm was implemented for the existe() method. This required first sorting a copy of the list and then repeatedly dividing the search interval in half to quickly locate an item.
  • Algorithmic Complexity Analysis:
    • Empirical Temporal Analysis: Measured and plotted the execution time of the implemented methods (insertion, removal, sorting) to empirically validate their theoretical time complexities.
    • Asymptotic Spatial Analysis: Analyzed the memory usage of the different list implementations and methods, highlighting the O(n) space complexity of the copiar() method versus the O(1) of most others.

The final project involved using Graph Theory to model and analyze a social network dataset from Facebook. The main goal was to identify and study communities within the network.

Facebook

Key Concepts Implemented

  • Graph Data Structure: An Abstract Data Type (TAD) for a Graph was implemented in Python.
    • Vertex and Edge Classes: Custom classes were created to represent users (vertices) and their connections (edges), with edges having weights corresponding to interaction strength.
    • Adjacency Map Representation: The graph was implemented using a dictionary where keys are vertices and values are dictionaries of their adjacent vertices, allowing for efficient lookups.
  • Kruskal's Algorithm for Minimum Spanning Tree (MST):
    • This classic greedy algorithm was implemented to find the MST of the social graph. The MST represents the "backbone" of the network, connecting all users with the minimum possible total interaction weight.
    • Union-Find Data Structure: This crucial disjoint-set data structure was implemented to efficiently track which vertices belong to which tree during the MST construction, preventing the formation of cycles.
  • Hierarchical Clustering:
    • A divisive clustering algorithm was implemented based on the MST. By progressively removing the k-1 edges with the highest weights (weakest connections) from the MST, the graph is partitioned into k distinct communities (connected components).
  • Social Network Analysis (SNA) with NetworkX: After forming the communities, the NetworkX library was used to analyze their structure by calculating three key centrality metrics:
    • Degree Centrality: Measures the number of direct connections a node has (popularity).
    • Closeness Centrality: Measures how quickly a node can reach all other nodes in the network (information spreading efficiency).
    • Betweenness Centrality: Measures how often a node lies on the shortest path between other nodes (role as a "bridge" or connector).

👥 Team Members

  • Project 1: André Silvestre (Individual Work)

  • Project 2 & Final Project (Group 12):
    • André Silvestre (Nº104532)
    • Diogo Catarino (Nº104745)
    • Eduardo Silva (Nº104943)
    • Francisco Gomes (Nº104944)

🇵🇹 Note

These projects were developed using Portuguese from Portugal 🇵🇹.

About

💻 A collection of projects on data structures and algorithms in Python for the EDA course at ISCTE. Implements queues, linked lists, sorting/searching algorithms, and graph analysis with Kruskal's for community detection.

Topics

Resources

Stars

Watchers

Forks