DATA SCIENCE INTERN

Introduction

Hi! I’m Saravanan K, a third-year B.Sc. Computer Science (Data Science and Analytics) student at Subbalakshmi Lakshmipathy College of Science. I’m excited to share that I’ve been selected as an intern at Elite Intern, where I’ll gain hands-on experience and deepen my skills in the tech field. This blog will serve as a journal of my internship journey, highlighting what I learn.

Email: saravanank212006@gmail.com

LinkedIn: https://www.linkedin.com/in/saravanan-k-dsa/

Contact: +91 9344731828

Projects I Did

Task 1:

Create a pipeline for data preprocessing, transformation, and loading using tools like pandas and scikit-learn.

🔧 Key Features:

Data Extraction: Robust file handling with dynamic file selection

Preprocessing: Handling missing values, encoding categorical variables, and scaling numerical data

Transformation Pipelines: Powered by `ColumnTransformer` for clean, modular preprocessing

Loading: Final cleaned dataset exported and ready for modeling or analysis.

How it works:

ETL = Extract → Transform → Load.

1. Extract

raw_data = extract_data(INPUT_FILE)

Checks that the input file exists; if it does not, raises an error.
Loads the CSV file with pandas `read_csv`.
Prints the shape (rows x columns) after reading.
Returns a DataFrame containing the raw data.
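
Here is a minimal sketch of what an extract step like this could look like. It is not the exact project code: the function body, the error message, and the printed wording are my assumptions based on the steps listed above.

```python
from pathlib import Path

import pandas as pd


def extract_data(file_path: str) -> pd.DataFrame:
    """Read a CSV file into a DataFrame, failing early if the file is missing."""
    path = Path(file_path)
    # Check that the file exists before trying to read it.
    if not path.is_file():
        raise FileNotFoundError(f"Input file not found: {path}")

    df = pd.read_csv(path)
    # Report the shape so it is easy to spot a truncated or empty file.
    print(f"Data extracted: {df.shape[0]} rows, {df.shape[1]} columns.")
    return df
```

Checking for the file first gives a clear error message instead of a long pandas traceback.
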
2. Transform

processed_data = transform_data(raw_data)

Splits the data into features (X) and target (y).
Identifies the numerical and categorical columns.

Pipelines:

Numerical pipeline:

Fills missing values with the column mean.

Standardizes (scales) data with `StandardScaler`.

Categorical pipeline:

Fills missing values with the most common value.

One-hot encodes the categorical variables.

Column transformer: `ColumnTransformer` applies the right pipeline to the right columns.
3. Load

load_data(processed_data, OUTPUT_FILE)

Ensures the folder for saving exists.
Saves the final DataFrame as a new CSV file.
Prints the save location.
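
A load step along these lines might look like the sketch below. The function body and the printed message are assumptions on my part; only the three steps listed above come from the project.

```python
from pathlib import Path

import pandas as pd


def load_data(df: pd.DataFrame, output_path: str) -> None:
    """Write the processed DataFrame to a CSV file, creating folders if needed."""
    path = Path(output_path)
    # Make sure the destination folder exists before writing.
    path.parent.mkdir(parents=True, exist_ok=True)
    # index=False keeps the row index out of the saved file.
    df.to_csv(path, index=False)
    print(f"Data saved to: {path}")
```
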

Data extracted: 25128 rows, 21 columns.
Data transformed: 25128 rows, 55 columns.

Task 2:
Implement a deep learning model for image classification using TensorFlow.

Task 3:
Develop a full data science project, from data collection and preprocessing to model deployment using Flask or FastAPI.

Project Title: 🩺 Diabetes Prediction Web API using FastAPI

This project is a complete machine learning application that predicts whether a person is likely to have diabetes based on medical input data. It is built with Python, trained using scikit-learn, and deployed using the FastAPI web framework.

🔧 Features:

Cleaned and preprocessed the Pima Indian Diabetes dataset

Replaced missing values and standardized the input features

Trained a Random Forest Classifier for accurate prediction

Saved the trained model, imputer, and scaler using `joblib`

Created a FastAPI backend to expose the model as a REST API

Includes Swagger UI for easy testing and interaction
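
To make the training side of these features concrete, here is a small sketch. Several details are my assumptions rather than the project's actual settings: zeros in some medical columns are treated as missing, the imputer uses the median, the label column is named `Outcome`, and the three file names are hypothetical.

```python
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

FEATURES = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]
# Assumption: a zero in these columns stands for a missing measurement.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def train_and_save(df: pd.DataFrame, out_dir: str, target: str = "Outcome") -> None:
    """Fit imputer, scaler, and Random Forest, then save all three with joblib."""
    X = df[FEATURES].astype(float)
    X[ZERO_AS_MISSING] = X[ZERO_AS_MISSING].replace(0, np.nan)
    y = df[target]

    # Replace missing values, then standardize the features.
    imputer = SimpleImputer(strategy="median")  # assumption: median imputation
    scaler = StandardScaler()
    X_prepared = scaler.fit_transform(imputer.fit_transform(X))

    # Hyperparameters here are illustrative defaults, not tuned values.
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_prepared, y)

    # Save every fitted object so the API can repeat the same preprocessing.
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, out / "model.joblib")      # hypothetical file names
    joblib.dump(imputer, out / "imputer.joblib")
    joblib.dump(scaler, out / "scaler.joblib")
```

Saving the imputer and scaler together with the model matters because new inputs must go through exactly the same preprocessing as the training data.
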

About the Pima Indian Diabetes dataset:

Pregnancies: Number of times the patient has been pregnant.
Glucose: Plasma glucose concentration after a 2-hour oral glucose tolerance test.
BloodPressure: Diastolic blood pressure (mm Hg).
SkinThickness: Thickness of the triceps skin fold (mm), used to estimate body fat.
Insulin: 2-hour serum insulin (μU/mL), an indicator of insulin levels in the body.
BMI: Body Mass Index, calculated as weight in kg / (height in m)^2.
DiabetesPedigreeFunction: A score based on family history and genetics to quantify diabetes risk.
Age: Age of the patient (years).

Task 4:

Problem Definition: Defining a business problem, such as maximizing profit or minimizing cost.

