
Karol Kruszyński
I am a self-taught Data Analyst📊
Skills 👇🏻
Python including Pandas, NumPy, Matplotlib and Seaborn
SQL including PostgreSQL and ETL
I'm not afraid of Excel
I can also use Tableau
Check out my portfolio below 👇🏻
© Karol Kruszyński. All rights reserved.
Real Estate in Slovakia
Stack: Python - Pandas - NumPy
Developed on Kaggle
1. In this project I focused on cleaning data for use in analysis and visualization. I removed the ranking columns, which are unusable because the criteria behind them are undocumented.
2. I removed invalid records, such as buildings listed as built at the very beginning of the Common Era.
3. Then I mapped the NA values, mostly using np.where()
4. I also converted string values to integer values using pd.to_numeric()
5. Finally, the data is ready for analysis and visualization.
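The cleaning steps above could look roughly like the sketch below. The sample rows, the column names (price, year_built, condition, rank_score) and the year threshold are assumptions made for illustration, not the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the real dataset's columns and values differ.
df = pd.DataFrame({
    "price": ["120000", "95000", "n/a", "210000"],
    "year_built": [1985, 12, 2004, 1999],
    "condition": ["good", None, "new", None],
    "rank_score": [3, 1, 4, 2],
})

# 1. Drop a ranking column whose criteria are undocumented.
df = df.drop(columns=["rank_score"])

# 2. Remove implausible construction years (the 1800 cutoff is an assumption).
df = df[df["year_built"] >= 1800].copy()

# 3. Map missing values to a placeholder with np.where().
df["condition"] = np.where(df["condition"].isna(), "unknown", df["condition"])

# 4. Convert strings to numbers; entries that cannot be parsed become NaN.
df["price"] = pd.to_numeric(df["price"], errors="coerce")

print(df)
```

Using errors="coerce" keeps the conversion from failing on stray text, at the cost of producing NaN values that still need to be handled afterwards.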

Market Sales Data Analysis
Stack: Python with Pandas - Matplotlib - Seaborn
Developed on Kaggle
1. I explored the dataset using df.shape, df.info() and df.columns, and checked for duplicates by counting unique values with .nunique() on the 'Invoice ID' column (the unique customer ID).
2. I did an initial analysis using df.describe()
3. Next I grouped the records by the 'Branch' column and calculated the median 'Rating' for each branch. I also calculated the correlation between 'Rating' and 'Quantity', which turned out to be a very weak negative linear correlation.
4. I created KPIs that calculate the average sales profit for each store and the average highest unit price.
5. Using Seaborn, I created charts for, among other things: distribution of ratings, distribution of payment methods by store, total sales by product line, total sales by day of the week, and hourly sales patterns.
6. Finally, I analyzed the results of the KPIs and charts and created recommendations for a campaign to increase sales among women and men.
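Steps 1 to 3 can be sketched as follows. Only the column names 'Invoice ID', 'Branch', 'Rating' and 'Quantity' come from the project description; the sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows standing in for the Kaggle dataset.
sales = pd.DataFrame({
    "Invoice ID": ["A-1", "A-2", "A-3", "A-4", "A-5", "A-6"],
    "Branch": ["A", "A", "B", "B", "C", "C"],
    "Rating": [9.1, 7.4, 5.3, 8.4, 6.0, 4.1],
    "Quantity": [7, 5, 8, 2, 6, 10],
})

# First look at size, dtypes and summary statistics.
print(sales.shape)
sales.info()
print(sales.describe())

# If the number of unique IDs equals the row count, no invoice is duplicated.
has_duplicates = sales["Invoice ID"].nunique() != len(sales)

# Median rating per branch.
median_rating = sales.groupby("Branch")["Rating"].median()

# Pearson correlation between rating and quantity (value lies in [-1, 1]).
rating_quantity_corr = sales["Rating"].corr(sales["Quantity"])

print(has_duplicates)
print(median_rating)
print(rating_quantity_corr)
```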


Shark Attacks Analysis
Stack: Python with Pandas - Matplotlib - Seaborn
Developed on Kaggle
1. I explored the dataset using basic data analysis functions such as info(), describe(), columns, etc.
2. I deleted NaN values and unnecessary columns from the dataset.
3. The next step was to prepare the data so that it was consistent and suitable for analysis. Among other things, I did the following:
Transforming column types and names
Deleting data older than 1900 due to probable distortions
Mapping data for the 'Type' column
Cleaning data in the 'Country', 'Sex' and 'Fatal' columns
4. Finally, I made several visualizations breaking down shark attacks by country of occurrence, gender of the victims, and number of incidents per decade.
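A minimal sketch of this preparation is shown below. The 'Year' column, the sample rows and the specific normalisation rules are assumptions; only 'Country', 'Sex' and 'Fatal' are named in the description.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset is much larger and messier.
attacks = pd.DataFrame({
    "Year": [1850, 1995, 2003, 2011, 2018],
    "Country": [" usa", "AUSTRALIA ", "usa", "South Africa", None],
    "Sex": ["M", "m ", "F", "M", "F"],
    "Fatal": ["N", "Y", "n", " N", "Y"],
})

# Drop rows with missing values, then keep records from 1900 onwards.
attacks = attacks.dropna().copy()
attacks = attacks[attacks["Year"] >= 1900].copy()

# Normalise the text columns so equal values compare equal.
attacks["Country"] = attacks["Country"].str.strip().str.upper()
attacks["Sex"] = attacks["Sex"].str.strip().str.upper()
attacks["Fatal"] = (
    attacks["Fatal"].str.strip().str.upper().map({"Y": True, "N": False})
)

# Bucket years into decades and count incidents per decade and per country.
attacks["Decade"] = (attacks["Year"] // 10) * 10
incidents_per_decade = attacks.groupby("Decade").size()
incidents_per_country = attacks["Country"].value_counts()

print(incidents_per_decade)
print(incidents_per_country)
```

Either resulting Series can be passed to a bar chart, for example with incidents_per_decade.plot(kind="bar").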


Bike Sales Analysis
Stack: Excel
Developed on GitHub
I've transformed raw data into actionable insights. 🧹 Cleaned, prepared, and analyzed with precision. 🔍 Pivot Tables for clarity, charts for visualization, and a dynamic dashboard to tie it all together. 💡 Check out the power of data-driven decision-making!



Eurostat Transportation Deaths
Stack: Excel & SQL
Developed on GitHub
Eurostat data shows that the total number of deaths in land transport in Poland from 1999 to 2021 was 99,546.
Car accidents are responsible for 42.19% of these deaths.
The trend line indicates a decreasing number of deaths in subsequent years.
The average for the years 2000-2005 was 2,537 deaths, while for the years 2018-2021 it was 1,220* (*no data from 2022 to complete the 6-year period).
This suggests that in about 13 years the average number of deaths caused by cars was cut roughly in half.
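The arithmetic behind these figures can be checked with a few lines of Python. The numbers are the ones quoted above; the variable names are illustrative only.

```python
# Figures quoted in the project summary.
total_deaths = 99_546      # land transport deaths in Poland, 1999-2021
car_share_pct = 42.19      # share attributed to car accidents
avg_2000_2005 = 2_537      # average for 2000-2005
avg_2018_2021 = 1_220      # average for 2018-2021

# Approximate number of deaths attributed to car accidents.
car_deaths = round(total_deaths * car_share_pct / 100)

# How much the average fell between the two periods.
ratio = avg_2000_2005 / avg_2018_2021
reduction_pct = (1 - avg_2018_2021 / avg_2000_2005) * 100

print(f"Car accident deaths: about {car_deaths}")
print(f"Earlier average is {ratio:.2f}x the later one")
print(f"Reduction: {reduction_pct:.1f}%")
```

The earlier average is about 2.08 times the later one, a drop of about 51.9%, which matches the "cut roughly in half" reading.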



Powerlifting Female Lifters Analysis
Stack: Python with Pandas, NumPy and Matplotlib
Developed on Kaggle
1. I used basic analysis functions such as info(), describe(), etc.
2. I listed the columns to be removed and filtered for positive results only. I defined criteria for the data I needed, such as:
Only IPF contests
Weight categories
Grouping by country
Median of the TotalKg score
3. I then grouped the top 5 and bottom 5 results and displayed a graph of the results. I noticed an inconsistency in the results and made adjustments, taking into account only the results for countries that had been represented at competitions at least 100 times.
4. I displayed the new adjusted scores and added a chart of the median score for the top 10 countries.
5. I created a graph with the error range.
6. I drew conclusions on how the results changed after applying the criterion of "above 100 performances".
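The effect of the minimum-appearances filter can be illustrated with a small sketch. The helper median_total_by_country, the 'Country' column name and the sample rows are hypothetical; 'TotalKg' is the column named above. The example uses a threshold of 2 instead of 100 so that it fits in a few rows.

```python
import pandas as pd

def median_total_by_country(df, min_count=100):
    """Median TotalKg per country, keeping countries with at least min_count results."""
    stats = df.groupby("Country")["TotalKg"].agg(["median", "count"])
    kept = stats[stats["count"] >= min_count]
    return kept.sort_values("median", ascending=False)

# Hypothetical sample: NOR has one very high result and nothing else.
lifts = pd.DataFrame({
    "Country": ["POL", "POL", "POL", "USA", "USA", "NOR"],
    "TotalKg": [400.0, 420.0, 440.0, 450.0, 470.0, 600.0],
})

unfiltered = median_total_by_country(lifts, min_count=1)
adjusted = median_total_by_country(lifts, min_count=2)

print(unfiltered)  # NOR leads on the strength of a single result
print(adjusted)    # NOR is dropped; USA and POL remain
```

Without the threshold, a country with a single strong lifter can top the ranking, which is the kind of inconsistency the adjustment is meant to remove.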

Net worth analysis
Stack: Python with Pandas, Matplotlib and scikit-learn
Developed on Kaggle
I performed a basic data analysis and cleaned the data by deleting and transforming columns. I compared the average value of each column by gender. I performed a linear regression of car value against age. I answered some important questions about the data:
1. How does age affect annual earnings?
2. How does credit card debt relative to car value vary with annual income?
3. Is there a correlation between net wealth and car value?
4. What percentage of annual income is credit card debt for different age groups?
5. Is there a correlation between age and car value?
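Questions 3 and 5 come down to a correlation and a straight-line fit. The sketch below shows the idea for age vs. car value on made-up data. The project lists scikit-learn in its stack; this example swaps in np.polyfit as a lighter way to fit the same kind of line, and the column names age and car_value are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; not taken from the real dataset.
people = pd.DataFrame({
    "age": [25, 32, 41, 48, 56, 63],
    "car_value": [12000, 15500, 21000, 24000, 30500, 33000],
})

# Pearson correlation between age and car value.
corr = people["age"].corr(people["car_value"])

# Least-squares line: car_value ~ slope * age + intercept.
slope, intercept = np.polyfit(people["age"], people["car_value"], deg=1)

# Use the fitted line to estimate the car value for a 40-year-old.
predicted_at_40 = slope * 40 + intercept

print(f"correlation: {corr:.3f}")
print(f"slope: {slope:.1f} per year of age, intercept: {intercept:.1f}")
print(f"estimated car value at age 40: {predicted_at_40:.0f}")
```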
