Multiprocessing with Pandas
Pandas is a popular Python library for working with structured data, but most of its operations run in a single process on a single CPU core. Python's multiprocessing module lets you split independent pieces of a data processing task across several processes, which can shorten the runtime of data analysis tasks on multi-core machines.
In this post, we're going to use multiprocessing to process each subset of our dataframe in parallel.
Instead of handling the groups of a group-by one after another, a pool of worker processes handles each group separately, so several groups can be processed at the same time on different cores. For large enough workloads, this can reduce the total runtime.
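from multiprocessing import Pool
import pandas as pd

# Despite its name, this function sums each statistic for a single team.
def take_mean_age(dataframe):
    # Each item produced by groupby is a (team name, sub-DataFrame) pair.
    team, group = dataframe
    return pd.DataFrame({
        "Goals Scored": group["GF"].sum(),
        "Goals Conceded": group["GA"].sum(),
        "Total Points scored": group["Pts"].sum(),
        # Sum of the team's finishing positions over all its seasons.
        "Final position of the team": group["Pos"].sum(),
        "Win": group["W"].sum(),
        "Draw": group["D"].sum(),
        "Loss": group["L"].sum(),
    }, index=[team])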
pl = pd.read_csv(
    'EPL Standings 2000-2022.csv',
    usecols=['Season', 'Pos', 'Team', 'Pld', 'W', 'D', 'L', 'GF', 'GA', 'GD', 'Pts'],
)
pl
| | Season | Pos | Team | Pld | W | D | L | GF | GA | GD | Pts |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2000-01 | 1 | Manchester United | 38 | 24 | 8 | 6 | 79 | 31 | 48 | 80 |
1 | 2000-01 | 2 | Arsenal | 38 | 20 | 10 | 8 | 63 | 38 | 25 | 70 |
2 | 2000-01 | 3 | Liverpool | 38 | 20 | 9 | 9 | 71 | 39 | 32 | 69 |
3 | 2000-01 | 4 | Leeds United | 38 | 20 | 8 | 10 | 64 | 43 | 21 | 68 |
4 | 2000-01 | 5 | Ipswich Town | 38 | 20 | 6 | 12 | 57 | 42 | 15 | 66 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
435 | 2021-22 | 16 | Everton | 38 | 11 | 6 | 21 | 43 | 66 | -23 | 39 |
436 | 2021-22 | 17 | Leeds United | 38 | 9 | 11 | 18 | 42 | 79 | -37 | 38 |
437 | 2021-22 | 18 | Burnley | 38 | 7 | 14 | 17 | 34 | 53 | -19 | 35 |
438 | 2021-22 | 19 | Watford | 38 | 6 | 5 | 27 | 34 | 77 | -43 | 23 |
439 | 2021-22 | 20 | Norwich City | 38 | 5 | 7 | 26 | 23 | 84 | -61 | 22 |
440 rows × 11 columns
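The worker function above unpacks its argument into a team name and a sub-DataFrame because iterating over a group-by yields exactly such pairs. The following is a minimal sketch of that behaviour; the `toy` table and its values are invented for illustration and are not taken from the EPL dataset.

```python
import pandas as pd

# Hypothetical miniature of the standings table (values invented).
toy = pd.DataFrame({
    "Team": ["A", "B", "A", "B"],
    "GF": [10, 7, 12, 5],
    "Pts": [30, 21, 33, 18],
})

# Iterating over a GroupBy yields (group_key, sub_dataframe) pairs,
# sorted by key by default.
pairs = list(toy.groupby("Team"))

for team, group in pairs:
    # Prints the key, the number of rows in the group, and the summed points.
    print(team, len(group), group["Pts"].sum())
```

Each of these pairs is what a worker process receives as its single argument when the group-by is passed to `Pool.map`.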
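# Start 4 worker processes; each (team, group) pair is sent to one of them.
# When run as a script on Windows or macOS, put this under `if __name__ == "__main__":`.
with Pool(4) as p:
    results = p.map(take_mean_age, pl.groupby("Team"))
# Stack the one-row summaries into a single DataFrame, one row per team.
results_df = pd.concat(results)
results_df.sort_values('Total Points scored', ascending=False)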
| | Goals Scored | Goals Conceded | Total Points scored | Final position of the team | Win | Draw | Loss |
---|---|---|---|---|---|---|---|
Manchester United | 1562 | 782 | 1698 | 63 | 507 | 177 | 152 |
Chelsea | 1538 | 757 | 1665 | 73 | 492 | 189 | 155 |
Arsenal | 1561 | 876 | 1603 | 83 | 470 | 193 | 173 |
Liverpool | 1516 | 808 | 1591 | 91 | 464 | 199 | 173 |
Manchester City | 1478 | 846 | 1440 | 126 | 428 | 156 | 214 |
Tottenham Hotspur | 1323 | 1011 | 1370 | 141 | 393 | 191 | 252 |
Everton | 1102 | 1059 | 1191 | 202 | 321 | 228 | 287 |
Newcastle United | 943 | 1100 | 973 | 222 | 260 | 193 | 307 |
West Ham United | 908 | 1060 | 895 | 225 | 238 | 181 | 303 |
Aston Villa | 866 | 1009 | 885 | 227 | 225 | 210 | 287 |
Southampton | 698 | 803 | 704 | 178 | 181 | 161 | 228 |
Fulham | 631 | 831 | 640 | 198 | 162 | 154 | 254 |
Leicester City | 583 | 592 | 554 | 114 | 150 | 104 | 164 |
Blackburn Rovers | 518 | 592 | 530 | 128 | 140 | 110 | 168 |
Sunderland | 520 | 795 | 520 | 215 | 127 | 139 | 266 |
Bolton Wanderers | 495 | 613 | 506 | 137 | 132 | 110 | 176 |
West Bromwich Albion | 510 | 772 | 490 | 197 | 117 | 139 | 238 |
Stoke City | 398 | 525 | 457 | 122 | 116 | 109 | 155 |
Crystal Palace | 428 | 545 | 437 | 131 | 115 | 92 | 173 |
Middlesbrough | 414 | 503 | 432 | 132 | 107 | 111 | 162 |
Wolverhampton Wanderers | 328 | 462 | 348 | 109 | 90 | 78 | 136 |
Wigan Athletic | 316 | 482 | 331 | 117 | 85 | 76 | 143 |
Charlton Athletic | 301 | 386 | 325 | 85 | 85 | 70 | 111 |
Burnley | 300 | 455 | 325 | 120 | 83 | 76 | 145 |
Swansea City | 306 | 383 | 312 | 85 | 82 | 66 | 118 |
Leeds United | 319 | 349 | 311 | 69 | 87 | 50 | 91 |
Birmingham City | 273 | 360 | 301 | 99 | 73 | 82 | 111 |
Portsmouth | 292 | 380 | 293 | 97 | 79 | 65 | 122 |
Watford | 275 | 441 | 261 | 113 | 67 | 60 | 139 |
Norwich City | 251 | 489 | 234 | 119 | 56 | 66 | 144 |
Bournemouth | 241 | 330 | 211 | 69 | 56 | 43 | 91 |
Brighton & Hove Albion | 190 | 258 | 209 | 72 | 48 | 65 | 77 |
Hull City | 181 | 323 | 171 | 88 | 41 | 48 | 101 |
Reading | 136 | 186 | 119 | 45 | 32 | 23 | 59 |
Sheffield United | 91 | 157 | 115 | 47 | 31 | 22 | 61 |
Ipswich Town | 98 | 106 | 102 | 23 | 29 | 15 | 32 |
Queens Park Rangers | 115 | 199 | 92 | 57 | 22 | 26 | 66 |
Derby County | 90 | 211 | 83 | 56 | 19 | 26 | 69 |
Cardiff City | 66 | 143 | 64 | 38 | 17 | 13 | 46 |
Huddersfield Town | 50 | 134 | 53 | 36 | 12 | 17 | 47 |
Brentford | 48 | 56 | 46 | 13 | 13 | 7 | 18 |
Blackpool | 55 | 78 | 39 | 19 | 10 | 9 | 19 |
Coventry City | 36 | 63 | 34 | 19 | 8 | 10 | 20 |
Bradford City | 30 | 70 | 26 | 20 | 5 | 11 | 22 |
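One way to sanity-check a parallel aggregation like this is to compare it with the plain, sequential group-by sum. The sketch below does that on a small invented table; `summarize_group`, `make_toy_standings` and `summarize_sequential` are hypothetical helper names, and only three of the columns are used to keep it short.

```python
from multiprocessing import Pool

import pandas as pd


def summarize_group(item):
    """Sum selected columns of one (team, sub-DataFrame) pair."""
    team, group = item
    return pd.DataFrame(
        {
            "GF": group["GF"].sum(),
            "GA": group["GA"].sum(),
            "Pts": group["Pts"].sum(),
        },
        index=[team],
    )


def make_toy_standings():
    """Build a small hypothetical table; the values are invented."""
    return pd.DataFrame({
        "Team": ["A", "B", "C", "A", "B", "C"],
        "GF": [10, 7, 3, 12, 5, 8],
        "GA": [4, 9, 11, 6, 10, 7],
        "Pts": [30, 21, 9, 33, 18, 24],
    })


def summarize_sequential(df):
    """Reference result computed by pandas alone, without worker processes."""
    return df.groupby("Team")[["GF", "GA", "Pts"]].sum()


if __name__ == "__main__":
    toy = make_toy_standings()
    # Two worker processes share the three groups between them.
    with Pool(2) as pool:
        parallel = pd.concat(pool.map(summarize_group, toy.groupby("Team")))
    # Compare values only; the index name differs between the two results.
    assert (parallel.to_numpy() == summarize_sequential(toy).to_numpy()).all()
    print(parallel)
```

For a table this small, starting the processes and pickling each group will likely cost more than the work itself, so the parallel version is only expected to pay off when the per-group computation is expensive.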
Conclusion
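Python's multiprocessing module can parallelize per-group Pandas computations: map a function over the groups of a group-by with a Pool, then concatenate the per-group results into a single dataframe.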