Ever since the revolution of data analytics and the success of Sabermetrics (commonly known as Moneyball) in baseball, many sports have started adopting data to get better results.
Cricket, being one of the most popular sports in the world, also went down this same path and came up with many metrics to contribute to the success of the teams and the players. These metrics range from standard metrics like the Strike rate of a Batsman, Economy of a Bowler, etc. to advanced metrics like Scoring Zones and Next Ball Probabilities.
But first, to calculate or devise these metrics, we need cricket datasets.
What are datasets?
A dataset is a collection of records in which each record lists a value for every variable being measured, for example the height and weight of each object in the collection.
When it comes to cricket, there are two types of data being recorded.
- Event Data
- Tracking Data
1. Event Data
Event data in cricket records information about every ball bowled in the game, such as the batsman on strike, the bowler, the runs scored on that ball, whether a wicket fell on that ball, etc. Event data in cricket is also known as ball-by-ball data. Using this data, we can compute metrics like the Strike Rate of a Batsman, Runs per over, Economy of a bowler, etc.
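To make the idea of ball-by-ball data concrete, here is a minimal sketch of one way such records might be represented and aggregated into runs per over. The `Ball` class, its field names, and the sample deliveries are all made up for illustration and do not follow any particular provider's schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Ball:
    """One delivery. Field names are illustrative, not a standard schema."""
    over: int         # zero-based over number
    batter: str       # batsman on strike
    bowler: str
    batter_runs: int  # runs credited to the batsman
    extras: int       # wides, no-balls, byes, leg-byes
    wicket: bool      # True if a wicket fell on this ball


def runs_per_over(balls):
    """Sum total runs (batter runs plus extras) for each over."""
    totals = defaultdict(int)
    for ball in balls:
        totals[ball.over] += ball.batter_runs + ball.extras
    return dict(totals)


# Hypothetical sample: three deliveries in over 0, two in over 1.
sample = [
    Ball(0, "A", "X", 4, 0, False),
    Ball(0, "A", "X", 0, 1, False),
    Ball(0, "B", "X", 1, 0, False),
    Ball(1, "B", "Y", 0, 0, True),
    Ball(1, "C", "Y", 6, 0, False),
]
over_totals = runs_per_over(sample)
print(over_totals)  # {0: 6, 1: 6}
```

The same loop-and-group pattern extends to other event-data metrics: group by batter instead of over, for example, to get each batsman's runs.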
2. Tracking Data
Tracking Data, as the name suggests, tracks the trajectory of the ball, the pace of the ball, the 3-dimensional coordinates of the ball, the direction and force with which the batsman strikes the ball, the location of the ball after the strike, etc., using advanced Computer Vision technology like Hawk-Eye.
This data is more complex compared to event data and it is not available to the public. It’s mostly used by high-profile teams, institutions, and organizations. Tracking Data is used to compute advanced metrics like Connection Percentage, False Shot Percentage, and also metrics like Next Ball Probability and Score Simulation Models using Machine Learning.
List of websites that have large Cricket Datasets
In this section, we will be covering the websites to find cricket datasets. Since Tracking Data isn’t publicly available, we will only be discussing the websites for event data and datasets which are aggregated from the event data.
Cricsheet is one of the most reliable sources of ball-by-ball cricket data accessible to the public. It provides ball-by-ball data for various competitions and formats of cricket, like Men's and Women's Test Matches, One-day Internationals, Twenty20 Internationals, and multiple club competitions such as the Big Bash League, County Championship, Caribbean Premier League, Indian Premier League, etc. The data ranges from the year 2002 to the current year, 2022. Cricsheet also provides an option to download its cricket data in various formats like JSON, YAML, CSV, and XML.
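As a rough illustration of working with a downloaded JSON match file, the sketch below flattens a nested match document into one row per delivery. The key names (`innings`, `overs`, `deliveries`, `runs`) are an assumption about the file layout and the embedded sample is hand-written, so check both against a real download before relying on them.

```python
import json

# A tiny hand-written document shaped like a ball-by-ball JSON match file.
# ASSUMPTION: the key names below are illustrative; verify them against
# the format documentation of the file you actually download.
raw = """
{
  "info": {"teams": ["Team A", "Team B"]},
  "innings": [
    {
      "team": "Team A",
      "overs": [
        {
          "over": 0,
          "deliveries": [
            {"batter": "P1", "bowler": "Q1",
             "runs": {"batter": 4, "extras": 0, "total": 4}},
            {"batter": "P1", "bowler": "Q1",
             "runs": {"batter": 0, "extras": 1, "total": 1}}
          ]
        }
      ]
    }
  ]
}
"""


def flatten_match(match):
    """Return one flat dict per delivery from the nested match structure."""
    rows = []
    for innings in match.get("innings", []):
        for over in innings.get("overs", []):
            for delivery in over.get("deliveries", []):
                rows.append({
                    "team": innings["team"],
                    "over": over["over"],
                    "batter": delivery["batter"],
                    "bowler": delivery["bowler"],
                    "total_runs": delivery["runs"]["total"],
                })
    return rows


rows = flatten_match(json.loads(raw))
for row in rows:
    print(row)
```

For a real file, `json.loads(raw)` would be replaced by `json.load` on an opened file handle. A flat list of rows like this is convenient to pass on to a CSV writer or a data-frame library.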
ESPN Cricinfo is a cricket news website that features articles, live coverage of matches, and also a cricket database known as Statsguru. Even though this database doesn’t contain pure ball-by-ball data, it has data aggregated from ball-by-ball data into various metrics.
Statsguru allows the users to provide queries to its database to access the necessary information. The data available in Statsguru is quite extensive as it covers almost every official cricket match that has ever been played and the oldest recorded data in Statsguru goes all the way back to the 1800s.
Cricmetric is a website devoted to Cricket statistics and analytics. Similar to ESPN Cricinfo, Cricmetric doesn’t provide pure ball-by-ball data to its users but rather provides metrics that are aggregated from ball-by-ball data.
These metrics can be filtered on many levels like the competition, club, and country. The filtered data can be directly downloaded into a CSV if required. Cricmetric also provides a few of its in-house innovative metrics like Win Probability Added (WPA), Runs Above Average (RAA), EigenFactor Score (EFScore), and interactive dashboards to make analysis and draw insights directly from the website.
Cricbuzz is an Indian cricket news website owned by Times Internet. It features news, articles, and live coverage of cricket matches including videos, text commentary, player stats, and team rankings. Cricbuzz provides archives of aggregated stats in the form of scorecards and points tables of various matches and competitions. Similar to ESPN Cricinfo, some of the data available for these matches goes all the way back to the 1800s.
HowSTAT is another website that provides one of the most comprehensive collections of cricket records, statistics, and graphs relating to every facet of international cricket. It provides stats at different levels of aggregation, such as by player, player position, series, and match. It also has its own rating tool that rates the performances of players, and a player comparison tool that gives the user a brief comparison between players based on selected stats.
Different Statistics used in Cricket
There are different types of standard and advanced stats used in cricket. Standard stats are calculated by aggregating the ball-by-ball data and are simple to understand. Advanced stats, on the other hand, are usually calculated using tracking data and advanced techniques like Machine Learning and Computer Vision.
- Matches: The number of matches the player played in.
- Innings: The number of innings the player played in.
Batting metrics are the metrics used to analyze the performance of a batsman or a batting team.
- Runs: The number of runs scored.
- Batting average: The total number of runs divided by the total number of innings in which the batsman was out.
- Balls faced: The total number of balls received, including no-balls but not wides.
- Strike rate: The average number of runs scored per 100 balls faced.
- Run rate: The average number of runs a batsman scores in an over of 6 balls.
- 100s: The number of centuries scored by a batsman.
- 50s: The number of half-centuries scored by a batsman.
- 0s: The number of ducks (dismissals without scoring a run) recorded by a batsman.
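The batting definitions above translate directly into a few lines of arithmetic. The following is a minimal sketch, assuming each innings is given as a hypothetical `(runs, balls_faced, was_out)` tuple; the function name and the sample numbers are made up for illustration.

```python
def batting_summary(innings):
    """Aggregate batting metrics.

    innings: list of (runs, balls_faced, was_out) tuples, one per innings.
    This input shape is an assumption made for the example.
    """
    runs = sum(r for r, _, _ in innings)
    balls = sum(b for _, b, _ in innings)
    dismissals = sum(1 for _, _, out in innings if out)
    return {
        "innings": len(innings),
        "runs": runs,
        "balls_faced": balls,
        # Average divides by dismissals, not by innings played.
        "average": runs / dismissals if dismissals else None,
        # Strike rate is runs per 100 balls faced.
        "strike_rate": 100 * runs / balls if balls else None,
        "100s": sum(1 for r, _, _ in innings if r >= 100),
        "50s": sum(1 for r, _, _ in innings if 50 <= r < 100),
        # A duck requires being dismissed for zero.
        "0s": sum(1 for r, _, out in innings if r == 0 and out),
    }


# Hypothetical career of four innings, one of them not out.
sample_innings = [(120, 100, True), (55, 60, False), (0, 3, True), (25, 37, True)]
summary = batting_summary(sample_innings)
print(summary)
```

Here the batsman has 200 runs from 200 balls with three dismissals, so the strike rate is 100.0 and the average is 200 / 3, roughly 66.67. The not-out innings adds runs without adding a dismissal, which is why the average can exceed runs per innings.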
Bowling metrics are the metrics used to analyze the performance of a bowler or a bowling team.
- Overs: The number of overs bowled.
- Balls: The number of balls bowled.
- Maiden overs: The number of maiden overs (overs in which the bowler conceded zero runs) bowled.
- Runs: The number of runs conceded.
- Wickets: The number of wickets taken.
- No-balls: The number of no-balls bowled.
- Wides: The number of wides bowled.
- Bowling average: The average number of runs conceded per wicket.
- Strike rate: The average number of balls bowled per wicket taken.
- Economy rate: The average number of runs conceded per over.
- Best bowling: The bowler’s best bowling performance, defined as firstly the greatest number of wickets, secondly the fewest runs conceded for that number of wickets.
- Hat-tricks: The number of times a bowler takes three wickets with consecutive deliveries.
- Fifers: The number of times a bowler took 5 wickets in an innings.
- Catches: The number of catches taken by a player.
- Stumpings: Number of stumpings made by a wicket-keeper.
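The bowling ratios above can be sketched in the same way. This example assumes the totals (legal balls bowled, runs conceded, wickets) are already known, and that overs are written in the usual scorecard notation where "4.2" means 4 overs and 2 balls. All function names and figures are hypothetical.

```python
def overs_to_balls(overs):
    """Convert scorecard notation such as '4.2' (4 overs, 2 balls) to balls."""
    whole, _, part = str(overs).partition(".")
    return int(whole) * 6 + int(part or 0)


def bowling_summary(balls, runs, wickets):
    """Bowling average, strike rate, and economy rate from raw totals."""
    return {
        "average": runs / wickets if wickets else None,      # runs per wicket
        "strike_rate": balls / wickets if wickets else None,  # balls per wicket
        "economy": 6 * runs / balls if balls else None,       # runs per over
    }


def best_bowling(figures):
    """Pick the best (wickets, runs) pair: most wickets, then fewest runs."""
    return max(figures, key=lambda f: (f[0], -f[1]))


# Hypothetical spell: 10 overs, 48 runs conceded, 3 wickets.
balls = overs_to_balls("10.0")
spell = bowling_summary(balls, runs=48, wickets=3)
print(spell)

# Hypothetical per-innings figures as (wickets, runs).
best = best_bowling([(3, 20), (5, 40), (5, 32), (4, 10)])
print(best)  # (5, 32)
```

Passing overs as a string avoids floating-point surprises, since "4.2" is notation for overs and balls rather than a decimal number.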
Cricket analytics is in its nascent stage compared to sports like Basketball, Football and American Football. With cricket teams and organizations increasing the use of analytics in scouting players, opposition analysis and team analysis, it is likely that they will need an influx of cricket analysts in the near future.
By working with the ball-by-ball data and metrics used in cricket provided above, one can gain an upper hand over other aspiring cricket analysts!
Frequently Asked Questions
1. What is cricket data analysis?
Cricket data analysis involves studying ball tracking, bowling angles, player shots, run rates, predicted scores, match predictions, etc.
2. What is the use of data in cricket?
Data is used in cricket to analyze the game, improve the overall team performance, and increase winning chances.
3. What is the meaning of metric?
It is a system or a standard of measurement.
4. Which companies provide cricket data?
ESPNcricinfo, Cricbuzz, Crictracker, Cricviz, Cricmetric, Sportsmechanics, Formcept, etc.