Point & Figure Calculations with Python and MySQL
How to save plot symbols and buy/sell signals to a database for quantitative analysis.
A few weeks ago, I posted an article about how to use Python and MySQL to set up an ETL data pipeline into Amazon RDS for stock market data, using stock market data from Intrinio. In this article, I’m going to explain one way to make use of that RDS database to store calculations and generate trading signals using the Point & Figure charting methodology popular with many momentum-focused investors. This process assumes that you have already built and populated a historical database of security prices in AWS RDS as described in my previous article:
Build Your Own Stock Market Data ETL Pipeline on AWS RDS
Over the last decade, the internet has delivered a huge volume of stock market data into the hands of individual…
Point & Figure charting has been around for a long time. It was invented in the late nineteenth century and was used by Charles Dow and many other stock market analysts before computers were invented. P&F charts consist of alternating columns of “X” and “O” symbols, representing periods of rising or falling stock prices respectively. Stockcharts.com and Investopedia both provide extensive explanations of how P&F charts work. If you want to really study P&F charting in detail, the two most authoritative books on this topic are
Point & Figure Charting by Thomas Dorsey
The Definitive Guide to Point and Figure by Jeremy Du Plessis
In this explanation, I use the term “stock price” to represent any time series of price movements, which could be for a stock, ETF, index or any tradable market instrument. Essentially, when a stock’s price is rising, it is reflected in a rising column of X’s. A new X symbol is added to the top of the column when the price rises by more than a set threshold, called the “box size”. Conversely, a new O symbol is added to the bottom of the column when the stock price drops by more than the set box size. Box sizes can be set using several methods. I prefer the percentage method which automatically scales with the stock price.
Setting the box size higher will filter out more noise and highlight longer-term trends. A box size threshold is typically set higher for high volatility instruments, such as individual stocks, and lower for slower-moving instruments such as mutual funds or ETFs. Typically, box size settings range between 1% and 6%.
In this way, P&F charting filters out the noise by only recording price movements larger than the box size. A P&F chart will reverse from X’s to O’s, or vice versa, when the stock price has reversed by more than a set number of boxes, typically 2 or 3 boxes. Most P&F practitioners use a 3-box reversal threshold.
A “BUY” or “SELL” signal is observed when a column of X’s rises above the highest level of the most recent previous X column, or a column of O’s falls below the lowest level of the most recent previous column of O’s. Stop-loss points, entry points and target prices can be calculated in similar programmatic ways.
For example, here is a Point & Figure chart for AAPL with the box size set at 2% using the 3-box reversal threshold:
Here you can see how easy it is to observe trading signals as well as overall trends (blue line). It is this systematic method of generating trading signals that makes P&F charting so interesting to me. In this article, we will focus on how to generate these same patterns and signals in a database of stock prices using programming logic, i.e. a series of nested if/else statements in Python.
Why it be useful to record P&F calculations in a database? Well, as I stated above, the Point & Figure methodology provides clear unambiguous trend indicators and trading signals, which can be combined with other quantitative factors to produce profitable systematic trading strategies.
P&F calculations can also be applied to aggregations of stock prices at the industry, sector and market index level to reveal underlying patterns of relative strength which are not easily observed in the popular market cap weighted indexes. For example, we can calculate the percentage of stocks displaying BUY or SELL signals in the S&P 500 index, and then observe recent changes in this calculation, which could be going in the opposite direction from the price of the S&P 500 index itself. When that happens, you would know that the underlying “market internals” are not reflected in the current index price trend and it might be time to change your trading stance from bullish to bearish or vice versa.
In this and subsequent articles, I will spell out both the business logic and Python/SQL code to get this information into a useful database form using MySQL inside the AWS RDS database environment.
The first step is to create a historical record of stock prices and calculate the Point & Figure plot symbols and trading signals for each daily record. For this process, we’re going to need an Amazon Web Services account to set up an RDS instance and an S3 bucket to store the data, along with a JSON file in a secure location to store our login credentials. I explain how to do that in my previous ETL article.
Now we begin writing the Python script to download the data and run the P&F calculations. I chose to use Python for this purpose because the algorithm will need to group the data by FIGI code, then loop through each historical daily price record and key off the daily closing prices, i.e. run the calculations based on the percentage change from the prior day’s closing price. That’s just a whole lot easier to do in Python rather than SQL. Once we create the calculations in a Pandas dataframe, we will push them to an S3 bucket for archive purposes, and to an RDS instance for future queries.
The actual Python scripts are available on my Github account at
GitHub - bnsheehy/Stock-Market_Data_ETL: A collection of Jupyter Notebook Python files for…
A collection of Jupyter Notebook Python files for collecting stock market data in an AWS RDS database. - GitHub …
Here, we will walk through the business logic.
P&F Calculation History Table — The Python modules are set up to execute the following steps:
Import credentials and libraries — code is explained in the previous article linked above. Essentially, I store my credentials in a separate JSON file on my hard drive and upload them into the script. An example JSON file is included in my Github directory linked above.
Download and assemble the Intrinio bulk market data files — For bulk historical data downloads, Intrinio provides a set of zipped CSV files accessible from a JSON dictionary list containing URL links to the files. Since this script only needs to be run one time, you can either download the files manually from your Intrinio dashboard and pull them into a Pandas DataFrame using the pd.read.csv command, or you can use the Python urllib.request library to download them directly into a DataFrame. I have included modules for both methods in the Jupyter file on Github.
We could pull this data from the historical price data table we created in the previous ETL process, however since I’m running my RDS instance on the AWS Free Pricing Tier, there is a limit on the size of the data packages that can be pulled in each query, which means I’d have to pull the data in smaller chunks. So it would be faster to just pull the initial history load directly from Intrinio’s bulk download API.
Filter the price history data for only equities, ADRs and ETFs — Since right now, we only care about US common equities, ADRs and ETFs, we use one of Intrinio’s APIs to grab the most current list of tickers and FIGI codes for US exchanges. We use this list to filter the larger price history data set to eliminate any instruments, such as preferred stocks and warrants, that we don’t want in our universe. This step is optional, but saves some time in the calculation process.
Clean up the data — This step involves a variety of transformations that include
- Dropping any records that don’t yet have a FIGI code
- Dropping any duplicate rows that might show up. Dupes are based on ticker + figi + date columns.
- Dropping columns we don’t need
- Renaming the remaining columns as needed
- Sorting the records by ticker and date
We also add the new P&F data columns in this step that we will be filling in according to the logic explained in the next step.
Run all the Point & Figure calculations — Now we get into the real meat and potatoes of our project. The actual algorithms written in Python code are available in the script you can download from my Github account as mentioned above. Here, I will walk you through the logic in “pseudo code”, which you could use as a guide if you wanted to use a different scripting language instead, like Java, Scala or R.
We start with a dataframe structure that contains the following columns of price data for each stock, ETF or other trading instrument that you want to include in your investing universe, sorted by Date in ascending order:
- Ticker Symbol
- Adjusted High Price
- Adjusted Low Price
- Adjusted Close Price
Optionally, you can also include FIGI or CUSIP codes for dead historical tickers and Adjusted Open prices. In this exercise, we will focus on end-of-day close prices, but P&F calculations can also be applied to intraday periods and can use high and low prices in reversal calculations as well. So the initial dataframe could look something like this:
Then we add the following columns to the dataframe:
- percent_change — calculated as the percentage change in Close price from one period to the next
- plot_symbol — to be filled in with “X” or “O” values depending on the algorithm described below
- reversal — boolean, to be filled in with 1 or 0
- signal_name — to be filled in with “BUY” or “SELL” values depending on the algorithm described below
- high_point — the highest price reached in the current X column
- last_high_point — the highest price reached in the last X column before it reversed to an O column
- prev_high_point — the highest price reached in the previous X column before it reversed to an O column
- low_point — the lowest price reached in the current O column
- last_low_point — the lowest price reached in the last O column before it reversed to an X column
- prev_low_point — the lowest price reached in the previous O column before it reversed to an X column
- entry_x — previous day’s closing price when the plot_symbol reverses from X to O, used in next_entry and stop_loss calcs
- entry_o — previous day’s closing price when the plot_symbol reverses from O to X, used in next_entry and stop_loss calcs
- next_entry — if Signal = BUY, set at one box up from the price at the last reversal from X to O, which should be near the top of the previous X column. Else if Signal = SELL, Set at one box down from the price at the last reversal from O to X, which should be near the bottom of the previous O column
- stop_loss — if Signal = BUY, set at one box down from the price at the last reversal from O to X, which should be near the bottom of the previous O column. Else if Signal = SELL, set at one box up from the price at the last reversal from X to O, which should be near the top of the previous X column.
- target_price — Upon reversal from SELL to BUY, set equal to the size of the previous X column, times the box size, added to the bottom of the previous X column. Once calculated, it does not change for the balance of the current BUY signal. Or upon reversal from BUY to SELL, set equal to the size of the previous O column, times the box size, subtracted from the top of the previous O column. Once calculated, it does not change for the balance of the current SELL signal.
Except for the percent_change column, each of the new columns above will initially be set to NULL, and filled in one row at a time as we iterate through the dataframe for each stock.
Starting with the first row for each stock, we will set the initial Day 1 values for plot_symbol and signal_name at “X” and “BUY” respectively. Then we will set the initial Day 1 values for the other columns equal to the Close price for the first day.
Next, set boxSize = 0.02, reversalThreshold = 3 and reversalAmount = boxSize x reversalThreshold. These are the parameters used to adjust for volatility as explained above.
Then we loop through each new day, and apply the following logic (pseudo-code):
Archive the data to S3 in zipped CSV and Parquet files — Once the final historical DataFrame is created and transformed into the proper format, we want to save a copy of it to our S3 bucket. I like to save it in both compressed CSV and parquet formats. S3 storage space is cheap, so why not? That way, you always have a clean copy if something goes awry.
AWS provides a library, Boto3, that facilitates moving files in and out of their S3 storage system. Amazon’s Boto documentation is extensive, but hard to follow. For a quick start tutorial, I found this site, RealPython.com to be more helpful.
For those who don’t know about the parquet file format, it is designed to feed into “column-oriented” databases such as Amazon’s Redshift database. Column-oriented databases are designed to handle very large datasets, consisting of many columns and billions of rows. Without going into all the details, the primary advantage of these databases is that they allow you to run queries much faster, which facilitates more advanced machine learning algorithms. So down the road, if you want to start running complicated algorithms against this data, saving it in a parquet format will allow you to easily import it into Redshift or other “Big Data” platforms for more creative analyses. Here’s a notebook that you might want to check out to compare the performance of the most popular file storage formats — CSV, parquet and HDF5.
Create the P&F data history table in RDS and upload the data — Once the DataFrame is ready, and a copy is saved to S3, you want to go ahead and upload it to the RDS P&F data history table. Once again, there are several ways to accomplish this task.
Since you may only need to do this one time, you could just save the price history data to a large CSV file and use a couple of SQL statements to create the table and upload the data manually. These commands would do the trick in MySQL Workbench or DBeaver:
Create the P&F Data History table:
Load the data:
If you have a fat fiber optic internet pipe like I do, this operation could take as little as 20 minutes. Or if you have a more typical cable pipe with slower upload speeds, it could take several hours. In total, you’d be loading about 12 Gb of data.
Once the data is loaded, you want to index several of the fields that you will probably be using in queries more often. I suggest that you index the ticker, figi, date, last_update_date, last_corporate_action_date, and key_id fields. Indexing will dramatically speed up queries that filter results using these data fields. The key_id field is a concatenated combination of date, ticker and FIGI code. You could use these SQL commands to create the indexes:
Index the fields:
Finally, I suggest you also create a full backup price history table, again just in case something gets screwed up and you need a clean version. These SQL commands would do that trick:
Create backup table:
Now once you’re done with this table, I would suggest you run this process and replace the PnF history table entirely about once a quarter. Daily updates are great and pretty reliable, but Murphy’s Law says some details will be missed. In that case, you might want to run the whole thing from a Python script and not rely on manual SQL commands. For that purpose, I included a module in the Notebook that uses the SQL Alchemy library to execute the SQL commands above straight from the Jupyter Notebook interface. It is slower, but more automated.
Again, that Notebook is located at
GitHub - bnsheehy/Stock-Market_Data_ETL: A collection of Jupyter Notebook Python files for…
A collection of Jupyter Notebook Python files for collecting stock market data in an AWS RDS database. - GitHub …
The next step is to create a script to update the table with new daily stock prices and Point & Figure calculations. The process involves the following steps:
- Get the most recent date from the P&F history table we created above.
- Get the most recent daily stock price data (open, high, low, close, volume) from the last date to the present date.
- Get the most recent record for each ticker/FIGI from the P&F historical data table, and join it to the price data records we just retrieved. This gives us the most recent P&F calculations for each ticker/FIGI from which we will begin the new calculations.
- Run the P&F calculations on the new price data. Make sure to set the beginning values for the P&F calculations equal to those in the most recent records retrieved in the previous step.
- Remove the oldest day’s records from the new calculations so there is no overlap with the historical data table when we insert the new data into that table.
- Save the new data in a CSV file on AWS S3 for archive purposes, in case something goes wrong and we need to reload it.
- Finally, insert the new data to the P&F historical data table in RDS that we created above using the SQLAlchemy Python library.
Each of these steps is represented as a function in the Python script which you can view in my Github account. Once this update script is run, you should be able to do a quick QC check in your MySQL IDE app, such as MySQL Workbench or DBeaver. The following commands should give you the same values as printed in the Python script output.
Confirm the most recent dates in the historical table:
SELECT MAX(date) FROM base_pnf_data_historical;SELECT MAX(last_updated_date) FROM base_pnf_data_historical;
Confirm the number of new records inserted:
SELECT COUNT(figi) FROM base_pnf_data_historical WHERE date BETWEEN '2021–11–29' AND '2021–12–03';
Finally, after confirming that the daily update process will run successfully for a few days in a row, we will want to automate the update process so we don’t have to keep re-running it manually. I found the best app for automating this process is Prefect. Prefect works well with Python and offers a free tier for small jobs like this one. I have provided more details on how to implement Prefect in my previous ETL article.
In my Github repository, I have provided versions of the full Python codes for each process, both with and without the Prefect implementation.
Now that we have the Point & Figure calculations created and updated, what kinds of cool things can we do with them? Well, for one thing, remember the example I described above about calculating the percentage of stocks displaying BUY or SELL signals in the S&P 500 index? P&F aficionados call that calculation the Bullish Percent Index (BPI). If we could calculate the BPI figures for the S&P 500 index over recent months, we could get a sense of whether or not the recent S&P 500 market index highs (as of the last week of December 2021) are supported by the underlying “market internals”.
To figure this out, I created a reference table in the RDS database called “custom_lists”. In this table, I linked lists of stock tickers to a couple of market indices to which they belong, such as the S&P 500. Here’s a screenshot of what it looks like:
Next, I wrote a SQL query to calculate the BPI figures for the S&P 500 index, which is the percentage of index constituents on a “BUY” signal divided by the total count of constituents, over the last year. You can see the query here:
Or on Github at
Finally, I plugged this SQL query into a Python script (in my Github account linked above) to calculate P&F values for this BPI time series. The BPI P&F figures indicate that the S&P 500 Bullish Percent Index has been in a downtrend all year, and currently showing a “SELL” signal, despite having the index itself hitting new highs. That means that the index value is being supported by fewer constituent companies, mostly the large-cap stocks, while many of the other stocks in the index are generating SELL signals. Here’s what I’m talking about:
To show it on a graph, I created a simple Point & Figure BPI chart using the MPL Finance version of the Matplotlib library in Python. Here’s the result:
Pretty cool, huh? If you combine this chart with the knowledge that PE ratios for the S&P 500 are much higher than long-term averages, the Fed will soon be raising interests rates, and inflation is kicking up, I would say it is time to be more cautious with your stock market investments. There might be some further room for the current Santas Claus rally to run, but I would be tightening my Stop-Loss orders for sure!
There are more useful things you can do with this data, such as calculating BPI figures and Relative Strength matrices for every sector and industry, so that even if the market is going to head down soon, you can still find opportunities for outperformance.
In the meantime, I’m available for consulting assignments if your company could use help with this kind of analytical project. You can contact me on LinkedIn if you would like to dive deeper with me.
Bryant Sheehy has 15+ years of experience in financial data sales and business development, and is now transitioning to data engineering and analytics. You can follow him on Medium or contact him on LinkedIn.
The opinions expressed here do not constitute investment advice. If you intend to use these ideas to make your own investments, please seek the advice of a licensed professional advisor.