One of the projects I am working on is a fintech application, Céillí. One of the data points it presents to the stock trading community is Earnings Per Share (EPS). It’s a key financial metric that represents the portion of a company’s profit allocated to each outstanding share of common stock. EPS is calculated by dividing the company’s net income (after preferred dividends, if any) by the average number of outstanding common shares during the period. It’s commonly used to gauge a company’s profitability and is a vital input for valuation metrics like the price-to-earnings (P/E) ratio.
Below is an example Python function that calculates the Earnings Per Share (EPS) for a commodity-producing company (or any company). The resulting EPS value can then be used to compute the Price-to-Earnings (P/E) ratio. You need the net income and outstanding shares data to calculate it:
def calculate_eps(net_income: float, num_shares: float) -> float:
    """
    Calculate Earnings Per Share (EPS).

    Args:
        net_income (float): The company's net income.
        num_shares (float): The average number of outstanding shares.

    Returns:
        float: EPS value.
    """
    if num_shares == 0:
        raise ValueError("Number of shares cannot be zero")
    return net_income / num_shares
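The P/E step mentioned above can be sketched as follows. This is a minimal illustration, not the code used in Céillí: the function name `calculate_eps_and_pe` and all of the figures are hypothetical, and it assumes preferred dividends are subtracted from net income before dividing, as described in the introduction.

```python
def calculate_eps_and_pe(net_income, preferred_dividends, avg_shares, share_price):
    """Return (EPS, P/E). P/E is None when EPS is zero or negative,
    because the ratio is not meaningful for a loss-making company."""
    if avg_shares <= 0:
        raise ValueError("Average shares outstanding must be positive")
    # EPS = (net income - preferred dividends) / average common shares
    eps = (net_income - preferred_dividends) / avg_shares
    # P/E = current share price / EPS
    pe = share_price / eps if eps > 0 else None
    return eps, pe


# Hypothetical figures, for illustration only
eps, pe = calculate_eps_and_pe(
    net_income=10_000_000.0,
    preferred_dividends=1_000_000.0,
    avg_shares=4_500_000.0,
    share_price=30.0,
)
print(f"EPS: {eps:.2f}, P/E: {pe:.1f}")  # EPS: 2.00, P/E: 15.0
```

Returning None for a non-positive EPS is a design choice for this sketch; many data providers simply report P/E as "N/A" in that case.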
Getting EPS is a cumbersome task, as the data is not available from a free API, and the providers that do have it charge a hefty fee that can eat into your returns. This information is usually contained in transcripts of earnings reports and in general market reports in the media.
For my acquisition of EPS data I rely on spidering news headlines, which I also use for sentiment analysis purposes. I noticed that much of the information needed for fundamental analysis, rather than technical analysis, is contained in the news headlines as well as in the transcripts of earnings calls, where companies present their productive value at quarterly and annual conference calls. Usually, one would rely on regular expressions (regex) to extract information from text. Now, with the advent of Large Language Models and Generative AI, this task can become a lot less cumbersome and more automated. One way to do this is to use Python and an LLM that is hosted on Hugging Face and developed by NuMind: https://huggingface.co/numind/NuExtract-1.5-tiny
At this link you can find more code samples using Python. For my purposes, I used the code below to extract EPS from the text. It works by using a JSON template to structure the data: the LLM places key data points from the text into this template, which could then easily be pushed automatically to a database for any UI needs you may have.
import json
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Use Apple's Metal backend (MPS) when available, otherwise fall back to CPU
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

def predict_NuExtract(model, tokenizer, text, schema, examples=("", "", "")):
    # Parse and reformat the schema
    schema = json.dumps(json.loads(schema), indent=4)
    input_llm = "<|input|>\n" + schema + "\n"
    # Only add examples if they are non-empty valid JSON strings
    for ex in examples:
        if ex.strip():  # only process if not empty
            input_llm += json.dumps(json.loads(ex), indent=4) + "\n"
    # Add the text to extract data from
    input_llm += "### Text:\n" + text + "\n<|output|>\n"
    # Tokenize and generate output
    input_ids = tokenizer(input_llm, return_tensors="pt", truncation=True, max_length=4000).to(device)
    output = tokenizer.decode(
        model.generate(**input_ids, use_cache=False)[0], skip_special_tokens=True)
    # Keep only the JSON between the output markers
    return output.split("<|output|>")[1].split("<|end-output|>")[0]

model = AutoModelForCausalLM.from_pretrained("numind/NuExtract-1.5-tiny", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("numind/NuExtract-1.5-tiny", trust_remote_code=True)
model.to(device)
model.eval()

text = ["Relmada Therapeutics Q4 2024 GAAP EPS $(0.62) Beats $(0.70) Estimate.",
        "Clearside Biomedical Q4 2024 GAAP EPS $(0.10), Inline, Sales $306.00K Beat $176.67K Estimate.",
        "Argan Q4 2024 GAAP EPS $2.22 Beats $1.15 Estimate, Sales $232.474M Beat $197.500M Estimate.",
        "Plus Therapeutics FY24 EPS $(1.95) Vs. $(4.24) YoY, Grant Revenue $5.8M Up From $4.9M YoY",
        "SeaStar Medical Holding Q4 EPS $(0.90) Misses $(0.89) Estimate, Sales $67.00K Miss $150.00K Estimate.",
        "Pulse Biosciences Q4 EPS $(0.31) Down From $(0.21) YoY.",
        "CalAmp FY 2024 GAAP EPS $(11.04), Inline.",
        "VirTra Q4 2024 GAAP EPS $(0.08) Misses $0.04 Estimate, Sales $5.40M Miss $7.45M Estimate.",
        "Better Choice Q4 EPS $(0.50), Sales $7.2M Up 26% From YoY."]

schema = """{
    "company": "",
    "period": "",
    "eps_data": {
        "eps_type": "",
        "actual_eps": "",
        "eps_estimate": "",
        "eps_result": ""
    },
    "sales_data": {
        "actual_sales": "",
        "sales_estimate": "",
        "sales_result": ""
    }
}"""

for i in text:
    prediction = predict_NuExtract(model, tokenizer, i, schema)
    print(prediction)
'''
Output:
{
"company": "Relmada Therapeutics",
"period": "Q4 2024",
"eps_data": {
"eps_type": "GAAP",
"actual_eps": "0.62",
"eps_estimate": "0.70",
"eps_result": "$(0.62)"
},
"sales_data": {
"actual_sales": "",
"sales_estimate": "",
"sales_result": ""
}
}
Setting `pad_token_id` to `eos_token_id`:151646 for open-end generation.
{
"company": "Clearside Biomedical",
"period": "Q4 2024",
"eps_data": {
"eps_type": "GAAP",
"actual_eps": "0.10",
"eps_estimate": "176.67K",
"eps_result": ""
},
"sales_data": {
"actual_sales": "$306.00K",
"sales_estimate": "$176.67K",
"sales_result": ""
}
}
Setting `pad_token_id` to `eos_token_id`:151646 for open-end generation.
{
"company": "Argan",
"period": "Q4 2024",
"eps_data": {
"eps_type": "GAAP",
"actual_eps": "$2.22",
"eps_estimate": "$1.15",
"eps_result": "$232.474M"
},
"sales_data": {
"actual_sales": "$232.474M",
"sales_estimate": "$197.500M",
"sales_result": ""
}
}
Setting `pad_token_id` to `eos_token_id`:151646 for open-end generation.
{
"company": "Plus Therapeutics",
"period": "FY24",
"eps_data": {
"eps_type": "EPS",
"actual_eps": "1.95",
"eps_estimate": "4.24",
"eps_result": ""
},
"sales_data": {
"actual_sales": "5.8M",
"sales_estimate": "",
"sales_result": ""
}
}
Setting `pad_token_id` to `eos_token_id`:151646 for open-end generation.
{
"company": "SeaStar Medical Holding",
"period": "Q4",
"eps_data": {
"eps_type": "",
"actual_eps": "0.90",
"eps_estimate": "0.89",
"eps_result": "Misses"
},
"sales_data": {
"actual_sales": "$67.00K",
"sales_estimate": "$150.00K",
"sales_result": ""
}
}
Setting `pad_token_id` to `eos_token_id`:151646 for open-end generation.
{
"company": "Pulse Biosciences",
"period": "Q4",
"eps_data": {
"eps_type": "EPS",
"actual_eps": "0.31",
"eps_estimate": "0.21",
"eps_result": "Down From"
},
"sales_data": {
"actual_sales": "",
"sales_estimate": "",
"sales_result": ""
}
}
Setting `pad_token_id` to `eos_token_id`:151646 for open-end generation.
{
"company": "CalAmp",
"period": "FY 2024",
"eps_data": {
"eps_type": "GAAP",
"actual_eps": "11.04",
"eps_estimate": "",
"eps_result": ""
},
"sales_data": {
"actual_sales": "",
"sales_estimate": "",
"sales_result": ""
}
}
Setting `pad_token_id` to `eos_token_id`:151646 for open-end generation.
{
"company": "VirTra",
"period": "Q4 2024",
"eps_data": {
"eps_type": "GAAP",
"actual_eps": "0.08",
"eps_estimate": "0.04",
"eps_result": "Misses"
},
"sales_data": {
"actual_sales": "$5.40M",
"sales_estimate": "$7.45M",
"sales_result": ""
}
}
Setting `pad_token_id` to `eos_token_id`:151646 for open-end generation.
{
"company": "Better Choice",
"period": "Q4",
"eps_data": {
"eps_type": "EPS",
"actual_eps": "0.50",
"eps_estimate": "",
"eps_result": ""
},
"sales_data": {
"actual_sales": "$7.2M",
"sales_estimate": "",
"sales_result": "Up 26% From YoY"
}
}
'''
As one can see results can vary, although one could introduce steps into this basic process to further refine the models or even use case-based models that are specific to different patterns of headlines, as there are set ways that financial data is presented in the market news. Other financial ratios are also reported in the news, so having such an ability can help, such as collecting EBITDA, P/E, etc.
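One such refinement step could be a small post-processing pass. In the output above the model often drops the accounting parentheses, so a loss of $(0.62) comes back as "0.62". The sketch below is one possible approach, not part of the NuExtract API: `parse_amount` and `headline_eps` are hypothetical helper names, and the regular expressions only cover the headline patterns shown in this post.

```python
import re

# Multipliers for the K/M/B suffixes seen in the headlines
_SCALE = {"K": 1e3, "M": 1e6, "B": 1e9}


def parse_amount(raw):
    """Convert strings such as '$(0.62)', '$2.22' or '$306.00K' to a float.

    Parentheses are read as a negative amount (accounting notation).
    Returns None when the string is not a recognisable amount."""
    s = raw.strip().replace(",", "")
    m = re.fullmatch(r"\$?(\(?)(\d+(?:\.\d+)?)\)?([KMB]?)", s)
    if m is None:
        return None
    value = float(m.group(2)) * _SCALE.get(m.group(3), 1.0)
    return -value if m.group(1) else value


def headline_eps(headline):
    """Re-read the first EPS figure directly from the headline so the
    sign can be cross-checked against the model's 'actual_eps' field."""
    m = re.search(r"EPS\s+(\$\(?\d+(?:\.\d+)?\)?)", headline)
    return parse_amount(m.group(1)) if m else None


print(parse_amount("$(0.62)"))
print(parse_amount("$306.00K"))
print(headline_eps("Relmada Therapeutics Q4 2024 GAAP EPS $(0.62) Beats $(0.70) Estimate."))
```

If the value parsed from the headline and the value returned by the model differ only in sign, the headline value is probably the one to trust.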
OpenAI’s ChatGPT had this to add to the EPS discussion, which I found interesting:
me: is there an optimal eps value or is it relative to each stock? what constitutes a miss?
ChatGPT said:
EPS (Earnings Per Share) isn’t a one-size-fits-all metric—it’s relative to the company, its industry, historical performance, and market expectations. There isn’t an “optimal” EPS value that applies across the board. Instead, investors look at trends, growth rates, and how the current EPS compares to analysts’ forecasts or the company’s own guidance.
What Constitutes a “Miss”
A “miss” typically occurs when a company’s reported EPS is below the consensus estimate or its internal guidance. Analysts and investors use these estimates as benchmarks, so if the actual EPS falls short, it’s seen as underperformance—even if the EPS might be strong in absolute terms. The severity of the miss can affect investor sentiment and, consequently, the stock’s price.
Key Considerations
Relative Performance: For one company, an EPS of, say, $2.00 might be excellent if the industry average is $1.00, while another company might be expected to have an EPS of $5.00 based on its size and market position.
Growth Trends: Investors often consider not just the current EPS but also its trend over time. Consistent growth can be more important than a single high or low number.
Market Expectations: EPS is closely watched around earnings announcements. A miss relative to consensus estimates can trigger a negative reaction, even if the company is profitable overall.
In summary, the “optimal” EPS is relative and context-dependent, and a miss is typically defined by falling short of pre-determined expectations rather than a fixed numeric benchmark.
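The beat/miss idea described in this answer can be sketched in a few lines. This is only an illustration: the function name `classify_eps` and the half-cent tolerance for calling a result "inline" are my own assumptions, not a market convention.

```python
def classify_eps(actual, estimate, tolerance=0.005):
    """Compare reported EPS with the consensus estimate.

    Returns 'beat', 'miss' or 'inline'. The tolerance (half a cent,
    an assumed value) absorbs rounding in the reported figures."""
    diff = actual - estimate
    if abs(diff) <= tolerance:
        return "inline"
    return "beat" if diff > 0 else "miss"


# Figures taken from the headlines used earlier in this post
print(classify_eps(-0.62, -0.70))  # beat: a smaller loss than expected
print(classify_eps(-0.90, -0.89))  # miss: a larger loss than expected
print(classify_eps(2.22, 1.15))    # beat
```

Note that a company can beat while still reporting a loss, because the comparison is against the estimate and not against zero.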
me: if i have the eps for a company how do i get the P/E, what is the price i use?
ChatGPT said:
The Price/Earnings (P/E) ratio is calculated by dividing the current market price of a company’s stock by its earnings per share (EPS). So if you have the EPS, the price you use is typically the current share price—often the last traded or closing price.
For example, if a company’s EPS is $2.00 and its current share price is $30.00, then the P/E ratio is calculated as:
\[ \text{P/E} = \frac{\text{Current Share Price}}{\text{EPS}} = \frac{30.00}{2.00} = 15 \]
There are a couple of variations depending on your analysis:
Trailing P/E: Uses the current share price and the EPS from the past 12 months.
Forward P/E: Uses the current share price and forecasted EPS.
In most standard cases, you’d use the latest closing price from the market to compute the P/E ratio.
me: how do you get the trailing p/e, do you use daily bars? and how do you get the forecasted EPS?
ChatGPT said:
Trailing P/E is calculated by dividing the current market price (usually the last closing price) by the earnings per share over the trailing twelve months (TTM EPS). Here’s how it typically works:
Trailing P/E:
Price: You take the most recent closing price of the stock. While daily bars provide the closing price, you don’t calculate the EPS from daily bars. Instead, you use the EPS figure from the company’s quarterly reports and annualize it (i.e., sum up the last four quarters of reported EPS).
EPS: The earnings data comes from the company’s financial reports. For example, if you have EPS for Q1, Q2, Q3, and Q4, you add those together to get the TTM EPS.
Forward P/E:
Source: Forecasted EPS isn’t derived directly from historical price data or daily bars. Instead, it’s typically provided by analysts’ consensus estimates or company guidance. Financial data providers (like Bloomberg, FactSet, or Reuters) compile these forecasts.
Usage: You take the current stock price and divide it by the forecasted EPS (often for the next 12 months) to get the forward P/E ratio.
In summary, daily bars are mainly used to get the current price (closing price) for the calculation. The earnings figure for trailing P/E comes from reported quarterly earnings, and forecasted EPS comes from analysts’ estimates or management’s guidance.
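As a rough sketch of the two calculations described in this answer: the function names and the quarterly figures below are hypothetical, and the forecast EPS is assumed to come from an external source such as an analyst consensus.

```python
def trailing_pe(price, quarterly_eps):
    """Trailing P/E: latest closing price divided by the sum of the
    last four reported quarterly EPS figures (TTM EPS)."""
    if len(quarterly_eps) < 4:
        raise ValueError("Need at least four quarters of EPS")
    ttm_eps = sum(quarterly_eps[-4:])
    return price / ttm_eps if ttm_eps > 0 else None


def forward_pe(price, forecast_eps):
    """Forward P/E: latest closing price divided by forecasted EPS."""
    return price / forecast_eps if forecast_eps > 0 else None


# Hypothetical quarterly EPS, oldest first, and a hypothetical closing price
quarters = [0.25, 0.5, 0.5, 0.75]
print(trailing_pe(30.0, quarters))  # 15.0
print(forward_pe(30.0, 2.5))        # 12.0
```

Both functions return None when earnings are zero or negative, since the ratio is not meaningful in that case.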
Here is a development version. Although rough in visual style as a prototype, it gives you a picture of how you can put this all together for algorithmic insights into the market.
In statistics, regression toward the mean (also called regression to the mean, reversion to the mean, and reversion to mediocrity) is the phenomenon where if one sample of a random variable is extreme, the next sampling of the same random variable is likely to be closer to its mean.[2][3][4] Furthermore, when many random variables are sampled and the most extreme results are intentionally picked out, it refers to the fact that (in many cases) a second sampling of these picked-out variables will result in “less extreme” results, closer to the initial mean of all of the variables.
Mathematically, the strength of this “regression” effect is dependent on whether or not all of the random variables are drawn from the same distribution, or if there are genuine differences in the underlying distributions for each random variable. In the first case, the “regression” effect is statistically likely to occur, but in the second case, it may occur less strongly or not at all.
Regression toward the mean is thus a useful concept to consider when designing any scientific experiment, data analysis, or test, which intentionally selects the most extreme events – it indicates that follow-up checks may be useful in order to avoid jumping to false conclusions about these events; they may be genuine extreme events, a completely meaningless selection due to statistical noise, or a mix of the two cases.
Mathematically, a continuous mean-reverting time series can be represented by an Ornstein-Uhlenbeck stochastic differential equation in the following form:
\[ dx_t = \theta(\mu - x_t)\,dt + \sigma\,dW_t \]
Where θ is the rate of reversion to the mean, μ is the mean value of the process, σ is the volatility of the process (the scale of the noise term) and, finally, W_t is a Wiener process. The given equation implies that the change of the time series in the next period is proportional to the difference between the mean and the current value, with the addition of Gaussian noise.
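To get a feel for this process, an Ornstein-Uhlenbeck path can be simulated with a simple Euler-Maruyama discretisation, x[t+1] = x[t] + θ(μ − x[t])·dt + σ·√dt·ε, where ε is standard normal noise. The sketch below is illustrative only; the function name `simulate_ou` and all parameter values are assumptions chosen for the example.

```python
import numpy as np


def simulate_ou(theta, mu, sigma, x0, n_steps, dt=1.0, seed=0):
    """Simulate an Ornstein-Uhlenbeck path with the Euler-Maruyama scheme."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for t in range(n_steps):
        # Gaussian shock scaled by sigma and the square root of the time step
        noise = sigma * np.sqrt(dt) * rng.standard_normal()
        # Pull toward the mean, proportional to the current distance from it
        x[t + 1] = x[t] + theta * (mu - x[t]) * dt + noise
    return x


# Start well above the mean and watch the path drift back toward it
path = simulate_ou(theta=0.1, mu=100.0, sigma=1.0, x0=120.0, n_steps=500)
print("start:", path[0])
print("mean of last 100 points:", path[-100:].mean())
```

A series generated this way is a convenient test input for the stationarity checks that follow, because its mean-reverting behaviour is known in advance.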
We can see mean reversion as the line of linear regression as in this plot:
A key concept in testing for mean reversion is that of stationarity:
In mathematics and statistics, a stationary process (also called a strict/strictly stationary process or strong/strongly stationary process) is a stochastic process whose statistical properties, such as mean and variance, do not change over time. More formally, the joint probability distribution of the process remains the same when shifted in time. This implies that the process is statistically consistent across different time periods. Because many statistical procedures in time series analysis assume stationarity, non-stationary data are frequently transformed to achieve stationarity before analysis.
The Augmented Dickey-Fuller (ADF) test provides a quick check and confirmatory evidence that your time series is stationary or non-stationary. The ADF test is based on the simple observation that, in a mean-reverting series, if the current level is higher than the mean, the next move tends to be downward, while if the level is lower than the mean, the next move tends to be upward.
In the Python code below we will simply interpret the result using the p-value from the test. The null hypothesis of the ADF test is that the series has a unit root, meaning it is non-stationary. A p-value below a specified threshold (we are going to use 5%) suggests we reject the null hypothesis and treat the series as stationary; a p-value above the threshold means we fail to reject the null hypothesis and treat the series as non-stationary.
import numpy as np
from statsmodels.regression.linear_model import OLS
from statsmodels.tsa.tsatools import lagmat, add_trend
from statsmodels.tsa.adfvalues import mackinnonp

def adf(ts):
    """
    Augmented Dickey-Fuller unit root test
    """
    # make sure we are working with an array, convert if necessary
    ts = np.asarray(ts)
    # Get the dimension of the array
    nobs = ts.shape[0]
    # We use 1 as maximum lag in our calculations
    maxlag = 1
    # Calculate the discrete difference
    tsdiff = np.diff(ts)
    # Create a 2d array of lags, trim invalid observations on both sides
    tsdall = lagmat(tsdiff[:, None], maxlag, trim='both', original='in')
    # Get dimension of the array
    nobs = tsdall.shape[0]
    # replace 0 xdiff with level of x
    tsdall[:, 0] = ts[-nobs - 1:-1]
    tsdshort = tsdiff[-nobs:]
    # Calculate the linear regression using an ordinary least squares model
    results = OLS(tsdshort, add_trend(tsdall[:, :maxlag + 1], 'c')).fit()
    adfstat = results.tvalues[0]
    # Get approx p-value from a precomputed table (from stattools)
    pvalue = mackinnonp(adfstat, 'c', N=1)
    return pvalue
This code can also be validated against the function adfuller, included in the Python module statsmodels.
One can also test for stationarity by using the Hurst exponent. This measures the speed of diffusion in mean reversion, which should be slower than in a geometric random walk. The speed of diffusion is measured by its variance.
The Hurst exponent can be computed with the following code from Corrius (2018):
def hurst(ts):
    """
    Returns the Hurst Exponent of the time series vector ts
    """
    # make sure we are working with an array, convert if necessary
    ts = np.asarray(ts)
    # Helper variables used during calculations
    lagvec = []
    tau = []
    # Create the range of lag values
    lags = range(2, 100)
    # Step through the different lags
    for lag in lags:
        # produce value difference with lag
        pdiff = np.subtract(ts[lag:], ts[:-lag])
        # Write the different lags into a vector
        lagvec.append(lag)
        # Calculate the square root of the standard deviation of the differences
        tau.append(np.sqrt(np.std(pdiff)))
    # linear fit to double-log graph
    m = np.polyfit(np.log10(np.asarray(lagvec)),
                   np.log10(np.asarray(tau).clip(min=0.0000000001)),
                   1)
    # return the calculated hurst exponent (slope doubled to undo the square root)
    return m[0] * 2.0
H = 0.5 indicates a geometric random walk; for a mean-reverting series, H < 0.5; and for a trending series, H > 0.5. H is also an indicator of the degree of mean reversion or trendiness: as H decreases towards 0, the series is more mean reverting, and as it increases towards 1, it is more trending.
To make sure it is not a random walk we can test the statistical significance of the H value with the Variance Ratio Test:
import numpy as np

def variance_ratio(ts, lag=2):
    """
    Returns the variance ratio test result
    """
    # make sure we are working with an array, convert if necessary
    ts = np.asarray(ts)
    # Apply the formula to calculate the test
    n = len(ts)
    mu = sum(ts[1:n] - ts[:n - 1]) / n
    m = (n - lag + 1) * (1 - lag / n)
    b = sum(np.square(ts[1:n] - ts[:n - 1] - mu)) / (n - 1)
    t = sum(np.square(ts[lag:n] - ts[:n - lag] - lag * mu)) / m
    return t / (lag * b)
# Source: Corrius (2018)
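The test involves dividing the variance of group one by the variance of group two. In the code above, group one is the set of lag-period differences and group two is the set of one-period differences scaled by the lag. If this ratio is close to one, the conclusion drawn is that the two variances are the same, which is what a random walk would produce. If the ratio is far from one, the conclusion drawn is that the variances are not the same and the series is unlikely to be a random walk.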
So how long will it take for the time series to mean revert, to diffuse back to the mean? This is seen in measuring the ‘half-life’ of the mean reversion.
import numpy as np

def half_life(ts):
    """
    Calculates the half life of a mean reversion
    """
    # make sure we are working with an array, convert if necessary
    ts = np.asarray(ts)
    # delta = p(t) - p(t-1)
    delta_ts = np.diff(ts)
    # lagged values p(t-1), plus a column of ones for the intercept
    lag_ts = np.vstack([ts[:-1], np.ones(len(ts) - 1)]).T
    # regress the deltas on the lagged values; beta[0][0] is the slope
    beta = np.linalg.lstsq(lag_ts, delta_ts, rcond=None)
    # the slope is negative for a mean-reverting series, so negate it
    return -np.log(2) / beta[0][0]
# Source: adapted from Corrius (2018)
So we can see that we can approach mean reversion in programming, namely for fintech, through the following steps:
Test for stationarity using the Augmented Dickey-Fuller test (ADF test)
Confirm by testing the Hurst exponent (H)
Test for the variance ratio (F-ratio test)
Test for the time to mean revert using the half-life test
A quick study of z-scores in varying samples of NVDA stock prices.
I used Python to generate these plots.
Animation of the NVDA z-score scatter plot: x and y are in standard deviation units, and anything above 3 or below -3 is considered an outlier in data science. The DeepSeek release was nearly an outlier effect on NVDA value. The image shows the last 15 days, 30 days, 45 days, 60 days and 93 days.
Z-scores for the QQQ tech index ETF, for the same time periods as above for NVDA. In an index ETF such as QQQ one should see a smoother spectrum, as it is less susceptible to volatility.
First, we see the price chart for the last 93 trading days. Afterwards, we take a look at the 93, 60, 45, 30 and 15 day sample windows, all going backward in time from Feb. 20th, 2025. One major perturbative wave that hit the stock was the release of DeepSeek, which temporarily had a negative impact on NVDA value. The question is whether one can see a correlation to the action of the index ETF for the sector NVDA is in, and of whose portfolio it is a part. I examine the spread in the z-scores to see whether it is an indicator of up or down motion for NVDA stock prices in relation to the index.
Comparing different plots of z-scores for the stock prices of NVDA from Oct 4th, 2024 to February 20, 2025.
NVDA past 93 trading days, open price: 124.92, close price: 140.11 (gained 12.2%)
QQQ ETF, of which NVDA is a member of the portfolio, past 93 trading days, open price: 487.32 close price: 537.23 (gained 10.2%)
Z-Scores:
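The z-score is calculated as:
\[ z = \frac{x - \mu}{\sigma} \]
where x is a price in the sample window, μ is the mean of the prices in that window and σ is their standard deviation.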
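In Python, the z-score of each price is its distance from the sample mean in units of standard deviation. A minimal sketch follows; the function name `zscores` and the closing prices are hypothetical, and it assumes the population standard deviation (NumPy's default), which may differ from the choice made for the plots.

```python
import numpy as np


def zscores(prices):
    """Return the z-score of every price relative to the whole window."""
    prices = np.asarray(prices, dtype=float)
    std = prices.std()  # population standard deviation (ddof=0)
    if std == 0:
        raise ValueError("All prices are identical; z-scores are undefined")
    return (prices - prices.mean()) / std


# Hypothetical closing prices, for illustration only
closes = [100.0, 102.0, 98.0, 104.0, 96.0]
z = zscores(closes)
print(z)
print("mean:", z.mean(), "std:", z.std())
```

By construction the z-scores of a window have a mean of (numerically almost) zero and a standard deviation of one, which is why the means reported for the samples are on the order of 1e-15.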
93 Days:
NVDA past 93 trading days zscores
QQQ past 93 trading days
I use the terms "prices" as code for NVDA and "trends" for QQQ, the index ETF for tech stocks.
Some Data for Z-scores:
Shape of Z-score plots:
prices max/min, trends max/min: 1.7594241527219399 -2.9115388106292883 1.7722687913449942 -2.0281115555314564
NVDA length and mean of positive and negative:
prices positive len: 51 0.7123723838425315
prices negative len: 43 -0.8449067808364951
prices positive list: [0.14019500216746938, 0.13021735775384247, 0.9441080663512041, 0.9270035330706985, 0.35257629040040717, 0.4737334011373164, 0.6348010895287374, 0.4894125566444479, 0.5934651341008516, 0.3205052904994612, 0.40246451246854575, 1.2149298432925333, 1.6810283751862907, 1.502856153514364, 1.1650416212243906, 1.5969310865571407, 1.30900477633531, 1.3788482872307024, 0.6975177115572552, 0.43667357902955695, 1.4144827315650879, 1.254840420947041, 1.3660198872703233, 0.6932415782371287, 0.16585180208822778, 0.2200161574764928, 0.4523527345366844, 1.147937087943885, 1.136534065756884, 0.7630850891325253, 0.2456729573972512, 0.3169418460660218, 0.03614242471106574, 0.36825544590753456, 0.44665122344318386, 0.40531526801529805, 0.057523091311697735, 0.17440406872848058, 1.052436777127734, 1.7594241527219399, 0.4352482012561788, 0.43097206793605647, 0.08888140232595665, 0.5335992676190859, 1.4230349982053405, 1.4444156648059727, 0.7887418890532837, 0.25137446849075173, 0.32977024602640104, 0.30553882387901676, 0.43097206793605647]
prices negative list: [-1.6826224330881445, -2.0281115555314564, -1.5515748349199896, -1.2967600607041379, -1.3338240278628077, -1.2828610730196344, -1.0088524586680414, -1.4489870686772424, -1.4450159293388134, -1.4225128064210513, -1.2093949952586989, -1.146518622400242, -1.1107783683543844, -1.6137893512220423, -1.3516941548857364, -1.1531371879642918, -1.1478423355130543, -0.8327986146643587, -1.082980392985381, -1.9122866581606137, -1.6753420109676906, -1.7693256419771755, -1.3589745770061903, -0.47407236109295114, -0.032614037970937836, -0.26823497205105246, -1.0704051184136898, -0.8420646064540289, -0.6143859510507724, -0.6335797911865096, -0.5137837544772378, -0.4601733734084476, -0.4072248488960653, -0.22719986555395358, -0.49260434467228414, -0.19874003362854603, -0.10012340672422786, -0.16630906236470946, -0.36751345551177483, -0.4753960742057596, -0.507165188913192]
Trends (QQQ) length and mean for positive/negative:
trends positive len: 53 0.7493882152106839
trends negative len: 41 -0.9687213513699346
trends positive list: [0.05276545780528101, 0.09247685118957147, 0.07262115449743001, 0.011068494751777909, 0.1685903551761238, 0.2731636910880874, 0.6960900306307639, 0.6001208299520626, 0.9092078417931164, 0.6378466536671404, 0.520036186627085, 1.1368864971963693, 0.9105315549059249, 1.1772597471370638, 1.6829181562303424, 1.5260581523624006, 0.2466894288318963, 0.09446242085878412, 0.39163601468454706, 0.670277624930977, 1.1395339234219937, 1.1157070873914194, 0.6497600716824238, 0.18976976498108122, 0.3863411622333134, 0.7808076698505786, 0.16130993305566993, 0.16726664206331537, 0.2619121296292083, 0.022320056210664595, 0.595487834057233, 0.7980159403171032, 1.2421216896647371, 1.316911480538481, 1.117692657060632, 0.0971098470844085, 0.6001208299520626, 0.535258887424397, 0.6821910429462604, 0.631889944659495, 0.3552339040822852, 0.776174673955749, 0.9336965343800949, 1.1157070873914194, 0.6735869077129981, 1.0925421079172493, 1.009148181810243, 1.0296657350587888, 1.5326767179264504, 1.681594443117534, 1.7623409429989234, 1.7722687913449942, 1.6207036399282937]
trends negative list: [-1.6826224330881445, -2.0281115555314564, -1.5515748349199896, -1.2967600607041379, -1.3338240278628077, -1.2828610730196344, -1.0088524586680414, -1.4489870686772424, -1.4450159293388134, -1.4225128064210513, -1.2093949952586989, -1.146518622400242, -1.1107783683543844, -1.6137893512220423, -1.3516941548857364, -1.1531371879642918, -1.1478423355130543, -0.8327986146643587, -1.082980392985381, -1.9122866581606137, -1.6753420109676906, -1.7693256419771755, -1.3589745770061903, -0.47407236109295114, -0.032614037970937836, -0.26823497205105246, -1.0704051184136898, -0.8420646064540289, -0.6143859510507724, -0.6335797911865096, -0.5137837544772378, -0.4601733734084476, -0.4072248488960653, -0.22719986555395358, -0.49260434467228414, -0.19874003362854603, -0.10012340672422786, -0.16630906236470946, -0.36751345551177483, -0.4753960742057596, -0.507165188913192]
zscore silos:
prices 0 to 1: 35 0.4056562939002893
prices 1 to 2: 16 1.3722767337624457
prices 2 to 3: 0 nan
prices 3>: 0 nan
prices 0 to -1: 31 -0.4417795985574785
prices -1 to -2: 8 -1.4510292935080256
prices -2 to 3: 4 -2.7127490308408726
prices <3: 0 nan
trends 0 to 1: 35 -0.27555447552077394
trends 1 to 2: 17 1.344595325715943
trends 2 to 3: 0 nan
trends 3>: 0 nan
trends 0 to -1: 20 -0.41098618215387406
trends -1 to -2: 21 -1.3951694395818528
trends -2 to 3: 1 -2.0790028405616057
trends <3: 0 nan
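The silo counts above can be produced by binning the z-scores into one-standard-deviation bands and reporting the count and mean of each band. The sketch below is not the code used to generate the figures in this post: `zscore_silos` is a hypothetical name, and the half-open bands (lower edge included, upper edge excluded) are an assumption.

```python
import numpy as np


def zscore_silos(z):
    """Group z-scores into bands one standard deviation wide.

    Returns a dict mapping (low, high) to (count, mean); the mean is
    nan for an empty band. Each band includes its lower edge only."""
    z = np.asarray(z, dtype=float)
    bounds = [-np.inf, -3, -2, -1, 0, 1, 2, 3, np.inf]
    silos = {}
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        members = z[(z >= lo) & (z < hi)]
        mean = members.mean() if members.size else float("nan")
        silos[(lo, hi)] = (members.size, mean)
    return silos


# Hypothetical z-scores, for illustration only
sample = [0.5, 0.25, 1.5, -0.5, -2.5]
silos = zscore_silos(sample)
for band, (count, mean) in silos.items():
    print(band, count, mean)
```

Anything that lands in the outermost bands, below -3 or above 3, would be an outlier under the rule of thumb mentioned earlier.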
60 DAYS
NVDA past 60 days open: 146.67, close: 140.11
QQQ past 60 days open: 504.98, close: 537.23
"prices" is for NVDA, "trends" is for QQQ
some data for 60 day plots:
shape of plots:
prices max/min, trends max/min: 1.9145807927766163 -2.645347442651133 1.8695586413001115 -1.7542554651846929
prices positive len: 34 0.6792199553321576
prices negative len: 26 -0.8882107108189714
prices positive list: [1.5305282468571573, 0.8737427335456234, 0.04858635771143208, 0.1738208835547299, 0.35888968285649864, 0.4117664826570029, 0.6385801239065363, 1.3176295529235456, 1.3064975950708098, 0.9419259753936441, 0.43681338782566403, 0.5063881244052757, 0.23226366228160591, 0.556481934742594, 0.6330141449801685, 0.5926607977639948, 0.18634433613906048, 0.2531360832554902, 0.36723865124605237, 1.2243994059068675, 1.9145807927766163, 0.6218821871274288, 0.617707702932656, 0.03327991566391562, 0.07919924180646105, 0.2837489673505192, 0.7178953236072966, 1.5861880361208474, 1.6070604570947318, 0.9669728805623052, 0.4423793667520319, 0.5189115769896063, 0.49525616655253607, 0.617707702932656]
prices negative list: [-1.7542554651846929, -1.669273142479056, -1.5853399842512712, -1.2999672462767844, -1.720682201893578, -1.2548531737293467, -0.6725668885240624, -0.506798901024174, -0.11546055078710396, -0.5487654801380664, -0.7900733100429681, -0.318998459489499, -0.6389936252329415, -1.098527666530088, -1.203444114314825, -0.32739177531227037, -0.6841076977803849, -0.6746652174797553, -1.5223901155804267, -1.6934039254695472, -1.7437638204062227, -0.524634697147575, -0.9044322381283226, -0.7858766521315705, -0.0913297677966126, -0.3767025057710995]
trends positive len: 34 0.7207852536147648
trends negative len: 26 -0.9425653316500864
trends positive list: [0.1636172003202925, 0.01148835103241942, 0.5014481621871478, 0.07129072626972463, 0.8623607425666361, 0.5035464911428407, 0.926359775715327, 1.7279214367907145, 1.4792694555408878, 0.12269978568424661, 0.8665574004780336, 0.828787479275527, 0.09017568687097201, 0.29791025348475275, 0.004144199687494526, 0.32518852990878333, 1.0291778945443708, 1.147733480541123, 0.8319349727090662, 0.01148835103241942, 0.14158474628549397, 0.06184824596909498, 0.2905661021398278, 0.5402672478675009, 0.828787479275527, 0.12794560807347868, 0.7920667225508667, 0.6598719983420994, 0.6923960971553621, 1.4897611003193638, 1.7258231078350217, 1.8538211741324033, 1.8695586413001115, 1.6292999758730682]
trends negative list: [-1.7542554651846929, -1.669273142479056, -1.5853399842512712, -1.2999672462767844, -1.720682201893578, -1.2548531737293467, -0.6725668885240624, -0.506798901024174, -0.11546055078710396, -0.5487654801380664, -0.7900733100429681, -0.318998459489499, -0.6389936252329415, -1.098527666530088, -1.203444114314825, -0.32739177531227037, -0.6841076977803849, -0.6746652174797553, -1.5223901155804267, -1.6934039254695472, -1.7437638204062227, -0.524634697147575, -0.9044322381283226, -0.7858766521315705, -0.0913297677966126, -0.3767025057710995]
zscore silos:
prices 0 to 1: 26 0.4857168466015017
prices 1 to 2: 7 1.4753238481642317
prices 2 to 3: 0 nan
prices 3>: 0 nan
prices 0 to -1: 20 -0.42879927835305054
prices -1 to -2: 3 -1.5721141625126993
prices -2 to 3: 4 -2.4158942235473466
prices <3: 0 nan
trends 0 to 1: 23 -0.06220277723223607
trends 1 to 2: 9 1.5519738719284009
trends 2 to 3: 0 nan
trends 3>: 0 nan
trends 0 to -1: 18 -0.4898406138886327
trends -1 to -2: 10 -1.5547354231811998
trends -2 to 3: 0 nan
trends <3: 0 nan
45 DAYS
NVDA past 45 days open: 134.25, close: 140.11
QQQ past 45 days open: 530.53, close: 537.23
some data for 45 day plot:
shape of plots:
prices max/min, trends max/min: 2.0056164520976707 -2.335406431884398 1.7445748897642444 -2.004426294606925
zscore means of prices and trends: 4.46309655899313e-15 -1.0288066694859784e-14
NVDA prices length and mean of positive/negative:
prices positive len: 24 0.7357314773777098
prices negative len: 21 -0.8408359741459445
prices positive list: [0.054341899730996124, 0.712714915702643, 0.7855730965445773, 0.7471569648279229, 0.36034625926711406, 0.4239315807291669, 2.943764882851058e-05, 0.5325565048935021, 1.3485681303231487, 2.0056164520976707, 0.7749755429675673, 0.7710014603761928, 0.2146298975832493, 0.2583448060884106, 0.45307485306593986, 0.8663794425692682, 1.6929886215759211, 1.7128590345328127, 1.1034997038548302, 0.13249885736143355, 0.6040899915383078, 0.6769481723802421, 0.6544283710290971, 0.7710014603761928]
prices negative list: [-0.7645562745724367, -1.014925110927081, -0.5261615999565064, -0.06787777341171075, -0.10162313831168997, -0.8581724481659111, -1.334961797397784, -1.443817813204148, -0.5348700812210075, -0.9049805349626544, -0.8951834935400782, -1.7747401012554962, -1.9521754070198687, -2.004426294606925, -0.739519390936971, -1.1335781681560086, -0.19088507127290136, -1.010570870294818, -0.18326515016646283, -0.28994404565668896, -0.048283690566570704, -0.1310142625794062, -0.5860324086500015, -0.0624349726213975]
pos/neg length and mean:
trends positive len: 21 0.8835238047359075
trends negative len: 24 -0.7730833291439385
trends positive list: [0.7659593076650326, 1.5976192684256507, 1.3396305109645679, 0.7039113786554122, 0.6647232129651197, 0.11391177298491237, 0.14221433709456596, 0.8726382031552712, 0.9956455010164618, 0.6679888934393077, 0.10629185187847384, 0.3653691694976192, 0.6647232129651197, 0.6266236074328899, 0.4894650275168725, 0.5232103924168393, 1.3505161125452068, 1.5954421481095253, 1.7282464873932923, 1.7445748897642444, 1.4952946135676752]
trends negative list: [-0.7645562745724367, -1.014925110927081, -0.5261615999565064, -0.06787777341171075, -0.10162313831168997, -0.8581724481659111, -1.334961797397784, -1.443817813204148, -0.5348700812210075, -0.9049805349626544, -0.8951834935400782, -1.7747401012554962, -1.9521754070198687, -2.004426294606925, -0.739519390936971, -1.1335781681560086, -0.19088507127290136, -1.010570870294818, -0.18326515016646283, -0.28994404565668896, -0.048283690566570704, -0.1310142625794062, -0.5860324086500015, -0.0624349726213975]
zscore silos:
prices 0 to 1: 19 0.5159193883045129
prices 1 to 2: 4 1.4639440396810512
prices 2 to 3: 1 2.005079787550808
prices 3>: 0 nan
prices 0 to -1: 14 -0.38097024812529595
prices -1 to -2: 4 -1.454678571075471
prices -2 to -3: 3 -2.1703421886682137
prices <-3: 0 nan
trends 0 to 1: 13 0.18729980332171914
trends 1 to 2: 8 1.5002054504426652
trends 2 to 3: 0 nan
trends 3>: 0 nan
trends 0 to -1: 16 -0.42194589789701936
trends -1 to -2: 7 -1.3777869923741477
trends -2 to -3: 1 -2.0051154449596136
trends <-3: 0 nan
30 DAYS
NVDA past 30 days, during which it took a major dive with the release of DeepSeek: open: 140.14, close: 140.11
QQQ past 30 days open: 515.18, close: 537.23
Some Data for 30 day plot
shape of zscore plot
prices max/min, trends max/min: 1.7395624418627935 -2.0102455727613866 1.635662150597431 -1.980244163451975
beats_condition, max, min: False True False
avg zscores list: [-0.04900609322786642, -0.043237954955050006, -1.4069222487518587, -1.9069027691119136, -2.1376722944324356, -0.3679613081523311, -1.0756478274354047, 0.34156933192037153, 1.0456770404126563, 2.5158374261919283, 2.6528832614306106, 1.7724244766420925, -2.8159600623456074, -0.7210526618453241, -1.4730439252605447, -1.1233948714886564, -1.765168780240289, -2.622450849103493, -1.7105258315300362, -0.7023407103306647, 0.0587929352193316, -0.500214456769451, 0.6220642834292512, 0.39529360882055975, 0.2241537498518348, 1.531306979740637, 2.2043609778904045, 2.3999373023930133, 2.3948264891602498, 2.2623747818778503]
zscore differential avg: -4.4704980458239636e-15
zscore means of prices and trends: -7.919590908992784e-16 -3.796962744218036e-15
trend count and mean of pos or neg:
prices positive len: 17 0.7172287353811013
prices negative len: 13 -0.9379145001137494
prices positive list: [0.8708241976370331, 0.8671431033818425, 0.35178990765469526, 0.022945487524039902, 0.3922819444618296, 0.06466455574957113, 0.5726555629663304, 0.9554893655064959, 1.7211569705868235, 1.7395624418627935, 1.1751279893997306, 0.06466455574957113, 0.2757139597140209, 0.7125371446636966, 0.780023872675586, 0.7591643385628187, 0.8671431033818425]
prices negative list: [-0.9198302908648995, -0.9103810583368925, -1.758712156406554, -1.9298482566359534, -1.980244163451975, -0.7602432526141607, -1.1403123831849757, -0.23108623104595882, -1.0216720192222633, -0.22373682796862931, -0.326628471051326, -0.09354740202724439, -0.1733409211526078, -0.6122052763421065, -0.10719629345658256]
trend count and mean of pos or neg:
trends positive len: 15 0.8125990002508011
trends negative len: 15 -0.8125990002508086
trends positive list: [0.09018767490616038, 0.7946804556051048, 0.9133208195678171, 0.5972964872423618, 0.055540488970154546, 0.30542019359958456, 0.5941467430663635, 0.5573997276796802, 0.42511047228762966, 0.45765782877296995, 1.2555930200266159, 1.4918238332267078, 1.6199134297174271, 1.635662150597431, 1.395231678496008]
trends negative list: [-0.9198302908648995, -0.9103810583368925, -1.758712156406554, -1.9298482566359534, -1.980244163451975, -0.7602432526141607, -1.1403123831849757, -0.23108623104595882, -1.0216720192222633, -0.22373682796862931, -0.326628471051326, -0.09354740202724439, -0.1733409211526078, -0.6122052763421065, -0.10719629345658256]
zscore silos:
prices 0 to 1: 14 0.5195964415593288
prices 1 to 2: 3 1.5885045560705986
prices 2 to 3: 0 nan
prices 3>: 0 nan
prices 0 to -1: 7 -0.38925651381809206
prices -1 to -2: 5 -1.4606024370377964
prices -2 to -3: 1 -2.0120560681267605
prices <-3: 0 nan
trends 0 to 1: 11 0.23112599712881077
trends 1 to 2: 5 1.4620954964259174
trends 2 to 3: 0 nan
trends 3>: 0 nan
trends 0 to -1: 9 -0.4261916020115731
trends -1 to -2: 4 -1.522227807745561
trends -2 to -3: 1 -2.0472318717466145
trends <-3: 0 nan
15 DAYS
NVDA past 15 days during which price regained momentum and climbed back up. open: 124.65, min on day81 at 116.64, close: 140.11
QQQ past 15 days open: 523.05, close: 537.23.
Some Data for 15 day plot
shape of z-score:
prices max/min, trends max/min: 1.2978590301422221 -1.7892515585278195 1.4875489718437718 -1.7015455028399435
prices positive len: 8 0.7961212804164766
prices negative len: 7 -0.9098528919045411
prices positive list: [0.4368908744960763, 0.3355230641218386, 0.11698986253581116, 0.6633228665008721, 1.1319844313480085, 1.2043900101867528, 1.1820101040002302, 1.2978590301422221]
prices negative list: [-0.9657152036835966, -1.0789198650922653, -1.7015455028399435, -0.7542012310515969, -0.3996918966402359, -0.9850791589245527, -0.04220349219180732, -0.22988490452723273, -0.18370931895265175]
trends positive len: 6 1.056825095650698
trends negative len: 9 -0.7045500637670981
trends positive list: [0.009930233456925725, 0.9483372951340528, 1.2834826743044578, 1.465205946565748, 1.4875489718437718, 1.1464454525992316]
trends negative list: [-0.9657152036835966, -1.0789198650922653, -1.7015455028399435, -0.7542012310515969, -0.3996918966402359, -0.9850791589245527, -0.04220349219180732, -0.22988490452723273, -0.18370931895265175]
zscore silos:
prices 0 to 1: 5 0.3387271345868971
prices 1 to 2: 4 1.13156977020328
prices 2 to 3: 0 nan
prices 3>: 0 nan
prices 0 to -1: 3 -0.4156379102954067
prices -1 to -2: 3 -1.6576670076204636
prices -2 to -3: 0 nan
prices <-3: 0 nan
trends 0 to 1: 1 0.584379796953797
trends 1 to 2: 4 1.3465306598267066
trends 2 to 3: 0 nan
trends 3>: 0 nan
trends 0 to -1: 7 -0.3399223000040687
trends -1 to -2: 3 -1.315505544516218
trends -2 to -3: 0 nan
trends <-3: 0 nan
The above charts and data are generated in the following code snippets.
This code snippet gets data into Pandas Dataframes from the Alpaca API.
############### INIT CEILLI CLASSES ####################
# api_key, api_secret, api_base_url and ALGO_VERSION must already be defined
# at this point; their setup is not part of this excerpt.
import numpy as np
import pandas as pd
from classes.stock_list import StockList
from classes.config import Config
from classes.alpaca import Alpaca
from classes.utilities import Utilities
from classes.market_beat import MarketBeat
from classes.profit_loss import ProfitLoss
from classes.plots import Plots
util = Utilities(pd.DataFrame())
conf = Config(api_key=api_key, api_secret=api_secret, api_base_url=api_base_url, algo_version=ALGO_VERSION)
mb = MarketBeat(pd.DataFrame(), api_key=api_key, api_secret=api_secret, api_base_url=api_base_url, algo_version=ALGO_VERSION)
alpa = Alpaca(api_key=api_key, api_secret=api_secret, api_base_url=api_base_url, algo_version=ALGO_VERSION)
stocks = StockList()
plots = Plots(pd.DataFrame())
############## SETTINGS ###################
# CONSTANTS: see settings.toml for conflicts; values set here override the settings.toml constants
ALGO_VERSION = conf.algo_version
BASE_CURRENCY = conf.base_currency
############# LOGGING #################
import logging
logging.basicConfig(
    filename="logs/charts_" + ALGO_VERSION + ".log",
    level=logging.INFO,
    format="%(asctime)s:%(levelname)s:%(message)s"
)
############################### CONFIGS ###################################################
# API Credentials alpaca4 edge
API_KEY = conf.api_key
API_SECRET = conf.api_secret
API_BASE_URL = conf.api_base_url
SECRET_KEY = API_SECRET
# CONSTANTS
TIMEZONE_OFFSET = -4.0 # set in the config file; this is probably deprecated
if DEBUG:
    PROCESS_ROWS = 0 # set to a low number for debugging, otherwise 1000
else:
    PROCESS_ROWS = 1000
########################### DRIVER #######################################
from datetime import date, datetime, timedelta, timezone
N_DAYS_AGO = 500
YESTERDAY = 1
today = date.today()
n_days_ago = today - timedelta(days=N_DAYS_AGO)
one_day_ago = today - timedelta(days=YESTERDAY)
timezone_offset = -4 # US Eastern Daylight Time (EDT) is UTC-4; Eastern Standard Time (EST) is UTC-5
tzinfo = timezone(timedelta(hours=timezone_offset))
now = datetime.now(tzinfo)
back_time = now - timedelta(minutes=15)
# note: from here on the name `date` is a string, no longer the datetime.date class
date = back_time.strftime("%Y-%m-%d %H:%M:%S")
start_time = now - timedelta(minutes=45)
start = start_time.strftime("%Y-%m-%d %H:%M:%S")
end = date
beg_date = str(n_days_ago) + ' 00:00:00'
end_date = str(one_day_ago) + ' 23:59:00'
import statistics
from scipy import stats
from scipy.stats import skew, kurtosis
if MODE == 'SCREENER' or MODE == 'HISTORICAL':
    try:
        #STOCK_LIST = stocks.TECH_AL
        STOCK_LIST = ['NVDA', 'MSFT']
        STOCK_LIST = sorted(set(STOCK_LIST)) # remove duplicates from the list, then sort
        symbol_list = STOCK_LIST
        index_symbol = stocks.stock_index(ALGO_VERSION)
        cnt = 0
        for symbol in symbol_list:
            print(ALGO_VERSION)
            print(symbol)
            print(index_symbol)
            hundred_dates = alpa.get_calendar(str(n_days_ago), str(one_day_ago))
            # get prices for the symbol in the trading list
            symbol_price_data = alpa.stockbars_by_symbol_by_day(symbol, beg_date, end_date)
            # the Alpaca dataframe is indexed by (symbol, timestamp); turn both into columns
            symbol_price_data = symbol_price_data.reset_index(level=("symbol", "timestamp"))
            # get prices for the trend index for the symbol above
            index_price_data = alpa.stockbars_by_symbol_by_day(index_symbol, beg_date, end_date)
            index_price_data = index_price_data.reset_index(level=("symbol", "timestamp"))
            print(symbol_price_data.columns)
            prices_data = symbol_price_data[['timestamp', 'symbol', 'open', 'close', 'vwap']].copy()
            trends_data = index_price_data[['timestamp', 'symbol', 'open', 'close', 'vwap']].copy()
            prices_data.rename(columns={'timestamp': 'date'}, inplace=True)
            trends_data.rename(columns={'timestamp': 'date'}, inplace=True)
            #prices_data = prices_data.reset_index()
            #trends_data = trends_data.reset_index()
            date_stamp = prices_data.iloc[0]['date']
            print()
            print()
            print("Statistical Analysis: ")
            prices_arr = np.array(prices_data['close'])
            # Calculate the skewness
            print("Symbol Prices skew: ")
            print(skew(prices_data['close'], axis=0, bias=True))
            print("Index skew: ")
            print(skew(trends_data['close'], axis=0, bias=True))
            # Calculate the kurtosis
            print("Symbol Prices kurtosis: ")
            print(kurtosis(prices_data['close'], axis=0, bias=True))
            print("Index kurtosis: ")
            print(kurtosis(trends_data['close'], axis=0, bias=True))
            print()
            print("Covariance between the two: ")
            cov_matrix = np.stack((prices_data['close'], trends_data['close']), axis=0)
            print(np.cov(cov_matrix))
            print()
            print("Correlation between the two: ")
            # np.corrcoef gives the Pearson correlation matrix; np.correlate would only give a sum of products
            correlations = np.corrcoef(prices_data['close'], trends_data['close'])
            print(correlations)
            print()
            print()
            print("Mean of the Symbol: ")
            data_mean = np.mean(prices_data['close'])
            data_max = max(prices_data['close'])
            data_min = min(prices_data['close'])
            print("mean is: " + str(data_mean))
            print("max/min is: " + str(data_max), str(data_min))
            print()
            print()
            print("Variance of the Symbol: ")
            std_dev = np.std(prices_data['close']) # population standard deviation
            print("std dev: " + str(std_dev))
            zscore_list = stats.zscore(prices_data['close'])
            print("symbol z-scores list: ")
            print(zscore_list)
            trends_zscore_list = stats.zscore(trends_data['close'])
            print("trends z-scores list: ")
            print(trends_zscore_list)
            # Calculate the sample variance (n - 1 in the denominator)
            data_variance = statistics.variance(prices_data['close'])
            print("variance result: " + str(data_variance))
            print()
            print()
            print("Market Beat Metrics: ")
            #prices = prices_data.iloc[:lookback_period]
            vars, vibe_check = mb.compare_rates(trends_data, prices_data)
            vars15, vibe_check15 = mb.compare_rates(trends_data[-15:], prices_data[-15:])
            vars30, vibe_check30 = mb.compare_rates(trends_data[-30:], prices_data[-30:])
            vars45, vibe_check45 = mb.compare_rates(trends_data[-45:], prices_data[-45:])
            vars60, vibe_check60 = mb.compare_rates(trends_data[-60:], prices_data[-60:])
            print(vars)
    except Exception:
        # The original handler is not shown in this excerpt; logging and re-raising is an assumption.
        logging.exception("chart data generation failed")
        raise
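If you don't have scipy handy, the z-score and correlation numbers can be reproduced with the standard library. The following is only a minimal sketch: the function names zscores and pearson and the sample closing prices are made up for illustration and are not part of Céillí. It uses the population standard deviation, which is the same convention scipy.stats.zscore uses by default.

```python
from statistics import fmean, pstdev


def zscores(values):
    """Population z-scores (ddof=0), matching the default of scipy.stats.zscore."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        raise ValueError("z-scores are undefined for a constant series")
    return [(v - mean) / std for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys):
        raise ValueError("both series must have the same length")
    # The mean of the products of paired z-scores is the Pearson coefficient.
    return fmean([a * b for a, b in zip(zscores(xs), zscores(ys))])


# Made-up closing prices for a symbol and its index (not real market data).
symbol_closes = [100.0, 102.0, 101.0, 105.0, 107.0]
index_closes = [500.0, 505.0, 503.0, 510.0, 515.0]

symbol_z = zscores(symbol_closes)
print("symbol z-scores:", symbol_z)
print("correlation:", pearson(symbol_closes, index_closes))
```

A correlation coefficient always lands between -1 and 1, which makes it easier to compare across symbols than a raw sum of products.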
The z-scores are put into silos (bins) based on standard deviation by a function in the Market Beat class. The following snippet from that function appends each z-score to the list for its silo. I included the logic here because it can be beneficial to be able to sort these into bins:
# split each z-score by sign
if current_idx_z >= 0:
    trends_positive.append(current_idx_z)
else:
    trends_negative.append(current_idx_z)
if current_price_z >= 0:
    prices_positive.append(current_price_z)
else:
    prices_negative.append(current_price_z)
# bin the price z-score by whole standard deviations (half-open bins, so boundary values are not misplaced)
if 0 <= current_price_z < 1:
    prices_0to1.append(current_price_z)
elif 1 <= current_price_z < 2:
    prices_1to2.append(current_price_z)
elif 2 <= current_price_z < 3:
    prices_2to3.append(current_price_z)
elif current_price_z >= 3:
    prices_3up.append(current_price_z)
elif -1 < current_price_z < 0:
    prices_0toneg1.append(current_price_z)
elif -2 < current_price_z <= -1:
    prices_neg1toneg2.append(current_price_z)
elif -3 < current_price_z <= -2:
    prices_neg2toneg3.append(current_price_z)
elif current_price_z <= -3:
    prices_neg3.append(current_price_z)
# bin the index (trend) z-score the same way
if 0 <= current_idx_z < 1:
    trends_0to1.append(current_idx_z)
elif 1 <= current_idx_z < 2:
    trends_1to2.append(current_idx_z)
elif 2 <= current_idx_z < 3:
    trends_2to3.append(current_idx_z)
elif current_idx_z >= 3:
    trends_3up.append(current_idx_z)
elif -1 < current_idx_z < 0:
    trends_0toneg1.append(current_idx_z)
elif -2 < current_idx_z <= -1:
    trends_neg1toneg2.append(current_idx_z)
elif -3 < current_idx_z <= -2:
    trends_neg2toneg3.append(current_idx_z)
elif current_idx_z <= -3:
    trends_neg3.append(current_idx_z)
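The same binning can also be written as one small function that names the silo for a z-score. This is a sketch, not the Market Beat implementation: silo_label and bin_zscores are hypothetical names, the sample values are illustrative, and I am assuming half-open silos so that a value of exactly 1.0 lands in the "1 to 2" silo.

```python
import math
from collections import defaultdict
from statistics import fmean


def silo_label(z):
    """Name the one-standard-deviation silo that a z-score falls into."""
    if z >= 0:
        lower = math.floor(z)
        return "3>" if lower >= 3 else f"{lower} to {lower + 1}"
    lower = math.floor(-z)
    return "<-3" if lower >= 3 else f"{-lower} to {-(lower + 1)}"


def bin_zscores(zs):
    """Group z-scores into a dict that maps each silo label to its members."""
    silos = defaultdict(list)
    for z in zs:
        silos[silo_label(z)].append(z)
    return dict(silos)


sample = [0.44, 1.13, 1.30, -0.04, -0.97, -1.70]  # illustrative values only
for label, members in bin_zscores(sample).items():
    # Print the silo name, its count and its mean.
    print(label, len(members), fmean(members))
```

Keeping the silo name as a dictionary key means adding a new silo does not require a new list variable for every band.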
The graphing part is handled in a Plots Class that is called by this code:
import os
path_15_index = 'plots/stats/zscores/scatter/' + str(today) + '_' + index_symbol + '_15.png'
print(path_15_index)
# only generate the plot if it has not already been saved for this date
if not os.path.isfile(path_15_index):
    symbol_zscores_plot = plots.zscores_scatter_by_day(today, index_symbol, trends_data['zscores'][-15:], '15')
else:
    print(index_symbol + ' zscores scatter plot file exists for this date')
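The "only plot if the file is missing" check can be wrapped in a small helper so it does not have to be repeated for every period. This is a rough sketch using pathlib; scatter_plot_path, ensure_plot and fake_plot are hypothetical names, and the date and symbol are placeholders rather than values from Céillí.

```python
import tempfile
from pathlib import Path


def scatter_plot_path(base_dir, plot_date, symbol, periodicity):
    """Build a file name of the form <date>_<symbol>_<periodicity>.png."""
    return Path(base_dir) / f"{plot_date}_{symbol}_{periodicity}.png"


def ensure_plot(path, make_plot):
    """Call make_plot(path) only when the file is missing; return True if it was called."""
    if path.is_file():
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    make_plot(path)
    return True


def fake_plot(path):
    # Stand-in for a real plotting call; it just writes a placeholder file.
    path.write_bytes(b"placeholder")


with tempfile.TemporaryDirectory() as tmp:
    target = scatter_plot_path(Path(tmp) / "scatter", "2025-02-20", "QQQ", "15")
    first = ensure_plot(target, fake_plot)   # file is missing, so the plot is made
    second = ensure_plot(target, fake_plot)  # file exists now, so nothing happens
    print(first, second)
```

Creating the parent directory inside the helper also avoids a failed save when the plots directory does not exist yet.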
The plots themselves are then generated in the Plots class:
def zscores_scatter_by_day(self, plot_date, symbol, data, periodicity='all'):
    import matplotlib.pyplot as plt
    import seaborn as sns
    plot_date = str(plot_date)
    zscores = data.reset_index(drop=True)
    #zscores = zscores.tolist()
    print(zscores)
    # PLOTTING: distribution chart of the unique, sorted z-scores
    zscores_list = sorted(set(zscores)) # remove duplicates, then sort
    print("zscores sorted and unique: ", zscores_list)
    sns.displot(zscores_list, color="maroon")
    plt.xlabel("zscore", labelpad=14)
    plt.ylabel("probability of occurrence", labelpad=14)
    plt.title("Percent Ratio Z-scores distribution " + plot_date, y=1.015, fontsize=10)
    #plt.show()
    plt.savefig('plots/stats/zscores/' + symbol + '_' + plot_date + '_' + periodicity + '.png', bbox_inches='tight')
    plt.clf()
    # SCATTER: one point per z-score, colored by its standard deviation band
    # https://matplotlib.org/stable/gallery/color/named_colors.html
    x_cnt = 0
    for i in zscores:
        color = 'grey' # default for values outside the bands below
        if -1 < i < 0:
            color = 'orange'
        elif -2 < i <= -1:
            color = 'indianred'
        elif -3 < i <= -2:
            color = 'firebrick'
        elif -4 < i <= -3:
            color = 'maroon'
        elif 0 <= i < 1:
            color = 'yellow'
        elif 1 <= i < 2:
            color = 'green'
        elif 2 <= i < 3:
            color = 'forestgreen'
        elif 3 <= i < 4:
            color = 'darkgreen'
        elif 4 <= i < 5:
            color = 'darkolivegreen'
        elif i >= 5:
            color = 'black'
        plt.scatter(i, zscores[x_cnt], c=color)
        x_cnt += 1
    scatter_path = 'plots/stats/zscores/scatter/' + plot_date + '_' + symbol + '_' + periodicity + '.png'
    print(scatter_path)
    plt.savefig(scatter_path, bbox_inches='tight')
    plt.clf()
    #plt.show()
This function outputs the plots into a directory for safekeeping and reference as needed. The first part of the function generates the z-score distribution charts, and the second part generates the rainbow spectrum charts of z-scores. You have to pass in the z-scores to be plotted as a pandas Series (a single DataFrame column), plus the plot date, the symbol and the periodicity label used in the file names. You'll need to include these libraries in your own code for this function to work.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
from plotly.offline import iplot, init_notebook_mode
import seaborn as sns
def zscores_scatter_by_day(self, plot_date, symbol, data, periodicity='all'):
    ...
    ...
    ...
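The color bands in the scatter plot can also be looked up from two lists instead of a long if/elif chain. This is a sketch under my own assumptions: zscore_color is a hypothetical helper, the bands are half-open so that exact values like 1.0 always get a color, and anything below -4 falls back to grey.

```python
import math

# Color names are the matplotlib named colors used in the function above.
POSITIVE_COLORS = ["yellow", "green", "forestgreen", "darkgreen", "darkolivegreen", "black"]
NEGATIVE_COLORS = ["orange", "indianred", "firebrick", "maroon"]


def zscore_color(z, default="grey"):
    """Return the color for the one-standard-deviation band that z falls into."""
    if z >= 0:
        # Everything from 5 upward shares the last positive color.
        return POSITIVE_COLORS[min(math.floor(z), len(POSITIVE_COLORS) - 1)]
    band = math.floor(-z)
    return NEGATIVE_COLORS[band] if band < len(NEGATIVE_COLORS) else default


sample = [-2.5, -0.5, 0.5, 1.0, 7.0, -4.5]  # illustrative values only
colors = [zscore_color(z) for z in sample]
print(colors)
```

A list of color names like this can be passed to a single matplotlib scatter call through its c argument, which avoids calling scatter once per point.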
I hope this can provide some insights to others on how to plot z-scores and work with pandas.
As an older developer who is used to the LAMP stack, I found getting used to the new JavaScript frameworks a bit challenging. One thing I found difficult was finding documentation on building out auth and CRUD functionality for the SvelteKit framework, the latest fast kid on the block, which is smashing it on page load times compared to more established JS frameworks.
Because SvelteKit is a new kid on the block, there aren't many established code examples or tutorials on how to do such simple things as CRUD and auth. I found a nice auth code base on github.com, but it didn't have a walkthrough of the code or any CRUD, so this article shows how to add CRUD to this very nice open source auth codebase using SvelteKit.
To begin, you will need to have some things set up on your development computer, such as PostgreSQL for the database. Here are the steps to get things set up.
CREATE USER your_new_username WITH PASSWORD 'your_password';
4. CREATE DATABASE auth_demo;
5. add permissions to the database for your postgres username (auth_user in this example):
GRANT ALL PRIVILEGES ON DATABASE auth_demo TO auth_user;
6. open a new command-line window or terminal and go to the directory where you downloaded sveltekit-auth
>cd to svelte-kit-master dir
7. get the db ready, run the migration
> npx drizzle-kit generate:pg
8. push to the db
> npx drizzle-kit push:pg
If you need to make changes to the schema, you can do so and then run the generate and push commands again to update the database.
9. in the root of the codebase directory copy sample.env to .env file
10. update settings in .env file
# rename to .env and put your values in
# General
# I used postgresql for the PRISMA_URL for this project
# but you should be able to use any DB prisma supports.
# https://www.prisma.io/docs/reference/database-reference/supported-databases
DATABASE_URL="postgresql://postgres_user:user_password@localhost:5432/db_name"
# Good video on how to set up OAuth with Google https://www.youtube.com/watch?v=KfezTtt2GsA
GOOGLE_CLIENT_ID=
GOOGLE_CLIENT_SECRET=
# Email
FROM_EMAIL = 'first last <user@domain.com>'
# use blank values in AWS variables if you want to use SMTP
#AWS SES KEYS
AWS_ACCESS_KEY_ID= ''
AWS_SECRET_ACCESS_KEY= ''
AWS_REGION= '' # us-east-1
AWS_API_VERSION= '' # 2010-12-01
# if AWS SES not set the SMTP will be a fallback
SMTP_HOST=localhost
SMTP_PORT=1025
SMTP_SECURE=0 # use 1 for secure
SMTP_USER=somethinghere
SMTP_PASS=somepassword
# Logging
# Clear these to fallback to console.log
AXIOM_TOKEN=your-axiom-token
AXIOM_ORG_ID=your-axiom-org-id
AXIOM_DATASET = your-axiom-dataset
With the dev server running, you should now see a user interface when you open http://localhost:5173. In my case, with branding added for my project, it looks like this:
Extending the Codebase and adding CRUD Functionality
Now that the app is up and running, we can start to add functionality that lets us add db tables and add, edit and delete records in these new tables, which is essential to any application.
As an example, I am going to create a settings module for my project. You may want to review the ORM used for db management; in this case we are using Drizzle: https://orm.drizzle.team/docs/get-started-postgresql
There is a drizzle.config.ts file that defines where the Drizzle schemas are kept. The schema defines the table for the ORM.
See /lib/server/database/drizzle-schemas.ts
Add a settings schema to the schemas already in that file for users and sessions:
We are going to be working primarily with the key/secret pair, which is used for any standard API calls, in this case an API for getting stock data.
Then add a type for settings and an update type in the same file:
export type Settings = typeof settingsTable.$inferInsert;
export type UpdateSettings = Partial<typeof settingsTable.$inferInsert>;
Then you need to generate and migrate the changes:
> npx drizzle-kit generate:pg
After you generate, there will be a new migration file, numbered sequentially in ascending order, in /lib/server/database/0000_snapshot.json, 0001_snapshot.json
Then push the changes using:
> npx drizzle-kit push:pg
In the command-line or terminal window you should see this:
> auth % npx drizzle-kit generate:pg
drizzle-kit: v0.20.14
drizzle-orm: v0.29.3
No config path provided, using default 'drizzle.config.ts'
If you list the tables in the postgres command line before running these commands, you see this:
auth_demo=# \dt
public | sessions | table | auth_user
public | users | table | auth_user
Then, after you generate and push, you see:
auth_demo=# \dt
public | sessions | table | auth_user
public | settings | table | auth_user
public | users | table | auth_user
Also note that a new SQL file with the additional table information is generated in /src/lib/server/database/migrations/0001_gigantic_quasimodo.sql
CRUD functions are handled in src/lib/server/database/settings-model.ts; this is where you add the functionality for the database model.
The settings directory should be copied into the /src/routes/(protected)/profile directory so we get /src/routes/(protected)/profile/settings/…
In this example we are going to add a typical API key and secret pair, as used for any SaaS API setting.
The eventual directory structure will be like this:
src/
  routes/
    (protected)/
      profile/
        +page.svelte
        +page.server.ts
        settings/
          +page.svelte
          +page.server.ts
          editor/
            +page.svelte
            +page.server.ts
          lister/
            +page.svelte
            +page.server.ts
Files that should not be exposed to a user without a valid session id, where the session id matches the user's session information (i.e. they own the content in the interface), are placed in the (protected) directory.
Step one is to create a profile directory under (protected).
Paste the settings dir into (protected)/profile/
Create the file /lib/config/zod-schemas-settings.ts
Paste into the file:
import { z } from 'zod';
export const settingsSchema = z.object({
  id: z.string().optional(),
  user_id: z.string().optional(),
  key: z
    .string({ required_error: 'key is required' })
    .min(1, { message: 'key is required' })
    .trim(),
  secret: z
    .string({ required_error: 'secret is required' })
    .min(1, { message: 'secret is required' })
    .trim(),
  createdAt: z.date().optional(),
  updatedAt: z.date().optional()
});
export type SettingsSchema = typeof settingsSchema;
After registering as a user, update the db to mark the user as verified, since we have not set up email yet:
UPDATE users SET verified = 't' WHERE email = 'reg_user@domain.com';
Restart the server after updating:
npm run dev
When we go to the dashboard section of the site:
We work with two files. dashboard/+page.server.ts initializes things for the view.
The important imports are:
import { db } from '$lib/server/db';
import { settingsTable } from '$lib/server/database/drizzle-schemas';
These imports are used for accessing the database. Because in this view we perform a data check to see if there is a record for the user, we need the drizzle-schemas. db is the equivalent of the DB object in older systems based on PHP or other backend code bases.
+page.svelte is the user interface file. One important import is the zod schema, which is used for data validation in the form included in the file.
Working with Editing Data and Updating the DB
If you don't have any keys, the dashboard provides a link to add a key; this brings up profile/settings/
In +page.server.ts, make sure any db and db schema requirements are met. In this case there are none, since we are not reading any data or accessing the db.
In +page.svelte we then include the form components and the zod schema for validation:
import * as Form from '$lib/components/ui/form';
import { settingsSchema } from '$lib/config/zod-schemas-settings';
import type { SuperValidated } from 'sveltekit-superforms';
Once the requirements are loaded, the UI shows a form with this code:
Once the submit button is pushed and the data has been validated using the zod schema, the data is inserted into the database, as long as the settings schema is defined correctly.
Once we have a setting submitted, we can then list the settings.
Listing the Settings and Editing
In the directory /profile/settings/lister
we do the usual requirements, with the addition of one important new piece of code that handles form actions such as update or delete. Here we are only concerned with deleting, so we add this to the file +page.server.ts:
const deleteSettingsSchema = settingsSchema.pick({
  id: true,
  key: true,
  secret: true
});
type DeleteSettingsSchema = typeof deleteSettingsSchema;
export let form: SuperValidated<DeleteSettingsSchema>;
Then in the user interface we can iterate through the settings record rows, although here we are limited to only one:
<ul>
  {#each data.json_results as alpaca}
    <li>{alpaca.key} - {alpaca.secret} <a href="/profile/settings/editor/{alpaca.id}">[edit]</a>
      <form method="POST" onsubmit="return confirm('sure you want to delete this key?');" action="?/delete">
        <input type="hidden" id="id" name="id" value="{alpaca.id}" />
        <button>Delete</button>
      </form>
    </li>
    --------- <br />
  {/each}
</ul>
Editing the Settings
Next we move on to editing. The previous code contains a link to the editor. This is where we use a new directory, [id], at /profile/settings/editor/[id]
See https://svelte.dev/docs/kit/advanced-routing for more information on how this works, but basically we are passing in a dynamic 'id' parameter and displaying a user interface based on it.
For editing we are working with the files in /profile/settings/editor/[id]/
In +page.server.ts we need to make sure we import the settings model functions and the schema for the update and other functionality:
import { editSetting } from '$lib/server/database/settings-model';
import { updateSetting } from '$lib/server/database/settings-model';
import { settingsTable } from '$lib/server/database/drizzle-schemas';
And again we need to add the actions for editing:
export const actions = {
  default: async (event) => {
    // keySchema (the key/secret pick of settingsSchema) must be defined or imported in this file
    const form = await superValidate(event, keySchema);
    if (!form.valid) {
      return fail(400, {
        form
      });
    }
    // update the setting in the db
    try {
      console.log('updating profile');
      const user = event.locals.user;
      if (user) {
        await updateSetting(event.params.id, {
          key: form.data.key,
          secret: form.data.secret
        });
        setFlash({ type: 'success', message: 'Keys update successful.' }, event);
      }
    } catch (e) {
      console.error(e);
      return setError(form, 'There was a problem updating your trading keys.');
    }
    console.log('keys updated successfully');
    return message(form, 'keys updated successfully.');
  }
};
In +page.svelte we make sure we have the schema for validation:
import { settingsSchema } from '$lib/config/zod-schemas-settings';
import type { SuperValidated } from 'sveltekit-superforms';
And we need:
const keySchema = settingsSchema.pick({
  key: true,
  secret: true,
});
type keySchema = typeof keySchema;
export let form: SuperValidated<keySchema>;
form = data.form;