https://github.com/adriancorrendo/modified-alcc
Code to estimate soil test critical values following Correndo, A.A., Salvagiotti, F., García, F.O. and Gutiérrez-Boem, F.H., 2017. A modification of the arcsine–log calibration curve for analysing soil test value–relative yield relationships. Crop and Pasture Science, 68(3), pp.297-304. doi: 10.1071/CP16444
Science Score: 39.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 3 DOI reference(s) in README -
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (14.2%) to scientific vocabulary
Repository
Code to estimate soil test critical values following Correndo, A.A., Salvagiotti, F., García, F.O. and Gutiérrez-Boem, F.H., 2017. A modification of the arcsine–log calibration curve for analysing soil test value–relative yield relationships. Crop and Pasture Science, 68(3), pp.297-304. doi: 10.1071/CP16444
Basic Info
- Host: GitHub
- Owner: adriancorrendo
- License: mit
- Default Branch: main
- Size: 1.94 MB
Statistics
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.Rmd
--- title: "Modified ALCC" author: "Adrian A. Correndo" date: "03/24/2022" output: github_document contact: correndo@agro.uba.ar - correndo@ksu.edu --- This code was prepared as a tutorial for potential users of the Modified Arcsine-log Calibration Curve (modALCC) detailed in Correndo et al. (2017).
*Correndo, A.A., Salvagiotti, F., García, F.O. and Gutiérrez-Boem, F.H., 2017. A modification of the arcsine–log calibration curve for analysing soil test value–relative yield relationships. Crop and Pasture Science, 68(3), pp.297-304. https://doi.org/10.1071/CP16444 *
Instructions for users
1. Load your dataframe with soil test value (STV) and relative yield (RY) data.
2. Specify the following arguments into the function -modALCC()-:
(i) 'data' (optional),
(ii) soil test value 'STV' and relative yield 'RY',
(iii) 'target' of relative yield (e.g. 90%),
(iv) desired confidence level (e.g. 0.95 for 1 - alpha(0.05)). Used for the estimation of critical soil test value (CSTV) lower and upper limits.
3. Run and check results in a data.frame.
4. Check residuals plot, and warnings related to potential leverage points.
5. Adjust curve plots as desired.
Please, refer any question to Adrian Correndo, correndo@agro.uba.ar - correndo@ksu.edu.
Note: RY should be expressed relative to a maximum in order to obtain values bounded at %100. Otherwise, arcsine transformation doesn't work. If RY values > 100% are found, the function will cap them up to 100% and will display a warning about this.
References
Correndo, A.A., F. Salvagiotti, F.O. García, y F.H. Gutiérrez-Boem. 2017. A modification of the arcsine-log calibration curve for analysing soil test value - relative yield relationships. Crop & Pasture Science 68 (3): 297-304, doi: 10.10171/CP16444
```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` ### Last update: 03-24-2022 \newpage # 1. Libraries ```{r warning=F, message=F} # Install if needed # install.packages("easypackages") # install.packages("devtools") library(easypackages) # Helps to load packages and install & load them if they are not installed yet. library(devtools) packages("readxl") # Open xlsx files packages("tidyverse", "ggpmisc") # Data wrangling and plots packages("smatr") # SMA regression analysis for reference packages("agridat") # For cotton example dataset ``` # 2. Datasets ```{r} # Example 1 dataset data_1 = data.frame("RY" = c(65,80,85,88,90,94,93,96,97,95,98,100,99,99,100), "STV" = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)) # Example 2 dataframe. Imported from csv file # It can be easily replaced with your own csv file data_2 = read.csv(file = "data_test.csv") # Example 3 dataframe. Imported from xlsx file data_3 = readxl::read_xlsx(path ="data_3.xlsx", sheet = 1) # Create nested structure as example of multiple datasets data.all = bind_rows(data_1, data_2, data_3, .id = "id") %>% tidyr::nest(data = c("STV", "RY")) ``` \newpage # 3. modALCC() function ```{r warning=FALSE, message=F} modALCC <- function(data=NULL, RY, STV, target, confidence){ # Add a function to cap if there are RY values > 100 ry <- rlang::eval_tidy(data = data, rlang::quo(ifelse({{RY}} > 100, 100, as.double({{RY}})) )) n <- length(ry) # Sample size df <- n - 2 # Degrees of freedom prob <- 1-((1-confidence)/2) # Probability for t-dist tvalue <- qt(p=prob, df = df) # Student-t value arc_RY <- asin(sqrt(ry/100)) - asin(sqrt(target/100)) # RY transformation (centered to target) ln_STV <- rlang::eval_tidy(data = data, rlang::quo(log({{STV}}) ) ) # STV natural log transformation r <- cor(ln_STV, arc_RY, method = "pearson") # Pearson correlation (r) p_value <- cor.test(ln_STV,arc_RY, method = "pearson")$p.value # p-value of r slope <- sd(ln_STV)/sd(arc_RY) # SMA slope for ln_STV ~ arc_RY intercept <- mean(ln_STV) - (mean(arc_RY)*slope) # Intercept SMA_line <- intercept + slope * arc_RY # Fitted ln_STV for observed RY CSTV <- exp(intercept) # Critical STV for specified RY-target and confidence (1-alpha) MSE <- sum((SMA_line-ln_STV)^2)/df # Mean Square Error of ln_STV SSx <- sum((mean(arc_RY)-arc_RY)^2) # Sum of Squares of arc_RY SE_int <- sqrt(MSE*((1/n)+ ((mean(arc_RY)^2)/SSx))) # Standard Error intercept CSTV_lower <- exp(intercept - (tvalue * SE_int)) # Lower limit of CSTV CSTV_upper <- exp(intercept + (tvalue * SE_int)) # Upper limit of CSTV new_RY <- seq(min(ry),100, by=0.2) # New RY vector up to %100 to fit curve new_arc_RY <- asin(sqrt(new_RY/100)) - asin(sqrt(target/100)) # Transforming new_RY vector fitted_Line <- intercept + slope * new_arc_RY # Fitted ln_STV for curve plot fitted_STV <- exp(fitted_Line) # Fitted ln_STV for new_RY residuals <- ln_STV - SMA_line # Residuals of SMA Regression fitted_axis <- ln_STV + slope * arc_RY # Fitted axis to check SMA residuals target <- target # Target RY to show on summary confidence <- confidence # Confidence level to show on summary # Critical STV for RY = 90 & 100 arc_ry_100 <- asin(sqrt(ry/100)) - asin(sqrt(1)) cstv.100 <- exp(mean(ln_STV) - (mean(arc_ry_100)*(sd(ln_STV)/sd(arc_ry_100)))) arc_ry_90 <- asin(sqrt(ry/100)) - asin(sqrt(90/100)) cstv.90 <- exp(mean(ln_STV) - (mean(arc_ry_90)*(sd(ln_STV)/sd(arc_ry_90)))) # Count cases with STV > x2 cstv90 and STV > cstv100 n.90x2 <- rlang::eval_tidy(data=data, rlang::quo(length(which({{STV}} > (2*cstv.90))) ) ) n.100 <- rlang::eval_tidy(data=data, rlang::quo(length(which({{STV}} > cstv.100)) ) ) # Outcome results <- as.data.frame(list("n" = n, "r" = r, "target" = target,"CSTV" = CSTV, "LL" = CSTV_lower,"UL" = CSTV_upper,"confidence" = confidence,"p_value" = p_value, "CSTV90" = cstv.90, "n.90x2" = n.90x2,"CSTV100" = cstv.100,"n.100" = n.100)) %>% bind_cols(., as.data.frame(list("RY.fitted" = new_RY, "STV.fitted" = fitted_STV))%>% tidyr::nest(Curve = c("RY.fitted", "STV.fitted"))) %>% bind_cols(., as.data.frame(list("ln_STV" = ln_STV, "arc_RY" = arc_RY, "SMA_line" = SMA_line,"residuals" = residuals,"fitted_axis" = fitted_axis)) %>% tidyr::nest(SMA = c("ln_STV", "arc_RY", "SMA_line","residuals", "fitted_axis") ) ) # WARNINGS rlang::eval_tidy(data = data, rlang::quo(if (max({{RY}}) > 100) { warning("One or more original RY values exceeded 100%. All RY values greater than 100% have been capped to 100%.", call. = FALSE) } ) ) if (results$n.100 > 0) {warning(paste0(n.100," STV points exceeded the CSTV for 100%. Risk of leverage. You may consider a sensitivity analysis by removing extreme points, re-run the modALCC(), and check results."), call. = FALSE) } if (results$n.90x2 > 0) {warning(paste0(n.90x2," STV points exceeded two-times (2x) the CSTV for 90%. Risk of leverage. You may consider a sensitivity analysis by removing extreme points, re-run the modALCC(), and check results."), call. = FALSE) } return(results) } ``` \newpage # 4. Fit examples ## 4.1. Fit ALCC models individually ```{r warning=TRUE, message=TRUE} # RY target = 90%, confidence level = 0.95, replace with your desired values # Data 1 # Using dataframe fit_example_1 = modALCC(data = data_1, RY = RY, STV = STV, target=90, confidence = 0.95) # Alternative using the vectors #fit_example_1 = ALCC(RY = data_1$RY,STV = data_1$STV, target=90,confidence = 0.95) fit_example_1 # Data 2 fit_example_2 = modALCC(data = data_2, RY = RY, STV = STV, target=90, confidence = 0.95) fit_example_2 # Data 3 fit_example_3 = modALCC(data = data_3, RY = RY, STV = STV, target=90, confidence = 0.95) fit_example_3 ``` ## 4.2. Fit multiple ALCC models with mapping ```{r warning=T, message=F} # Run multiple examples at once with map() fit_examples = data.all %>% mutate(modALCC = map(data, ~ modALCC(RY = .$RY, STV = .$STV, target=90, confidence = 0.95))) %>% unnest(., cols = c("modALCC")) head(fit_examples) # Alternative with group_map, this does not required nested data. fit_all = bind_rows(data_1, data_2, data_3, .id = "id") %>% group_by(id) %>% group_map(~ modALCC(data = ., RY = RY, STV = STV, target = 90, confidence = 0.95)) head(fit_all) ``` \newpage # 5. Plots Examples using ggplot
## 5.1. Example 1 ```{r warning=F, message=F} # Extracting curve data as a data.frame to plot curve_example1 = fit_example_1 %>% unnest(., cols = Curve) # Plot data_1 %>% # Want to remove leverage points? #dplyr::filter(STV < fit_example_1$CSTV100) %>% #dplyr::filter(STV < 2*fit_example_1$CSTV90) %>% ggplot()+ # Points geom_point(aes(x = STV, y = RY), fill = "orange", shape = 21, size = 4, alpha = 0.75)+ # Highlight potential leverage points >2xCSTV90 geom_point(data = data_1 %>% dplyr::filter(STV > 2*fit_example_1$CSTV90), aes(x = STV, y = RY, shape = ">2xCSTV90"), col = "dark red", size = 4, alpha = 1)+ # Highlight potential leverage points >2xCSTV90 #geom_point(data = data_1 %>% dplyr::filter(STV > fit_example_1$CSTV100), # aes(x = STV, y = RY, shape = ">CSTV100"), col = "dark red", size = 4, alpha = 1)+ scale_shape_manual(name = "", values = c(7,10))+ # Fitted ALCC geom_line(data = curve_example1, aes(x= STV.fitted, y = RY.fitted), size = 2)+ # Critical value geom_vline(xintercept = fit_example_1$CSTV, col = "red", size = 1.25, linetype = "dashed")+ # Confidence limits # Lines geom_vline(xintercept = fit_example_1$LL, col = "red", size = 0.75, linetype = "dotted")+ geom_vline(xintercept = fit_example_1$UL, col = "red", size = 0.75, linetype = "dotted")+ # Shade ggpp::annotate(geom = "rect", xmin = fit_example_1$LL, xmax = fit_example_1$UL, ymin = min(data_1$RY), ymax = 100, alpha = .3, fill = "red")+ # Axis titles labs(x = "Soil Test Value (units)", y = "Relative Yield (%)")+ theme_bw()+ theme(legend.position = "top")+ # Annotate critical values data ggpp::annotate(geom = "table", y = min(data_1$RY), x = fit_example_1$UL + 0.5, hjust= 0, vjust = 0, label = fit_example_1 %>% dplyr::select(CSTV, LL, UL, r) %>% mutate_at(.vars = c("r"), ~round(.,2)) %>% mutate_at(.vars = c("CSTV","LL","UL"), ~round(.,1)) ) # SMA regression SMA_example1 = fit_example_1 %>% unnest(., cols = SMA) SMA_example1 %>% ggplot(aes(x = arc_RY, y = ln_STV))+ ggtitle("SMA Regression. Dataset 1")+ geom_point(shape=21, fill = "orange", size = 4, alpha = 0.75)+ #SMA Line geom_path(aes(x=arc_RY, y = SMA_line, linetype = "SMA_fit"), size = 2, col = "grey25")+ scale_linetype_manual(name="", values = c("solid"))+ #Critical value geom_vline(xintercept = 0, col = "grey10", size = 1.25, linetype = "dashed")+ theme_bw()+ # Axis titles labs(y = "ln_STV", y = "asin(sqrt(RY))-centered") # Residuals plot SMA_example1 %>% ggplot(aes(x = fitted_axis, y = residuals))+ ggtitle("Residuals SMA. Dataset 2")+ geom_point(shape=21, fill = "orange", size = 4, alpha = 0.75)+ geom_hline(yintercept = 0, col = "grey10", size = 1.25, linetype = "dashed")+ theme_bw()+ # Axis titles labs(x = "Fitted Axis -SMA- (see Warton et al. 2006)", y = "Residuals (STV units)") ``` \newpage ## 5.2. Example 2 ```{r warning=F, message=F} # Extracting curve data as a data.frame to plot curve_example2 = fit_example_2 %>% unnest(., cols = Curve) # Plot data_2 %>% ggplot()+ # Want to remove leverage points? #dplyr::filter(STV < fit_example_2$CSTV100) %>% #dplyr::filter(STV < 2*fit_example_2$CSTV90) %>% # Points geom_point(aes(x = STV, y = RY), fill = "#88dbc8", shape = 21, size = 4, alpha = 0.75)+ # Highlight potential leverage points >2xCSTV90 geom_point(data = data_2 %>% dplyr::filter(STV > 2*fit_example_2$CSTV90), aes(x = STV, y = RY, shape = ">2xCSTV90"), col = "dark red", size = 4, alpha = 1)+ # Highlight potential leverage points >CSTV100 geom_point(data = data_2 %>% dplyr::filter(STV > fit_example_2$CSTV100), aes(x = STV, y = RY, shape = ">CSTV100"), col = "dark red", size = 4, alpha = 1)+ scale_shape_manual(name = "", values = c(7,10))+ # Fitted ALCC geom_line(data = curve_example2, aes(x= STV.fitted, y = RY.fitted), size = 2)+ # Critical value geom_vline(xintercept = fit_example_2$CSTV, col = "red", size = 1.25, linetype = "dashed")+ # Confidence limits geom_vline(xintercept = fit_example_2$LL, col = "red", size = 0.75, linetype = "dotted")+ geom_vline(xintercept = fit_example_2$UL, col = "red", size = 0.75, linetype = "dotted")+ ggpp::annotate(geom = "rect", xmin = fit_example_2$LL, xmax = fit_example_2$UL, ymin = min(data_2$RY), ymax = 100, alpha = .3, fill = "red")+ # Axis titles labs(x = "Soil Test Value (units)", y = "Relative Yield (%)")+ theme_bw()+ theme(legend.position = "top")+ # Annotate critical values data ggpp::annotate(geom = "table", y = min(data_2$RY), x = fit_example_2$UL + 0.5, hjust= 0, vjust = 0, label = fit_example_2 %>% dplyr::select(CSTV, LL, UL, r) %>% mutate_at(.vars = c("r"), ~round(.,2)) %>% mutate_at(.vars = c("CSTV","LL","UL"), ~round(.,1)) ) # SMA regression SMA_example2 = fit_example_2 %>% unnest(., cols = SMA) SMA_example2 %>% ggplot(aes(x = arc_RY, y = ln_STV))+ ggtitle("SMA Regression. Dataset 2")+ geom_point(shape=21, fill = "#88dbc8", size = 4, alpha = 0.5)+ #SMA Line geom_path(aes(x=arc_RY, y = SMA_line, linetype = "SMA_fit"), size = 2, col = "grey25")+ scale_linetype_manual(name="", values = c("solid"))+ #Critical value geom_vline(xintercept = 0, col = "grey10", size = 1.25, linetype = "dashed")+ theme_bw()+ # Axis titles labs(y = "ln_STV", y = "asin(sqrt(RY))-centered") # Residuals plot resid_example2 = fit_example_2 %>% unnest(., cols = SMA) resid_example2 %>% ggplot(aes(x = fitted_axis, y = residuals))+ ggtitle("Residuals SMA. Dataset 2")+ geom_point(shape = 21, fill = "#88dbc8", size = 4, alpha = 0.5)+ geom_hline(yintercept = 0, col = "grey10", size = 1.25, linetype = "dashed")+ theme_bw()+ # Axis titles labs(x = "Fitted Axis -SMA- (see Warton et al. 2006)", y = "Residuals (STV units)") ``` \newpage # 6. Revised SoilTestCocaCola Code Great initiative from Dr. Austin Pearce working for automatize critical nutrient values estimations for the FRST (Fertilizer Recommendation System Tool) Project.
Original source: https://github.com/austinwpearce/SoilTestCocaCola.git
Copyright (c) 2022 Austin Pearce
*Adapted by Adrian Correndo on 03/22/2022* **List of modifications**:
1. Modified variables (e.g. x, xt_centered, y, yt) and arguments names (inside functions; e.g. sufficiency for target) for more familiar terminology I personally use.
(i). For example, replaced "lower_cl" with "lower_cstv" to maintain consistency.
2. Removed sma argument from formulae. Not needed as the modified ALCC is entirely based on SMA. OLS steps described by Correndo et al. (2017) are just some tricks to obtain parameters using a spreadsheet where we cannot run a sma() model (Warton et al., 2006).
3. Modified slope formula for more explicit one that removes the need of estimating an OLS slope and then rotate. New formula is simply the ratio between standard deviations of variables, as follows:
(i). slope = sd(ln_STV) / sd(arc_ry) 4. Shortened some steps regarding centering variables.
5. Fixed model pvalue (replaced student-t pvalue for Pearson-r pvalue)
6. Modified df as n-2 for the t-test. (see Table 4, Warton et al., 2006)
7. Modified test_stat formula, obtained directly from qt() instead of pt()
8. Added probability formula (prob), which includes confidence to later estimate the t_stat.
9. Shortened se formula, obtained directly from qt() instead of pt()
10. Modified ssx formula for more explicit one
11. Fixed model pvalue (replaced student-t pvalue for Pearson-r pvalue)
12. Fixed issue with dplyr::if_else() when capping RY to 100%
13. Modified data points removal criteria (remove2x replaced by remove.leverage, which gives users more autonomy)
(i). Users can choose now between options remove.leverage = c("none", "cstv90_2x", "cstv100")
(ii). Added sample size (n) used for CSTV calculations as outcome to show when removing points.
(ii). Modified printed text in the alcc_plot() in accordance with the new remove.leverage criteria.
14. Added some extra warning regarding the specification of relative yield "target" and "confidence". Although defaults are provided, I consider important to create awareness on users about this, which are key features the ALCC model brings.
15. By the end of stage_3 of alcc(), I replaced "mutate(remove2x = "TRUE")" with "mutate(remove.leverage = remove.leverage)", otherwise the summary table always printed "TRUE" even though STV > cstv90_2x were NOT removed (remove2x = FALSE). It currently prints the user specification ("none", "cstv90_2x", "cstv100").
\newpage ## 6.1. Revised alcc() functions ```{r warning=F, message=F} # ALCC # ============================================================================= alcc_core.rev <- function(data, # # AC: I would rather use the acronym of soil test value (stv), # as the same is done for "ry". #stv, stv, ry, # AC: WARNING: There is no really an option for "SMA" or not, it is ALWAYS SMA. # sma, # AC: I think the word sufficiency is not appropriate for the case... # most likely this is just a "target" RY value # sufficiency sounds like 100%, a target would be just an "acceptable" RY target, # AC: I added confidence as a needed arg. to define confidence) { if (nrow(data) < 8) { stop("Too few data points. Try at least 8.") } if (missing(stv)) { stop("Specify the column containing soil test values (e.g., stv = STK)") } if (missing(ry)) { stop("Specify the column containing relative yield (%) values (e.g., ry = RY)") } # AC: # Add statement for missing target, optional since a DEFAULT is provided (90) if (missing(target)) { stop("Specify the target value of relative yield (%) (e.g., target = 90)") } # AC: # Add statement for missing confidence, optional since a DEFAULT is provided (95) if (missing(confidence)) { stop("Specify the desired confidence level (%) for cstv estimation (e.g., confidence = 95)") } # enquo let's the user enter their own column names for x and y variables # AC: I think this is particularly confusing because in the SMA regression # Soil test is not in the x-axis but on the y-axis, # I would rather use "STV" instead of "x", and then you flip them for transformed (xt, yt) #x <- enquo(soil_test) STV <- enquo(stv) # AC: for SMA regression RY is not in the y-axis but on the x-axis, # I would rather use "STV" instead of "x" RY <- enquo(ry) #y <- enquo(ry) steps_1_4 <- data %>% mutate( stv = !!STV, # RY values greater than 100 are capped at 100 ry_cap = if_else(!!RY > 100, 100, as.double(!!RY)), # Tranform back to numeric, just in case ry_cap = as.numeric(ry_cap), # AC: Model is not necessary, it is always SMA #model = case_when(sma == TRUE ~ "ALCC-SMA", # sma == FALSE ~ "ALCC"), # get sample size n = nrow(data), # AC: added degrees of freedom of the model with two pars # (See Warton et al. 2006, Table 4) # n-1 was used df = n-2, # Step 1 Transform (t for "transformed" added to x and y) # AC: ### THIS IS NOT ADDED TO X AND Y, note you have flipped them... # Thus, I would avoid using x & y # for ALCC, soil test goes to Y-axis and RY goes to X-axis # the names of variables are difficult to follow, at least for me, # AC: I would suggest use ln_STV and arc_RY ln_stv = log(stv), # Step 2 Center # AC: I would center right here as well arc_ry = asin(sqrt(ry_cap / 100)) - asin(sqrt(target/100)), target = target, # I think its not necessary # AC: UNNECESSARY # adjust_by = asin(sqrt(sufficiency / 100)), ### NOT NEEDED, adjust_by = asin(sqrt(target / 100)), ### xt_centered = xt - adjust_by, # Step 3 Correlation (Pearson) r = cor(ln_stv, arc_ry, method = "pearson"), # AC: The student-t formula here DID NOT include the confidence level so it # was always estimated for 95% confidence level #t_stat = (r * sqrt(df)) / sqrt(1 - r ^ 2), # AC: Warning: the df here are n-2, also this is informed later as the p-value #of the model # but this is not the pvalue of the model, it's just the pvalue of the t-test # (if B0 =/ from zero) # the pvalue of the model is the pvalue of the Pearson correlation test #pvalue = pt(t_stat, df = n - 1, lower.tail = FALSE), # AC: I would replace with the following prob = 1-((1-confidence/100)/2), # Probability for t-dist # AC: The Student-t value from quantile fn (qt) saves a step for the LL #and UL estimations t_stat = qt(p=prob, df = df), # AC: This is the pvalue of the model # p-value of Pearson correlation (r) pvalue = stats::cor.test(ln_stv, arc_ry, method = "pearson")$p.value, # Step 4 Means # AC: We DON'T ACTUALLY NEED THE MEANS HERE, can be included into the intercept formula #mean_xt = mean(xt_centered), #mean_yt = mean(yt), ) # Step 5 Fit linear model to transformed, centered data # AC: # THIS STEP IS UNNECESSARY AS THE SMA-slope = sd(ln_stv)/ sd(arc_ry) # ols_center <- lm(yt ~ xt_centered, data = steps_1_4) steps_5_9 <- steps_1_4 %>% mutate( #intercept = coef(ols_center)[[1]], #slope = coef(ols_center)[[2]], # Step 6 Rotate the regression (SMA) # slope must come first # AC: I replaced the slope for the direct formula for SMA regression slope = sd(ln_stv) / sd (arc_ry), #slope = case_when(sma == TRUE ~ slope / pearson, # sma == FALSE ~ slope), # AC: I replaced the slope for the direct formula for SMA regression intercept = mean(ln_stv) - (mean(arc_ry)*slope), #intercept = case_when(sma == TRUE ~ mean_yt - slope * mean_xt, # sma == FALSE ~ intercept), # Step 7 Estimate Critical Soil Test Concentration cstv = exp(intercept), # Step 8 Estimate the confidence interval pred_ln_stv = intercept + slope * arc_ry, # Fitted ln_STV for observed RY mse = sum((pred_ln_stv - ln_stv) ^ 2) / df, # AC: I would use a more explicit formula here and not this indirect way #ssx = var(xt_centered) * (n - 1), ssx = sum((mean(arc_ry)-arc_ry)^2), confidence = confidence, se = sqrt(mse * ((1/n) + ((mean(arc_ry) ^ 2) / ssx))), #lower_cstv = exp(intercept - se * qt(1 - (1 - confidence / 100) / 2, # df = n - 2)), #upper_cstv = exp(intercept + se * qt(1 - (1 - confidence / 100) / 2, # df = n - 2)), # AC: I replaced boundaries formula for simpler ones lower_cstv = exp(intercept - (t_stat * se)), # Lower limit of CSTV upper_cstv = exp(intercept + (t_stat * se)), # Upper limit of CSTV # Step 9 Back-transform fitted_stv = exp(pred_ln_stv), fitted_ry = 100 * (sin( asin(sqrt(target/100)) + ((pred_ln_stv - intercept) / slope))) ^ 2 ) %>% # 'dataset' might be problematic, not defined in scope select(target, cstv, lower_cstv, upper_cstv, confidence, fitted_stv, fitted_ry, pvalue, r, everything()) } alcc.rev <- function(data, stv, ry, #sma = TRUE, target = 90, confidence = 95, remove.leverage = "none", #remove2x = FALSE, summary = FALSE){ if (missing(stv)) { stop("Specify the soil test variable (e.g., stv = STK)") } if (missing(ry)) { stop("Enter name of relative yield (%) variable (e.g., ry = RY)") } # AC: # Add statement for missing target if (missing(target)) { stop("Specify the target value of relative yield (%) (e.g., target = 90)") } # AC: # Add statement for missing confidence if (missing(confidence)) { stop("Specify the desired confidence level (%) for cstv estimation (e.g., confidence = 95)") } # enquo let's the user enter their own column names for x and y variables # AC: # I think this is particularly confusing because in the SMA regression # Soil test is not in the x-axis but on the y-axis, I will rather use "STV" # instead of "x" #x <- enquo(stv) STV <- enquo(stv) # AC: for SMA regression RY is not in the y-axis but on the x-axis, I will #rather use "STV" instead of "x" RY <- enquo(ry) #y <- enquo(ry) test <- data %>% select(!!STV, !!RY) if (max(test[,2]) > 100) { warning("One or more original RY values exceeded 100%.\nAll RY values greater than 100% have been capped to 100% for arcsine transformation.", call. = FALSE) } if (min(test[,1]) < 0) { stop("One or more soil test values are less than 0. Remove all data with STV values less than 0 prior to fitting.", call. = FALSE) } # stage 1, 2, and 3 described by Dyson and Conyers 2013 stage_1 <- data %>% alcc_core.rev( stv = !!STV, ry = !!RY, # AC: sma arg. not needed. #sma = sma, target = 100, confidence = confidence ) # AC: Added a small note # identify CSTV for 100% cstv_100 <<- unique(stage_1$cstv) # AC: Added a small note # identify CSTV for 90% stage_2 <- data %>% # AC: Warning! This filter line ALWAYS filter out stv > cstv100. # The user should be the one deciding. I don't agree with Dyson here, # as many others may share. The cstv90 estimate should be obtained # without removing anything. # All removes must be OPTIONAL and decision exclusive to researchers. # So the filter step should be at stage 3 only #filter(!!STV <= cstv_100) %>% alcc_core.rev( stv = !!STV, ry = !!RY, # AC: sma arg. not needed. #sma = sma, target = 90, confidence = confidence ) # Extract cstv_90 cstv90_2x <<- unique(stage_2$cstv) * 2 # AC: What if I don't want to remove soil test values > CSTV_100 ???? stage_3 <- data %>% # AC: I would suggest use 3 options as a string: #(i) "cstv90_2x", (ii) "cstv100", (iii) "none" #filter(case_when(remove2x == TRUE ~ !!STV <= cstv90_2x, # remove2x == FALSE ~ !!STV <= cstv_100)) %>% filter(case_when(remove.leverage == "none" ~ !!STV != 0, remove.leverage == "cstv90_2x" ~ !!STV <= cstv90_2x, remove.leverage == "cstv100" ~ !!STV <= cstv_100 ) ) %>% alcc_core.rev(stv = !!STV, ry = !!RY, #sma = sma, target = target, confidence = confidence) %>% # AC: With this line, in the summary tabla always printed remove2x = "TRUE", #It should print the chosen option instead mutate(remove.leverage = remove.leverage, # AC: I added a line to count the number of observation used for the CSTV estimations n = nrow(.)) #mutate(remove2x = "TRUE") if(summary == TRUE) { return( stage_3 %>% select( #model, n, # new outcome target, cstv, lower_cstv, upper_cstv, confidence, pvalue, r, # Replaced remove2x remove.leverage) %>% distinct(across(everything())) ) } else { return(stage_3) } } ``` \newpage ## 6.1. Plot function ```{r} # Function alcc_plot() creates a scatter plot of soil test correlation data # arguments are passed to alcc() alcc_plot.rev <- function(data, stv, ry, #sma = TRUE, target, #sufficiency = 90, confidence, remove.leverage = "none") { if (missing(stv)) { stop("Specify the soil test variable (e.g., stv = STK)") } if (missing(ry)) { stop("Enter name of relative yield (%) variable (e.g., ry = RY)") } # AC: # Add statement for missing target if (missing(target)) { stop("Specify the target value of relative yield (%) (e.g., target = 90)") } # AC: # Add statement for missing confidence if (missing(confidence)) { stop("Specify the desired confidence level (%) for cstv estimation (e.g., confidence = 95)") } # enquo let's the user enter their own column names for x and y variables # AC: Replaced names STV <- enquo(stv) RY <- enquo(ry) #x <- enquo(stv) #y <- enquo(ry) input <- data %>% transmute(stv = !!STV, # RY values greater than 100 are capped at 100 # AC: the false statement should always be of type double, just in case, # use as.double() # I have just tried with a dataset where I found this problem (data_2) ry_cap = if_else(!!RY > 100, 100, as.double(!!RY))) # pass to alcc function output <- alcc.rev(data, stv = !!STV, ry = !!RY, # AC: sma not needed # sma = sma, target = target, confidence = confidence, # AC: I changed remove2x for remove.leverage remove.leverage = remove.leverage) # for plot annotations maxx <- max(input$stv) cstv <- (unique(output$cstv)) cstv <- if_else(cstv < 10, round(cstv, 1), round(cstv, 0)) lower_cstv <- (unique(output$lower_cstv)) lower_cstv <- if_else(lower_cstv < 10, round(lower_cstv, 1), round(lower_cstv, 0)) upper_cstv <- (unique(output$upper_cstv)) upper_cstv <- if_else(upper_cstv < 10, round(upper_cstv, 1), round(upper_cstv, 0)) #sufficiency <- unique(output$sufficiency) r <- round(unique(output$r), 2) # AC: WARNING! WHAT IF I DON'T WANT TO REMOVE ANYTHING? Currently, the default # is to remove STV > CSTV100 # Default option should be using all available data, then the remove2x = TRUE/FALSE # plot includes all data, even if remove2x == TRUE, but curve will follow remove2x output %>% ggplot() + geom_point(data = input, aes(stv, ry_cap), size = 2, alpha = 0.7, color = case_when( # AC: I changed the remove2x arg. for remove.leverage # None remove.leverage == "none" & input$stv <= cstv_100 ~ "black", remove.leverage == "none" & input$stv > cstv_100 ~ "purple", # >100 remove.leverage == "cstv100" & input$stv <= cstv_100 ~ "black", remove.leverage == "cstv100" & input$stv > cstv_100 ~ 'red', # >90_2x remove.leverage == "cstv90_2x" & input$stv <= cstv90_2x ~ "black", remove.leverage == "cstv90_2x" & input$stv > cstv90_2x ~ "red" #remove.leverage == TRUE & input$stv > cstv90_2x ~ "red" #remove2x == FALSE & input$stv > cstv_100 ~ red, #remove.leverage == FALSE & input$stv > cstv90_2x ~ gold, #remove.leverage == TRUE & input$stv > cstv90_2x ~ red, #TRUE ~ black )) + geom_vline(xintercept = lower_cstv, alpha = 0.5, linetype = 3) + geom_vline(xintercept = upper_cstv, alpha = 0.5, linetype = 3) + geom_line(aes(x = fitted_stv, y = fitted_ry), color = "red", size = 1.5) + # the cstv at X% sufficiency geom_vline(xintercept = cstv, color = "blue", alpha = 1) + geom_hline(yintercept = target, alpha = 0.2) + annotate( "text", label = paste0( "CSTV = ", cstv, " ppm at ", target, "% RY", "\n", confidence,"% CI [", lower_cstv, " - ", upper_cstv, "]", "\n r = ", r), x = maxx, y = 0, hjust = 1, vjust = 0, alpha = 0.8 ) + scale_x_continuous(breaks = seq(0, 10000, by = if_else( maxx >= 200, 40, if_else(maxx >= 100, 20, if_else(maxx >= 50, 10, 5)) ))) + scale_y_continuous(breaks = seq(0, 105, 10)) + labs(subtitle = "Modified Arcsine-Log Calibration Curve", # AC: Here there could be a problem for people using STV in other than mg/kg ... x = expression(Soil ~ Test ~ Value ~ (mg ~ kg ^ -1)), y = "Relative yield (%)", caption = paste0( # AC: I Replaced this in line with the new arg. remove.leverage #if_else(remove.leverage == TRUE, #"Red points > CSTV90 * 2 excluded from model", #"Red points > CSTV100 excluded from model. Yellow points > CSTV90 * 2 not excluded."), case_when(remove.leverage == "none" ~ "No points excluded from model. Purple points (>CSTV100).", remove.leverage == "cstv100" ~ "Red points (STV > CSTV100) were excluded from model.", remove.leverage == "cstv90_2x" ~ "Red points (STV > 2*CSTV90) were excluded from model."), "\nVertical dotted lines show lower and upper confidence limits." ) ) } ``` \newpage # 7. Comparison to SoilTestCocaCola ## 7.1. Cotton (from agridat package) For the cotton dataset, Austin's Code using alcc() produces:
i. different cstv, lower & upper limits when compared to Adrian's modALCC() and the reference (SMA regression). This issue is nothing wrong with formulae but a result of using a dataset that presents points with STV > cstv100, since the alcc() removes those points by default, while Adrian's modALCC() does not, as I prefer to leave this decision up to the user.
ii. wrong p=values, since it uses the student-t one instead of the pvalue to test correlation significance (used by SMA regression).
```{r warning=T, message=F} # Load Austin's ALCC functions # alcc.core() and alcc() source_url("https://raw.githubusercontent.com/austinwpearce/SoilTestCocaCola/main/alcc.R") # alcc_plot() #source_url("https://raw.githubusercontent.com/austinwpearce/SoilTestCocaCola/main/alcc_plot.R") # EXAMPLE 1. Cotton. Relative yield vs soil test potassium cotton <- tibble(stk = agridat::cate.potassium$potassium, ry = agridat::cate.potassium$yield, dataset = "cotton") ### CODES COMPARISON # Adrian's modALCC(data = cotton, RY = ry, STV = stk, target=90, confidence = 0.95) # Austin's, sma = TRUE # Check that even stating remove2x = FALSE it always print "remove2x = TRUE" on the summary table. alcc(cotton, soil_test = stk, ry = ry, sufficiency = 90, confidence = 95, summary = T, remove2x = FALSE, sma = T) # Austin's, sma = FALSE alcc(cotton, soil_test = stk, ry = ry, sufficiency = 90, confidence = 95, summary = T, remove2x = FALSE, sma = F) ## For the Austin's Code using alcc() -> # cstv, lower & upper limit, and p=values are wrong compared to # SMA regression as reference. # Austin's results (REFERENCE) [ADRIAN's]: # CSTV = 67.49 (76.89) [76.89] # LL = 55.87 (62.36) [62.35] # UL = 81.54 (94.79) [94.83] # pval = 8.87e-05 (5.57e-05) [5.57e-05] # NOW COMPARE TO MY REVISED VERSION of alcc (alcc.rev) # Adrian's modALCC(data = cotton, RY = ry, STV = stk, target=90, confidence = 0.95) # Revised alcc functions alcc.rev(cotton, stv = stk, ry = ry, target = 90, confidence = 95, summary = T, remove.leverage = "none") alcc.rev(cotton, stv = stk, ry = ry, target = 90, confidence = 95, summary = T, remove.leverage = "cstv100") alcc.rev(cotton, stv = stk, ry = ry, target = 90, confidence = 95, summary = T, remove.leverage = "cstv90_2x") ``` \newpage ### 7.1.2. Double-check with SMA (reference method) ```{r} # USE SMA as reference method target = 90 cotton <- cotton %>% mutate(# Transform stv ln_stv = log(stk), # Cap if RY > 100 RY = ifelse(ry > 100, 100, as.double(ry)), # Transform and center RY arc_ry = (asin(sqrt(RY/100)) - asin(sqrt(target/100)))) # SMA model sma.cotton <- sma(data = cotton, method = c("SMA"), formula = ln_stv ~ arc_ry) # Print sma.cotton # Extract intercept, its CI, and model p-value print("CSTV from SMA") exp(sma.cotton[["elevtest"]][[1]][["a"]]) print("Lower limit") exp(sma.cotton[["elevtest"]][[1]][["a.ci"]][[1]]) print("Upper limit") exp(sma.cotton[["elevtest"]][[1]][["a.ci"]][[2]]) print("p-value") sma.cotton[["pval"]][[1]] ``` \newpage ## 7.2. Other examples Data_1: The alcc() from Austin works the same than Adrian's modALCC() and Spreadsheet version.
Data_2: Austin's original alcc() code fails with data_2 due to an issue with the "if_else()" used to cap RY values. The function needs to have specified the FALSE statement as.double(), otherwise it may result problematic with some datasets loaded from csv files (as data_2). ```{r} # Adrian's fit_example_1 = modALCC(data = data_1, RY = RY, STV = STV, target=90, confidence = 0.95) fit_example_1 # Austin's alcc(data_1, soil_test = STV, ry = RY, sufficiency = 90, confidence = 95, summary = T, remove2x = FALSE, sma = TRUE) # Fixed alcc.rev alcc.rev(data_1, stv = STV, ry = RY, target = 90, confidence = 95, summary = T, remove.leverage = "none") # plot alcc_plot.rev(data_1, stv = STV, ry = RY, target = 90, confidence = 95, remove.leverage = "cstv90_2x") # Using dataframe # Alternative using the vectors # fit_example_1 = modALCC(Y = data_1$RY,STV = data_1$STV, target=90,confidence = 0.95) # Adrian's fit_example_2 = modALCC(data = data_2, RY = RY, STV = STV, target=90, confidence = 0.95) fit_example_2 # Revised alcc.rev # # remove.leverage = "none" alcc.rev(data_2, stv = STV, ry = RY, target = 90, confidence = 95, summary = T, remove.leverage = "none") # plot alcc_plot.rev(data_2, stv = STV, ry = RY, target = 90, confidence = 95, remove.leverage = "none") # remove.leverage = "cstv100" alcc.rev(data_2, stv = STV, ry = RY, target = 90, confidence = 95, summary = T, remove.leverage = "cstv100") # plot alcc_plot.rev(data_2, stv = STV, ry = RY, target = 90, confidence = 95, remove.leverage = "cstv100") # remove.leverage = "cstv90_2x" alcc.rev(data_2, stv = STV, ry = RY, target = 90, confidence = 95, summary = T, remove.leverage = "cstv90_2x") # plot alcc_plot.rev(data_2, stv = STV, ry = RY, target = 90, confidence = 95, remove.leverage = "cstv90_2x") # Austin's original alcc() fails with data_2 # alcc(data_2, soil_test = STV, ry = RY, sufficiency = 90, confidence = 95, summary = T, remove2x = FALSE, sma = TRUE) ``` \newpage ## 7.3. Extra example data_3 Data_3: Austin's alcc() does not produce the same results than the modALCC() or the Spreadsheet version. Again, this is related to the fact that Austin's alcc() always remove points STV > CSTV100 when present, so it produces different results than Adrian's modALCC().
Using the Adrian's revised alcc.rev() functions, Austin's alcc() is equivalent to use the alcc.rev(remove.leverage = 'cstv100'). ```{r} # Example 3 dataframe. Imported from xlsx file data_3 = readxl::read_xlsx(path ="data_3.xlsx", sheet = 1) # Adrian's modALCC() modALCC(data = data_3, RY = RY, STV = STV, target=90, confidence = 0.95) # Austin's alcc() alcc(data_3, soil_test = STV, ry = RY, sufficiency = 90, confidence = 95, summary = T, remove2x = FALSE, sma = TRUE) # Fixed alcc.rev(remove.leverage = "none") alcc.rev(data_3, stv = STV, ry = RY, target = 90, confidence = 95, summary = T, remove.leverage = "none") # Fixed alcc.rev(remove.leverage = "cstv100") alcc.rev(data_3, stv = STV, ry = RY, target = 90, confidence = 95, summary = T, remove.leverage = "cstv100") # Fixed alcc.rev(remove.leverage = "cstv90_2x") alcc.rev(data_3, stv = STV, ry = RY, target = 90, confidence = 95, summary = T, remove.leverage = "cstv90_2x") # plot alcc_plot.rev(data_3, stv = STV, ry = RY, target = 90, confidence = 95, remove.leverage = "none") alcc_plot.rev(data_3, stv = STV, ry = RY, target = 90, confidence = 95, remove.leverage = "cstv100") ```
Owner
- Name: Adrian Correndo
- Login: adriancorrendo
- Kind: user
- Location: Manhattan, KS, USA
- Company: Kansas State University
- Website: https://adriancorrendo.github.io/
- Twitter: aacorrendo
- Repositories: 6
- Profile: https://github.com/adriancorrendo
Postdoctoral Fellow, Digital Agronomy Advocate of Applied Stats & Programming for Agriculture.
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0